Hi David,
I got most of it working, and the loss is monotonically decreasing
now that I get the history from the iterator of states.
However, in the costFun I need to know which iteration is current for the
miniBatch, which means that for one iteration, if the optimizer calls costFun
several times for line s
That's right.
FWIW, caching should be automatic now, but the version of Breeze you're
using might not do that yet.
Also, in breeze.util._ there's an implicit that adds a tee method to
Iterator, as well as a last method. Both are useful for things like this.
-- David
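
A minimal sketch of the tee/last idea described above, written without
Breeze so it stands alone. The State case class, the TeeIterator wrapper
and the lastElement name are hypothetical stand-ins for illustration, not
Breeze's actual classes or signatures. The sketch only shows the pattern:
observe each optimizer state as it passes through the iterator, then keep
the final one.

```scala
object TeeSketch {
  // Hypothetical stand-in for an optimizer state; not Breeze's own State class.
  final case class State(iter: Int, loss: Double)

  // Hand-rolled helpers that mimic the idea of tee and last on an Iterator.
  implicit class TeeIterator[T](underlying: Iterator[T]) {
    // Runs f on every element as it is consumed, passing the element through unchanged.
    def tee(f: T => Unit): Iterator[T] = underlying.map { t => f(t); t }

    // Consumes the iterator and returns its final element (throws if it is empty).
    def lastElement: T = underlying.reduceLeft((_, next) => next)
  }

  // Simulates five optimizer iterations with a decreasing loss.
  def run(): (Vector[Double], State) = {
    val states = Iterator.tabulate(5)(i => State(i, 1.0 / (i + 1)))
    val history = Vector.newBuilder[Double]
    // Record the loss of each state as a side effect, then keep the last state.
    val finalState = states.tee(s => history += s.loss).lastElement
    (history.result(), finalState)
  }
}
```

The same side-effect hook could update a mutable iteration counter that
the cost function reads. In that case the counter holds the last completed
iteration while the line search is still evaluating the next one.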
On Sun, Apr 27, 2014
Thanks for the info and good luck with 1.0.
Regards,
Art
On Fri, Apr 25, 2014 at 9:48 AM, Andrew Or wrote:
> Hi Art,
>
> First of all, thanks a lot for your PRs. We are currently in the middle of
> the Spark 1.0 release, so most of us are swamped with the more core
> features. To answer you
We did it using Scala XML with Spark.
We start by creating an RDD in which each page is stored as a single line:
- split the XML dump with xml_split
- process each split with a shell script that removes the "xml_split" tag
and the siteinfo section, and puts each page on a single line.
- copy resu