date:20150402

RE: Stochastic gradient descent performance

2015-04-02 Thread Ulanov, Alexander

Hi Shivaram, It sounds really interesting! With this time we can estimate whether it is worth running an iterative algorithm on Spark. For example, for SGD on ImageNet (450K samples) we will spend 450K*50ms=6.25 hours to traverse all data one example at a time, not considering the data loading,
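
The estimate in this snippet can be checked with a short calculation. The sketch below takes the 450K samples and 50 ms per step quoted in the message at face value. The function name `full_pass_hours` and the `batch_size` parameter are hypothetical, and the assumption that the per-step cost stays fixed as the batch grows is an illustration only, not something the thread states.

```python
def full_pass_hours(num_samples: int, ms_per_step: int, batch_size: int = 1) -> float:
    """Hours needed for one pass over the data at a fixed cost per step.

    Assumption (not from the thread): each step costs ms_per_step
    regardless of how many examples it processes.
    """
    # Ceiling division: a final partial batch still costs one full step.
    steps = -(-num_samples // batch_size)
    # 3,600,000 milliseconds per hour.
    return steps * ms_per_step / 3_600_000


# One example per step, figures as quoted in the message.
print(full_pass_hours(450_000, 50))
# Hypothetical mini-batches of 100 examples at the same per-step cost.
print(full_pass_hours(450_000, 50, batch_size=100))
```

Under these assumptions, the number of steps, and so the total time, shrinks in proportion to the batch size. That is why the per-step overhead matters most when examples are processed one at a time.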

Re: Unit test logs in Jenkins?

2015-04-02 Thread Steve Loughran

On 2 Apr 2015, at 06:31, Patrick Wendell pwend...@gmail.com wrote: Hey Marcelo, Great question. Right now, some of the more active developers have an account that allows them to log into this cluster to inspect logs (we copy the logs from each run to a node on that cluster). The

Re: org.spark-project.jetty and guava repo locations

2015-04-02 Thread Ted Yu

Take a look at the maven-shade-plugin in pom.xml. Here is the snippet for org.spark-project.jetty: <relocation> <pattern>org.eclipse.jetty</pattern> <shadedPattern>org.spark-project.jetty</shadedPattern> <includes>

Re: Stochastic gradient descent performance

2015-04-02 Thread Joseph Bradley

When you say "It seems that instead of sampling it is better to shuffle data and then access it sequentially by mini-batches," are you sure that holds true for a big dataset in a cluster? As far as implementing it, I haven't looked carefully at GapSamplingIterator (in RandomSampler.scala) myself, but

Re: Unit test logs in Jenkins?

2015-04-02 Thread shane knapp

i agree with all of this. but can we please break up the tests and make them shorter? :) On Thu, Apr 2, 2015 at 8:54 AM, Nicholas Chammas nicholas.cham...@gmail.com wrote: This is secondary to Marcelo’s question, but I wanted to comment on this: Its main limitation is more cultural than

Test all the things (Was: Unit test logs in Jenkins?)

2015-04-02 Thread Nicholas Chammas

(Renaming thread so as to un-hijack Marcelo's request.) Sure, we definitely want tests running faster. Part of testing all the things will be factoring out stuff from the various builds that can be run just once. We've also tried in the past (with little success) to parallelize test execution

RE: Stochastic gradient descent performance

2015-04-02 Thread Ulanov, Alexander

Hi Joseph, Thank you for the suggestion! It seems that instead of sampling it is better to shuffle data and then access it sequentially by mini-batches. Could you suggest how to implement it? With regards to aggregate (reduce), I am wondering why it is so slow in local mode? Could you elaborate
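
The idea of shuffling once and then reading mini-batches sequentially can be sketched on a single machine in plain Python. This is a minimal illustration, not Spark or MLlib code and not the implementation discussed in the thread. The function name `shuffled_minibatches` and its parameters are hypothetical. Whether the approach pays off on a large distributed dataset is the question this thread leaves open.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def shuffled_minibatches(
    data: Sequence[T], batch_size: int, seed: int = 0
) -> Iterator[List[T]]:
    """Shuffle the index order once, then yield consecutive mini-batches."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    order = list(range(len(data)))
    # A single shuffle per pass replaces a fresh random sample per step.
    random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        # Each batch is a contiguous slice of the shuffled order.
        yield [data[i] for i in order[start:start + batch_size]]


batches = list(shuffled_minibatches(list(range(10)), batch_size=4, seed=42))
for batch in batches:
    print(batch)
```

Each example appears exactly once per pass, and the last batch may be smaller than `batch_size`. Per-step random sampling guarantees neither property.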

Re: Unit test logs in Jenkins?

2015-04-02 Thread Marcelo Vanzin

On Thu, Apr 2, 2015 at 3:01 AM, Steve Loughran ste...@hortonworks.com wrote: That would be really helpful to debug build failures. The scalatest output isn't all that helpful. Potentially an issue with the test runner, rather than the tests themselves. Sorry, that was me over-generalizing.

8 matches
