Hi Shivaram,
It sounds really interesting! With this timing we can estimate whether it is
worth running an iterative algorithm on Spark at all. For example, for SGD on
ImageNet (450K samples) we would spend 450K × 50 ms ≈ 6.25 hours to traverse
all the data one example at a time, not counting data loading.
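As a quick sanity check on the arithmetic above, here is a minimal Scala sketch; the object and value names are just illustrative, and the constants are the ones from the email:

```scala
// Back-of-envelope epoch time: 450K samples at ~50 ms per sample access.
object EpochEstimate {
  val samples: Long = 450000L
  val perSampleMs: Long = 50L
  // total milliseconds -> seconds -> hours
  val epochHours: Double = samples * perSampleMs / 1000.0 / 3600.0 // = 6.25
}
```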
On 2 Apr 2015, at 06:31, Patrick Wendell pwend...@gmail.com wrote:
Hey Marcelo,
Great question. Right now, some of the more active developers have an
account that allows them to log into this cluster to inspect logs (we
copy the logs from each run to a node on that cluster). The
Take a look at the maven-shade-plugin in pom.xml.
Here is the snippet for org.spark-project.jetty:

    <relocation>
      <pattern>org.eclipse.jetty</pattern>
      <shadedPattern>org.spark-project.jetty</shadedPattern>
      <includes>
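For reference, a complete relocation element in the maven-shade-plugin pairs the pattern with an explicit include filter and a closing tag; the `<include>` value below shows the common form of such a filter and is illustrative, not quoted from the email:

```xml
<relocation>
  <pattern>org.eclipse.jetty</pattern>
  <shadedPattern>org.spark-project.jetty</shadedPattern>
  <includes>
    <!-- rewrite only classes under org.eclipse.jetty -->
    <include>org.eclipse.jetty.**</include>
  </includes>
</relocation>
```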
When you say "It seems that instead of sample it is better to shuffle data
and then access it sequentially by mini-batches", are you sure that holds
true for a big dataset in a cluster? As far as implementing it, I haven't
looked carefully at GapSamplingIterator (in RandomSampler.scala) myself,
but
I agree with all of this. But can we please break up the tests and make
them shorter? :)
On Thu, Apr 2, 2015 at 8:54 AM, Nicholas Chammas nicholas.cham...@gmail.com
wrote:
This is secondary to Marcelo’s question, but I wanted to comment on this:
Its main limitation is more cultural than
(Renaming thread so as to un-hijack Marcelo's request.)
Sure, we definitely want tests running faster.
Part of testing all the things will be factoring out stuff from the
various builds that can be run just once.
We've also tried in the past (with little success) to parallelize test
execution
Hi Joseph,
Thank you for the suggestion!
It seems that instead of sample it is better to shuffle the data and then
access it sequentially in mini-batches. Could you suggest how to implement this?
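The shuffle-then-sequential idea can be sketched in plain Scala as follows. This is a local stand-in, not Spark code: `MiniBatchSketch`, `miniBatches`, and their parameters are hypothetical names, and a real RDD version would shuffle via repartitioning rather than in memory:

```scala
import scala.util.Random

// Hedged sketch: shuffle once up front, then read mini-batches sequentially,
// instead of drawing a fresh random sample for every batch.
object MiniBatchSketch {
  def miniBatches[A](data: Seq[A], batchSize: Int, seed: Long): Iterator[Seq[A]] = {
    val shuffled = new Random(seed).shuffle(data) // one O(n) shuffle per epoch
    shuffled.grouped(batchSize)                   // then sequential access
  }
}
```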
With regard to aggregate (reduce), I am wondering why it runs so slowly in
local mode? Could you elaborate
On Thu, Apr 2, 2015 at 3:01 AM, Steve Loughran ste...@hortonworks.com wrote:
That would be really helpful to debug build failures. The scalatest
output isn't all that helpful.
Potentially an issue with the test runner, rather than the tests themselves.
Sorry, that was me over-generalizing.