for the "descent" step while Spark computes the gradients. The video was
recently uploaded here http://bit.ly/1JnvQAO.
Regards,
--
Algorithms of the Mind: http://bit.ly/1ReQvEW
Christopher Nguyen
CEO & Co-Founder
www.Arimo.com (née Adatao)
linkedin.com/in/ctnguyen
Hi Kui,
DDF (open sourced) also aims to do something similar, adding RDBMS idioms,
and is already implemented on top of Spark.
One philosophy is that the DDF API aggressively hides the notion of
parallel datasets, exposing only (mutable) tables to users, on which they
can apply R and other
PM, oppokui oppo...@gmail.com wrote:
Thanks, Christopher. I saw it before; it is amazing. Last time I tried to
download it from Adatao, but got no response after filling out the form. How can I
download it or its source code? What is the license?
Kui
On Sep 6, 2014, at 8:08 PM, Christopher Nguyen c
Fantastic!
Sent while mobile. Pls excuse typos etc.
On Aug 19, 2014 4:09 PM, Haoyuan Li haoyuan...@gmail.com wrote:
Hi folks,
We've posted the first Tachyon meetup, which will be on August 25th and is
hosted by Yahoo! (Limited Space):
http://www.meetup.com/Tachyon/events/200387252/ . Hope
at 9:20 PM, Christopher Nguyen c...@adatao.com
wrote:
Lance, some debugging ideas: you might try model.predict(RDD[Vector]) to
isolate the cause to serialization of the loaded model. And also try to
serialize the deserialized (loaded) model manually to see if that throws
any visible exceptions
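The manual serialization check suggested above can be sketched in plain Java. This is an illustrative sketch only: `roundTrip` is a hypothetical helper, and the `String` stands in for the loaded MLlib model.

```java
import java.io.*;

public class SerializationCheck {
    // Round-trip any Serializable object through Java serialization; a
    // NotSerializableException thrown here points at whatever field would
    // break shipping the deserialized model back out to executors.
    static Object roundTrip(Object model) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(model);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // A String is trivially Serializable; substitute the loaded model here.
        System.out.println(roundTrip("loaded-model"));
    }
}
```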
+1 what Sean said. And if there are too many state/argument parameters for
your taste, you can always create a dedicated (serializable) class to
encapsulate them.
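A minimal sketch of that encapsulation, with illustrative field names (not from the thread): the loose state/arguments become one serializable object, so a closure captures a single value instead of many.

```java
import java.io.Serializable;

// Hypothetical parameter holder: one serializable object carries all the
// state a Spark closure needs, instead of many loosely captured variables.
public class JobParams implements Serializable {
    private static final long serialVersionUID = 1L;

    final int iterations;
    final double threshold;
    final String outputPath;

    JobParams(int iterations, double threshold, String outputPath) {
        this.iterations = iterations;
        this.threshold = threshold;
        this.outputPath = outputPath;
    }

    public static void main(String[] args) {
        JobParams p = new JobParams(10, 0.5, "/tmp/out");
        System.out.println(p.iterations + " " + p.threshold + " " + p.outputPath);
    }
}
```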
Sent while mobile. Pls excuse typos etc.
On Aug 13, 2014 6:58 AM, Sean Owen so...@cloudera.com wrote:
PS I think that solving not
Hi sparkuser2345,
I'm inferring the problem statement is something like "how do I make this
complete faster (given my compute resources)?"
Several comments.
First, Spark only allows launching parallel tasks from the driver, not from
workers, which is why you're seeing the exception when you try.
Toby, #saveAsTextFile() and #saveAsObjectFile() are probably what you want
for your use case. As for Parquet support, that's newly arrived in Spark
1.0.0 together with SparkSQL so continue to watch this space.
Gerard's suggestion to look at JobServer, which you can generalize as
building a
Lakshmi, this is orthogonal to your question, but in case it's useful.
It sounds like you're trying to determine the home location of a user, or
something similar.
If that's the problem statement, the data pattern may suggest a far more
computationally efficient approach. For example, first map
Awesome work, Pat et al.!
--
Christopher T. Nguyen
Co-founder CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen
On Fri, May 30, 2014 at 3:12 AM, Patrick Wendell pwend...@gmail.com wrote:
I'm thrilled to announce the availability of Spark 1.0.0! Spark 1.0.0
is a milestone release as
Keith, do you mean "bound" as in (a) strictly control to some quantifiable
limit, or (b) try to minimize the amount used by each task?
If a, then that is outside the scope of Spark's memory management, which
you should think of as an application-level (that is, above JVM) mechanism.
In this scope,
Paco, that's a great video reference, thanks.
To be fair to our friends at Yahoo, who have done a tremendous amount to
help advance the cause of the BDAS stack, it's not FUD coming from them,
certainly not in any organized or intentional manner.
In vacuo we prefer Mesos ourselves, but also can't
Someone (Ze Ni, https://www.sics.se/people/ze-ni) has actually attempted
such a comparative study as a Masters thesis:
http://www.diva-portal.org/smash/get/diva2:605106/FULLTEXT01.pdf
According to this snapshot (c. 2013), Stratosphere is different from Spark
in not having an explicit concept of
Flavio, the two are best at two orthogonal use cases: HBase on the
transactional side, and Spark on the analytic side. Spark is not intended
for row-based, random-access updates, but it is far more flexible and efficient
for dataset-scale aggregations and general computations.
So yes, you can easily
Avati, depending on your specific deployment config, there can be up to a
10X difference in data loading time. For example, we routinely parallel-load
10+GB data files across small 8-node clusters in 10-20 seconds, which
would take about 100s if bottlenecked over a 1GigE network. That's about
the
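The ~100 s figure is roughly the wire-speed bound of a single 1 GigE link. A back-of-envelope check, assuming an idealized ~125 MB/s and ignoring protocol overhead:

```java
public class LoadTimeEstimate {
    public static void main(String[] args) {
        double dataBytes = 10e9;          // a 10 GB file
        double linkBytesPerSec = 125e6;   // 1 GigE ~= 125 MB/s at wire speed
        // Lower bound on transfer time; real-world overhead pushes it toward ~100 s.
        System.out.println(dataBytes / linkBytesPerSec + " s");
    }
}
```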
Aureliano, you're correct that this is not validation error, which is
computed as the residuals on out-of-training-sample data, and helps
minimize overfit variance.
However, in this example, the errors are correctly referred to as training
error, which is what you might compute on a per-iteration
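As a concrete illustration (not MLlib's API, just the definition): training error is the error computed on the same examples the model is being fit to, e.g. mean squared error over the training set at each iteration.

```java
public class TrainingError {
    // Mean squared error of the model's predictions against the labels
    // it was trained on (training error, not validation error).
    static double trainingMse(double[] labels, double[] preds) {
        double sum = 0;
        for (int i = 0; i < labels.length; i++) {
            double r = labels[i] - preds[i];
            sum += r * r;
        }
        return sum / labels.length;
    }

    public static void main(String[] args) {
        System.out.println(trainingMse(new double[]{1, 2, 3}, new double[]{1, 2, 5}));
    }
}
```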
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to
get what you want is to transform to another RDD. But you might look at
MutablePair (
https://github.com/apache/spark/blob/60abc252545ec7a5d59957a32e764cd18f6c16b4/core/src/main/scala/org/apache/spark/util/MutablePair.scala)
have to implement. Is DDF
going to be an alternative to RDD?
Thanks again!
On Fri, Mar 28, 2014 at 7:02 PM, Christopher Nguyen c...@adatao.com wrote:
Sung Hwan, strictly speaking, RDDs are immutable, so the canonical way to
get what you want is to transform to another RDD. But you might look
+1 Michael, Reynold et al. This is key to some of the things we're doing.
--
Christopher T. Nguyen
Co-founder CEO, Adatao http://adatao.com
linkedin.com/in/ctnguyen
On Wed, Mar 26, 2014 at 2:58 PM, Michael Armbrust mich...@databricks.com wrote:
Hey Everyone,
This already went out to the
Deenar, when you say "just once", have you defined "across multiple what"
(e.g., across multiple threads in the same JVM on the same machine)? In
principle you can have multiple executors on the same machine.
In any case, assuming it's the same JVM, have you considered using a
singleton that
Deenar, the singleton pattern I'm suggesting would look something like this:
public class TaskNonce {
private transient boolean mIsAlreadyDone;
private static transient TaskNonce mSingleton = new TaskNonce();
private transient Object mSyncObject = new Object();
public static TaskNonce get() { return mSingleton; }
public void doThisOnce(Runnable task) {
synchronized (mSyncObject) {
if (!mIsAlreadyDone) { task.run(); mIsAlreadyDone = true; }
}
}
}
Chanwit, that is awesome!
Improvements in shuffle operations should help improve life even more for
you. Great to see a data point on ARM.
Sent while mobile. Pls excuse typos etc.
On Mar 18, 2014 7:36 PM, Chanwit Kaewkasi chan...@gmail.com wrote:
Hi all,
We are a small team doing a research
Dana,
When you run multiple applications under Spark, and each application
takes up the entire cluster's resources, it is expected that one will block
the other completely, which is why you're seeing the wall times add together
sequentially. In addition there is some overhead associated with