Spark Streaming essentially does this by saving the DAG of DStreams, which
can deterministically regenerate the DAG of RDDs upon recovery from
failure. Along with that, the progress information (which batches have
finished, which batches are queued, etc.) is also saved, so that upon
recovery the
+1 (non-binding)
Built and tested on Windows 7:
cd apache-spark
git fetch
git checkout v1.2.0-rc2
sbt assembly
[warn]
...
[warn]
[success] Total time: 720 s, completed Dec 11, 2014 8:57:36 AM
dir assembly\target\scala-2.10\spark-assembly-1.2.0-hadoop1.0.4.jar
110,361,054
Interesting. Are you saying that StreamingContext checkpointing can regenerate the DAG?
Best Regards
Jun Feng Liu
IBM China Systems Technology Laboratory in Beijing
Phone: 86-10-82452683
E-mail: liuj...@cn.ibm.com
BLD 28,ZGC Software Park
No.8 Rd.Dong Bei Wang West, Dist.Haidian Beijing 100193
Signatures and checksums are OK. License and notice still look fine.
The plain-vanilla source release compiles with Maven 3.2.1 and passes
tests, on OS X 10.10 + Java 8.
On Wed, Dec 10, 2014 at 9:08 PM, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate
+1
Tested on OS X.
On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:
Please vote on releasing the following candidate as Apache Spark version
1.2.0!
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
+1 (non-binding). Tested on Ubuntu against YARN.
On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com wrote:
+1
Tested on OS X.
On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com
wrote:
Please vote on releasing the following candidate as Apache Spark
Hi,
I would like to contribute to Spark's Machine Learning library by adding
evaluation metrics that would be used to gauge the accuracy of a model given
a certain feature set. In particular, I would like to contribute k-fold
cross-validation metrics and the F-beta metric, among others, on top of the current
Michael & other Spark SQL junkies,
As I read through the Spark API docs, in particular those for the
org.apache.spark.sql package, I can't seem to find details about the Scala
classes representing the various Spark SQL DataTypes, for instance
DecimalType. I find DataType classes in
Hi, I'd recommend starting by checking out the existing helper
functionality for these tasks. There are helper methods to do K-fold
cross-validation in MLUtils:
https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/MLUtils.scala
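For readers new to these two tasks, here is a minimal plain-Python sketch of a k-fold split and the F-beta score. It does not use MLlib: the function names `k_fold` and `f_beta` are hypothetical, and assigning each record to a fold at random with a fixed seed is an assumption made for illustration, not a description of how the MLUtils helper is implemented.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold(data: Sequence[T], k: int, seed: int = 42) -> List[Tuple[List[T], List[T]]]:
    """Hypothetical helper: split data into k (training, validation) pairs.

    Each record is assigned to exactly one fold at random (an assumption for
    this sketch); fold i is the validation set of pair i and the remaining
    folds form its training set.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    assignment = [rng.randrange(k) for _ in data]
    splits = []
    for fold in range(k):
        validation = [x for x, a in zip(data, assignment) if a == fold]
        training = [x for x, a in zip(data, assignment) if a != fold]
        splits.append((training, validation))
    return splits


def f_beta(y_true: Sequence[int], y_pred: Sequence[int], beta: float = 1.0) -> float:
    """Hypothetical helper: F-beta score for binary labels (1 = positive).

    F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision
    and R is recall; returns 0.0 when the score is undefined.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom else 0.0
```

With beta = 1 this reduces to F1; beta > 1 weights recall more heavily than precision. In a cross-validation loop, a model would be trained on each training split and `f_beta` computed on the matching validation split, then averaged over the k folds.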
The experimental spark.ml
I'm interested in understanding this as well. One of the main ways Tachyon
is supposed to realize performance gains without sacrificing durability is
by storing the lineage of data rather than full copies of it (similar to
Spark). But if Spark isn't sending lineage information into Tachyon, then
I don't think the lineage thing is even turned on in Tachyon - it was
mostly a research prototype, so I don't think it'd make sense for us to use
that.
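To illustrate the trade-off being discussed, here is a toy Python model of lineage-based recovery: a dataset keeps the recipe that produced it rather than a replica, and a lost copy is rebuilt by re-running that recipe. `Dataset` and `map_dataset` are hypothetical names, not Tachyon or Spark APIs, and the sketch assumes every transformation is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Dataset:
    """Toy dataset that records how it was derived (its lineage)."""
    name: str
    compute: Callable[[], List[int]]  # deterministic recipe to rebuild the data
    cached: Optional[List[int]] = field(default=None, repr=False)

    def get(self) -> List[int]:
        # Serve the in-memory copy if present; otherwise recompute from lineage.
        if self.cached is None:
            self.cached = self.compute()
        return self.cached

    def lose(self) -> None:
        # Simulate losing the in-memory copy (e.g. a failed worker).
        self.cached = None


def map_dataset(parent: Dataset, name: str, fn: Callable[[int], int]) -> Dataset:
    # Only the recipe (parent + fn) is stored, not a replica of the output.
    return Dataset(name, lambda: [fn(x) for x in parent.get()])
```

The design choice is storage versus recompute time: nothing is replicated, but recovering a lost dataset costs a re-run of its chain of ancestors, which is only correct if each step is deterministic.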
On Thu, Dec 11, 2014 at 3:51 PM, Andrew Ash and...@andrewash.com wrote:
I'm interested in understanding this as well. One of the main ways
Hi all,
I just joined the list, so I don't have a message history that would allow
me to reply to this post:
http://apache-spark-developers-list.1001551.n3.nabble.com/Terasort-example-td9284.html
I am interested in running the terasort example. I cloned the repo
https://github.com/ehiggs/spark
Part of it can be found at:
https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R34
Sorry, it's a PR that is still to be reviewed, but it should still be informative.
Cheng Hao
-----Original Message-----
From: Alessandro Baretta [mailto:alexbare...@gmail.com]
Sent: Friday,
Hi, all
We found some bugs in hive-0.12, but we cannot wait for the Hive
community to fix them.
We want to fix these bugs in our lab and build a new release that
Spark can recognize.
As far as we know, Spark depends on a special release of Hive, like:
|dependency
Thanks. This is useful.
Alex
On Thu, Dec 11, 2014 at 4:35 PM, Cheng, Hao hao.ch...@intel.com wrote:
Part of it can be found at:
https://github.com/apache/spark/pull/3429/files#diff-f88c3e731fcb17b1323b778807c35b38R34
Sorry, it's a PR that is still to be reviewed, but it should still be informative.
Cheng