spark git commit: [SPARK-6278][MLLIB] Mention the change of objective in linear regression
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 dc287f38f -> 214f68103

[SPARK-6278][MLLIB] Mention the change of objective in linear regression

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide.

srowen

Author: Xiangrui Meng m...@databricks.com

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

(cherry picked from commit 7f13434a5c52b815c584ec773ab0e5df1a35ea86)
Signed-off-by: Xiangrui Meng m...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/214f6810
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/214f6810
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/214f6810

Branch: refs/heads/branch-1.3
Commit: 214f68103219317416e2278e80b8fc0fb5a616f4
Parents: dc287f3
Author: Xiangrui Meng m...@databricks.com
Authored: Fri Mar 13 10:27:28 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:27:34 2015 -0700

----------------------------------------------------------------------
 docs/mllib-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/214f6810/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 4c7a7d9..03b948c 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -107,6 +107,8 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
 * In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
 * In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2.
+  So in order to produce the same result as in 1.2, the regularization parameter needs to be divided by 2 and the step size needs to be multiplied by 2.

 ## Previous Spark Versions
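To make the migration note concrete, here is a minimal sketch (not part of the commit; the 1.2-era values `regParam = 0.1` and `stepSize = 1.0` are hypothetical examples):

```scala
import org.apache.spark.mllib.regression.RidgeRegressionWithSGD

// Hypothetical settings tuned against Spark 1.2: regParam = 0.1, stepSize = 1.0.
// Because 1.3 divides the squared loss by 2, compensate as the guide describes
// to reproduce the 1.2 result.
val ridge = new RidgeRegressionWithSGD()
ridge.optimizer
  .setRegParam(0.1 / 2) // regularization parameter divided by 2
  .setStepSize(1.0 * 2) // step size multiplied by 2
```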
spark git commit: [SPARK-6252] [mllib] Added getLambda to Scala NaiveBayes
Repository: spark
Updated Branches:
  refs/heads/master ea3d2eed9 -> dc4abd4dc

[SPARK-6252] [mllib] Added getLambda to Scala NaiveBayes

Note: not relevant for Python API since it only has a static train method

Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com
Author: Joseph K. Bradley jos...@databricks.com

Closes #4969 from jkbradley/SPARK-6252 and squashes the following commits:

a471d90 [Joseph K. Bradley] small edits from review
63eff48 [Joseph K. Bradley] Added getLambda to Scala NaiveBayes

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dc4abd4d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dc4abd4d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dc4abd4d

Branch: refs/heads/master
Commit: dc4abd4dc40deacab39bfa9572b06bf0ea6daa6d
Parents: ea3d2ee
Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com
Authored: Fri Mar 13 10:26:09 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:26:09 2015 -0700

----------------------------------------------------------------------
 .../org/apache/spark/mllib/classification/NaiveBayes.scala  | 3 +++
 .../apache/spark/mllib/classification/NaiveBayesSuite.scala | 8 ++++++++
 2 files changed, 11 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/dc4abd4d/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index b11fd4f..2ebc7fa 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -166,6 +166,9 @@ class NaiveBayes private (private var lambda: Double) extends Serializable with
     this
   }

+  /** Get the smoothing parameter. Default: 1.0. */
+  def getLambda: Double = lambda
+
   /**
    * Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
    *

http://git-wip-us.apache.org/repos/asf/spark/blob/dc4abd4d/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
index 64dcc0f..5a27c7d 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
@@ -85,6 +85,14 @@ class NaiveBayesSuite extends FunSuite with MLlibTestSparkContext {
     assert(numOfPredictions < input.length / 5)
   }

+  test("get, set params") {
+    val nb = new NaiveBayes()
+    nb.setLambda(2.0)
+    assert(nb.getLambda === 2.0)
+    nb.setLambda(3.0)
+    assert(nb.getLambda === 3.0)
+  }
+
   test("Naive Bayes") {
     val nPoints = 1
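A minimal usage sketch of the new getter, mirroring the test added above (assumes Spark 1.3's MLlib on the classpath):

```scala
import org.apache.spark.mllib.classification.NaiveBayes

// setLambda is a builder-style setter; the commit adds the matching getter.
val nb = new NaiveBayes().setLambda(2.0)
println(nb.getLambda) // prints 2.0
```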
spark git commit: SPARK-4044 [CORE] Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 a3493eb77 -> 4aa41327d

SPARK-4044 [CORE] Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK

Don't use JAR_CMD unless present in archive check. Add datanucleus always if present, to avoid needing a check involving JAR_CMD.

Follow up to https://github.com/apache/spark/pull/4873 for branch 1.3.

Author: Sean Owen so...@cloudera.com

Closes #4981 from srowen/SPARK-4044.2 and squashes the following commits:

3aafc76 [Sean Owen] Don't use JAR_CMD unless present in archive check. Add datanucleus always if present, to avoid needing a check involving JAR_CMD

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4aa41327
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4aa41327
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4aa41327

Branch: refs/heads/branch-1.3
Commit: 4aa41327d164ed5b2830cb18eb47b93ebd27401b
Parents: a3493eb
Author: Sean Owen so...@cloudera.com
Authored: Fri Mar 13 17:59:31 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:59:31 2015 +0000

----------------------------------------------------------------------
 bin/compute-classpath.sh | 25 +++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/4aa41327/bin/compute-classpath.sh
----------------------------------------------------------------------
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
index f4f6b7b..f28 100755
--- a/bin/compute-classpath.sh
+++ b/bin/compute-classpath.sh
@@ -93,14 +93,17 @@ if [ $num_jars -gt 1 ]; then
   exit 1
 fi

-# Verify that versions of java used to build the jars and run Spark are compatible
-jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
-if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
-  echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
-  echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
-  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
-  echo "or build Spark with Java 6." 1>&2
-  exit 1
+# Only able to make this check if 'jar' command is available
+if [ $(command -v "$JAR_CMD") ] ; then
+  # Verify that versions of java used to build the jars and run Spark are compatible
+  jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
+  if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
+    echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
+    echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
+    echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
+    echo "or build Spark with Java 6." 1>&2
+    exit 1
+  fi
 fi

 CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
@@ -121,11 +124,7 @@ datanucleus_jars=$(find "$datanucleus_dir" 2>/dev/null | grep "datanucleus-.*\\
 datanucleus_jars=$(echo "$datanucleus_jars" | tr "\n" : | sed "s/:$//g")

 if [ -n "$datanucleus_jars" ]; then
-  hive_files=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" org/apache/hadoop/hive/ql/exec 2>/dev/null)
-  if [ -n "$hive_files" ]; then
-    echo "Spark assembly has been built with Hive, including Datanucleus jars on classpath" 1>&2
-    CLASSPATH="$CLASSPATH:$datanucleus_jars"
-  fi
+  CLASSPATH="$CLASSPATH:$datanucleus_jars"
 fi

 # Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1
spark git commit: [SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 4aa41327d -> f81611dca

[SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system

For detail description, pls refer to [SPARK-6036](https://issues.apache.org/jira/browse/SPARK-6036).

Author: Zhang, Liye liye.zh...@intel.com

Closes #4785 from liyezhang556520/EventLogInProcess and squashes the following commits:

8b0b0a6 [Zhang, Liye] stop listener after DAGScheduler
79b15b3 [Zhang, Liye] SPARK-6036 avoid race condition between eventlogListener and akka actor system

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f81611dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f81611dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f81611dc

Branch: refs/heads/branch-1.3
Commit: f81611dca7ce97ebd26262086ac0e2b5e5f997e5
Parents: 4aa4132
Author: Zhang, Liye liye.zh...@intel.com
Authored: Thu Feb 26 23:11:43 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:06:17 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/f81611dc/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 05c3210..f80338e 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1375,17 +1375,17 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         stopped = true
         env.metricsSystem.report()
         metadataCleaner.cancel()
-        env.actorSystem.stop(heartbeatReceiver)
         cleaner.foreach(_.stop())
         dagScheduler.stop()
         dagScheduler = null
+        listenerBus.stop()
+        eventLogger.foreach(_.stop())
+        env.actorSystem.stop(heartbeatReceiver)
         progressBar.foreach(_.stop())
         taskScheduler = null
         // TODO: Cache.stop()?
         env.stop()
         SparkEnv.set(null)
-        listenerBus.stop()
-        eventLogger.foreach(_.stop())
         logInfo("Successfully stopped SparkContext")
         SparkContext.clearActiveContext()
       } else {
spark git commit: [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 f81611dca -> 9846790f4

[SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough

A simple try-catch wrapping KryoException to be more informative.

Author: Lev Khomich levkhom...@gmail.com

Closes #4947 from levkhomich/master and squashes the following commits:

0f7a947 [Lev Khomich] [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9846790f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9846790f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9846790f

Branch: refs/heads/branch-1.3
Commit: 9846790f49e2716e0b0c15f58e8547a1f04ba3ae
Parents: f81611d
Author: Lev Khomich levkhom...@gmail.com
Authored: Tue Mar 10 10:55:42 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:17:02 2015 +0000

----------------------------------------------------------------------
 .../org/apache/spark/serializer/KryoSerializer.scala  |  8 +++++++-
 .../apache/spark/serializer/KryoSerializerSuite.scala | 14 ++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/9846790f/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index 9ce64d4..dc7aa99 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ

   override def serialize[T: ClassTag](t: T): ByteBuffer = {
     output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException(s"Kryo serialization failed: ${e.getMessage}. To avoid this, " +
+          "increase spark.kryoserializer.buffer.max.mb value.")
+    }
     ByteBuffer.wrap(output.toBytes)
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/9846790f/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
index 523d898..6198df8 100644
--- a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
@@ -261,6 +261,20 @@ class KryoSerializerSuite extends FunSuite with SharedSparkContext {
       ser.serialize(HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes))
     }
   }
+
+  test("serialization buffer overflow reporting") {
+    import org.apache.spark.SparkException
+    val kryoBufferMaxProperty = "spark.kryoserializer.buffer.max.mb"
+
+    val largeObject = (1 to 1000000).toArray
+
+    val conf = new SparkConf(false)
+    conf.set(kryoBufferMaxProperty, "1")
+
+    val ser = new KryoSerializer(conf).newInstance()
+    val thrown = intercept[SparkException](ser.serialize(largeObject))
+    assert(thrown.getMessage.contains(kryoBufferMaxProperty))
+  }
 }
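For readers who hit the new error message, a hedged configuration sketch (the property name is the one referenced in the message above; the value 128 is an arbitrary example):

```scala
import org.apache.spark.SparkConf

// Raise the maximum Kryo buffer (in MB) so that large objects fit during serialization.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max.mb", "128") // example value
```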
spark git commit: [SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work
Repository: spark
Updated Branches:
  refs/heads/master 7f13434a5 -> b943f5d90

[SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work

Turns out, per the [convo on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` is acting exactly as it should. It became a large misconception as I thought it meant set difference, when in fact it does not. To that extent I merely updated the `diff` documentation to, hopefully, better reflect its true intentions moving forward.

Author: Brennon York brennon.y...@capitalone.com

Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits:

1e1d1e5 [Brennon York] reverted internal diff docs
92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality
f428623 [Brennon York] updated diff documentation to better represent its function
cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
66818b9 [Brennon York] added small secondary diff test
99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
74b8c95 [Brennon York] corrected method by leveraging bitmask operations to correctly return only the portions of that are different from the calling VertexRDD
9717120 [Brennon York] updated diff impl to cause fewer objects to be created
710a21c [Brennon York] working diff given test case
aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b943f5d9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b943f5d9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b943f5d9

Branch: refs/heads/master
Commit: b943f5d907df0607ecffb729f2bccfa436438d7e
Parents: 7f13434
Author: Brennon York brennon.y...@capitalone.com
Authored: Fri Mar 13 18:48:31 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:48:31 2015 +0000

----------------------------------------------------------------------
 graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala | 7 +++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/b943f5d9/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
----------------------------------------------------------------------
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
index 09ae3f9..40ecff7 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
@@ -122,8 +122,11 @@ abstract class VertexRDD[VD](
   def mapValues[VD2: ClassTag](f: (VertexId, VD) => VD2): VertexRDD[VD2]

   /**
-   * Hides vertices that are the same between `this` and `other`; for vertices that are different,
-   * keeps the values from `other`.
+   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
+   * differing values; for values that are different, keeps the values from `other`. This is
+   * only guaranteed to work if the VertexRDDs share a common ancestor.
+   *
+   * @param other the other VertexRDD with which to diff against.
    */
   def diff(other: VertexRDD[VD]): VertexRDD[VD]
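A small sketch of the documented semantics (not from the commit; assumes an existing SparkContext `sc`, and `setB` is derived from `setA` so the two share a common ancestor as the updated doc requires):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

def diffExample(sc: SparkContext): Unit = {
  val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt)))
  val setB: VertexRDD[Int] = setA.mapValues(v => if (v == 1) 10 else v) // change vertex 1 only
  val changed = setA.diff(setB)
  changed.collect.foreach(println) // prints only (1,10): the differing vertex, value from `other`
}
```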
spark git commit: [SPARK-6133] Make sc.stop() idempotent
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 338bea7b3 -> a08588c7e

[SPARK-6133] Make sc.stop() idempotent

Before we would get the following (benign) error if we called `sc.stop()` twice. This is because the listener bus would try to post the end event again even after it has already stopped. This happens occasionally when flaky tests fail, usually as a result of other sources of error. Either way we shouldn't be logging this error when it is not the cause of the failure.

```
ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerApplicationEnd(1425348445682)
```

Author: Andrew Or and...@databricks.com

Closes #4871 from andrewor14/sc-stop and squashes the following commits:

a14afc5 [Andrew Or] Move code after code
915db16 [Andrew Or] Move code into code

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a08588c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a08588c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a08588c7

Branch: refs/heads/branch-1.3
Commit: a08588c7eeaecf7003073c092320b37abd166191
Parents: 338bea7
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 15:09:57 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:33:27 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a08588c7/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index f80338e..023d54e 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1369,10 +1369,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   /** Shut down the SparkContext. */
   def stop() {
     SparkContext.SPARK_CONTEXT_CONSTRUCTOR_LOCK.synchronized {
-      postApplicationEnd()
-      ui.foreach(_.stop())
       if (!stopped) {
         stopped = true
+        postApplicationEnd()
+        ui.foreach(_.stop())
         env.metricsSystem.report()
         metadataCleaner.cancel()
         cleaner.foreach(_.stop())
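A tiny sketch of the now-supported calling pattern (assumes an existing SparkContext `sc`):

```scala
// After this fix, a second stop() is harmless: the guarded block runs only once,
// so nothing is posted to an already-stopped listener bus.
sc.stop()
sc.stop() // no-op, no spurious ERROR in the logs
```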
svn commit: r1666516 - in /spark: releases/_posts/2015-03-13-spark-release-1-3-0.md site/releases/spark-release-1-3-0.html
Author: pwendell
Date: Fri Mar 13 17:15:41 2015
New Revision: 1666516

URL: http://svn.apache.org/r1666516
Log: Incorrect link in Kafka API in 1.3 release notes

Modified:
    spark/releases/_posts/2015-03-13-spark-release-1-3-0.md
    spark/site/releases/spark-release-1-3-0.html

Modified: spark/releases/_posts/2015-03-13-spark-release-1-3-0.md
URL: http://svn.apache.org/viewvc/spark/releases/_posts/2015-03-13-spark-release-1-3-0.md?rev=1666516&r1=1666515&r2=1666516&view=diff
==============================================================================
--- spark/releases/_posts/2015-03-13-spark-release-1-3-0.md (original)
+++ spark/releases/_posts/2015-03-13-spark-release-1-3-0.md Fri Mar 13 17:15:41 2015
@@ -28,7 +28,7 @@ In this release Spark SQL [graduates fro
 In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for [topic modeling](https://issues.apache.org/jira/browse/SPARK-1405), [multinomial logistic regression](https://issues.apache.org/jira/browse/SPARK-2309) for multiclass classification, [Gaussian mixture model (GMM)](https://issues.apache.org/jira/browse/SPARK-5012) and [power iteration clustering](https://issues.apache.org/jira/browse/SPARK-4259) for clustering, [FP-growth](https://issues.apache.org/jira/browse/SPARK-4001) for frequent pattern mining, and [block matrix abstraction](https://issues.apache.org/jira/browse/SPARK-4409) for distributed linear algebra. Initial support has been added for [model import/export](https://issues.apache.org/jira/browse/SPARK-4587) in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive [updates](https://issues.apache.org/jira/browse/SPARK-3424, https://issues.apache.org/jira/browse/SPARK-3541) that lead to significant performance gain. PySpark now supports the [ML pipeline API](https://issues.apache.org/jira/browse/SPARK-4586) added in Spark 1.2, and [gradient boosted trees](https://issues.apache.org/jira/browse/SPARK-5094) and [Gaussian mixture model](https://issues.apache.org/jira/browse/SPARK-5012). Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.

 ### Spark Streaming
-Spark 1.3 introduces a new [*direct* Kafka API](https://issues.apache.org/jira/browse/SPARK-6946) ([docs](http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html)) which enables exactly-once delivery without the use of write ahead logs. It also adds a [Python Kafka API](https://issues.apache.org/jira/browse/SPARK-5047) along with infrastructure for additional Python APIs in future releases. An online version of [logistic regression](https://issues.apache.org/jira/browse/SPARK-4979) and the ability to read [binary records](https://issues.apache.org/jira/browse/SPARK-4969) have also been added. For stateful operations, support has been added for loading of an [initial state RDD](https://issues.apache.org/jira/browse/SPARK-3660). Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.
+Spark 1.3 introduces a new [*direct* Kafka API](https://issues.apache.org/jira/browse/SPARK-4964) ([docs](http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html)) which enables exactly-once delivery without the use of write ahead logs. It also adds a [Python Kafka API](https://issues.apache.org/jira/browse/SPARK-5047) along with infrastructure for additional Python APIs in future releases. An online version of [logistic regression](https://issues.apache.org/jira/browse/SPARK-4979) and the ability to read [binary records](https://issues.apache.org/jira/browse/SPARK-4969) have also been added. For stateful operations, support has been added for loading of an [initial state RDD](https://issues.apache.org/jira/browse/SPARK-3660). Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.

 ### GraphX
 GraphX adds a handful of utility functions in this release, including conversion into a [canonical edge graph](https://issues.apache.org/jira/browse/SPARK-4917).

Modified: spark/site/releases/spark-release-1-3-0.html
URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-3-0.html?rev=1666516&r1=1666515&r2=1666516&view=diff
==============================================================================
--- spark/site/releases/spark-release-1-3-0.html (original)
+++ spark/site/releases/spark-release-1-3-0.html Fri Mar 13 17:15:41 2015
@@ -187,7 +187,7 @@

 <p>In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for <a href="https://issues.apache.org/jira/browse/SPARK-1405">topic modeling</a>, <a
spark git commit: SPARK-4300 [CORE] Race condition during SparkWorker shutdown
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 170af49bb -> a3493eb77

SPARK-4300 [CORE] Race condition during SparkWorker shutdown

Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream. (This also removes a redundant `waitFor()` although it was harmless)

CC tdas since I think you wrote this method.

Author: Sean Owen so...@cloudera.com

Closes #4787 from srowen/SPARK-4300 and squashes the following commits:

e0cdabf [Sean Owen] Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3493eb7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3493eb7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3493eb7

Branch: refs/heads/branch-1.3
Commit: a3493eb77a0aa7d3048e657459ebaa22e98ccf0c
Parents: 170af49
Author: Sean Owen so...@cloudera.com
Authored: Thu Feb 26 14:08:56 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:54:31 2015 +0000

----------------------------------------------------------------------
 .../scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a3493eb7/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 2ec10f8..a3ec803 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -86,14 +86,13 @@ private[spark] class ExecutorRunner(
       var exitCode: Option[Int] = None
       if (process != null) {
         logInfo("Killing process!")
-        process.destroy()
-        process.waitFor()
         if (stdoutAppender != null) {
           stdoutAppender.stop()
         }
         if (stderrAppender != null) {
           stderrAppender.stop()
         }
+        process.destroy()
         exitCode = Some(process.waitFor())
       }
       worker ! ExecutorStateChanged(appId, execId, state, message, exitCode)
spark git commit: [SPARK-6275][Documentation]Miss toDF() function in docs/sql-programming-guide.md
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 a08588c7e -> 301278126

[SPARK-6275][Documentation] Miss toDF() function in docs/sql-programming-guide.md

Miss `toDF()` function in docs/sql-programming-guide.md

Author: zzcclp xm_...@sina.com

Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:

9a96c7b [zzcclp] Miss toDF()

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30127812
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30127812
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30127812

Branch: refs/heads/branch-1.3
Commit: 30127812629a53a1b45c4d90b70c5cc55dd28fb6
Parents: a08588c
Author: zzcclp xm_...@sina.com
Authored: Thu Mar 12 15:07:15 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:44:52 2015 +0000

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/30127812/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b1e309c..11c29e2 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -358,7 +358,7 @@ import sqlContext.implicits._
 case class Person(name: String, age: Int)

 // Create an RDD of Person objects and register it as a table.
-val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
+val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
 people.registerTempTable("people")

 // SQL statements can be run by using the sql methods provided by sqlContext.
svn commit: r1666542 - in /spark: documentation.md site/documentation.html
Author: pwendell
Date: Fri Mar 13 18:49:38 2015
New Revision: 1666542

URL: http://svn.apache.org/r1666542
Log: Fixing latest doc link (some how I reverted changes)

Modified:
    spark/documentation.md
    spark/site/documentation.html

Modified: spark/documentation.md
URL: http://svn.apache.org/viewvc/spark/documentation.md?rev=1666542&r1=1666541&r2=1666542&view=diff
==============================================================================
--- spark/documentation.md (original)
+++ spark/documentation.md Fri Mar 13 18:49:38 2015
@@ -12,7 +12,8 @@ navigation:
 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="{{site.url}}docs/latest/">Spark 1.2.1 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/latest/">Spark 1.3.0 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/1.2.1/">Spark 1.2.1</a></li>
   <li><a href="{{site.url}}docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="{{site.url}}docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="{{site.url}}docs/0.9.2/">Spark 0.9.2</a></li>

Modified: spark/site/documentation.html
URL: http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1666542&r1=1666541&r2=1666542&view=diff
==============================================================================
--- spark/site/documentation.html (original)
+++ spark/site/documentation.html Fri Mar 13 18:49:38 2015
@@ -172,7 +172,8 @@

 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="/docs/latest/">Spark 1.2.1 (latest release)</a></li>
+  <li><a href="/docs/latest/">Spark 1.3.0 (latest release)</a></li>
+  <li><a href="/docs/1.2.1/">Spark 1.2.1</a></li>
   <li><a href="/docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="/docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="/docs/0.9.2/">Spark 0.9.2</a></li>
spark git commit: [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 dbee7e16c -> 170af49bb

[SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()

Because of a circular reference between JavaObject and JavaMember, a Java object cannot be released until Python GC kicks in, which causes a memory leak in collect() that may consume lots of memory in the JVM.

This PR changes the way we send collected data back into Python from a local file to a socket, which avoids any disk IO during collect and avoids keeping any referrers to the Java object in Python.

cc JoshRosen

Author: Davies Liu dav...@databricks.com

Closes #4923 from davies/fix_collect and squashes the following commits:

d730286 [Davies Liu] address comments
24c92a4 [Davies Liu] fix style
ba54614 [Davies Liu] use socket to transfer data from JVM
9517c8f [Davies Liu] fix memory leak in collect()

(cherry picked from commit 8767565cef01d847f57b7293d8b63b2422009b90)
Signed-off-by: Josh Rosen joshro...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/170af49b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/170af49b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/170af49b

Branch: refs/heads/branch-1.3
Commit: 170af49bb0b183b2f4cb3ebbb3e9ab5327f907c9
Parents: dbee7e1
Author: Davies Liu dav...@databricks.com
Authored: Mon Mar 9 16:24:06 2015 -0700
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Mar 13 10:44:10 2015 -0700

----------------------------------------------------------------------
 .../org/apache/spark/api/python/PythonRDD.scala | 76 +++-
 python/pyspark/context.py                       | 13 ++--
 python/pyspark/rdd.py                           | 30
 python/pyspark/sql/dataframe.py                 | 14 +---
 4 files changed, 82 insertions(+), 51 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/170af49b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
index fd89669..4c71b69 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
@@ -19,26 +19,27 @@ package org.apache.spark.api.python

 import java.io._
 import java.net._
-import java.util.{List => JList, ArrayList => JArrayList, Map => JMap, UUID, Collections}
-
-import org.apache.spark.input.PortableDataStream
+import java.util.{Collections, ArrayList => JArrayList, List => JList, Map => JMap}

 import scala.collection.JavaConversions._
 import scala.collection.mutable
 import scala.language.existentials

 import com.google.common.base.Charsets.UTF_8
-
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.compress.CompressionCodec
-import org.apache.hadoop.mapred.{InputFormat, OutputFormat, JobConf}
+import org.apache.hadoop.mapred.{InputFormat, JobConf, OutputFormat}
 import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat, OutputFormat => NewOutputFormat}
+
 import org.apache.spark._
-import org.apache.spark.api.java.{JavaSparkContext, JavaPairRDD, JavaRDD}
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
 import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.input.PortableDataStream
 import org.apache.spark.rdd.RDD
 import org.apache.spark.util.Utils

+import scala.util.control.NonFatal
+
 private[spark] class PythonRDD(
     @transient parent: RDD[_],
     command: Array[Byte],
@@ -344,21 +345,33 @@ private[spark] object PythonRDD extends Logging {

   /**
    * Adapter for calling SparkContext#runJob from Python.
    *
-   * This method will return an iterator of an array that contains all elements in the RDD
+   * This method will serve an iterator of an array that contains all elements in the RDD
    * (effectively a collect()), but allows you to run on a certain subset of partitions,
    * or to enable local execution.
+   *
+   * @return the port number of a local socket which serves the data collected from this job.
    */
   def runJob(
       sc: SparkContext,
       rdd: JavaRDD[Array[Byte]],
       partitions: JArrayList[Int],
-      allowLocal: Boolean): Iterator[Array[Byte]] = {
+      allowLocal: Boolean): Int = {
     type ByteArray = Array[Byte]
     type UnrolledPartition = Array[ByteArray]
     val allPartitions: Array[UnrolledPartition] =
       sc.runJob(rdd, (x: Iterator[ByteArray]) => x.toArray, partitions, allowLocal)
     val flattenedPartition: UnrolledPartition = Array.concat(allPartitions: _*)
-    flattenedPartition.iterator
+    serveIterator(flattenedPartition.iterator,
+      s"serve RDD ${rdd.id} with partitions ${partitions.mkString(",")}")
+  }
+
+  /**
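A hedged sketch of the serve-over-socket idea the fix adopts (hypothetical helper, not the commit's `serveIterator`): bind an ephemeral localhost port, hand the port back to the caller, and stream the collected bytes from a daemon thread instead of writing a temp file.

```scala
import java.io.{BufferedOutputStream, DataOutputStream}
import java.net.{InetAddress, ServerSocket}

// Serve length-prefixed byte records over a local socket; returns the port
// the caller (e.g. the Python side) should connect to.
def serveBytes(data: Iterator[Array[Byte]]): Int = {
  val server = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
  val thread = new Thread("serve collected partitions") {
    setDaemon(true)
    override def run(): Unit = {
      val sock = server.accept()
      val out = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream))
      try {
        data.foreach { bytes =>
          out.writeInt(bytes.length) // length prefix, then payload
          out.write(bytes)
        }
      } finally {
        out.close()
        sock.close()
        server.close()
      }
    }
  }
  thread.start()
  server.getLocalPort
}
```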
spark git commit: SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 214f68103 -> dbee7e16c

SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output

Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting

CC andrewor14 since I believe he created this section of code

Author: Sean Owen so...@cloudera.com

Closes #4788 from srowen/SPARK-4704 and squashes the following commits:

ad7114e [Sean Owen] Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dbee7e16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dbee7e16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dbee7e16

Branch: refs/heads/branch-1.3
Commit: dbee7e16c7434326cce6f6d5ab494093c60ee097
Parents: 214f681
Author: Sean Owen so...@cloudera.com
Authored: Thu Feb 26 12:56:54 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:43:05 2015 +0000

----------------------------------------------------------------------
 .../org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/dbee7e16/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
index 2eab998..311048c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
@@ -17,8 +17,6 @@

 package org.apache.spark.deploy

-import java.io.File
-
 import scala.collection.JavaConversions._

 import org.apache.spark.util.{RedirectThread, Utils}
@@ -164,6 +162,8 @@ private[spark] object SparkSubmitDriverBootstrapper {
       }
     }
     val returnCode = process.waitFor()
+    stdoutThread.join()
+    stderrThread.join()
    sys.exit(returnCode)
  }
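A hedged sketch of the principle behind the fix (hypothetical helper, not Spark's `RedirectThread`): copy a child process's output on a thread, then `join()` that thread before exiting so buffered bytes actually reach stdout/stderr.

```scala
import java.io.{InputStream, OutputStream}

// Pump bytes from `in` to `out` on a background thread and return the thread.
def redirect(in: InputStream, out: OutputStream): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      val buf = new Array[Byte](1024)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        out.flush()
        n = in.read(buf)
      }
    }
  })
  t.start()
  t
}

// Usage: wait for the process, then join the pumps before exiting.
val process = new ProcessBuilder("echo", "hello").start()
val t1 = redirect(process.getInputStream, System.out)
val t2 = redirect(process.getErrorStream, System.err)
val code = process.waitFor()
t1.join(); t2.join() // without these joins, trailing output can be lost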
[1/2] spark git commit: [SPARK-6132] ContextCleaner race condition across SparkContexts
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 9846790f4 -> 338bea7b3

[SPARK-6132] ContextCleaner race condition across SparkContexts

The problem is that `ContextCleaner` may clean variables that belong to a different `SparkContext`. This can happen if the `SparkContext` to which the cleaner belongs stops, and a new one is started immediately afterwards in the same JVM. In this case, if the cleaner is in the middle of cleaning a broadcast, for instance, it will do so through `SparkEnv.get.blockManager`, which could be one that belongs to a different `SparkContext`.

JoshRosen and I suspect that this is the cause of many flaky tests, most notably the `JavaAPISuite`. We were able to reproduce the failure locally (though it is not deterministic and very hard to reproduce).

Author: Andrew Or and...@databricks.com

Closes #4869 from andrewor14/cleaner-masquerade and squashes the following commits:

29168c0 [Andrew Or] Synchronize ContextCleaner stop

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3cdc8a35
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3cdc8a35
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3cdc8a35

Branch: refs/heads/branch-1.3
Commit: 3cdc8a35a7b9bbdf418988d0fe4524d413dce23c
Parents: 9846790
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 13:44:05 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:20:50 2015 +0000

----------------------------------------------------------------------
 .../scala/org/apache/spark/ContextCleaner.scala | 35 ++++++++++++++-------
 1 file changed, 24 insertions(+), 11 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/3cdc8a35/core/src/main/scala/org/apache/spark/ContextCleaner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index ede1e23..201e5ec 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -104,9 +104,19 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
     cleaningThread.start()
   }

-  /** Stop the cleaner. */
+  /**
+   * Stop the cleaning thread and wait until the thread has finished running its current task.
+   */
   def stop() {
     stopped = true
+    // Interrupt the cleaning thread, but wait until the current task has finished before
+    // doing so. This guards against the race condition where a cleaning thread may
+    // potentially clean similarly named variables created by a different SparkContext,
+    // resulting in otherwise inexplicable block-not-found exceptions (SPARK-6132).
+    synchronized {
+      cleaningThread.interrupt()
+    }
+    cleaningThread.join()
   }

   /** Register a RDD for cleanup when it is garbage collected. */
@@ -135,16 +145,19 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
       try {
         val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
           .map(_.asInstanceOf[CleanupTaskWeakReference])
-        reference.map(_.task).foreach { task =>
-          logDebug("Got cleaning task " + task)
-          referenceBuffer -= reference.get
-          task match {
-            case CleanRDD(rddId) =>
-              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
-            case CleanShuffle(shuffleId) =>
-              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
-            case CleanBroadcast(broadcastId) =>
-              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
+        // Synchronize here to avoid being interrupted on stop()
+        synchronized {
+          reference.map(_.task).foreach { task =>
+            logDebug("Got cleaning task " + task)
+            referenceBuffer -= reference.get
+            task match {
+              case CleanRDD(rddId) =>
+                doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
+              case CleanShuffle(shuffleId) =>
+                doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
+              case CleanBroadcast(broadcastId) =>
+                doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
+            }
           }
         }
       } catch {
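A hedged, self-contained sketch of the interrupt-under-lock pattern this fix applies (hypothetical `Worker` class, not Spark code): the loop handles each task inside a lock, and `stop()` acquires the same lock before interrupting, so the thread can never be interrupted mid-task.

```scala
class Worker(handleOneTask: () => Unit) {
  @volatile private var stopped = false

  private val thread = new Thread(new Runnable {
    override def run(): Unit = {
      while (!stopped) {
        // Holding the lock while working means stop() cannot interrupt mid-task.
        Worker.this.synchronized {
          handleOneTask()
        }
      }
    }
  })

  def start(): Unit = thread.start()

  def stop(): Unit = {
    stopped = true
    synchronized { thread.interrupt() } // waits for the current task to finish
    thread.join()
  }
}
```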
svn commit: r1666540 - in /spark: news/_posts/2015-03-13-spark-1-3-0-released.md site/documentation.html site/news/index.html site/news/spark-1-3-0-released.html
Author: pwendell
Date: Fri Mar 13 18:47:57 2015
New Revision: 1666540

URL: http://svn.apache.org/r1666540
Log: Fixing old link in release news item

Modified:
    spark/news/_posts/2015-03-13-spark-1-3-0-released.md
    spark/site/documentation.html
    spark/site/news/index.html
    spark/site/news/spark-1-3-0-released.html

Modified: spark/news/_posts/2015-03-13-spark-1-3-0-released.md
URL: http://svn.apache.org/viewvc/spark/news/_posts/2015-03-13-spark-1-3-0-released.md?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/news/_posts/2015-03-13-spark-1-3-0-released.md (original)
+++ spark/news/_posts/2015-03-13-spark-1-3-0-released.md Fri Mar 13 18:47:57 2015
@@ -11,6 +11,6 @@ meta:
   _edit_last: '4'
   _wpas_done_all: '1'
 ---
-We are happy to announce the availability of <a href="{{site.url}}releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 174 developers and more than 1,000 commits!
+We are happy to announce the availability of <a href="{{site.url}}releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 174 developers and more than 1,000 commits!

 Visit the <a href="{{site.url}}releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">release notes</a> to read about the new features, or <a href="{{site.url}}downloads.html">download</a> the release today.

Modified: spark/site/documentation.html
URL: http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/documentation.html (original)
+++ spark/site/documentation.html Fri Mar 13 18:47:57 2015
@@ -172,8 +172,7 @@

 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="/docs/latest/">Spark 1.3.0 (latest release)</a></li>
-  <li><a href="/docs/1.2.1/">Spark 1.2.1</a></li>
+  <li><a href="/docs/latest/">Spark 1.2.1 (latest release)</a></li>
   <li><a href="/docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="/docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="/docs/0.9.2/">Spark 0.9.2</a></li>

Modified: spark/site/news/index.html
URL: http://svn.apache.org/viewvc/spark/site/news/index.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/news/index.html (original)
+++ spark/site/news/index.html Fri Mar 13 18:47:57 2015
@@ -174,7 +174,7 @@
       <h3 class="entry-title"><a href="/news/spark-1-3-0-released.html">Spark 1.3.0 released</a></h3>
       <div class="entry-date">March 13, 2015</div>
     </header>
-    <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
+    <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
     </div>
   </article>

Modified: spark/site/news/spark-1-3-0-released.html
URL: http://svn.apache.org/viewvc/spark/site/news/spark-1-3-0-released.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/news/spark-1-3-0-released.html (original)
+++ spark/site/news/spark-1-3-0-released.html Fri Mar 13 18:47:57 2015
@@ -170,7 +170,7 @@

 <h2>Spark 1.3.0 released</h2>

-<p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
+<p>We are happy to announce the availability of <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>

 <p>Visit the <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
spark git commit: [SPARK-6278][MLLIB] Mention the change of objective in linear regression
Repository: spark
Updated Branches:
  refs/heads/master dc4abd4dc -> 7f13434a5

[SPARK-6278][MLLIB] Mention the change of objective in linear regression

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide.

srowen

Author: Xiangrui Meng m...@databricks.com

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f13434a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7f13434a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7f13434a

Branch: refs/heads/master
Commit: 7f13434a5c52b815c584ec773ab0e5df1a35ea86
Parents: dc4abd4
Author: Xiangrui Meng m...@databricks.com
Authored: Fri Mar 13 10:27:28 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:27:28 2015 -0700

----------------------------------------------------------------------
 docs/mllib-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/7f13434a/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 598374f..f8e8794 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -102,6 +102,8 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
 * In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
 * In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2.
+  So in order to produce the same result as in 1.2, the regularization parameter needs to be divided by 2 and the step size needs to be multiplied by 2.

 ## Previous Spark Versions
[2/2] spark git commit: [SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet
[SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet

If the cleaner is stopped, we shouldn't print a huge stack trace when the cleaner thread is interrupted because we purposefully did this.

Author: Andrew Or and...@databricks.com

Closes #4882 from andrewor14/cleaner-interrupt and squashes the following commits:

8652120 [Andrew Or] Just a hot fix

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/338bea7b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/338bea7b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/338bea7b

Branch: refs/heads/branch-1.3
Commit: 338bea7b33a0faaa62c94ace334a79c0b1716a01
Parents: 3cdc8a3
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 20:49:45 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:21:44 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/ContextCleaner.scala | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/338bea7b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index 201e5ec..98e4401 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -161,6 +161,7 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
           }
         }
       } catch {
+        case ie: InterruptedException if stopped => // ignore
         case e: Exception => logError("Error in cleaning thread", e)
       }
     }
spark git commit: [SPARK-6285] [SQL] Removes unused ParquetTestData and duplicated TestGroupWriteSupport
Repository: spark
Updated Branches:
  refs/heads/master b943f5d90 -> cdc34ed91

[SPARK-6285] [SQL] Removes unused ParquetTestData and duplicated TestGroupWriteSupport

All the contents in this file are not referenced anywhere and should have been removed in #4116 when I tried to get rid of the old Parquet test suites.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5010)
<!-- Reviewable:end -->

Author: Cheng Lian l...@databricks.com

Closes #5010 from liancheng/spark-6285 and squashes the following commits:

06ed057 [Cheng Lian] Removes unused ParquetTestData and duplicated TestGroupWriteSupport

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdc34ed9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdc34ed9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdc34ed9

Branch: refs/heads/master
Commit: cdc34ed9108688fea32ad170b1ba344fe047716b
Parents: b943f5d
Author: Cheng Lian l...@databricks.com
Authored: Sat Mar 14 07:09:53 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Sat Mar 14 07:09:53 2015 +0800

----------------------------------------------------------------------
 .../spark/sql/parquet/ParquetTestData.scala | 466 ---
 1 file changed, 466 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/cdc34ed9/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
deleted file mode 100644
index e4a10aa..000
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
+++ /dev/null
@@ -1,466 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.parquet
-
-import java.io.File
-
-import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
-import org.apache.hadoop.mapreduce.Job
-import org.apache.spark.sql.test.TestSQLContext
-
-import parquet.example.data.{GroupWriter, Group}
-import parquet.example.data.simple.{NanoTime, SimpleGroup}
-import parquet.hadoop.{ParquetReader, ParquetFileReader, ParquetWriter}
-import parquet.hadoop.api.WriteSupport
-import parquet.hadoop.api.WriteSupport.WriteContext
-import parquet.hadoop.example.GroupReadSupport
-import parquet.hadoop.util.ContextUtil
-import parquet.io.api.RecordConsumer
-import parquet.schema.{MessageType, MessageTypeParser}
-
-import org.apache.spark.util.Utils
-
-// Write support class for nested groups: ParquetWriter initializes GroupWriteSupport
-// with an empty configuration (it is after all not intended to be used in this way?)
-// and members are private so we need to make our own in order to pass the schema
-// to the writer.
-private class TestGroupWriteSupport(schema: MessageType) extends WriteSupport[Group] {
-  var groupWriter: GroupWriter = null
-  override def prepareForWrite(recordConsumer: RecordConsumer): Unit = {
-    groupWriter = new GroupWriter(recordConsumer, schema)
-  }
-  override def init(configuration: Configuration): WriteContext = {
-    new WriteContext(schema, new java.util.HashMap[String, String]())
-  }
-  override def write(record: Group) {
-    groupWriter.write(record)
-  }
-}
-
-private[sql] object ParquetTestData {
-
-  val testSchema =
-    """message myrecord {
-      optional boolean myboolean;
-      optional int32 myint;
-      optional binary mystring (UTF8);
-      optional int64 mylong;
-      optional float myfloat;
-      optional double mydouble;
-      optional int96 mytimestamp;
-      }"""
-
-  // field names for test assertion error messages
-  val testSchemaFieldNames = Seq(
-    "myboolean:Boolean",
-    "myint:Int",
-    "mystring:String",
-    "mylong:Long",
-    "myfloat:Float",
-    "mydouble:Double",
-    "mytimestamp:Timestamp"
-  )
-
-  val subTestSchema =
-    """
-      message myrecord {
spark git commit: [SPARK-6317][SQL]Fixed HIVE console startup issue
Repository: spark
Updated Branches:
  refs/heads/master cdc34ed91 -> e360d5e4a

[SPARK-6317][SQL] Fixed HIVE console startup issue

Author: vinodkc vinod.kc...@gmail.com
Author: Vinod K C vinod...@huawei.com

Closes #5011 from vinodkc/HIVE_console_startupError and squashes the following commits:

b43925f [vinodkc] Changed order of import
b4f5453 [Vinod K C] Fixed HIVE console startup issue

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e360d5e4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e360d5e4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e360d5e4

Branch: refs/heads/master
Commit: e360d5e4adf287444c10e72f8e4d57548839bf6e
Parents: cdc34ed
Author: vinodkc vinod.kc...@gmail.com
Authored: Sat Mar 14 07:17:54 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Sat Mar 14 07:17:54 2015 +0800

----------------------------------------------------------------------
 project/SparkBuild.scala | 4 ++--
 sql/README.md            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/e360d5e4/project/SparkBuild.scala
----------------------------------------------------------------------
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 4a06b98..f4c74c4 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -269,8 +269,8 @@ object SQL {
         |import org.apache.spark.sql.catalyst.plans.logical._
         |import org.apache.spark.sql.catalyst.rules._
         |import org.apache.spark.sql.catalyst.util._
-        |import org.apache.spark.sql.Dsl._
         |import org.apache.spark.sql.execution
+        |import org.apache.spark.sql.functions._
         |import org.apache.spark.sql.test.TestSQLContext._
         |import org.apache.spark.sql.types._
         |import org.apache.spark.sql.parquet.ParquetTestData""".stripMargin,
@@ -300,8 +300,8 @@ object Hive {
         |import org.apache.spark.sql.catalyst.plans.logical._
         |import org.apache.spark.sql.catalyst.rules._
         |import org.apache.spark.sql.catalyst.util._
-        |import org.apache.spark.sql.Dsl._
         |import org.apache.spark.sql.execution
+        |import org.apache.spark.sql.functions._
         |import org.apache.spark.sql.hive._
         |import org.apache.spark.sql.hive.test.TestHive._
         |import org.apache.spark.sql.types._

http://git-wip-us.apache.org/repos/asf/spark/blob/e360d5e4/sql/README.md
----------------------------------------------------------------------
diff --git a/sql/README.md b/sql/README.md
index a792499..48f8334 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -36,8 +36,8 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.util._
-import org.apache.spark.sql.Dsl._
 import org.apache.spark.sql.execution
+import org.apache.spark.sql.functions._
 import org.apache.spark.sql.hive._
 import org.apache.spark.sql.hive.test.TestHive._
 import org.apache.spark.sql.types._
spark git commit: [SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time
Repository: spark
Updated Branches:
  refs/heads/master 3980ebdf1 - 0af9ea74a

[SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time

I've added a timer in the right place to fix this inaccuracy.

Author: Ilya Ganelin ilya.gane...@capitalone.com

Closes #4965 from ilganeli/SPARK-5845 and squashes the following commits:

bfabf88 [Ilya Ganelin] Changed to using a foreach vs. getorelse
3e059b0 [Ilya Ganelin] Switched to using getorelse
b946d08 [Ilya Ganelin] Fixed error with option
9434b50 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5845
db8647e [Ilya Ganelin] Added update for shuffleWriteTime around spilled file cleanup in ExternalSorter

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0af9ea74
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0af9ea74
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0af9ea74

Branch: refs/heads/master
Commit: 0af9ea74a07ecdc08c43fa63cb9c9f0c57e3029b
Parents: 3980ebd
Author: Ilya Ganelin ilya.gane...@capitalone.com
Authored: Fri Mar 13 13:21:04 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 13:21:04 2015 +0000

--
 .../scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0af9ea74/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
--
diff --git a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
index 27496c5..fa2e617 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
@@ -88,7 +88,10 @@ private[spark] class SortShuffleWriter[K, V, C](
     } finally {
       // Clean up our sorter, which may have its own intermediate files
       if (sorter != null) {
+        val startTime = System.nanoTime()
         sorter.stop()
+        context.taskMetrics.shuffleWriteMetrics.foreach(
+          _.incShuffleWriteTime(System.nanoTime - startTime))
         sorter = null
       }
     }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
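The pattern of the fix is worth noting: wrap the cleanup call in System.nanoTime() measurements and fold the delta into the existing metric via foreach on the Option, so a missing metrics object is simply a no-op. A self-contained sketch of that pattern with simplified stand-in types (these are not Spark's real classes):

    // Stand-in for Spark's shuffle write metrics accumulator.
    final class WriteMetricsStub {
      private var writeTimeNanos = 0L
      def incShuffleWriteTime(ns: Long): Unit = writeTimeNanos += ns
      def shuffleWriteTime: Long = writeTimeNanos
    }

    // Time an arbitrary cleanup action and charge it to the metric if present.
    def timedCleanup(metrics: Option[WriteMetricsStub])(cleanup: => Unit): Unit = {
      val start = System.nanoTime()
      cleanup()  // e.g. sorter.stop(), which deletes spilled intermediate files
      metrics.foreach(_.incShuffleWriteTime(System.nanoTime() - start))
    }

    // Usage: cleanup time is charged to the metric when one is present.
    val m = Some(new WriteMetricsStub)
    timedCleanup(m) { Thread.sleep(5) }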
svn commit: r1666484 - in /spark: downloads.md js/downloads.js site/community.html site/docs/latest site/downloads.html site/examples.html site/index.html site/js/downloads.js
Author: pwendell
Date: Fri Mar 13 15:48:06 2015
New Revision: 1666484

URL: http://svn.apache.org/r1666484
Log: Initial 1.3.0 code

Modified:
    spark/downloads.md
    spark/js/downloads.js
    spark/site/community.html
    spark/site/docs/latest
    spark/site/downloads.html
    spark/site/examples.html
    spark/site/index.html
    spark/site/js/downloads.js

Modified: spark/downloads.md
URL: http://svn.apache.org/viewvc/spark/downloads.md?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/downloads.md (original)
+++ spark/downloads.md Fri Mar 13 15:48:06 2015
@@ -16,9 +16,9 @@ $(document).ready(function() {

 ## Download Spark

-The latest release of Spark is Spark 1.2.1, released on February 9, 2015
-<a href="{{site.url}}releases/spark-release-1-2-1.html">(release notes)</a>
-<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97;">(git tag)</a><br/>
+The latest release of Spark is Spark 1.3.0, released on March 13, 2015
+<a href="{{site.url}}releases/spark-release-1-3-0.html">(release notes)</a>
+<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc;">(git tag)</a><br/>

 1. Chose a Spark release:
   <select id="sparkVersionSelect" onChange="javascript:onVersionSelect();"></select><br>
@@ -41,7 +41,7 @@ Spark artifacts are [hosted in Maven Cen

     groupId: org.apache.spark
     artifactId: spark-core_2.10
-    version: 1.2.1
+    version: 1.3.0

 ### Development and Maintenance Branches
 If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git:

@@ -49,8 +49,8 @@ If you are interested in working with th
     # Master development branch
     git clone git://github.com/apache/spark.git

-    # 1.2 maintenance branch with stability fixes on top of Spark 1.2.1
-    git clone git://github.com/apache/spark.git -b branch-1.2
+    # 1.3 maintenance branch with stability fixes on top of Spark 1.3.0
+    git clone git://github.com/apache/spark.git -b branch-1.3

 Once you've downloaded Spark, you can find instructions for installing and building it on the <a href="{{site.url}}documentation.html">documentation page</a>.
Modified: spark/js/downloads.js
URL: http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/js/downloads.js (original)
+++ spark/js/downloads.js Fri Mar 13 15:48:06 2015
@@ -26,6 +26,7 @@ var packagesV3 = [mapr3, mapr4].concat(p
 // 1.1.0+
 var packagesV4 = [hadoop2p4, hadoop2p3, mapr3, mapr4].concat(packagesV1);

+addRelease("1.3.0", new Date("3/13/2015"), sources.concat(packagesV4), true);
 addRelease("1.2.1", new Date("2/9/2015"), sources.concat(packagesV4), true);
 addRelease("1.2.0", new Date("12/18/2014"), sources.concat(packagesV4), true);
 addRelease("1.1.1", new Date("11/26/2014"), sources.concat(packagesV4), true);

Modified: spark/site/community.html
URL: http://svn.apache.org/viewvc/spark/site/community.html?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/community.html (original)
+++ spark/site/community.html Fri Mar 13 15:48:06 2015
@@ -188,8 +188,6 @@
   </li>
 </ul>

-<p>The StackOverflow tag <a href="http://stackoverflow.com/questions/tagged/apache-spark"><code>apache-spark</code></a> is an unofficial but active forum for Spark users' questions and answers.</p>
-
 <p><a name="events"></a></p>
 <h3>Events and Meetups</h3>

Modified: spark/site/docs/latest
URL: http://svn.apache.org/viewvc/spark/site/docs/latest?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/docs/latest (original)
+++ spark/site/docs/latest Fri Mar 13 15:48:06 2015
@@ -1 +1 @@
-link 1.2.1
\ No newline at end of file
+link 1.3.0
\ No newline at end of file

Modified: spark/site/downloads.html
URL: http://svn.apache.org/viewvc/spark/site/downloads.html?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/downloads.html (original)
+++ spark/site/downloads.html Fri Mar 13 15:48:06 2015
@@ -176,21 +176,21 @@ $(document).ready(function() {

 <h2 id="download-spark">Download Spark</h2>

-<p>The latest release of Spark is Spark 1.2.1, released on February 9, 2015
-<a href="/releases/spark-release-1-2-1.html">(release notes)</a>
-<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97;">(git tag)</a><br /></p>
+<p>The latest release of Spark is Spark 1.3.0, released on March 13, 2015
+<a href="/releases/spark-release-1-3-0.html">(release notes)</a>
+<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc;">(git tag)</a><br /></p>

 <ol>
svn commit: r1666486 [2/2] - in /spark: _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/
Modified: spark/site/releases/spark-release-0-3.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-3.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-3.html (original) +++ spark/site/releases/spark-release-0-3.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-0.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-0.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-0.html (original) +++ spark/site/releases/spark-release-0-5-0.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-1.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-1.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-1.html (original) +++ spark/site/releases/spark-release-0-5-1.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-2.html URL: 
http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-2.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-2.html (original) +++ spark/site/releases/spark-release-0-5-2.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li
spark git commit: [CORE][minor] remove unnecessary ClassTag in `DAGScheduler`
Repository: spark
Updated Branches:
  refs/heads/master 9048e8102 - ea3d2eed9

[CORE][minor] remove unnecessary ClassTag in `DAGScheduler`

This existed at the very beginning, but became unnecessary after [this commit](https://github.com/apache/spark/commit/37d8f37a8ec110416fba0d51d8ba70370ac380c1#diff-6a9ff7fb74fd490a50462d45db2d5e11L272). I think we should remove it if we don't plan to use it in the future.

Author: Wenchen Fan cloud0...@outlook.com

Closes #4992 from cloud-fan/small and squashes the following commits:

e857f2e [Wenchen Fan] remove unnecessary ClassTag

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea3d2eed
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea3d2eed
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea3d2eed

Branch: refs/heads/master
Commit: ea3d2eed9b0a94b34543d9a9df87dc63a279deb1
Parents: 9048e81
Author: Wenchen Fan cloud0...@outlook.com
Authored: Fri Mar 13 14:08:56 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 14:08:56 2015 +0000

--
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ea3d2eed/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index bc84e23..e4170a5 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -26,7 +26,6 @@ import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Map, Stack}
 import scala.concurrent.Await
 import scala.concurrent.duration._
 import scala.language.postfixOps
-import scala.reflect.ClassTag
 import scala.util.control.NonFatal

 import akka.pattern.ask
@@ -497,7 +496,7 @@ class DAGScheduler(
     waiter
   }

-  def runJob[T, U: ClassTag](
+  def runJob[T, U](
       rdd: RDD[T],
       func: (TaskContext, Iterator[T]) => U,
       partitions: Seq[Int],

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
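The removal is safe because a ClassTag is only needed when a method must materialize values of its type parameter at runtime, for example to allocate an Array[U]; runJob merely passes U through, so the context bound was dead weight. A small illustration (not Spark code):

    import scala.reflect.ClassTag

    // Needs a ClassTag: arrays require the element's runtime class.
    def makeArray[U: ClassTag](n: Int): Array[U] = new Array[U](n)

    // Does not need one: U is only plumbed through, never materialized.
    def plumb[U](compute: () => U): U = compute()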
spark git commit: [SPARK-6197][CORE] handle json exception when history file not finished writing
Repository: spark
Updated Branches:
  refs/heads/master 69ff8e8cf - 9048e8102

[SPARK-6197][CORE] handle json exception when history file not finished writing

For details, please refer to [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197)

Author: Zhang, Liye liye.zh...@intel.com

Closes #4927 from liyezhang556520/jsonParseError and squashes the following commits:

5cbdc82 [Zhang, Liye] without unnecessary wrap
2b48831 [Zhang, Liye] small changes with sean owen's comments
2973024 [Zhang, Liye] handle json exception when file not finished writing

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9048e810
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9048e810
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9048e810

Branch: refs/heads/master
Commit: 9048e8102e3f564842fa0dc6e82edce70b7dd3d7
Parents: 69ff8e8
Author: Zhang, Liye liye.zh...@intel.com
Authored: Fri Mar 13 13:59:54 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 14:00:45 2015 +0000

--
 .../org/apache/spark/deploy/master/Master.scala |  3 ++-
 .../spark/scheduler/ReplayListenerBus.scala     | 25
 2 files changed, 23 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/9048e810/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index 1581429..22935c9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -764,8 +764,9 @@ private[spark] class Master(
     val replayBus = new ReplayListenerBus()
     val ui = SparkUI.createHistoryUI(new SparkConf, replayBus, new SecurityManager(conf),
       appName + status, HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
+    val maybeTruncated = eventLogFile.endsWith(EventLoggingListener.IN_PROGRESS)
     try {
-      replayBus.replay(logInput, eventLogFile)
+      replayBus.replay(logInput, eventLogFile, maybeTruncated)
     } finally {
       logInput.close()
     }

http://git-wip-us.apache.org/repos/asf/spark/blob/9048e810/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
index 95273c7..86f357a 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
@@ -21,6 +21,7 @@ import java.io.{InputStream, IOException}

 import scala.io.Source

+import com.fasterxml.jackson.core.JsonParseException
 import org.json4s.jackson.JsonMethods._

 import org.apache.spark.Logging
@@ -40,15 +41,31 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
    *
    * @param logData Stream containing event log data.
   * @param sourceName Filename (or other source identifier) from whence @logData is being read
+   * @param maybeTruncated Indicate whether log file might be truncated (some abnormal situations
+   *        encountered, log file might not finished writing) or not
    */
-  def replay(logData: InputStream, sourceName: String): Unit = {
+  def replay(
+      logData: InputStream,
+      sourceName: String,
+      maybeTruncated: Boolean = false): Unit = {
     var currentLine: String = null
     var lineNumber: Int = 1
     try {
       val lines = Source.fromInputStream(logData).getLines()
-      lines.foreach { line =>
-        currentLine = line
-        postToAll(JsonProtocol.sparkEventFromJson(parse(line)))
+      while (lines.hasNext) {
+        currentLine = lines.next()
+        try {
+          postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine)))
+        } catch {
+          case jpe: JsonParseException =>
+            // We can only ignore exception from last line of the file that might be truncated
+            if (!maybeTruncated || lines.hasNext) {
+              throw jpe
+            } else {
+              logWarning(s"Got JsonParseException from log file $sourceName" +
+                s" at line $lineNumber, the file might not have finished writing cleanly.")
+            }
+        }
         lineNumber += 1
       }
     } catch {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
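The rule the loop implements is: a JsonParseException may be swallowed only when the file is flagged as possibly truncated and the bad line is the final one; any earlier parse failure still propagates. A self-contained sketch of that rule, with a generic handler in place of JsonProtocol (names are illustrative):

    import scala.util.control.NonFatal

    // Replay lines, tolerating a failure only on the last line of a
    // possibly-truncated file; earlier failures are rethrown.
    def replayTolerant(lines: Iterator[String], maybeTruncated: Boolean)
                      (handle: String => Unit): Unit = {
      var lineNumber = 1
      while (lines.hasNext) {
        val line = lines.next()
        try handle(line) catch {
          case NonFatal(e) =>
            if (!maybeTruncated || lines.hasNext) throw e
            else println(s"Ignoring error on final line $lineNumber of a truncated log: $e")
        }
        lineNumber += 1
      }
    }

    // Usage: the last, half-written line is skipped instead of failing the replay.
    replayTolerant(Iterator("""{"ok":1}""", """{"trunc"""), maybeTruncated = true) { l =>
      require(l.endsWith("}"), s"bad json: $l")
    }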
spark git commit: [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide
Repository: spark Updated Branches: refs/heads/branch-1.3 23069bd02 - dc287f38f [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide Also fixed a bunch of minor styling issues. !-- Reviewable:start -- [img src=https://reviewable.io/review_button.png; height=40 alt=Review on Reviewable/](https://reviewable.io/reviews/apache/spark/5001) !-- Reviewable:end -- Author: Cheng Lian l...@databricks.com Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements (cherry picked from commit 69ff8e8cfbecd81fd54100c4dab332c3bc992316) Signed-off-by: Cheng Lian l...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dc287f38 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dc287f38 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dc287f38 Branch: refs/heads/branch-1.3 Commit: dc287f38f1cc192b7fa6ec0e83b36254f1cfec10 Parents: 23069bd Author: Cheng Lian l...@databricks.com Authored: Fri Mar 13 21:34:50 2015 +0800 Committer: Cheng Lian l...@databricks.com Committed: Fri Mar 13 21:36:47 2015 +0800 -- docs/sql-programming-guide.md | 237 - 1 file changed, 180 insertions(+), 57 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dc287f38/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 9c363bc..b1e309c 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -21,14 +21,14 @@ The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell` or the `pyspark` shell. -## Starting Point: SQLContext +## Starting Point: `SQLContext` div class=codetabs div data-lang=scala markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/scala/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/scala/index.html#org.apache.spark.sql.`SQLContext`) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight scala %} val sc: SparkContext // An existing SparkContext. @@ -43,8 +43,8 @@ import sqlContext.implicits._ div data-lang=java markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight java %} JavaSparkContext sc = ...; // An existing JavaSparkContext. @@ -56,8 +56,8 @@ SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); div data-lang=python markdown=1 The entry point into all relational functionality in Spark is the -[SQLContext](api/python/pyspark.sql.SQLContext-class.html) class, or one -of its decedents. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/python/pyspark.sql.SQLContext-class.html) class, or one +of its decedents. To create a basic `SQLContext`, all you need is a SparkContext. 
{% highlight python %} from pyspark.sql import SQLContext @@ -67,20 +67,20 @@ sqlContext = SQLContext(sc) /div /div -In addition to the basic SQLContext, you can also create a HiveContext, which provides a -superset of the functionality provided by the basic SQLContext. Additional features include +In addition to the basic `SQLContext`, you can also create a `HiveContext`, which provides a +superset of the functionality provided by the basic `SQLContext`. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the -ability to read data from Hive tables. To use a HiveContext, you do not need to have an -existing Hive setup, and all of the data sources available to a SQLContext are still available. -HiveContext is only packaged separately to avoid including all of Hive's dependencies in the default -Spark build. If these dependencies are not a problem for your application then using HiveContext -is recommended for the 1.3 release of Spark. Future releases will focus on bringing SQLContext up -to feature parity with a HiveContext. +ability to read data from Hive tables.
spark git commit: [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide
Repository: spark Updated Branches: refs/heads/master 0af9ea74a - 69ff8e8cf [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide Also fixed a bunch of minor styling issues. !-- Reviewable:start -- [img src=https://reviewable.io/review_button.png; height=40 alt=Review on Reviewable/](https://reviewable.io/reviews/apache/spark/5001) !-- Reviewable:end -- Author: Cheng Lian l...@databricks.com Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/69ff8e8c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/69ff8e8c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/69ff8e8c Branch: refs/heads/master Commit: 69ff8e8cfbecd81fd54100c4dab332c3bc992316 Parents: 0af9ea7 Author: Cheng Lian l...@databricks.com Authored: Fri Mar 13 21:34:50 2015 +0800 Committer: Cheng Lian l...@databricks.com Committed: Fri Mar 13 21:34:50 2015 +0800 -- docs/sql-programming-guide.md | 237 - 1 file changed, 180 insertions(+), 57 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/69ff8e8c/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 76aa1a5..11c29e2 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -21,14 +21,14 @@ The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell` or the `pyspark` shell. -## Starting Point: SQLContext +## Starting Point: `SQLContext` div class=codetabs div data-lang=scala markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/scala/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/scala/index.html#org.apache.spark.sql.`SQLContext`) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight scala %} val sc: SparkContext // An existing SparkContext. @@ -43,8 +43,8 @@ import sqlContext.implicits._ div data-lang=java markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight java %} JavaSparkContext sc = ...; // An existing JavaSparkContext. @@ -56,8 +56,8 @@ SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); div data-lang=python markdown=1 The entry point into all relational functionality in Spark is the -[SQLContext](api/python/pyspark.sql.SQLContext-class.html) class, or one -of its decedents. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/python/pyspark.sql.SQLContext-class.html) class, or one +of its decedents. To create a basic `SQLContext`, all you need is a SparkContext. 
{% highlight python %} from pyspark.sql import SQLContext @@ -67,20 +67,20 @@ sqlContext = SQLContext(sc) /div /div -In addition to the basic SQLContext, you can also create a HiveContext, which provides a -superset of the functionality provided by the basic SQLContext. Additional features include +In addition to the basic `SQLContext`, you can also create a `HiveContext`, which provides a +superset of the functionality provided by the basic `SQLContext`. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the -ability to read data from Hive tables. To use a HiveContext, you do not need to have an -existing Hive setup, and all of the data sources available to a SQLContext are still available. -HiveContext is only packaged separately to avoid including all of Hive's dependencies in the default -Spark build. If these dependencies are not a problem for your application then using HiveContext -is recommended for the 1.3 release of Spark. Future releases will focus on bringing SQLContext up -to feature parity with a HiveContext. +ability to read data from Hive tables. To use a `HiveContext`, you do not need to have an +existing Hive setup, and all of the data sources available to a
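For readers of this digest, here is a sketch of the Parquet round trip that the new documentation section covers, using the Spark 1.3-era Scala API; the existing SparkContext `sc`, the output path, and the sample rows are assumptions for illustration:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext
    import sqlContext.implicits._

    // Write a DataFrame as Parquet, then read it back with the schema preserved.
    val people = sc.parallelize(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
    people.saveAsParquetFile("people.parquet")
    val loaded = sqlContext.parquetFile("people.parquet")
    loaded.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age >= 26").show()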
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.3.0 [created] 4aaf48d46 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org