spark git commit: [SPARK-6278][MLLIB] Mention the change of objective in linear regression
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 dc287f38f -> 214f68103

[SPARK-6278][MLLIB] Mention the change of objective in linear regression

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide.

srowen

Author: Xiangrui Meng m...@databricks.com

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

(cherry picked from commit 7f13434a5c52b815c584ec773ab0e5df1a35ea86)
Signed-off-by: Xiangrui Meng m...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/214f6810
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/214f6810
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/214f6810

Branch: refs/heads/branch-1.3
Commit: 214f68103219317416e2278e80b8fc0fb5a616f4
Parents: dc287f3
Author: Xiangrui Meng m...@databricks.com
Authored: Fri Mar 13 10:27:28 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:27:34 2015 -0700

----------------------------------------------------------------------
 docs/mllib-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/214f6810/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 4c7a7d9..03b948c 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -107,6 +107,8 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
 * In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
 * In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2.
+  So in order to produce the same result as in 1.2, the regularization parameter needs to be divided by 2 and the step size needs to be multiplied by 2.

 ## Previous Spark Versions
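To make the migration note concrete, here is a minimal sketch (not part of the commit; the 1.2-era values `regParam = 0.1` and `stepSize = 1.0` are hypothetical examples):

```scala
import org.apache.spark.mllib.regression.RidgeRegressionWithSGD

// Hypothetical settings tuned against Spark 1.2: regParam = 0.1, stepSize = 1.0.
// Because 1.3 divides the squared loss by 2, compensate as the guide describes
// to reproduce the 1.2 result.
val ridge = new RidgeRegressionWithSGD()
ridge.optimizer
  .setRegParam(0.1 / 2) // regularization parameter divided by 2
  .setStepSize(1.0 * 2) // step size multiplied by 2
```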
spark git commit: [SPARK-6252] [mllib] Added getLambda to Scala NaiveBayes
Repository: spark
Updated Branches:
  refs/heads/master ea3d2eed9 -> dc4abd4dc

[SPARK-6252] [mllib] Added getLambda to Scala NaiveBayes

Note: not relevant for Python API since it only has a static train method

Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com
Author: Joseph K. Bradley jos...@databricks.com

Closes #4969 from jkbradley/SPARK-6252 and squashes the following commits:

a471d90 [Joseph K. Bradley] small edits from review
63eff48 [Joseph K. Bradley] Added getLambda to Scala NaiveBayes

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dc4abd4d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dc4abd4d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dc4abd4d

Branch: refs/heads/master
Commit: dc4abd4dc40deacab39bfa9572b06bf0ea6daa6d
Parents: ea3d2ee
Author: Joseph K. Bradley joseph.kurata.brad...@gmail.com
Authored: Fri Mar 13 10:26:09 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:26:09 2015 -0700

----------------------------------------------------------------------
 .../org/apache/spark/mllib/classification/NaiveBayes.scala  | 3 +++
 .../apache/spark/mllib/classification/NaiveBayesSuite.scala | 8 ++++++++
 2 files changed, 11 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/dc4abd4d/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index b11fd4f..2ebc7fa 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -166,6 +166,9 @@ class NaiveBayes private (private var lambda: Double) extends Serializable with
     this
   }

+  /** Get the smoothing parameter. Default: 1.0. */
+  def getLambda: Double = lambda
+
   /**
    * Run the algorithm with the configured parameters on an input RDD of LabeledPoint entries.
    *

http://git-wip-us.apache.org/repos/asf/spark/blob/dc4abd4d/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
----------------------------------------------------------------------
diff --git a/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala b/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
index 64dcc0f..5a27c7d 100644
--- a/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
+++ b/mllib/src/test/scala/org/apache/spark/mllib/classification/NaiveBayesSuite.scala
@@ -85,6 +85,14 @@ class NaiveBayesSuite extends FunSuite with MLlibTestSparkContext {
     assert(numOfPredictions < input.length / 5)
   }

+  test("get, set params") {
+    val nb = new NaiveBayes()
+    nb.setLambda(2.0)
+    assert(nb.getLambda === 2.0)
+    nb.setLambda(3.0)
+    assert(nb.getLambda === 3.0)
+  }
+
   test("Naive Bayes") {
     val nPoints = 1
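A minimal usage sketch of the new getter, mirroring the test added above (assumes Spark 1.3's MLlib on the classpath):

```scala
import org.apache.spark.mllib.classification.NaiveBayes

// setLambda is a builder-style setter; the commit adds the matching getter.
val nb = new NaiveBayes().setLambda(2.0)
println(nb.getLambda) // prints 2.0
```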
spark git commit: SPARK-4044 [CORE] Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 a3493eb77 -> 4aa41327d

SPARK-4044 [CORE] Thriftserver fails to start when JAVA_HOME points to JRE instead of JDK

Don't use JAR_CMD unless present in archive check. Add datanucleus always if present, to avoid needing a check involving JAR_CMD.

Follow up to https://github.com/apache/spark/pull/4873 for branch 1.3.

Author: Sean Owen so...@cloudera.com

Closes #4981 from srowen/SPARK-4044.2 and squashes the following commits:

3aafc76 [Sean Owen] Don't use JAR_CMD unless present in archive check. Add datanucleus always if present, to avoid needing a check involving JAR_CMD

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4aa41327
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4aa41327
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4aa41327

Branch: refs/heads/branch-1.3
Commit: 4aa41327d164ed5b2830cb18eb47b93ebd27401b
Parents: a3493eb
Author: Sean Owen so...@cloudera.com
Authored: Fri Mar 13 17:59:31 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:59:31 2015 +0000

----------------------------------------------------------------------
 bin/compute-classpath.sh | 25 +++++++++++++-------------
 1 file changed, 12 insertions(+), 13 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/4aa41327/bin/compute-classpath.sh
----------------------------------------------------------------------
diff --git a/bin/compute-classpath.sh b/bin/compute-classpath.sh
index f4f6b7b..f28 100755
--- a/bin/compute-classpath.sh
+++ b/bin/compute-classpath.sh
@@ -93,14 +93,17 @@ if [ $num_jars -gt 1 ]; then
   exit 1
 fi

-# Verify that versions of java used to build the jars and run Spark are compatible
-jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
-if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
-  echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
-  echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
-  echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
-  echo "or build Spark with Java 6." 1>&2
-  exit 1
+# Only able to make this check if 'jar' command is available
+if [ $(command -v "$JAR_CMD") ] ; then
+  # Verify that versions of java used to build the jars and run Spark are compatible
+  jar_error_check=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" nonexistent/class/path 2>&1)
+  if [[ "$jar_error_check" =~ "invalid CEN header" ]]; then
+    echo "Loading Spark jar with '$JAR_CMD' failed. " 1>&2
+    echo "This is likely because Spark was compiled with Java 7 and run " 1>&2
+    echo "with Java 6. (see SPARK-1703). Please use Java 7 to run Spark " 1>&2
+    echo "or build Spark with Java 6." 1>&2
+    exit 1
+  fi
 fi

 CLASSPATH="$CLASSPATH:$ASSEMBLY_JAR"
@@ -121,11 +124,7 @@ datanucleus_jars=$(find "$datanucleus_dir" 2>/dev/null | grep "datanucleus-.*\\
 datanucleus_jars=$(echo "$datanucleus_jars" | tr "\n" : | sed "s/:$//g")

 if [ -n "$datanucleus_jars" ]; then
-  hive_files=$("$JAR_CMD" -tf "$ASSEMBLY_JAR" org/apache/hadoop/hive/ql/exec 2>/dev/null)
-  if [ -n "$hive_files" ]; then
-    echo "Spark assembly has been built with Hive, including Datanucleus jars on classpath" 1>&2
-    CLASSPATH="$CLASSPATH:$datanucleus_jars"
-  fi
+  CLASSPATH="$CLASSPATH:$datanucleus_jars"
 fi

 # Add test classes if we're running from SBT or Maven with SPARK_TESTING set to 1
spark git commit: [SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 4aa41327d -> f81611dca

[SPARK-6036][CORE] avoid race condition between eventlogListener and akka actor system

For detail description, pls refer to [SPARK-6036](https://issues.apache.org/jira/browse/SPARK-6036).

Author: Zhang, Liye liye.zh...@intel.com

Closes #4785 from liyezhang556520/EventLogInProcess and squashes the following commits:

8b0b0a6 [Zhang, Liye] stop listener after DAGScheduler
79b15b3 [Zhang, Liye] SPARK-6036 avoid race condition between eventlogListener and akka actor system

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f81611dc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f81611dc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f81611dc

Branch: refs/heads/branch-1.3
Commit: f81611dca7ce97ebd26262086ac0e2b5e5f997e5
Parents: 4aa4132
Author: Zhang, Liye liye.zh...@intel.com
Authored: Thu Feb 26 23:11:43 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:06:17 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/f81611dc/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index 05c3210..f80338e 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1375,17 +1375,17 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
         stopped = true
         env.metricsSystem.report()
         metadataCleaner.cancel()
-        env.actorSystem.stop(heartbeatReceiver)
         cleaner.foreach(_.stop())
         dagScheduler.stop()
         dagScheduler = null
+        listenerBus.stop()
+        eventLogger.foreach(_.stop())
+        env.actorSystem.stop(heartbeatReceiver)
         progressBar.foreach(_.stop())
         taskScheduler = null
         // TODO: Cache.stop()?
         env.stop()
         SparkEnv.set(null)
-        listenerBus.stop()
-        eventLogger.foreach(_.stop())
         logInfo("Successfully stopped SparkContext")
         SparkContext.clearActiveContext()
       } else {
spark git commit: [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 f81611dca -> 9846790f4

[SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough

A simple try-catch wrapping KryoException to be more informative.

Author: Lev Khomich levkhom...@gmail.com

Closes #4947 from levkhomich/master and squashes the following commits:

0f7a947 [Lev Khomich] [SPARK-6087][CORE] Provide actionable exception if Kryo buffer is not large enough

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9846790f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9846790f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9846790f

Branch: refs/heads/branch-1.3
Commit: 9846790f49e2716e0b0c15f58e8547a1f04ba3ae
Parents: f81611d
Author: Lev Khomich levkhom...@gmail.com
Authored: Tue Mar 10 10:55:42 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:17:02 2015 +0000

----------------------------------------------------------------------
 .../org/apache/spark/serializer/KryoSerializer.scala  |  8 +++++++-
 .../apache/spark/serializer/KryoSerializerSuite.scala | 14 ++++++++++++++
 2 files changed, 21 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/9846790f/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
index 9ce64d4..dc7aa99 100644
--- a/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
+++ b/core/src/main/scala/org/apache/spark/serializer/KryoSerializer.scala
@@ -158,7 +158,13 @@ private[spark] class KryoSerializerInstance(ks: KryoSerializer) extends Serializ

   override def serialize[T: ClassTag](t: T): ByteBuffer = {
     output.clear()
-    kryo.writeClassAndObject(output, t)
+    try {
+      kryo.writeClassAndObject(output, t)
+    } catch {
+      case e: KryoException if e.getMessage.startsWith("Buffer overflow") =>
+        throw new SparkException(s"Kryo serialization failed: ${e.getMessage}. To avoid this, " +
+          "increase spark.kryoserializer.buffer.max.mb value.")
+    }
     ByteBuffer.wrap(output.toBytes)
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/9846790f/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
index 523d898..6198df8 100644
--- a/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
+++ b/core/src/test/scala/org/apache/spark/serializer/KryoSerializerSuite.scala
@@ -261,6 +261,20 @@ class KryoSerializerSuite extends FunSuite with SharedSparkContext {
       ser.serialize(HighlyCompressedMapStatus(BlockManagerId("exec-1", "host", 1234), blockSizes))
     }
   }
+
+  test("serialization buffer overflow reporting") {
+    import org.apache.spark.SparkException
+    val kryoBufferMaxProperty = "spark.kryoserializer.buffer.max.mb"
+
+    val largeObject = (1 to 1000000).toArray
+
+    val conf = new SparkConf(false)
+    conf.set(kryoBufferMaxProperty, "1")
+
+    val ser = new KryoSerializer(conf).newInstance()
+    val thrown = intercept[SparkException](ser.serialize(largeObject))
+    assert(thrown.getMessage.contains(kryoBufferMaxProperty))
+  }
 }
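For readers who hit the new error message, a hedged configuration sketch (the property name is the one referenced in the message above; the value 128 is an arbitrary example):

```scala
import org.apache.spark.SparkConf

// Raise the maximum Kryo buffer (in MB) so that large objects fit during serialization.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer.max.mb", "128") // example value
```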
spark git commit: [SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work
Repository: spark
Updated Branches:
  refs/heads/master 7f13434a5 -> b943f5d90

[SPARK-4600][GraphX]: org.apache.spark.graphx.VertexRDD.diff does not work

Turns out, per the [convo on the JIRA](https://issues.apache.org/jira/browse/SPARK-4600), `diff` is acting exactly as it should. It became a large misconception as I thought it meant set difference, when in fact it does not. To that extent I merely updated the `diff` documentation to, hopefully, better reflect its true intentions moving forward.

Author: Brennon York brennon.y...@capitalone.com

Closes #5015 from brennonyork/SPARK-4600 and squashes the following commits:

1e1d1e5 [Brennon York] reverted internal diff docs
92288f7 [Brennon York] reverted both the test suite and the diff function back to its origin functionality
f428623 [Brennon York] updated diff documentation to better represent its function
cc16d65 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
66818b9 [Brennon York] added small secondary diff test
99ad412 [Brennon York] Merge remote-tracking branch 'upstream/master' into SPARK-4600
74b8c95 [Brennon York] corrected method by leveraging bitmask operations to correctly return only the portions of that are different from the calling VertexRDD
9717120 [Brennon York] updated diff impl to cause fewer objects to be created
710a21c [Brennon York] working diff given test case
aa57f83 [Brennon York] updated to set ShortestPaths to run 'forward' rather than 'backward'

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b943f5d9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b943f5d9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b943f5d9

Branch: refs/heads/master
Commit: b943f5d907df0607ecffb729f2bccfa436438d7e
Parents: 7f13434
Author: Brennon York brennon.y...@capitalone.com
Authored: Fri Mar 13 18:48:31 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:48:31 2015 +0000

----------------------------------------------------------------------
 graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala | 7 +++--
 1 file changed, 5 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/b943f5d9/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
----------------------------------------------------------------------
diff --git a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
index 09ae3f9..40ecff7 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/VertexRDD.scala
@@ -122,8 +122,11 @@ abstract class VertexRDD[VD](
   def mapValues[VD2: ClassTag](f: (VertexId, VD) => VD2): VertexRDD[VD2]

   /**
-   * Hides vertices that are the same between `this` and `other`; for vertices that are different,
-   * keeps the values from `other`.
+   * For each vertex present in both `this` and `other`, `diff` returns only those vertices with
+   * differing values; for values that are different, keeps the values from `other`. This is
+   * only guaranteed to work if the VertexRDDs share a common ancestor.
+   *
+   * @param other the other VertexRDD with which to diff against.
    */
   def diff(other: VertexRDD[VD]): VertexRDD[VD]
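A small sketch of the documented semantics (not from the commit; assumes an existing SparkContext `sc`, and `setB` is derived from `setA` so the two share a common ancestor as the updated doc requires):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx._

def diffExample(sc: SparkContext): Unit = {
  val setA: VertexRDD[Int] = VertexRDD(sc.parallelize(0L until 3L).map(id => (id, id.toInt)))
  val setB: VertexRDD[Int] = setA.mapValues(v => if (v == 1) 10 else v) // change vertex 1 only
  val changed = setA.diff(setB)
  changed.collect.foreach(println) // prints only (1,10): the differing vertex, value from `other`
}
```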
spark git commit: [SPARK-6133] Make sc.stop() idempotent
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 338bea7b3 -> a08588c7e

[SPARK-6133] Make sc.stop() idempotent

Before we would get the following (benign) error if we called `sc.stop()` twice. This is because the listener bus would try to post the end event again even after it has already stopped. This happens occasionally when flaky tests fail, usually as a result of other sources of error. Either way we shouldn't be logging this error when it is not the cause of the failure.

```
ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerApplicationEnd(1425348445682)
```

Author: Andrew Or and...@databricks.com

Closes #4871 from andrewor14/sc-stop and squashes the following commits:

a14afc5 [Andrew Or] Move code after code
915db16 [Andrew Or] Move code into code

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a08588c7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a08588c7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a08588c7

Branch: refs/heads/branch-1.3
Commit: a08588c7eeaecf7003073c092320b37abd166191
Parents: 338bea7
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 15:09:57 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:33:27 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/SparkContext.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a08588c7/core/src/main/scala/org/apache/spark/SparkContext.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/SparkContext.scala b/core/src/main/scala/org/apache/spark/SparkContext.scala
index f80338e..023d54e 100644
--- a/core/src/main/scala/org/apache/spark/SparkContext.scala
+++ b/core/src/main/scala/org/apache/spark/SparkContext.scala
@@ -1369,10 +1369,10 @@ class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationCli
   /** Shut down the SparkContext. */
   def stop() {
     SparkContext.SPARK_CONTEXT_CONSTRUCTOR_LOCK.synchronized {
-      postApplicationEnd()
-      ui.foreach(_.stop())
       if (!stopped) {
         stopped = true
+        postApplicationEnd()
+        ui.foreach(_.stop())
         env.metricsSystem.report()
         metadataCleaner.cancel()
         cleaner.foreach(_.stop())
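A tiny sketch of the now-supported calling pattern (assumes an existing SparkContext `sc`):

```scala
// After this fix, a second stop() is harmless: the guarded block runs only once,
// so nothing is posted to an already-stopped listener bus.
sc.stop()
sc.stop() // no-op, no spurious ERROR in the logs
```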
svn commit: r1666516 - in /spark: releases/_posts/2015-03-13-spark-release-1-3-0.md site/releases/spark-release-1-3-0.html
Author: pwendell
Date: Fri Mar 13 17:15:41 2015
New Revision: 1666516

URL: http://svn.apache.org/r1666516
Log: Incorrect link in Kafka API in 1.3 release notes

Modified:
    spark/releases/_posts/2015-03-13-spark-release-1-3-0.md
    spark/site/releases/spark-release-1-3-0.html

Modified: spark/releases/_posts/2015-03-13-spark-release-1-3-0.md
URL: http://svn.apache.org/viewvc/spark/releases/_posts/2015-03-13-spark-release-1-3-0.md?rev=1666516&r1=1666515&r2=1666516&view=diff
==============================================================================
--- spark/releases/_posts/2015-03-13-spark-release-1-3-0.md (original)
+++ spark/releases/_posts/2015-03-13-spark-release-1-3-0.md Fri Mar 13 17:15:41 2015
@@ -28,7 +28,7 @@ In this release Spark SQL [graduates fro
 In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for [topic modeling](https://issues.apache.org/jira/browse/SPARK-1405), [multinomial logistic regression](https://issues.apache.org/jira/browse/SPARK-2309) for multiclass classification, [Gaussian mixture model (GMM)](https://issues.apache.org/jira/browse/SPARK-5012) and [power iteration clustering](https://issues.apache.org/jira/browse/SPARK-4259) for clustering, [FP-growth](https://issues.apache.org/jira/browse/SPARK-4001) for frequent pattern mining, and [block matrix abstraction](https://issues.apache.org/jira/browse/SPARK-4409) for distributed linear algebra. Initial support has been added for [model import/export](https://issues.apache.org/jira/browse/SPARK-4587) in exchangeable format, which will be expanded in future versions to cover more model types in Java/Python/Scala. The implementations of k-means and ALS receive [updates](https://issues.apache.org/jira/browse/SPARK-3424, https://issues.apache.org/jira/browse/SPARK-3541) that lead to significant performance gain. PySpark now supports the [ML pipeline API](https://issues.apache.org/jira/browse/SPARK-4586) added in Spark 1.2, and [gradient boosted trees](https://issues.apache.org/jira/browse/SPARK-5094) and [Gaussian mixture model](https://issues.apache.org/jira/browse/SPARK-5012). Finally, the ML pipeline API has been ported to support the new DataFrames abstraction.

 ### Spark Streaming
-Spark 1.3 introduces a new [*direct* Kafka API](https://issues.apache.org/jira/browse/SPARK-6946) ([docs](http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html)) which enables exactly-once delivery without the use of write ahead logs. It also adds a [Python Kafka API](https://issues.apache.org/jira/browse/SPARK-5047) along with infrastructure for additional Python APIs in future releases. An online version of [logistic regression](https://issues.apache.org/jira/browse/SPARK-4979) and the ability to read [binary records](https://issues.apache.org/jira/browse/SPARK-4969) have also been added. For stateful operations, support has been added for loading of an [initial state RDD](https://issues.apache.org/jira/browse/SPARK-3660). Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.
+Spark 1.3 introduces a new [*direct* Kafka API](https://issues.apache.org/jira/browse/SPARK-4964) ([docs](http://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html)) which enables exactly-once delivery without the use of write ahead logs. It also adds a [Python Kafka API](https://issues.apache.org/jira/browse/SPARK-5047) along with infrastructure for additional Python APIs in future releases. An online version of [logistic regression](https://issues.apache.org/jira/browse/SPARK-4979) and the ability to read [binary records](https://issues.apache.org/jira/browse/SPARK-4969) have also been added. For stateful operations, support has been added for loading of an [initial state RDD](https://issues.apache.org/jira/browse/SPARK-3660). Finally, the streaming programming guide has been updated to include information about SQL and DataFrame operations within streaming applications, and important clarifications to the fault-tolerance semantics.

 ### GraphX
 GraphX adds a handful of utility functions in this release, including conversion into a [canonical edge graph](https://issues.apache.org/jira/browse/SPARK-4917).

Modified: spark/site/releases/spark-release-1-3-0.html
URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-1-3-0.html?rev=1666516&r1=1666515&r2=1666516&view=diff
==============================================================================
--- spark/site/releases/spark-release-1-3-0.html (original)
+++ spark/site/releases/spark-release-1-3-0.html Fri Mar 13 17:15:41 2015
@@ -187,7 +187,7 @@

 <p>In this release Spark MLlib introduces several new algorithms: latent Dirichlet allocation (LDA) for <a href="https://issues.apache.org/jira/browse/SPARK-1405">topic modeling</a>, <a
spark git commit: SPARK-4300 [CORE] Race condition during SparkWorker shutdown
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 170af49bb -> a3493eb77

SPARK-4300 [CORE] Race condition during SparkWorker shutdown

Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream. (This also removes a redundant `waitFor()` although it was harmless)

CC tdas since I think you wrote this method.

Author: Sean Owen so...@cloudera.com

Closes #4787 from srowen/SPARK-4300 and squashes the following commits:

e0cdabf [Sean Owen] Close appender saving stdout/stderr before destroying process to avoid exception on reading closed input stream

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a3493eb7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a3493eb7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a3493eb7

Branch: refs/heads/branch-1.3
Commit: a3493eb77a0aa7d3048e657459ebaa22e98ccf0c
Parents: 170af49
Author: Sean Owen so...@cloudera.com
Authored: Thu Feb 26 14:08:56 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:54:31 2015 +0000

----------------------------------------------------------------------
 .../scala/org/apache/spark/deploy/worker/ExecutorRunner.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/a3493eb7/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
index 2ec10f8..a3ec803 100644
--- a/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/worker/ExecutorRunner.scala
@@ -86,14 +86,13 @@ private[spark] class ExecutorRunner(
       var exitCode: Option[Int] = None
       if (process != null) {
         logInfo("Killing process!")
-        process.destroy()
-        process.waitFor()
         if (stdoutAppender != null) {
           stdoutAppender.stop()
         }
         if (stderrAppender != null) {
           stderrAppender.stop()
         }
+        process.destroy()
         exitCode = Some(process.waitFor())
       }
       worker ! ExecutorStateChanged(appId, execId, state, message, exitCode)
spark git commit: [SPARK-6275][Documentation]Miss toDF() function in docs/sql-programming-guide.md
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 a08588c7e -> 301278126

[SPARK-6275][Documentation] Miss toDF() function in docs/sql-programming-guide.md

Miss `toDF()` function in docs/sql-programming-guide.md

Author: zzcclp xm_...@sina.com

Closes #4977 from zzcclp/SPARK-6275 and squashes the following commits:

9a96c7b [zzcclp] Miss toDF()

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30127812
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/30127812
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/30127812

Branch: refs/heads/branch-1.3
Commit: 30127812629a53a1b45c4d90b70c5cc55dd28fb6
Parents: a08588c
Author: zzcclp xm_...@sina.com
Authored: Thu Mar 12 15:07:15 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:44:52 2015 +0000

----------------------------------------------------------------------
 docs/sql-programming-guide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/30127812/docs/sql-programming-guide.md
----------------------------------------------------------------------
diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md
index b1e309c..11c29e2 100644
--- a/docs/sql-programming-guide.md
+++ b/docs/sql-programming-guide.md
@@ -358,7 +358,7 @@ import sqlContext.implicits._
 case class Person(name: String, age: Int)

 // Create an RDD of Person objects and register it as a table.
-val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
+val people = sc.textFile("examples/src/main/resources/people.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
 people.registerTempTable("people")

 // SQL statements can be run by using the sql methods provided by sqlContext.
svn commit: r1666542 - in /spark: documentation.md site/documentation.html
Author: pwendell
Date: Fri Mar 13 18:49:38 2015
New Revision: 1666542

URL: http://svn.apache.org/r1666542
Log: Fixing latest doc link (some how I reverted changes)

Modified:
    spark/documentation.md
    spark/site/documentation.html

Modified: spark/documentation.md
URL: http://svn.apache.org/viewvc/spark/documentation.md?rev=1666542&r1=1666541&r2=1666542&view=diff
==============================================================================
--- spark/documentation.md (original)
+++ spark/documentation.md Fri Mar 13 18:49:38 2015
@@ -12,7 +12,8 @@ navigation:
 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="{{site.url}}docs/latest/">Spark 1.2.1 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/latest/">Spark 1.3.0 (latest release)</a></li>
+  <li><a href="{{site.url}}docs/1.2.1/">Spark 1.2.1</a></li>
   <li><a href="{{site.url}}docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="{{site.url}}docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="{{site.url}}docs/0.9.2/">Spark 0.9.2</a></li>

Modified: spark/site/documentation.html
URL: http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1666542&r1=1666541&r2=1666542&view=diff
==============================================================================
--- spark/site/documentation.html (original)
+++ spark/site/documentation.html Fri Mar 13 18:49:38 2015
@@ -172,7 +172,8 @@

 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="/docs/latest/">Spark 1.2.1 (latest release)</a></li>
+  <li><a href="/docs/latest/">Spark 1.3.0 (latest release)</a></li>
+  <li><a href="/docs/1.2.1/">Spark 1.2.1</a></li>
   <li><a href="/docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="/docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="/docs/0.9.2/">Spark 0.9.2</a></li>
spark git commit: [SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 dbee7e16c -> 170af49bb

[SPARK-6194] [SPARK-677] [PySpark] fix memory leak in collect()

Because of a circular reference between JavaObject and JavaMember, a Java object cannot be released until Python GC kicks in, which causes a memory leak in collect() that may consume lots of memory in the JVM.

This PR changes the way we send collected data back into Python from a local file to a socket, which avoids any disk IO during collect and avoids keeping any referrers to the Java object in Python.

cc JoshRosen

Author: Davies Liu dav...@databricks.com

Closes #4923 from davies/fix_collect and squashes the following commits:

d730286 [Davies Liu] address comments
24c92a4 [Davies Liu] fix style
ba54614 [Davies Liu] use socket to transfer data from JVM
9517c8f [Davies Liu] fix memory leak in collect()

(cherry picked from commit 8767565cef01d847f57b7293d8b63b2422009b90)
Signed-off-by: Josh Rosen joshro...@databricks.com

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/170af49b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/170af49b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/170af49b

Branch: refs/heads/branch-1.3
Commit: 170af49bb0b183b2f4cb3ebbb3e9ab5327f907c9
Parents: dbee7e1
Author: Davies Liu dav...@databricks.com
Authored: Mon Mar 9 16:24:06 2015 -0700
Committer: Josh Rosen joshro...@databricks.com
Committed: Fri Mar 13 10:44:10 2015 -0700

----------------------------------------------------------------------
 .../org/apache/spark/api/python/PythonRDD.scala | 76 +++-
 python/pyspark/context.py                       | 13 ++--
 python/pyspark/rdd.py                           | 30
 python/pyspark/sql/dataframe.py                 | 14 +---
 4 files changed, 82 insertions(+), 51 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/170af49b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
index fd89669..4c71b69 100644
--- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
+++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala
@@ -19,26 +19,27 @@ package org.apache.spark.api.python

 import java.io._
 import java.net._
-import java.util.{List => JList, ArrayList => JArrayList, Map => JMap, UUID, Collections}
-
-import org.apache.spark.input.PortableDataStream
+import java.util.{Collections, ArrayList => JArrayList, List => JList, Map => JMap}

 import scala.collection.JavaConversions._
 import scala.collection.mutable
 import scala.language.existentials

 import com.google.common.base.Charsets.UTF_8
-
 import org.apache.hadoop.conf.Configuration
 import org.apache.hadoop.io.compress.CompressionCodec
-import org.apache.hadoop.mapred.{InputFormat, OutputFormat, JobConf}
+import org.apache.hadoop.mapred.{InputFormat, JobConf, OutputFormat}
 import org.apache.hadoop.mapreduce.{InputFormat => NewInputFormat, OutputFormat => NewOutputFormat}
+
 import org.apache.spark._
-import org.apache.spark.api.java.{JavaSparkContext, JavaPairRDD, JavaRDD}
+import org.apache.spark.api.java.{JavaPairRDD, JavaRDD, JavaSparkContext}
 import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.input.PortableDataStream
 import org.apache.spark.rdd.RDD
 import org.apache.spark.util.Utils

+import scala.util.control.NonFatal
+
 private[spark] class PythonRDD(
     @transient parent: RDD[_],
     command: Array[Byte],
@@ -344,21 +345,33 @@ private[spark] object PythonRDD extends Logging {

   /**
    * Adapter for calling SparkContext#runJob from Python.
    *
-   * This method will return an iterator of an array that contains all elements in the RDD
+   * This method will serve an iterator of an array that contains all elements in the RDD
    * (effectively a collect()), but allows you to run on a certain subset of partitions,
    * or to enable local execution.
+   *
+   * @return the port number of a local socket which serves the data collected from this job.
    */
   def runJob(
       sc: SparkContext,
       rdd: JavaRDD[Array[Byte]],
       partitions: JArrayList[Int],
-      allowLocal: Boolean): Iterator[Array[Byte]] = {
+      allowLocal: Boolean): Int = {
     type ByteArray = Array[Byte]
     type UnrolledPartition = Array[ByteArray]
     val allPartitions: Array[UnrolledPartition] =
       sc.runJob(rdd, (x: Iterator[ByteArray]) => x.toArray, partitions, allowLocal)
     val flattenedPartition: UnrolledPartition = Array.concat(allPartitions: _*)
-    flattenedPartition.iterator
+    serveIterator(flattenedPartition.iterator,
+      s"serve RDD ${rdd.id} with partitions ${partitions.mkString(",")}")
+  }
+
+  /**
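A hedged sketch of the serve-over-socket idea the fix adopts (hypothetical helper, not the commit's `serveIterator`): bind an ephemeral localhost port, hand the port back to the caller, and stream the collected bytes from a daemon thread instead of writing a temp file.

```scala
import java.io.{BufferedOutputStream, DataOutputStream}
import java.net.{InetAddress, ServerSocket}

// Serve length-prefixed byte records over a local socket; returns the port
// the caller (e.g. the Python side) should connect to.
def serveBytes(data: Iterator[Array[Byte]]): Int = {
  val server = new ServerSocket(0, 1, InetAddress.getByName("localhost"))
  val thread = new Thread("serve collected partitions") {
    setDaemon(true)
    override def run(): Unit = {
      val sock = server.accept()
      val out = new DataOutputStream(new BufferedOutputStream(sock.getOutputStream))
      try {
        data.foreach { bytes =>
          out.writeInt(bytes.length) // length prefix, then payload
          out.write(bytes)
        }
      } finally {
        out.close()
        sock.close()
        server.close()
      }
    }
  }
  thread.start()
  server.getLocalPort
}
```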
spark git commit: SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 214f68103 -> dbee7e16c

SPARK-4704 [CORE] SparkSubmitDriverBootstrap doesn't flush output

Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting

CC andrewor14 since I believe he created this section of code

Author: Sean Owen so...@cloudera.com

Closes #4788 from srowen/SPARK-4704 and squashes the following commits:

ad7114e [Sean Owen] Join on output threads to make sure any lingering output from process reaches stdout, stderr before exiting

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dbee7e16
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dbee7e16
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dbee7e16

Branch: refs/heads/branch-1.3
Commit: dbee7e16c7434326cce6f6d5ab494093c60ee097
Parents: 214f681
Author: Sean Owen so...@cloudera.com
Authored: Thu Feb 26 12:56:54 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 17:43:05 2015 +0000

----------------------------------------------------------------------
 .../org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/dbee7e16/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
index 2eab998..311048c 100644
--- a/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
@@ -17,8 +17,6 @@

 package org.apache.spark.deploy

-import java.io.File
-
 import scala.collection.JavaConversions._

 import org.apache.spark.util.{RedirectThread, Utils}
@@ -164,6 +162,8 @@ private[spark] object SparkSubmitDriverBootstrapper {
       }
     }
     val returnCode = process.waitFor()
+    stdoutThread.join()
+    stderrThread.join()
    sys.exit(returnCode)
  }
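A hedged sketch of the principle behind the fix (hypothetical helper, not Spark's `RedirectThread`): copy a child process's output on a thread, then `join()` that thread before exiting so buffered bytes actually reach stdout/stderr.

```scala
import java.io.{InputStream, OutputStream}

// Pump bytes from `in` to `out` on a background thread and return the thread.
def redirect(in: InputStream, out: OutputStream): Thread = {
  val t = new Thread(new Runnable {
    override def run(): Unit = {
      val buf = new Array[Byte](1024)
      var n = in.read(buf)
      while (n != -1) {
        out.write(buf, 0, n)
        out.flush()
        n = in.read(buf)
      }
    }
  })
  t.start()
  t
}

// Usage: wait for the process, then join the pumps before exiting.
val process = new ProcessBuilder("echo", "hello").start()
val t1 = redirect(process.getInputStream, System.out)
val t2 = redirect(process.getErrorStream, System.err)
val code = process.waitFor()
t1.join(); t2.join() // without these joins, trailing output can be lost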
[1/2] spark git commit: [SPARK-6132] ContextCleaner race condition across SparkContexts
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 9846790f4 -> 338bea7b3

[SPARK-6132] ContextCleaner race condition across SparkContexts

The problem is that `ContextCleaner` may clean variables that belong to a different `SparkContext`. This can happen if the `SparkContext` to which the cleaner belongs stops, and a new one is started immediately afterwards in the same JVM. In this case, if the cleaner is in the middle of cleaning a broadcast, for instance, it will do so through `SparkEnv.get.blockManager`, which could be one that belongs to a different `SparkContext`.

JoshRosen and I suspect that this is the cause of many flaky tests, most notably the `JavaAPISuite`. We were able to reproduce the failure locally (though it is not deterministic and very hard to reproduce).

Author: Andrew Or and...@databricks.com

Closes #4869 from andrewor14/cleaner-masquerade and squashes the following commits:

29168c0 [Andrew Or] Synchronize ContextCleaner stop

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3cdc8a35
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3cdc8a35
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3cdc8a35

Branch: refs/heads/branch-1.3
Commit: 3cdc8a35a7b9bbdf418988d0fe4524d413dce23c
Parents: 9846790
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 13:44:05 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:20:50 2015 +0000

----------------------------------------------------------------------
 .../scala/org/apache/spark/ContextCleaner.scala | 35 ++++++++++++++-------
 1 file changed, 24 insertions(+), 11 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/3cdc8a35/core/src/main/scala/org/apache/spark/ContextCleaner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index ede1e23..201e5ec 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -104,9 +104,19 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
     cleaningThread.start()
   }

-  /** Stop the cleaner. */
+  /**
+   * Stop the cleaning thread and wait until the thread has finished running its current task.
+   */
   def stop() {
     stopped = true
+    // Interrupt the cleaning thread, but wait until the current task has finished before
+    // doing so. This guards against the race condition where a cleaning thread may
+    // potentially clean similarly named variables created by a different SparkContext,
+    // resulting in otherwise inexplicable block-not-found exceptions (SPARK-6132).
+    synchronized {
+      cleaningThread.interrupt()
+    }
+    cleaningThread.join()
   }

   /** Register a RDD for cleanup when it is garbage collected. */
@@ -135,16 +145,19 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
       try {
         val reference = Option(referenceQueue.remove(ContextCleaner.REF_QUEUE_POLL_TIMEOUT))
           .map(_.asInstanceOf[CleanupTaskWeakReference])
-        reference.map(_.task).foreach { task =>
-          logDebug("Got cleaning task " + task)
-          referenceBuffer -= reference.get
-          task match {
-            case CleanRDD(rddId) =>
-              doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
-            case CleanShuffle(shuffleId) =>
-              doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
-            case CleanBroadcast(broadcastId) =>
-              doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
+        // Synchronize here to avoid being interrupted on stop()
+        synchronized {
+          reference.map(_.task).foreach { task =>
+            logDebug("Got cleaning task " + task)
+            referenceBuffer -= reference.get
+            task match {
+              case CleanRDD(rddId) =>
+                doCleanupRDD(rddId, blocking = blockOnCleanupTasks)
+              case CleanShuffle(shuffleId) =>
+                doCleanupShuffle(shuffleId, blocking = blockOnShuffleCleanupTasks)
+              case CleanBroadcast(broadcastId) =>
+                doCleanupBroadcast(broadcastId, blocking = blockOnCleanupTasks)
+            }
           }
         }
       } catch {
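A hedged, self-contained sketch of the interrupt-under-lock pattern this fix applies (hypothetical `Worker` class, not Spark code): the loop handles each task inside a lock, and `stop()` acquires the same lock before interrupting, so the thread can never be interrupted mid-task.

```scala
class Worker(handleOneTask: () => Unit) {
  @volatile private var stopped = false

  private val thread = new Thread(new Runnable {
    override def run(): Unit = {
      while (!stopped) {
        // Holding the lock while working means stop() cannot interrupt mid-task.
        Worker.this.synchronized {
          handleOneTask()
        }
      }
    }
  })

  def start(): Unit = thread.start()

  def stop(): Unit = {
    stopped = true
    synchronized { thread.interrupt() } // waits for the current task to finish
    thread.join()
  }
}
```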
svn commit: r1666540 - in /spark: news/_posts/2015-03-13-spark-1-3-0-released.md site/documentation.html site/news/index.html site/news/spark-1-3-0-released.html
Author: pwendell
Date: Fri Mar 13 18:47:57 2015
New Revision: 1666540

URL: http://svn.apache.org/r1666540
Log: Fixing old link in release news item

Modified:
    spark/news/_posts/2015-03-13-spark-1-3-0-released.md
    spark/site/documentation.html
    spark/site/news/index.html
    spark/site/news/spark-1-3-0-released.html

Modified: spark/news/_posts/2015-03-13-spark-1-3-0-released.md
URL: http://svn.apache.org/viewvc/spark/news/_posts/2015-03-13-spark-1-3-0-released.md?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/news/_posts/2015-03-13-spark-1-3-0-released.md (original)
+++ spark/news/_posts/2015-03-13-spark-1-3-0-released.md Fri Mar 13 18:47:57 2015
@@ -11,6 +11,6 @@ meta:
   _edit_last: '4'
   _wpas_done_all: '1'
 ---
-We are happy to announce the availability of <a href="{{site.url}}releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 174 developers and more than 1,000 commits!
+We are happy to announce the availability of <a href="{{site.url}}releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark's largest release ever, with contributions from 174 developers and more than 1,000 commits!

 Visit the <a href="{{site.url}}releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">release notes</a> to read about the new features, or <a href="{{site.url}}downloads.html">download</a> the release today.

Modified: spark/site/documentation.html
URL: http://svn.apache.org/viewvc/spark/site/documentation.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/documentation.html (original)
+++ spark/site/documentation.html Fri Mar 13 18:47:57 2015
@@ -172,8 +172,7 @@

 <p>Setup instructions, programming guides, and other documentation are available for each version of Spark below:</p>

 <ul>
-  <li><a href="/docs/latest/">Spark 1.3.0 (latest release)</a></li>
-  <li><a href="/docs/1.2.1/">Spark 1.2.1</a></li>
+  <li><a href="/docs/latest/">Spark 1.2.1 (latest release)</a></li>
   <li><a href="/docs/1.1.1/">Spark 1.1.1</a></li>
   <li><a href="/docs/1.0.2/">Spark 1.0.2</a></li>
   <li><a href="/docs/0.9.2/">Spark 0.9.2</a></li>

Modified: spark/site/news/index.html
URL: http://svn.apache.org/viewvc/spark/site/news/index.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/news/index.html (original)
+++ spark/site/news/index.html Fri Mar 13 18:47:57 2015
@@ -174,7 +174,7 @@
       <h3 class="entry-title"><a href="/news/spark-1-3-0-released.html">Spark 1.3.0 released</a></h3>
       <div class="entry-date">March 13, 2015</div>
     </header>
-    <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
+    <div class="entry-content"><p>We are happy to announce the availability of <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
     </div>
   </article>

Modified: spark/site/news/spark-1-3-0-released.html
URL: http://svn.apache.org/viewvc/spark/site/news/spark-1-3-0-released.html?rev=1666540&r1=1666539&r2=1666540&view=diff
==============================================================================
--- spark/site/news/spark-1-3-0-released.html (original)
+++ spark/site/news/spark-1-3-0-released.html Fri Mar 13 18:47:57 2015
@@ -170,7 +170,7 @@

 <h2>Spark 1.3.0 released</h2>

-<p>We are happy to announce the availability of <a href="/releases/spark-release-1-2-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>
+<p>We are happy to announce the availability of <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">Spark 1.3.0</a>! Spark 1.3.0 is the third release on the API-compatible 1.X line. It is Spark&#8217;s largest release ever, with contributions from 174 developers and more than 1,000 commits!</p>

 <p>Visit the <a href="/releases/spark-release-1-3-0.html" title="Spark Release 1.3.0">release notes</a> to read about the new features, or <a href="/downloads.html">download</a> the release today.</p>
spark git commit: [SPARK-6278][MLLIB] Mention the change of objective in linear regression
Repository: spark
Updated Branches:
  refs/heads/master dc4abd4dc -> 7f13434a5

[SPARK-6278][MLLIB] Mention the change of objective in linear regression

As discussed in the RC3 vote thread, we should mention the change of objective in linear regression in the migration guide.

srowen

Author: Xiangrui Meng m...@databricks.com

Closes #4978 from mengxr/SPARK-6278 and squashes the following commits:

fb3bbe6 [Xiangrui Meng] mention regularization parameter
bfd6cff [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into SPARK-6278
375fd09 [Xiangrui Meng] address Sean's comments
f87ae71 [Xiangrui Meng] mention step size change

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f13434a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7f13434a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7f13434a

Branch: refs/heads/master
Commit: 7f13434a5c52b815c584ec773ab0e5df1a35ea86
Parents: dc4abd4
Author: Xiangrui Meng m...@databricks.com
Authored: Fri Mar 13 10:27:28 2015 -0700
Committer: Xiangrui Meng m...@databricks.com
Committed: Fri Mar 13 10:27:28 2015 -0700

----------------------------------------------------------------------
 docs/mllib-guide.md | 2 ++
 1 file changed, 2 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/7f13434a/docs/mllib-guide.md
----------------------------------------------------------------------
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 598374f..f8e8794 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -102,6 +102,8 @@ In the `spark.mllib` package, there were several breaking changes. The first ch
 * In `DecisionTree`, the deprecated class method `train` has been removed. (The object/static `train` methods remain.)
 * In `Strategy`, the `checkpointDir` parameter has been removed. Checkpointing is still supported, but the checkpoint directory must be set before calling tree and tree ensemble training.
 * `PythonMLlibAPI` (the interface between Scala/Java and Python for MLlib) was a public API but is now private, declared `private[python]`. This was never meant for external use.
+* In linear regression (including Lasso and ridge regression), the squared loss is now divided by 2.
+  So in order to produce the same result as in 1.2, the regularization parameter needs to be divided by 2 and the step size needs to be multiplied by 2.

 ## Previous Spark Versions
[2/2] spark git commit: [SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet
[SPARK-6132][HOTFIX] ContextCleaner InterruptedException should be quiet

If the cleaner is stopped, we shouldn't print a huge stack trace when the cleaner thread is interrupted because we purposefully did this.

Author: Andrew Or and...@databricks.com

Closes #4882 from andrewor14/cleaner-interrupt and squashes the following commits:

8652120 [Andrew Or] Just a hot fix

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/338bea7b
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/338bea7b
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/338bea7b

Branch: refs/heads/branch-1.3
Commit: 338bea7b33a0faaa62c94ace334a79c0b1716a01
Parents: 3cdc8a3
Author: Andrew Or and...@databricks.com
Authored: Tue Mar 3 20:49:45 2015 -0800
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 18:21:44 2015 +0000

----------------------------------------------------------------------
 core/src/main/scala/org/apache/spark/ContextCleaner.scala | 1 +
 1 file changed, 1 insertion(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/338bea7b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
----------------------------------------------------------------------
diff --git a/core/src/main/scala/org/apache/spark/ContextCleaner.scala b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
index 201e5ec..98e4401 100644
--- a/core/src/main/scala/org/apache/spark/ContextCleaner.scala
+++ b/core/src/main/scala/org/apache/spark/ContextCleaner.scala
@@ -161,6 +161,7 @@ private[spark] class ContextCleaner(sc: SparkContext) extends Logging {
           }
         }
       } catch {
+        case ie: InterruptedException if stopped => // ignore
         case e: Exception => logError("Error in cleaning thread", e)
       }
     }
spark git commit: [SPARK-6285] [SQL] Removes unused ParquetTestData and duplicated TestGroupWriteSupport
Repository: spark
Updated Branches:
  refs/heads/master b943f5d90 -> cdc34ed91

[SPARK-6285] [SQL] Removes unused ParquetTestData and duplicated TestGroupWriteSupport

All the contents in this file are not referenced anywhere and should have been removed in #4116 when I tried to get rid of the old Parquet test suites.

<!-- Reviewable:start -->
[<img src="https://reviewable.io/review_button.png" height=40 alt="Review on Reviewable"/>](https://reviewable.io/reviews/apache/spark/5010)
<!-- Reviewable:end -->

Author: Cheng Lian l...@databricks.com

Closes #5010 from liancheng/spark-6285 and squashes the following commits:

06ed057 [Cheng Lian] Removes unused ParquetTestData and duplicated TestGroupWriteSupport

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cdc34ed9
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cdc34ed9
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cdc34ed9

Branch: refs/heads/master
Commit: cdc34ed9108688fea32ad170b1ba344fe047716b
Parents: b943f5d
Author: Cheng Lian l...@databricks.com
Authored: Sat Mar 14 07:09:53 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Sat Mar 14 07:09:53 2015 +0800

----------------------------------------------------------------------
 .../spark/sql/parquet/ParquetTestData.scala | 466 ---
 1 file changed, 466 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/cdc34ed9/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
----------------------------------------------------------------------
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala b/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
deleted file mode 100644
index e4a10aa..000
--- a/sql/core/src/main/scala/org/apache/spark/sql/parquet/ParquetTestData.scala
+++ /dev/null
@@ -1,466 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements. See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License. You may obtain a copy of the License at
- *
- *    http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-
-package org.apache.spark.sql.parquet
-
-import java.io.File
-
-import org.apache.hadoop.conf.Configuration
-import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}
-import org.apache.hadoop.mapreduce.Job
-import org.apache.spark.sql.test.TestSQLContext
-
-import parquet.example.data.{GroupWriter, Group}
-import parquet.example.data.simple.{NanoTime, SimpleGroup}
-import parquet.hadoop.{ParquetReader, ParquetFileReader, ParquetWriter}
-import parquet.hadoop.api.WriteSupport
-import parquet.hadoop.api.WriteSupport.WriteContext
-import parquet.hadoop.example.GroupReadSupport
-import parquet.hadoop.util.ContextUtil
-import parquet.io.api.RecordConsumer
-import parquet.schema.{MessageType, MessageTypeParser}
-
-import org.apache.spark.util.Utils
-
-// Write support class for nested groups: ParquetWriter initializes GroupWriteSupport
-// with an empty configuration (it is after all not intended to be used in this way?)
-// and members are private so we need to make our own in order to pass the schema
-// to the writer.
-private class TestGroupWriteSupport(schema: MessageType) extends WriteSupport[Group] {
-  var groupWriter: GroupWriter = null
-  override def prepareForWrite(recordConsumer: RecordConsumer): Unit = {
-    groupWriter = new GroupWriter(recordConsumer, schema)
-  }
-  override def init(configuration: Configuration): WriteContext = {
-    new WriteContext(schema, new java.util.HashMap[String, String]())
-  }
-  override def write(record: Group) {
-    groupWriter.write(record)
-  }
-}
-
-private[sql] object ParquetTestData {
-
-  val testSchema =
-    """message myrecord {
-      optional boolean myboolean;
-      optional int32 myint;
-      optional binary mystring (UTF8);
-      optional int64 mylong;
-      optional float myfloat;
-      optional double mydouble;
-      optional int96 mytimestamp;
-      }"""
-
-  // field names for test assertion error messages
-  val testSchemaFieldNames = Seq(
-    "myboolean:Boolean",
-    "myint:Int",
-    "mystring:String",
-    "mylong:Long",
-    "myfloat:Float",
-    "mydouble:Double",
-    "mytimestamp:Timestamp"
-  )
-
-  val subTestSchema =
-    """
-      message myrecord {
spark git commit: [SPARK-6317][SQL]Fixed HIVE console startup issue
Repository: spark
Updated Branches:
  refs/heads/master cdc34ed91 -> e360d5e4a

[SPARK-6317][SQL] Fixed HIVE console startup issue

Author: vinodkc vinod.kc...@gmail.com
Author: Vinod K C vinod...@huawei.com

Closes #5011 from vinodkc/HIVE_console_startupError and squashes the following commits:

b43925f [vinodkc] Changed order of import
b4f5453 [Vinod K C] Fixed HIVE console startup issue

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e360d5e4
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e360d5e4
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e360d5e4

Branch: refs/heads/master
Commit: e360d5e4adf287444c10e72f8e4d57548839bf6e
Parents: cdc34ed
Author: vinodkc vinod.kc...@gmail.com
Authored: Sat Mar 14 07:17:54 2015 +0800
Committer: Cheng Lian l...@databricks.com
Committed: Sat Mar 14 07:17:54 2015 +0800

----------------------------------------------------------------------
 project/SparkBuild.scala | 4 ++--
 sql/README.md            | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/e360d5e4/project/SparkBuild.scala
----------------------------------------------------------------------
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 4a06b98..f4c74c4 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -269,8 +269,8 @@ object SQL {
         |import org.apache.spark.sql.catalyst.plans.logical._
         |import org.apache.spark.sql.catalyst.rules._
         |import org.apache.spark.sql.catalyst.util._
-        |import org.apache.spark.sql.Dsl._
         |import org.apache.spark.sql.execution
+        |import org.apache.spark.sql.functions._
         |import org.apache.spark.sql.test.TestSQLContext._
         |import org.apache.spark.sql.types._
         |import org.apache.spark.sql.parquet.ParquetTestData""".stripMargin,
@@ -300,8 +300,8 @@ object Hive {
         |import org.apache.spark.sql.catalyst.plans.logical._
         |import org.apache.spark.sql.catalyst.rules._
         |import org.apache.spark.sql.catalyst.util._
-        |import org.apache.spark.sql.Dsl._
         |import org.apache.spark.sql.execution
+        |import org.apache.spark.sql.functions._
         |import org.apache.spark.sql.hive._
         |import org.apache.spark.sql.hive.test.TestHive._
         |import org.apache.spark.sql.types._

http://git-wip-us.apache.org/repos/asf/spark/blob/e360d5e4/sql/README.md
----------------------------------------------------------------------
diff --git a/sql/README.md b/sql/README.md
index a792499..48f8334 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -36,8 +36,8 @@ import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.plans.logical._
 import org.apache.spark.sql.catalyst.rules._
 import org.apache.spark.sql.catalyst.util._
-import org.apache.spark.sql.Dsl._
 import org.apache.spark.sql.execution
+import org.apache.spark.sql.functions._
 import org.apache.spark.sql.hive._
 import org.apache.spark.sql.hive.test.TestHive._
 import org.apache.spark.sql.types._
spark git commit: [SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time
Repository: spark
Updated Branches:
  refs/heads/master 3980ebdf1 - 0af9ea74a

[SPARK-5845][Shuffle] Time to cleanup spilled shuffle files not included in shuffle write time

I've added a timer in the right place to fix this inaccuracy.

Author: Ilya Ganelin ilya.gane...@capitalone.com

Closes #4965 from ilganeli/SPARK-5845 and squashes the following commits:

bfabf88 [Ilya Ganelin] Changed to using a foreach vs. getorelse
3e059b0 [Ilya Ganelin] Switched to using getorelse
b946d08 [Ilya Ganelin] Fixed error with option
9434b50 [Ilya Ganelin] Merge remote-tracking branch 'upstream/master' into SPARK-5845
db8647e [Ilya Ganelin] Added update for shuffleWriteTime around spilled file cleanup in ExternalSorter

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0af9ea74
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0af9ea74
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0af9ea74

Branch: refs/heads/master
Commit: 0af9ea74a07ecdc08c43fa63cb9c9f0c57e3029b
Parents: 3980ebd
Author: Ilya Ganelin ilya.gane...@capitalone.com
Authored: Fri Mar 13 13:21:04 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 13:21:04 2015 +0000

--
 .../scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/0af9ea74/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
--
diff --git a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
index 27496c5..fa2e617 100644
--- a/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
+++ b/core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala
@@ -88,7 +88,10 @@ private[spark] class SortShuffleWriter[K, V, C](
     } finally {
       // Clean up our sorter, which may have its own intermediate files
       if (sorter != null) {
+        val startTime = System.nanoTime()
         sorter.stop()
+        context.taskMetrics.shuffleWriteMetrics.foreach(
+          _.incShuffleWriteTime(System.nanoTime - startTime))
         sorter = null
       }
     }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
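The pattern of the fix is worth noting: wrap the cleanup call in System.nanoTime() measurements and fold the delta into the existing metric via foreach on the Option, so a missing metrics object is simply a no-op. A self-contained sketch of that pattern with simplified stand-in types (these are not Spark's real classes):

    // Stand-in for Spark's shuffle write metrics accumulator.
    final class WriteMetricsStub {
      private var writeTimeNanos = 0L
      def incShuffleWriteTime(ns: Long): Unit = writeTimeNanos += ns
      def shuffleWriteTime: Long = writeTimeNanos
    }

    // Time an arbitrary cleanup action and charge it to the metric if present.
    def timedCleanup(metrics: Option[WriteMetricsStub])(cleanup: => Unit): Unit = {
      val start = System.nanoTime()
      cleanup()  // e.g. sorter.stop(), which deletes spilled intermediate files
      metrics.foreach(_.incShuffleWriteTime(System.nanoTime() - start))
    }

    // Usage: cleanup time is charged to the metric when one is present.
    val m = Some(new WriteMetricsStub)
    timedCleanup(m) { Thread.sleep(5) }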
svn commit: r1666484 - in /spark: downloads.md js/downloads.js site/community.html site/docs/latest site/downloads.html site/examples.html site/index.html site/js/downloads.js
Author: pwendell
Date: Fri Mar 13 15:48:06 2015
New Revision: 1666484

URL: http://svn.apache.org/r1666484
Log: Initial 1.3.0 code

Modified:
    spark/downloads.md
    spark/js/downloads.js
    spark/site/community.html
    spark/site/docs/latest
    spark/site/downloads.html
    spark/site/examples.html
    spark/site/index.html
    spark/site/js/downloads.js

Modified: spark/downloads.md
URL: http://svn.apache.org/viewvc/spark/downloads.md?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/downloads.md (original)
+++ spark/downloads.md Fri Mar 13 15:48:06 2015
@@ -16,9 +16,9 @@ $(document).ready(function() {

 ## Download Spark

-The latest release of Spark is Spark 1.2.1, released on February 9, 2015
-<a href="{{site.url}}releases/spark-release-1-2-1.html">(release notes)</a>
-<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97;">(git tag)</a><br/>
+The latest release of Spark is Spark 1.3.0, released on March 13, 2015
+<a href="{{site.url}}releases/spark-release-1-3-0.html">(release notes)</a>
+<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc;">(git tag)</a><br/>

 1. Chose a Spark release:
   <select id="sparkVersionSelect" onChange="javascript:onVersionSelect();"></select><br>
@@ -41,7 +41,7 @@ Spark artifacts are [hosted in Maven Cen

     groupId: org.apache.spark
     artifactId: spark-core_2.10
-    version: 1.2.1
+    version: 1.3.0

 ### Development and Maintenance Branches
 If you are interested in working with the newest under-development code or contributing to Spark development, you can also check out the master branch from Git:

@@ -49,8 +49,8 @@ If you are interested in working with th
     # Master development branch
     git clone git://github.com/apache/spark.git

-    # 1.2 maintenance branch with stability fixes on top of Spark 1.2.1
-    git clone git://github.com/apache/spark.git -b branch-1.2
+    # 1.3 maintenance branch with stability fixes on top of Spark 1.3.0
+    git clone git://github.com/apache/spark.git -b branch-1.3

 Once you've downloaded Spark, you can find instructions for installing and building it on the <a href="{{site.url}}documentation.html">documentation page</a>.
Modified: spark/js/downloads.js
URL: http://svn.apache.org/viewvc/spark/js/downloads.js?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/js/downloads.js (original)
+++ spark/js/downloads.js Fri Mar 13 15:48:06 2015
@@ -26,6 +26,7 @@ var packagesV3 = [mapr3, mapr4].concat(p
 // 1.1.0+
 var packagesV4 = [hadoop2p4, hadoop2p3, mapr3, mapr4].concat(packagesV1);

+addRelease("1.3.0", new Date("3/13/2015"), sources.concat(packagesV4), true);
 addRelease("1.2.1", new Date("2/9/2015"), sources.concat(packagesV4), true);
 addRelease("1.2.0", new Date("12/18/2014"), sources.concat(packagesV4), true);
 addRelease("1.1.1", new Date("11/26/2014"), sources.concat(packagesV4), true);

Modified: spark/site/community.html
URL: http://svn.apache.org/viewvc/spark/site/community.html?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/community.html (original)
+++ spark/site/community.html Fri Mar 13 15:48:06 2015
@@ -188,8 +188,6 @@
   </li>
 </ul>

-<p>The StackOverflow tag <a href="http://stackoverflow.com/questions/tagged/apache-spark"><code>apache-spark</code></a> is an unofficial but active forum for Spark users' questions and answers.</p>
-
 <p><a name="events"></a></p>
 <h3>Events and Meetups</h3>

Modified: spark/site/docs/latest
URL: http://svn.apache.org/viewvc/spark/site/docs/latest?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/docs/latest (original)
+++ spark/site/docs/latest Fri Mar 13 15:48:06 2015
@@ -1 +1 @@
-link 1.2.1
\ No newline at end of file
+link 1.3.0
\ No newline at end of file

Modified: spark/site/downloads.html
URL: http://svn.apache.org/viewvc/spark/site/downloads.html?rev=1666484&r1=1666483&r2=1666484&view=diff
==============================================================================
--- spark/site/downloads.html (original)
+++ spark/site/downloads.html Fri Mar 13 15:48:06 2015
@@ -176,21 +176,21 @@ $(document).ready(function() {

 <h2 id="download-spark">Download Spark</h2>

-<p>The latest release of Spark is Spark 1.2.1, released on February 9, 2015
-<a href="/releases/spark-release-1-2-1.html">(release notes)</a>
-<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b6eaf77d4332bfb0a698849b1f5f917d20d70e97;">(git tag)</a><br /></p>
+<p>The latest release of Spark is Spark 1.3.0, released on March 13, 2015
+<a href="/releases/spark-release-1-3-0.html">(release notes)</a>
+<a href="https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=4aaf48d46d13129f0f9bdafd771dd80fe568a7dc;">(git tag)</a><br /></p>

 <ol>
svn commit: r1666486 [2/2] - in /spark: _layouts/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/
Modified: spark/site/releases/spark-release-0-3.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-3.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-3.html (original) +++ spark/site/releases/spark-release-0-3.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-0.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-0.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-0.html (original) +++ spark/site/releases/spark-release-0-5-0.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-1.html URL: http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-1.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-1.html (original) +++ spark/site/releases/spark-release-0-5-1.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li /ul /li lia href=/examples.htmlExamples/a/li @@ -135,6 +135,9 @@ h5Latest News/h5 ul class=list-unstyled + lia href=/news/spark-1-3-0-released.htmlSpark 1.3.0 released/a + span class=small(Mar 13, 2015)/span/li + lia href=/news/spark-1-2-1-released.htmlSpark 1.2.1 released/a span class=small(Feb 09, 2015)/span/li @@ -144,9 +147,6 @@ lia href=/news/spark-1-2-0-released.htmlSpark 1.2.0 released/a span class=small(Dec 18, 2014)/span/li - lia href=/news/spark-1-1-1-released.htmlSpark 1.1.1 released/a - span class=small(Nov 26, 2014)/span/li - /ul p class=small style=text-align: right;a href=/news/index.htmlArchive/a/p /div Modified: spark/site/releases/spark-release-0-5-2.html URL: 
http://svn.apache.org/viewvc/spark/site/releases/spark-release-0-5-2.html?rev=1666486r1=1666485r2=1666486view=diff == --- spark/site/releases/spark-release-0-5-2.html (original) +++ spark/site/releases/spark-release-0-5-2.html Fri Mar 13 15:51:27 2015 @@ -105,7 +105,7 @@ /a ul class=dropdown-menu lia href=/documentation.htmlOverview/a/li - lia href=/docs/latest/Latest Release (Spark 1.2.1)/a/li + lia href=/docs/latest/Latest Release (Spark 1.3.0)/a/li
spark git commit: [CORE][minor] remove unnecessary ClassTag in `DAGScheduler`
Repository: spark
Updated Branches:
  refs/heads/master 9048e8102 - ea3d2eed9

[CORE][minor] remove unnecessary ClassTag in `DAGScheduler`

This existed at the very beginning, but became unnecessary after [this commit](https://github.com/apache/spark/commit/37d8f37a8ec110416fba0d51d8ba70370ac380c1#diff-6a9ff7fb74fd490a50462d45db2d5e11L272). I think we should remove it if we don't plan to use it in the future.

Author: Wenchen Fan cloud0...@outlook.com

Closes #4992 from cloud-fan/small and squashes the following commits:

e857f2e [Wenchen Fan] remove unnecessary ClassTag

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ea3d2eed
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ea3d2eed
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ea3d2eed

Branch: refs/heads/master
Commit: ea3d2eed9b0a94b34543d9a9df87dc63a279deb1
Parents: 9048e81
Author: Wenchen Fan cloud0...@outlook.com
Authored: Fri Mar 13 14:08:56 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 14:08:56 2015 +0000

--
 core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/ea3d2eed/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
index bc84e23..e4170a5 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
@@ -26,7 +26,6 @@ import scala.collection.mutable.{ArrayBuffer, HashMap, HashSet, Map, Stack}
 import scala.concurrent.Await
 import scala.concurrent.duration._
 import scala.language.postfixOps
-import scala.reflect.ClassTag
 import scala.util.control.NonFatal

 import akka.pattern.ask
@@ -497,7 +496,7 @@ class DAGScheduler(
     waiter
   }

-  def runJob[T, U: ClassTag](
+  def runJob[T, U](
       rdd: RDD[T],
       func: (TaskContext, Iterator[T]) => U,
       partitions: Seq[Int],

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
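The removal is safe because a ClassTag is only needed when a method must materialize values of its type parameter at runtime, for example to allocate an Array[U]; runJob merely passes U through, so the context bound was dead weight. A small illustration (not Spark code):

    import scala.reflect.ClassTag

    // Needs a ClassTag: arrays require the element's runtime class.
    def makeArray[U: ClassTag](n: Int): Array[U] = new Array[U](n)

    // Does not need one: U is only plumbed through, never materialized.
    def plumb[U](compute: () => U): U = compute()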
spark git commit: [SPARK-6197][CORE] handle json exception when history file not finished writing
Repository: spark
Updated Branches:
  refs/heads/master 69ff8e8cf - 9048e8102

[SPARK-6197][CORE] handle json exception when history file not finished writing

For details, please refer to [SPARK-6197](https://issues.apache.org/jira/browse/SPARK-6197)

Author: Zhang, Liye liye.zh...@intel.com

Closes #4927 from liyezhang556520/jsonParseError and squashes the following commits:

5cbdc82 [Zhang, Liye] without unnecessary wrap
2b48831 [Zhang, Liye] small changes with sean owen's comments
2973024 [Zhang, Liye] handle json exception when file not finished writing

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9048e810
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9048e810
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9048e810

Branch: refs/heads/master
Commit: 9048e8102e3f564842fa0dc6e82edce70b7dd3d7
Parents: 69ff8e8
Author: Zhang, Liye liye.zh...@intel.com
Authored: Fri Mar 13 13:59:54 2015 +0000
Committer: Sean Owen so...@cloudera.com
Committed: Fri Mar 13 14:00:45 2015 +0000

--
 .../org/apache/spark/deploy/master/Master.scala |  3 ++-
 .../spark/scheduler/ReplayListenerBus.scala     | 25
 2 files changed, 23 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/9048e810/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
--
diff --git a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
index 1581429..22935c9 100644
--- a/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/master/Master.scala
@@ -764,8 +764,9 @@ private[spark] class Master(
     val replayBus = new ReplayListenerBus()
     val ui = SparkUI.createHistoryUI(new SparkConf, replayBus, new SecurityManager(conf),
       appName + status, HistoryServer.UI_PATH_PREFIX + s"/${app.id}")
+    val maybeTruncated = eventLogFile.endsWith(EventLoggingListener.IN_PROGRESS)
     try {
-      replayBus.replay(logInput, eventLogFile)
+      replayBus.replay(logInput, eventLogFile, maybeTruncated)
     } finally {
       logInput.close()
     }

http://git-wip-us.apache.org/repos/asf/spark/blob/9048e810/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
--
diff --git a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
index 95273c7..86f357a 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/ReplayListenerBus.scala
@@ -21,6 +21,7 @@ import java.io.{InputStream, IOException}

 import scala.io.Source

+import com.fasterxml.jackson.core.JsonParseException
 import org.json4s.jackson.JsonMethods._

 import org.apache.spark.Logging
@@ -40,15 +41,31 @@ private[spark] class ReplayListenerBus extends SparkListenerBus with Logging {
    *
    * @param logData Stream containing event log data.
   * @param sourceName Filename (or other source identifier) from whence @logData is being read
+   * @param maybeTruncated Indicate whether log file might be truncated (some abnormal situations
+   *        encountered, log file might not finished writing) or not
    */
-  def replay(logData: InputStream, sourceName: String): Unit = {
+  def replay(
+      logData: InputStream,
+      sourceName: String,
+      maybeTruncated: Boolean = false): Unit = {
     var currentLine: String = null
     var lineNumber: Int = 1
     try {
       val lines = Source.fromInputStream(logData).getLines()
-      lines.foreach { line =>
-        currentLine = line
-        postToAll(JsonProtocol.sparkEventFromJson(parse(line)))
+      while (lines.hasNext) {
+        currentLine = lines.next()
+        try {
+          postToAll(JsonProtocol.sparkEventFromJson(parse(currentLine)))
+        } catch {
+          case jpe: JsonParseException =>
+            // We can only ignore exception from last line of the file that might be truncated
+            if (!maybeTruncated || lines.hasNext) {
+              throw jpe
+            } else {
+              logWarning(s"Got JsonParseException from log file $sourceName" +
+                s" at line $lineNumber, the file might not have finished writing cleanly.")
+            }
+        }
         lineNumber += 1
       }
     } catch {

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
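The rule the loop implements is: a JsonParseException may be swallowed only when the file is flagged as possibly truncated and the bad line is the final one; any earlier parse failure still propagates. A self-contained sketch of that rule, with a generic handler in place of JsonProtocol (names are illustrative):

    import scala.util.control.NonFatal

    // Replay lines, tolerating a failure only on the last line of a
    // possibly-truncated file; earlier failures are rethrown.
    def replayTolerant(lines: Iterator[String], maybeTruncated: Boolean)
                      (handle: String => Unit): Unit = {
      var lineNumber = 1
      while (lines.hasNext) {
        val line = lines.next()
        try handle(line) catch {
          case NonFatal(e) =>
            if (!maybeTruncated || lines.hasNext) throw e
            else println(s"Ignoring error on final line $lineNumber of a truncated log: $e")
        }
        lineNumber += 1
      }
    }

    // Usage: the last, half-written line is skipped instead of failing the replay.
    replayTolerant(Iterator("""{"ok":1}""", """{"trunc"""), maybeTruncated = true) { l =>
      require(l.endsWith("}"), s"bad json: $l")
    }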
spark git commit: [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide
Repository: spark Updated Branches: refs/heads/branch-1.3 23069bd02 - dc287f38f [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide Also fixed a bunch of minor styling issues. !-- Reviewable:start -- [img src=https://reviewable.io/review_button.png; height=40 alt=Review on Reviewable/](https://reviewable.io/reviews/apache/spark/5001) !-- Reviewable:end -- Author: Cheng Lian l...@databricks.com Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements (cherry picked from commit 69ff8e8cfbecd81fd54100c4dab332c3bc992316) Signed-off-by: Cheng Lian l...@databricks.com Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dc287f38 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dc287f38 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dc287f38 Branch: refs/heads/branch-1.3 Commit: dc287f38f1cc192b7fa6ec0e83b36254f1cfec10 Parents: 23069bd Author: Cheng Lian l...@databricks.com Authored: Fri Mar 13 21:34:50 2015 +0800 Committer: Cheng Lian l...@databricks.com Committed: Fri Mar 13 21:36:47 2015 +0800 -- docs/sql-programming-guide.md | 237 - 1 file changed, 180 insertions(+), 57 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/dc287f38/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 9c363bc..b1e309c 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -21,14 +21,14 @@ The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell` or the `pyspark` shell. -## Starting Point: SQLContext +## Starting Point: `SQLContext` div class=codetabs div data-lang=scala markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/scala/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/scala/index.html#org.apache.spark.sql.`SQLContext`) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight scala %} val sc: SparkContext // An existing SparkContext. @@ -43,8 +43,8 @@ import sqlContext.implicits._ div data-lang=java markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight java %} JavaSparkContext sc = ...; // An existing JavaSparkContext. @@ -56,8 +56,8 @@ SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); div data-lang=python markdown=1 The entry point into all relational functionality in Spark is the -[SQLContext](api/python/pyspark.sql.SQLContext-class.html) class, or one -of its decedents. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/python/pyspark.sql.SQLContext-class.html) class, or one +of its decedents. To create a basic `SQLContext`, all you need is a SparkContext. 
{% highlight python %} from pyspark.sql import SQLContext @@ -67,20 +67,20 @@ sqlContext = SQLContext(sc) /div /div -In addition to the basic SQLContext, you can also create a HiveContext, which provides a -superset of the functionality provided by the basic SQLContext. Additional features include +In addition to the basic `SQLContext`, you can also create a `HiveContext`, which provides a +superset of the functionality provided by the basic `SQLContext`. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the -ability to read data from Hive tables. To use a HiveContext, you do not need to have an -existing Hive setup, and all of the data sources available to a SQLContext are still available. -HiveContext is only packaged separately to avoid including all of Hive's dependencies in the default -Spark build. If these dependencies are not a problem for your application then using HiveContext -is recommended for the 1.3 release of Spark. Future releases will focus on bringing SQLContext up -to feature parity with a HiveContext. +ability to read data from Hive tables.
spark git commit: [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide
Repository: spark Updated Branches: refs/heads/master 0af9ea74a - 69ff8e8cf [SPARK-5310] [SQL] [DOC] Parquet section for the SQL programming guide Also fixed a bunch of minor styling issues. !-- Reviewable:start -- [img src=https://reviewable.io/review_button.png; height=40 alt=Review on Reviewable/](https://reviewable.io/reviews/apache/spark/5001) !-- Reviewable:end -- Author: Cheng Lian l...@databricks.com Closes #5001 from liancheng/parquet-doc and squashes the following commits: 89ad3db [Cheng Lian] Addresses @rxin's comments 7eb6955 [Cheng Lian] Docs for the new Parquet data source 415eefb [Cheng Lian] Some minor formatting improvements Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/69ff8e8c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/69ff8e8c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/69ff8e8c Branch: refs/heads/master Commit: 69ff8e8cfbecd81fd54100c4dab332c3bc992316 Parents: 0af9ea7 Author: Cheng Lian l...@databricks.com Authored: Fri Mar 13 21:34:50 2015 +0800 Committer: Cheng Lian l...@databricks.com Committed: Fri Mar 13 21:34:50 2015 +0800 -- docs/sql-programming-guide.md | 237 - 1 file changed, 180 insertions(+), 57 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/69ff8e8c/docs/sql-programming-guide.md -- diff --git a/docs/sql-programming-guide.md b/docs/sql-programming-guide.md index 76aa1a5..11c29e2 100644 --- a/docs/sql-programming-guide.md +++ b/docs/sql-programming-guide.md @@ -21,14 +21,14 @@ The DataFrame API is available in [Scala](api/scala/index.html#org.apache.spark. All of the examples on this page use sample data included in the Spark distribution and can be run in the `spark-shell` or the `pyspark` shell. -## Starting Point: SQLContext +## Starting Point: `SQLContext` div class=codetabs div data-lang=scala markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/scala/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/scala/index.html#org.apache.spark.sql.`SQLContext`) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight scala %} val sc: SparkContext // An existing SparkContext. @@ -43,8 +43,8 @@ import sqlContext.implicits._ div data-lang=java markdown=1 The entry point into all functionality in Spark SQL is the -[SQLContext](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its -descendants. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/java/index.html#org.apache.spark.sql.SQLContext) class, or one of its +descendants. To create a basic `SQLContext`, all you need is a SparkContext. {% highlight java %} JavaSparkContext sc = ...; // An existing JavaSparkContext. @@ -56,8 +56,8 @@ SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc); div data-lang=python markdown=1 The entry point into all relational functionality in Spark is the -[SQLContext](api/python/pyspark.sql.SQLContext-class.html) class, or one -of its decedents. To create a basic SQLContext, all you need is a SparkContext. +[`SQLContext`](api/python/pyspark.sql.SQLContext-class.html) class, or one +of its decedents. To create a basic `SQLContext`, all you need is a SparkContext. 
{% highlight python %} from pyspark.sql import SQLContext @@ -67,20 +67,20 @@ sqlContext = SQLContext(sc) /div /div -In addition to the basic SQLContext, you can also create a HiveContext, which provides a -superset of the functionality provided by the basic SQLContext. Additional features include +In addition to the basic `SQLContext`, you can also create a `HiveContext`, which provides a +superset of the functionality provided by the basic `SQLContext`. Additional features include the ability to write queries using the more complete HiveQL parser, access to Hive UDFs, and the -ability to read data from Hive tables. To use a HiveContext, you do not need to have an -existing Hive setup, and all of the data sources available to a SQLContext are still available. -HiveContext is only packaged separately to avoid including all of Hive's dependencies in the default -Spark build. If these dependencies are not a problem for your application then using HiveContext -is recommended for the 1.3 release of Spark. Future releases will focus on bringing SQLContext up -to feature parity with a HiveContext. +ability to read data from Hive tables. To use a `HiveContext`, you do not need to have an +existing Hive setup, and all of the data sources available to a
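For readers of this digest, here is a sketch of the Parquet round trip that the new documentation section covers, using the Spark 1.3-era Scala API; the existing SparkContext `sc`, the output path, and the sample rows are assumptions for illustration:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)  // sc: an existing SparkContext
    import sqlContext.implicits._

    // Write a DataFrame as Parquet, then read it back with the schema preserved.
    val people = sc.parallelize(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
    people.saveAsParquetFile("people.parquet")
    val loaded = sqlContext.parquetFile("people.parquet")
    loaded.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age >= 26").show()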
Git Push Summary
Repository: spark Updated Tags: refs/tags/v1.3.0 [created] 4aaf48d46 - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org