spark git commit: [SPARK-8997] [MLLIB] Performance improvements in LocalPrefixSpan

2015-07-14 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master f0e129740 -> 1bb8accbc


[SPARK-8997] [MLLIB] Performance improvements in LocalPrefixSpan

Improves the performance of LocalPrefixSpan by implementing optimizations 
proposed in [SPARK-8997](https://issues.apache.org/jira/browse/SPARK-8997)
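
For illustration only, here is a minimal, self-contained sketch of the depth-first projection approach this patch adopts. The helper names mirror the patch, but these implementations are simplified stand-ins, not the actual MLlib code:

```
object PrefixSpanSketch {

  // Items that occur in at least minCount sequences of the (projected) database.
  def freqItemCounts(minCount: Long, db: Array[Array[Int]]): Map[Int, Long] =
    db.flatMap(_.distinct)
      .groupBy(identity)
      .map { case (item, occs) => (item, occs.length.toLong) }
      .filter { case (_, count) => count >= minCount }

  // Suffix of each sequence after the first occurrence of `item`; drop empty suffixes.
  def project(db: Array[Array[Int]], item: Int): Array[Array[Int]] =
    db.flatMap { seq =>
      val i = seq.indexOf(item)
      if (i >= 0) Some(seq.drop(i + 1)) else None
    }.filter(_.nonEmpty)

  // Depth-first enumeration of frequent patterns; prefixes are built in reversed
  // order, as in the patched LocalPrefixSpan.run.
  def run(minCount: Long, maxPatternLength: Int, prefix: List[Int],
      db: Array[Array[Int]]): Iterator[(List[Int], Long)] = {
    if (prefix.length == maxPatternLength || db.isEmpty) return Iterator.empty
    val freq = freqItemCounts(minCount, db)
    val filtered = db.map(_.filter(freq.contains))
    freq.iterator.flatMap { case (item, count) =>
      val newPrefix = item :: prefix
      Iterator.single((newPrefix, count)) ++
        run(minCount, maxPatternLength, newPrefix, project(filtered, item))
    }
  }
}

// Example: PrefixSpanSketch.run(2L, 3, Nil, Array(Array(1, 2, 3), Array(1, 3), Array(2, 3))).toList
// yields pairs such as (List(3), 3) and (List(3, 1), 2), with each prefix list reversed.
```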

Author: Feynman Liang 
Author: Feynman Liang 
Author: Xiangrui Meng 

Closes #7360 from feynmanliang/SPARK-8997-improve-prefixspan and squashes the 
following commits:

59db2f5 [Feynman Liang] Merge pull request #1 from mengxr/SPARK-8997
91e4357 [Xiangrui Meng] update LocalPrefixSpan impl
9212256 [Feynman Liang] MengXR code review comments
f055d82 [Feynman Liang] Fix failing scalatest
2e00cba [Feynman Liang] Depth first projections
70b93e3 [Feynman Liang] Performance improvements in LocalPrefixSpan, fix tests


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1bb8accb
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1bb8accb
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1bb8accb

Branch: refs/heads/master
Commit: 1bb8accbc95a0f0856a8bb715f1e94c3ff96a8c7
Parents: f0e1297
Author: Feynman Liang 
Authored: Tue Jul 14 23:50:57 2015 -0700
Committer: Xiangrui Meng 
Committed: Tue Jul 14 23:50:57 2015 -0700

--
 .../spark/mllib/fpm/LocalPrefixSpan.scala   | 95 
 .../org/apache/spark/mllib/fpm/PrefixSpan.scala |  5 +-
 .../spark/mllib/fpm/PrefixSpanSuite.scala   | 14 +--
 3 files changed, 44 insertions(+), 70 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/1bb8accb/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala
index 39c48b0..7ead632 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala
@@ -17,58 +17,49 @@
 
 package org.apache.spark.mllib.fpm
 
+import scala.collection.mutable
+
 import org.apache.spark.Logging
-import org.apache.spark.annotation.Experimental
 
 /**
- *
- * :: Experimental ::
- *
  * Calculate all patterns of a projected database in local.
  */
-@Experimental
 private[fpm] object LocalPrefixSpan extends Logging with Serializable {
 
   /**
* Calculate all patterns of a projected database.
* @param minCount minimum count
* @param maxPatternLength maximum pattern length
-   * @param prefix prefix
-   * @param projectedDatabase the projected dabase
+   * @param prefixes prefixes in reversed order
+   * @param database the projected database
* @return a set of sequential pattern pairs,
-   * the key of pair is sequential pattern (a list of items),
+   * the key of pair is sequential pattern (a list of items in 
reversed order),
* the value of pair is the pattern's count.
*/
   def run(
   minCount: Long,
   maxPatternLength: Int,
-  prefix: Array[Int],
-  projectedDatabase: Array[Array[Int]]): Array[(Array[Int], Long)] = {
-val frequentPrefixAndCounts = getFreqItemAndCounts(minCount, 
projectedDatabase)
-val frequentPatternAndCounts = frequentPrefixAndCounts
-  .map(x => (prefix ++ Array(x._1), x._2))
-val prefixProjectedDatabases = getPatternAndProjectedDatabase(
-  prefix, frequentPrefixAndCounts.map(_._1), projectedDatabase)
-
-val continueProcess = prefixProjectedDatabases.nonEmpty && prefix.length + 
1 < maxPatternLength
-if (continueProcess) {
-  val nextPatterns = prefixProjectedDatabases
-.map(x => run(minCount, maxPatternLength, x._1, x._2))
-.reduce(_ ++ _)
-  frequentPatternAndCounts ++ nextPatterns
-} else {
-  frequentPatternAndCounts
+  prefixes: List[Int],
+  database: Array[Array[Int]]): Iterator[(List[Int], Long)] = {
+if (prefixes.length == maxPatternLength || database.isEmpty) return 
Iterator.empty
+val frequentItemAndCounts = getFreqItemAndCounts(minCount, database)
+val filteredDatabase = database.map(x => 
x.filter(frequentItemAndCounts.contains))
+frequentItemAndCounts.iterator.flatMap { case (item, count) =>
+  val newPrefixes = item :: prefixes
+  val newProjected = project(filteredDatabase, item)
+  Iterator.single((newPrefixes, count)) ++
+run(minCount, maxPatternLength, newPrefixes, newProjected)
 }
   }
 
   /**
-   * calculate suffix sequence following a prefix in a sequence
-   * @param prefix prefix
-   * @param sequence sequence
+   * Calculate suffix sequence immediately after the first occurrence of an 
item.
+   * @param item item to get suffix after
+   * @param sequence sequence to extra

spark git commit: [SPARK-8279][SQL]Add math function round

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 3f6296fed -> f0e129740


[SPARK-8279][SQL]Add math function round

JIRA: https://issues.apache.org/jira/browse/SPARK-8279
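
For reference, a minimal sketch of the half-up rounding semantics this patch documents for `round` (a positive `scale` rounds at a decimal place, a negative `scale` rounds at the integral part). The helper below is illustrative, not the Catalyst implementation:

```
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Half-up rounding at an arbitrary scale; scale < 0 rounds at the integral part.
def roundAt(value: Double, scale: Int): Double =
  new JBigDecimal(value.toString).setScale(scale, RoundingMode.HALF_UP).doubleValue()

// roundAt(31.415, 2)  == 31.42
// roundAt(31.415, -1) == 30.0
```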

Author: Yijie Shen 

Closes #6938 from yijieshen/udf_round_3 and squashes the following commits:

07a124c [Yijie Shen] remove useless def children
392b65b [Yijie Shen] add negative scale test in DecimalSuite
61760ee [Yijie Shen] address reviews
302a78a [Yijie Shen] Add dataframe function test
31dfe7c [Yijie Shen] refactor round to make it readable
8c7a949 [Yijie Shen] rebase & inputTypes update
9555e35 [Yijie Shen] tiny style fix
d10be4a [Yijie Shen] use TypeCollection to specify wanted input and implicit 
cast
c3b9839 [Yijie Shen] rely on implict cast to handle string input
b0bff79 [Yijie Shen] make round's inner method's name more meaningful
9bd6930 [Yijie Shen] revert accidental change
e6f44c4 [Yijie Shen] refactor eval and genCode
1b87540 [Yijie Shen] modify checkInputDataTypes using foldable
5486b2d [Yijie Shen] DataFrame API modification
2077888 [Yijie Shen] codegen versioned eval
6cd9a64 [Yijie Shen] refactor Round's constructor
9be894e [Yijie Shen] add round functions in o.a.s.sql.functions
7c83e13 [Yijie Shen] more tests on round
56db4bb [Yijie Shen] Add decimal support to Round
7e163ae [Yijie Shen] style fix
653d047 [Yijie Shen] Add math function round


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f0e12974
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f0e12974
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f0e12974

Branch: refs/heads/master
Commit: f0e129740dc2442a21dfa7fbd97360df87291095
Parents: 3f6296f
Author: Yijie Shen 
Authored: Tue Jul 14 23:30:41 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 23:30:41 2015 -0700

--
 .../catalyst/analysis/FunctionRegistry.scala|   1 +
 .../spark/sql/catalyst/expressions/math.scala   | 203 ++-
 .../analysis/ExpressionTypeCheckingSuite.scala  |  17 ++
 .../expressions/MathFunctionsSuite.scala|  44 
 .../spark/sql/types/decimal/DecimalSuite.scala  |  23 ++-
 .../scala/org/apache/spark/sql/functions.scala  |  32 +++
 .../apache/spark/sql/MathExpressionsSuite.scala |  15 ++
 .../hive/execution/HiveCompatibilitySuite.scala |   7 +-
 8 files changed, 329 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f0e12974/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index 6b1a94e..ec75f51 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -117,6 +117,7 @@ object FunctionRegistry {
 expression[Pow]("power"),
 expression[UnaryPositive]("positive"),
 expression[Rint]("rint"),
+expression[Round]("round"),
 expression[ShiftLeft]("shiftleft"),
 expression[ShiftRight]("shiftright"),
 expression[ShiftRightUnsigned]("shiftrightunsigned"),

http://git-wip-us.apache.org/repos/asf/spark/blob/f0e12974/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
index 4b7fe05..a7ad452 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/math.scala
@@ -19,8 +19,10 @@ package org.apache.spark.sql.catalyst.expressions
 
 import java.{lang => jl}
 
-import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.TypeCheckResult
+import 
org.apache.spark.sql.catalyst.analysis.TypeCheckResult.{TypeCheckSuccess, 
TypeCheckFailure}
 import org.apache.spark.sql.catalyst.expressions.codegen._
+import org.apache.spark.sql.catalyst.InternalRow
 import org.apache.spark.sql.types._
 import org.apache.spark.unsafe.types.UTF8String
 
@@ -520,3 +522,202 @@ case class Logarithm(left: Expression, right: Expression)
 """
   }
 }
+
+/**
+ * Round the `child`'s result to `scale` decimal place when `scale` >= 0
+ * or round at integral part when `scale` < 0.
+ * For example, round(31.415, 2) would eval to 31.42 and round(31.415, -1) 
would eval to 30.
+ *
+ * Child of IntegralType would eval to itself when `scale` >= 0.
+ * Ch

spark git commit: [SPARK-8018] [MLLIB] KMeans should accept initial cluster centers as param

2015-07-14 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/master 469276965 -> 3f6296fed


[SPARK-8018] [MLLIB] KMeans should accept initial cluster centers as param

This allows KMeans to be initialized using an existing set of cluster centers 
provided as a KMeansModel object. This mode of initialization performs a 
single run.
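
As a usage illustration (assuming `sc: SparkContext`; this is a sketch of the API added by this patch, not code from it):

```
import org.apache.spark.mllib.clustering.{KMeans, KMeansModel}
import org.apache.spark.mllib.linalg.Vectors

// Seed k-means with previously computed centers instead of random or k-means|| init.
val previousCenters = Array(Vectors.dense(0.0, 0.0), Vectors.dense(5.0, 5.0))
val initialModel = new KMeansModel(previousCenters)

val data = sc.parallelize(Seq(
  Vectors.dense(0.1, 0.2), Vectors.dense(4.9, 5.1), Vectors.dense(5.2, 4.8)
)).cache()

val model = new KMeans()
  .setK(2)                        // must equal initialModel.k
  .setMaxIterations(20)
  .setInitialModel(initialModel)  // triggers a single run
  .run(data)
```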

Author: FlytxtRnD 

Closes #6737 from FlytxtRnD/Kmeans-8018 and squashes the following commits:

94b56df [FlytxtRnD] style correction
ef95ee2 [FlytxtRnD] style correction
c446c58 [FlytxtRnD] documentation and numRuns warning change
06d13ef [FlytxtRnD] numRuns corrected
d12336e [FlytxtRnD] numRuns variable modifications
07f8554 [FlytxtRnD] remove setRuns from setIntialModel
e721dfe [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into 
Kmeans-8018
242ead1 [FlytxtRnD] corrected == to === in assert
714acb5 [FlytxtRnD] added numRuns
60c8ce2 [FlytxtRnD] ignore runs parameter and initialModel test suite changed
582e6d9 [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into 
Kmeans-8018
3f5fc8e [FlytxtRnD] test case modified and one runs condition added
cd5dc5c [FlytxtRnD] Merge remote-tracking branch 'upstream/master' into 
Kmeans-8018
16f1b53 [FlytxtRnD] Merge branch 'Kmeans-8018', remote-tracking branch 
'upstream/master' into Kmeans-8018
e9c35d7 [FlytxtRnD] Remove getInitialModel and match cluster count criteria
6959861 [FlytxtRnD] Accept initial cluster centers in KMeans


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3f6296fe
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3f6296fe
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3f6296fe

Branch: refs/heads/master
Commit: 3f6296fed4ee10f53e728eb1e02f13338839b94d
Parents: 4692769
Author: FlytxtRnD 
Authored: Tue Jul 14 23:29:02 2015 -0700
Committer: Joseph K. Bradley 
Committed: Tue Jul 14 23:29:02 2015 -0700

--
 docs/mllib-clustering.md|  1 +
 .../apache/spark/mllib/clustering/KMeans.scala  | 41 +---
 .../spark/mllib/clustering/KMeansSuite.scala| 22 +++
 3 files changed, 58 insertions(+), 6 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/3f6296fe/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index d72dc20..0fc7036 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -33,6 +33,7 @@ guaranteed to find a globally optimal solution, and when run 
multiple times on
 a given dataset, the algorithm returns the best clustering result).
 * *initializationSteps* determines the number of steps in the k-means\|\| 
algorithm.
 * *epsilon* determines the distance threshold within which we consider k-means 
to have converged.
+* *initialModel* is an optional set of cluster centers used for 
initialization. If this parameter is supplied, only one run is performed.
 
 **Examples**
 

http://git-wip-us.apache.org/repos/asf/spark/blob/3f6296fe/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
index 0f8d6a3..6829713 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/KMeans.scala
@@ -156,6 +156,21 @@ class KMeans private (
 this
   }
 
+  // Initial cluster centers can be provided as a KMeansModel object rather 
than using the
+  // random or k-means|| initializationMode
+  private var initialModel: Option[KMeansModel] = None
+
+  /**
+   * Set the initial starting point, bypassing the random initialization or 
k-means||
+   * The condition model.k == this.k must be met, failure results
+   * in an IllegalArgumentException.
+   */
+  def setInitialModel(model: KMeansModel): this.type = {
+require(model.k == k, "mismatched cluster count")
+initialModel = Some(model)
+this
+  }
+
   /**
* Train a K-means model on the given set of points; `data` should be cached 
for high
* performance, because this is an iterative algorithm.
@@ -193,20 +208,34 @@ class KMeans private (
 
 val initStartTime = System.nanoTime()
 
-val centers = if (initializationMode == KMeans.RANDOM) {
-  initRandom(data)
+// Only one run is allowed when initialModel is given
+val numRuns = if (initialModel.nonEmpty) {
+  if (runs > 1) logWarning("Ignoring runs; one run is allowed when 
initialModel is given.")
+  1
 } else {
-  initKMeansParallel(data)
+  runs
 }
 
+val centers = initialModel match {
+  case Some(kMeansCenters) => {
+Array(kMea

spark git commit: [SPARK-6259] [MLLIB] Python API for LDA

2015-07-14 Thread jkbradley
Repository: spark
Updated Branches:
  refs/heads/master c6b1a9e74 -> 469276965


[SPARK-6259] [MLLIB] Python API for LDA

I implemented the Python API for LDA, but I didn't implement a method for 
`LDAModel.describeTopics()`, because it is a little hard to implement right now. 
Adding documentation and an example for it would fit better in another issue.

TODO: LDAModel.describeTopics() must also be implemented in Python, but that 
would be better handled in another issue. Implementing it is a little hard, since 
the return value of `describeTopics` in Scala consists of Tuple classes.
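
For context, the Scala `LDA.run` entry point that the new Python stub (shown in the diff below) delegates to expects the corpus as `(documentId, termCountVector)` pairs. A minimal sketch, assuming `sc: SparkContext`:

```
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Tiny illustrative corpus: (document id, term-count vector).
val corpus = sc.parallelize(Seq(
  (0L, Vectors.dense(1.0, 2.0, 0.0)),
  (1L, Vectors.dense(0.0, 3.0, 1.0))
))

val ldaModel = new LDA()
  .setK(2)
  .setMaxIterations(10)
  .run(corpus)

// topicsMatrix: terms x topics matrix of topic-term weights.
val topics = ldaModel.topicsMatrix
```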

Author: Yu ISHIKAWA 

Closes #6791 from yu-iskw/SPARK-6259 and squashes the following commits:

6855f59 [Yu ISHIKAWA] LDA inherits object
28bd165 [Yu ISHIKAWA] Change the place of testing code
d7a332a [Yu ISHIKAWA] Remove the doc comment about the optimizer's default value
083e226 [Yu ISHIKAWA] Add the comment about the supported values and the 
default value of `optimizer`
9f8bed8 [Yu ISHIKAWA] Simplify casting
faa9764 [Yu ISHIKAWA] Add some comments for the LDA paramters
98f645a [Yu ISHIKAWA] Remove the interface for `describeTopics`. Because it is 
not implemented.
57ac03d [Yu ISHIKAWA] Remove the unnecessary import in Python unit testing
73412c3 [Yu ISHIKAWA] Fix the typo
2278829 [Yu ISHIKAWA] Fix the indentation
39514ec [Yu ISHIKAWA] Modify how to cast the input data
8117e18 [Yu ISHIKAWA] Fix the validation problems by `lint-scala`
77fd1b7 [Yu ISHIKAWA] Not use LabeledPoint
68f0653 [Yu ISHIKAWA] Support some parameters for `ALS.train()` in Python
25ef2ac [Yu ISHIKAWA] Resolve conflicts with rebasing


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/46927696
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/46927696
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/46927696

Branch: refs/heads/master
Commit: 4692769655e09d129a62a89a8ffb5d635675aa4d
Parents: c6b1a9e
Author: Yu ISHIKAWA 
Authored: Tue Jul 14 23:27:42 2015 -0700
Committer: Joseph K. Bradley 
Committed: Tue Jul 14 23:27:42 2015 -0700

--
 .../spark/mllib/api/python/PythonMLLibAPI.scala | 33 ++
 python/pyspark/mllib/clustering.py  | 66 +++-
 2 files changed, 98 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/46927696/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
index e628059..c58a640 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala
@@ -503,6 +503,39 @@ private[python] class PythonMLLibAPI extends Serializable {
   }
 
   /**
+   * Java stub for Python mllib LDA.run()
+   */
+  def trainLDAModel(
+  data: JavaRDD[java.util.List[Any]],
+  k: Int,
+  maxIterations: Int,
+  docConcentration: Double,
+  topicConcentration: Double,
+  seed: java.lang.Long,
+  checkpointInterval: Int,
+  optimizer: String): LDAModel = {
+val algo = new LDA()
+  .setK(k)
+  .setMaxIterations(maxIterations)
+  .setDocConcentration(docConcentration)
+  .setTopicConcentration(topicConcentration)
+  .setCheckpointInterval(checkpointInterval)
+  .setOptimizer(optimizer)
+
+if (seed != null) algo.setSeed(seed)
+
+val documents = data.rdd.map(_.asScala.toArray).map { r =>
+  r(0) match {
+case i: java.lang.Integer => (i.toLong, r(1).asInstanceOf[Vector])
+case i: java.lang.Long => (i.toLong, r(1).asInstanceOf[Vector])
+case _ => throw new IllegalArgumentException("input values contains 
invalid type value.")
+  }
+}
+algo.run(documents)
+  }
+
+
+  /**
* Java stub for Python mllib FPGrowth.train().  This stub returns a handle
* to the Java object instead of the content of the Java object.  Extra care
* needs to be taken in the Python code to ensure it gets freed on exit; see

http://git-wip-us.apache.org/repos/asf/spark/blob/46927696/python/pyspark/mllib/clustering.py
--
diff --git a/python/pyspark/mllib/clustering.py 
b/python/pyspark/mllib/clustering.py
index ed4d78a..8a92f69 100644
--- a/python/pyspark/mllib/clustering.py
+++ b/python/pyspark/mllib/clustering.py
@@ -31,13 +31,15 @@ from pyspark import SparkContext
 from pyspark.rdd import RDD, ignore_unicode_prefix
 from pyspark.mllib.common import JavaModelWrapper, callMLlibFunc, 
callJavaFunc, _py2java, _java2py
 from pyspark.mllib.linalg import SparseVector, _convert_to_vector, Dens

spark git commit: Revert SPARK-6910 and SPARK-9027

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master f23a721c1 -> c6b1a9e74


Revert SPARK-6910 and SPARK-9027

Revert #7216 and #7386. These patches seem to be causing quite a few test 
failures:

```
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.GeneratedMethodAccessor322.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:351)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getPartitionsByFilter$1.apply(ClientWrapper.scala:320)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$getPartitionsByFilter$1.apply(ClientWrapper.scala:318)
at 
org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$withHiveState$1.apply(ClientWrapper.scala:180)
at 
org.apache.spark.sql.hive.client.ClientWrapper.retryLocked(ClientWrapper.scala:135)
at 
org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:172)
at 
org.apache.spark.sql.hive.client.ClientWrapper.getPartitionsByFilter(ClientWrapper.scala:318)
at 
org.apache.spark.sql.hive.client.HiveTable.getPartitions(ClientInterface.scala:78)
at 
org.apache.spark.sql.hive.MetastoreRelation.getHiveQlPartitions(HiveMetastoreCatalog.scala:670)
at 
org.apache.spark.sql.hive.execution.HiveTableScan.doExecute(HiveTableScan.scala:137)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:90)
at 
org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:90)
at 
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:89)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:164)
at 
org.apache.spark.sql.execution.Exchange$$anonfun$doExecute$1.apply(Exchange.scala:151)
at 
org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:48)
... 85 more
Caused by: MetaException(message:Filtering is supported only on partition keys 
of type string)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree$FilterBuilder.setError(ExpressionTree.java:185)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.getJdoFilterPushdownParam(ExpressionTree.java:452)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilterOverPartitions(ExpressionTree.java:357)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree$LeafNode.generateJDOFilter(ExpressionTree.java:279)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree$TreeNode.generateJDOFilter(ExpressionTree.java:243)
at 
org.apache.hadoop.hive.metastore.parser.ExpressionTree.generateJDOFilterFragment(ExpressionTree.java:590)
at 
org.apache.hadoop.hive.metastore.ObjectStore.makeQueryFilterString(ObjectStore.java:2417)
at 
org.apache.hadoop.hive.metastore.ObjectStore.getPartitionsViaOrmFilter(ObjectStore.java:2029)
at 
org.apache.hadoop.hive.metastore.ObjectStore.access$500(ObjectStore.java:146)
at 
org.apache.hadoop.hive.metastore.ObjectStore$4.getJdoResult(ObjectStore.java:2332)
```
https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-Maven-with-YARN/2945/HADOOP_PROFILE=hadoop-2.4,label=centos/testReport/junit/org.apache.spark.sql.hive.execution/SortMergeCompatibilitySuite/auto_sortmerge_join_16/

Author: Michael Armbrust 

Closes #7409 from marmbrus/revertMetastorePushdown and squashes the following 
commits:

92fabd3 [Michael Armbrust] Revert SPARK-6910 and SPARK-9027
5d3bdf2 [Michael Armbrust] Revert "[SPARK-9027] [SQL] Generalize metastore 
predicate pushdown"


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c6b1a9e7
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c6b1a9e7
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c6b1a9e7

Branch: refs/heads/master
Commit: c6b1a9e74e34267dc198e57a184c41498ca9d6a3
Parents: f23a721
Author: Michael Armbrust 
Authored: Tue Jul 14 22:57:39 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 22:57:39 2015 -0700

--
 .../spark/sql/hive/HiveMetastoreCatalog.scala   | 58 +++
 .../org/apache/spark/sql/hive/HiveShim.scala|  1 -
 .../apache/spark/sql/hive/HiveStrategies.scala  |  4 +-
 .../spark/sql/hive/client/ClientInterface.scala | 11 +--
 .../spark/sql/hive/client/ClientWrapper.scala   | 21 +++---
 .../apache/spark/sql/hive/client/HiveShim.scala | 72 +-
 .../sql/hive/execution/HiveTableScan.scala  |  7 +-
 .../spark/sql/hive/client/FiltersSuite.scala|

spark git commit: [SPARK-8993][SQL] More comprehensive type checking in expressions.

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master f650a005e -> f23a721c1


[SPARK-8993][SQL] More comprehensive type checking in expressions.

This patch makes the following changes:

1. ExpectsInputTypes only defines expected input types, but does not perform 
any implicit type casting.
2. ImplicitCastInputTypes is a new trait that defines both expected input 
types, as well as performs implicit type casting.
3. BinaryOperator has a new abstract function "inputType", which defines the 
expected input type for both left/right. Concrete BinaryOperator expressions no 
longer perform any implicit type casting.
4. For BinaryOperators, convert NullType (i.e. null literals) into some 
accepted type so BinaryOperators don't need to handle NullTypes.

TODOs needed: fix unit tests for error reporting.

I'm intentionally not changing anything in aggregate expressions because yhuai 
is doing a big refactoring on that right now.
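
To make point 3 concrete, here is a deliberately toy illustration of the pattern (not the Catalyst API; all names and types below are invented for this sketch): a binary operator declares one expected input type for both sides, and type checking validates the operands against it instead of implicitly casting them.

```
sealed trait SimpleType
case object IntType extends SimpleType
case object StrType extends SimpleType

abstract class SimpleBinaryOperator {
  def inputType: SimpleType  // expected type of both the left and right operand

  def checkInputDataTypes(left: SimpleType, right: SimpleType): Either[String, Unit] =
    if (left == inputType && right == inputType) Right(())
    else Left(s"expected two operands of $inputType, got $left and $right")
}

object AddOp extends SimpleBinaryOperator { val inputType: SimpleType = IntType }

// AddOp.checkInputDataTypes(IntType, IntType) => Right(())
// AddOp.checkInputDataTypes(IntType, StrType) => Left("expected two operands of IntType, ...")
```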

Author: Reynold Xin 

Closes #7348 from rxin/typecheck and squashes the following commits:

8fcf814 [Reynold Xin] Fixed ordering of cases.
3bb63e7 [Reynold Xin] Style fix.
f45408f [Reynold Xin] Comment update.
aa7790e [Reynold Xin] Moved RemoveNullTypes into ImplicitTypeCasts.
438ea07 [Reynold Xin] space
d55c9e5 [Reynold Xin] Removes NullTypes.
360d124 [Reynold Xin] Fixed the rule.
fb66657 [Reynold Xin] Convert NullType into some accepted type for 
BinaryOperators.
2e22330 [Reynold Xin] Fixed unit tests.
4932d57 [Reynold Xin] Style fix.
d061691 [Reynold Xin] Rename existing ExpectsInputTypes -> 
ImplicitCastInputTypes.
e4727cc [Reynold Xin] BinaryOperator should not be doing implicit cast.
d017861 [Reynold Xin] Improve expression type checking.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f23a721c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f23a721c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f23a721c

Branch: refs/heads/master
Commit: f23a721c10b64ec5c6768634fc5e9e7b60ee7ca8
Parents: f650a00
Author: Reynold Xin 
Authored: Tue Jul 14 22:52:53 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 22:52:53 2015 -0700

--
 .../catalyst/analysis/FunctionRegistry.scala|  1 +
 .../catalyst/analysis/HiveTypeCoercion.scala| 43 ++
 .../expressions/ExpectsInputTypes.scala | 17 +++-
 .../sql/catalyst/expressions/Expression.scala   | 44 +-
 .../sql/catalyst/expressions/ScalaUDF.scala |  2 +-
 .../sql/catalyst/expressions/arithmetic.scala   | 84 +---
 .../sql/catalyst/expressions/bitwise.scala  | 30 +++
 .../spark/sql/catalyst/expressions/math.scala   | 18 ++---
 .../spark/sql/catalyst/expressions/misc.scala   |  8 +-
 .../sql/catalyst/expressions/predicates.scala   | 83 ++-
 .../catalyst/expressions/stringOperations.scala | 36 -
 .../spark/sql/catalyst/util/TypeUtils.scala |  8 --
 .../spark/sql/types/AbstractDataType.scala  | 35 
 .../catalyst/analysis/AnalysisErrorSuite.scala  |  2 +-
 .../analysis/ExpressionTypeCheckingSuite.scala  |  6 +-
 .../analysis/HiveTypeCoercionSuite.scala| 56 +
 .../apache/spark/sql/MathExpressionsSuite.scala |  1 -
 17 files changed, 309 insertions(+), 165 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f23a721c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
index ed69c42..6b1a94e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala
@@ -17,6 +17,7 @@
 
 package org.apache.spark.sql.catalyst.analysis
 
+import scala.language.existentials
 import scala.reflect.ClassTag
 import scala.util.{Failure, Success, Try}
 

http://git-wip-us.apache.org/repos/asf/spark/blob/f23a721c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
index 8cb7199..15da5ee 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala
@@ -214,19 +214,6 @@ object HiveTypeCoercion {
   }
 
 Union(newLeft, newRight)
-
-  /

spark git commit: [SPARK-8808] [SPARKR] Fix assignments in SparkR.

2015-07-14 Thread shivaram
Repository: spark
Updated Branches:
  refs/heads/master 5572fd0c5 -> f650a005e


[SPARK-8808] [SPARKR] Fix assignments in SparkR.

Author: Sun Rui 

Closes #7395 from sun-rui/SPARK-8808 and squashes the following commits:

ce603bc [Sun Rui] Use '<-' instead of '='.
88590b1 [Sun Rui] Use '<-' instead of '='.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f650a005
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f650a005
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f650a005

Branch: refs/heads/master
Commit: f650a005e03ecd800c9005a496cc6a0d8eb68c93
Parents: 5572fd0
Author: Sun Rui 
Authored: Tue Jul 14 22:21:01 2015 -0700
Committer: Shivaram Venkataraman 
Committed: Tue Jul 14 22:21:01 2015 -0700

--
 R/pkg/R/DataFrame.R | 2 +-
 R/pkg/R/client.R| 4 ++--
 R/pkg/R/group.R | 4 ++--
 R/pkg/R/utils.R | 4 ++--
 R/pkg/inst/tests/test_binaryFile.R  | 2 +-
 R/pkg/inst/tests/test_binary_function.R | 2 +-
 R/pkg/inst/tests/test_rdd.R | 4 ++--
 R/pkg/inst/tests/test_textFile.R| 2 +-
 R/pkg/inst/tests/test_utils.R   | 2 +-
 9 files changed, 13 insertions(+), 13 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f650a005/R/pkg/R/DataFrame.R
--
diff --git a/R/pkg/R/DataFrame.R b/R/pkg/R/DataFrame.R
index 6070282..2088137 100644
--- a/R/pkg/R/DataFrame.R
+++ b/R/pkg/R/DataFrame.R
@@ -1328,7 +1328,7 @@ setMethod("write.df",
 jmode <- callJStatic("org.apache.spark.sql.api.r.SQLUtils", 
"saveMode", mode)
 options <- varargsToEnv(...)
 if (!is.null(path)) {
-options[['path']] = path
+options[['path']] <- path
 }
 callJMethod(df@sdf, "save", source, jmode, options)
   })

http://git-wip-us.apache.org/repos/asf/spark/blob/f650a005/R/pkg/R/client.R
--
diff --git a/R/pkg/R/client.R b/R/pkg/R/client.R
index 78c7a30..6f77215 100644
--- a/R/pkg/R/client.R
+++ b/R/pkg/R/client.R
@@ -36,9 +36,9 @@ connectBackend <- function(hostname, port, timeout = 6000) {
 
 determineSparkSubmitBin <- function() {
   if (.Platform$OS.type == "unix") {
-sparkSubmitBinName = "spark-submit"
+sparkSubmitBinName <- "spark-submit"
   } else {
-sparkSubmitBinName = "spark-submit.cmd"
+sparkSubmitBinName <- "spark-submit.cmd"
   }
   sparkSubmitBinName
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/f650a005/R/pkg/R/group.R
--
diff --git a/R/pkg/R/group.R b/R/pkg/R/group.R
index 8f1c68f..576ac72 100644
--- a/R/pkg/R/group.R
+++ b/R/pkg/R/group.R
@@ -87,7 +87,7 @@ setMethod("count",
 setMethod("agg",
   signature(x = "GroupedData"),
   function(x, ...) {
-cols = list(...)
+cols <- list(...)
 stopifnot(length(cols) > 0)
 if (is.character(cols[[1]])) {
   cols <- varargsToEnv(...)
@@ -97,7 +97,7 @@ setMethod("agg",
   if (!is.null(ns)) {
 for (n in ns) {
   if (n != "") {
-cols[[n]] = alias(cols[[n]], n)
+cols[[n]] <- alias(cols[[n]], n)
   }
 }
   }

http://git-wip-us.apache.org/repos/asf/spark/blob/f650a005/R/pkg/R/utils.R
--
diff --git a/R/pkg/R/utils.R b/R/pkg/R/utils.R
index ea629a6..950ba74 100644
--- a/R/pkg/R/utils.R
+++ b/R/pkg/R/utils.R
@@ -41,8 +41,8 @@ convertJListToRList <- function(jList, flatten, 
logicalUpperBound = NULL,
   if (isInstanceOf(obj, "scala.Tuple2")) {
 # JavaPairRDD[Array[Byte], Array[Byte]].
 
-keyBytes = callJMethod(obj, "_1")
-valBytes = callJMethod(obj, "_2")
+keyBytes <- callJMethod(obj, "_1")
+valBytes <- callJMethod(obj, "_2")
 res <- list(unserialize(keyBytes),
   unserialize(valBytes))
   } else {

http://git-wip-us.apache.org/repos/asf/spark/blob/f650a005/R/pkg/inst/tests/test_binaryFile.R
--
diff --git a/R/pkg/inst/tests/test_binaryFile.R 
b/R/pkg/inst/tests/test_binaryFile.R
index ccaea18..f2452ed 100644
--- a/R/pkg/inst/tests/test_binaryFile.R
+++ b/R/pkg/inst/tests/test_binaryFile.R
@@ -20,7 +20,7 @@ context("functions on binary files")
 # JavaSparkContext handle
 sc <- sparkR.init()
 
-mockFile = c("Spark is pretty.", "Spark is awesome.")
+mockFile <- c("Spark is pretty

spark git commit: [HOTFIX] Adding new names to known contributors

2015-07-14 Thread pwendell
Repository: spark
Updated Branches:
  refs/heads/master bb870e72f -> 5572fd0c5


[HOTFIX] Adding new names to known contributors


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5572fd0c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5572fd0c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5572fd0c

Branch: refs/heads/master
Commit: 5572fd0c518acd2e4483ff41bea1eb1cffd543ce
Parents: bb870e7
Author: Patrick Wendell 
Authored: Tue Jul 14 21:44:47 2015 -0700
Committer: Patrick Wendell 
Committed: Tue Jul 14 21:44:47 2015 -0700

--
 dev/create-release/known_translations | 9 +
 1 file changed, 9 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5572fd0c/dev/create-release/known_translations
--
diff --git a/dev/create-release/known_translations 
b/dev/create-release/known_translations
index 5f2671a..e462302 100644
--- a/dev/create-release/known_translations
+++ b/dev/create-release/known_translations
@@ -129,3 +129,12 @@ yongtang - Yong Tang
 ypcat - Pei-Lun Lee
 zhichao-li - Zhichao Li
 zzcclp - Zhichao Zhang
+979969786 - Yuming Wang
+Rosstin - Rosstin Murphy
+ameyc - Amey Chaugule
+animeshbaranawal - Animesh Baranawal
+cafreeman - Chris Freeman
+lee19 - Lee
+lockwobr - Brian Lockwood
+navis - Navis Ryu
+pparkkin - Paavo Parkkinen


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r1691124 - in /spark/site/docs/1.4.1: ./ api/ api/R/ api/java/ api/java/org/ api/java/org/apache/ api/java/org/apache/spark/ api/java/org/apache/spark/annotation/ api/java/org/apache/spark

2015-07-14 Thread pwendell
Author: pwendell
Date: Wed Jul 15 04:17:01 2015
New Revision: 1691124

URL: http://svn.apache.org/r1691124
Log:
Spark 1.4.1 docs


[This commit notification would consist of 734 parts, 
which exceeds the limit of 50, so it was shortened to this summary.]

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r9826 - /dev/spark/spark-1.4.1-rc4-bin/ /release/spark/spark-1.4.1/

2015-07-14 Thread pwendell
Author: pwendell
Date: Wed Jul 15 03:29:55 2015
New Revision: 9826

Log:
Spark release 1.4.1


Added:
release/spark/spark-1.4.1/
  - copied from r9825, dev/spark/spark-1.4.1-rc4-bin/
Removed:
dev/spark/spark-1.4.1-rc4-bin/


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r9825 - in /dev/spark/spark-1.4.1-rc4-bin: spark-1.4.1-bin-hadoop2.4-without-hive.tgz spark-1.4.1-bin-hadoop2.4-without-hive.tgz.asc spark-1.4.1-bin-hadoop2.4-without-hive.tgz.md5 spark-1.

2015-07-14 Thread pwendell
Author: pwendell
Date: Wed Jul 15 03:28:40 2015
New Revision: 9825

Log:
Removing hive developer build

Removed:
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.asc
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.sha


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



svn commit: r9824 - /dev/spark/spark-1.4.1-rc4-bin/

2015-07-14 Thread pwendell
Author: pwendell
Date: Wed Jul 15 03:25:59 2015
New Revision: 9824

Log:
Adding Spark 1.4.1 RC4

Added:
dev/spark/spark-1.4.1-rc4-bin/
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.asc   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz   (with 
props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz.asc   
(with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1.tgz.asc   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.3.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.3.tgz.asc   (with 
props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.3.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.3.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz   
(with props)

dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.asc   
(with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4-without-hive.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4.tgz.asc   (with 
props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.4.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.6.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.6.tgz.asc   (with 
props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.6.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop2.6.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-without-hadoop.tgz   (with 
props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-without-hadoop.tgz.asc   
(with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-without-hadoop.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-without-hadoop.tgz.sha
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1.tgz   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1.tgz.asc   (with props)
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1.tgz.md5
dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1.tgz.sha

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz
--
svn:mime-type = application/x-gzip

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.asc
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.asc
--
svn:mime-type = application/pgp-signature

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.md5
==
--- dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.md5 (added)
+++ dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.md5 Wed Jul 15 
03:25:59 2015
@@ -0,0 +1 @@
+spark-1.4.1-bin-cdh4.tgz: 49 B9 4C 92 1B 82 36 3D  2D 7F 88 20 9D 0A 70 A7

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.sha
==
--- dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.sha (added)
+++ dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-cdh4.tgz.sha Wed Jul 15 
03:25:59 2015
@@ -0,0 +1,3 @@
+spark-1.4.1-bin-cdh4.tgz: EDD359E8 2B0516AB 611ADB14 BC0A1E4B 292F43BB 0407B7A0
+  96C166BD DBAB87DE 4BE08544 09F6F862 953E326F E782749D
+  50EC29C1 B65076A6 FD62C9E5 89156D26

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz
--
svn:mime-type = application/x-gzip

Added: dev/spark/spark-1.4.1-rc4-bin/spark-1.4.1-bin-hadoop1-scala2.11.tgz.asc
=

spark git commit: [SPARK-5523] [CORE] [STREAMING] Add a cache for hostname in TaskMetrics to decrease the memory usage and GC overhead

2015-07-14 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master f957796c4 -> bb870e72f


[SPARK-5523] [CORE] [STREAMING] Add a cache for hostname in TaskMetrics to 
decrease the memory usage and GC overhead

Hostnames in TaskMetrics are created through deserialization, and the number of 
distinct hostnames is only on the order of the number of cluster nodes, so adding a 
cache layer to deduplicate these objects can reduce memory usage and alleviate GC 
overhead, especially for long-running applications with fast job generation, such as 
Spark Streaming.
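
The underlying trick is simple string interning via a concurrent map; a minimal sketch of the idea (the actual patch applies it inside `TaskMetrics.readObject`, as shown in the diff below):

```
import java.util.concurrent.ConcurrentHashMap

object HostNamePool {
  private val pool = new ConcurrentHashMap[String, String]()

  // Return one canonical String instance per distinct hostname value.
  def getCachedHostName(host: String): String = {
    val canonical = pool.putIfAbsent(host, host)
    if (canonical != null) canonical else host
  }
}

// Two deserialized copies of the same hostname now share one object:
// HostNamePool.getCachedHostName(new String("host-1")) eq
//   HostNamePool.getCachedHostName(new String("host-1"))   // true
```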

Author: jerryshao 
Author: Saisai Shao 

Closes #5064 from jerryshao/SPARK-5523 and squashes the following commits:

3e2412a [jerryshao] Address the comments
b092a81 [Saisai Shao] Add a pool to cache the hostname


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bb870e72
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bb870e72
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bb870e72

Branch: refs/heads/master
Commit: bb870e72f42b6ce8d056df259f6fcf41808d7ed2
Parents: f957796
Author: jerryshao 
Authored: Tue Jul 14 19:54:02 2015 -0700
Committer: Tathagata Das 
Committed: Tue Jul 14 19:54:02 2015 -0700

--
 .../org/apache/spark/executor/TaskMetrics.scala | 20 
 1 file changed, 20 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bb870e72/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
--
diff --git a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala 
b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
index a3b4561..e80feee 100644
--- a/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
+++ b/core/src/main/scala/org/apache/spark/executor/TaskMetrics.scala
@@ -17,11 +17,15 @@
 
 package org.apache.spark.executor
 
+import java.io.{IOException, ObjectInputStream}
+import java.util.concurrent.ConcurrentHashMap
+
 import scala.collection.mutable.ArrayBuffer
 
 import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.executor.DataReadMethod.DataReadMethod
 import org.apache.spark.storage.{BlockId, BlockStatus}
+import org.apache.spark.util.Utils
 
 /**
  * :: DeveloperApi ::
@@ -210,10 +214,26 @@ class TaskMetrics extends Serializable {
   private[spark] def updateInputMetrics(): Unit = synchronized {
 inputMetrics.foreach(_.updateBytesRead())
   }
+
+  @throws(classOf[IOException])
+  private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException 
{
+in.defaultReadObject()
+// Get the hostname from cached data, since hostname is the order of 
number of nodes in
+// cluster, so using cached hostname will decrease the object number and 
alleviate the GC
+// overhead.
+_hostname = TaskMetrics.getCachedHostName(_hostname)
+  }
 }
 
 private[spark] object TaskMetrics {
+  private val hostNameCache = new ConcurrentHashMap[String, String]()
+
   def empty: TaskMetrics = new TaskMetrics
+
+  def getCachedHostName(host: String): String = {
+val canonicalHost = hostNameCache.putIfAbsent(host, host)
+if (canonicalHost != null) canonicalHost else host
+  }
 }
 
 /**


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-8820] [STREAMING] Add a configuration to set checkpoint dir.

2015-07-14 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master cc57d705e -> f957796c4


[SPARK-8820] [STREAMING] Add a configuration to set checkpoint dir.

Add a configuration to set the checkpoint directory, for user convenience.
[Jira Address](https://issues.apache.org/jira/browse/SPARK-8820)
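
A usage sketch of the new setting (the directory path and app name below are placeholders):

```
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf()
  .setMaster("local[2]")
  .setAppName("checkpoint-from-conf")
  .set("spark.streaming.checkpoint.directory", "/tmp/streaming-checkpoints")

val ssc = new StreamingContext(conf, Seconds(1))
// The checkpoint directory is picked up from the configuration,
// so no explicit ssc.checkpoint(...) call is needed.
```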

Author: huangzhaowei 

Closes #7218 from SaintBacchus/SPARK-8820 and squashes the following commits:

d49fe4b [huangzhaowei] Rename the configuration name
66ea47c [huangzhaowei] Add the unit test.
dd0acc1 [huangzhaowei] [SPARK-8820][Streaming] Add a configuration to set 
checkpoint dir.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f957796c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f957796c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f957796c

Branch: refs/heads/master
Commit: f957796c4b3c3cd95edfc64500a045f7e810ee87
Parents: cc57d70
Author: huangzhaowei 
Authored: Tue Jul 14 19:20:49 2015 -0700
Committer: Tathagata Das 
Committed: Tue Jul 14 19:20:49 2015 -0700

--
 .../scala/org/apache/spark/streaming/StreamingContext.scala | 2 ++
 .../org/apache/spark/streaming/StreamingContextSuite.scala  | 9 +
 2 files changed, 11 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/f957796c/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
--
diff --git 
a/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala 
b/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
index 6b78a82..92438f1 100644
--- a/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
+++ b/streaming/src/main/scala/org/apache/spark/streaming/StreamingContext.scala
@@ -201,6 +201,8 @@ class StreamingContext private[streaming] (
 
   private var shutdownHookRef: AnyRef = _
 
+  conf.getOption("spark.streaming.checkpoint.directory").foreach(checkpoint)
+
   /**
* Return the associated Spark context
*/

http://git-wip-us.apache.org/repos/asf/spark/blob/f957796c/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
--
diff --git 
a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
 
b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
index 289a159..f588cf5 100644
--- 
a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
+++ 
b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala
@@ -115,6 +115,15 @@ class StreamingContextSuite extends SparkFunSuite with 
BeforeAndAfter with Timeo
 assert(ssc.conf.getTimeAsSeconds("spark.cleaner.ttl", "-1") === 10)
   }
 
+  test("checkPoint from conf") {
+val checkpointDirectory = Utils.createTempDir().getAbsolutePath()
+
+val myConf = SparkContext.updatedConf(new SparkConf(false), master, 
appName)
+myConf.set("spark.streaming.checkpoint.directory", checkpointDirectory)
+val ssc = new StreamingContext(myConf, batchDuration)
+assert(ssc.checkpointDir != null)
+  }
+
   test("state matching") {
 import StreamingContextState._
 assert(INITIALIZED === INITIALIZED)


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9050] [SQL] Remove unused newOrdering argument from Exchange (cleanup after SPARK-8317)

2015-07-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master e965a798d -> cc57d705e


[SPARK-9050] [SQL] Remove unused newOrdering argument from Exchange (cleanup 
after SPARK-8317)

SPARK-8317 changed the SQL Exchange operator so that it no longer pushed 
sorting into Spark's shuffle layer, a change which allowed more efficient 
SQL-specific sorters to be used.

This patch performs some leftover cleanup based on those changes:

- Exchange's constructor should no longer accept a `newOrdering` since it's no 
longer used and no longer works as expected.
- `addOperatorsIfNecessary` looked at shuffle input's output ordering to decide 
whether to sort, but this is the wrong node to be examining: it needs to look 
at whether the post-shuffle node has the right ordering, since shuffling will 
not preserve row orderings.  Thanks to davies for spotting this.

Author: Josh Rosen 

Closes #7407 from JoshRosen/SPARK-9050 and squashes the following commits:

e70be50 [Josh Rosen] No need to wrap line
e866494 [Josh Rosen] Refactor addOperatorsIfNecessary to make code clearer
2e467da [Josh Rosen] Remove `newOrdering` from Exchange.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cc57d705
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cc57d705
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cc57d705

Branch: refs/heads/master
Commit: cc57d705e732aefc2f3d3f438e84d71705b2eb65
Parents: e965a79
Author: Josh Rosen 
Authored: Tue Jul 14 18:55:34 2015 -0700
Committer: Josh Rosen 
Committed: Tue Jul 14 18:55:34 2015 -0700

--
 .../apache/spark/sql/execution/Exchange.scala   | 37 
 .../spark/sql/execution/SparkStrategies.scala   |  3 +-
 2 files changed, 16 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/cc57d705/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
index 4b783e3..feea4f2 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/Exchange.scala
@@ -35,21 +35,13 @@ import org.apache.spark.{HashPartitioner, Partitioner, 
RangePartitioner, SparkEn
 
 /**
  * :: DeveloperApi ::
- * Performs a shuffle that will result in the desired `newPartitioning`.  
Optionally sorts each
- * resulting partition based on expressions from the partition key.  It is 
invalid to construct an
- * exchange operator with a `newOrdering` that cannot be calculated using the 
partitioning key.
+ * Performs a shuffle that will result in the desired `newPartitioning`.
  */
 @DeveloperApi
-case class Exchange(
-newPartitioning: Partitioning,
-newOrdering: Seq[SortOrder],
-child: SparkPlan)
-  extends UnaryNode {
+case class Exchange(newPartitioning: Partitioning, child: SparkPlan) extends 
UnaryNode {
 
   override def outputPartitioning: Partitioning = newPartitioning
 
-  override def outputOrdering: Seq[SortOrder] = newOrdering
-
   override def output: Seq[Attribute] = child.output
 
   /**
@@ -279,23 +271,24 @@ private[sql] case class EnsureRequirements(sqlContext: 
SQLContext) extends Rule[
   partitioning: Partitioning,
   rowOrdering: Seq[SortOrder],
   child: SparkPlan): SparkPlan = {
-val needSort = rowOrdering.nonEmpty && child.outputOrdering != 
rowOrdering
-val needsShuffle = child.outputPartitioning != partitioning
 
-val withShuffle = if (needsShuffle) {
-  Exchange(partitioning, Nil, child)
-} else {
-  child
+def addShuffleIfNecessary(child: SparkPlan): SparkPlan = {
+  if (child.outputPartitioning != partitioning) {
+Exchange(partitioning, child)
+  } else {
+child
+  }
 }
 
-val withSort = if (needSort) {
-  sqlContext.planner.BasicOperators.getSortOperator(
-rowOrdering, global = false, withShuffle)
-} else {
-  withShuffle
+def addSortIfNecessary(child: SparkPlan): SparkPlan = {
+  if (rowOrdering.nonEmpty && child.outputOrdering != rowOrdering) {
+sqlContext.planner.BasicOperators.getSortOperator(rowOrdering, 
global = false, child)
+  } else {
+child
+  }
 }
 
-withSort
+addSortIfNecessary(addShuffleIfNecessary(child))
   }
 
   if (meetsRequirements && compatible && !needsAnySort) {

http://git-wip-us.apache.org/repos/asf/spark/blob/cc57d705/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala
---

Git Push Summary

2015-07-14 Thread pwendell
Repository: spark
Updated Tags:  refs/tags/v1.4.1 [created] dbaa5c294

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9045] Fix Scala 2.11 build break in UnsafeExternalRowSorter

2015-07-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 11e5c3728 -> e965a798d


[SPARK-9045] Fix Scala 2.11 build break in UnsafeExternalRowSorter

This fixes a compilation break in under Scala 2.11:

```
[error] 
/home/jenkins/workspace/Spark-Master-Scala211-Compile/sql/catalyst/src/main/java/org/apache/spark/sql/execution/UnsafeExternalRowSorter.java:135:
 error:  is 
not abstract and does not override abstract method 
minBy(Function1,Ordering) in TraversableOnce
[error]   return new AbstractScalaRowIterator() {
[error] ^
[error]   where B,A are type-variables:
[error] B extends Object declared in method 
minBy(Function1,Ordering)
[error] A extends Object declared in interface TraversableOnce
[error] 1 error
```

The workaround for this is to make `AbstractScalaRowIterator` into a concrete 
class.

Author: Josh Rosen 

Closes #7405 from JoshRosen/SPARK-9045 and squashes the following commits:

cbcbb4c [Josh Rosen] Forgot that we can't use the ??? operator anymore
577ba60 [Josh Rosen] [SPARK-9045] Fix Scala 2.11 build break in 
UnsafeExternalRowSorter.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e965a798
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e965a798
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e965a798

Branch: refs/heads/master
Commit: e965a798d09a9fba61b104c5cc0b65cdc28d27f6
Parents: 11e5c37
Author: Josh Rosen 
Authored: Tue Jul 14 17:21:48 2015 -0700
Committer: Josh Rosen 
Committed: Tue Jul 14 17:21:48 2015 -0700

--
 .../org/apache/spark/sql/AbstractScalaRowIterator.scala  | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e965a798/sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala
index cfefb13..1090bdb 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/AbstractScalaRowIterator.scala
@@ -17,11 +17,14 @@
 
 package org.apache.spark.sql
 
-import org.apache.spark.sql.catalyst.InternalRow
-
 /**
  * Shim to allow us to implement [[scala.Iterator]] in Java. Scala 2.11+ has 
an AbstractIterator
  * class for this, but that class is `private[scala]` in 2.10. We need to 
explicitly fix this to
- * `Row` in order to work around a spurious IntelliJ compiler error.
+ * `Row` in order to work around a spurious IntelliJ compiler error. This 
cannot be an abstract
+ * class because that leads to compilation errors under Scala 2.11.
  */
-private[spark] abstract class AbstractScalaRowIterator extends 
Iterator[InternalRow]
+private[spark] class AbstractScalaRowIterator[T] extends Iterator[T] {
+  override def hasNext: Boolean = throw new NotImplementedError
+
+  override def next(): T = throw new NotImplementedError
+}


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix existing uses

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 740b034f1 -> 11e5c3728


[SPARK-8962] Add Scalastyle rule to ban direct use of Class.forName; fix 
existing uses

This pull request adds a Scalastyle regex rule which fails the style check if 
`Class.forName` is used directly.  `Class.forName` always loads classes from 
the default / system classloader, but in a majority of cases, we should be 
using Spark's own `Utils.classForName` instead, which tries to load classes 
from the current thread's context classloader and falls back to the classloader 
which loaded Spark when the context classloader is not defined.
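
A sketch of the fallback pattern described above (not the exact `Utils.classForName` code):

```
object ClassLoading {
  def classForName(className: String): Class[_] = {
    // Prefer the current thread's context classloader; fall back to the loader
    // that loaded this class when no context classloader is set.
    val loader = Option(Thread.currentThread().getContextClassLoader)
      .getOrElse(getClass.getClassLoader)
    Class.forName(className, /* initialize = */ true, loader)
  }
}
```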


[Review on Reviewable](https://reviewable.io/reviews/apache/spark/7350)


Author: Josh Rosen 

Closes #7350 from JoshRosen/ban-Class.forName and squashes the following 
commits:

e3e96f7 [Josh Rosen] Merge remote-tracking branch 'origin/master' into 
ban-Class.forName
c0b7885 [Josh Rosen] Hopefully fix the last two cases
d707ba7 [Josh Rosen] Fix uses of Class.forName that I missed in my first 
cleanup pass
046470d [Josh Rosen] Merge remote-tracking branch 'origin/master' into 
ban-Class.forName
62882ee [Josh Rosen] Fix uses of Class.forName or add exclusion.
d9abade [Josh Rosen] Add stylechecker rule to ban uses of Class.forName


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11e5c372
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11e5c372
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11e5c372

Branch: refs/heads/master
Commit: 11e5c372862ec00e57460b37ccfee51c6d93c5f7
Parents: 740b034
Author: Josh Rosen 
Authored: Tue Jul 14 16:08:17 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 16:08:17 2015 -0700

--
 .../src/main/scala/org/apache/spark/Logging.scala |  2 +-
 .../scala/org/apache/spark/SparkContext.scala | 11 +--
 .../main/scala/org/apache/spark/SparkEnv.scala|  2 +-
 .../org/apache/spark/api/r/RBackendHandler.scala  | 18 ++
 .../apache/spark/broadcast/BroadcastManager.scala |  3 ++-
 .../org/apache/spark/deploy/SparkHadoopUtil.scala |  4 ++--
 .../org/apache/spark/deploy/SparkSubmit.scala |  2 +-
 .../spark/deploy/SparkSubmitArguments.scala   |  2 +-
 .../spark/deploy/history/HistoryServer.scala  |  2 +-
 .../org/apache/spark/deploy/master/Master.scala   |  2 +-
 .../deploy/rest/SubmitRestProtocolMessage.scala   |  2 +-
 .../spark/deploy/worker/DriverWrapper.scala   |  2 +-
 .../spark/deploy/worker/WorkerArguments.scala |  2 ++
 .../org/apache/spark/executor/Executor.scala  |  2 +-
 .../org/apache/spark/io/CompressionCodec.scala|  3 +--
 .../spark/mapred/SparkHadoopMapRedUtil.scala  |  5 +++--
 .../mapreduce/SparkHadoopMapReduceUtil.scala  |  9 +
 .../org/apache/spark/metrics/MetricsSystem.scala  |  6 --
 .../scala/org/apache/spark/rdd/HadoopRDD.scala|  6 +++---
 .../main/scala/org/apache/spark/rpc/RpcEnv.scala  |  3 +--
 .../apache/spark/serializer/JavaSerializer.scala  |  5 -
 .../apache/spark/serializer/KryoSerializer.scala  |  2 ++
 .../spark/serializer/SerializationDebugger.scala  |  2 ++
 .../apache/spark/storage/ExternalBlockStore.scala |  2 +-
 .../org/apache/spark/util/ClosureCleaner.scala|  2 ++
 .../org/apache/spark/util/SizeEstimator.scala |  2 ++
 .../main/scala/org/apache/spark/util/Utils.scala  | 11 +--
 .../test/scala/org/apache/spark/FileSuite.scala   |  2 ++
 .../SparkContextSchedulerCreationSuite.scala  |  3 ++-
 .../apache/spark/deploy/SparkSubmitSuite.scala|  4 ++--
 .../scala/org/apache/spark/rdd/JdbcRDDSuite.scala |  3 ++-
 .../KryoSerializerDistributedSuite.scala  |  2 ++
 .../spark/util/MutableURLClassLoaderSuite.scala   |  2 ++
 .../spark/streaming/flume/sink/Logging.scala  |  2 ++
 .../apache/spark/graphx/util/BytecodeUtils.scala  |  2 +-
 .../scala/org/apache/spark/repl/SparkIMain.scala  |  2 ++
 scalastyle-config.xml | 11 +++
 .../org/apache/spark/sql/types/DataType.scala |  3 ++-
 .../scala/org/apache/spark/sql/SQLContext.scala   |  3 +--
 .../spark/sql/parquet/ParquetRelation.scala   |  7 ---
 .../org/apache/spark/sql/jdbc/JDBCSuite.scala |  3 ++-
 .../apache/spark/sql/jdbc/JDBCWriteSuite.scala|  3 ++-
 .../thriftserver/HiveThriftServer2Suites.scala|  2 +-
 .../org/apache/spark/sql/hive/TableReader.scala   |  4 +---
 .../spark/sql/hive/client/ClientWrapper.scala |  9 -
 .../spark/sql/hive/HiveSparkSubmitSuite.scala |  8 
 .../spark/streaming/scheduler/JobGenerator.scala  |  6 +++---
 .../apache/spark/tools/GenerateMIMAIgnore.scala   |  2 ++
 .../org/apache/spark/deploy/yarn/Client.scala |  4 ++--
 49 files changed, 117 insertions(+), 84 deletions(-)
--



spark git commit: [SPARK-4362] [MLLIB] Make prediction probability available in NaiveBayesModel

2015-07-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 4b5cfc988 -> 740b034f1


[SPARK-4362] [MLLIB] Make prediction probability available in NaiveBayesModel

Add predictProbabilities to Naive Bayes, return class probabilities.

Continues https://github.com/apache/spark/pull/6761
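
A hedged usage sketch of the new API (the tiny data set and `sc`, an existing SparkContext, are assumptions for illustration):

```scala
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Train on a made-up two-point data set; `sc` is an existing SparkContext.
val training = sc.parallelize(Seq(
  LabeledPoint(0.0, Vectors.dense(1.0, 0.0)),
  LabeledPoint(1.0, Vectors.dense(0.0, 1.0))
))
val model = NaiveBayes.train(training, lambda = 1.0)

// Posterior class probabilities for one point, in the same order as model.labels.
val probs = model.predictProbabilities(Vectors.dense(0.5, 0.5))
// The probabilities sum to 1, which is one of the checks added in the new tests.
assert(math.abs(probs.toArray.sum - 1.0) < 1e-9)
```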

Author: Sean Owen 

Closes #7376 from srowen/SPARK-4362 and squashes the following commits:

23d5a76 [Sean Owen] Fix model.labels -> model.theta
95d91fb [Sean Owen] Check that predicted probabilities sum to 1
b32d1c8 [Sean Owen] Add predictProbabilities to Naive Bayes, return class 
probabilities


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/740b034f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/740b034f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/740b034f

Branch: refs/heads/master
Commit: 740b034f1ca885a386f5a9ef7e0c81c714b047ff
Parents: 4b5cfc9
Author: Sean Owen 
Authored: Tue Jul 14 22:44:54 2015 +0100
Committer: Sean Owen 
Committed: Tue Jul 14 22:44:54 2015 +0100

--
 .../spark/mllib/classification/NaiveBayes.scala | 76 +++-
 .../mllib/classification/NaiveBayesSuite.scala  | 55 +-
 2 files changed, 113 insertions(+), 18 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/740b034f/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
index f51ee36..9e379d7 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/classification/NaiveBayes.scala
@@ -93,26 +93,70 @@ class NaiveBayesModel private[mllib] (
   override def predict(testData: Vector): Double = {
 modelType match {
   case Multinomial =>
-val prob = thetaMatrix.multiply(testData)
-BLAS.axpy(1.0, piVector, prob)
-labels(prob.argmax)
+labels(multinomialCalculation(testData).argmax)
   case Bernoulli =>
-testData.foreachActive { (index, value) =>
-  if (value != 0.0 && value != 1.0) {
-throw new SparkException(
-  s"Bernoulli naive Bayes requires 0 or 1 feature values but found 
$testData.")
-  }
-}
-val prob = thetaMinusNegTheta.get.multiply(testData)
-BLAS.axpy(1.0, piVector, prob)
-BLAS.axpy(1.0, negThetaSum.get, prob)
-labels(prob.argmax)
-  case _ =>
-// This should never happen.
-throw new UnknownError(s"Invalid modelType: $modelType.")
+labels(bernoulliCalculation(testData).argmax)
+}
+  }
+
+  /**
+   * Predict values for the given data set using the model trained.
+   *
+   * @param testData RDD representing data points to be predicted
+   * @return an RDD[Vector] where each entry contains the predicted posterior 
class probabilities,
+   * in the same order as class labels
+   */
+  def predictProbabilities(testData: RDD[Vector]): RDD[Vector] = {
+val bcModel = testData.context.broadcast(this)
+testData.mapPartitions { iter =>
+  val model = bcModel.value
+  iter.map(model.predictProbabilities)
 }
   }
 
+  /**
+   * Predict posterior class probabilities for a single data point using the 
model trained.
+   *
+   * @param testData array representing a single data point
+   * @return predicted posterior class probabilities from the trained model,
+   * in the same order as class labels
+   */
+  def predictProbabilities(testData: Vector): Vector = {
+modelType match {
+  case Multinomial =>
+posteriorProbabilities(multinomialCalculation(testData))
+  case Bernoulli =>
+posteriorProbabilities(bernoulliCalculation(testData))
+}
+  }
+
+  private def multinomialCalculation(testData: Vector) = {
+val prob = thetaMatrix.multiply(testData)
+BLAS.axpy(1.0, piVector, prob)
+prob
+  }
+
+  private def bernoulliCalculation(testData: Vector) = {
+testData.foreachActive((_, value) =>
+  if (value != 0.0 && value != 1.0) {
+throw new SparkException(
+  s"Bernoulli naive Bayes requires 0 or 1 feature values but found 
$testData.")
+  }
+)
+val prob = thetaMinusNegTheta.get.multiply(testData)
+BLAS.axpy(1.0, piVector, prob)
+BLAS.axpy(1.0, negThetaSum.get, prob)
+prob
+  }
+
+  private def posteriorProbabilities(logProb: DenseVector) = {
+val logProbArray = logProb.toArray
+val maxLog = logProbArray.max
+val scaledProbs = logProbArray.map(lp => math.exp(lp - maxLog))
+val probSum = scaledProbs.sum
+new DenseVector(scaledProbs.map(_ / probSum))
+  }
+
   over

spark git commit: [SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation

2015-07-14 Thread yhuai
Repository: spark
Updated Branches:
  refs/heads/master fb1d06fc2 -> 4b5cfc988


[SPARK-8800] [SQL] Fix inaccurate precision/scale of Decimal division operation

JIRA: https://issues.apache.org/jira/browse/SPARK-8800

Previously, we turned to Java BigDecimal's divide with a specified ROUNDING_MODE to 
avoid the non-terminating decimal expansion problem. However, as JihongMA reported, 
the division operation on some specific values gives inaccurate results.
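
A minimal illustration of the underlying issue (plain BigDecimal arithmetic, not Spark's `Decimal` class itself):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

// Dividing 1 by 3 at unlimited precision never terminates, so java.math.BigDecimal
// forces a fixed scale and rounding mode up front, losing precision:
val javaResult = new JBigDecimal(1).divide(new JBigDecimal(3), 3, RoundingMode.HALF_UP)
// => 0.333

// Scala's BigDecimal carries a MathContext (34 significant digits by default),
// so the division terminates while keeping far more precision:
val scalaResult = BigDecimal(1) / BigDecimal(3)
// => 0.3333333333333333333333333333333333
```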

Author: Liang-Chi Hsieh 

Closes #7212 from viirya/fix_decimal4 and squashes the following commits:

4205a0a [Liang-Chi Hsieh] Fix inaccuracy precision/scale of Decimal division 
operation.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4b5cfc98
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4b5cfc98
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4b5cfc98

Branch: refs/heads/master
Commit: 4b5cfc988f23988c2334882a255d494fc93d252e
Parents: fb1d06f
Author: Liang-Chi Hsieh 
Authored: Tue Jul 14 14:19:27 2015 -0700
Committer: Yin Huai 
Committed: Tue Jul 14 14:19:27 2015 -0700

--
 .../scala/org/apache/spark/sql/types/Decimal.scala| 14 +++---
 .../apache/spark/sql/types/decimal/DecimalSuite.scala | 10 +-
 2 files changed, 20 insertions(+), 4 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4b5cfc98/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
index 5a16948..f5bd068 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/types/Decimal.scala
@@ -145,6 +145,14 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
 }
   }
 
+  def toLimitedBigDecimal: BigDecimal = {
+if (decimalVal.ne(null)) {
+  decimalVal
+} else {
+  BigDecimal(longVal, _scale)
+}
+  }
+
   def toJavaBigDecimal: java.math.BigDecimal = toBigDecimal.underlying()
 
   def toUnscaledLong: Long = {
@@ -269,9 +277,9 @@ final class Decimal extends Ordered[Decimal] with 
Serializable {
 if (that.isZero) {
   null
 } else {
-  // To avoid non-terminating decimal expansion problem, we turn to Java 
BigDecimal's divide
-  // with specified ROUNDING_MODE.
-  Decimal(toJavaBigDecimal.divide(that.toJavaBigDecimal, ROUNDING_MODE.id))
+  // To avoid non-terminating decimal expansion problem, we get Scala's BigDecimal with limited
+  // precision and scale.
+  Decimal(toLimitedBigDecimal / that.toLimitedBigDecimal)
 }
   }
 

http://git-wip-us.apache.org/repos/asf/spark/blob/4b5cfc98/sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala
index 5f31296..030bb6d 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/types/decimal/DecimalSuite.scala
@@ -170,6 +170,14 @@ class DecimalSuite extends SparkFunSuite with 
PrivateMethodTester {
 
   test("fix non-terminating decimal expansion problem") {
 val decimal = Decimal(1.0, 10, 3) / Decimal(3.0, 10, 3)
-assert(decimal.toString === "0.333")
+// The difference between decimal should not be more than 0.001.
+assert(decimal.toDouble - 0.333 < 0.001)
+  }
+
+  test("fix loss of precision/scale when doing division operation") {
+val a = Decimal(2) / Decimal(3)
+assert(a.toDouble < 1.0 && a.toDouble > 0.6)
+val b = Decimal(1) / Decimal(8)
+assert(b.toDouble === 0.125)
   }
 }


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-4072] [CORE] Display Streaming blocks in Streaming UI

2015-07-14 Thread tdas
Repository: spark
Updated Branches:
  refs/heads/master 0a4071eab -> fb1d06fc2


[SPARK-4072] [CORE] Display Streaming blocks in Streaming UI

Replace #6634

This PR adds `SparkListenerBlockUpdated` to SparkListener so that it can 
monitor all block update infos that are sent to `BlockManagerMasterEndpoint`, 
and also adds new tables in the Storage tab to display the stream block infos.
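
A sketch of how an application could consume the new event; the callback name and fields follow the classes added in this patch, but treat the exact signature as an assumption:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerBlockUpdated}

// Logs every block update posted to the listener bus by this change.
class BlockUpdateLogger extends SparkListener {
  override def onBlockUpdated(blockUpdated: SparkListenerBlockUpdated): Unit = {
    val info = blockUpdated.blockUpdatedInfo
    println(s"Block ${info.blockId} updated on ${info.blockManagerId} " +
      s"at storage level ${info.storageLevel}")
  }
}

// Registered like any other listener:
// sc.addSparkListener(new BlockUpdateLogger())
```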

![screen shot 2015-07-01 at 5 19 46 
pm](https://cloud.githubusercontent.com/assets/1000778/8451562/c291a6ec-2016-11e5-890d-0afc174e1f8c.png)

Author: zsxwing 

Closes #6672 from zsxwing/SPARK-4072-2 and squashes the following commits:

df2c1d8 [zsxwing] Use xml query to check the xml elements
54d54af [zsxwing] Add unit tests for StoragePage
e29fb53 [zsxwing] Update as per TD's comments
ccbee07 [zsxwing] Fix the code style
6dc42b4 [zsxwing] Fix the replication level of blocks
450fad1 [zsxwing] Merge branch 'master' into SPARK-4072-2
1e9ef52 [zsxwing] Don't categorize by Executor ID
ca0ab69 [zsxwing] Fix the code style
3de2762 [zsxwing] Make object BlockUpdatedInfo private
e95b594 [zsxwing] Add 'Aggregated Stream Block Metrics by Executor' table
ba5d0d1 [zsxwing] Refactor the unit test to improve the readability
4bbe341 [zsxwing] Revert JsonProtocol and don't log SparkListenerBlockUpdated
b464dd1 [zsxwing] Add onBlockUpdated to EventLoggingListener
5ba014c [zsxwing] Fix the code style
0b1e47b [zsxwing] Add a developer api BlockUpdatedInfo
04838a9 [zsxwing] Fix the code style
2baa161 [zsxwing] Add unit tests
80f6c6d [zsxwing] Address comments
797ee4b [zsxwing] Display Streaming blocks in Streaming UI


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fb1d06fc
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fb1d06fc
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fb1d06fc

Branch: refs/heads/master
Commit: fb1d06fc242ec00320f1a3049673fbb03c4a6eb9
Parents: 0a4071e
Author: zsxwing 
Authored: Tue Jul 14 13:58:36 2015 -0700
Committer: Tathagata Das 
Committed: Tue Jul 14 13:58:36 2015 -0700

--
 .../org/apache/spark/JavaSparkListener.java |  22 +-
 .../org/apache/spark/SparkFirehoseListener.java |   6 +
 .../spark/scheduler/EventLoggingListener.scala  |   3 +
 .../apache/spark/scheduler/SparkListener.scala  |  10 +-
 .../spark/scheduler/SparkListenerBus.scala  |   2 +
 .../storage/BlockManagerMasterEndpoint.scala|   3 +-
 .../spark/storage/BlockStatusListener.scala | 105 +
 .../apache/spark/storage/BlockUpdatedInfo.scala |  47 
 .../scala/org/apache/spark/ui/UIUtils.scala |  14 +-
 .../apache/spark/ui/storage/StoragePage.scala   | 148 +++-
 .../apache/spark/ui/storage/StorageTab.scala|   3 +-
 .../storage/BlockStatusListenerSuite.scala  | 119 ++
 .../spark/ui/storage/StoragePageSuite.scala | 230 +++
 13 files changed, 684 insertions(+), 28 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fb1d06fc/core/src/main/java/org/apache/spark/JavaSparkListener.java
--
diff --git a/core/src/main/java/org/apache/spark/JavaSparkListener.java 
b/core/src/main/java/org/apache/spark/JavaSparkListener.java
index 646496f..fa9acf0 100644
--- a/core/src/main/java/org/apache/spark/JavaSparkListener.java
+++ b/core/src/main/java/org/apache/spark/JavaSparkListener.java
@@ -17,23 +17,7 @@
 
 package org.apache.spark;
 
-import org.apache.spark.scheduler.SparkListener;
-import org.apache.spark.scheduler.SparkListenerApplicationEnd;
-import org.apache.spark.scheduler.SparkListenerApplicationStart;
-import org.apache.spark.scheduler.SparkListenerBlockManagerAdded;
-import org.apache.spark.scheduler.SparkListenerBlockManagerRemoved;
-import org.apache.spark.scheduler.SparkListenerEnvironmentUpdate;
-import org.apache.spark.scheduler.SparkListenerExecutorAdded;
-import org.apache.spark.scheduler.SparkListenerExecutorMetricsUpdate;
-import org.apache.spark.scheduler.SparkListenerExecutorRemoved;
-import org.apache.spark.scheduler.SparkListenerJobEnd;
-import org.apache.spark.scheduler.SparkListenerJobStart;
-import org.apache.spark.scheduler.SparkListenerStageCompleted;
-import org.apache.spark.scheduler.SparkListenerStageSubmitted;
-import org.apache.spark.scheduler.SparkListenerTaskEnd;
-import org.apache.spark.scheduler.SparkListenerTaskGettingResult;
-import org.apache.spark.scheduler.SparkListenerTaskStart;
-import org.apache.spark.scheduler.SparkListenerUnpersistRDD;
+import org.apache.spark.scheduler.*;
 
 /**
  * Java clients should extend this class instead of implementing
@@ -94,4 +78,8 @@ public class JavaSparkListener implements SparkListener {
 
   @Override
   public void onExecutorRemoved(SparkListenerExecutorRemoved executorRemoved) 
{ }
+
+  @Override
+  public void

spark git commit: [SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of partitions

2015-07-14 Thread ankurdave
Repository: spark
Updated Branches:
  refs/heads/master d267c2834 -> 0a4071eab


[SPARK-8718] [GRAPHX] Improve EdgePartition2D for non perfect square number of 
partitions

See https://github.com/aray/e2d/blob/master/EdgePartition2D.ipynb
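
For context, the perfect-square case keeps the existing assignment, sketched below from the diff (the standalone method name is made up for illustration); the new code path only changes how the last column is laid out when the number of partitions is not a perfect square:

```scala
// Existing 2D assignment kept for perfect squares; in GraphX a VertexId is a Long
// and a PartitionID is an Int.
def edgePartition2D(src: Long, dst: Long, numParts: Int): Int = {
  val ceilSqrtNumParts = math.ceil(math.sqrt(numParts)).toInt
  // Multiply by a large prime first to shuffle vertex locations and improve balance.
  val mixingPrime = 1125899906842597L
  val col = (math.abs(src * mixingPrime) % ceilSqrtNumParts).toInt
  val row = (math.abs(dst * mixingPrime) % ceilSqrtNumParts).toInt
  (col * ceilSqrtNumParts + row) % numParts
}
```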

Author: Andrew Ray 

Closes #7104 from aray/edge-partition-2d-improvement and squashes the following 
commits:

3729f84 [Andrew Ray] correct bounds and remove unneeded comments
97f8464 [Andrew Ray] change less
5141ab4 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
925fd2c [Andrew Ray] use new interface for partitioning
001bfd0 [Andrew Ray] Refactor PartitionStrategy so that we can return a 
partition function for a given number of parts. To keep compatibility we define 
default methods that translate between the two implementation options. Made 
EdgePartition2D use old strategy when we have a perfect square and implement 
new interface.
5d42105 [Andrew Ray] % -> /
3560084 [Andrew Ray] Merge branch 'master' into edge-partition-2d-improvement
f006364 [Andrew Ray] remove unneeded comments
cfa2c5e [Andrew Ray] Modifications to EdgePartition2D so that it works for non 
perfect squares.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0a4071ea
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0a4071ea
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0a4071ea

Branch: refs/heads/master
Commit: 0a4071eab30db1db80f61ed2cb2e7243291183ce
Parents: d267c28
Author: Andrew Ray 
Authored: Tue Jul 14 13:14:47 2015 -0700
Committer: Ankur Dave 
Committed: Tue Jul 14 13:14:47 2015 -0700

--
 .../apache/spark/graphx/PartitionStrategy.scala | 32 +---
 1 file changed, 21 insertions(+), 11 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/0a4071ea/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
--
diff --git 
a/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala 
b/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
index 7372dfb..70a7592 100644
--- a/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
+++ b/graphx/src/main/scala/org/apache/spark/graphx/PartitionStrategy.scala
@@ -32,7 +32,7 @@ trait PartitionStrategy extends Serializable {
 object PartitionStrategy {
   /**
* Assigns edges to partitions using a 2D partitioning of the sparse edge 
adjacency matrix,
-   * guaranteeing a `2 * sqrt(numParts) - 1` bound on vertex replication.
+   * guaranteeing a `2 * sqrt(numParts)` bound on vertex replication.
*
* Suppose we have a graph with 12 vertices that we want to partition
* over 9 machines.  We can use the following sparse matrix representation:
@@ -61,26 +61,36 @@ object PartitionStrategy {
* that edges adjacent to `v11` can only be in the first column of blocks 
`(P0, P3,
* P6)` or the last
* row of blocks `(P6, P7, P8)`.  As a consequence we can guarantee that 
`v11` will need to be
-   * replicated to at most `2 * sqrt(numParts) - 1` machines.
+   * replicated to at most `2 * sqrt(numParts)` machines.
*
* Notice that `P0` has many edges and as a consequence this partitioning 
would lead to poor work
* balance.  To improve balance we first multiply each vertex id by a large 
prime to shuffle the
* vertex locations.
*
-   * One of the limitations of this approach is that the number of machines 
must either be a
-   * perfect square. We partially address this limitation by computing the 
machine assignment to
-   * the next
-   * largest perfect square and then mapping back down to the actual number of 
machines.
-   * Unfortunately, this can also lead to work imbalance and so it is 
suggested that a perfect
-   * square is used.
+   * When the number of partitions requested is not a perfect square we use a 
slightly different
+   * method where the last column can have a different number of rows than the 
others while still
+   * maintaining the same size per block.
*/
   case object EdgePartition2D extends PartitionStrategy {
 override def getPartition(src: VertexId, dst: VertexId, numParts: 
PartitionID): PartitionID = {
   val ceilSqrtNumParts: PartitionID = math.ceil(math.sqrt(numParts)).toInt
   val mixingPrime: VertexId = 1125899906842597L
-  val col: PartitionID = (math.abs(src * mixingPrime) % 
ceilSqrtNumParts).toInt
-  val row: PartitionID = (math.abs(dst * mixingPrime) % 
ceilSqrtNumParts).toInt
-  (col * ceilSqrtNumParts + row) % numParts
+  if (numParts == ceilSqrtNumParts * ceilSqrtNumParts) {
+// Use old method for perfect squared to ensure we get same results
+val col: PartitionID = (math.abs(src * mixingPrime) % 
ceilSqrtNumParts).toInt
+val row: Partitio

spark git commit: [SPARK-9031] Merge BlockObjectWriter and DiskBlockObject writer to remove abstract class

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 8fb3a65cb -> d267c2834


[SPARK-9031] Merge BlockObjectWriter and DiskBlockObject writer to remove 
abstract class

BlockObjectWriter has only one concrete non-test class, DiskBlockObjectWriter. 
In order to simplify the code in preparation for other refactorings, I think 
that we should remove this base class and have only DiskBlockObjectWriter.

While at one time we may have planned to have multiple BlockObjectWriter 
implementations, that doesn't seem to have happened, so the extra abstraction 
seems unnecessary.

Author: Josh Rosen 

Closes #7391 from JoshRosen/shuffle-write-interface-refactoring and squashes 
the following commits:

c418e33 [Josh Rosen] Fix compilation
5047995 [Josh Rosen] Fix comments
d5dc548 [Josh Rosen] Update references in comments
89dc797 [Josh Rosen] Rename test suite.
5755918 [Josh Rosen] Remove unnecessary val in case class
1607c91 [Josh Rosen] Merge BlockObjectWriter and DiskBlockObjectWriter


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d267c283
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d267c283
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d267c283

Branch: refs/heads/master
Commit: d267c2834a639aaebd0559355c6a82613abb689b
Parents: 8fb3a65
Author: Josh Rosen 
Authored: Tue Jul 14 12:56:17 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 12:56:17 2015 -0700

--
 .../sort/BypassMergeSortShuffleWriter.java  |   8 +-
 .../unsafe/UnsafeShuffleExternalSorter.java |   2 +-
 .../unsafe/sort/UnsafeSorterSpillWriter.java|   4 +-
 .../shuffle/FileShuffleBlockResolver.scala  |   8 +-
 .../shuffle/IndexShuffleBlockResolver.scala |   2 +-
 .../spark/shuffle/hash/HashShuffleWriter.scala  |   4 +-
 .../org/apache/spark/storage/BlockManager.scala |   2 +-
 .../spark/storage/BlockObjectWriter.scala   | 256 ---
 .../spark/storage/DiskBlockObjectWriter.scala   | 234 +
 .../spark/util/collection/ChainedBuffer.scala   |   2 +-
 .../spark/util/collection/ExternalSorter.scala  |   4 +-
 .../util/collection/PartitionedPairBuffer.scala |   1 -
 .../PartitionedSerializedPairBuffer.scala   |   5 +-
 .../WritablePartitionedPairCollection.scala |   8 +-
 .../BypassMergeSortShuffleWriterSuite.scala |   4 +-
 .../spark/storage/BlockObjectWriterSuite.scala  | 173 -
 .../storage/DiskBlockObjectWriterSuite.scala| 173 +
 .../PartitionedSerializedPairBufferSuite.scala  |  52 ++--
 18 files changed, 459 insertions(+), 483 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/d267c283/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
--
diff --git 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
index d3d6280..0b8b604 100644
--- 
a/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
+++ 
b/core/src/main/java/org/apache/spark/shuffle/sort/BypassMergeSortShuffleWriter.java
@@ -75,7 +75,7 @@ final class BypassMergeSortShuffleWriter implements 
SortShuffleFileWriter<
   private final Serializer serializer;
 
   /** Array of file writers, one for each partition */
-  private BlockObjectWriter[] partitionWriters;
+  private DiskBlockObjectWriter[] partitionWriters;
 
   public BypassMergeSortShuffleWriter(
   SparkConf conf,
@@ -101,7 +101,7 @@ final class BypassMergeSortShuffleWriter implements 
SortShuffleFileWriter<
 }
 final SerializerInstance serInstance = serializer.newInstance();
 final long openStartTime = System.nanoTime();
-partitionWriters = new BlockObjectWriter[numPartitions];
+partitionWriters = new DiskBlockObjectWriter[numPartitions];
 for (int i = 0; i < numPartitions; i++) {
   final Tuple2 tempShuffleBlockIdPlusFile =
 blockManager.diskBlockManager().createTempShuffleBlock();
@@ -121,7 +121,7 @@ final class BypassMergeSortShuffleWriter implements 
SortShuffleFileWriter<
   partitionWriters[partitioner.getPartition(key)].write(key, record._2());
 }
 
-for (BlockObjectWriter writer : partitionWriters) {
+for (DiskBlockObjectWriter writer : partitionWriters) {
   writer.commitAndClose();
 }
   }
@@ -169,7 +169,7 @@ final class BypassMergeSortShuffleWriter implements 
SortShuffleFileWriter<
 if (partitionWriters != null) {
   try {
 final DiskBlockManager diskBlockManager = 
blockManager.diskBlockManager();
-for (BlockObjectWriter writer : partitionWriters) {
+for (DiskBlockObjectWriter writer : partitionWriters) {
   // This method explicitly does 

spark git commit: [SPARK-8911] Fix local mode endless heartbeats

2015-07-14 Thread andrewor14
Repository: spark
Updated Branches:
  refs/heads/master c4e98ff06 -> 8fb3a65cb


[SPARK-8911] Fix local mode endless heartbeats

As of #7173 we expect executors to properly register with the driver before 
responding to their heartbeats. This behavior is not matched in local mode. 
This patch adds the missing event that needs to be posted.

Author: Andrew Or 

Closes #7382 from andrewor14/fix-local-heartbeat and squashes the following 
commits:

1258bdf [Andrew Or] Post ExecutorAdded event to local executor


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8fb3a65c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/8fb3a65c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/8fb3a65c

Branch: refs/heads/master
Commit: 8fb3a65cbb714120d612e58ef9d12b0521a83260
Parents: c4e98ff
Author: Andrew Or 
Authored: Tue Jul 14 12:47:11 2015 -0700
Committer: Andrew Or 
Committed: Tue Jul 14 12:47:11 2015 -0700

--
 .../spark/scheduler/local/LocalBackend.scala| 20 +---
 1 file changed, 13 insertions(+), 7 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/8fb3a65c/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala 
b/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
index 776e5d3..4d48fcf 100644
--- a/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
+++ b/core/src/main/scala/org/apache/spark/scheduler/local/LocalBackend.scala
@@ -25,7 +25,8 @@ import org.apache.spark.{Logging, SparkConf, SparkContext, 
SparkEnv, TaskState}
 import org.apache.spark.TaskState.TaskState
 import org.apache.spark.executor.{Executor, ExecutorBackend}
 import org.apache.spark.rpc.{RpcCallContext, RpcEndpointRef, RpcEnv, 
ThreadSafeRpcEndpoint}
-import org.apache.spark.scheduler.{SchedulerBackend, TaskSchedulerImpl, 
WorkerOffer}
+import org.apache.spark.scheduler._
+import org.apache.spark.scheduler.cluster.ExecutorInfo
 
 private case class ReviveOffers()
 
@@ -50,8 +51,8 @@ private[spark] class LocalEndpoint(
 
   private var freeCores = totalCores
 
-  private val localExecutorId = SparkContext.DRIVER_IDENTIFIER
-  private val localExecutorHostname = "localhost"
+  val localExecutorId = SparkContext.DRIVER_IDENTIFIER
+  val localExecutorHostname = "localhost"
 
   private val executor = new Executor(
 localExecutorId, localExecutorHostname, SparkEnv.get, userClassPath, 
isLocal = true)
@@ -99,8 +100,9 @@ private[spark] class LocalBackend(
   extends SchedulerBackend with ExecutorBackend with Logging {
 
   private val appId = "local-" + System.currentTimeMillis
-  var localEndpoint: RpcEndpointRef = null
+  private var localEndpoint: RpcEndpointRef = null
   private val userClassPath = getUserClasspath(conf)
+  private val listenerBus = scheduler.sc.listenerBus
 
   /**
* Returns a list of URLs representing the user classpath.
@@ -113,9 +115,13 @@ private[spark] class LocalBackend(
   }
 
   override def start() {
-localEndpoint = SparkEnv.get.rpcEnv.setupEndpoint(
-  "LocalBackendEndpoint",
-  new LocalEndpoint(SparkEnv.get.rpcEnv, userClassPath, scheduler, this, 
totalCores))
+val rpcEnv = SparkEnv.get.rpcEnv
+val executorEndpoint = new LocalEndpoint(rpcEnv, userClassPath, scheduler, 
this, totalCores)
+localEndpoint = rpcEnv.setupEndpoint("LocalBackendEndpoint", 
executorEndpoint)
+listenerBus.post(SparkListenerExecutorAdded(
+  System.currentTimeMillis,
+  executorEndpoint.localExecutorId,
+  new ExecutorInfo(executorEndpoint.localExecutorHostname, totalCores, 
Map.empty)))
   }
 
   override def stop() {


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org




spark git commit: [SPARK-8933] [BUILD] Provide a --force flag to build/mvn that always uses downloaded maven

2015-07-14 Thread joshrosen
Repository: spark
Updated Branches:
  refs/heads/master 37f2d9635 -> c4e98ff06


[SPARK-8933] [BUILD] Provide a --force flag to build/mvn that always uses 
downloaded maven

Added a `--force` flag to manually download, if necessary, and use a built-in version of Maven best suited for Spark.

Author: Brennon York 

Closes #7374 from brennonyork/SPARK-8933 and squashes the following commits:

d673127 [Brennon York] added --force flag to manually download, if necessary, 
and use a built-in version of maven best for spark


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c4e98ff0
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c4e98ff0
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c4e98ff0

Branch: refs/heads/master
Commit: c4e98ff066cc6f0839d15140eb471d74a0d83e91
Parents: 37f2d96
Author: Brennon York 
Authored: Tue Jul 14 11:43:26 2015 -0700
Committer: Josh Rosen 
Committed: Tue Jul 14 11:43:26 2015 -0700

--
 build/mvn | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c4e98ff0/build/mvn
--
diff --git a/build/mvn b/build/mvn
index e836418..f62f61e 100755
--- a/build/mvn
+++ b/build/mvn
@@ -112,10 +112,17 @@ install_scala() {
 # the environment
 ZINC_PORT=${ZINC_PORT:-"3030"}
 
+# Check for the `--force` flag dictating that `mvn` should be downloaded
+# regardless of whether the system already has a `mvn` install
+if [ "$1" == "--force" ]; then
+  FORCE_MVN=1
+  shift
+fi
+
 # Install Maven if necessary
 MVN_BIN="$(command -v mvn)"
 
-if [ ! "$MVN_BIN" ]; then
+if [ ! "$MVN_BIN" -o -n "$FORCE_MVN" ]; then
   install_mvn
 fi
 
@@ -139,5 +146,7 @@ fi
 # Set any `mvn` options if not already present
 export MAVEN_OPTS=${MAVEN_OPTS:-"$_COMPILE_JVM_OPTS"}
 
+echo "Using \`mvn\` from path: $MVN_BIN"
+
 # Last, call the `mvn` command as usual
 ${MVN_BIN} "$@"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9027] [SQL] Generalize metastore predicate pushdown

2015-07-14 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 59d820aa8 -> 37f2d9635


[SPARK-9027] [SQL] Generalize metastore predicate pushdown

Add support for pushing down metastore filters that are in different orders and 
add some unit tests.
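
A simplified, self-contained sketch of the kind of conversion being generalized; the stand-in types below are for illustration only, while the real code pattern-matches Catalyst's `BinaryComparison`, `Attribute`, and `Literal` against a Hive `Table` (see the diff below):

```scala
// Stand-in expression types; the real implementation works on Catalyst expressions.
sealed trait Expr
case class Attr(name: String) extends Expr
case class IntLit(value: Long) extends Expr
case class StrLit(value: String) extends Expr
case class Cmp(symbol: String, left: Expr, right: Expr) extends Expr

// Build the Hive metastore filter string, skipping unsupported predicates and
// varchar partition keys, regardless of which side the attribute appears on.
def convertFilters(filters: Seq[Expr], varcharKeys: Set[String]): String =
  filters.collect {
    case Cmp(op, Attr(a), IntLit(v)) => s"$a $op $v"
    case Cmp(op, IntLit(v), Attr(a)) => s"$v $op $a"
    case Cmp(op, Attr(a), StrLit(v)) if !varcharKeys.contains(a) => s"""$a $op "$v""""
    case Cmp(op, StrLit(v), Attr(a)) if !varcharKeys.contains(a) => s""""$v" $op $a"""
  }.mkString(" and ")

// convertFilters(Seq(Cmp("=", Attr("str_key"), StrLit("value")),
//                    Cmp(">", Attr("int_key"), IntLit(1))), Set.empty)
// => str_key = "value" and int_key > 1
```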

Author: Michael Armbrust 

Closes #7386 from marmbrus/metastoreFilters and squashes the following commits:

05a4524 [Michael Armbrust] [SPARK-9027][SQL] Generalize metastore predicate 
pushdown


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/37f2d963
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/37f2d963
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/37f2d963

Branch: refs/heads/master
Commit: 37f2d9635ff874fb8ad9d246e49faf6098d501c3
Parents: 59d820a
Author: Michael Armbrust 
Authored: Tue Jul 14 11:22:09 2015 -0700
Committer: Michael Armbrust 
Committed: Tue Jul 14 11:22:09 2015 -0700

--
 .../apache/spark/sql/hive/client/HiveShim.scala | 54 +++---
 .../spark/sql/hive/client/FiltersSuite.scala| 78 
 2 files changed, 107 insertions(+), 25 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/37f2d963/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
index 5542a52..d12778c 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala
@@ -34,7 +34,7 @@ import org.apache.hadoop.hive.ql.session.SessionState
 import org.apache.hadoop.hive.serde.serdeConstants
 
 import org.apache.spark.Logging
-import org.apache.spark.sql.catalyst.expressions.{Expression, 
AttributeReference, BinaryComparison}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.types.{StringType, IntegralType}
 
 /**
@@ -312,37 +312,41 @@ private[client] class Shim_v0_13 extends Shim_v0_12 {
   override def getAllPartitions(hive: Hive, table: Table): Seq[Partition] =
 getAllPartitionsMethod.invoke(hive, 
table).asInstanceOf[JSet[Partition]].toSeq
 
-  override def getPartitionsByFilter(
-  hive: Hive,
-  table: Table,
-  predicates: Seq[Expression]): Seq[Partition] = {
+  /**
+   * Converts catalyst expression to the format that Hive's 
getPartitionsByFilter() expects, i.e.
+   * a string that represents partition predicates like "str_key=\"value\" and 
int_key=1 ...".
+   *
+   * Unsupported predicates are skipped.
+   */
+  def convertFilters(table: Table, filters: Seq[Expression]): String = {
 // hive varchar is treated as catalyst string, but hive varchar can't be 
pushed down.
 val varcharKeys = table.getPartitionKeys
   .filter(col => col.getType.startsWith(serdeConstants.VARCHAR_TYPE_NAME))
   .map(col => col.getName).toSet
 
-// Hive getPartitionsByFilter() takes a string that represents partition
-// predicates like "str_key=\"value\" and int_key=1 ..."
-val filter = predicates.flatMap { expr =>
-  expr match {
-case op @ BinaryComparison(lhs, rhs) => {
-  lhs match {
-case AttributeReference(_, _, _, _) => {
-  rhs.dataType match {
-case _: IntegralType =>
-  Some(lhs.prettyString + op.symbol + rhs.prettyString)
-case _: StringType if 
(!varcharKeys.contains(lhs.prettyString)) =>
-  Some(lhs.prettyString + op.symbol + "\"" + rhs.prettyString 
+ "\"")
-case _ => None
-  }
-}
-case _ => None
-  }
-}
-case _ => None
-  }
+filters.collect {
+  case op @ BinaryComparison(a: Attribute, Literal(v, _: IntegralType)) =>
+s"${a.name} ${op.symbol} $v"
+  case op @ BinaryComparison(Literal(v, _: IntegralType), a: Attribute) =>
+s"$v ${op.symbol} ${a.name}"
+
+  case op @ BinaryComparison(a: Attribute, Literal(v, _: StringType))
+  if !varcharKeys.contains(a.name) =>
+s"""${a.name} ${op.symbol} "$v""""
+  case op @ BinaryComparison(Literal(v, _: StringType), a: Attribute)
+  if !varcharKeys.contains(a.name) =>
+s""""$v" ${op.symbol} ${a.name}"""
 }.mkString(" and ")
+  }
+
+  override def getPartitionsByFilter(
+  hive: Hive,
+  table: Table,
+  predicates: Seq[Expression]): Seq[Partition] = {
 
+// Hive getPartitionsByFilter() takes a string that represents partition
+// predicates like "str_key=\"value\" and int_key=1 ..."
+val filter = convertFilters(table, predicates)
 val partitions =
   if (filter.isEmpty) {
 getAllPart

spark git commit: [SPARK-9029] [SQL] shortcut CaseKeyWhen if key is null

2015-07-14 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 257236c3e -> 59d820aa8


[SPARK-9029] [SQL] shortcut CaseKeyWhen if key is null

Author: Wenchen Fan 

Closes #7389 from cloud-fan/case-when and squashes the following commits:

ea4b6ba [Wenchen Fan] shortcut for case key when


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59d820aa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/59d820aa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/59d820aa

Branch: refs/heads/master
Commit: 59d820aa8dec08b744971237860b4c6bef577ddf
Parents: 257236c
Author: Wenchen Fan 
Authored: Tue Jul 14 10:20:15 2015 -0700
Committer: Michael Armbrust 
Committed: Tue Jul 14 10:20:15 2015 -0700

--
 .../sql/catalyst/expressions/conditionals.scala | 48 ++--
 1 file changed, 24 insertions(+), 24 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/59d820aa/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
index eea7706..c7f039e 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
@@ -230,24 +230,31 @@ case class CaseKeyWhen(key: Expression, branches: 
Seq[Expression]) extends CaseW
 }
   }
 
+  private def evalElse(input: InternalRow): Any = {
+if (branchesArr.length % 2 == 0) {
+  null
+} else {
+  branchesArr(branchesArr.length - 1).eval(input)
+}
+  }
+
   /** Written in imperative fashion for performance considerations. */
   override def eval(input: InternalRow): Any = {
 val evaluatedKey = key.eval(input)
-val len = branchesArr.length
-var i = 0
-// If all branches fail and an elseVal is not provided, the whole statement
-// defaults to null, according to Hive's semantics.
-while (i < len - 1) {
-  if (threeValueEquals(evaluatedKey, branchesArr(i).eval(input))) {
-return branchesArr(i + 1).eval(input)
+// If key is null, we can just return the else part or null if there is no 
else.
+// If key is not null but doesn't match any when part, we need to return
+// the else part or null if there is no else, according to Hive's 
semantics.
+if (evaluatedKey != null) {
+  val len = branchesArr.length
+  var i = 0
+  while (i < len - 1) {
+if (evaluatedKey ==  branchesArr(i).eval(input)) {
+  return branchesArr(i + 1).eval(input)
+}
+i += 2
   }
-  i += 2
 }
-var res: Any = null
-if (i == len - 1) {
-  res = branchesArr(i).eval(input)
-}
-return res
+evalElse(input)
   }
 
   override def genCode(ctx: CodeGenContext, ev: GeneratedExpressionCode): 
String = {
@@ -261,8 +268,7 @@ case class CaseKeyWhen(key: Expression, branches: 
Seq[Expression]) extends CaseW
   s"""
 if (!$got) {
   ${cond.code}
-  if (!${keyEval.isNull} && !${cond.isNull}
- && ${ctx.genEqual(key.dataType, keyEval.primitive, 
cond.primitive)}) {
+  if (!${cond.isNull} && ${ctx.genEqual(key.dataType, 
keyEval.primitive, cond.primitive)}) {
 $got = true;
 ${res.code}
 ${ev.isNull} = ${res.isNull};
@@ -290,19 +296,13 @@ case class CaseKeyWhen(key: Expression, branches: 
Seq[Expression]) extends CaseW
   boolean ${ev.isNull} = true;
   ${ctx.javaType(dataType)} ${ev.primitive} = 
${ctx.defaultValue(dataType)};
   ${keyEval.code}
-  $cases
+  if (!${keyEval.isNull}) {
+$cases
+  }
   $other
 """
   }
 
-  private def threeValueEquals(l: Any, r: Any) = {
-if (l == null || r == null) {
-  false
-} else {
-  l == r
-}
-  }
-
   override def toString: String = {
 s"CASE $key" + branches.sliding(2, 2).map {
   case Seq(cond, value) => s" WHEN $cond THEN $value"


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-6851] [SQL] function least/greatest follow up

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master c1feebd8f -> 257236c3e


[SPARK-6851] [SQL] function least/greatest follow up

This is a follow-up to the remaining comments from #6851
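
A hedged usage sketch of the DataFrame functions these expressions back (`df` is assumed to be an existing DataFrame with a numeric column `x`):

```scala
import org.apache.spark.sql.functions.{greatest, least, lit}

// Null values are skipped; the result is null only when every argument is null.
// Both functions require at least two arguments.
val withBounds = df.select(
  least(lit(3), df("x"), lit(1)).as("smallest"),
  greatest(lit(3), df("x"), lit(1)).as("largest")
)
```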

Author: Daoyuan Wang 

Closes #7387 from adrian-wang/udflgfollow and squashes the following commits:

6163e62 [Daoyuan Wang] add skipping null values
e8c2e09 [Daoyuan Wang] use seq
8362966 [Daoyuan Wang] pr6851 follow up


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/257236c3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/257236c3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/257236c3

Branch: refs/heads/master
Commit: 257236c3e17906098f801cbc2059e7a9054e8cab
Parents: c1feebd
Author: Daoyuan Wang 
Authored: Tue Jul 14 01:09:33 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 01:09:33 2015 -0700

--
 .../sql/catalyst/expressions/conditionals.scala | 16 +++-
 .../ConditionalExpressionSuite.scala| 79 ++--
 .../scala/org/apache/spark/sql/functions.scala  | 16 ++--
 3 files changed, 62 insertions(+), 49 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/257236c3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
index 84c28c2..eea7706 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/conditionals.scala
@@ -311,7 +311,11 @@ case class CaseKeyWhen(key: Expression, branches: 
Seq[Expression]) extends CaseW
   }
 }
 
-case class Least(children: Expression*) extends Expression {
+/**
+ * A function that returns the least value of all parameters, skipping null 
values.
+ * It takes at least 2 parameters, and returns null iff all parameters are 
null.
+ */
+case class Least(children: Seq[Expression]) extends Expression {
   require(children.length > 1, "LEAST requires at least 2 arguments, got " + 
children.length)
 
   override def nullable: Boolean = children.forall(_.nullable)
@@ -356,12 +360,16 @@ case class Least(children: Expression*) extends 
Expression {
   ${evalChildren.map(_.code).mkString("\n")}
   boolean ${ev.isNull} = true;
   ${ctx.javaType(dataType)} ${ev.primitive} = 
${ctx.defaultValue(dataType)};
-  ${(0 until children.length).map(updateEval).mkString("\n")}
+  ${children.indices.map(updateEval).mkString("\n")}
 """
   }
 }
 
-case class Greatest(children: Expression*) extends Expression {
+/**
+ * A function that returns the greatest value of all parameters, skipping null 
values.
+ * It takes at least 2 parameters, and returns null iff all parameters are 
null.
+ */
+case class Greatest(children: Seq[Expression]) extends Expression {
   require(children.length > 1, "GREATEST requires at least 2 arguments, got " 
+ children.length)
 
   override def nullable: Boolean = children.forall(_.nullable)
@@ -406,7 +414,7 @@ case class Greatest(children: Expression*) extends 
Expression {
   ${evalChildren.map(_.code).mkString("\n")}
   boolean ${ev.isNull} = true;
   ${ctx.javaType(dataType)} ${ev.primitive} = 
${ctx.defaultValue(dataType)};
-  ${(0 until children.length).map(updateEval).mkString("\n")}
+  ${children.indices.map(updateEval).mkString("\n")}
 """
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/257236c3/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
--
diff --git 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
index adadc8c..afa143b 100644
--- 
a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
+++ 
b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ConditionalExpressionSuite.scala
@@ -144,35 +144,35 @@ class ConditionalExpressionSuite extends SparkFunSuite 
with ExpressionEvalHelper
 val c3 = 'a.string.at(2)
 val c4 = 'a.string.at(3)
 val c5 = 'a.string.at(4)
-checkEvaluation(Least(c4, c3, c5), "a", row)
-checkEvaluation(Least(c1, c2), 1, row)
-checkEvaluation(Least(c1, c2, Literal(-1)), -1, row)
-checkEvaluation(Least(c4, c5, c3, c3, Literal("a")), "a", row)
-
-checkEvaluation(Least(Literal(null), Literal(null)), null, 
InternalRow.empty)
-checkEvalua

spark git commit: [SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about `spark.kryoserializer.buffer`

2015-07-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 20c1434a8 -> c1feebd8f


[SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about 
`spark.kryoserializer.buffer`

The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's 
serialization buffer. Note that there will be one buffer per core on each 
worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed.".

The `spark.kryoserializer.buffer.max.mb` setting is out-of-date in Spark 1.4.
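
For reference, the current pair of settings looks like this in application code (the sizes are illustrative):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryoserializer.buffer", "64k")     // initial size, one buffer per core
  .set("spark.kryoserializer.buffer.max", "64m") // upper bound the buffer can grow to
```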

Author: zhaishidan 

Closes #7393 from stanzhai/master and squashes the following commits:

69729ef [zhaishidan] fix document error about spark.kryoserializer.buffer.max.mb


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1feebd8
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1feebd8
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1feebd8

Branch: refs/heads/master
Commit: c1feebd8fcba985667db8ccdafd2b5ec76dcfae7
Parents: 20c1434
Author: zhaishidan 
Authored: Tue Jul 14 08:54:30 2015 +0100
Committer: Sean Owen 
Committed: Tue Jul 14 08:54:30 2015 +0100

--
 docs/configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c1feebd8/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 443322e..8a186ee 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -665,7 +665,7 @@ Apart from these, the following properties are also 
available, and may be useful
   
 Initial size of Kryo's serialization buffer. Note that there will be one 
buffer
  per core on each worker. This buffer will grow up to
- spark.kryoserializer.buffer.max.mb if needed.
+ spark.kryoserializer.buffer.max if needed.
   
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about `spark.kryoserializer.buffer`

2015-07-14 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.4 50607eca5 -> dce68ad1a


[SPARK-9010] [DOCUMENTATION] Improve the Spark Configuration document about 
`spark.kryoserializer.buffer`

The meaning of spark.kryoserializer.buffer should be "Initial size of Kryo's 
serialization buffer. Note that there will be one buffer per core on each 
worker. This buffer will grow up to spark.kryoserializer.buffer.max if needed.".

The `spark.kryoserializer.buffer.max.mb` setting is out-of-date in Spark 1.4.

Author: zhaishidan 

Closes #7393 from stanzhai/master and squashes the following commits:

69729ef [zhaishidan] fix document error about spark.kryoserializer.buffer.max.mb

(cherry picked from commit c1feebd8fcba985667db8ccdafd2b5ec76dcfae7)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dce68ad1
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/dce68ad1
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/dce68ad1

Branch: refs/heads/branch-1.4
Commit: dce68ad1a0da5580179d1300d4262b9648babcda
Parents: 50607ec
Author: zhaishidan 
Authored: Tue Jul 14 08:54:30 2015 +0100
Committer: Sean Owen 
Committed: Tue Jul 14 08:54:59 2015 +0100

--
 docs/configuration.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/dce68ad1/docs/configuration.md
--
diff --git a/docs/configuration.md b/docs/configuration.md
index 19f3b7e..e60b0f5 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -665,7 +665,7 @@ Apart from these, the following properties are also 
available, and may be useful
   
 Initial size of Kryo's serialization buffer. Note that there will be one 
buffer
  per core on each worker. This buffer will grow up to
- spark.kryoserializer.buffer.max.mb if needed.
+ spark.kryoserializer.buffer.max if needed.
   
 
 


-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org



spark git commit: [SPARK-9001] Fixing errors in javadocs that lead to failed build/sbt doc

2015-07-14 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 408b384de -> 20c1434a8


[SPARK-9001] Fixing errors in javadocs that lead to failed build/sbt doc

These are minor corrections in the documentation of several classes that are 
preventing the following from completing:

```bash
build/sbt publish-local
```

I believe this might be an issue associated with running JDK8 as ankurdave does 
not appear to have this issue in JDK7.

Author: Joseph Gonzalez 

Closes #7354 from jegonzal/FixingJavadocErrors and squashes the following 
commits:

6664b7e [Joseph Gonzalez] making requested changes
2e16d89 [Joseph Gonzalez] Fixing errors in javadocs that prevents build/sbt 
publish-local from completing.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/20c1434a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/20c1434a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/20c1434a

Branch: refs/heads/master
Commit: 20c1434a8dbb25b98f6b434b158ae88e44ce3057
Parents: 408b384
Author: Joseph Gonzalez 
Authored: Tue Jul 14 00:32:29 2015 -0700
Committer: Reynold Xin 
Committed: Tue Jul 14 00:32:29 2015 -0700

--
 .../java/org/apache/spark/launcher/SparkLauncher.java |  5 +++--
 .../main/java/org/apache/spark/launcher/package-info.java | 10 +++---
 .../main/java/org/apache/spark/unsafe/bitset/BitSet.java  |  2 +-
 .../org/apache/spark/unsafe/bitset/BitSetMethods.java |  2 +-
 .../java/org/apache/spark/unsafe/map/BytesToBytesMap.java |  6 +-
 .../java/org/apache/spark/unsafe/types/UTF8String.java|  8 
 6 files changed, 21 insertions(+), 12 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/20c1434a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
--
diff --git 
a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java 
b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
index d4cfeac..c0f89c9 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/SparkLauncher.java
@@ -25,11 +25,12 @@ import java.util.Map;
 
 import static org.apache.spark.launcher.CommandBuilderUtils.*;
 
-/**
+/** 
  * Launcher for Spark applications.
- * 
+ * 
  * Use this class to start Spark applications programmatically. The class uses 
a builder pattern
  * to allow clients to configure the Spark application and launch it as a 
child process.
+ * 
  */
 public class SparkLauncher {
 

http://git-wip-us.apache.org/repos/asf/spark/blob/20c1434a/launcher/src/main/java/org/apache/spark/launcher/package-info.java
--
diff --git a/launcher/src/main/java/org/apache/spark/launcher/package-info.java 
b/launcher/src/main/java/org/apache/spark/launcher/package-info.java
index 7ed756f..7c97dba 100644
--- a/launcher/src/main/java/org/apache/spark/launcher/package-info.java
+++ b/launcher/src/main/java/org/apache/spark/launcher/package-info.java
@@ -17,13 +17,17 @@
 
 /**
  * Library for launching Spark applications.
- * 
+ * 
+ * 
  * This library allows applications to launch Spark programmatically. There's 
only one entry
  * point to the library - the {@link org.apache.spark.launcher.SparkLauncher} 
class.
- * 
+ * 
+ *
+ * 
  * To launch a Spark application, just instantiate a {@link 
org.apache.spark.launcher.SparkLauncher}
  * and configure the application to run. For example:
- *
+ * 
+ * 
  * 
  * {@code
  *   import org.apache.spark.launcher.SparkLauncher;

http://git-wip-us.apache.org/repos/asf/spark/blob/20c1434a/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java
--
diff --git a/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java 
b/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java
index 28e23da..7c12417 100644
--- a/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java
+++ b/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSet.java
@@ -90,7 +90,7 @@ public final class BitSet {
* To iterate over the true bits in a BitSet, use the following loop:
* 
* 
-   *  for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
+   *  for (long i = bs.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
*// operate on index i here
*  }
* 

http://git-wip-us.apache.org/repos/asf/spark/blob/20c1434a/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java
--
diff --git 
a/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java 
b/unsafe/src/main/java/org/apache/spark/unsafe/bitset/BitSetMethods.java
index 0987191..27462c7 100644
-