[2/2] spark git commit: Update docs/README.md to put all prereqs together.
Update docs/README.md to put all prereqs together. This pull request groups all the prereq requirements into a single section. cc srowen shivaram Author: Reynold Xin Closes #7951 from rxin/readme-docs and squashes the following commits: ab7ded0 [Reynold Xin] Updated docs/README.md to put all prereqs together. (cherry picked from commit f7abd6bec9d51ed4ab6359e50eac853e64ecae86) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6e8446a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b6e8446a Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b6e8446a Branch: refs/heads/branch-1.5 Commit: b6e8446a421013e4679f5f4d95ffa6f166a9b0b7 Parents: 141f034 Author: Reynold Xin Authored: Tue Aug 4 22:17:14 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 23:22:11 2015 -0700 -- docs/README.md | 43 --- 1 file changed, 8 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b6e8446a/docs/README.md -- diff --git a/docs/README.md b/docs/README.md index 5020989..1f4fd3e 100644 --- a/docs/README.md +++ b/docs/README.md @@ -9,12 +9,13 @@ documentation yourself. Why build it yourself? So that you have the docs that co whichever version of Spark you currently have checked out of revision control. ## Prerequisites -The Spark documenation build uses a number of tools to build HTML docs and API docs in Scala, Python -and R. To get started you can run the following commands +The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, +Python and R. To get started you can run the following commands $ sudo gem install jekyll $ sudo gem install jekyll-redirect-from $ sudo pip install Pygments +$ sudo pip install sphinx $ Rscript -e 'install.packages(c("knitr", "devtools"), repos="http://cran.stat.ucla.edu/";)' @@ -29,17 +30,12 @@ you have checked out or downloaded. In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can read those text files directly if you want. Start with index.md. -The markdown code can be compiled to HTML using the [Jekyll tool](http://jekyllrb.com). -`Jekyll` and a few dependencies must be installed for this to work. We recommend -installing via the Ruby Gem dependency manager. Since the exact HTML output -varies between versions of Jekyll and its dependencies, we list specific versions here -in some cases: +Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with +Jekyll will create a directory called `_site` containing index.html as well as the rest of the +compiled files. -$ sudo gem install jekyll -$ sudo gem install jekyll-redirect-from - -Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with Jekyll will create a directory -called `_site` containing index.html as well as the rest of the compiled files. +$ cd docs +$ jekyll build You can modify the default Jekyll build as follows: @@ -50,29 +46,6 @@ You can modify the default Jekyll build as follows: # Build the site with extra features used on the live page $ PRODUCTION=1 jekyll build -## Pygments - -We also use pygments (http://pygments.org) for syntax highlighting in documentation markdown pages, -so you will also need to install that (it requires Python) by running `sudo pip install Pygments`. 
- -To mark a block of code in your markdown to be syntax highlighted by jekyll during the compile -phase, use the following sytax: - -{% highlight scala %} -// Your scala code goes here, you can replace scala with many other -// supported languages too. -{% endhighlight %} - -## Sphinx - -We use Sphinx to generate Python API docs, so you will need to install it by running -`sudo pip install sphinx`. - -## knitr, devtools - -SparkR documentation is written using `roxygen2` and we use `knitr`, `devtools` to generate -documentation. To install these packages you can run `install.packages(c("knitr", "devtools"))` from a -R console. ## API Docs (Scaladoc, Sphinx, roxygen2) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[1/2] spark git commit: Add a prerequisites section for building docs
Repository: spark Updated Branches: refs/heads/branch-1.5 864d5de6d -> b6e8446a4 Add a prerequisites section for building docs This puts all the install commands that need to be run in one section instead of being spread over many paragraphs cc rxin Author: Shivaram Venkataraman Closes #7912 from shivaram/docs-setup-readme and squashes the following commits: cf7a204 [Shivaram Venkataraman] Add a prerequisites section for building docs (cherry picked from commit 7abaaad5b169520fbf7299808b2bafde089a16a2) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/141f034b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/141f034b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/141f034b Branch: refs/heads/branch-1.5 Commit: 141f034b4575b101c4f23d4dd5b2a95e14a632ef Parents: 864d5de Author: Shivaram Venkataraman Authored: Mon Aug 3 17:00:59 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 23:22:02 2015 -0700 -- docs/README.md | 10 ++ 1 file changed, 10 insertions(+) -- http://git-wip-us.apache.org/repos/asf/spark/blob/141f034b/docs/README.md -- diff --git a/docs/README.md b/docs/README.md index d7652e9..5020989 100644 --- a/docs/README.md +++ b/docs/README.md @@ -8,6 +8,16 @@ Read on to learn more about viewing documentation in plain text (i.e., markdown) documentation yourself. Why build it yourself? So that you have the docs that corresponds to whichever version of Spark you currently have checked out of revision control. +## Prerequisites +The Spark documenation build uses a number of tools to build HTML docs and API docs in Scala, Python +and R. To get started you can run the following commands + +$ sudo gem install jekyll +$ sudo gem install jekyll-redirect-from +$ sudo pip install Pygments +$ Rscript -e 'install.packages(c("knitr", "devtools"), repos="http://cran.stat.ucla.edu/";)' + + ## Generating the Documentation HTML We include the Spark documentation as part of the source (as opposed to using a hosted wiki, such as - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-9119] [SPARK-8359] [SQL] match Decimal.precision/scale with DecimalType
Repository: spark Updated Branches: refs/heads/master d34548587 -> 781c8d71a [SPARK-9119] [SPARK-8359] [SQL] match Decimal.precision/scale with DecimalType Let Decimal carry the correct precision and scale with DecimalType. cc rxin yhuai Author: Davies Liu Closes #7925 from davies/decimal_scale and squashes the following commits: e19701a [Davies Liu] some tweaks 57d78d2 [Davies Liu] fix tests 5d5bc69 [Davies Liu] match precision and scale with DecimalType Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/781c8d71 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/781c8d71 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/781c8d71 Branch: refs/heads/master Commit: 781c8d71a0a6a86c84048a4f22cb3a7d035a5be2 Parents: d345485 Author: Davies Liu Authored: Tue Aug 4 23:12:49 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 23:12:49 2015 -0700 -- .../main/scala/org/apache/spark/sql/Row.scala | 4 +++ .../sql/catalyst/CatalystTypeConverters.scala | 21 ++- .../catalyst/analysis/HiveTypeCoercion.scala| 4 +-- .../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/arithmetic.scala | 2 +- .../org/apache/spark/sql/types/Decimal.scala| 37 .../spark/sql/types/decimal/DecimalSuite.scala | 21 +++ .../sql/execution/SparkSqlSerializer2.scala | 3 +- .../apache/spark/sql/execution/pythonUDFs.scala | 3 +- .../org/apache/spark/sql/json/InferSchema.scala | 21 --- .../apache/spark/sql/json/JacksonParser.scala | 5 ++- .../apache/spark/sql/JavaApplySchemaSuite.java | 2 +- .../columnar/InMemoryColumnarQuerySuite.scala | 2 +- .../org/apache/spark/sql/json/JsonSuite.scala | 26 ++ .../spark/sql/parquet/ParquetQuerySuite.scala | 13 +++ .../hive/execution/ScriptTransformation.scala | 2 +- 16 files changed, 122 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/781c8d71/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index 9144947..40159aa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -417,6 +417,10 @@ trait Row extends Serializable { if (!o2.isInstanceOf[Double] || ! 
java.lang.Double.isNaN(o2.asInstanceOf[Double])) { return false } + case d1: java.math.BigDecimal if o2.isInstanceOf[java.math.BigDecimal] => +if (d1.compareTo(o2.asInstanceOf[java.math.BigDecimal]) != 0) { + return false +} case _ => if (o1 != o2) { return false } http://git-wip-us.apache.org/repos/asf/spark/blob/781c8d71/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index c666864..8d0c64e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -317,18 +317,23 @@ object CatalystTypeConverters { private class DecimalConverter(dataType: DecimalType) extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { -override def toCatalystImpl(scalaValue: Any): Decimal = scalaValue match { - case d: BigDecimal => Decimal(d) - case d: JavaBigDecimal => Decimal(d) - case d: Decimal => d +override def toCatalystImpl(scalaValue: Any): Decimal = { + val decimal = scalaValue match { +case d: BigDecimal => Decimal(d) +case d: JavaBigDecimal => Decimal(d) +case d: Decimal => d + } + if (decimal.changePrecision(dataType.precision, dataType.scale)) { +decimal + } else { +null + } } override def toScala(catalystValue: Decimal): JavaBigDecimal = catalystValue.toJavaBigDecimal override def toScalaImpl(row: InternalRow, column: Int): JavaBigDecimal = row.getDecimal(column, dataType.precision, dataType.scale).toJavaBigDecimal } - private object BigDecimalConverter extends DecimalConverter(DecimalType.SYSTEM_DEFAULT) - private abstract class PrimitiveConverter[T] extends CatalystTypeConverter[T, Any, Any] { final override def toScala(catalystValue: Any): Any = catalystValue final over
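The new DecimalConverter above returns null whenever a value cannot be carried at the declared precision and scale. A minimal stand-alone sketch of that contract, using plain java.math.BigDecimal instead of Spark's Decimal class — the helper name and rounding mode are illustrative assumptions, not part of the patch:

    import java.math.{BigDecimal => JBigDecimal, RoundingMode}

    // Sketch only: mimics the DecimalConverter contract in the diff above.
    // A value is kept if it fits DecimalType(precision, scale); otherwise the
    // converter yields null, which is what toCatalystImpl now returns when
    // changePrecision fails.
    def fitDecimal(value: JBigDecimal, precision: Int, scale: Int): JBigDecimal = {
      val rescaled = value.setScale(scale, RoundingMode.HALF_UP) // rounding mode assumed
      if (rescaled.precision() <= precision) rescaled else null
    }

    // fitDecimal(new JBigDecimal("123.456"), 5, 2)  // => 123.46
    // fitDecimal(new JBigDecimal("12345.6"), 5, 2)  // => null (needs 7 digits at scale 2)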
spark git commit: [SPARK-9119] [SPARK-8359] [SQL] match Decimal.precision/scale with DecimalType
Repository: spark Updated Branches: refs/heads/branch-1.5 28bb97730 -> 864d5de6d [SPARK-9119] [SPARK-8359] [SQL] match Decimal.precision/scale with DecimalType Let Decimal carry the correct precision and scale with DecimalType. cc rxin yhuai Author: Davies Liu Closes #7925 from davies/decimal_scale and squashes the following commits: e19701a [Davies Liu] some tweaks 57d78d2 [Davies Liu] fix tests 5d5bc69 [Davies Liu] match precision and scale with DecimalType (cherry picked from commit 781c8d71a0a6a86c84048a4f22cb3a7d035a5be2) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/864d5de6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/864d5de6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/864d5de6 Branch: refs/heads/branch-1.5 Commit: 864d5de6da3110974ddf0fbd216e3ef934a9f034 Parents: 28bb977 Author: Davies Liu Authored: Tue Aug 4 23:12:49 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 23:13:03 2015 -0700 -- .../main/scala/org/apache/spark/sql/Row.scala | 4 +++ .../sql/catalyst/CatalystTypeConverters.scala | 21 ++- .../catalyst/analysis/HiveTypeCoercion.scala| 4 +-- .../spark/sql/catalyst/expressions/Cast.scala | 6 ++-- .../sql/catalyst/expressions/arithmetic.scala | 2 +- .../org/apache/spark/sql/types/Decimal.scala| 37 .../spark/sql/types/decimal/DecimalSuite.scala | 21 +++ .../sql/execution/SparkSqlSerializer2.scala | 3 +- .../apache/spark/sql/execution/pythonUDFs.scala | 3 +- .../org/apache/spark/sql/json/InferSchema.scala | 21 --- .../apache/spark/sql/json/JacksonParser.scala | 5 ++- .../apache/spark/sql/JavaApplySchemaSuite.java | 2 +- .../columnar/InMemoryColumnarQuerySuite.scala | 2 +- .../org/apache/spark/sql/json/JsonSuite.scala | 26 ++ .../spark/sql/parquet/ParquetQuerySuite.scala | 13 +++ .../hive/execution/ScriptTransformation.scala | 2 +- 16 files changed, 122 insertions(+), 50 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/864d5de6/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala index 9144947..40159aa 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/Row.scala @@ -417,6 +417,10 @@ trait Row extends Serializable { if (!o2.isInstanceOf[Double] || ! 
java.lang.Double.isNaN(o2.asInstanceOf[Double])) { return false } + case d1: java.math.BigDecimal if o2.isInstanceOf[java.math.BigDecimal] => +if (d1.compareTo(o2.asInstanceOf[java.math.BigDecimal]) != 0) { + return false +} case _ => if (o1 != o2) { return false } http://git-wip-us.apache.org/repos/asf/spark/blob/864d5de6/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala index c666864..8d0c64e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/CatalystTypeConverters.scala @@ -317,18 +317,23 @@ object CatalystTypeConverters { private class DecimalConverter(dataType: DecimalType) extends CatalystTypeConverter[Any, JavaBigDecimal, Decimal] { -override def toCatalystImpl(scalaValue: Any): Decimal = scalaValue match { - case d: BigDecimal => Decimal(d) - case d: JavaBigDecimal => Decimal(d) - case d: Decimal => d +override def toCatalystImpl(scalaValue: Any): Decimal = { + val decimal = scalaValue match { +case d: BigDecimal => Decimal(d) +case d: JavaBigDecimal => Decimal(d) +case d: Decimal => d + } + if (decimal.changePrecision(dataType.precision, dataType.scale)) { +decimal + } else { +null + } } override def toScala(catalystValue: Decimal): JavaBigDecimal = catalystValue.toJavaBigDecimal override def toScalaImpl(row: InternalRow, column: Int): JavaBigDecimal = row.getDecimal(column, dataType.precision, dataType.scale).toJavaBigDecimal } - private object BigDecimalConverter extends DecimalConverter(DecimalType.SYSTEM_DEFAULT) - private abstract class PrimitiveConverter[T] extends CatalystTypeConverte
spark git commit: [SPARK-8231] [SQL] Add array_contains
Repository: spark Updated Branches: refs/heads/branch-1.5 bca196754 -> 28bb97730 [SPARK-8231] [SQL] Add array_contains This PR is based on #7580 , thanks to EntilZha PR for work on https://issues.apache.org/jira/browse/SPARK-8231 Currently, I have an initial implementation for contains. Based on discussion on JIRA, it should behave same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128 Main points are: 1. If the array is empty, null, or the value is null, return false 2. If there is a type mismatch, throw error 3. If comparison is not supported, throw error Closes #7580 Author: Pedro Rodriguez Author: Pedro Rodriguez Author: Davies Liu Closes #7949 from davies/array_contains and squashes the following commits: d3c08bc [Davies Liu] use foreach() to avoid copy bc3d1fe [Davies Liu] fix array_contains 719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains e352cf9 [Pedro Rodriguez] fixed diff from master 4d5b0ff [Pedro Rodriguez] added docs and another type check ffc0591 [Pedro Rodriguez] fixed unit test 7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3 b5ffae8 [Pedro Rodriguez] fixed pyspark test 4e7dce3 [Pedro Rodriguez] added more docs 3082399 [Pedro Rodriguez] fixed unit test 46f9789 [Pedro Rodriguez] reverted change d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this 8528027 [Pedro Rodriguez] added more tests 686e029 [Pedro Rodriguez] fix scala style d262e9d [Pedro Rodriguez] reworked type checking code and added more tests 2517a58 [Pedro Rodriguez] removed unused import 28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests 12f8795 [Pedro Rodriguez] fix scala style checks e8a20a9 [Pedro Rodriguez] added python df (broken atm) 65b562c [Pedro Rodriguez] made array_contains nullable false 33b45aa [Pedro Rodriguez] reordered test 9623c64 [Pedro Rodriguez] fixed test 4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs 72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs 69c46fb [Pedro Rodriguez] added tests and codegen 9e0bfc4 [Pedro Rodriguez] initial attempt at implementation Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/28bb9773 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/28bb9773 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/28bb9773 Branch: refs/heads/branch-1.5 Commit: 28bb977302ff3077c82bb8ee7518eb36bddaf2b3 Parents: bca1967 Author: Pedro Rodriguez Authored: Tue Aug 4 22:32:21 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 22:35:45 2015 -0700 -- python/pyspark/sql/functions.py | 17 + .../catalyst/analysis/FunctionRegistry.scala| 1 + .../expressions/collectionOperations.scala | 78 +++- .../expressions/CollectionFunctionsSuite.scala | 15 .../scala/org/apache/spark/sql/functions.scala | 8 ++ .../spark/sql/DataFrameFunctionsSuite.scala | 49 +++- 6 files changed, 163 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/28bb9773/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index e65b14d..9f0d71d 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -1311,6 +1311,23 @@ def array(*cols): return Column(jc) +@since(1.5) +def array_contains(col, value): +""" +Collection function: returns True if the 
array contains the given value. The collection +elements and value must be of the same type. + +:param col: name of column containing array +:param value: value to check for in array + +>>> df = sqlContext.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) +>>> df.select(array_contains(df.data, "a")).collect() +[Row(array_contains(data,a)=True), Row(array_contains(data,a)=False)] +""" +sc = SparkContext._active_spark_context +return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) + + @since(1.4) def explode(col): """Returns a new row for each element in the given array or map. http://git-wip-us.apache.org/repos/asf/spark/blob/28bb9773/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 43e3e9b..94c355f 100644 --- a/sql/catalys
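The same array_contains function is exposed on the Scala side (functions.scala in the diff stat above). A hedged usage sketch mirroring the Python doctest in the patch — the SQLContext name and sample data are assumptions:

    import org.apache.spark.sql.functions.array_contains

    // Sketch only: assumes a SQLContext named sqlContext (Spark 1.5 DataFrame API).
    val df = sqlContext.createDataFrame(Seq(
      (Seq("a", "b", "c"), 1),
      (Seq.empty[String], 2)
    )).toDF("data", "id")

    // Per the semantics above, an empty (or null) array yields false rather than an error.
    df.select(array_contains(df("data"), "a")).show()
    // => true for the first row, false for the empty-array row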
spark git commit: [SPARK-8231] [SQL] Add array_contains
Repository: spark Updated Branches: refs/heads/master a02bcf20c -> d34548587 [SPARK-8231] [SQL] Add array_contains This PR is based on #7580 , thanks to EntilZha PR for work on https://issues.apache.org/jira/browse/SPARK-8231 Currently, I have an initial implementation for contains. Based on discussion on JIRA, it should behave same as Hive: https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/udf/generic/GenericUDFArrayContains.java#L102-L128 Main points are: 1. If the array is empty, null, or the value is null, return false 2. If there is a type mismatch, throw error 3. If comparison is not supported, throw error Closes #7580 Author: Pedro Rodriguez Author: Pedro Rodriguez Author: Davies Liu Closes #7949 from davies/array_contains and squashes the following commits: d3c08bc [Davies Liu] use foreach() to avoid copy bc3d1fe [Davies Liu] fix array_contains 719e37d [Davies Liu] Merge branch 'master' of github.com:apache/spark into array_contains e352cf9 [Pedro Rodriguez] fixed diff from master 4d5b0ff [Pedro Rodriguez] added docs and another type check ffc0591 [Pedro Rodriguez] fixed unit test 7a22deb [Pedro Rodriguez] Changed test to use strings instead of long/ints which are different between python 2 an 3 b5ffae8 [Pedro Rodriguez] fixed pyspark test 4e7dce3 [Pedro Rodriguez] added more docs 3082399 [Pedro Rodriguez] fixed unit test 46f9789 [Pedro Rodriguez] reverted change d3ca013 [Pedro Rodriguez] Fixed type checking to match hive behavior, then added tests to insure this 8528027 [Pedro Rodriguez] added more tests 686e029 [Pedro Rodriguez] fix scala style d262e9d [Pedro Rodriguez] reworked type checking code and added more tests 2517a58 [Pedro Rodriguez] removed unused import 28b4f71 [Pedro Rodriguez] fixed bug with type conversions and re-added tests 12f8795 [Pedro Rodriguez] fix scala style checks e8a20a9 [Pedro Rodriguez] added python df (broken atm) 65b562c [Pedro Rodriguez] made array_contains nullable false 33b45aa [Pedro Rodriguez] reordered test 9623c64 [Pedro Rodriguez] fixed test 4b4425b [Pedro Rodriguez] changed Arrays in tests to Seqs 72cb4b1 [Pedro Rodriguez] added checkInputTypes and docs 69c46fb [Pedro Rodriguez] added tests and codegen 9e0bfc4 [Pedro Rodriguez] initial attempt at implementation Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d3454858 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d3454858 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d3454858 Branch: refs/heads/master Commit: d34548587ab55bc2136c8f823b9e6ae96e1355a4 Parents: a02bcf2 Author: Pedro Rodriguez Authored: Tue Aug 4 22:32:21 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 22:34:02 2015 -0700 -- python/pyspark/sql/functions.py | 17 + .../catalyst/analysis/FunctionRegistry.scala| 1 + .../expressions/collectionOperations.scala | 78 +++- .../expressions/CollectionFunctionsSuite.scala | 15 .../scala/org/apache/spark/sql/functions.scala | 8 ++ .../spark/sql/DataFrameFunctionsSuite.scala | 49 +++- 6 files changed, 163 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d3454858/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index e65b14d..9f0d71d 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -1311,6 +1311,23 @@ def array(*cols): return Column(jc) +@since(1.5) +def array_contains(col, value): +""" +Collection function: returns True if the array 
contains the given value. The collection +elements and value must be of the same type. + +:param col: name of column containing array +:param value: value to check for in array + +>>> df = sqlContext.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) +>>> df.select(array_contains(df.data, "a")).collect() +[Row(array_contains(data,a)=True), Row(array_contains(data,a)=False)] +""" +sc = SparkContext._active_spark_context +return Column(sc._jvm.functions.array_contains(_to_java_column(col), value)) + + @since(1.4) def explode(col): """Returns a new row for each element in the given array or map. http://git-wip-us.apache.org/repos/asf/spark/blob/d3454858/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 43e3e9b..94c355f 100644 --- a/sql/catalyst/src/ma
spark git commit: [SPARK-9540] [MLLIB] optimize PrefixSpan implementation
Repository: spark Updated Branches: refs/heads/branch-1.5 6e72d24e2 -> bca196754 [SPARK-9540] [MLLIB] optimize PrefixSpan implementation This is a major refactoring of the PrefixSpan implementation. It contains the following changes: 1. Expand prefix with one item at a time. The existing implementation generates all subsets for each itemset, which might have scalability issue when the itemset is large. 2. Use a new internal format. `<(12)(31)>` is represented by `[0, 1, 2, 0, 1, 3, 0]` internally. We use `0` because negative numbers are used to indicates partial prefix items, e.g., `_2` is represented by `-2`. 3. Remember the start indices of all partial projections in the projected postfix to help next projection. 4. Reuse the original sequence array for projected postfixes. 5. Use `Prefix` IDs in aggregation rather than its content. 6. Use `ArrayBuilder` for building primitive arrays. 7. Expose `maxLocalProjDBSize`. 8. Tests are not changed except using `0` instead of `-1` as the delimiter. `Postfix`'s API doc should be a good place to start. Closes #7594 feynmanliang zhangjiajin Author: Xiangrui Meng Closes #7937 from mengxr/SPARK-9540 and squashes the following commits: 2d0ec31 [Xiangrui Meng] address more comments 48f450c [Xiangrui Meng] address comments from Feynman; fixed a bug in project and added a test 65f90e8 [Xiangrui Meng] naming and documentation 8afc86a [Xiangrui Meng] refactor impl (cherry picked from commit a02bcf20c4fc9e2e182630d197221729e996afc2) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bca19675 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bca19675 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bca19675 Branch: refs/heads/branch-1.5 Commit: bca196754ddf2ccd057d775bd5c3f7d3e5657e6f Parents: 6e72d24 Author: Xiangrui Meng Authored: Tue Aug 4 22:28:49 2015 -0700 Committer: Xiangrui Meng Committed: Tue Aug 4 22:28:58 2015 -0700 -- .../spark/mllib/fpm/LocalPrefixSpan.scala | 132 +++-- .../org/apache/spark/mllib/fpm/PrefixSpan.scala | 587 --- .../spark/mllib/fpm/PrefixSpanSuite.scala | 271 + 3 files changed, 599 insertions(+), 391 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bca19675/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala index ccebf95..3ea1077 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala @@ -22,85 +22,89 @@ import scala.collection.mutable import org.apache.spark.Logging /** - * Calculate all patterns of a projected database in local. + * Calculate all patterns of a projected database in local mode. + * + * @param minCount minimal count for a frequent pattern + * @param maxPatternLength max pattern length for a frequent pattern */ -private[fpm] object LocalPrefixSpan extends Logging with Serializable { - import PrefixSpan._ +private[fpm] class LocalPrefixSpan( +val minCount: Long, +val maxPatternLength: Int) extends Logging with Serializable { + import PrefixSpan.Postfix + import LocalPrefixSpan.ReversedPrefix + /** - * Calculate all patterns of a projected database. 
- * @param minCount minimum count - * @param maxPatternLength maximum pattern length - * @param prefixes prefixes in reversed order - * @param database the projected database - * @return a set of sequential pattern pairs, - * the key of pair is sequential pattern (a list of items in reversed order), - * the value of pair is the pattern's count. + * Generates frequent patterns on the input array of postfixes. + * @param postfixes an array of postfixes + * @return an iterator of (frequent pattern, count) */ - def run( - minCount: Long, - maxPatternLength: Int, - prefixes: List[Set[Int]], - database: Iterable[List[Set[Int]]]): Iterator[(List[Set[Int]], Long)] = { -if (prefixes.length == maxPatternLength || database.isEmpty) { - return Iterator.empty -} -val freqItemSetsAndCounts = getFreqItemAndCounts(minCount, database) -val freqItems = freqItemSetsAndCounts.keys.flatten.toSet -val filteredDatabase = database.map { suffix => - suffix -.map(item => freqItems.intersect(item)) -.filter(_.nonEmpty) -} -freqItemSetsAndCounts.iterator.flatMap { case (item, count) => - val newPrefixes = item :: prefixes - val newProjected = project(filteredDatabase, item) - Iterator.single((new
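Point 2 of the change list describes the new internal sequence format: `<(12)(31)>` becomes `[0, 1, 2, 0, 1, 3, 0]`, with `0` delimiting itemsets and negative values reserved for partial prefix items. A small sketch of that encoding (the helper name is an assumption; this is not the PrefixSpan code itself):

    import scala.collection.mutable.ArrayBuffer

    // Sketch only: flattens a sequence of itemsets into the delimiter-based format
    // described in the commit message. Items are positive Ints; each itemset is
    // sorted and separated by 0 (negative values would mark partial prefix items).
    def encode(sequence: Seq[Seq[Int]]): Array[Int] = {
      val buf = ArrayBuffer(0)
      for (itemset <- sequence) {
        buf ++= itemset.sorted
        buf += 0
      }
      buf.toArray
    }

    // encode(Seq(Seq(1, 2), Seq(3, 1)))  // => Array(0, 1, 2, 0, 1, 3, 0)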
spark git commit: [SPARK-9540] [MLLIB] optimize PrefixSpan implementation
Repository: spark Updated Branches: refs/heads/master f7abd6bec -> a02bcf20c [SPARK-9540] [MLLIB] optimize PrefixSpan implementation This is a major refactoring of the PrefixSpan implementation. It contains the following changes: 1. Expand prefix with one item at a time. The existing implementation generates all subsets for each itemset, which might have scalability issue when the itemset is large. 2. Use a new internal format. `<(12)(31)>` is represented by `[0, 1, 2, 0, 1, 3, 0]` internally. We use `0` because negative numbers are used to indicates partial prefix items, e.g., `_2` is represented by `-2`. 3. Remember the start indices of all partial projections in the projected postfix to help next projection. 4. Reuse the original sequence array for projected postfixes. 5. Use `Prefix` IDs in aggregation rather than its content. 6. Use `ArrayBuilder` for building primitive arrays. 7. Expose `maxLocalProjDBSize`. 8. Tests are not changed except using `0` instead of `-1` as the delimiter. `Postfix`'s API doc should be a good place to start. Closes #7594 feynmanliang zhangjiajin Author: Xiangrui Meng Closes #7937 from mengxr/SPARK-9540 and squashes the following commits: 2d0ec31 [Xiangrui Meng] address more comments 48f450c [Xiangrui Meng] address comments from Feynman; fixed a bug in project and added a test 65f90e8 [Xiangrui Meng] naming and documentation 8afc86a [Xiangrui Meng] refactor impl Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a02bcf20 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a02bcf20 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a02bcf20 Branch: refs/heads/master Commit: a02bcf20c4fc9e2e182630d197221729e996afc2 Parents: f7abd6b Author: Xiangrui Meng Authored: Tue Aug 4 22:28:49 2015 -0700 Committer: Xiangrui Meng Committed: Tue Aug 4 22:28:49 2015 -0700 -- .../spark/mllib/fpm/LocalPrefixSpan.scala | 132 +++-- .../org/apache/spark/mllib/fpm/PrefixSpan.scala | 587 --- .../spark/mllib/fpm/PrefixSpanSuite.scala | 271 + 3 files changed, 599 insertions(+), 391 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a02bcf20/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala index ccebf95..3ea1077 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/fpm/LocalPrefixSpan.scala @@ -22,85 +22,89 @@ import scala.collection.mutable import org.apache.spark.Logging /** - * Calculate all patterns of a projected database in local. + * Calculate all patterns of a projected database in local mode. + * + * @param minCount minimal count for a frequent pattern + * @param maxPatternLength max pattern length for a frequent pattern */ -private[fpm] object LocalPrefixSpan extends Logging with Serializable { - import PrefixSpan._ +private[fpm] class LocalPrefixSpan( +val minCount: Long, +val maxPatternLength: Int) extends Logging with Serializable { + import PrefixSpan.Postfix + import LocalPrefixSpan.ReversedPrefix + /** - * Calculate all patterns of a projected database. 
- * @param minCount minimum count - * @param maxPatternLength maximum pattern length - * @param prefixes prefixes in reversed order - * @param database the projected database - * @return a set of sequential pattern pairs, - * the key of pair is sequential pattern (a list of items in reversed order), - * the value of pair is the pattern's count. + * Generates frequent patterns on the input array of postfixes. + * @param postfixes an array of postfixes + * @return an iterator of (frequent pattern, count) */ - def run( - minCount: Long, - maxPatternLength: Int, - prefixes: List[Set[Int]], - database: Iterable[List[Set[Int]]]): Iterator[(List[Set[Int]], Long)] = { -if (prefixes.length == maxPatternLength || database.isEmpty) { - return Iterator.empty -} -val freqItemSetsAndCounts = getFreqItemAndCounts(minCount, database) -val freqItems = freqItemSetsAndCounts.keys.flatten.toSet -val filteredDatabase = database.map { suffix => - suffix -.map(item => freqItems.intersect(item)) -.filter(_.nonEmpty) -} -freqItemSetsAndCounts.iterator.flatMap { case (item, count) => - val newPrefixes = item :: prefixes - val newProjected = project(filteredDatabase, item) - Iterator.single((newPrefixes, count)) ++ -run(minCount, maxPatternLength, newPrefixes, newProjected) + def run(postfixe
spark git commit: Update docs/README.md to put all prereqs together.
Repository: spark Updated Branches: refs/heads/master d34bac0e1 -> f7abd6bec Update docs/README.md to put all prereqs together. This pull request groups all the prereq requirements into a single section. cc srowen shivaram Author: Reynold Xin Closes #7951 from rxin/readme-docs and squashes the following commits: ab7ded0 [Reynold Xin] Updated docs/README.md to put all prereqs together. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f7abd6be Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f7abd6be Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f7abd6be Branch: refs/heads/master Commit: f7abd6bec9d51ed4ab6359e50eac853e64ecae86 Parents: d34bac0 Author: Reynold Xin Authored: Tue Aug 4 22:17:14 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 22:17:14 2015 -0700 -- docs/README.md | 43 --- 1 file changed, 8 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f7abd6be/docs/README.md -- diff --git a/docs/README.md b/docs/README.md index 5020989..1f4fd3e 100644 --- a/docs/README.md +++ b/docs/README.md @@ -9,12 +9,13 @@ documentation yourself. Why build it yourself? So that you have the docs that co whichever version of Spark you currently have checked out of revision control. ## Prerequisites -The Spark documenation build uses a number of tools to build HTML docs and API docs in Scala, Python -and R. To get started you can run the following commands +The Spark documentation build uses a number of tools to build HTML docs and API docs in Scala, +Python and R. To get started you can run the following commands $ sudo gem install jekyll $ sudo gem install jekyll-redirect-from $ sudo pip install Pygments +$ sudo pip install sphinx $ Rscript -e 'install.packages(c("knitr", "devtools"), repos="http://cran.stat.ucla.edu/";)' @@ -29,17 +30,12 @@ you have checked out or downloaded. In this directory you will find textfiles formatted using Markdown, with an ".md" suffix. You can read those text files directly if you want. Start with index.md. -The markdown code can be compiled to HTML using the [Jekyll tool](http://jekyllrb.com). -`Jekyll` and a few dependencies must be installed for this to work. We recommend -installing via the Ruby Gem dependency manager. Since the exact HTML output -varies between versions of Jekyll and its dependencies, we list specific versions here -in some cases: +Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with +Jekyll will create a directory called `_site` containing index.html as well as the rest of the +compiled files. -$ sudo gem install jekyll -$ sudo gem install jekyll-redirect-from - -Execute `jekyll build` from the `docs/` directory to compile the site. Compiling the site with Jekyll will create a directory -called `_site` containing index.html as well as the rest of the compiled files. +$ cd docs +$ jekyll build You can modify the default Jekyll build as follows: @@ -50,29 +46,6 @@ You can modify the default Jekyll build as follows: # Build the site with extra features used on the live page $ PRODUCTION=1 jekyll build -## Pygments - -We also use pygments (http://pygments.org) for syntax highlighting in documentation markdown pages, -so you will also need to install that (it requires Python) by running `sudo pip install Pygments`. 
- -To mark a block of code in your markdown to be syntax highlighted by jekyll during the compile -phase, use the following sytax: - -{% highlight scala %} -// Your scala code goes here, you can replace scala with many other -// supported languages too. -{% endhighlight %} - -## Sphinx - -We use Sphinx to generate Python API docs, so you will need to install it by running -`sudo pip install sphinx`. - -## knitr, devtools - -SparkR documentation is written using `roxygen2` and we use `knitr`, `devtools` to generate -documentation. To install these packages you can run `install.packages(c("knitr", "devtools"))` from a -R console. ## API Docs (Scaladoc, Sphinx, roxygen2) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-9504] [STREAMING] [TESTS] Fix o.a.s.streaming.StreamingContextSuite.stop gracefully again
Repository: spark Updated Branches: refs/heads/branch-1.5 d196d3607 -> 6e72d24e2 [SPARK-9504] [STREAMING] [TESTS] Fix o.a.s.streaming.StreamingContextSuite.stop gracefully again The test failure is here: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3150/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/junit/org.apache.spark.streaming/StreamingContextSuite/stop_gracefully/ There is a race condition in TestReceiver that it may add 1 record and increase `TestReceiver.counter` after stopping `BlockGenerator`. This PR just adds `join` to wait the pushing thread. Author: zsxwing Closes #7934 from zsxwing/SPARK-9504-2 and squashes the following commits: cfd7973 [zsxwing] Wait for the thread to make sure we won't change TestReceiver.counter after stopping BlockGenerator (cherry picked from commit d34bac0e156432ca6a260db73dbe1318060e309c) Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6e72d24e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6e72d24e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6e72d24e Branch: refs/heads/branch-1.5 Commit: 6e72d24e24d0e6d140e1b25597edae3f3054f98a Parents: d196d36 Author: zsxwing Authored: Tue Aug 4 20:09:15 2015 -0700 Committer: Tathagata Das Committed: Tue Aug 4 20:09:27 2015 -0700 -- .../scala/org/apache/spark/streaming/StreamingContextSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6e72d24e/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala -- diff --git a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala index b7db280..7423ef6 100644 --- a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala +++ b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala @@ -789,7 +789,8 @@ class TestReceiver extends Receiver[Int](StorageLevel.MEMORY_ONLY) with Logging } def onStop() { -// no clean to be done, the receiving thread should stop on it own +// no clean to be done, the receiving thread should stop on it own, so just wait for it. +receivingThreadOption.foreach(_.join()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
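The essence of the fix is that stop() must not return while the receiver's pushing thread can still touch shared state; joining the thread closes that window. A generic stand-alone sketch of the pattern (class and field names are illustrative, not taken from the Spark test):

    // Sketch only: without the join() in stop(), the worker thread could still
    // increment `counter` after stop() returns — the race described above.
    class Worker {
      @volatile private var running = true
      private var threadOption: Option[Thread] = None
      var counter = 0

      def start(): Unit = {
        val t = new Thread {
          override def run(): Unit = {
            while (running) { counter += 1; Thread.sleep(10) }
          }
        }
        t.start()
        threadOption = Some(t)
      }

      def stop(): Unit = {
        running = false
        // Same idea as receivingThreadOption.foreach(_.join()) in the patch:
        // join() guarantees the thread has finished before stop() returns.
        threadOption.foreach(_.join())
      }
    }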
spark git commit: [SPARK-9504] [STREAMING] [TESTS] Fix o.a.s.streaming.StreamingContextSuite.stop gracefully again
Repository: spark Updated Branches: refs/heads/master 2b67fdb60 -> d34bac0e1 [SPARK-9504] [STREAMING] [TESTS] Fix o.a.s.streaming.StreamingContextSuite.stop gracefully again The test failure is here: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/3150/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=centos/testReport/junit/org.apache.spark.streaming/StreamingContextSuite/stop_gracefully/ There is a race condition in TestReceiver that it may add 1 record and increase `TestReceiver.counter` after stopping `BlockGenerator`. This PR just adds `join` to wait the pushing thread. Author: zsxwing Closes #7934 from zsxwing/SPARK-9504-2 and squashes the following commits: cfd7973 [zsxwing] Wait for the thread to make sure we won't change TestReceiver.counter after stopping BlockGenerator Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d34bac0e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d34bac0e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d34bac0e Branch: refs/heads/master Commit: d34bac0e156432ca6a260db73dbe1318060e309c Parents: 2b67fdb Author: zsxwing Authored: Tue Aug 4 20:09:15 2015 -0700 Committer: Tathagata Das Committed: Tue Aug 4 20:09:15 2015 -0700 -- .../scala/org/apache/spark/streaming/StreamingContextSuite.scala | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d34bac0e/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala -- diff --git a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala index b7db280..7423ef6 100644 --- a/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala +++ b/streaming/src/test/scala/org/apache/spark/streaming/StreamingContextSuite.scala @@ -789,7 +789,8 @@ class TestReceiver extends Receiver[Int](StorageLevel.MEMORY_ONLY) with Logging } def onStop() { -// no clean to be done, the receiving thread should stop on it own +// no clean to be done, the receiving thread should stop on it own, so just wait for it. +receivingThreadOption.foreach(_.join()) } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions
Repository: spark Updated Branches: refs/heads/branch-1.5 f957c59b3 -> d196d3607 [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions This adds Python API for those DataFrame functions that is introduced in 1.5. There is issue with serialize byte_array in Python 3, so some of functions (for BinaryType) does not have tests. cc rxin Author: Davies Liu Closes #7922 from davies/python_functions and squashes the following commits: 8ad942f [Davies Liu] fix test 5fb6ec3 [Davies Liu] fix bugs 3495ed3 [Davies Liu] fix issues ea5f7bb [Davies Liu] Add python API for DataFrame functions (cherry picked from commit 2b67fdb60be95778e016efae4f0a9cdf2fbfe779) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d196d360 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d196d360 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d196d360 Branch: refs/heads/branch-1.5 Commit: d196d3607c26621062369fbe283597f76b85b11c Parents: f957c59 Author: Davies Liu Authored: Tue Aug 4 19:25:24 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 19:25:35 2015 -0700 -- python/pyspark/sql/functions.py | 885 +-- .../scala/org/apache/spark/sql/functions.scala | 80 +- 2 files changed, 628 insertions(+), 337 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d196d360/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index a73ecc7..e65b14d 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -32,41 +32,6 @@ from pyspark.sql.types import StringType from pyspark.sql.column import Column, _to_java_column, _to_seq -__all__ = [ -'array', -'approxCountDistinct', -'bin', -'coalesce', -'countDistinct', -'explode', -'format_number', -'length', -'log2', -'md5', -'monotonicallyIncreasingId', -'rand', -'randn', -'regexp_extract', -'regexp_replace', -'sha1', -'sha2', -'size', -'sort_array', -'sparkPartitionId', -'struct', -'udf', -'when'] - -__all__ += ['lag', 'lead', 'ntile'] - -__all__ += [ -'date_format', 'date_add', 'date_sub', 'add_months', 'months_between', -'year', 'quarter', 'month', 'hour', 'minute', 'second', -'dayofmonth', 'dayofyear', 'weekofyear'] - -__all__ += ['soundex', 'substring', 'substring_index'] - - def _create_function(name, doc=""): """ Create a function for aggregator by name""" def _(col): @@ -208,30 +173,6 @@ for _name, _doc in _binary_mathfunctions.items(): for _name, _doc in _window_functions.items(): globals()[_name] = since(1.4)(_create_window_function(_name, _doc)) del _name, _doc -__all__ += _functions.keys() -__all__ += _functions_1_4.keys() -__all__ += _binary_mathfunctions.keys() -__all__ += _window_functions.keys() -__all__.sort() - - -@since(1.4) -def array(*cols): -"""Creates a new array column. - -:param cols: list of column names (string) or list of :class:`Column` expressions that have -the same data type. 
- ->>> df.select(array('age', 'age').alias("arr")).collect() -[Row(arr=[2, 2]), Row(arr=[5, 5])] ->>> df.select(array([df.age, df.age]).alias("arr")).collect() -[Row(arr=[2, 2]), Row(arr=[5, 5])] -""" -sc = SparkContext._active_spark_context -if len(cols) == 1 and isinstance(cols[0], (list, set)): -cols = cols[0] -jc = sc._jvm.functions.array(_to_seq(sc, cols, _to_java_column)) -return Column(jc) @since(1.3) @@ -249,19 +190,6 @@ def approxCountDistinct(col, rsd=None): return Column(jc) -@ignore_unicode_prefix -@since(1.5) -def bin(col): -"""Returns the string representation of the binary value of the given column. - ->>> df.select(bin(df.age).alias('c')).collect() -[Row(c=u'10'), Row(c=u'101')] -""" -sc = SparkContext._active_spark_context -jc = sc._jvm.functions.bin(_to_java_column(col)) -return Column(jc) - - @since(1.4) def coalesce(*cols): """Returns the first column that is not null. @@ -315,82 +243,6 @@ def countDistinct(col, *cols): @since(1.4) -def explode(col): -"""Returns a new row for each element in the given array or map. - ->>> from pyspark.sql import Row ->>> eDF = sqlContext.createDataFrame([Row(a=1, intlist=[1,2,3], mapfield={"a": "b"})]) ->>> eDF.select(explode(eDF.intlist).alias("anInt")).collect() -[Row(anInt=1), Row(anInt=2), Row(anInt=3)] - ->>> eDF.select(explode(eDF.mapfield).alias("key", "value")).show() -+---+-+ -|key|value| -+---+-+ -| a|b| -+---+-+ -""" -sc = SparkContext._active_spark_context
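The Python wrappers in this patch delegate to the JVM-side functions object through py4j (the `sc._jvm.functions` calls visible in the diff). A hedged Scala sketch of the corresponding calls, mirroring the doctests shown in the diff; the SQLContext name and sample rows are assumptions:

    import org.apache.spark.sql.functions.{array, bin}

    // Sketch only: the Scala-side counterparts that the new Python API wraps.
    val df = sqlContext.createDataFrame(Seq((2, "Alice"), (5, "Bob"))).toDF("age", "name")

    df.select(bin(df("age")).alias("c")).show()                 // "10", "101"
    df.select(array(df("age"), df("age")).alias("arr")).show()  // [2,2], [5,5]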
spark git commit: [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions
Repository: spark Updated Branches: refs/heads/master 6f8f0e265 -> 2b67fdb60 [SPARK-9513] [SQL] [PySpark] Add python API for DataFrame functions This adds Python API for those DataFrame functions that is introduced in 1.5. There is issue with serialize byte_array in Python 3, so some of functions (for BinaryType) does not have tests. cc rxin Author: Davies Liu Closes #7922 from davies/python_functions and squashes the following commits: 8ad942f [Davies Liu] fix test 5fb6ec3 [Davies Liu] fix bugs 3495ed3 [Davies Liu] fix issues ea5f7bb [Davies Liu] Add python API for DataFrame functions Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2b67fdb6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2b67fdb6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2b67fdb6 Branch: refs/heads/master Commit: 2b67fdb60be95778e016efae4f0a9cdf2fbfe779 Parents: 6f8f0e2 Author: Davies Liu Authored: Tue Aug 4 19:25:24 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 19:25:24 2015 -0700 -- python/pyspark/sql/functions.py | 885 +-- .../scala/org/apache/spark/sql/functions.scala | 80 +- 2 files changed, 628 insertions(+), 337 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2b67fdb6/python/pyspark/sql/functions.py -- diff --git a/python/pyspark/sql/functions.py b/python/pyspark/sql/functions.py index a73ecc7..e65b14d 100644 --- a/python/pyspark/sql/functions.py +++ b/python/pyspark/sql/functions.py @@ -32,41 +32,6 @@ from pyspark.sql.types import StringType from pyspark.sql.column import Column, _to_java_column, _to_seq -__all__ = [ -'array', -'approxCountDistinct', -'bin', -'coalesce', -'countDistinct', -'explode', -'format_number', -'length', -'log2', -'md5', -'monotonicallyIncreasingId', -'rand', -'randn', -'regexp_extract', -'regexp_replace', -'sha1', -'sha2', -'size', -'sort_array', -'sparkPartitionId', -'struct', -'udf', -'when'] - -__all__ += ['lag', 'lead', 'ntile'] - -__all__ += [ -'date_format', 'date_add', 'date_sub', 'add_months', 'months_between', -'year', 'quarter', 'month', 'hour', 'minute', 'second', -'dayofmonth', 'dayofyear', 'weekofyear'] - -__all__ += ['soundex', 'substring', 'substring_index'] - - def _create_function(name, doc=""): """ Create a function for aggregator by name""" def _(col): @@ -208,30 +173,6 @@ for _name, _doc in _binary_mathfunctions.items(): for _name, _doc in _window_functions.items(): globals()[_name] = since(1.4)(_create_window_function(_name, _doc)) del _name, _doc -__all__ += _functions.keys() -__all__ += _functions_1_4.keys() -__all__ += _binary_mathfunctions.keys() -__all__ += _window_functions.keys() -__all__.sort() - - -@since(1.4) -def array(*cols): -"""Creates a new array column. - -:param cols: list of column names (string) or list of :class:`Column` expressions that have -the same data type. - ->>> df.select(array('age', 'age').alias("arr")).collect() -[Row(arr=[2, 2]), Row(arr=[5, 5])] ->>> df.select(array([df.age, df.age]).alias("arr")).collect() -[Row(arr=[2, 2]), Row(arr=[5, 5])] -""" -sc = SparkContext._active_spark_context -if len(cols) == 1 and isinstance(cols[0], (list, set)): -cols = cols[0] -jc = sc._jvm.functions.array(_to_seq(sc, cols, _to_java_column)) -return Column(jc) @since(1.3) @@ -249,19 +190,6 @@ def approxCountDistinct(col, rsd=None): return Column(jc) -@ignore_unicode_prefix -@since(1.5) -def bin(col): -"""Returns the string representation of the binary value of the given column. 
- ->>> df.select(bin(df.age).alias('c')).collect() -[Row(c=u'10'), Row(c=u'101')] -""" -sc = SparkContext._active_spark_context -jc = sc._jvm.functions.bin(_to_java_column(col)) -return Column(jc) - - @since(1.4) def coalesce(*cols): """Returns the first column that is not null. @@ -315,82 +243,6 @@ def countDistinct(col, *cols): @since(1.4) -def explode(col): -"""Returns a new row for each element in the given array or map. - ->>> from pyspark.sql import Row ->>> eDF = sqlContext.createDataFrame([Row(a=1, intlist=[1,2,3], mapfield={"a": "b"})]) ->>> eDF.select(explode(eDF.intlist).alias("anInt")).collect() -[Row(anInt=1), Row(anInt=2), Row(anInt=3)] - ->>> eDF.select(explode(eDF.mapfield).alias("key", "value")).show() -+---+-+ -|key|value| -+---+-+ -| a|b| -+---+-+ -""" -sc = SparkContext._active_spark_context -jc = sc._jvm.functions.explode(_to_java_column(col)) -return Column(jc) - - -@ignore_unicode_pref
spark git commit: [SPARK-7119] [SQL] Give script a default serde with the user specific types
Repository: spark Updated Branches: refs/heads/branch-1.5 11d231159 -> f957c59b3 [SPARK-7119] [SQL] Give script a default serde with the user specific types This is to address this issue that there would be not compatible type exception when running this: `from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;` 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) chenghao-intel marmbrus Author: zhichao.li Closes #6638 from zhichao-li/transDataType2 and squashes the following commits: a36cc7c [zhichao.li] style b9252a8 [zhichao.li] delete cacheRow f6968a4 [zhichao.li] give script a default serde (cherry picked from commit 6f8f0e265a29e89bd5192a8d5217cba19f0875da) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f957c59b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f957c59b Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f957c59b Branch: refs/heads/branch-1.5 Commit: f957c59b3f7f8851103bb1e36d053dc1402ebb0c Parents: 11d2311 Author: zhichao.li Authored: Tue Aug 4 18:26:05 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 
4 18:26:18 2015 -0700 -- .../org/apache/spark/sql/hive/HiveQl.scala | 3 +- .../hive/execution/ScriptTransformation.scala | 96 .../sql/hive/execution/SQLQuerySuite.scala | 10 ++ 3 files changed, 49 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f957c59b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala index e2fdfc6..f43e403 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala @@ -21,6 +21,7 @@ import java.sql.Date import java.util.Locale import org.apache.hadoop.hive.conf.HiveConf +import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.serde.serdeConstants import org.apache.hadoop.hive.ql.{ErrorMsg, Con
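With the default serde in place, the transform output columns honor their declared types, so the query from the description no longer fails with the UTF8String-to-Integer cast error. A hedged sketch of running it (assumes a HiveContext named hiveContext and the usual Hive `src(key INT, value STRING)` test table):

    // Sketch only: the query from the JIRA description, issued through HiveContext.
    val result = hiveContext.sql("""
      FROM (
        FROM src SELECT TRANSFORM(key, value) USING 'cat' AS (thing1 INT, thing2 STRING)
      ) t
      SELECT thing1 + 2
    """)
    // thing1 now comes back as an Int, so the arithmetic succeeds.
    result.show()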
spark git commit: [SPARK-7119] [SQL] Give script a default serde with the user specific types
Repository: spark Updated Branches: refs/heads/master c9a4c36d0 -> 6f8f0e265 [SPARK-7119] [SQL] Give script a default serde with the user specific types This is to address this issue that there would be not compatible type exception when running this: `from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t select thing1 + 2;` 15/04/24 00:58:55 ERROR CliDriver: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.ClassCastException: org.apache.spark.sql.types.UTF8String cannot be cast to java.lang.Integer at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) at scala.math.Numeric$IntIsIntegral$.plus(Numeric.scala:57) at org.apache.spark.sql.catalyst.expressions.Add.eval(arithmetic.scala:127) at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:118) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:68) at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(Projection.scala:52) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$$anon$11.next(Iterator.scala:328) at scala.collection.Iterator$class.foreach(Iterator.scala:727) at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103) at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47) at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273) at scala.collection.AbstractIterator.to(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265) at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157) at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252) at scala.collection.AbstractIterator.toArray(Iterator.scala:1157) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:819) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1618) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:63) at org.apache.spark.scheduler.Task.run(Task.scala:64) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:209) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) chenghao-intel marmbrus Author: zhichao.li Closes #6638 from zhichao-li/transDataType2 and squashes the following commits: a36cc7c [zhichao.li] style b9252a8 [zhichao.li] delete cacheRow f6968a4 [zhichao.li] give script a default serde Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6f8f0e26 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6f8f0e26 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6f8f0e26 Branch: refs/heads/master Commit: 6f8f0e265a29e89bd5192a8d5217cba19f0875da Parents: c9a4c36 Author: zhichao.li Authored: Tue Aug 4 18:26:05 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 4 18:26:05 2015 -0700 -- .../org/apache/spark/sql/hive/HiveQl.scala | 3 +- 
.../hive/execution/ScriptTransformation.scala | 96 .../sql/hive/execution/SQLQuerySuite.scala | 10 ++ 3 files changed, 49 insertions(+), 60 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6f8f0e26/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala -- diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala index e2fdfc6..f43e403 100644 --- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala +++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala @@ -21,6 +21,7 @@ import java.sql.Date import java.util.Locale import org.apache.hadoop.hive.conf.HiveConf +import org.apache.hadoop.hive.conf.HiveConf.ConfVars import org.apache.hadoop.hive.serde.serdeConstants import org.apache.hadoop.hive.ql.{ErrorMsg, Context} import org.apache.hadoop.hive.ql.exec.{FunctionRegistry, FunctionInfo} @@ -907,7 +908,7 @@ https://cwik
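For readers who want to exercise the code path this commit fixes: the failing statement is Hive's TRANSFORM clause issued through a HiveContext. A minimal Scala sketch (not part of the patch) follows; it assumes an existing SparkContext `sc` and a Hive table `src(key INT, value STRING)`.

    // Minimal repro sketch for SPARK-7119 (assumes `sc` and a Hive table `src(key, value)`).
    import org.apache.spark.sql.hive.HiveContext

    val hiveContext = new HiveContext(sc)

    // Before this patch the untyped script output came back as UTF8String, so the
    // `thing1 + 2` projection failed with the ClassCastException quoted above; with a
    // default serde the declared output types (int, string) are honored.
    val df = hiveContext.sql(
      "from (from src select transform(key, value) using 'cat' as (thing1 int, thing2 string)) t " +
      "select thing1 + 2")
    df.show()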
spark git commit: [SPARK-8313] R Spark packages support
Repository: spark Updated Branches: refs/heads/branch-1.5 02a6333d2 -> 11d231159 [SPARK-8313] R Spark packages support shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit. For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`. Author: Burak Yavuz Closes #7139 from brkyvz/r-submit and squashes the following commits: 0de384f [Burak Yavuz] remove unused imports 2 d253708 [Burak Yavuz] removed unused imports 6603d0d [Burak Yavuz] addressed comments 4258ffe [Burak Yavuz] merged master ddfcc06 [Burak Yavuz] added zipping test 3a1be7d [Burak Yavuz] don't zip 77995df [Burak Yavuz] fix URI ac45527 [Burak Yavuz] added zipping of all libs e6bf7b0 [Burak Yavuz] add println ignores 1bc5554 [Burak Yavuz] add assumes for tests 9778e03 [Burak Yavuz] addressed comments b42b300 [Burak Yavuz] merged master ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit d867756 [Burak Yavuz] add apache header eff5ba1 [Burak Yavuz] ready for review 8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit e5b5a06 [Burak Yavuz] added doc bb751ce [Burak Yavuz] fix null bug 0226768 [Burak Yavuz] fixed issues 8810beb [Burak Yavuz] R packages support (cherry picked from commit c9a4c36d052456c2dd1f7e0a871c6b764b5064d2) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/11d23115 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/11d23115 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/11d23115 Branch: refs/heads/branch-1.5 Commit: 11d2311593587a52ee5015fb0ffd6403ea1138b0 Parents: 02a6333 Author: Burak Yavuz Authored: Tue Aug 4 18:20:12 2015 -0700 Committer: Shivaram Venkataraman Committed: Tue Aug 4 18:20:20 2015 -0700 -- R/install-dev.sh| 4 - R/pkg/inst/tests/packageInAJarTest.R| 30 +++ .../scala/org/apache/spark/api/r/RUtils.scala | 14 +- .../org/apache/spark/deploy/RPackageUtils.scala | 232 +++ .../org/apache/spark/deploy/SparkSubmit.scala | 11 +- .../spark/deploy/SparkSubmitArguments.scala | 1 - .../org/apache/spark/deploy/IvyTestUtils.scala | 101 ++-- .../spark/deploy/RPackageUtilsSuite.scala | 156 + .../apache/spark/deploy/SparkSubmitSuite.scala | 24 ++ 9 files changed, 538 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/11d23115/R/install-dev.sh -- diff --git a/R/install-dev.sh b/R/install-dev.sh index 4972bb9..59d98c9 100755 --- a/R/install-dev.sh +++ b/R/install-dev.sh @@ -42,8 +42,4 @@ Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtoo # Install SparkR to $LIB_DIR R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/ -# Zip the SparkR package so that it can be distributed to worker nodes on YARN -cd $LIB_DIR -jar cfM "$LIB_DIR/sparkr.zip" SparkR - popd > /dev/null http://git-wip-us.apache.org/repos/asf/spark/blob/11d23115/R/pkg/inst/tests/packageInAJarTest.R -- diff --git a/R/pkg/inst/tests/packageInAJarTest.R b/R/pkg/inst/tests/packageInAJarTest.R new file mode 100644 index 000..207a37a --- /dev/null +++ b/R/pkg/inst/tests/packageInAJarTest.R @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +library(SparkR) +library(sparkPackageTest) + +sc <- sparkR.init() + +run1 <- myfunc(5L) + +run2 <- myfunc(-4L) + +sparkR.stop() + +if(run1 != 6) quit(save = "no", status = 1) + +if(run2 != -3) quit(save = "no", status = 1) http://git-wip-us.apache.org/repos/asf/spark/blob/11d23115/core/src/main/scala/org/apache/spark/api/r/RUtils.scala --
spark git commit: [SPARK-8313] R Spark packages support
Repository: spark Updated Branches: refs/heads/master a7fe48f68 -> c9a4c36d0 [SPARK-8313] R Spark packages support shivaram cafreeman Could you please help me in testing this out? Exposing and running `rPackageBuilder` from inside the shell works, but for some reason, I can't get it to work during Spark Submit. It just starts relaunching Spark Submit. For testing, you may use the R branch with [sbt-spark-package](https://github.com/databricks/sbt-spark-package). You can call spPackage, and then pass the jar using `--jars`. Author: Burak Yavuz Closes #7139 from brkyvz/r-submit and squashes the following commits: 0de384f [Burak Yavuz] remove unused imports 2 d253708 [Burak Yavuz] removed unused imports 6603d0d [Burak Yavuz] addressed comments 4258ffe [Burak Yavuz] merged master ddfcc06 [Burak Yavuz] added zipping test 3a1be7d [Burak Yavuz] don't zip 77995df [Burak Yavuz] fix URI ac45527 [Burak Yavuz] added zipping of all libs e6bf7b0 [Burak Yavuz] add println ignores 1bc5554 [Burak Yavuz] add assumes for tests 9778e03 [Burak Yavuz] addressed comments b42b300 [Burak Yavuz] merged master ffd134e [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit d867756 [Burak Yavuz] add apache header eff5ba1 [Burak Yavuz] ready for review 8838edb [Burak Yavuz] Merge branch 'master' of github.com:apache/spark into r-submit e5b5a06 [Burak Yavuz] added doc bb751ce [Burak Yavuz] fix null bug 0226768 [Burak Yavuz] fixed issues 8810beb [Burak Yavuz] R packages support Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c9a4c36d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c9a4c36d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c9a4c36d Branch: refs/heads/master Commit: c9a4c36d052456c2dd1f7e0a871c6b764b5064d2 Parents: a7fe48f Author: Burak Yavuz Authored: Tue Aug 4 18:20:12 2015 -0700 Committer: Shivaram Venkataraman Committed: Tue Aug 4 18:20:12 2015 -0700 -- R/install-dev.sh| 4 - R/pkg/inst/tests/packageInAJarTest.R| 30 +++ .../scala/org/apache/spark/api/r/RUtils.scala | 14 +- .../org/apache/spark/deploy/RPackageUtils.scala | 232 +++ .../org/apache/spark/deploy/SparkSubmit.scala | 11 +- .../spark/deploy/SparkSubmitArguments.scala | 1 - .../org/apache/spark/deploy/IvyTestUtils.scala | 101 ++-- .../spark/deploy/RPackageUtilsSuite.scala | 156 + .../apache/spark/deploy/SparkSubmitSuite.scala | 24 ++ 9 files changed, 538 insertions(+), 35 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c9a4c36d/R/install-dev.sh -- diff --git a/R/install-dev.sh b/R/install-dev.sh index 4972bb9..59d98c9 100755 --- a/R/install-dev.sh +++ b/R/install-dev.sh @@ -42,8 +42,4 @@ Rscript -e ' if("devtools" %in% rownames(installed.packages())) { library(devtoo # Install SparkR to $LIB_DIR R CMD INSTALL --library=$LIB_DIR $FWDIR/pkg/ -# Zip the SparkR package so that it can be distributed to worker nodes on YARN -cd $LIB_DIR -jar cfM "$LIB_DIR/sparkr.zip" SparkR - popd > /dev/null http://git-wip-us.apache.org/repos/asf/spark/blob/c9a4c36d/R/pkg/inst/tests/packageInAJarTest.R -- diff --git a/R/pkg/inst/tests/packageInAJarTest.R b/R/pkg/inst/tests/packageInAJarTest.R new file mode 100644 index 000..207a37a --- /dev/null +++ b/R/pkg/inst/tests/packageInAJarTest.R @@ -0,0 +1,30 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. 
+# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +#http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +library(SparkR) +library(sparkPackageTest) + +sc <- sparkR.init() + +run1 <- myfunc(5L) + +run2 <- myfunc(-4L) + +sparkR.stop() + +if(run1 != 6) quit(save = "no", status = 1) + +if(run2 != -3) quit(save = "no", status = 1) http://git-wip-us.apache.org/repos/asf/spark/blob/c9a4c36d/core/src/main/scala/org/apache/spark/api/r/RUtils.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/r/RUt
spark git commit: [SPARK-9432][SQL] Audit expression unit tests to make sure we pass the proper numeric ranges
Repository: spark Updated Branches: refs/heads/branch-1.5 2237ddbe0 -> 02a6333d2 [SPARK-9432][SQL] Audit expression unit tests to make sure we pass the proper numeric ranges JIRA: https://issues.apache.org/jira/browse/SPARK-9432 Author: Yijie Shen Closes #7933 from yjshen/numeric_ranges and squashes the following commits: e719f78 [Yijie Shen] proper integral range check (cherry picked from commit a7fe48f68727d5c0247698cff329fb12faff1d50) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/02a6333d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/02a6333d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/02a6333d Branch: refs/heads/branch-1.5 Commit: 02a6333d23345275fba90a3ffa66dc3c3c0e0ff0 Parents: 2237ddb Author: Yijie Shen Authored: Tue Aug 4 18:19:26 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 18:19:34 2015 -0700 -- .../expressions/ArithmeticExpressionSuite.scala | 51 ++-- .../expressions/BitwiseFunctionsSuite.scala | 21 .../expressions/DateExpressionsSuite.scala | 14 ++ .../expressions/IntegralLiteralTestUtils.scala | 42 .../expressions/MathFunctionsSuite.scala| 40 +++ 5 files changed, 164 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/02a6333d/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala index 0bae8fe..a1f15e4 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala @@ -23,6 +23,8 @@ import org.apache.spark.sql.types._ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { + import IntegralLiteralTestUtils._ + /** * Runs through the testFunc for all numeric data types. 
* @@ -47,6 +49,9 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Add(Literal.create(null, left.dataType), right), null) checkEvaluation(Add(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Add(positiveShortLit, negativeShortLit), -1.toShort) +checkEvaluation(Add(positiveIntLit, negativeIntLit), -1) +checkEvaluation(Add(positiveLongLit, negativeLongLit), -1L) } test("- (UnaryMinus)") { @@ -60,6 +65,12 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(UnaryMinus(Literal(Int.MinValue)), Int.MinValue) checkEvaluation(UnaryMinus(Literal(Short.MinValue)), Short.MinValue) checkEvaluation(UnaryMinus(Literal(Byte.MinValue)), Byte.MinValue) +checkEvaluation(UnaryMinus(positiveShortLit), (- positiveShort).toShort) +checkEvaluation(UnaryMinus(negativeShortLit), (- negativeShort).toShort) +checkEvaluation(UnaryMinus(positiveIntLit), - positiveInt) +checkEvaluation(UnaryMinus(negativeIntLit), - negativeInt) +checkEvaluation(UnaryMinus(positiveLongLit), - positiveLong) +checkEvaluation(UnaryMinus(negativeLongLit), - negativeLong) } test("- (Minus)") { @@ -70,6 +81,10 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Subtract(Literal.create(null, left.dataType), right), null) checkEvaluation(Subtract(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Subtract(positiveShortLit, negativeShortLit), + (positiveShort - negativeShort).toShort) +checkEvaluation(Subtract(positiveIntLit, negativeIntLit), positiveInt - negativeInt) +checkEvaluation(Subtract(positiveLongLit, negativeLongLit), positiveLong - negativeLong) } test("* (Multiply)") { @@ -80,6 +95,10 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Multiply(Literal.create(null, left.dataType), right), null) checkEvaluation(Multiply(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Multiply(positiveShortLit, negativeShortLit), + (positiveShort * negativeShort).toShort) +checkEvaluation(Multiply(positiveIntLit, negativeIntLit), positiveInt * negativeInt) +checkEvaluation(Multiply(positiveLongLit, negativeLongLit), positiveLong * negativeLong) } test("/ (Divide) basic") { @@ -99,6 +118,9 @@ class ArithmeticExpressionSuite extends Sp
spark git commit: [SPARK-9432][SQL] Audit expression unit tests to make sure we pass the proper numeric ranges
Repository: spark Updated Branches: refs/heads/master d92fa1417 -> a7fe48f68 [SPARK-9432][SQL] Audit expression unit tests to make sure we pass the proper numeric ranges JIRA: https://issues.apache.org/jira/browse/SPARK-9432 Author: Yijie Shen Closes #7933 from yjshen/numeric_ranges and squashes the following commits: e719f78 [Yijie Shen] proper integral range check Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a7fe48f6 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a7fe48f6 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a7fe48f6 Branch: refs/heads/master Commit: a7fe48f68727d5c0247698cff329fb12faff1d50 Parents: d92fa14 Author: Yijie Shen Authored: Tue Aug 4 18:19:26 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 18:19:26 2015 -0700 -- .../expressions/ArithmeticExpressionSuite.scala | 51 ++-- .../expressions/BitwiseFunctionsSuite.scala | 21 .../expressions/DateExpressionsSuite.scala | 14 ++ .../expressions/IntegralLiteralTestUtils.scala | 42 .../expressions/MathFunctionsSuite.scala| 40 +++ 5 files changed, 164 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a7fe48f6/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala -- diff --git a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala index 0bae8fe..a1f15e4 100644 --- a/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala +++ b/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/ArithmeticExpressionSuite.scala @@ -23,6 +23,8 @@ import org.apache.spark.sql.types._ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper { + import IntegralLiteralTestUtils._ + /** * Runs through the testFunc for all numeric data types. 
* @@ -47,6 +49,9 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Add(Literal.create(null, left.dataType), right), null) checkEvaluation(Add(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Add(positiveShortLit, negativeShortLit), -1.toShort) +checkEvaluation(Add(positiveIntLit, negativeIntLit), -1) +checkEvaluation(Add(positiveLongLit, negativeLongLit), -1L) } test("- (UnaryMinus)") { @@ -60,6 +65,12 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(UnaryMinus(Literal(Int.MinValue)), Int.MinValue) checkEvaluation(UnaryMinus(Literal(Short.MinValue)), Short.MinValue) checkEvaluation(UnaryMinus(Literal(Byte.MinValue)), Byte.MinValue) +checkEvaluation(UnaryMinus(positiveShortLit), (- positiveShort).toShort) +checkEvaluation(UnaryMinus(negativeShortLit), (- negativeShort).toShort) +checkEvaluation(UnaryMinus(positiveIntLit), - positiveInt) +checkEvaluation(UnaryMinus(negativeIntLit), - negativeInt) +checkEvaluation(UnaryMinus(positiveLongLit), - positiveLong) +checkEvaluation(UnaryMinus(negativeLongLit), - negativeLong) } test("- (Minus)") { @@ -70,6 +81,10 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Subtract(Literal.create(null, left.dataType), right), null) checkEvaluation(Subtract(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Subtract(positiveShortLit, negativeShortLit), + (positiveShort - negativeShort).toShort) +checkEvaluation(Subtract(positiveIntLit, negativeIntLit), positiveInt - negativeInt) +checkEvaluation(Subtract(positiveLongLit, negativeLongLit), positiveLong - negativeLong) } test("* (Multiply)") { @@ -80,6 +95,10 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Multiply(Literal.create(null, left.dataType), right), null) checkEvaluation(Multiply(left, Literal.create(null, right.dataType)), null) } +checkEvaluation(Multiply(positiveShortLit, negativeShortLit), + (positiveShort * negativeShort).toShort) +checkEvaluation(Multiply(positiveIntLit, negativeIntLit), positiveInt * negativeInt) +checkEvaluation(Multiply(positiveLongLit, negativeLongLit), positiveLong * negativeLong) } test("/ (Divide) basic") { @@ -99,6 +118,9 @@ class ArithmeticExpressionSuite extends SparkFunSuite with ExpressionEvalHelper checkEvaluation(Divide(Literal(1.toShort), Literal(2.toShort))
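The audit hinges on the boundary constants imported from IntegralLiteralTestUtils. The diff only shows the expected results (for example, positiveShort + negativeShort evaluating to -1), so the sketch below is an assumption about what the helper object contains, chosen to be consistent with those expectations: each literal deliberately overflows the next-smaller integral type, so any accidental narrowing inside an expression changes the result and fails the test.

    // Hedged sketch of IntegralLiteralTestUtils; the exact values are assumptions that merely
    // satisfy the expectations visible in the diff (e.g. positiveShort + negativeShort == -1).
    import org.apache.spark.sql.catalyst.expressions.Literal

    object IntegralLiteralTestUtils {
      // Shorts that do not fit in a Byte
      val positiveShort: Short = (Byte.MaxValue + 1).toShort   // 128
      val negativeShort: Short = (Byte.MinValue - 1).toShort   // -129
      val positiveShortLit: Literal = Literal(positiveShort)
      val negativeShortLit: Literal = Literal(negativeShort)

      // Ints that do not fit in a Short
      val positiveInt: Int = Short.MaxValue + 1                // 32768
      val negativeInt: Int = Short.MinValue - 1                // -32769
      val positiveIntLit: Literal = Literal(positiveInt)
      val negativeIntLit: Literal = Literal(negativeInt)

      // Longs that do not fit in an Int
      val positiveLong: Long = Int.MaxValue + 1L               // 2147483648
      val negativeLong: Long = Int.MinValue - 1L               // -2147483649
      val positiveLongLit: Literal = Literal(positiveLong)
      val negativeLongLit: Literal = Literal(negativeLong)
    }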
spark git commit: [SPARK-8601] [ML] Add an option to disable standardization for linear regression
Repository: spark Updated Branches: refs/heads/master 629e26f7e -> d92fa1417 [SPARK-8601] [ML] Add an option to disable standardization for linear regression All compressed sensing applications, and some of the regression use-cases will have better result by turning the feature scaling off. However, if we implement this naively by training the dataset without doing any standardization, the rate of convergency will not be good. This can be implemented by still standardizing the training dataset but we penalize each component differently to get effectively the same objective function but a better numerical problem. As a result, for those columns with high variances, they will be penalized less, and vice versa. Without this, since all the features are standardized, so they will be penalized the same. In R, there is an option for this. standardize Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian". Note that the primary author for this PR is holdenk Author: Holden Karau Author: DB Tsai Closes #7875 from dbtsai/SPARK-8522 and squashes the following commits: e856036 [DB Tsai] scala doc 596e96c [DB Tsai] minor bbff347 [DB Tsai] naming baa0805 [DB Tsai] touch up d6234ba [DB Tsai] Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression 6b1dc09 [Holden Karau] Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression 332f140 [Holden Karau] Merge in master eebe10a [Holden Karau] Use same comparision operator throughout the test 3f92935 [Holden Karau] merge b83a41e [Holden Karau] Expand the tests and make them similar to the other PR also providing an option to disable standardization (but for LoR). 0c334a2 [Holden Karau] Remove extra line 99ce053 [Holden Karau] merge in master e54a8a9 [Holden Karau] Fix long line e47c574 [Holden Karau] Add support for L2 without standardization. 55d3a66 [Holden Karau] Add standardization param for linear regression 00a1dc5 [Holden Karau] Add the param to the linearregression impl Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d92fa141 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d92fa141 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d92fa141 Branch: refs/heads/master Commit: d92fa14179287c996407d9c7d249103109f9cdef Parents: 629e26f Author: Holden Karau Authored: Tue Aug 4 18:15:26 2015 -0700 Committer: Joseph K. 
Bradley Committed: Tue Aug 4 18:15:26 2015 -0700 -- .../ml/classification/LogisticRegression.scala | 6 +- .../spark/ml/regression/LinearRegression.scala | 70 - .../ml/regression/LinearRegressionSuite.scala | 278 ++- 3 files changed, 268 insertions(+), 86 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d92fa141/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala index c937b960..0d07383 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala @@ -133,9 +133,9 @@ class LogisticRegression(override val uid: String) /** * Whether to standardize the training features before fitting the model. * The coefficients of models will be always returned on the original scale, - * so it will be transparent for users. Note that when no regularization, - * with or without standardization, the models should be always converged to - * the same solution. + * so it will be transparent for users. Note that with/without standardization, + * the models should be always converged to the same solution when no regularization + * is applied. In R's GLMNET package, the default behavior is true as well. * Default is true. * @group setParam * */ http://git-wip-us.apache.org/repos/asf/spark/blob/d92fa141/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala index 3b85ba0..92d819b 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala +++ b
spark git commit: [SPARK-8601] [ML] Add an option to disable standardization for linear regression
Repository: spark Updated Branches: refs/heads/branch-1.5 335097548 -> 2237ddbe0 [SPARK-8601] [ML] Add an option to disable standardization for linear regression All compressed sensing applications, and some of the regression use-cases will have better result by turning the feature scaling off. However, if we implement this naively by training the dataset without doing any standardization, the rate of convergency will not be good. This can be implemented by still standardizing the training dataset but we penalize each component differently to get effectively the same objective function but a better numerical problem. As a result, for those columns with high variances, they will be penalized less, and vice versa. Without this, since all the features are standardized, so they will be penalized the same. In R, there is an option for this. standardize Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is standardize=TRUE. If variables are in the same units already, you might not wish to standardize. See details below for y standardization with family="gaussian". Note that the primary author for this PR is holdenk Author: Holden Karau Author: DB Tsai Closes #7875 from dbtsai/SPARK-8522 and squashes the following commits: e856036 [DB Tsai] scala doc 596e96c [DB Tsai] minor bbff347 [DB Tsai] naming baa0805 [DB Tsai] touch up d6234ba [DB Tsai] Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression 6b1dc09 [Holden Karau] Merge branch 'master' into SPARK-8522-Disable-Linear_featureScaling-Spark-8601-in-Linear_regression 332f140 [Holden Karau] Merge in master eebe10a [Holden Karau] Use same comparision operator throughout the test 3f92935 [Holden Karau] merge b83a41e [Holden Karau] Expand the tests and make them similar to the other PR also providing an option to disable standardization (but for LoR). 0c334a2 [Holden Karau] Remove extra line 99ce053 [Holden Karau] merge in master e54a8a9 [Holden Karau] Fix long line e47c574 [Holden Karau] Add support for L2 without standardization. 55d3a66 [Holden Karau] Add standardization param for linear regression 00a1dc5 [Holden Karau] Add the param to the linearregression impl (cherry picked from commit d92fa14179287c996407d9c7d249103109f9cdef) Signed-off-by: Joseph K. Bradley Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2237ddbe Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2237ddbe Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2237ddbe Branch: refs/heads/branch-1.5 Commit: 2237ddbe027be084afd85fc5b7a7c22270b6e7f6 Parents: 3350975 Author: Holden Karau Authored: Tue Aug 4 18:15:26 2015 -0700 Committer: Joseph K. 
Bradley Committed: Tue Aug 4 18:15:35 2015 -0700 -- .../ml/classification/LogisticRegression.scala | 6 +- .../spark/ml/regression/LinearRegression.scala | 70 - .../ml/regression/LinearRegressionSuite.scala | 278 ++- 3 files changed, 268 insertions(+), 86 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2237ddbe/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala index c937b960..0d07383 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala @@ -133,9 +133,9 @@ class LogisticRegression(override val uid: String) /** * Whether to standardize the training features before fitting the model. * The coefficients of models will be always returned on the original scale, - * so it will be transparent for users. Note that when no regularization, - * with or without standardization, the models should be always converged to - * the same solution. + * so it will be transparent for users. Note that with/without standardization, + * the models should be always converged to the same solution when no regularization + * is applied. In R's GLMNET package, the default behavior is true as well. * Default is true. * @group setParam * */ http://git-wip-us.apache.org/repos/asf/spark/blob/2237ddbe/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala b/mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala inde
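From a user's point of view the change is a single new param on LinearRegression. A hedged usage sketch follows; the setter name setStandardization is an assumption mirroring the LogisticRegression param shown in the diff, and `training` stands for a hypothetical DataFrame with "label" and "features" columns.

    // Hedged sketch: fit without feature standardization (setter name assumed; `training` hypothetical).
    import org.apache.spark.ml.regression.LinearRegression

    val lr = new LinearRegression()
      .setMaxIter(100)
      .setRegParam(0.1)
      .setElasticNetParam(0.0)     // pure L2: supported without standardization by this change
      .setStandardization(false)   // penalize coefficients on the original feature scale

    val model = lr.fit(training)
    println(model.weights)         // coefficients are always reported on the original scale

Internally the training data is still standardized for better conditioning; the per-column penalty is rescaled so the objective matches the unstandardized problem, which is exactly the trade-off the commit message describes.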
spark git commit: [SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy
Repository: spark Updated Branches: refs/heads/branch-1.5 1954a7bb1 -> 335097548 [SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy jkbradley Author: Feynman Liang Closes #7941 from feynmanliang/SPARK-9609-stategy-spelling and squashes the following commits: d2aafb1 [Feynman Liang] Add deprecated backwards compatibility aa090a8 [Feynman Liang] Fix spelling (cherry picked from commit 629e26f7ee916e70f59b017cb6083aa441b26b2c) Signed-off-by: Joseph K. Bradley Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/33509754 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/33509754 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/33509754 Branch: refs/heads/branch-1.5 Commit: 33509754843fe8eba303c720e6c0f6853b861e7e Parents: 1954a7b Author: Feynman Liang Authored: Tue Aug 4 18:13:18 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 18:13:27 2015 -0700 -- .../src/main/scala/org/apache/spark/ml/tree/treeParams.scala | 2 +- .../spark/mllib/tree/configuration/BoostingStrategy.scala| 2 +- .../org/apache/spark/mllib/tree/configuration/Strategy.scala | 8 ++-- 3 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/33509754/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala index e817090..dbd8d31 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala @@ -163,7 +163,7 @@ private[ml] trait DecisionTreeParams extends PredictorParams { oldAlgo: OldAlgo.Algo, oldImpurity: OldImpurity, subsamplingRate: Double): OldStrategy = { -val strategy = OldStrategy.defaultStategy(oldAlgo) +val strategy = OldStrategy.defaultStrategy(oldAlgo) strategy.impurity = oldImpurity strategy.checkpointInterval = getCheckpointInterval strategy.maxBins = getMaxBins http://git-wip-us.apache.org/repos/asf/spark/blob/33509754/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala index 9fd30c9..50fe2ac 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala @@ -90,7 +90,7 @@ object BoostingStrategy { * @return Configuration for boosting algorithm */ def defaultParams(algo: Algo): BoostingStrategy = { -val treeStrategy = Strategy.defaultStategy(algo) +val treeStrategy = Strategy.defaultStrategy(algo) treeStrategy.maxDepth = 3 algo match { case Algo.Classification => http://git-wip-us.apache.org/repos/asf/spark/blob/33509754/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala index ada227c..de2c784 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala @@ -178,14 +178,14 @@ object Strategy { * @param algo "Classification" or "Regression" */ def defaultStrategy(algo: String): Strategy = { 
-defaultStategy(Algo.fromString(algo)) +defaultStrategy(Algo.fromString(algo)) } /** * Construct a default set of parameters for [[org.apache.spark.mllib.tree.DecisionTree]] * @param algo Algo.Classification or Algo.Regression */ - def defaultStategy(algo: Algo): Strategy = algo match { + def defaultStrategy(algo: Algo): Strategy = algo match { case Algo.Classification => new Strategy(algo = Classification, impurity = Gini, maxDepth = 10, numClasses = 2) @@ -193,4 +193,8 @@ object Strategy { new Strategy(algo = Regression, impurity = Variance, maxDepth = 10, numClasses = 0) } + + @deprecated("Use Strategy.defaultStrategy instead.", "1.5.0") + def defaultStategy(algo: Algo): Strategy = defaultStrategy(algo) + }
spark git commit: [SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy
Repository: spark Updated Branches: refs/heads/master 7c8fc1f7c -> 629e26f7e [SPARK-9609] [MLLIB] Fix spelling of Strategy.defaultStrategy jkbradley Author: Feynman Liang Closes #7941 from feynmanliang/SPARK-9609-stategy-spelling and squashes the following commits: d2aafb1 [Feynman Liang] Add deprecated backwards compatibility aa090a8 [Feynman Liang] Fix spelling Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/629e26f7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/629e26f7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/629e26f7 Branch: refs/heads/master Commit: 629e26f7ee916e70f59b017cb6083aa441b26b2c Parents: 7c8fc1f Author: Feynman Liang Authored: Tue Aug 4 18:13:18 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 18:13:18 2015 -0700 -- .../src/main/scala/org/apache/spark/ml/tree/treeParams.scala | 2 +- .../spark/mllib/tree/configuration/BoostingStrategy.scala| 2 +- .../org/apache/spark/mllib/tree/configuration/Strategy.scala | 8 ++-- 3 files changed, 8 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/629e26f7/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala b/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala index e817090..dbd8d31 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala @@ -163,7 +163,7 @@ private[ml] trait DecisionTreeParams extends PredictorParams { oldAlgo: OldAlgo.Algo, oldImpurity: OldImpurity, subsamplingRate: Double): OldStrategy = { -val strategy = OldStrategy.defaultStategy(oldAlgo) +val strategy = OldStrategy.defaultStrategy(oldAlgo) strategy.impurity = oldImpurity strategy.checkpointInterval = getCheckpointInterval strategy.maxBins = getMaxBins http://git-wip-us.apache.org/repos/asf/spark/blob/629e26f7/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala index 9fd30c9..50fe2ac 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/BoostingStrategy.scala @@ -90,7 +90,7 @@ object BoostingStrategy { * @return Configuration for boosting algorithm */ def defaultParams(algo: Algo): BoostingStrategy = { -val treeStrategy = Strategy.defaultStategy(algo) +val treeStrategy = Strategy.defaultStrategy(algo) treeStrategy.maxDepth = 3 algo match { case Algo.Classification => http://git-wip-us.apache.org/repos/asf/spark/blob/629e26f7/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala index ada227c..de2c784 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/tree/configuration/Strategy.scala @@ -178,14 +178,14 @@ object Strategy { * @param algo "Classification" or "Regression" */ def defaultStrategy(algo: String): Strategy = { -defaultStategy(Algo.fromString(algo)) +defaultStrategy(Algo.fromString(algo)) } /** * Construct a default set 
of parameters for [[org.apache.spark.mllib.tree.DecisionTree]] * @param algo Algo.Classification or Algo.Regression */ - def defaultStategy(algo: Algo): Strategy = algo match { + def defaultStrategy(algo: Algo): Strategy = algo match { case Algo.Classification => new Strategy(algo = Classification, impurity = Gini, maxDepth = 10, numClasses = 2) @@ -193,4 +193,8 @@ object Strategy { new Strategy(algo = Regression, impurity = Variance, maxDepth = 10, numClasses = 0) } + + @deprecated("Use Strategy.defaultStrategy instead.", "1.5.0") + def defaultStategy(algo: Algo): Strategy = defaultStrategy(algo) + }
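The user-visible effect of the rename, as a small sketch grounded in the diff: the correctly spelled entry point is now the real implementation, and the old misspelling survives as a deprecated forwarder so existing callers keep compiling.

    import org.apache.spark.mllib.tree.configuration.{Algo, Strategy}

    val strategy = Strategy.defaultStrategy(Algo.Classification)   // new, correctly spelled API
    val legacy   = Strategy.defaultStategy(Algo.Classification)    // still compiles, deprecated since 1.5.0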
spark git commit: [SPARK-9598][SQL] do not expose generic getter in internal row
Repository: spark Updated Branches: refs/heads/master b77d3b968 -> 7c8fc1f7c [SPARK-9598][SQL] do not expose generic getter in internal row Author: Wenchen Fan Closes #7932 from cloud-fan/generic-getter and squashes the following commits: c60de4c [Wenchen Fan] do not expose generic getter in internal row Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7c8fc1f7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7c8fc1f7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7c8fc1f7 Branch: refs/heads/master Commit: 7c8fc1f7cb837ff5c32811fdeb3ee2b84de2dea4 Parents: b77d3b9 Author: Wenchen Fan Authored: Tue Aug 4 17:05:19 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 17:05:19 2015 -0700 -- .../sql/catalyst/expressions/UnsafeRow.java | 5 -- .../apache/spark/sql/catalyst/InternalRow.scala | 37 ++-- .../expressions/GenericSpecializedGetters.scala | 61 .../sql/catalyst/expressions/Projection.scala | 4 +- .../expressions/SpecificMutableRow.scala| 2 +- .../sql/catalyst/expressions/aggregates.scala | 2 +- .../codegen/GenerateProjection.scala| 2 +- .../spark/sql/catalyst/expressions/rows.scala | 12 ++-- .../spark/sql/types/GenericArrayData.scala | 37 +++- .../datasources/DataSourceStrategy.scala| 6 +- .../spark/sql/columnar/ColumnStatsSuite.scala | 20 +++ 11 files changed, 80 insertions(+), 108 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/7c8fc1f7/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java -- diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java index e6750fc..e3e1622 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java @@ -254,11 +254,6 @@ public final class UnsafeRow extends MutableRow { } @Override - public Object genericGet(int ordinal) { -throw new UnsupportedOperationException(); - } - - @Override public Object get(int ordinal, DataType dataType) { if (isNullAt(ordinal) || dataType instanceof NullType) { return null; http://git-wip-us.apache.org/repos/asf/spark/blob/7c8fc1f7/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala index 7656d05..7d17cca 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala @@ -17,15 +17,15 @@ package org.apache.spark.sql.catalyst -import org.apache.spark.sql.Row import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.{DataType, MapData, ArrayData, Decimal} +import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} /** * An abstract class for row used internal in Spark SQL, which only contain the columns as * internal types. 
*/ -// todo: make InternalRow just extends SpecializedGetters, remove generic getter -abstract class InternalRow extends GenericSpecializedGetters with Serializable { +abstract class InternalRow extends SpecializedGetters with Serializable { def numFields: Int @@ -50,6 +50,31 @@ abstract class InternalRow extends GenericSpecializedGetters with Serializable { false } + // Subclasses of InternalRow should implement all special getters and equals/hashCode, + // or implement this genericGet. + protected def genericGet(ordinal: Int): Any = throw new IllegalStateException( +"Concrete internal rows should implement genericGet, " + + "or implement all special getters and equals/hashCode") + + // default implementation (slow) + private def getAs[T](ordinal: Int) = genericGet(ordinal).asInstanceOf[T] + override def isNullAt(ordinal: Int): Boolean = getAs[AnyRef](ordinal) eq null + override def get(ordinal: Int, dataType: DataType): AnyRef = getAs(ordinal) + override def getBoolean(ordinal: Int): Boolean = getAs(ordinal) + override def getByte(ordinal: Int): Byte = getAs(ordinal) + override def getShort(ordinal: Int): Short = getAs(ordinal) + override def getInt(ordinal: Int): Int = getAs(ordinal) + override def getLong(ordinal: Int): Long = getAs(ordinal) + override def getFloat(ordinal: Int): Float = getAs(ordinal) + override def getDouble(
spark git commit: [SPARK-9598][SQL] do not expose generic getter in internal row
Repository: spark Updated Branches: refs/heads/branch-1.5 cff0fe291 -> 1954a7bb1 [SPARK-9598][SQL] do not expose generic getter in internal row Author: Wenchen Fan Closes #7932 from cloud-fan/generic-getter and squashes the following commits: c60de4c [Wenchen Fan] do not expose generic getter in internal row (cherry picked from commit 7c8fc1f7cb837ff5c32811fdeb3ee2b84de2dea4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1954a7bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1954a7bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1954a7bb Branch: refs/heads/branch-1.5 Commit: 1954a7bb175122b776870530217159cad366ca6c Parents: cff0fe2 Author: Wenchen Fan Authored: Tue Aug 4 17:05:19 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 17:05:27 2015 -0700 -- .../sql/catalyst/expressions/UnsafeRow.java | 5 -- .../apache/spark/sql/catalyst/InternalRow.scala | 37 ++-- .../expressions/GenericSpecializedGetters.scala | 61 .../sql/catalyst/expressions/Projection.scala | 4 +- .../expressions/SpecificMutableRow.scala| 2 +- .../sql/catalyst/expressions/aggregates.scala | 2 +- .../codegen/GenerateProjection.scala| 2 +- .../spark/sql/catalyst/expressions/rows.scala | 12 ++-- .../spark/sql/types/GenericArrayData.scala | 37 +++- .../datasources/DataSourceStrategy.scala| 6 +- .../spark/sql/columnar/ColumnStatsSuite.scala | 20 +++ 11 files changed, 80 insertions(+), 108 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1954a7bb/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java -- diff --git a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java index e6750fc..e3e1622 100644 --- a/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java +++ b/sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java @@ -254,11 +254,6 @@ public final class UnsafeRow extends MutableRow { } @Override - public Object genericGet(int ordinal) { -throw new UnsupportedOperationException(); - } - - @Override public Object get(int ordinal, DataType dataType) { if (isNullAt(ordinal) || dataType instanceof NullType) { return null; http://git-wip-us.apache.org/repos/asf/spark/blob/1954a7bb/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala index 7656d05..7d17cca 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/InternalRow.scala @@ -17,15 +17,15 @@ package org.apache.spark.sql.catalyst -import org.apache.spark.sql.Row import org.apache.spark.sql.catalyst.expressions._ +import org.apache.spark.sql.types.{DataType, MapData, ArrayData, Decimal} +import org.apache.spark.unsafe.types.{CalendarInterval, UTF8String} /** * An abstract class for row used internal in Spark SQL, which only contain the columns as * internal types. 
*/ -// todo: make InternalRow just extends SpecializedGetters, remove generic getter -abstract class InternalRow extends GenericSpecializedGetters with Serializable { +abstract class InternalRow extends SpecializedGetters with Serializable { def numFields: Int @@ -50,6 +50,31 @@ abstract class InternalRow extends GenericSpecializedGetters with Serializable { false } + // Subclasses of InternalRow should implement all special getters and equals/hashCode, + // or implement this genericGet. + protected def genericGet(ordinal: Int): Any = throw new IllegalStateException( +"Concrete internal rows should implement genericGet, " + + "or implement all special getters and equals/hashCode") + + // default implementation (slow) + private def getAs[T](ordinal: Int) = genericGet(ordinal).asInstanceOf[T] + override def isNullAt(ordinal: Int): Boolean = getAs[AnyRef](ordinal) eq null + override def get(ordinal: Int, dataType: DataType): AnyRef = getAs(ordinal) + override def getBoolean(ordinal: Int): Boolean = getAs(ordinal) + override def getByte(ordinal: Int): Byte = getAs(ordinal) + override def getShort(ordinal: Int): Short = getAs(ordinal) + override def getInt(ordinal: Int): Int = getAs(ordinal) + override def getLong(ordinal: Int): Long
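The practical consequence of hiding the generic getter: callers read values through the typed SpecializedGetters API, and generic access has to carry the DataType. A small hedged sketch (not from the patch) of calling code against an InternalRow holding an int, a string, and one field read generically:

    // Hedged sketch of reading from an InternalRow after this change.
    import org.apache.spark.sql.catalyst.InternalRow
    import org.apache.spark.sql.types.StringType

    def describe(row: InternalRow): String = {
      val id = row.getInt(0)              // typed getter, no boxing
      val name = row.getUTF8String(1)     // internal string representation (UTF8String)
      val tag = row.get(2, StringType)    // generic access now requires the DataType
      s"$id $name $tag"
    }

Concrete rows either override all the typed getters, or implement the protected genericGet shown in the diff, which the slow default getters fall back to.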
spark git commit: [SPARK-9586] [ML] Update BinaryClassificationEvaluator to use setRawPredictionCol
Repository: spark Updated Branches: refs/heads/branch-1.5 f4e125acf -> cff0fe291 [SPARK-9586] [ML] Update BinaryClassificationEvaluator to use setRawPredictionCol Update BinaryClassificationEvaluator to use setRawPredictionCol, rather than setScoreCol. Deprecated setScoreCol. I don't think setScoreCol was actually used anywhere (based on search). CC: mengxr Author: Joseph K. Bradley Closes #7921 from jkbradley/binary-eval-rawpred and squashes the following commits: e5d7dfa [Joseph K. Bradley] Update BinaryClassificationEvaluator to use setRawPredictionCol (cherry picked from commit b77d3b9688d56d33737909375d1d0db07da5827b) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cff0fe29 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cff0fe29 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cff0fe29 Branch: refs/heads/branch-1.5 Commit: cff0fe291aa470ef5cf4e5087c7114fb6360572f Parents: f4e125a Author: Joseph K. Bradley Authored: Tue Aug 4 16:52:43 2015 -0700 Committer: Xiangrui Meng Committed: Tue Aug 4 16:52:53 2015 -0700 -- .../spark/ml/evaluation/BinaryClassificationEvaluator.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cff0fe29/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala index 4a82b77..5d5cb7e 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala @@ -28,7 +28,7 @@ import org.apache.spark.sql.types.DoubleType /** * :: Experimental :: - * Evaluator for binary classification, which expects two input columns: score and label. + * Evaluator for binary classification, which expects two input columns: rawPrediction and label. */ @Experimental class BinaryClassificationEvaluator(override val uid: String) @@ -50,6 +50,13 @@ class BinaryClassificationEvaluator(override val uid: String) def setMetricName(value: String): this.type = set(metricName, value) /** @group setParam */ + def setRawPredictionCol(value: String): this.type = set(rawPredictionCol, value) + + /** + * @group setParam + * @deprecated use [[setRawPredictionCol()]] instead + */ + @deprecated("use setRawPredictionCol instead", "1.5.0") def setScoreCol(value: String): this.type = set(rawPredictionCol, value) /** @group setParam */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-9586] [ML] Update BinaryClassificationEvaluator to use setRawPredictionCol
Repository: spark Updated Branches: refs/heads/master 571d5b536 -> b77d3b968 [SPARK-9586] [ML] Update BinaryClassificationEvaluator to use setRawPredictionCol Update BinaryClassificationEvaluator to use setRawPredictionCol, rather than setScoreCol. Deprecated setScoreCol. I don't think setScoreCol was actually used anywhere (based on search). CC: mengxr Author: Joseph K. Bradley Closes #7921 from jkbradley/binary-eval-rawpred and squashes the following commits: e5d7dfa [Joseph K. Bradley] Update BinaryClassificationEvaluator to use setRawPredictionCol Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b77d3b96 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b77d3b96 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b77d3b96 Branch: refs/heads/master Commit: b77d3b9688d56d33737909375d1d0db07da5827b Parents: 571d5b5 Author: Joseph K. Bradley Authored: Tue Aug 4 16:52:43 2015 -0700 Committer: Xiangrui Meng Committed: Tue Aug 4 16:52:43 2015 -0700 -- .../spark/ml/evaluation/BinaryClassificationEvaluator.scala | 9 - 1 file changed, 8 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b77d3b96/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala b/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala index 4a82b77..5d5cb7e 100644 --- a/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala +++ b/mllib/src/main/scala/org/apache/spark/ml/evaluation/BinaryClassificationEvaluator.scala @@ -28,7 +28,7 @@ import org.apache.spark.sql.types.DoubleType /** * :: Experimental :: - * Evaluator for binary classification, which expects two input columns: score and label. + * Evaluator for binary classification, which expects two input columns: rawPrediction and label. */ @Experimental class BinaryClassificationEvaluator(override val uid: String) @@ -50,6 +50,13 @@ class BinaryClassificationEvaluator(override val uid: String) def setMetricName(value: String): this.type = set(metricName, value) /** @group setParam */ + def setRawPredictionCol(value: String): this.type = set(rawPredictionCol, value) + + /** + * @group setParam + * @deprecated use [[setRawPredictionCol()]] instead + */ + @deprecated("use setRawPredictionCol instead", "1.5.0") def setScoreCol(value: String): this.type = set(rawPredictionCol, value) /** @group setParam */ - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
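A hedged usage sketch of the renamed setter; `predictions` stands for a hypothetical DataFrame produced by a binary classifier, with "rawPrediction" and "label" columns.

    import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator

    val evaluator = new BinaryClassificationEvaluator()
      .setRawPredictionCol("rawPrediction")   // replaces the now-deprecated setScoreCol
      .setLabelCol("label")
      .setMetricName("areaUnderROC")

    val auc = evaluator.evaluate(predictions)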
spark git commit: [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark.
Repository: spark Updated Branches: refs/heads/branch-1.5 fe4a4f41a -> f4e125acf [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark. This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix distributed matrices to PySpark. Each distributed matrix class acts as a wrapper around the Scala/Java counterpart by maintaining a reference to the Java object. New distributed matrices can be created using factory methods added to DistributedMatrices, which creates the Java distributed matrix and then wraps it with the corresponding PySpark class. This design allows for simple conversion between the various distributed matrices, and lets us re-use the Scala code. Serialization between Python and Java is implemented using DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity. Associated documentation and unit-tests have also been added. To facilitate code review, this PR implements access to the rows/entries as RDDs, the number of rows & columns, and conversions between the various distributed matrices (not including BlockMatrix), and does not implement the other linear algebra functions of the matrices, although this will be very simple to add now. Author: Mike Dusenberry Closes #7554 from dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark and squashes the following commits: bb039cb [Mike Dusenberry] Minor documentation update. b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to make it even cleaner. Now, we allow the 'rows' parameter in the constructors to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a Java matrix object, wrap it, and then store that. If 'rows' is a Java matrix object of the correct type, we just wrap and store that directly. This is only for internal usage, and publicly, we still require 'rows' to be an RDD. We no longer store the 'rows' RDD, and instead just compute it from the Java object when needed. The point of this is that when we do matrix conversions, we do the conversion on the Scala/Java side, which returns a Java object, so we should use that directly, but exposing 'java_matrix' parameter in the public API is not ideal. This non-public feature of allowing 'rows' to be a Java matrix object is documented in the '__init__' constructor docstrings, which are not part of the generated public API, and doctests are also included. 7f0dcb6 [Mike Dusenberry] Updating module docstring. cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)' rather than 'SQLContext.getOrCreate', as the later doesn't guarantee that the SparkContext will be the same as for the matrix.rows data. 687e345 [Mike Dusenberry] Improving conversion performance. This adds an optional 'java_matrix' parameter to the constructors, and pulls the conversion logic out into a '_create_from_java' function. Now, if the constructors are given a valid Java distributed matrix object as 'java_matrix', they will store those internally, rather than create a new one on the Scala/Java side. 3e50b6e [Mike Dusenberry] Moving the distributed matrices to pyspark.mllib.linalg.distributed. 308f197 [Mike Dusenberry] Using properties for better documentation. 1633f86 [Mike Dusenberry] Minor documentation cleanup. f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from DistributedMatrix. ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner. 3fd4016 [Mike Dusenberry] Updating docstrings. 
27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors for each distributed matrix. a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using lists instead of DenseVectors explicitly. d19b0ba [Mike Dusenberry] Updating code and documentation to note that a vector-like object (numpy array, list, etc.) can be used in place of explicit Vector object, and adding conversions when necessary to RowMatrix construction. 4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and MatrixEntry. c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry functions. 329638b [Mike Dusenberry] Moving the Experimental tag to the top of each docstring. 0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated rows/entries RDDs within the various tests. c0900df [Mike Dusenberry] Adding the colons that were accidentally not inserted. 4ad6819 [Mike Dusenberry] Documenting the and parameters. 3b854b9 [Mike Dusenberry] Minor updates to documentation. 10046e8 [Mike Dusenberry] Updating documentation to use class constructors instead of the removed DistributedMatrices factory methods. 119018d [Mike Dusenberry] Adding static methods to each of the distributed matrix classes to consolidate conversion logic. 4d7af
spark git commit: [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark.
Repository: spark Updated Branches: refs/heads/master 1833d9c08 -> 571d5b536 [SPARK-6485] [MLLIB] [PYTHON] Add CoordinateMatrix/RowMatrix/IndexedRowMatrix to PySpark. This PR adds the RowMatrix, IndexedRowMatrix, and CoordinateMatrix distributed matrices to PySpark. Each distributed matrix class acts as a wrapper around the Scala/Java counterpart by maintaining a reference to the Java object. New distributed matrices can be created using factory methods added to DistributedMatrices, which creates the Java distributed matrix and then wraps it with the corresponding PySpark class. This design allows for simple conversion between the various distributed matrices, and lets us re-use the Scala code. Serialization between Python and Java is implemented using DataFrames as needed for IndexedRowMatrix and CoordinateMatrix for simplicity. Associated documentation and unit-tests have also been added. To facilitate code review, this PR implements access to the rows/entries as RDDs, the number of rows & columns, and conversions between the various distributed matrices (not including BlockMatrix), and does not implement the other linear algebra functions of the matrices, although this will be very simple to add now. Author: Mike Dusenberry Closes #7554 from dusenberrymw/SPARK-6485_Add_CoordinateMatrix_RowMatrix_IndexedMatrix_to_PySpark and squashes the following commits: bb039cb [Mike Dusenberry] Minor documentation update. b887c18 [Mike Dusenberry] Updating the matrix conversion logic again to make it even cleaner. Now, we allow the 'rows' parameter in the constructors to be either an RDD or the Java matrix object. If 'rows' is an RDD, we create a Java matrix object, wrap it, and then store that. If 'rows' is a Java matrix object of the correct type, we just wrap and store that directly. This is only for internal usage, and publicly, we still require 'rows' to be an RDD. We no longer store the 'rows' RDD, and instead just compute it from the Java object when needed. The point of this is that when we do matrix conversions, we do the conversion on the Scala/Java side, which returns a Java object, so we should use that directly, but exposing 'java_matrix' parameter in the public API is not ideal. This non-public feature of allowing 'rows' to be a Java matrix object is documented in the '__init__' constructor docstrings, which are not part of the generated public API, and doctests are also included. 7f0dcb6 [Mike Dusenberry] Updating module docstring. cfc1be5 [Mike Dusenberry] Use 'new SQLContext(matrix.rows.sparkContext)' rather than 'SQLContext.getOrCreate', as the later doesn't guarantee that the SparkContext will be the same as for the matrix.rows data. 687e345 [Mike Dusenberry] Improving conversion performance. This adds an optional 'java_matrix' parameter to the constructors, and pulls the conversion logic out into a '_create_from_java' function. Now, if the constructors are given a valid Java distributed matrix object as 'java_matrix', they will store those internally, rather than create a new one on the Scala/Java side. 3e50b6e [Mike Dusenberry] Moving the distributed matrices to pyspark.mllib.linalg.distributed. 308f197 [Mike Dusenberry] Using properties for better documentation. 1633f86 [Mike Dusenberry] Minor documentation cleanup. f0c13a7 [Mike Dusenberry] CoordinateMatrix should inherit from DistributedMatrix. ffdd724 [Mike Dusenberry] Updating doctests to make documentation cleaner. 3fd4016 [Mike Dusenberry] Updating docstrings. 
27cd5f6 [Mike Dusenberry] Simplifying input conversions in the constructors for each distributed matrix. a409cf5 [Mike Dusenberry] Updating doctests to be less verbose by using lists instead of DenseVectors explicitly. d19b0ba [Mike Dusenberry] Updating code and documentation to note that a vector-like object (numpy array, list, etc.) can be used in place of explicit Vector object, and adding conversions when necessary to RowMatrix construction. 4bd756d [Mike Dusenberry] Adding param documentation to IndexedRow and MatrixEntry. c6bded5 [Mike Dusenberry] Move conversion logic from tuples to IndexedRow or MatrixEntry types from within the IndexedRowMatrix and CoordinateMatrix constructors to separate _convert_to_indexed_row and _convert_to_matrix_entry functions. 329638b [Mike Dusenberry] Moving the Experimental tag to the top of each docstring. 0be6826 [Mike Dusenberry] Simplifying doctests by removing duplicated rows/entries RDDs within the various tests. c0900df [Mike Dusenberry] Adding the colons that were accidentally not inserted. 4ad6819 [Mike Dusenberry] Documenting the and parameters. 3b854b9 [Mike Dusenberry] Minor updates to documentation. 10046e8 [Mike Dusenberry] Updating documentation to use class constructors instead of the removed DistributedMatrices factory methods. 119018d [Mike Dusenberry] Adding static methods to each of the distributed matrix classes to consolidate conversion logic. 4d7af86 [
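The conversions mentioned above are delegated to the Scala/Java side and the resulting Java object is re-wrapped in Python. A rough sketch follows; the method names are assumed to mirror the Scala counterparts (toRowMatrix, toIndexedRowMatrix, toCoordinateMatrix), and a running `sc` is again assumed.

    # Converting between the distributed matrix types (Spark 1.5), assuming `sc` exists.
    from pyspark.mllib.linalg.distributed import (
        IndexedRowMatrix, IndexedRow, CoordinateMatrix, MatrixEntry)

    indexed_mat = IndexedRowMatrix(
        sc.parallelize([IndexedRow(0, [1.0, 2.0]), IndexedRow(1, [3.0, 4.0])]))

    # Each conversion builds the target matrix on the JVM and wraps the returned object.
    row_mat = indexed_mat.toRowMatrix()            # row indices are dropped
    coord_mat = indexed_mat.toCoordinateMatrix()   # rows become (i, j, value) entries

    # ...and back again from a CoordinateMatrix.
    round_trip = coord_mat.toIndexedRowMatrix()
    print(round_trip.numRows(), round_trip.numCols())   # 2 rows, 2 columns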
spark git commit: [SPARK-9582] [ML] LDA cleanups
Repository: spark Updated Branches: refs/heads/branch-1.5 e682ee254 -> fe4a4f41a [SPARK-9582] [ML] LDA cleanups Small cleanups to recent LDA additions and docs. CC: feynmanliang Author: Joseph K. Bradley Closes #7916 from jkbradley/lda-cleanups and squashes the following commits: f7021d9 [Joseph K. Bradley] broadcasting large matrices for LDA in local model and online learning 97947aa [Joseph K. Bradley] a few more cleanups 5b03f88 [Joseph K. Bradley] reverted split of lda log likelihood c566915 [Joseph K. Bradley] small edit to make review easier 63f6c7d [Joseph K. Bradley] clarified log likelihood for lda models (cherry picked from commit 1833d9c08f021d991334424d0a6d5ec21d1fccb2) Signed-off-by: Joseph K. Bradley Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe4a4f41 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe4a4f41 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fe4a4f41 Branch: refs/heads/branch-1.5 Commit: fe4a4f41ad8b686455d58fc2fda9494e8dba5636 Parents: e682ee2 Author: Joseph K. Bradley Authored: Tue Aug 4 15:43:13 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 15:43:20 2015 -0700 -- .../spark/mllib/clustering/LDAModel.scala | 82 +++- .../spark/mllib/clustering/LDAOptimizer.scala | 19 +++-- 2 files changed, 58 insertions(+), 43 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fe4a4f41/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala index 6af90d7..33babda 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala @@ -27,6 +27,7 @@ import org.json4s.jackson.JsonMethods._ import org.apache.spark.SparkContext import org.apache.spark.annotation.Experimental import org.apache.spark.api.java.JavaPairRDD +import org.apache.spark.broadcast.Broadcast import org.apache.spark.graphx.{Edge, EdgeContext, Graph, VertexId} import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors} import org.apache.spark.mllib.util.{Loader, Saveable} @@ -217,26 +218,28 @@ class LocalLDAModel private[clustering] ( // TODO: declare in LDAModel and override once implemented in DistributedLDAModel /** * Calculates a lower bound on the log likelihood of the entire corpus. + * + * See Equation (16) in original Online LDA paper. + * * @param documents test corpus to use for calculating log likelihood * @return variational lower bound on the log likelihood of the entire corpus */ - def logLikelihood(documents: RDD[(Long, Vector)]): Double = bound(documents, + def logLikelihood(documents: RDD[(Long, Vector)]): Double = logLikelihoodBound(documents, docConcentration, topicConcentration, topicsMatrix.toBreeze.toDenseMatrix, gammaShape, k, vocabSize) /** - * Calculate an upper bound bound on perplexity. See Equation (16) in original Online - * LDA paper. + * Calculate an upper bound bound on perplexity. (Lower is better.) + * See Equation (16) in original Online LDA paper. + * * @param documents test corpus to use for calculating perplexity - * @return variational upper bound on log perplexity per word + * @return Variational upper bound on log perplexity per token. 
*/ def logPerplexity(documents: RDD[(Long, Vector)]): Double = { -val corpusWords = documents +val corpusTokenCount = documents .map { case (_, termCounts) => termCounts.toArray.sum } .sum() -val perWordBound = -logLikelihood(documents) / corpusWords - -perWordBound +-logLikelihood(documents) / corpusTokenCount } /** @@ -244,17 +247,20 @@ class LocalLDAModel private[clustering] ( *log p(documents) >= E_q[log p(documents)] - E_q[log q(documents)] * This bound is derived by decomposing the LDA model to: *log p(documents) = E_q[log p(documents)] - E_q[log q(documents)] + D(q|p) - * and noting that the KL-divergence D(q|p) >= 0. See Equation (16) in original Online LDA paper. + * and noting that the KL-divergence D(q|p) >= 0. + * + * See Equation (16) in original Online LDA paper, as well as Appendix A.3 in the JMLR version of + * the original LDA paper. * @param documents a subset of the test corpus * @param alpha document-topic Dirichlet prior parameters - * @param eta topic-word Dirichlet prior parameters + * @param eta topic-word Dirichlet prior parameter * @param lambda parameters for variational q(beta | lambda) topic-word distributions * @param
spark git commit: [SPARK-9582] [ML] LDA cleanups
Repository: spark Updated Branches: refs/heads/master e37545606 -> 1833d9c08 [SPARK-9582] [ML] LDA cleanups Small cleanups to recent LDA additions and docs. CC: feynmanliang Author: Joseph K. Bradley Closes #7916 from jkbradley/lda-cleanups and squashes the following commits: f7021d9 [Joseph K. Bradley] broadcasting large matrices for LDA in local model and online learning 97947aa [Joseph K. Bradley] a few more cleanups 5b03f88 [Joseph K. Bradley] reverted split of lda log likelihood c566915 [Joseph K. Bradley] small edit to make review easier 63f6c7d [Joseph K. Bradley] clarified log likelihood for lda models Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1833d9c0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1833d9c0 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1833d9c0 Branch: refs/heads/master Commit: 1833d9c08f021d991334424d0a6d5ec21d1fccb2 Parents: e375456 Author: Joseph K. Bradley Authored: Tue Aug 4 15:43:13 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 15:43:13 2015 -0700 -- .../spark/mllib/clustering/LDAModel.scala | 82 +++- .../spark/mllib/clustering/LDAOptimizer.scala | 19 +++-- 2 files changed, 58 insertions(+), 43 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/1833d9c0/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala index 6af90d7..33babda 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAModel.scala @@ -27,6 +27,7 @@ import org.json4s.jackson.JsonMethods._ import org.apache.spark.SparkContext import org.apache.spark.annotation.Experimental import org.apache.spark.api.java.JavaPairRDD +import org.apache.spark.broadcast.Broadcast import org.apache.spark.graphx.{Edge, EdgeContext, Graph, VertexId} import org.apache.spark.mllib.linalg.{Matrices, Matrix, Vector, Vectors} import org.apache.spark.mllib.util.{Loader, Saveable} @@ -217,26 +218,28 @@ class LocalLDAModel private[clustering] ( // TODO: declare in LDAModel and override once implemented in DistributedLDAModel /** * Calculates a lower bound on the log likelihood of the entire corpus. + * + * See Equation (16) in original Online LDA paper. + * * @param documents test corpus to use for calculating log likelihood * @return variational lower bound on the log likelihood of the entire corpus */ - def logLikelihood(documents: RDD[(Long, Vector)]): Double = bound(documents, + def logLikelihood(documents: RDD[(Long, Vector)]): Double = logLikelihoodBound(documents, docConcentration, topicConcentration, topicsMatrix.toBreeze.toDenseMatrix, gammaShape, k, vocabSize) /** - * Calculate an upper bound bound on perplexity. See Equation (16) in original Online - * LDA paper. + * Calculate an upper bound bound on perplexity. (Lower is better.) + * See Equation (16) in original Online LDA paper. + * * @param documents test corpus to use for calculating perplexity - * @return variational upper bound on log perplexity per word + * @return Variational upper bound on log perplexity per token. 
*/ def logPerplexity(documents: RDD[(Long, Vector)]): Double = { -val corpusWords = documents +val corpusTokenCount = documents .map { case (_, termCounts) => termCounts.toArray.sum } .sum() -val perWordBound = -logLikelihood(documents) / corpusWords - -perWordBound +-logLikelihood(documents) / corpusTokenCount } /** @@ -244,17 +247,20 @@ class LocalLDAModel private[clustering] ( *log p(documents) >= E_q[log p(documents)] - E_q[log q(documents)] * This bound is derived by decomposing the LDA model to: *log p(documents) = E_q[log p(documents)] - E_q[log q(documents)] + D(q|p) - * and noting that the KL-divergence D(q|p) >= 0. See Equation (16) in original Online LDA paper. + * and noting that the KL-divergence D(q|p) >= 0. + * + * See Equation (16) in original Online LDA paper, as well as Appendix A.3 in the JMLR version of + * the original LDA paper. * @param documents a subset of the test corpus * @param alpha document-topic Dirichlet prior parameters - * @param eta topic-word Dirichlet prior parameters + * @param eta topic-word Dirichlet prior parameter * @param lambda parameters for variational q(beta | lambda) topic-word distributions * @param gammaShape shape parameter for random initialization of variational q(theta | gamma) * t
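For readers of the docstring changes above: logLikelihood returns the variational lower bound on the corpus log likelihood, and logPerplexity negates it and normalizes by the total token count of the test corpus (hence the corpusWords -> corpusTokenCount rename). In the notation of the Online LDA paper (a sketch in the paper's symbols, not Spark identifiers):

    \log p(D) = \mathcal{L}(D) + D_{\mathrm{KL}}(q \,\|\, p) \;\ge\; \mathcal{L}(D)
              = \mathbb{E}_q[\log p(D, z, \theta, \beta)] - \mathbb{E}_q[\log q(z, \theta, \beta)],
    \qquad D_{\mathrm{KL}}(q \,\|\, p) \ge 0

    \mathrm{logPerplexity}(D) = \frac{-\,\mathcal{L}(D)}{\sum_{d}\sum_{w} n_{dw}}

where n_{dw} is the count of term w in document d, so the denominator is exactly the token count summed in the new code, and a lower value of the bound is better.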
spark git commit: [SPARK-9602] remove "Akka/Actor" words from comments
Repository: spark Updated Branches: refs/heads/branch-1.5 f771a83f4 -> 560b2da78 [SPARK-9602] remove "Akka/Actor" words from comments https://issues.apache.org/jira/browse/SPARK-9602 Although we have hidden Akka behind RPC interface, I found that the Akka/Actor-related comments are still spreading everywhere. To make it consistent, we shall remove "actor"/"akka" words from the comments... Author: CodingCat Closes #7936 from CodingCat/SPARK-9602 and squashes the following commits: e8296a3 [CodingCat] remove actor words from comments (cherry picked from commit 9d668b73687e697cad2ef7fd3c3ba405e9795593) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/560b2da7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/560b2da7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/560b2da7 Branch: refs/heads/branch-1.5 Commit: 560b2da783bc25bd8767f6888665dadecac916d8 Parents: f771a83 Author: CodingCat Authored: Tue Aug 4 14:54:11 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:54:30 2015 -0700 -- .../main/scala/org/apache/spark/api/python/PythonRDD.scala| 2 +- .../scala/org/apache/spark/deploy/LocalSparkCluster.scala | 4 .../main/scala/org/apache/spark/deploy/client/AppClient.scala | 2 +- .../org/apache/spark/deploy/master/LeaderElectionAgent.scala | 6 +++--- .../scala/org/apache/spark/deploy/master/MasterMessages.scala | 2 +- .../spark/deploy/master/ZooKeeperLeaderElectionAgent.scala| 6 +++--- .../main/scala/org/apache/spark/deploy/worker/Worker.scala| 7 --- .../scala/org/apache/spark/deploy/worker/WorkerWatcher.scala | 4 ++-- core/src/main/scala/org/apache/spark/rpc/RpcEndpointRef.scala | 2 +- .../org/apache/spark/scheduler/OutputCommitCoordinator.scala | 2 +- .../org/apache/spark/scheduler/cluster/ExecutorData.scala | 2 +- core/src/main/scala/org/apache/spark/util/IdGenerator.scala | 6 +++--- .../spark/deploy/master/CustomRecoveryModeFactory.scala | 4 ++-- .../org/apache/spark/deploy/worker/WorkerWatcherSuite.scala | 5 ++--- project/MimaExcludes.scala| 2 +- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 2 +- 16 files changed, 27 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/560b2da7/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala index 55e563e..2a56bf2 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala @@ -794,7 +794,7 @@ private class PythonAccumulatorParam(@transient serverHost: String, serverPort: /** * We try to reuse a single Socket to transfer accumulator updates, as they are all added - * by the DAGScheduler's single-threaded actor anyway. + * by the DAGScheduler's single-threaded RpcEndpoint anyway. 
*/ @transient var socket: Socket = _ http://git-wip-us.apache.org/repos/asf/spark/blob/560b2da7/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala index 53356ad..83ccaad 100644 --- a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala +++ b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala @@ -73,12 +73,8 @@ class LocalSparkCluster( def stop() { logInfo("Shutting down local Spark cluster.") // Stop the workers before the master so they don't get upset that it disconnected -// TODO: In Akka 2.1.x, ActorSystem.awaitTermination hangs when you have remote actors! -// This is unfortunate, but for now we just comment it out. workerRpcEnvs.foreach(_.shutdown()) -// workerActorSystems.foreach(_.awaitTermination()) masterRpcEnvs.foreach(_.shutdown()) -// masterActorSystems.foreach(_.awaitTermination()) masterRpcEnvs.clear() workerRpcEnvs.clear() } http://git-wip-us.apache.org/repos/asf/spark/blob/560b2da7/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala b/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala index 7576a29..25ea692 100644 --- a/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala +++ b/core/src/main/scala/org/apache/spark/de
spark git commit: [SPARK-9602] remove "Akka/Actor" words from comments
Repository: spark Updated Branches: refs/heads/master ab8ee1a3b -> 9d668b736 [SPARK-9602] remove "Akka/Actor" words from comments https://issues.apache.org/jira/browse/SPARK-9602 Although we have hidden Akka behind RPC interface, I found that the Akka/Actor-related comments are still spreading everywhere. To make it consistent, we shall remove "actor"/"akka" words from the comments... Author: CodingCat Closes #7936 from CodingCat/SPARK-9602 and squashes the following commits: e8296a3 [CodingCat] remove actor words from comments Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9d668b73 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9d668b73 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9d668b73 Branch: refs/heads/master Commit: 9d668b73687e697cad2ef7fd3c3ba405e9795593 Parents: ab8ee1a Author: CodingCat Authored: Tue Aug 4 14:54:11 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:54:11 2015 -0700 -- .../main/scala/org/apache/spark/api/python/PythonRDD.scala| 2 +- .../scala/org/apache/spark/deploy/LocalSparkCluster.scala | 4 .../main/scala/org/apache/spark/deploy/client/AppClient.scala | 2 +- .../org/apache/spark/deploy/master/LeaderElectionAgent.scala | 6 +++--- .../scala/org/apache/spark/deploy/master/MasterMessages.scala | 2 +- .../spark/deploy/master/ZooKeeperLeaderElectionAgent.scala| 6 +++--- .../main/scala/org/apache/spark/deploy/worker/Worker.scala| 7 --- .../scala/org/apache/spark/deploy/worker/WorkerWatcher.scala | 4 ++-- core/src/main/scala/org/apache/spark/rpc/RpcEndpointRef.scala | 2 +- .../org/apache/spark/scheduler/OutputCommitCoordinator.scala | 2 +- .../org/apache/spark/scheduler/cluster/ExecutorData.scala | 2 +- core/src/main/scala/org/apache/spark/util/IdGenerator.scala | 6 +++--- .../spark/deploy/master/CustomRecoveryModeFactory.scala | 4 ++-- .../org/apache/spark/deploy/worker/WorkerWatcherSuite.scala | 5 ++--- project/MimaExcludes.scala| 2 +- .../src/main/scala/org/apache/spark/repl/SparkILoop.scala | 2 +- 16 files changed, 27 insertions(+), 31 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/9d668b73/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala -- diff --git a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala index 55e563e..2a56bf2 100644 --- a/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala +++ b/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala @@ -794,7 +794,7 @@ private class PythonAccumulatorParam(@transient serverHost: String, serverPort: /** * We try to reuse a single Socket to transfer accumulator updates, as they are all added - * by the DAGScheduler's single-threaded actor anyway. + * by the DAGScheduler's single-threaded RpcEndpoint anyway. 
*/ @transient var socket: Socket = _ http://git-wip-us.apache.org/repos/asf/spark/blob/9d668b73/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala index 53356ad..83ccaad 100644 --- a/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala +++ b/core/src/main/scala/org/apache/spark/deploy/LocalSparkCluster.scala @@ -73,12 +73,8 @@ class LocalSparkCluster( def stop() { logInfo("Shutting down local Spark cluster.") // Stop the workers before the master so they don't get upset that it disconnected -// TODO: In Akka 2.1.x, ActorSystem.awaitTermination hangs when you have remote actors! -// This is unfortunate, but for now we just comment it out. workerRpcEnvs.foreach(_.shutdown()) -// workerActorSystems.foreach(_.awaitTermination()) masterRpcEnvs.foreach(_.shutdown()) -// masterActorSystems.foreach(_.awaitTermination()) masterRpcEnvs.clear() workerRpcEnvs.clear() } http://git-wip-us.apache.org/repos/asf/spark/blob/9d668b73/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala -- diff --git a/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala b/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala index 7576a29..25ea692 100644 --- a/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala +++ b/core/src/main/scala/org/apache/spark/deploy/client/AppClient.scala @@ -257,7 +257,7 @@ private[spark] class AppClient( } def start() { -
spark git commit: [SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier
Repository: spark Updated Branches: refs/heads/master 9d668b736 -> e37545606 [SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier, plus doc tests for those columns. CC: holdenk yanboliang Author: Joseph K. Bradley Closes #7903 from jkbradley/rf-prob-python and squashes the following commits: c62a83f [Joseph K. Bradley] made unit test more robust 14eeba2 [Joseph K. Bradley] added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier in PySpark Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e3754560 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e3754560 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e3754560 Branch: refs/heads/master Commit: e375456063617cd7000d796024f41e5927f21edd Parents: 9d668b7 Author: Joseph K. Bradley Authored: Tue Aug 4 14:54:26 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 14:54:26 2015 -0700 -- python/pyspark/ml/classification.py | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e3754560/python/pyspark/ml/classification.py -- diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 291320f..5978d8f 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -347,6 +347,7 @@ class DecisionTreeClassificationModel(DecisionTreeModel): @inherit_doc class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed, + HasRawPredictionCol, HasProbabilityCol, DecisionTreeParams, HasCheckpointInterval): """ `http://en.wikipedia.org/wiki/Random_forest Random Forest` @@ -354,6 +355,7 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred It supports both binary and multiclass labels, as well as both continuous and categorical features. 
+>>> import numpy >>> from numpy import allclose >>> from pyspark.mllib.linalg import Vectors >>> from pyspark.ml.feature import StringIndexer @@ -368,8 +370,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred >>> allclose(model.treeWeights, [1.0, 1.0, 1.0]) True >>> test0 = sqlContext.createDataFrame([(Vectors.dense(-1.0),)], ["features"]) ->>> model.transform(test0).head().prediction +>>> result = model.transform(test0).head() +>>> result.prediction 0.0 +>>> numpy.argmax(result.probability) +0 +>>> numpy.argmax(result.rawPrediction) +0 >>> test1 = sqlContext.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"]) >>> model.transform(test1).head().prediction 1.0 @@ -390,11 +397,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred @keyword_only def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", + probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", numTrees=20, featureSubsetStrategy="auto", seed=None): """ __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ + probabilityCol="probability", rawPredictionCol="rawPrediction", \ maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, \ maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", \ numTrees=20, featureSubsetStrategy="auto", seed=None) @@ -427,11 +436,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred @keyword_only def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", + probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, seed=None, impurity="gini", numTrees=20, featureSubsetStrategy="auto"): """ setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ + probabilityCol="probability", rawPredictionCol="rawPrediction", \ maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, \
spark git commit: [SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier
Repository: spark Updated Branches: refs/heads/branch-1.5 560b2da78 -> e682ee254 [SPARK-9447] [ML] [PYTHON] Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier Added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier, plus doc tests for those columns. CC: holdenk yanboliang Author: Joseph K. Bradley Closes #7903 from jkbradley/rf-prob-python and squashes the following commits: c62a83f [Joseph K. Bradley] made unit test more robust 14eeba2 [Joseph K. Bradley] added HasRawPredictionCol, HasProbabilityCol to RandomForestClassifier in PySpark (cherry picked from commit e375456063617cd7000d796024f41e5927f21edd) Signed-off-by: Joseph K. Bradley Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e682ee25 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e682ee25 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e682ee25 Branch: refs/heads/branch-1.5 Commit: e682ee25477374737f3b1dfc08c98829564b26d4 Parents: 560b2da Author: Joseph K. Bradley Authored: Tue Aug 4 14:54:26 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 14:54:34 2015 -0700 -- python/pyspark/ml/classification.py | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e682ee25/python/pyspark/ml/classification.py -- diff --git a/python/pyspark/ml/classification.py b/python/pyspark/ml/classification.py index 291320f..5978d8f 100644 --- a/python/pyspark/ml/classification.py +++ b/python/pyspark/ml/classification.py @@ -347,6 +347,7 @@ class DecisionTreeClassificationModel(DecisionTreeModel): @inherit_doc class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, HasSeed, + HasRawPredictionCol, HasProbabilityCol, DecisionTreeParams, HasCheckpointInterval): """ `http://en.wikipedia.org/wiki/Random_forest Random Forest` @@ -354,6 +355,7 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred It supports both binary and multiclass labels, as well as both continuous and categorical features. 
+>>> import numpy >>> from numpy import allclose >>> from pyspark.mllib.linalg import Vectors >>> from pyspark.ml.feature import StringIndexer @@ -368,8 +370,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred >>> allclose(model.treeWeights, [1.0, 1.0, 1.0]) True >>> test0 = sqlContext.createDataFrame([(Vectors.dense(-1.0),)], ["features"]) ->>> model.transform(test0).head().prediction +>>> result = model.transform(test0).head() +>>> result.prediction 0.0 +>>> numpy.argmax(result.probability) +0 +>>> numpy.argmax(result.rawPrediction) +0 >>> test1 = sqlContext.createDataFrame([(Vectors.sparse(1, [0], [1.0]),)], ["features"]) >>> model.transform(test1).head().prediction 1.0 @@ -390,11 +397,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred @keyword_only def __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", + probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", numTrees=20, featureSubsetStrategy="auto", seed=None): """ __init__(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ + probabilityCol="probability", rawPredictionCol="rawPrediction", \ maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, \ maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, impurity="gini", \ numTrees=20, featureSubsetStrategy="auto", seed=None) @@ -427,11 +436,13 @@ class RandomForestClassifier(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPred @keyword_only def setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", + probabilityCol="probability", rawPredictionCol="rawPrediction", maxDepth=5, maxBins=32, minInstancesPerNode=1, minInfoGain=0.0, maxMemoryInMB=256, cacheNodeIds=False, checkpointInterval=10, seed=None, impurity="gini", numTrees=20, featureSubsetStrategy="auto"): """ setParams(self, featuresCol="features", labelCol="label", predictionCol="prediction", \ + probabilityCol="probability", rawPredictionCol="rawPredicti
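The doctest added in this diff, collected into one standalone sketch for convenience. It assumes a live `sc` and `sqlContext`; the setup lines not visible in the hunk follow the surrounding doctest, and the output columns use the default names ("probability", "rawPrediction") that this change wires into RandomForestClassifier.

    # Reading the new probability/rawPrediction columns (pyspark.ml, Spark 1.5).
    import numpy
    from pyspark.mllib.linalg import Vectors
    from pyspark.ml.classification import RandomForestClassifier
    from pyspark.ml.feature import StringIndexer

    df = sqlContext.createDataFrame(
        [(1.0, Vectors.dense(1.0)), (0.0, Vectors.sparse(1, [], []))],
        ["label", "features"])
    si_model = StringIndexer(inputCol="label", outputCol="indexed").fit(df)
    td = si_model.transform(df)

    rf = RandomForestClassifier(numTrees=3, maxDepth=2, labelCol="indexed", seed=42)
    model = rf.fit(td)

    test0 = sqlContext.createDataFrame([(Vectors.dense(-1.0),)], ["features"])
    result = model.transform(test0).head()
    print(result.prediction)                   # 0.0
    print(numpy.argmax(result.probability))    # index of the most probable class
    print(numpy.argmax(result.rawPrediction))  # same class, by raw tree votes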
spark git commit: [SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter
Repository: spark Updated Branches: refs/heads/branch-1.5 43f6b021e -> f771a83f4 [SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter This patch extends UnsafeExternalSorter to support records larger than the page size. The basic strategy is the same as in #7762: store large records in their own overflow pages. Author: Josh Rosen Closes #7891 from JoshRosen/large-records-in-sql-sorter and squashes the following commits: 967580b [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 948c344 [Josh Rosen] Add large records tests for KV sorter. 3c17288 [Josh Rosen] Combine memory and disk cleanup into general cleanupResources() method 380f217 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 27eafa0 [Josh Rosen] Fix page size in PackedRecordPointerSuite a49baef [Josh Rosen] Address initial round of review comments 3edb931 [Josh Rosen] Remove accidentally-committed debug statements. 2b164e2 [Josh Rosen] Support large records in UnsafeExternalSorter. (cherry picked from commit ab8ee1a3b93286a62949569615086ef5030e9fae) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f771a83f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f771a83f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f771a83f Branch: refs/heads/branch-1.5 Commit: f771a83f4090e979f72d01989e6693d7fbc05c05 Parents: 43f6b02 Author: Josh Rosen Authored: Tue Aug 4 14:42:11 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:42:20 2015 -0700 -- .../unsafe/sort/UnsafeExternalSorter.java | 173 ++- .../unsafe/PackedRecordPointerSuite.java| 8 +- .../unsafe/sort/UnsafeExternalSorterSuite.java | 129 +--- .../sql/execution/UnsafeExternalRowSorter.java | 2 +- .../sql/execution/UnsafeKVExternalSorter.java | 11 +- .../execution/UnsafeKVExternalSorterSuite.scala | 210 --- .../unsafe/memory/HeapMemoryAllocator.java | 3 + .../unsafe/memory/UnsafeMemoryAllocator.java| 3 + 8 files changed, 372 insertions(+), 167 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f771a83f/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index dec7fcf..e6ddd08 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -34,6 +34,7 @@ import org.apache.spark.TaskContext; import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.shuffle.ShuffleMemoryManager; import org.apache.spark.storage.BlockManager; +import org.apache.spark.unsafe.array.ByteArrayMethods; import org.apache.spark.unsafe.PlatformDependent; import org.apache.spark.unsafe.memory.MemoryBlock; import org.apache.spark.unsafe.memory.TaskMemoryManager; @@ -143,8 +144,7 @@ public final class UnsafeExternalSorter { taskContext.addOnCompleteCallback(new AbstractFunction0() { @Override public BoxedUnit apply() { -deleteSpillFiles(); -freeMemory(); +cleanupResources(); return null; } }); @@ -249,7 +249,7 @@ public final class UnsafeExternalSorter { * * @return the number of bytes freed. 
*/ - public long freeMemory() { + private long freeMemory() { updatePeakMemoryUsed(); long memoryFreed = 0; for (MemoryBlock block : allocatedPages) { @@ -275,44 +275,32 @@ public final class UnsafeExternalSorter { /** * Deletes any spill files created by this sorter. */ - public void deleteSpillFiles() { + private void deleteSpillFiles() { for (UnsafeSorterSpillWriter spill : spillWriters) { File file = spill.getFile(); if (file != null && file.exists()) { if (!file.delete()) { logger.error("Was unable to delete spill file {}", file.getAbsolutePath()); -}; +} } } } /** - * Checks whether there is enough space to insert a new record into the sorter. - * - * @param requiredSpace the required space in the data page, in bytes, including space for storing - * the record size. - - * @return true if the record can be inserted without requiring more allocations, false otherwise. + * Frees this sorter's in-memory data structures and cleans up its spill files. */ - private boolean haveSpaceForRecord(in
spark git commit: [SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter
Repository: spark Updated Branches: refs/heads/master f4b1ac08a -> ab8ee1a3b [SPARK-9452] [SQL] Support records larger than page size in UnsafeExternalSorter This patch extends UnsafeExternalSorter to support records larger than the page size. The basic strategy is the same as in #7762: store large records in their own overflow pages. Author: Josh Rosen Closes #7891 from JoshRosen/large-records-in-sql-sorter and squashes the following commits: 967580b [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 948c344 [Josh Rosen] Add large records tests for KV sorter. 3c17288 [Josh Rosen] Combine memory and disk cleanup into general cleanupResources() method 380f217 [Josh Rosen] Merge remote-tracking branch 'origin/master' into large-records-in-sql-sorter 27eafa0 [Josh Rosen] Fix page size in PackedRecordPointerSuite a49baef [Josh Rosen] Address initial round of review comments 3edb931 [Josh Rosen] Remove accidentally-committed debug statements. 2b164e2 [Josh Rosen] Support large records in UnsafeExternalSorter. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ab8ee1a3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ab8ee1a3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ab8ee1a3 Branch: refs/heads/master Commit: ab8ee1a3b93286a62949569615086ef5030e9fae Parents: f4b1ac0 Author: Josh Rosen Authored: Tue Aug 4 14:42:11 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:42:11 2015 -0700 -- .../unsafe/sort/UnsafeExternalSorter.java | 173 ++- .../unsafe/PackedRecordPointerSuite.java| 8 +- .../unsafe/sort/UnsafeExternalSorterSuite.java | 129 +--- .../sql/execution/UnsafeExternalRowSorter.java | 2 +- .../sql/execution/UnsafeKVExternalSorter.java | 11 +- .../execution/UnsafeKVExternalSorterSuite.scala | 210 --- .../unsafe/memory/HeapMemoryAllocator.java | 3 + .../unsafe/memory/UnsafeMemoryAllocator.java| 3 + 8 files changed, 372 insertions(+), 167 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/ab8ee1a3/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java -- diff --git a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java index dec7fcf..e6ddd08 100644 --- a/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java +++ b/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeExternalSorter.java @@ -34,6 +34,7 @@ import org.apache.spark.TaskContext; import org.apache.spark.executor.ShuffleWriteMetrics; import org.apache.spark.shuffle.ShuffleMemoryManager; import org.apache.spark.storage.BlockManager; +import org.apache.spark.unsafe.array.ByteArrayMethods; import org.apache.spark.unsafe.PlatformDependent; import org.apache.spark.unsafe.memory.MemoryBlock; import org.apache.spark.unsafe.memory.TaskMemoryManager; @@ -143,8 +144,7 @@ public final class UnsafeExternalSorter { taskContext.addOnCompleteCallback(new AbstractFunction0() { @Override public BoxedUnit apply() { -deleteSpillFiles(); -freeMemory(); +cleanupResources(); return null; } }); @@ -249,7 +249,7 @@ public final class UnsafeExternalSorter { * * @return the number of bytes freed. 
*/ - public long freeMemory() { + private long freeMemory() { updatePeakMemoryUsed(); long memoryFreed = 0; for (MemoryBlock block : allocatedPages) { @@ -275,44 +275,32 @@ public final class UnsafeExternalSorter { /** * Deletes any spill files created by this sorter. */ - public void deleteSpillFiles() { + private void deleteSpillFiles() { for (UnsafeSorterSpillWriter spill : spillWriters) { File file = spill.getFile(); if (file != null && file.exists()) { if (!file.delete()) { logger.error("Was unable to delete spill file {}", file.getAbsolutePath()); -}; +} } } } /** - * Checks whether there is enough space to insert a new record into the sorter. - * - * @param requiredSpace the required space in the data page, in bytes, including space for storing - * the record size. - - * @return true if the record can be inserted without requiring more allocations, false otherwise. + * Frees this sorter's in-memory data structures and cleans up its spill files. */ - private boolean haveSpaceForRecord(int requiredSpace) { -assert(requiredSpace > 0); -assert(inMemSorter != null); -return (inMemSor
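The overflow-page strategy itself lives in the Java UnsafeExternalSorter touched above; the toy Python sketch below only illustrates the idea (a record larger than the fixed page size gets a dedicated page of exactly its size instead of failing the insert). None of the names in it come from Spark.

    # Toy illustration of the overflow-page idea (not Spark code).
    PAGE_SIZE = 64  # bytes; illustrative only

    class PagedBuffer(object):
        def __init__(self):
            self.pages = []      # every allocated page, in insertion order
            self.current = None  # the normal page currently being filled
            self.cursor = 0      # write offset into `current`

        def insert(self, record):
            n = len(record)
            if n > PAGE_SIZE:
                # Large record: store it in its own overflow page sized to the record.
                self.pages.append(bytearray(record))
                return
            if self.current is None or self.cursor + n > PAGE_SIZE:
                # Regular record that no longer fits: start a fresh normal page.
                self.current = bytearray(PAGE_SIZE)
                self.pages.append(self.current)
                self.cursor = 0
            self.current[self.cursor:self.cursor + n] = record
            self.cursor += n

    buf = PagedBuffer()
    buf.insert(b"tiny record")
    buf.insert(b"x" * 1000)   # larger than a page: stored in its own overflow page
    print(len(buf.pages))     # 2: one normal page plus one overflow page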
spark git commit: [SPARK-9553][SQL] remove the no-longer-necessary createCode and createStructCode, and replace the usage
Repository: spark Updated Branches: refs/heads/master a0cc01759 -> f4b1ac08a [SPARK-9553][SQL] remove the no-longer-necessary createCode and createStructCode, and replace the usage Author: Wenchen Fan Closes #7890 from cloud-fan/minor and squashes the following commits: c3b1be3 [Wenchen Fan] fix style b0cbe2e [Wenchen Fan] remove the createCode and createStructCode, and replace the usage of them by createStructCode Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f4b1ac08 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f4b1ac08 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f4b1ac08 Branch: refs/heads/master Commit: f4b1ac08a1327e6d0ddc317cdf3997a0f68dec72 Parents: a0cc017 Author: Wenchen Fan Authored: Tue Aug 4 14:40:46 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:40:46 2015 -0700 -- .../codegen/GenerateUnsafeProjection.scala | 161 ++- .../expressions/complexTypeCreator.scala| 10 +- 2 files changed, 17 insertions(+), 154 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f4b1ac08/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala index fc3ecf5..71f8ea0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala @@ -117,161 +117,13 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro } /** - * Generates the code to create an [[UnsafeRow]] object based on the input expressions. 
- * @param ctx context for code generation - * @param ev specifies the name of the variable for the output [[UnsafeRow]] object - * @param expressions input expressions - * @return generated code to put the expression output into an [[UnsafeRow]] - */ - def createCode(ctx: CodeGenContext, ev: GeneratedExpressionCode, expressions: Seq[Expression]) -: String = { - -val ret = ev.primitive -ctx.addMutableState("UnsafeRow", ret, s"$ret = new UnsafeRow();") -val buffer = ctx.freshName("buffer") -ctx.addMutableState("byte[]", buffer, s"$buffer = new byte[64];") -val cursor = ctx.freshName("cursor") -val numBytes = ctx.freshName("numBytes") - -val exprs = expressions.map { e => e.dataType match { - case st: StructType => createCodeForStruct(ctx, e.gen(ctx), st) - case _ => e.gen(ctx) -}} -val allExprs = exprs.map(_.code).mkString("\n") - -val fixedSize = 8 * exprs.length + UnsafeRow.calculateBitSetWidthInBytes(exprs.length) -val additionalSize = expressions.zipWithIndex.map { - case (e, i) => genAdditionalSize(e.dataType, exprs(i)) -}.mkString("") - -val writers = expressions.zipWithIndex.map { case (e, i) => - val update = genFieldWriter(ctx, e.dataType, exprs(i), ret, i, cursor) - s"""if (${exprs(i).isNull}) { -$ret.setNullAt($i); - } else { -$update; - }""" -}.mkString("\n ") - -s""" - $allExprs - int $numBytes = $fixedSize $additionalSize; - if ($numBytes > $buffer.length) { -$buffer = new byte[$numBytes]; - } - - $ret.pointTo( -$buffer, -$PlatformDependent.BYTE_ARRAY_OFFSET, -${expressions.size}, -$numBytes); - int $cursor = $fixedSize; - - $writers - boolean ${ev.isNull} = false; - """ - } - - /** - * Generates the Java code to convert a struct (backed by InternalRow) to UnsafeRow. - * - * This function also handles nested structs by recursively generating the code to do conversion. - * - * @param ctx code generation context - * @param input the input struct, identified by a [[GeneratedExpressionCode]] - * @param schema schema of the struct field - */ - // TODO: refactor createCode and this function to reduce code duplication. - private def createCodeForStruct( - ctx: CodeGenContext, - input: GeneratedExpressionCode, - schema: StructType): GeneratedExpressionCode = { - -val isNull = input.isNull -val primitive = ctx.freshName("structConvert") -ctx.addMutableState("UnsafeRow", primitive, s"$primitive = new UnsafeRow();") -val buffer = ctx.freshName("buffer") -ctx.addMutableState("byte[]", buffer, s"$buffer = new byte[64];") -val cursor = ctx.freshName("cursor") - -va
spark git commit: [SPARK-9553][SQL] remove the no-longer-necessary createCode and createStructCode, and replace the usage
Repository: spark Updated Branches: refs/heads/branch-1.5 be37b1bd3 -> 43f6b021e [SPARK-9553][SQL] remove the no-longer-necessary createCode and createStructCode, and replace the usage Author: Wenchen Fan Closes #7890 from cloud-fan/minor and squashes the following commits: c3b1be3 [Wenchen Fan] fix style b0cbe2e [Wenchen Fan] remove the createCode and createStructCode, and replace the usage of them by createStructCode (cherry picked from commit f4b1ac08a1327e6d0ddc317cdf3997a0f68dec72) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43f6b021 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43f6b021 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/43f6b021 Branch: refs/heads/branch-1.5 Commit: 43f6b021e5f14b9126e4291f989a076085367c2c Parents: be37b1b Author: Wenchen Fan Authored: Tue Aug 4 14:40:46 2015 -0700 Committer: Reynold Xin Committed: Tue Aug 4 14:40:53 2015 -0700 -- .../codegen/GenerateUnsafeProjection.scala | 161 ++- .../expressions/complexTypeCreator.scala| 10 +- 2 files changed, 17 insertions(+), 154 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/43f6b021/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala index fc3ecf5..71f8ea0 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeProjection.scala @@ -117,161 +117,13 @@ object GenerateUnsafeProjection extends CodeGenerator[Seq[Expression], UnsafePro } /** - * Generates the code to create an [[UnsafeRow]] object based on the input expressions. 
- * @param ctx context for code generation - * @param ev specifies the name of the variable for the output [[UnsafeRow]] object - * @param expressions input expressions - * @return generated code to put the expression output into an [[UnsafeRow]] - */ - def createCode(ctx: CodeGenContext, ev: GeneratedExpressionCode, expressions: Seq[Expression]) -: String = { - -val ret = ev.primitive -ctx.addMutableState("UnsafeRow", ret, s"$ret = new UnsafeRow();") -val buffer = ctx.freshName("buffer") -ctx.addMutableState("byte[]", buffer, s"$buffer = new byte[64];") -val cursor = ctx.freshName("cursor") -val numBytes = ctx.freshName("numBytes") - -val exprs = expressions.map { e => e.dataType match { - case st: StructType => createCodeForStruct(ctx, e.gen(ctx), st) - case _ => e.gen(ctx) -}} -val allExprs = exprs.map(_.code).mkString("\n") - -val fixedSize = 8 * exprs.length + UnsafeRow.calculateBitSetWidthInBytes(exprs.length) -val additionalSize = expressions.zipWithIndex.map { - case (e, i) => genAdditionalSize(e.dataType, exprs(i)) -}.mkString("") - -val writers = expressions.zipWithIndex.map { case (e, i) => - val update = genFieldWriter(ctx, e.dataType, exprs(i), ret, i, cursor) - s"""if (${exprs(i).isNull}) { -$ret.setNullAt($i); - } else { -$update; - }""" -}.mkString("\n ") - -s""" - $allExprs - int $numBytes = $fixedSize $additionalSize; - if ($numBytes > $buffer.length) { -$buffer = new byte[$numBytes]; - } - - $ret.pointTo( -$buffer, -$PlatformDependent.BYTE_ARRAY_OFFSET, -${expressions.size}, -$numBytes); - int $cursor = $fixedSize; - - $writers - boolean ${ev.isNull} = false; - """ - } - - /** - * Generates the Java code to convert a struct (backed by InternalRow) to UnsafeRow. - * - * This function also handles nested structs by recursively generating the code to do conversion. - * - * @param ctx code generation context - * @param input the input struct, identified by a [[GeneratedExpressionCode]] - * @param schema schema of the struct field - */ - // TODO: refactor createCode and this function to reduce code duplication. - private def createCodeForStruct( - ctx: CodeGenContext, - input: GeneratedExpressionCode, - schema: StructType): GeneratedExpressionCode = { - -val isNull = input.isNull -val primitive = ctx.freshName("structConvert") -ctx.addMutableState("UnsafeRow", primitive, s"$primitive = new UnsafeRow();") -val buffer = ctx.freshName("buffer") -ctx.addMuta
spark git commit: [SPARK-9606] [SQL] Ignore flaky thrift server tests
Repository: spark Updated Branches: refs/heads/branch-1.5 c5250ddc5 -> be37b1bd3 [SPARK-9606] [SQL] Ignore flaky thrift server tests Author: Michael Armbrust Closes #7939 from marmbrus/turnOffThriftTests and squashes the following commits: 80d618e [Michael Armbrust] [SPARK-9606][SQL] Ignore flaky thrift server tests (cherry picked from commit a0cc01759b0c2cecf340c885d391976eb4e3fad6) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/be37b1bd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/be37b1bd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/be37b1bd Branch: refs/heads/branch-1.5 Commit: be37b1bd3edd8583180dc1a41ecf4d80990216c7 Parents: c5250dd Author: Michael Armbrust Authored: Tue Aug 4 12:19:52 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 4 12:20:14 2015 -0700 -- .../spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/be37b1bd/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index 8374629..17e7044 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -38,7 +38,7 @@ import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket -import org.scalatest.BeforeAndAfterAll +import org.scalatest.{Ignore, BeforeAndAfterAll} import org.apache.spark.{Logging, SparkFunSuite} import org.apache.spark.sql.hive.HiveContext @@ -53,6 +53,7 @@ object TestData { val smallKvWithNull = getTestDataFilePath("small_kv_with_null.txt") } +@Ignore // SPARK-9606 class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { override def mode: ServerMode.Value = ServerMode.binary @@ -379,6 +380,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } } +@Ignore // SPARK-9606 class HiveThriftHttpServerSuite extends HiveThriftJdbcTest { override def mode: ServerMode.Value = ServerMode.http
spark git commit: [SPARK-9606] [SQL] Ignore flaky thrift server tests
Repository: spark Updated Branches: refs/heads/master 5a23213c1 -> a0cc01759 [SPARK-9606] [SQL] Ignore flaky thrift server tests Author: Michael Armbrust Closes #7939 from marmbrus/turnOffThriftTests and squashes the following commits: 80d618e [Michael Armbrust] [SPARK-9606][SQL] Ignore flaky thrift server tests Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a0cc0175 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a0cc0175 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a0cc0175 Branch: refs/heads/master Commit: a0cc01759b0c2cecf340c885d391976eb4e3fad6 Parents: 5a23213 Author: Michael Armbrust Authored: Tue Aug 4 12:19:52 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 4 12:19:52 2015 -0700 -- .../spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala| 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a0cc0175/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala -- diff --git a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala index 8374629..17e7044 100644 --- a/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala +++ b/sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2Suites.scala @@ -38,7 +38,7 @@ import org.apache.hive.service.cli.thrift.TCLIService.Client import org.apache.hive.service.cli.thrift.ThriftCLIServiceClient import org.apache.thrift.protocol.TBinaryProtocol import org.apache.thrift.transport.TSocket -import org.scalatest.BeforeAndAfterAll +import org.scalatest.{Ignore, BeforeAndAfterAll} import org.apache.spark.{Logging, SparkFunSuite} import org.apache.spark.sql.hive.HiveContext @@ -53,6 +53,7 @@ object TestData { val smallKvWithNull = getTestDataFilePath("small_kv_with_null.txt") } +@Ignore // SPARK-9606 class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { override def mode: ServerMode.Value = ServerMode.binary @@ -379,6 +380,7 @@ class HiveThriftBinaryServerSuite extends HiveThriftJdbcTest { } } +@Ignore // SPARK-9606 class HiveThriftHttpServerSuite extends HiveThriftJdbcTest { override def mode: ServerMode.Value = ServerMode.http
spark git commit: [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier
Repository: spark Updated Branches: refs/heads/master 34a0eb2e8 -> 5a23213c1 [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier This PR replaces the old "threshold" with a generalized "thresholds" Param. We keep getThreshold,setThreshold for backwards compatibility for binary classification. Note that the primary author of this PR is holdenk Author: Holden Karau Author: Joseph K. Bradley Closes #7909 from jkbradley/holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest and squashes the following commits: 3952977 [Joseph K. Bradley] fixed pyspark doc test 85febc8 [Joseph K. Bradley] made python unit tests a little more robust 7eb1d86 [Joseph K. Bradley] small cleanups 6cc2ed8 [Joseph K. Bradley] Fixed remaining merge issues. 0255e44 [Joseph K. Bradley] Many cleanups for thresholds, some more tests 7565a60 [Holden Karau] fix pep8 style checks, add a getThreshold method similar to our LogisticRegression.scala one for API compat be87f26 [Holden Karau] Convert threshold to thresholds in the python code, add specialized support for Array[Double] to shared parems codegen, etc. 6747dad [Holden Karau] Override raw2prediction for ProbabilisticClassifier, fix some tests 25df168 [Holden Karau] Fix handling of thresholds in LogisticRegression c02d6c0 [Holden Karau] No default for thresholds 5e43628 [Holden Karau] CR feedback and fixed the renamed test f3fbbd1 [Holden Karau] revert the changes to random forest :( 51f581c [Holden Karau] Add explicit types to public methods, fix long line f7032eb [Holden Karau] Fix a java test bug, remove some unecessary changes adf15b4 [Holden Karau] rename the classifier suite test to ProbabilisticClassifierSuite now that we only have it in Probabilistic 398078a [Holden Karau] move the thresholding around a bunch based on the design doc 4893bdc [Holden Karau] Use numtrees of 3 since previous result was tied (one tree for each) and the switch from different max methods picked a different element (since they were equal I think this is ok) 638854c [Holden Karau] Add a scala RandomForestClassifierSuite test based on corresponding python test e09919c [Holden Karau] Fix return type, I need more coffee 8d92cac [Holden Karau] Use ClassifierParams as the head 3456ed3 [Holden Karau] Add explicit return types even though just test a0f3b0c [Holden Karau] scala style fixes 6f14314 [Holden Karau] Since hasthreshold/hasthresholds is in root classifier now ffc8dab [Holden Karau] Update the sharedParams 0420290 [Holden Karau] Allow us to override the get methods selectively 978e77a [Holden Karau] Move HasThreshold into classifier params and start defining the overloaded getThreshold/getThresholds functions 1433e52 [Holden Karau] Revert "try and hide threshold but chainges the API so no dice there" 1f09a2e [Holden Karau] try and hide threshold but chainges the API so no dice there efb9084 [Holden Karau] move setThresholds only to where its used 6b34809 [Holden Karau] Add a test with thresholding for the RFCS 74f54c3 [Holden Karau] Fix creation of vote array 1986fa8 [Holden Karau] Setting the thresholds only makes sense if the underlying class hasn't overridden predict, so lets push it down. 2f44b18 [Holden Karau] Add a global default of null for thresholds param f338cfc [Holden Karau] Wait that wasn't a good idea, Revert "Some progress towards unifying threshold and thresholds" 634b06f [Holden Karau] Some progress towards unifying threshold and thresholds 85c9e01 [Holden Karau] Test passes again... 
little fnur 099c0f3 [Holden Karau] Move thresholds around some more (set on model not trainer) 0f46836 [Holden Karau] Start adding a classifiersuite f70eb5e [Holden Karau] Fix test compile issues a7d59c8 [Holden Karau] Move thresholding into Classifier trait 5d999d2 [Holden Karau] Some more progress, start adding a test (maybe try and see if we can find a better thing to use for the base of the test) 1fed644 [Holden Karau] Use thresholds to scale scores in random forest classifcation 31d6bf2 [Holden Karau] Start threading the threshold info through 0ef228c [Holden Karau] Add hasthresholds Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5a23213c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5a23213c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5a23213c Branch: refs/heads/master Commit: 5a23213c148bfe362514f9c71f5273ebda0a848a Parents: 34a0eb2 Author: Holden Karau Authored: Tue Aug 4 10:12:22 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 10:12:22 2015 -0700 -- .../examples/ml/JavaSimpleParamsExample.java| 3 +- .../src/main/python/ml/simple_params_example.py | 2 +- .../spark/examples/ml/SimpleParamsExample.scala | 2 +- .../spark/ml/classification/Classifier.scala| 3 +- .../ml/classification/LogisticRegression.scala | 47 +++-- .../ProbabilisticCla
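A minimal sketch of the thresholding rule this change generalizes, assuming the convention that each class probability is divided by its threshold and the largest scaled value wins (the helper name and plain arrays here are illustrative, not the committed API):

    // Sketch only: a class with a lower threshold needs less probability mass to win.
    def predictWithThresholds(probabilities: Array[Double], thresholds: Array[Double]): Int = {
      require(probabilities.length == thresholds.length, "one threshold per class")
      val scaled = probabilities.zip(thresholds).map { case (p, t) => p / t }
      scaled.indexOf(scaled.max)
    }

    // Class 2 is predicted even though class 0 has the highest raw probability.
    predictWithThresholds(Array(0.5, 0.3, 0.2), Array(0.6, 0.6, 0.2))  // returns 2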
spark git commit: [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier
Repository: spark Updated Branches: refs/heads/branch-1.5 a9277cd5a -> c5250ddc5 [SPARK-8069] [ML] Add multiclass thresholds for ProbabilisticClassifier This PR replaces the old "threshold" with a generalized "thresholds" Param. We keep getThreshold,setThreshold for backwards compatibility for binary classification. Note that the primary author of this PR is holdenk Author: Holden Karau Author: Joseph K. Bradley Closes #7909 from jkbradley/holdenk-SPARK-8069-add-cutoff-aka-threshold-to-random-forest and squashes the following commits: 3952977 [Joseph K. Bradley] fixed pyspark doc test 85febc8 [Joseph K. Bradley] made python unit tests a little more robust 7eb1d86 [Joseph K. Bradley] small cleanups 6cc2ed8 [Joseph K. Bradley] Fixed remaining merge issues. 0255e44 [Joseph K. Bradley] Many cleanups for thresholds, some more tests 7565a60 [Holden Karau] fix pep8 style checks, add a getThreshold method similar to our LogisticRegression.scala one for API compat be87f26 [Holden Karau] Convert threshold to thresholds in the python code, add specialized support for Array[Double] to shared parems codegen, etc. 6747dad [Holden Karau] Override raw2prediction for ProbabilisticClassifier, fix some tests 25df168 [Holden Karau] Fix handling of thresholds in LogisticRegression c02d6c0 [Holden Karau] No default for thresholds 5e43628 [Holden Karau] CR feedback and fixed the renamed test f3fbbd1 [Holden Karau] revert the changes to random forest :( 51f581c [Holden Karau] Add explicit types to public methods, fix long line f7032eb [Holden Karau] Fix a java test bug, remove some unecessary changes adf15b4 [Holden Karau] rename the classifier suite test to ProbabilisticClassifierSuite now that we only have it in Probabilistic 398078a [Holden Karau] move the thresholding around a bunch based on the design doc 4893bdc [Holden Karau] Use numtrees of 3 since previous result was tied (one tree for each) and the switch from different max methods picked a different element (since they were equal I think this is ok) 638854c [Holden Karau] Add a scala RandomForestClassifierSuite test based on corresponding python test e09919c [Holden Karau] Fix return type, I need more coffee 8d92cac [Holden Karau] Use ClassifierParams as the head 3456ed3 [Holden Karau] Add explicit return types even though just test a0f3b0c [Holden Karau] scala style fixes 6f14314 [Holden Karau] Since hasthreshold/hasthresholds is in root classifier now ffc8dab [Holden Karau] Update the sharedParams 0420290 [Holden Karau] Allow us to override the get methods selectively 978e77a [Holden Karau] Move HasThreshold into classifier params and start defining the overloaded getThreshold/getThresholds functions 1433e52 [Holden Karau] Revert "try and hide threshold but chainges the API so no dice there" 1f09a2e [Holden Karau] try and hide threshold but chainges the API so no dice there efb9084 [Holden Karau] move setThresholds only to where its used 6b34809 [Holden Karau] Add a test with thresholding for the RFCS 74f54c3 [Holden Karau] Fix creation of vote array 1986fa8 [Holden Karau] Setting the thresholds only makes sense if the underlying class hasn't overridden predict, so lets push it down. 2f44b18 [Holden Karau] Add a global default of null for thresholds param f338cfc [Holden Karau] Wait that wasn't a good idea, Revert "Some progress towards unifying threshold and thresholds" 634b06f [Holden Karau] Some progress towards unifying threshold and thresholds 85c9e01 [Holden Karau] Test passes again... 
little fnur 099c0f3 [Holden Karau] Move thresholds around some more (set on model not trainer) 0f46836 [Holden Karau] Start adding a classifiersuite f70eb5e [Holden Karau] Fix test compile issues a7d59c8 [Holden Karau] Move thresholding into Classifier trait 5d999d2 [Holden Karau] Some more progress, start adding a test (maybe try and see if we can find a better thing to use for the base of the test) 1fed644 [Holden Karau] Use thresholds to scale scores in random forest classifcation 31d6bf2 [Holden Karau] Start threading the threshold info through 0ef228c [Holden Karau] Add hasthresholds (cherry picked from commit 5a23213c148bfe362514f9c71f5273ebda0a848a) Signed-off-by: Joseph K. Bradley Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5250ddc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5250ddc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5250ddc Branch: refs/heads/branch-1.5 Commit: c5250ddc5242a071549e980f69fa8bd785168979 Parents: a9277cd Author: Holden Karau Authored: Tue Aug 4 10:12:22 2015 -0700 Committer: Joseph K. Bradley Committed: Tue Aug 4 10:12:33 2015 -0700 -- .../examples/ml/JavaSimpleParamsExample.java| 3 +- .../src/main/python/ml/simple_params_example.py | 2 +- .../spark/examples/ml/SimpleParamsExample.scala | 2 +- .../spark/ml/classification/Class
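For the binary backwards-compatibility path mentioned above, a single threshold t can be read as the two-element vector Array(1 - t, t); the sketch below assumes that mapping and only sets Params, it does not fit a model:

    import org.apache.spark.ml.classification.LogisticRegression

    // Old binary-only API, kept for compatibility.
    val lrOld = new LogisticRegression().setThreshold(0.6)

    // Generalized Param expressing the same intent: predict the positive class
    // only when its probability clears 0.6 (assumed equivalence, see note above).
    val lrNew = new LogisticRegression().setThresholds(Array(0.4, 0.6))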
spark git commit: [SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting
Repository: spark Updated Branches: refs/heads/branch-1.5 aa8390dfc -> a9277cd5a [SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting The analysis rule has a bug and we ended up making the sorter still capable of doing evaluation, so lets revert this for now. Author: Michael Armbrust Closes #7906 from marmbrus/revertSortProjection and squashes the following commits: 2da6972 [Michael Armbrust] unrevert unrelated changes 4f2b00c [Michael Armbrust] Revert "[SPARK-9251][SQL] do not order by expressions which still need evaluation" (cherry picked from commit 34a0eb2e89d59b0823efc035ddf2dc93f19540c1) Signed-off-by: Michael Armbrust Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a9277cd5 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a9277cd5 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a9277cd5 Branch: refs/heads/branch-1.5 Commit: a9277cd5aedd570f550e2a807768c8ffada9576f Parents: aa8390d Author: Michael Armbrust Authored: Tue Aug 4 10:07:53 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 4 10:08:03 2015 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 58 .../catalyst/plans/logical/basicOperators.scala | 13 ++--- .../sql/catalyst/analysis/AnalysisSuite.scala | 36 ++-- 3 files changed, 8 insertions(+), 99 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a9277cd5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index f5daba1..ca17f3e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -79,7 +79,6 @@ class Analyzer( ExtractWindowExpressions :: GlobalAggregates :: UnresolvedHavingClauseAttributes :: - RemoveEvaluationFromSort :: HiveTypeCoercion.typeCoercionRules ++ extendedResolutionRules : _*), Batch("Nondeterministic", Once, @@ -955,63 +954,6 @@ class Analyzer( Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } } - - /** - * Removes all still-need-evaluate ordering expressions from sort and use an inner project to - * materialize them, finally use a outer project to project them away to keep the result same. - * Then we can make sure we only sort by [[AttributeReference]]s. - * - * As an example, - * {{{ - * Sort('a, 'b + 1, - * Relation('a, 'b)) - * }}} - * will be turned into: - * {{{ - * Project('a, 'b, - * Sort('a, '_sortCondition, - * Project('a, 'b, ('b + 1).as("_sortCondition"), - * Relation('a, 'b - * }}} - */ - object RemoveEvaluationFromSort extends Rule[LogicalPlan] { -private def hasAlias(expr: Expression) = { - expr.find { -case a: Alias => true -case _ => false - }.isDefined -} - -override def apply(plan: LogicalPlan): LogicalPlan = plan transform { - // The ordering expressions have no effect to the output schema of `Sort`, - // so `Alias`s in ordering expressions are unnecessary and we should remove them. 
- case s @ Sort(ordering, _, _) if ordering.exists(hasAlias) => -val newOrdering = ordering.map(_.transformUp { - case Alias(child, _) => child -}.asInstanceOf[SortOrder]) -s.copy(order = newOrdering) - - case s @ Sort(ordering, global, child) -if s.expressions.forall(_.resolved) && s.childrenResolved && !s.hasNoEvaluation => - -val (ref, needEval) = ordering.partition(_.child.isInstanceOf[AttributeReference]) - -val namedExpr = needEval.map(_.child match { - case n: NamedExpression => n - case e => Alias(e, "_sortCondition")() -}) - -val newOrdering = ref ++ needEval.zip(namedExpr).map { case (order, ne) => - order.copy(child = ne.toAttribute) -} - -// Add still-need-evaluate ordering expressions into inner project and then project -// them away after the sort. -Project(child.output, - Sort(newOrdering, global, -Project(child.output ++ namedExpr, child))) -} - } } /** http://git-wip-us.apache.org/repos/asf/spark/blob/a9277cd5/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scal
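To see the kind of query the reverted rule targeted, the hedged spark-shell sketch below (hypothetical data, sqlContext assumed in scope) orders by an expression that still needs evaluation rather than by a bare column; with the rule removed, the Sort operator evaluates b + 1 itself instead of reading it from an inner Project:

    // Hypothetical DataFrame; the point is the ordering expression b + 1.
    val df = sqlContext.createDataFrame(Seq((1, 10), (2, 5))).toDF("a", "b")

    // Before this revert the analyzer rewrote the plan to
    //   Project(a, b, Sort(_sortCondition, Project(a, b, (b + 1) AS _sortCondition, ...)))
    // as described in the removed scaladoc above.
    df.sort(df("b") + 1).explain(true)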
spark git commit: [SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting
Repository: spark Updated Branches: refs/heads/master 6a0f8b994 -> 34a0eb2e8 [SPARK-9512][SQL] Revert SPARK-9251, Allow evaluation while sorting The analysis rule has a bug and we ended up making the sorter still capable of doing evaluation, so lets revert this for now. Author: Michael Armbrust Closes #7906 from marmbrus/revertSortProjection and squashes the following commits: 2da6972 [Michael Armbrust] unrevert unrelated changes 4f2b00c [Michael Armbrust] Revert "[SPARK-9251][SQL] do not order by expressions which still need evaluation" Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/34a0eb2e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/34a0eb2e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/34a0eb2e Branch: refs/heads/master Commit: 34a0eb2e89d59b0823efc035ddf2dc93f19540c1 Parents: 6a0f8b9 Author: Michael Armbrust Authored: Tue Aug 4 10:07:53 2015 -0700 Committer: Michael Armbrust Committed: Tue Aug 4 10:07:53 2015 -0700 -- .../spark/sql/catalyst/analysis/Analyzer.scala | 58 .../catalyst/plans/logical/basicOperators.scala | 13 ++--- .../sql/catalyst/analysis/AnalysisSuite.scala | 36 ++-- 3 files changed, 8 insertions(+), 99 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/34a0eb2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala index f5daba1..ca17f3e 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala @@ -79,7 +79,6 @@ class Analyzer( ExtractWindowExpressions :: GlobalAggregates :: UnresolvedHavingClauseAttributes :: - RemoveEvaluationFromSort :: HiveTypeCoercion.typeCoercionRules ++ extendedResolutionRules : _*), Batch("Nondeterministic", Once, @@ -955,63 +954,6 @@ class Analyzer( Project(p.output, newPlan.withNewChildren(newChild :: Nil)) } } - - /** - * Removes all still-need-evaluate ordering expressions from sort and use an inner project to - * materialize them, finally use a outer project to project them away to keep the result same. - * Then we can make sure we only sort by [[AttributeReference]]s. - * - * As an example, - * {{{ - * Sort('a, 'b + 1, - * Relation('a, 'b)) - * }}} - * will be turned into: - * {{{ - * Project('a, 'b, - * Sort('a, '_sortCondition, - * Project('a, 'b, ('b + 1).as("_sortCondition"), - * Relation('a, 'b - * }}} - */ - object RemoveEvaluationFromSort extends Rule[LogicalPlan] { -private def hasAlias(expr: Expression) = { - expr.find { -case a: Alias => true -case _ => false - }.isDefined -} - -override def apply(plan: LogicalPlan): LogicalPlan = plan transform { - // The ordering expressions have no effect to the output schema of `Sort`, - // so `Alias`s in ordering expressions are unnecessary and we should remove them. 
- case s @ Sort(ordering, _, _) if ordering.exists(hasAlias) => -val newOrdering = ordering.map(_.transformUp { - case Alias(child, _) => child -}.asInstanceOf[SortOrder]) -s.copy(order = newOrdering) - - case s @ Sort(ordering, global, child) -if s.expressions.forall(_.resolved) && s.childrenResolved && !s.hasNoEvaluation => - -val (ref, needEval) = ordering.partition(_.child.isInstanceOf[AttributeReference]) - -val namedExpr = needEval.map(_.child match { - case n: NamedExpression => n - case e => Alias(e, "_sortCondition")() -}) - -val newOrdering = ref ++ needEval.zip(namedExpr).map { case (order, ne) => - order.copy(child = ne.toAttribute) -} - -// Add still-need-evaluate ordering expressions into inner project and then project -// them away after the sort. -Project(child.output, - Sort(newOrdering, global, -Project(child.output ++ namedExpr, child))) -} - } } /** http://git-wip-us.apache.org/repos/asf/spark/blob/34a0eb2e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala index aacfc8
spark git commit: [SPARK-9562] Change reference to amplab/spark-ec2 from mesos/
Repository: spark Updated Branches: refs/heads/master b5034c9c5 -> 6a0f8b994 [SPARK-9562] Change reference to amplab/spark-ec2 from mesos/ cc srowen pwendell nchammas Author: Shivaram Venkataraman Closes #7899 from shivaram/spark-ec2-move and squashes the following commits: 7cc22c9 [Shivaram Venkataraman] Change reference to amplab/spark-ec2 from mesos/ Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6a0f8b99 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6a0f8b99 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6a0f8b99 Branch: refs/heads/master Commit: 6a0f8b994de36b7a7bdfb9958d39dbd011776107 Parents: b5034c9 Author: Shivaram Venkataraman Authored: Tue Aug 4 09:40:07 2015 -0700 Committer: Shivaram Venkataraman Committed: Tue Aug 4 09:40:07 2015 -0700 -- ec2/spark_ec2.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/6a0f8b99/ec2/spark_ec2.py -- diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py index ccf922d..11fd7ee 100755 --- a/ec2/spark_ec2.py +++ b/ec2/spark_ec2.py @@ -90,7 +90,7 @@ DEFAULT_SPARK_VERSION = SPARK_EC2_VERSION DEFAULT_SPARK_GITHUB_REPO = "https://github.com/apache/spark"; # Default location to get the spark-ec2 scripts (and ami-list) from -DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/mesos/spark-ec2"; +DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/amplab/spark-ec2"; DEFAULT_SPARK_EC2_BRANCH = "branch-1.4"
spark git commit: [SPARK-9562] Change reference to amplab/spark-ec2 from mesos/
Repository: spark Updated Branches: refs/heads/branch-1.5 d875368ed -> aa8390dfc [SPARK-9562] Change reference to amplab/spark-ec2 from mesos/ cc srowen pwendell nchammas Author: Shivaram Venkataraman Closes #7899 from shivaram/spark-ec2-move and squashes the following commits: 7cc22c9 [Shivaram Venkataraman] Change reference to amplab/spark-ec2 from mesos/ (cherry picked from commit 6a0f8b994de36b7a7bdfb9958d39dbd011776107) Signed-off-by: Shivaram Venkataraman Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/aa8390df Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/aa8390df Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/aa8390df Branch: refs/heads/branch-1.5 Commit: aa8390dfcbb45eeff3d5894cf9b2edbd245b7320 Parents: d875368 Author: Shivaram Venkataraman Authored: Tue Aug 4 09:40:07 2015 -0700 Committer: Shivaram Venkataraman Committed: Tue Aug 4 09:40:24 2015 -0700 -- ec2/spark_ec2.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/aa8390df/ec2/spark_ec2.py -- diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py index ccf922d..11fd7ee 100755 --- a/ec2/spark_ec2.py +++ b/ec2/spark_ec2.py @@ -90,7 +90,7 @@ DEFAULT_SPARK_VERSION = SPARK_EC2_VERSION DEFAULT_SPARK_GITHUB_REPO = "https://github.com/apache/spark"; # Default location to get the spark-ec2 scripts (and ami-list) from -DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/mesos/spark-ec2"; +DEFAULT_SPARK_EC2_GITHUB_REPO = "https://github.com/amplab/spark-ec2"; DEFAULT_SPARK_EC2_BRANCH = "branch-1.4"
spark git commit: [SPARK-9541] [SQL] DateTimeUtils cleanup
Repository: spark Updated Branches: refs/heads/branch-1.5 b42e13dca -> d875368ed [SPARK-9541] [SQL] DataTimeUtils cleanup JIRA: https://issues.apache.org/jira/browse/SPARK-9541 Author: Yijie Shen Closes #7870 from yjshen/datetime_cleanup and squashes the following commits: 9203e33 [Yijie Shen] revert getMonth & getDayOfMonth 5cad119 [Yijie Shen] rebase code 7d62a74 [Yijie Shen] remove tmp tuple inside split date e98aaac [Yijie Shen] DataTimeUtils cleanup (cherry picked from commit b5034c9c59947f20423faa46bc6606aad56836b0) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d875368e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d875368e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d875368e Branch: refs/heads/branch-1.5 Commit: d875368edd7265cedf808c921c0af0deb4895a67 Parents: b42e13d Author: Yijie Shen Authored: Tue Aug 4 09:09:52 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 09:10:02 2015 -0700 -- .../spark/sql/catalyst/util/DateTimeUtils.scala | 142 --- 1 file changed, 57 insertions(+), 85 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d875368e/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index f645eb5..063940c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -494,6 +494,50 @@ object DateTimeUtils { } /** + * Split date (expressed in days since 1.1.1970) into four fields: + * year, month (Jan is Month 1), dayInMonth, daysToMonthEnd (0 if it's last day of month). + */ + def splitDate(date: Int): (Int, Int, Int, Int) = { +var (year, dayInYear) = getYearAndDayInYear(date) +val isLeap = isLeapYear(year) +if (isLeap && dayInYear == 60) { + (year, 2, 29, 0) +} else { + if (isLeap && dayInYear > 60) dayInYear -= 1 + + if (dayInYear <= 181) { +if (dayInYear <= 31) { + (year, 1, dayInYear, 31 - dayInYear) +} else if (dayInYear <= 59) { + (year, 2, dayInYear - 31, if (isLeap) 60 - dayInYear else 59 - dayInYear) +} else if (dayInYear <= 90) { + (year, 3, dayInYear - 59, 90 - dayInYear) +} else if (dayInYear <= 120) { + (year, 4, dayInYear - 90, 120 - dayInYear) +} else if (dayInYear <= 151) { + (year, 5, dayInYear - 120, 151 - dayInYear) +} else { + (year, 6, dayInYear - 151, 181 - dayInYear) +} + } else { +if (dayInYear <= 212) { + (year, 7, dayInYear - 181, 212 - dayInYear) +} else if (dayInYear <= 243) { + (year, 8, dayInYear - 212, 243 - dayInYear) +} else if (dayInYear <= 273) { + (year, 9, dayInYear - 243, 273 - dayInYear) +} else if (dayInYear <= 304) { + (year, 10, dayInYear - 273, 304 - dayInYear) +} else if (dayInYear <= 334) { + (year, 11, dayInYear - 304, 334 - dayInYear) +} else { + (year, 12, dayInYear - 334, 365 - dayInYear) +} + } +} + } + + /** * Returns the month value for the given date. The date is expressed in days * since 1.1.1970. January is month 1. */ @@ -613,15 +657,16 @@ object DateTimeUtils { * Returns a date value, expressed in days since 1.1.1970. 
*/ def dateAddMonths(days: Int, months: Int): Int = { -val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 1 + months +val (year, monthInYear, dayOfMonth, daysToMonthEnd) = splitDate(days) +val absoluteMonth = (year - YearZero) * 12 + monthInYear - 1 + months val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0 val currentMonthInYear = nonNegativeMonth % 12 val currentYear = nonNegativeMonth / 12 + val leapDay = if (currentMonthInYear == 1 && isLeapYear(currentYear + YearZero)) 1 else 0 val lastDayOfMonth = monthDays(currentMonthInYear) + leapDay -val dayOfMonth = getDayOfMonth(days) -val currentDayInMonth = if (getDayOfMonth(days + 1) == 1 || dayOfMonth >= lastDayOfMonth) { +val currentDayInMonth = if (daysToMonthEnd == 0 || dayOfMonth >= lastDayOfMonth) { // last day of the month lastDayOfMonth } else { @@ -641,46 +686,6 @@ object DateTimeUtils { } /** - * Returns the last dayInMonth in the month it belongs to. The date is expressed - * in days since 1.1.1970. the return value starts from
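The new splitDate helper packs four fields into one tuple. As a self-contained cross-check of what those fields mean (using java.time rather than the catalyst code, so the function name here is illustrative):

    import java.time.LocalDate

    // Mirrors the (year, month, dayInMonth, daysToMonthEnd) quadruple described above.
    def splitDateReference(daysSinceEpoch: Int): (Int, Int, Int, Int) = {
      val d = LocalDate.ofEpochDay(daysSinceEpoch.toLong)
      (d.getYear, d.getMonthValue, d.getDayOfMonth, d.lengthOfMonth - d.getDayOfMonth)
    }

    splitDateReference(0)   // (1970, 1, 1, 30): 1970-01-01, 30 days left in January
    splitDateReference(58)  // (1970, 2, 28, 0): last day of a non-leap February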
spark git commit: [SPARK-9541] [SQL] DateTimeUtils cleanup
Repository: spark Updated Branches: refs/heads/master 73dedb589 -> b5034c9c5 [SPARK-9541] [SQL] DataTimeUtils cleanup JIRA: https://issues.apache.org/jira/browse/SPARK-9541 Author: Yijie Shen Closes #7870 from yjshen/datetime_cleanup and squashes the following commits: 9203e33 [Yijie Shen] revert getMonth & getDayOfMonth 5cad119 [Yijie Shen] rebase code 7d62a74 [Yijie Shen] remove tmp tuple inside split date e98aaac [Yijie Shen] DataTimeUtils cleanup Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b5034c9c Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b5034c9c Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b5034c9c Branch: refs/heads/master Commit: b5034c9c59947f20423faa46bc6606aad56836b0 Parents: 73dedb5 Author: Yijie Shen Authored: Tue Aug 4 09:09:52 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 09:09:52 2015 -0700 -- .../spark/sql/catalyst/util/DateTimeUtils.scala | 142 --- 1 file changed, 57 insertions(+), 85 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b5034c9c/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala index f645eb5..063940c 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala @@ -494,6 +494,50 @@ object DateTimeUtils { } /** + * Split date (expressed in days since 1.1.1970) into four fields: + * year, month (Jan is Month 1), dayInMonth, daysToMonthEnd (0 if it's last day of month). + */ + def splitDate(date: Int): (Int, Int, Int, Int) = { +var (year, dayInYear) = getYearAndDayInYear(date) +val isLeap = isLeapYear(year) +if (isLeap && dayInYear == 60) { + (year, 2, 29, 0) +} else { + if (isLeap && dayInYear > 60) dayInYear -= 1 + + if (dayInYear <= 181) { +if (dayInYear <= 31) { + (year, 1, dayInYear, 31 - dayInYear) +} else if (dayInYear <= 59) { + (year, 2, dayInYear - 31, if (isLeap) 60 - dayInYear else 59 - dayInYear) +} else if (dayInYear <= 90) { + (year, 3, dayInYear - 59, 90 - dayInYear) +} else if (dayInYear <= 120) { + (year, 4, dayInYear - 90, 120 - dayInYear) +} else if (dayInYear <= 151) { + (year, 5, dayInYear - 120, 151 - dayInYear) +} else { + (year, 6, dayInYear - 151, 181 - dayInYear) +} + } else { +if (dayInYear <= 212) { + (year, 7, dayInYear - 181, 212 - dayInYear) +} else if (dayInYear <= 243) { + (year, 8, dayInYear - 212, 243 - dayInYear) +} else if (dayInYear <= 273) { + (year, 9, dayInYear - 243, 273 - dayInYear) +} else if (dayInYear <= 304) { + (year, 10, dayInYear - 273, 304 - dayInYear) +} else if (dayInYear <= 334) { + (year, 11, dayInYear - 304, 334 - dayInYear) +} else { + (year, 12, dayInYear - 334, 365 - dayInYear) +} + } +} + } + + /** * Returns the month value for the given date. The date is expressed in days * since 1.1.1970. January is month 1. */ @@ -613,15 +657,16 @@ object DateTimeUtils { * Returns a date value, expressed in days since 1.1.1970. 
*/ def dateAddMonths(days: Int, months: Int): Int = { -val absoluteMonth = (getYear(days) - YearZero) * 12 + getMonth(days) - 1 + months +val (year, monthInYear, dayOfMonth, daysToMonthEnd) = splitDate(days) +val absoluteMonth = (year - YearZero) * 12 + monthInYear - 1 + months val nonNegativeMonth = if (absoluteMonth >= 0) absoluteMonth else 0 val currentMonthInYear = nonNegativeMonth % 12 val currentYear = nonNegativeMonth / 12 + val leapDay = if (currentMonthInYear == 1 && isLeapYear(currentYear + YearZero)) 1 else 0 val lastDayOfMonth = monthDays(currentMonthInYear) + leapDay -val dayOfMonth = getDayOfMonth(days) -val currentDayInMonth = if (getDayOfMonth(days + 1) == 1 || dayOfMonth >= lastDayOfMonth) { +val currentDayInMonth = if (daysToMonthEnd == 0 || dayOfMonth >= lastDayOfMonth) { // last day of the month lastDayOfMonth } else { @@ -641,46 +686,6 @@ object DateTimeUtils { } /** - * Returns the last dayInMonth in the month it belongs to. The date is expressed - * in days since 1.1.1970. the return value starts from 1. - */ - private def getLastDayInMonthOfMonth(date: Int): Int = { -var (year, dayInYear) = getYe
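The reworked dateAddMonths keeps the last-day-of-month rule visible in the diff: when the start date is the last day of its month, or its day number does not exist in the target month, the result is clamped to the last day of the target month. A hedged reference sketch of that rule, built on java.time rather than the catalyst utilities:

    import java.time.LocalDate

    // plusMonths already clamps overflowing day numbers; the extra branch handles
    // the "last day maps to last day" case described above.
    def addMonthsSketch(start: LocalDate, months: Int): LocalDate = {
      val shifted = start.plusMonths(months.toLong)
      if (start.getDayOfMonth == start.lengthOfMonth) shifted.withDayOfMonth(shifted.lengthOfMonth)
      else shifted
    }

    addMonthsSketch(LocalDate.of(2015, 1, 31), 1)  // 2015-02-28 (day 31 clamped)
    addMonthsSketch(LocalDate.of(2015, 2, 28), 1)  // 2015-03-31 (last day maps to last day)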
spark git commit: [SPARK-8246] [SQL] Implement get_json_object
Repository: spark Updated Branches: refs/heads/master b1f88a38d -> 73dedb589 [SPARK-8246] [SQL] Implement get_json_object This is based on #7485 , thanks to NathanHowell Tests were copied from Hive, but do not seem to be super comprehensive. I've generally replicated Hive's unusual behavior rather than following a JSONPath reference, except for one case (as noted in the comments). I don't know if there is a way of fully replicating Hive's behavior without a slower TreeNode implementation, so I've erred on the side of performance instead. Author: Davies Liu Author: Yin Huai Author: Nathan Howell Closes #7901 from davies/get_json_object and squashes the following commits: 3ace9b9 [Davies Liu] Merge branch 'get_json_object' of github.com:davies/spark into get_json_object 98766fc [Davies Liu] Merge branch 'master' of github.com:apache/spark into get_json_object a7dc6d0 [Davies Liu] Update JsonExpressionsSuite.scala c818519 [Yin Huai] new results. 18ce26b [Davies Liu] fix tests 6ac29fb [Yin Huai] Golden files. 25eebef [Davies Liu] use HiveQuerySuite e0ac6ec [Yin Huai] Golden answer files. 940c060 [Davies Liu] tweat code style 44084c5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into get_json_object 9192d09 [Nathan Howell] Match Hiveâs behavior for unwrapping arrays of one element 8dab647 [Nathan Howell] [SPARK-8246] [SQL] Implement get_json_object Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/73dedb58 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/73dedb58 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/73dedb58 Branch: refs/heads/master Commit: 73dedb589d06f7c7a525cc4f07721a77f480c434 Parents: b1f88a3 Author: Davies Liu Authored: Tue Aug 4 09:07:09 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 09:07:09 2015 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 1 + .../catalyst/expressions/jsonFunctions.scala| 309 +++ .../expressions/JsonExpressionsSuite.scala | 202 .../apache/spark/sql/JsonFunctionsSuite.scala | 32 ++ .../hive/execution/HiveCompatibilitySuite.scala | 3 + .../apache/spark/sql/hive/test/TestHive.scala | 6 +- ...object #1-0-f01b340b5662c45bb5f1e3b7c6900e1f | 1 + ...bject #10-0-f3f47d06d7c51d493d68112b0bd6c1fc | 1 + ..._object #2-0-e84c2f8136919830fd665a278e4158a | 1 + ...object #3-0-bf140c65c31f8d892ec23e41e16e58bb | 1 + ...object #4-0-f0bd902edc1990c9a6c65a6bb672c4d5 | 1 + ..._object #5-0-3c09f4316a1533049aee8af749cdcab | 1 + ...object #6-0-8334d1ddbe0f41fc7b80d4e6b45409da | 1 + ...object #7-0-40d7dff94b26a2e3f4ab71baee3d3ce0 | 1 + ...object #8-0-180b4b6fdb26011fec05a7ca99fd9844 | 1 + ...object #9-0-47c451a969d856f008f4d6b3d378d94b | 1 + .../sql/hive/execution/HiveQuerySuite.scala | 51 +++ 17 files changed, 613 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/73dedb58/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 6140d1b..43e3e9b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -179,6 +179,7 @@ object FunctionRegistry { expression[Decode]("decode"), expression[FindInSet]("find_in_set"), expression[FormatNumber]("format_number"), 
+expression[GetJsonObject]("get_json_object"), expression[InitCap]("initcap"), expression[Lower]("lcase"), expression[Lower]("lower"), http://git-wip-us.apache.org/repos/asf/spark/blob/73dedb58/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala new file mode 100644 index 000..23bfa18 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compl
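Since the expression is registered under the SQL name get_json_object, a minimal spark-shell sketch (sqlContext assumed in scope; the JSON literal and path are illustrative) looks like this:

    // Extracts the value at a JSONPath-like path, returned as a string,
    // or NULL when the path does not match.
    sqlContext.sql(
      """SELECT get_json_object('{"store": {"fruit": ["apple", "pear"]}}', '$.store.fruit[0]')"""
    ).collect()
    // expected single-row result containing the string "apple"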
spark git commit: [SPARK-8246] [SQL] Implement get_json_object
Repository: spark Updated Branches: refs/heads/branch-1.5 945da3534 -> b42e13dca [SPARK-8246] [SQL] Implement get_json_object This is based on #7485 , thanks to NathanHowell Tests were copied from Hive, but do not seem to be super comprehensive. I've generally replicated Hive's unusual behavior rather than following a JSONPath reference, except for one case (as noted in the comments). I don't know if there is a way of fully replicating Hive's behavior without a slower TreeNode implementation, so I've erred on the side of performance instead. Author: Davies Liu Author: Yin Huai Author: Nathan Howell Closes #7901 from davies/get_json_object and squashes the following commits: 3ace9b9 [Davies Liu] Merge branch 'get_json_object' of github.com:davies/spark into get_json_object 98766fc [Davies Liu] Merge branch 'master' of github.com:apache/spark into get_json_object a7dc6d0 [Davies Liu] Update JsonExpressionsSuite.scala c818519 [Yin Huai] new results. 18ce26b [Davies Liu] fix tests 6ac29fb [Yin Huai] Golden files. 25eebef [Davies Liu] use HiveQuerySuite e0ac6ec [Yin Huai] Golden answer files. 940c060 [Davies Liu] tweat code style 44084c5 [Davies Liu] Merge branch 'master' of github.com:apache/spark into get_json_object 9192d09 [Nathan Howell] Match Hiveâs behavior for unwrapping arrays of one element 8dab647 [Nathan Howell] [SPARK-8246] [SQL] Implement get_json_object (cherry picked from commit 73dedb589d06f7c7a525cc4f07721a77f480c434) Signed-off-by: Davies Liu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b42e13dc Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b42e13dc Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b42e13dc Branch: refs/heads/branch-1.5 Commit: b42e13dca38c6e9ff9cf879bcb52efa681437120 Parents: 945da35 Author: Davies Liu Authored: Tue Aug 4 09:07:09 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 09:07:19 2015 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 1 + .../catalyst/expressions/jsonFunctions.scala| 309 +++ .../expressions/JsonExpressionsSuite.scala | 202 .../apache/spark/sql/JsonFunctionsSuite.scala | 32 ++ .../hive/execution/HiveCompatibilitySuite.scala | 3 + .../apache/spark/sql/hive/test/TestHive.scala | 6 +- ...object #1-0-f01b340b5662c45bb5f1e3b7c6900e1f | 1 + ...bject #10-0-f3f47d06d7c51d493d68112b0bd6c1fc | 1 + ..._object #2-0-e84c2f8136919830fd665a278e4158a | 1 + ...object #3-0-bf140c65c31f8d892ec23e41e16e58bb | 1 + ...object #4-0-f0bd902edc1990c9a6c65a6bb672c4d5 | 1 + ..._object #5-0-3c09f4316a1533049aee8af749cdcab | 1 + ...object #6-0-8334d1ddbe0f41fc7b80d4e6b45409da | 1 + ...object #7-0-40d7dff94b26a2e3f4ab71baee3d3ce0 | 1 + ...object #8-0-180b4b6fdb26011fec05a7ca99fd9844 | 1 + ...object #9-0-47c451a969d856f008f4d6b3d378d94b | 1 + .../sql/hive/execution/HiveQuerySuite.scala | 51 +++ 17 files changed, 613 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b42e13dc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index 6140d1b..43e3e9b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -179,6 +179,7 @@ object FunctionRegistry { 
expression[Decode]("decode"), expression[FindInSet]("find_in_set"), expression[FormatNumber]("format_number"), +expression[GetJsonObject]("get_json_object"), expression[InitCap]("initcap"), expression[Lower]("lcase"), expression[Lower]("lower"), http://git-wip-us.apache.org/repos/asf/spark/blob/b42e13dc/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala new file mode 100644 index 000..23bfa18 --- /dev/null +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/jsonFunctions.scala @@ -0,0 +1,309 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to
spark git commit: [SPARK-8244] [SQL] string function: find in set
Repository: spark Updated Branches: refs/heads/branch-1.5 f44b27a2b -> 945da3534 [SPARK-8244] [SQL] string function: find in set This PR is based on #7186 (just fix the conflict), thanks to tarekauel . find_in_set(string str, string strList): int Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3. Only add this to SQL, not DataFrame. Closes #7186 Author: Tarek Auel Author: Davies Liu Closes #7900 from davies/find_in_set and squashes the following commits: 4334209 [Davies Liu] Merge branch 'master' of github.com:apache/spark into find_in_set 8f00572 [Davies Liu] Merge branch 'master' of github.com:apache/spark into find_in_set 243ede4 [Tarek Auel] [SPARK-8244][SQL] hive compatibility 1aaf64e [Tarek Auel] [SPARK-8244][SQL] unit test fix e4093a4 [Tarek Auel] [SPARK-8244][SQL] final modifier for COMMA_UTF8 0d05df5 [Tarek Auel] Merge branch 'master' into SPARK-8244 208d710 [Tarek Auel] [SPARK-8244] address comments & bug fix 71b2e69 [Tarek Auel] [SPARK-8244] find_in_set 66c7fda [Tarek Auel] Merge branch 'master' into SPARK-8244 61b8ca2 [Tarek Auel] [SPARK-8224] removed loop and split; use unsafe String comparison 4f75a65 [Tarek Auel] Merge branch 'master' into SPARK-8244 e3b20c8 [Tarek Auel] [SPARK-8244] added type check 1c2bbb7 [Tarek Auel] [SPARK-8244] findInSet Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/945da353 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/945da353 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/945da353 Branch: refs/heads/branch-1.5 Commit: 945da3534762a73fe7ffc52c868ff07a0783502b Parents: f44b27a Author: Tarek Auel Authored: Tue Aug 4 08:59:42 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 09:01:58 2015 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 1 + .../catalyst/expressions/stringOperations.scala | 25 -- .../expressions/StringExpressionsSuite.scala| 10 ++ .../spark/sql/DataFrameFunctionsSuite.scala | 8 + .../apache/spark/unsafe/types/UTF8String.java | 35 ++-- .../spark/unsafe/types/UTF8StringSuite.java | 12 +++ 6 files changed, 87 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/945da353/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index bc08466..6140d1b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -177,6 +177,7 @@ object FunctionRegistry { expression[ConcatWs]("concat_ws"), expression[Encode]("encode"), expression[Decode]("decode"), +expression[FindInSet]("find_in_set"), expression[FormatNumber]("format_number"), expression[InitCap]("initcap"), expression[Lower]("lcase"), http://git-wip-us.apache.org/repos/asf/spark/blob/945da353/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala index 5622529..0cc785d 
100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala @@ -18,8 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.text.DecimalFormat -import java.util.Arrays -import java.util.Locale +import java.util.{Arrays, Locale} import java.util.regex.{MatchResult, Pattern} import org.apache.commons.lang3.StringEscapeUtils @@ -351,6 +350,28 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the index (1-based) of the given string (left) in the comma- + * delimited list (right). Returns 0, if the string wasn't found or if the given + * string (left) contains a comma. + */ +case class FindInSet(left: Expression, right: Expression) extends BinaryExpression +with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType) + + override protected def nullSafeEva
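The semantics spelled out above (1-based index, NULL when either argument is NULL, 0 when the needle contains a comma or is not found) can be exercised straight from SQL; a short spark-shell sketch with sqlContext assumed in scope:

    sqlContext.sql("SELECT find_in_set('ab', 'abc,b,ab,c,def')").collect()   // [3]
    sqlContext.sql("SELECT find_in_set('ab,c', 'abc,b,ab,c,def')").collect() // [0]: needle has a comma
    sqlContext.sql("SELECT find_in_set('zz', 'abc,b,ab,c,def')").collect()   // [0]: not found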
spark git commit: [SPARK-8244] [SQL] string function: find in set
Repository: spark Updated Branches: refs/heads/master d702d5373 -> b1f88a38d [SPARK-8244] [SQL] string function: find in set This PR is based on #7186 (just fix the conflict), thanks to tarekauel . find_in_set(string str, string strList): int Returns the first occurance of str in strList where strList is a comma-delimited string. Returns null if either argument is null. Returns 0 if the first argument contains any commas. For example, find_in_set('ab', 'abc,b,ab,c,def') returns 3. Only add this to SQL, not DataFrame. Closes #7186 Author: Tarek Auel Author: Davies Liu Closes #7900 from davies/find_in_set and squashes the following commits: 4334209 [Davies Liu] Merge branch 'master' of github.com:apache/spark into find_in_set 8f00572 [Davies Liu] Merge branch 'master' of github.com:apache/spark into find_in_set 243ede4 [Tarek Auel] [SPARK-8244][SQL] hive compatibility 1aaf64e [Tarek Auel] [SPARK-8244][SQL] unit test fix e4093a4 [Tarek Auel] [SPARK-8244][SQL] final modifier for COMMA_UTF8 0d05df5 [Tarek Auel] Merge branch 'master' into SPARK-8244 208d710 [Tarek Auel] [SPARK-8244] address comments & bug fix 71b2e69 [Tarek Auel] [SPARK-8244] find_in_set 66c7fda [Tarek Auel] Merge branch 'master' into SPARK-8244 61b8ca2 [Tarek Auel] [SPARK-8224] removed loop and split; use unsafe String comparison 4f75a65 [Tarek Auel] Merge branch 'master' into SPARK-8244 e3b20c8 [Tarek Auel] [SPARK-8244] added type check 1c2bbb7 [Tarek Auel] [SPARK-8244] findInSet Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b1f88a38 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b1f88a38 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b1f88a38 Branch: refs/heads/master Commit: b1f88a38d53aebe7cabb762cdd2f1cc64726b0b4 Parents: d702d53 Author: Tarek Auel Authored: Tue Aug 4 08:59:42 2015 -0700 Committer: Davies Liu Committed: Tue Aug 4 08:59:42 2015 -0700 -- .../catalyst/analysis/FunctionRegistry.scala| 1 + .../catalyst/expressions/stringOperations.scala | 25 -- .../expressions/StringExpressionsSuite.scala| 10 ++ .../spark/sql/DataFrameFunctionsSuite.scala | 8 + .../apache/spark/unsafe/types/UTF8String.java | 35 ++-- .../spark/unsafe/types/UTF8StringSuite.java | 12 +++ 6 files changed, 87 insertions(+), 4 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b1f88a38/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala index bc08466..6140d1b 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/FunctionRegistry.scala @@ -177,6 +177,7 @@ object FunctionRegistry { expression[ConcatWs]("concat_ws"), expression[Encode]("encode"), expression[Decode]("decode"), +expression[FindInSet]("find_in_set"), expression[FormatNumber]("format_number"), expression[InitCap]("initcap"), expression[Lower]("lcase"), http://git-wip-us.apache.org/repos/asf/spark/blob/b1f88a38/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala index 5622529..0cc785d 100644 --- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/stringOperations.scala @@ -18,8 +18,7 @@ package org.apache.spark.sql.catalyst.expressions import java.text.DecimalFormat -import java.util.Arrays -import java.util.Locale +import java.util.{Arrays, Locale} import java.util.regex.{MatchResult, Pattern} import org.apache.commons.lang3.StringEscapeUtils @@ -351,6 +350,28 @@ case class EndsWith(left: Expression, right: Expression) } /** + * A function that returns the index (1-based) of the given string (left) in the comma- + * delimited list (right). Returns 0, if the string wasn't found or if the given + * string (left) contains a comma. + */ +case class FindInSet(left: Expression, right: Expression) extends BinaryExpression +with ImplicitCastInputTypes { + + override def inputTypes: Seq[AbstractDataType] = Seq(StringType, StringType) + + override protected def nullSafeEval(word:
spark git commit: [SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
Repository: spark Updated Branches: refs/heads/branch-1.5 45c8d2bb8 -> f44b27a2b [SPARK-9583] [BUILD] Do not print mvn debug messages to stdout. This allows build/mvn to be used by make-distribution.sh. Author: Marcelo Vanzin Closes #7915 from vanzin/SPARK-9583 and squashes the following commits: 6469e60 [Marcelo Vanzin] [SPARK-9583] [build] Do not print mvn debug messages to stdout. (cherry picked from commit d702d53732b44e8242448ce5302738bd130717d8) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f44b27a2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f44b27a2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/f44b27a2 Branch: refs/heads/branch-1.5 Commit: f44b27a2b92da2325ed9389cd27b6e2cfd9ec486 Parents: 45c8d2b Author: Marcelo Vanzin Authored: Tue Aug 4 22:19:11 2015 +0900 Committer: Kousuke Saruta Committed: Tue Aug 4 22:19:31 2015 +0900 -- build/mvn | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/f44b27a2/build/mvn -- diff --git a/build/mvn b/build/mvn index f62f61e..4a1664b 100755 --- a/build/mvn +++ b/build/mvn @@ -51,11 +51,11 @@ install_app() { # check if we have curl installed # download application [ ! -f "${local_tarball}" ] && [ $(command -v curl) ] && \ - echo "exec: curl ${curl_opts} ${remote_tarball}" && \ + echo "exec: curl ${curl_opts} ${remote_tarball}" 1>&2 && \ curl ${curl_opts} "${remote_tarball}" > "${local_tarball}" # if the file still doesn't exist, lets try `wget` and cross our fingers [ ! -f "${local_tarball}" ] && [ $(command -v wget) ] && \ - echo "exec: wget ${wget_opts} ${remote_tarball}" && \ + echo "exec: wget ${wget_opts} ${remote_tarball}" 1>&2 && \ wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}" # if both were unsuccessful, exit [ ! -f "${local_tarball}" ] && \ @@ -146,7 +146,7 @@ fi # Set any `mvn` options if not already present export MAVEN_OPTS=${MAVEN_OPTS:-"$_COMPILE_JVM_OPTS"} -echo "Using \`mvn\` from path: $MVN_BIN" +echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual ${MVN_BIN} "$@"
spark git commit: [SPARK-9583] [BUILD] Do not print mvn debug messages to stdout.
Repository: spark Updated Branches: refs/heads/master cb7fa0aa9 -> d702d5373 [SPARK-9583] [BUILD] Do not print mvn debug messages to stdout. This allows build/mvn to be used by make-distribution.sh. Author: Marcelo Vanzin Closes #7915 from vanzin/SPARK-9583 and squashes the following commits: 6469e60 [Marcelo Vanzin] [SPARK-9583] [build] Do not print mvn debug messages to stdout. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d702d537 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d702d537 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d702d537 Branch: refs/heads/master Commit: d702d53732b44e8242448ce5302738bd130717d8 Parents: cb7fa0a Author: Marcelo Vanzin Authored: Tue Aug 4 22:19:11 2015 +0900 Committer: Kousuke Saruta Committed: Tue Aug 4 22:19:11 2015 +0900 -- build/mvn | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/d702d537/build/mvn -- diff --git a/build/mvn b/build/mvn index f62f61e..4a1664b 100755 --- a/build/mvn +++ b/build/mvn @@ -51,11 +51,11 @@ install_app() { # check if we have curl installed # download application [ ! -f "${local_tarball}" ] && [ $(command -v curl) ] && \ - echo "exec: curl ${curl_opts} ${remote_tarball}" && \ + echo "exec: curl ${curl_opts} ${remote_tarball}" 1>&2 && \ curl ${curl_opts} "${remote_tarball}" > "${local_tarball}" # if the file still doesn't exist, lets try `wget` and cross our fingers [ ! -f "${local_tarball}" ] && [ $(command -v wget) ] && \ - echo "exec: wget ${wget_opts} ${remote_tarball}" && \ + echo "exec: wget ${wget_opts} ${remote_tarball}" 1>&2 && \ wget ${wget_opts} -O "${local_tarball}" "${remote_tarball}" # if both were unsuccessful, exit [ ! -f "${local_tarball}" ] && \ @@ -146,7 +146,7 @@ fi # Set any `mvn` options if not already present export MAVEN_OPTS=${MAVEN_OPTS:-"$_COMPILE_JVM_OPTS"} -echo "Using \`mvn\` from path: $MVN_BIN" +echo "Using \`mvn\` from path: $MVN_BIN" 1>&2 # Last, call the `mvn` command as usual ${MVN_BIN} "$@"
spark git commit: [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
Repository: spark Updated Branches: refs/heads/branch-1.5 bd9b75213 -> 45c8d2bb8 [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page Add pagination for the RDD page to avoid unresponsive UI when the number of the RDD partitions is large. Before: ![rddpagebefore](https://cloud.githubusercontent.com/assets/9278199/8951533/3d9add54-3601-11e5-99d0-5653b473c49b.png) After: ![rddpageafter](https://cloud.githubusercontent.com/assets/9278199/8951536/439d66e0-3601-11e5-9cee-1b380fe6620d.png) Author: Carson Wang Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits: 03c7168 [Carson Wang] Fix style issues 612c18c [Carson Wang] RDD partition table pagination for the RDD Page (cherry picked from commit cb7fa0aa93dae5a25a8e7e387dbd6b55a5a23fb0) Signed-off-by: Kousuke Saruta Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/45c8d2bb Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/45c8d2bb Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/45c8d2bb Branch: refs/heads/branch-1.5 Commit: 45c8d2bb872bb905a402cf3aa78b1c4efaac07cf Parents: bd9b752 Author: Carson Wang Authored: Tue Aug 4 22:12:30 2015 +0900 Committer: Kousuke Saruta Committed: Tue Aug 4 22:12:54 2015 +0900 -- .../scala/org/apache/spark/ui/PagedTable.scala | 16 +- .../org/apache/spark/ui/jobs/StagePage.scala| 10 +- .../org/apache/spark/ui/storage/RDDPage.scala | 228 --- 3 files changed, 209 insertions(+), 45 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/45c8d2bb/core/src/main/scala/org/apache/spark/ui/PagedTable.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala index 17d7b39..6e23754 100644 --- a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala +++ b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala @@ -159,9 +159,9 @@ private[ui] trait PagedTable[T] { // "goButtonJsFuncName" val formJs = s"""$$(function(){ - | $$( "#form-task-page" ).submit(function(event) { - |var page = $$("#form-task-page-no").val() - |var pageSize = $$("#form-task-page-size").val() + | $$( "#form-$tableId-page" ).submit(function(event) { + |var page = $$("#form-$tableId-page-no").val() + |var pageSize = $$("#form-$tableId-page-size").val() |pageSize = pageSize ? pageSize: 100; |if (page != "") { | ${goButtonJsFuncName}(page, pageSize); @@ -173,12 +173,14 @@ private[ui] trait PagedTable[T] { - + {totalPages} Pages. Jump to - + . Show - -tasks in a page. + +items in a page. 
Go http://git-wip-us.apache.org/repos/asf/spark/blob/45c8d2bb/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala index 3954c3d..0c94204 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala @@ -988,8 +988,7 @@ private[ui] class TaskDataSource( shuffleRead, shuffleWrite, bytesSpilled, - errorMessage.getOrElse("") -) + errorMessage.getOrElse("")) } /** @@ -1197,7 +1196,7 @@ private[ui] class TaskPagedTable( private val displayPeakExecutionMemory = conf.getOption("spark.sql.unsafe.enabled").exists(_.toBoolean) - override def tableId: String = "" + override def tableId: String = "task-table" override def tableCssClass: String = "table table-bordered table-condensed table-striped" @@ -1212,8 +1211,7 @@ private[ui] class TaskPagedTable( currentTime, pageSize, sortColumn, -desc - ) +desc) override def pageLink(page: Int): String = { val encodedSortColumn = URLEncoder.encode(sortColumn, "UTF-8") @@ -1277,7 +1275,7 @@ private[ui] class TaskPagedTable( Seq(("Errors", "")) if (!taskHeadersAndCssClasses.map(_._1).contains(sortColumn)) { - new IllegalArgumentException(s"Unknown column: $sortColumn") + throw new IllegalArgumentException(s"Unknown column: $sortColumn") } val headerRow: Seq[Node] = { http://git-wip-us.apache.org/repos/asf/spark/blob/45c8d2bb/core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala -- diff --git a/core/s
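For readers unfamiliar with the PagedTable machinery being reused here, the underlying idea is simply slicing the full row sequence by (page, pageSize) on the server so the browser never renders every RDD partition at once. A conceptual sketch only, not the actual PagedTable/PagedDataSource API:

    // 1-based pages; returns the rows for the requested page plus the page count.
    def slicePage[T](rows: Seq[T], page: Int, pageSize: Int): (Seq[T], Int) = {
      require(page >= 1 && pageSize >= 1, "page and pageSize must be positive")
      val totalPages = (rows.size + pageSize - 1) / pageSize
      (rows.slice((page - 1) * pageSize, page * pageSize), totalPages)
    }

    // 1000 partitions shown 100 at a time: page 3 covers partitions 200 to 299.
    slicePage((0 until 1000).map(i => s"partition-$i"), page = 3, pageSize = 100)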
spark git commit: [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page
Repository: spark Updated Branches: refs/heads/master b211cbc73 -> cb7fa0aa9 [SPARK-2016] [WEBUI] RDD partition table pagination for the RDD Page Add pagination for the RDD page to avoid unresponsive UI when the number of the RDD partitions is large. Before: ![rddpagebefore](https://cloud.githubusercontent.com/assets/9278199/8951533/3d9add54-3601-11e5-99d0-5653b473c49b.png) After: ![rddpageafter](https://cloud.githubusercontent.com/assets/9278199/8951536/439d66e0-3601-11e5-9cee-1b380fe6620d.png) Author: Carson Wang Closes #7692 from carsonwang/SPARK-2016 and squashes the following commits: 03c7168 [Carson Wang] Fix style issues 612c18c [Carson Wang] RDD partition table pagination for the RDD Page Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cb7fa0aa Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cb7fa0aa Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cb7fa0aa Branch: refs/heads/master Commit: cb7fa0aa93dae5a25a8e7e387dbd6b55a5a23fb0 Parents: b211cbc Author: Carson Wang Authored: Tue Aug 4 22:12:30 2015 +0900 Committer: Kousuke Saruta Committed: Tue Aug 4 22:12:30 2015 +0900 -- .../scala/org/apache/spark/ui/PagedTable.scala | 16 +- .../org/apache/spark/ui/jobs/StagePage.scala| 10 +- .../org/apache/spark/ui/storage/RDDPage.scala | 228 --- 3 files changed, 209 insertions(+), 45 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/cb7fa0aa/core/src/main/scala/org/apache/spark/ui/PagedTable.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala index 17d7b39..6e23754 100644 --- a/core/src/main/scala/org/apache/spark/ui/PagedTable.scala +++ b/core/src/main/scala/org/apache/spark/ui/PagedTable.scala @@ -159,9 +159,9 @@ private[ui] trait PagedTable[T] { // "goButtonJsFuncName" val formJs = s"""$$(function(){ - | $$( "#form-task-page" ).submit(function(event) { - |var page = $$("#form-task-page-no").val() - |var pageSize = $$("#form-task-page-size").val() + | $$( "#form-$tableId-page" ).submit(function(event) { + |var page = $$("#form-$tableId-page-no").val() + |var pageSize = $$("#form-$tableId-page-size").val() |pageSize = pageSize ? pageSize: 100; |if (page != "") { | ${goButtonJsFuncName}(page, pageSize); @@ -173,12 +173,14 @@ private[ui] trait PagedTable[T] { - + {totalPages} Pages. Jump to - + . Show - -tasks in a page. + +items in a page. 
Go http://git-wip-us.apache.org/repos/asf/spark/blob/cb7fa0aa/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala index 3954c3d..0c94204 100644 --- a/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala +++ b/core/src/main/scala/org/apache/spark/ui/jobs/StagePage.scala @@ -988,8 +988,7 @@ private[ui] class TaskDataSource( shuffleRead, shuffleWrite, bytesSpilled, - errorMessage.getOrElse("") -) + errorMessage.getOrElse("")) } /** @@ -1197,7 +1196,7 @@ private[ui] class TaskPagedTable( private val displayPeakExecutionMemory = conf.getOption("spark.sql.unsafe.enabled").exists(_.toBoolean) - override def tableId: String = "" + override def tableId: String = "task-table" override def tableCssClass: String = "table table-bordered table-condensed table-striped" @@ -1212,8 +1211,7 @@ private[ui] class TaskPagedTable( currentTime, pageSize, sortColumn, -desc - ) +desc) override def pageLink(page: Int): String = { val encodedSortColumn = URLEncoder.encode(sortColumn, "UTF-8") @@ -1277,7 +1275,7 @@ private[ui] class TaskPagedTable( Seq(("Errors", "")) if (!taskHeadersAndCssClasses.map(_._1).contains(sortColumn)) { - new IllegalArgumentException(s"Unknown column: $sortColumn") + throw new IllegalArgumentException(s"Unknown column: $sortColumn") } val headerRow: Seq[Node] = { http://git-wip-us.apache.org/repos/asf/spark/blob/cb7fa0aa/core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala -- diff --git a/core/src/main/scala/org/apache/spark/ui/storage/RDDPage.scala b/core/src/main/scala/org/apache/spark/ui/storage/RD
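The PagedTable change above is what makes the new RDD partition table possible: the page-jump form's element ids and the generated jQuery selectors are now derived from tableId rather than the hard-coded "form-task-page", and TaskPagedTable accordingly returns "task-table" instead of an empty string, so several paged tables can sit on one page without their navigation forms colliding. The following is only a rough Java sketch of that id-parameterization idea; the real implementation is the Scala PagedTable trait in the diff, and all names below are assumptions.

    // Rough sketch of deriving the page-jump form ids from a per-table id (illustration only).
    abstract class PagedTableSketch {

        // Unique id for this table, e.g. "task-table" on the stage page.
        abstract String tableId();

        // Every DOM id in the page-jump form is derived from tableId(), so two paged tables
        // rendered on the same page each wire their own "Go" button instead of all of them
        // binding to a single hard-coded "form-task-page" form.
        String pageNavigationJs(String goButtonJsFuncName) {
            String formId = "form-" + tableId() + "-page";
            return "$(function() {\n"
                + "  $('#" + formId + "').submit(function(event) {\n"
                + "    var page = $('#" + formId + "-no').val();\n"
                + "    var pageSize = $('#" + formId + "-size').val();\n"
                + "    pageSize = pageSize ? pageSize : 100;\n"
                + "    if (page != '') { " + goButtonJsFuncName + "(page, pageSize); }\n"
                + "    event.preventDefault();\n"
                + "  });\n"
                + "});";
        }

        public static void main(String[] args) {
            PagedTableSketch taskTable = new PagedTableSketch() {
                @Override
                String tableId() { return "task-table"; }
            };
            System.out.println(taskTable.pageNavigationJs("goToTaskPage"));
        }
    }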
spark git commit: [SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was accidentally reverted
Repository: spark Updated Branches: refs/heads/branch-1.5 5ae675360 -> bd9b75213 [SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was accidentally reverted This PR removes the dependency reduced POM hack brought back by #7191 Author: tedyu Closes #7919 from tedyu/master and squashes the following commits: 1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack (cherry picked from commit b211cbc7369af5eb2cb65d93c4c57c4db7143f47) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bd9b7521 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bd9b7521 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bd9b7521 Branch: refs/heads/branch-1.5 Commit: bd9b7521343c34c42be40ee05a01c8a976ed2307 Parents: 5ae6753 Author: tedyu Authored: Tue Aug 4 12:22:53 2015 +0100 Committer: Sean Owen Committed: Tue Aug 4 12:24:25 2015 +0100 -- pom.xml | 3 --- 1 file changed, 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/bd9b7521/pom.xml -- diff --git a/pom.xml b/pom.xml index b4ee3cc..2bcc55b 100644 --- a/pom.xml +++ b/pom.xml @@ -179,9 +179,6 @@ 1.3.9 0.9.2 - -false - ${java.home}
spark git commit: [SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was accidentally reverted
Repository: spark Updated Branches: refs/heads/master 76d74090d -> b211cbc73 [SPARK-8064] [BUILD] Follow-up. Undo change from SPARK-9507 that was accidentally reverted This PR removes the dependency reduced POM hack brought back by #7191 Author: tedyu Closes #7919 from tedyu/master and squashes the following commits: 1bfbd7b [tedyu] [BUILD] Remove dependency reduced POM hack Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b211cbc7 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b211cbc7 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b211cbc7 Branch: refs/heads/master Commit: b211cbc7369af5eb2cb65d93c4c57c4db7143f47 Parents: 76d7409 Author: tedyu Authored: Tue Aug 4 12:22:53 2015 +0100 Committer: Sean Owen Committed: Tue Aug 4 12:23:04 2015 +0100 -- pom.xml | 3 --- 1 file changed, 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/b211cbc7/pom.xml -- diff --git a/pom.xml b/pom.xml index b4ee3cc..2bcc55b 100644 --- a/pom.xml +++ b/pom.xml @@ -179,9 +179,6 @@ 1.3.9 0.9.2 - -false - ${java.home}
spark git commit: [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
Repository: spark Updated Branches: refs/heads/branch-1.5 29f2d5a06 -> 5ae675360 [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process. I'll explain several of the changes inline in comments. Author: Sean Owen Closes #7862 from srowen/SPARK-9534 and squashes the following commits: ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process. (cherry picked from commit 76d74090d60f74412bd45487e8db6aff2e8343a2) Signed-off-by: Sean Owen Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5ae67536 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5ae67536 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5ae67536 Branch: refs/heads/branch-1.5 Commit: 5ae675360d883483e509788b8867c1c98b4820fd Parents: 29f2d5a Author: Sean Owen Authored: Tue Aug 4 12:02:26 2015 +0100 Committer: Sean Owen Committed: Tue Aug 4 12:02:45 2015 +0100 -- .../java/JavaSparkContextVarargsWorkaround.java | 19 +++--- .../spark/storage/TachyonBlockManager.scala | 9 ++- .../deploy/master/PersistenceEngineSuite.scala | 13 ++-- .../mesos/MesosSchedulerUtilsSuite.scala| 3 + .../spark/examples/ml/JavaOneVsRestExample.java | 1 + .../streaming/JavaStatefulNetworkWordCount.java | 4 +- .../kafka/JavaDirectKafkaStreamSuite.java | 2 +- .../evaluation/JavaRankingMetricsSuite.java | 14 ++--- .../ml/classification/NaiveBayesSuite.scala | 4 +- .../network/protocol/ChunkFetchFailure.java | 5 ++ .../network/protocol/ChunkFetchRequest.java | 5 ++ .../network/protocol/ChunkFetchSuccess.java | 5 ++ .../spark/network/protocol/RpcFailure.java | 5 ++ .../spark/network/protocol/RpcRequest.java | 5 ++ .../spark/network/protocol/RpcResponse.java | 5 ++ .../apache/spark/network/TestManagedBuffer.java | 5 ++ .../spark/network/sasl/SparkSaslSuite.java | 16 ++--- .../ExternalShuffleBlockHandlerSuite.java | 6 +- .../shuffle/RetryingBlockFetcherSuite.java | 47 --- pom.xml | 4 ++ .../apache/spark/sql/JavaDataFrameSuite.java| 1 + .../spark/sql/sources/TableScanSuite.scala | 16 +++-- .../sql/hive/client/IsolatedClientLoader.scala | 2 +- .../spark/sql/hive/execution/commands.scala | 2 +- .../spark/sql/hive/JavaDataFrameSuite.java | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 4 +- .../streaming/scheduler/JobScheduler.scala | 4 +- .../spark/streaming/JavaWriteAheadLogSuite.java | 62 ++-- .../spark/streaming/UISeleniumSuite.scala | 4 +- 29 files changed, 167 insertions(+), 107 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5ae67536/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java -- diff --git a/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java b/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java index 2090efd..d4c42b3 100644 --- a/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java +++ b/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java @@ -23,11 +23,13 @@ import java.util.List; // See // http://scala-programming-language.1934581.n4.nabble.com/Workaround-for-implementing-java-varargs-in-2-7-2-final-tp1944767p1944772.html abstract class JavaSparkContextVarargsWorkaround { - public JavaRDD union(JavaRDD... 
rdds) { + + @SafeVarargs + public final JavaRDD union(JavaRDD... rdds) { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList> rest = new ArrayList>(rdds.length - 1); +List> rest = new ArrayList<>(rdds.length - 1); for (int i = 1; i < rdds.length; i++) { rest.add(rdds[i]); } @@ -38,18 +40,19 @@ abstract class JavaSparkContextVarargsWorkaround { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList rest = new ArrayList(rdds.length - 1); +List rest = new ArrayList<>(rdds.length - 1); for (int i = 1; i < rdds.length; i++) { rest.add(rdds[i]); } return union(rdds[0], rest); } - public JavaPairRDD union(JavaPairRDD... rdds) { + @SafeVarargs + public final JavaPairRDD union(JavaPairRDD... rdds) { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList> rest = n
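The JavaSparkContextVarargsWorkaround hunk above is representative of many of the javac lint fixes in this commit (note that the archive rendering appears to have stripped the generic type arguments from the diff, which is why lines read "JavaRDD union(JavaRDD... rdds)"). A generic varargs method produces warnings about heap pollution at the declaration and unchecked generic array creation at each call site; @SafeVarargs suppresses both, and in Java 7/8 the annotation is only legal on static or final methods and constructors, which is why the union(...) methods above were also made final. A self-contained toy example, not a Spark API:

    // Toy example of the @SafeVarargs pattern applied above (not a Spark API).
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    class VarargsDemo {

        // Without @SafeVarargs, javac (with lint enabled) warns about possible heap pollution
        // here and about unchecked generic array creation at every call site. In Java 7/8 the
        // annotation is only allowed on static or final methods and constructors.
        @SafeVarargs
        static <T> List<T> concat(List<T>... lists) {
            List<T> result = new ArrayList<>();
            for (List<T> list : lists) {
                result.addAll(list);
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> merged = concat(Arrays.asList("a", "b"), Arrays.asList("c"));
            System.out.println(merged); // prints [a, b, c]
        }
    }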
spark git commit: [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition
Repository: spark Updated Branches: refs/heads/master 9e952ecbc -> 76d74090d [SPARK-9534] [BUILD] Enable javac lint for scalac parity; fix a lot of build warnings, 1.5.0 edition Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process. I'll explain several of the changes inline in comments. Author: Sean Owen Closes #7862 from srowen/SPARK-9534 and squashes the following commits: ea51618 [Sean Owen] Enable most javac lint warnings; fix a lot of build warnings. In a few cases, touch up surrounding code in the process. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76d74090 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76d74090 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/76d74090 Branch: refs/heads/master Commit: 76d74090d60f74412bd45487e8db6aff2e8343a2 Parents: 9e952ec Author: Sean Owen Authored: Tue Aug 4 12:02:26 2015 +0100 Committer: Sean Owen Committed: Tue Aug 4 12:02:26 2015 +0100 -- .../java/JavaSparkContextVarargsWorkaround.java | 19 +++--- .../spark/storage/TachyonBlockManager.scala | 9 ++- .../deploy/master/PersistenceEngineSuite.scala | 13 ++-- .../mesos/MesosSchedulerUtilsSuite.scala| 3 + .../spark/examples/ml/JavaOneVsRestExample.java | 1 + .../streaming/JavaStatefulNetworkWordCount.java | 4 +- .../kafka/JavaDirectKafkaStreamSuite.java | 2 +- .../evaluation/JavaRankingMetricsSuite.java | 14 ++--- .../ml/classification/NaiveBayesSuite.scala | 4 +- .../network/protocol/ChunkFetchFailure.java | 5 ++ .../network/protocol/ChunkFetchRequest.java | 5 ++ .../network/protocol/ChunkFetchSuccess.java | 5 ++ .../spark/network/protocol/RpcFailure.java | 5 ++ .../spark/network/protocol/RpcRequest.java | 5 ++ .../spark/network/protocol/RpcResponse.java | 5 ++ .../apache/spark/network/TestManagedBuffer.java | 5 ++ .../spark/network/sasl/SparkSaslSuite.java | 16 ++--- .../ExternalShuffleBlockHandlerSuite.java | 6 +- .../shuffle/RetryingBlockFetcherSuite.java | 47 --- pom.xml | 4 ++ .../apache/spark/sql/JavaDataFrameSuite.java| 1 + .../spark/sql/sources/TableScanSuite.scala | 16 +++-- .../sql/hive/client/IsolatedClientLoader.scala | 2 +- .../spark/sql/hive/execution/commands.scala | 2 +- .../spark/sql/hive/JavaDataFrameSuite.java | 2 +- .../sql/hive/execution/SQLQuerySuite.scala | 4 +- .../streaming/scheduler/JobScheduler.scala | 4 +- .../spark/streaming/JavaWriteAheadLogSuite.java | 62 ++-- .../spark/streaming/UISeleniumSuite.scala | 4 +- 29 files changed, 167 insertions(+), 107 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/76d74090/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java -- diff --git a/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java b/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java index 2090efd..d4c42b3 100644 --- a/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java +++ b/core/src/main/java/org/apache/spark/api/java/JavaSparkContextVarargsWorkaround.java @@ -23,11 +23,13 @@ import java.util.List; // See // http://scala-programming-language.1934581.n4.nabble.com/Workaround-for-implementing-java-varargs-in-2-7-2-final-tp1944767p1944772.html abstract class JavaSparkContextVarargsWorkaround { - public JavaRDD union(JavaRDD... rdds) { + + @SafeVarargs + public final JavaRDD union(JavaRDD... 
rdds) { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList> rest = new ArrayList>(rdds.length - 1); +List> rest = new ArrayList<>(rdds.length - 1); for (int i = 1; i < rdds.length; i++) { rest.add(rdds[i]); } @@ -38,18 +40,19 @@ abstract class JavaSparkContextVarargsWorkaround { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList rest = new ArrayList(rdds.length - 1); +List rest = new ArrayList<>(rdds.length - 1); for (int i = 1; i < rdds.length; i++) { rest.add(rdds[i]); } return union(rdds[0], rest); } - public JavaPairRDD union(JavaPairRDD... rdds) { + @SafeVarargs + public final JavaPairRDD union(JavaPairRDD... rdds) { if (rdds.length == 0) { throw new IllegalArgumentException("Union called on empty list"); } -ArrayList> rest = new ArrayList>(rdds.length - 1); +List> rest = new ArrayList<>(rdds.length - 1); for (int i = 1;
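The other pattern repeated throughout this commit, visible in the tail of the hunk above, is declaring locals against the List interface and using the diamond operator instead of spelling out ArrayList's type arguments on both sides of the assignment; behaviour is unchanged, the repeated type arguments simply go away. A toy before/after sketch, not Spark code:

    // Toy before/after sketch of the "List + diamond" cleanup from the hunk above.
    import java.util.ArrayList;
    import java.util.List;

    class DiamondDemo {

        static List<String> copyTail(String[] items) {
            if (items.length == 0) {
                throw new IllegalArgumentException("copyTail called on empty array");
            }
            // Old style: ArrayList<String> rest = new ArrayList<String>(items.length - 1);
            // New style: declare against the List interface and let <> infer the type argument.
            List<String> rest = new ArrayList<>(items.length - 1);
            for (int i = 1; i < items.length; i++) {
                rest.add(items[i]);
            }
            return rest;
        }

        public static void main(String[] args) {
            System.out.println(copyTail(new String[] {"head", "a", "b"})); // prints [a, b]
        }
    }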