spark git commit: [SPARK-5539][MLLIB] LDA guide

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 955f2863e -> 5782ee29e


[SPARK-5539][MLLIB] LDA guide

This is the LDA user guide from jkbradley, with Java and Scala code examples.
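
For reference, a rough Scala sketch of the kind of usage the new guide describes, run against the sample data file added in this patch (this assumes the `org.apache.spark.mllib.clustering.LDA` API and an existing `SparkContext` named `sc`; the topic count is illustrative, not taken from the guide):

```
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

// Load word-count vectors, one document per line.
val data = sc.textFile("data/mllib/sample_lda_data.txt")
val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
// LDA expects an RDD of (documentId, wordCountVector) pairs.
val corpus = parsedData.zipWithIndex.map(_.swap).cache()

// Infer 3 topics with EM.
val ldaModel = new LDA().setK(3).run(corpus)

// Each inferred topic is a distribution over the vocabulary.
val topics = ldaModel.topicsMatrix
for (topic <- 0 until 3) {
  print(s"Topic $topic:")
  for (word <- 0 until ldaModel.vocabSize) print(" " + topics(word, topic))
  println()
}
```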

Author: Xiangrui Meng 
Author: Joseph K. Bradley 

Closes #4465 from mengxr/lda-guide and squashes the following commits:

6dcb7d1 [Xiangrui Meng] update java example in the user guide
76169ff [Xiangrui Meng] update java example
36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into 
lda-guide
c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example 
(which is in the guide and probably should be removed).

(cherry picked from commit 855d12ac0a9cdade4cd2cc64c4e7209478be6690)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5782ee29
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5782ee29
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5782ee29

Branch: refs/heads/branch-1.3
Commit: 5782ee29eb273b1f87a07fd624bbf228d2597b98
Parents: 955f286
Author: Xiangrui Meng 
Authored: Sun Feb 8 23:40:36 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 23:40:44 2015 -0800

--
 data/mllib/sample_lda_data.txt  |  12 ++
 docs/mllib-clustering.md| 129 ++-
 .../spark/examples/mllib/JavaLDAExample.java|  75 +++
 3 files changed, 215 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5782ee29/data/mllib/sample_lda_data.txt
--
diff --git a/data/mllib/sample_lda_data.txt b/data/mllib/sample_lda_data.txt
new file mode 100644
index 000..2e76702
--- /dev/null
+++ b/data/mllib/sample_lda_data.txt
@@ -0,0 +1,12 @@
+1 2 6 0 2 3 1 1 0 0 3
+1 3 0 1 3 0 0 2 0 0 1
+1 4 1 0 0 4 9 0 1 2 0
+2 1 0 3 0 0 5 0 2 3 9
+3 1 1 9 3 0 2 0 0 1 3
+4 2 0 3 4 5 1 1 1 4 0
+2 1 0 3 0 0 5 0 2 2 9
+1 1 1 9 2 1 2 0 0 1 3
+4 4 0 3 4 2 1 3 0 0 0
+2 8 2 0 3 0 2 0 2 7 2
+1 1 1 9 0 2 2 0 0 3 3
+4 1 0 0 4 5 1 3 0 1 0

http://git-wip-us.apache.org/repos/asf/spark/blob/5782ee29/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 1e9ef34..99ed6b6 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -55,7 +55,7 @@ has the following parameters:
 
 Power iteration clustering is a scalable and efficient algorithm for 
clustering points given pointwise mutual affinity values.  Internally the 
algorithm:
 
-* accepts a 
[Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph)
 that represents a  normalized pairwise affinity between all input points.
+* accepts a [Graph](api/graphx/index.html#org.apache.spark.graphx.Graph) that 
represents a  normalized pairwise affinity between all input points.
 * calculates the principal eigenvalue and eigenvector
 * Clusters each of the input points according to their principal eigenvector 
component value
 
@@ -71,6 +71,35 @@ Example outputs for a dataset inspired by the paper - but 
with five clusters ins
   
 
 
+### Latent Dirichlet Allocation (LDA)
+
+[Latent Dirichlet Allocation 
(LDA)](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)
+is a topic model which infers topics from a collection of text documents.
+LDA can be thought of as a clustering algorithm as follows:
+
+* Topics correspond to cluster centers, and documents correspond to examples 
(rows) in a dataset.
+* Topics and documents both exist in a feature space, where feature vectors 
are vectors of word counts.
+* Rather than estimating a clustering using a traditional distance, LDA uses a 
function based
+ on a statistical model of how text documents are generated.
+
+LDA takes in a collection of documents as vectors of word counts.
+It learns clustering using 
[expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+on the likelihood function. After fitting on the documents, LDA provides:
+
+* Topics: Inferred topics, each of which is a probability distribution over 
terms (words).
+* Topic distributions for documents: For each document in the training set, 
LDA gives a probability distribution over topics.
+
+LDA takes the following parameters:
+
+* `k`: Number of topics (i.e., cluster centers)
+* `maxIterations`: Limit on the number of iterations of EM used for learning
+* `docConcentration`: Hyperparameter for prior over documents' distributions 
over topics. Currently must be > 1, where larger values encourage smoother 
inferred distributions.
+* `topicConcentration`: Hyperparameter for prior over topics' distributions 
over terms (words). Currently must be > 1, where larger values encourage 

spark git commit: [SPARK-5539][MLLIB] LDA guide

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 4575c5643 -> 855d12ac0


[SPARK-5539][MLLIB] LDA guide

This is the LDA user guide from jkbradley, with Java and Scala code examples.

Author: Xiangrui Meng 
Author: Joseph K. Bradley 

Closes #4465 from mengxr/lda-guide and squashes the following commits:

6dcb7d1 [Xiangrui Meng] update java example in the user guide
76169ff [Xiangrui Meng] update java example
36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into 
lda-guide
c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example 
(which is in the guide and probably should be removed).


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/855d12ac
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/855d12ac
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/855d12ac

Branch: refs/heads/master
Commit: 855d12ac0a9cdade4cd2cc64c4e7209478be6690
Parents: 4575c56
Author: Xiangrui Meng 
Authored: Sun Feb 8 23:40:36 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 23:40:36 2015 -0800

--
 data/mllib/sample_lda_data.txt  |  12 ++
 docs/mllib-clustering.md| 129 ++-
 .../spark/examples/mllib/JavaLDAExample.java|  75 +++
 3 files changed, 215 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/855d12ac/data/mllib/sample_lda_data.txt
--
diff --git a/data/mllib/sample_lda_data.txt b/data/mllib/sample_lda_data.txt
new file mode 100644
index 000..2e76702
--- /dev/null
+++ b/data/mllib/sample_lda_data.txt
@@ -0,0 +1,12 @@
+1 2 6 0 2 3 1 1 0 0 3
+1 3 0 1 3 0 0 2 0 0 1
+1 4 1 0 0 4 9 0 1 2 0
+2 1 0 3 0 0 5 0 2 3 9
+3 1 1 9 3 0 2 0 0 1 3
+4 2 0 3 4 5 1 1 1 4 0
+2 1 0 3 0 0 5 0 2 2 9
+1 1 1 9 2 1 2 0 0 1 3
+4 4 0 3 4 2 1 3 0 0 0
+2 8 2 0 3 0 2 0 2 7 2
+1 1 1 9 0 2 2 0 0 3 3
+4 1 0 0 4 5 1 3 0 1 0

http://git-wip-us.apache.org/repos/asf/spark/blob/855d12ac/docs/mllib-clustering.md
--
diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md
index 1e9ef34..99ed6b6 100644
--- a/docs/mllib-clustering.md
+++ b/docs/mllib-clustering.md
@@ -55,7 +55,7 @@ has the following parameters:
 
 Power iteration clustering is a scalable and efficient algorithm for 
clustering points given pointwise mutual affinity values.  Internally the 
algorithm:
 
-* accepts a 
[Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph)
 that represents a  normalized pairwise affinity between all input points.
+* accepts a [Graph](api/graphx/index.html#org.apache.spark.graphx.Graph) that 
represents a  normalized pairwise affinity between all input points.
 * calculates the principal eigenvalue and eigenvector
 * Clusters each of the input points according to their principal eigenvector 
component value
 
@@ -71,6 +71,35 @@ Example outputs for a dataset inspired by the paper - but 
with five clusters ins
   
 
 
+### Latent Dirichlet Allocation (LDA)
+
+[Latent Dirichlet Allocation 
(LDA)](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)
+is a topic model which infers topics from a collection of text documents.
+LDA can be thought of as a clustering algorithm as follows:
+
+* Topics correspond to cluster centers, and documents correspond to examples 
(rows) in a dataset.
+* Topics and documents both exist in a feature space, where feature vectors 
are vectors of word counts.
+* Rather than estimating a clustering using a traditional distance, LDA uses a 
function based
+ on a statistical model of how text documents are generated.
+
+LDA takes in a collection of documents as vectors of word counts.
+It learns clustering using 
[expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm)
+on the likelihood function. After fitting on the documents, LDA provides:
+
+* Topics: Inferred topics, each of which is a probability distribution over 
terms (words).
+* Topic distributions for documents: For each document in the training set, 
LDA gives a probability distribution over topics.
+
+LDA takes the following parameters:
+
+* `k`: Number of topics (i.e., cluster centers)
+* `maxIterations`: Limit on the number of iterations of EM used for learning
+* `docConcentration`: Hyperparameter for prior over documents' distributions 
over topics. Currently must be > 1, where larger values encourage smoother 
inferred distributions.
+* `topicConcentration`: Hyperparameter for prior over topics' distributions 
over terms (words). Currently must be > 1, where larger values encourage 
smoother inferred distributions.
+* `checkpointInterval`: If using checkpointing (set in the Spark 
configu

spark git commit: [SPARK-5472][SQL] Fix Scala code style

2015-02-08 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 4396dfb37 -> 4575c5643


[SPARK-5472][SQL] Fix Scala code style

Fix Scala code style.

Author: Hung Lin 

Closes #4464 from hunglin/SPARK-5472 and squashes the following commits:

ef7a3b3 [Hung Lin] SPARK-5472: fix scala style


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4575c564
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4575c564
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4575c564

Branch: refs/heads/master
Commit: 4575c5643a82818bf64f9648314bdc2fdc12febb
Parents: 4396dfb
Author: Hung Lin 
Authored: Sun Feb 8 22:36:42 2015 -0800
Committer: Reynold Xin 
Committed: Sun Feb 8 22:36:42 2015 -0800

--
 .../org/apache/spark/sql/jdbc/JDBCRDD.scala | 42 ++--
 .../apache/spark/sql/jdbc/JDBCRelation.scala| 35 +---
 2 files changed, 41 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4575c564/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
index a2f9467..0bec32c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
@@ -17,13 +17,10 @@
 
 package org.apache.spark.sql.jdbc
 
-import java.sql.{Connection, DatabaseMetaData, DriverManager, ResultSet, 
ResultSetMetaData, SQLException}
-import scala.collection.mutable.ArrayBuffer
+import java.sql.{Connection, DriverManager, ResultSet, ResultSetMetaData, 
SQLException}
 
 import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
 import org.apache.spark.rdd.RDD
-import org.apache.spark.util.NextIterator
-import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion
 import org.apache.spark.sql.catalyst.expressions.{Row, SpecificMutableRow}
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.sources._
@@ -100,7 +97,7 @@ private[sql] object JDBCRDD extends Logging {
   try {
 val rsmd = rs.getMetaData
 val ncols = rsmd.getColumnCount
-var fields = new Array[StructField](ncols);
+val fields = new Array[StructField](ncols)
 var i = 0
 while (i < ncols) {
   val columnName = rsmd.getColumnName(i + 1)
@@ -176,23 +173,27 @@ private[sql] object JDBCRDD extends Logging {
*
* @return An RDD representing "SELECT requiredColumns FROM fqTable".
*/
-  def scanTable(sc: SparkContext,
-schema: StructType,
-driver: String,
-url: String,
-fqTable: String,
-requiredColumns: Array[String],
-filters: Array[Filter],
-parts: Array[Partition]): RDD[Row] = {
+  def scanTable(
+  sc: SparkContext,
+  schema: StructType,
+  driver: String,
+  url: String,
+  fqTable: String,
+  requiredColumns: Array[String],
+  filters: Array[Filter],
+  parts: Array[Partition]): RDD[Row] = {
+
 val prunedSchema = pruneSchema(schema, requiredColumns)
 
-return new JDBCRDD(sc,
-getConnector(driver, url),
-prunedSchema,
-fqTable,
-requiredColumns,
-filters,
-parts)
+return new
+JDBCRDD(
+  sc,
+  getConnector(driver, url),
+  prunedSchema,
+  fqTable,
+  requiredColumns,
+  filters,
+  parts)
   }
 }
 
@@ -412,6 +413,5 @@ private[sql] class JDBCRDD(
   gotNext = false
   nextValue
 }
-
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/4575c564/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
index e09125e..66ad38e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
@@ -96,7 +96,8 @@ private[sql] class DefaultSource extends RelationProvider {
 
 if (driver != null) Class.forName(driver)
 
-if (   partitionColumn != null
+if (
+  partitionColumn != null
 && (lowerBound == null || upperBound == null || numPartitions == 
null)) {
   sys.error("Partitioning incompletely specified")
 }
@@ -104,30 +105,34 @@ private[sql] class DefaultSource extends RelationProvider 
{
 val partitionInfo = if (partitionColumn == null) {
   null
 } else {
-  JDBCPartitioningInfo(partitionCol

spark git commit: [SPARK-5472][SQL] Fix Scala code style

2015-02-08 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 fa8ea48f2 -> 955f2863e


[SPARK-5472][SQL] Fix Scala code style

Fix Scala code style.

Author: Hung Lin 

Closes #4464 from hunglin/SPARK-5472 and squashes the following commits:

ef7a3b3 [Hung Lin] SPARK-5472: fix scala style

(cherry picked from commit 4575c5643a82818bf64f9648314bdc2fdc12febb)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/955f2863
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/955f2863
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/955f2863

Branch: refs/heads/branch-1.3
Commit: 955f2863e39a96c0b00ad7d3eac972bb1cfcb594
Parents: fa8ea48
Author: Hung Lin 
Authored: Sun Feb 8 22:36:42 2015 -0800
Committer: Reynold Xin 
Committed: Sun Feb 8 22:36:51 2015 -0800

--
 .../org/apache/spark/sql/jdbc/JDBCRDD.scala | 42 ++--
 .../apache/spark/sql/jdbc/JDBCRelation.scala| 35 +---
 2 files changed, 41 insertions(+), 36 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/955f2863/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
--
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
index a2f9467..0bec32c 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala
@@ -17,13 +17,10 @@
 
 package org.apache.spark.sql.jdbc
 
-import java.sql.{Connection, DatabaseMetaData, DriverManager, ResultSet, 
ResultSetMetaData, SQLException}
-import scala.collection.mutable.ArrayBuffer
+import java.sql.{Connection, DriverManager, ResultSet, ResultSetMetaData, 
SQLException}
 
 import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
 import org.apache.spark.rdd.RDD
-import org.apache.spark.util.NextIterator
-import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion
 import org.apache.spark.sql.catalyst.expressions.{Row, SpecificMutableRow}
 import org.apache.spark.sql.types._
 import org.apache.spark.sql.sources._
@@ -100,7 +97,7 @@ private[sql] object JDBCRDD extends Logging {
   try {
 val rsmd = rs.getMetaData
 val ncols = rsmd.getColumnCount
-var fields = new Array[StructField](ncols);
+val fields = new Array[StructField](ncols)
 var i = 0
 while (i < ncols) {
   val columnName = rsmd.getColumnName(i + 1)
@@ -176,23 +173,27 @@ private[sql] object JDBCRDD extends Logging {
*
* @return An RDD representing "SELECT requiredColumns FROM fqTable".
*/
-  def scanTable(sc: SparkContext,
-schema: StructType,
-driver: String,
-url: String,
-fqTable: String,
-requiredColumns: Array[String],
-filters: Array[Filter],
-parts: Array[Partition]): RDD[Row] = {
+  def scanTable(
+  sc: SparkContext,
+  schema: StructType,
+  driver: String,
+  url: String,
+  fqTable: String,
+  requiredColumns: Array[String],
+  filters: Array[Filter],
+  parts: Array[Partition]): RDD[Row] = {
+
 val prunedSchema = pruneSchema(schema, requiredColumns)
 
-return new JDBCRDD(sc,
-getConnector(driver, url),
-prunedSchema,
-fqTable,
-requiredColumns,
-filters,
-parts)
+return new
+JDBCRDD(
+  sc,
+  getConnector(driver, url),
+  prunedSchema,
+  fqTable,
+  requiredColumns,
+  filters,
+  parts)
   }
 }
 
@@ -412,6 +413,5 @@ private[sql] class JDBCRDD(
   gotNext = false
   nextValue
 }
-
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/955f2863/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
--
diff --git 
a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala 
b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
index e09125e..66ad38e 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala
@@ -96,7 +96,8 @@ private[sql] class DefaultSource extends RelationProvider {
 
 if (driver != null) Class.forName(driver)
 
-if (   partitionColumn != null
+if (
+  partitionColumn != null
 && (lowerBound == null || upperBound == null || numPartitions == 
null)) {
   sys.error("Partitioning incompletely specified")
 }
@@ -104,30 +105,34 @@ private[sql] class DefaultSource extends RelationProvider 
{
 val partiti

svn commit: r7966 - /dev/spark/spark-1.2.1-rc3/ /release/spark/spark-1.2.1/

2015-02-08 Thread pwendell
Author: pwendell
Date: Mon Feb  9 06:34:02 2015
New Revision: 7966

Log:
Spark release 1.2.1

Added:
release/spark/spark-1.2.1/
  - copied from r7965, dev/spark/spark-1.2.1-rc3/
Removed:
dev/spark/spark-1.2.1-rc3/





svn commit: r7965 - /dev/spark/spark-1.2.1-rc3/

2015-02-08 Thread pwendell
Author: pwendell
Date: Mon Feb  9 06:29:04 2015
New Revision: 7965

Log:
Adding Spark 1.2.1 RC3

Added:
dev/spark/spark-1.2.1-rc3/
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz   (with 
props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc   (with 
props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.asc   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.sha
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.asc   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.sha
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.asc   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.sha
dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.asc   (with props)
dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.md5
dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.sha

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz
--
svn:mime-type = application/x-gzip

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc
--
svn:mime-type = application/pgp-signature

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5
==
--- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 (added)
+++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 Mon Feb  9 06:29:04 
2015
@@ -0,0 +1 @@
+spark-1.2.1-bin-cdh4.tgz: 9C 18 E5 43 F9 32 3C 2A  6A A9 C1 0C 11 F9 05 58

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha
==
--- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha (added)
+++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha Mon Feb  9 06:29:04 
2015
@@ -0,0 +1,3 @@
+spark-1.2.1-bin-cdh4.tgz: 208BD991 F14AD9A4 54A26F97 64A3AB8D 290E55B4 D1275E51
+  CEAC7E11 F797B55D 2B59BE38 F0186E43 A66B5FFE 281C546D
+  F7C3511B B1FD8A0A B495E5AC AD207A4F

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz
--
svn:mime-type = application/x-gzip

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc
==
Binary file - no diff available.

Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc
--
svn:mime-type = application/pgp-signature

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5
==
--- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 (added)
+++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 Mon Feb 
 9 06:29:04 2015
@@ -0,0 +1,2 @@
+spark-1.2.1-bin-hadoop1-scala2.11.tgz: DE F4 A3 77 D3 41 F7 9F  3A 54 2D 7C CA
+   04 0D 88

Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha
==
--- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha (added)
+++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha Mon Feb 
 9 06:29:04 201

svn commit: r7964 - /release/spark/spark-1.1.0/

2015-02-08 Thread pwendell
Author: pwendell
Date: Mon Feb  9 05:50:16 2015
New Revision: 7964

Log:
Removing Spark 1.1.0 release.


Removed:
release/spark/spark-1.1.0/





spark git commit: SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 df9b10573 -> fa8ea48f2


SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols 
overflow

Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* 
methods. jkbradley this should be an easy one. Review and/or merge as you see 
fit.
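
For intuition, a small Scala sketch (with illustrative dimensions, not taken from the patch) of why the checks compute the product as a Long before comparing it against Int.MaxValue:

```
// With Int arithmetic the product of two large dimensions wraps around,
// so a naive check would let an impossible allocation through.
val numRows = 50000
val numCols = 50000
val intProduct  = numRows * numCols          // overflows to -1794967296
val longProduct = numRows.toLong * numCols   // 2500000000, above Int.MaxValue
println(s"naive Int check passes: ${intProduct <= Int.MaxValue}")     // true (wrongly)
println(s"patched Long check passes: ${longProduct <= Int.MaxValue}") // false
```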

Author: Sean Owen 

Closes #4461 from srowen/SPARK-4405 and squashes the following commits:

c67574e [Sean Owen] Check that size of dense matrix array is not beyond 
Int.MaxValue in Matrices.* methods

(cherry picked from commit 4396dfb37f433ef186e3e0a09db9906986ec940b)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fa8ea48f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fa8ea48f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fa8ea48f

Branch: refs/heads/branch-1.3
Commit: fa8ea48f2d693b1e9db7a7138c23075748b3c0f5
Parents: df9b105
Author: Sean Owen 
Authored: Sun Feb 8 21:08:50 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 21:08:56 2015 -0800

--
 .../org/apache/spark/mllib/linalg/Matrices.scala  | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/fa8ea48f/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index c8a97b8..89b3867 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -256,8 +256,11 @@ object DenseMatrix {
* @param numCols number of columns of the matrix
* @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): DenseMatrix =
+  def zeros(numRows: Int, numCols: Int): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  }
 
   /**
* Generate a `DenseMatrix` consisting of ones.
@@ -265,8 +268,11 @@ object DenseMatrix {
* @param numCols number of columns of the matrix
* @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): DenseMatrix =
+  def ones(numRows: Int, numCols: Int): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  }
 
   /**
* Generate an Identity Matrix in `DenseMatrix` format.
@@ -291,6 +297,8 @@ object DenseMatrix {
* @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 
1)
*/
   def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
   }
 
@@ -302,6 +310,8 @@ object DenseMatrix {
* @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 
1)
*/
   def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
   }
 





spark git commit: SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master c17161189 -> 4396dfb37


SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols 
overflow

Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* 
methods. jkbradley this should be an easy one. Review and/or merge as you see 
fit.

Author: Sean Owen 

Closes #4461 from srowen/SPARK-4405 and squashes the following commits:

c67574e [Sean Owen] Check that size of dense matrix array is not beyond 
Int.MaxValue in Matrices.* methods


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4396dfb3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4396dfb3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4396dfb3

Branch: refs/heads/master
Commit: 4396dfb37f433ef186e3e0a09db9906986ec940b
Parents: c171611
Author: Sean Owen 
Authored: Sun Feb 8 21:08:50 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 21:08:50 2015 -0800

--
 .../org/apache/spark/mllib/linalg/Matrices.scala  | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/4396dfb3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index c8a97b8..89b3867 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -256,8 +256,11 @@ object DenseMatrix {
* @param numCols number of columns of the matrix
* @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros
*/
-  def zeros(numRows: Int, numCols: Int): DenseMatrix =
+  def zeros(numRows: Int, numCols: Int): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols))
+  }
 
   /**
* Generate a `DenseMatrix` consisting of ones.
@@ -265,8 +268,11 @@ object DenseMatrix {
* @param numCols number of columns of the matrix
* @return `DenseMatrix` with size `numRows` x `numCols` and values of ones
*/
-  def ones(numRows: Int, numCols: Int): DenseMatrix =
+  def ones(numRows: Int, numCols: Int): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0))
+  }
 
   /**
* Generate an Identity Matrix in `DenseMatrix` format.
@@ -291,6 +297,8 @@ object DenseMatrix {
* @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 
1)
*/
   def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextDouble()))
   }
 
@@ -302,6 +310,8 @@ object DenseMatrix {
* @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 
1)
*/
   def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = {
+require(numRows.toLong * numCols <= Int.MaxValue,
+s"$numRows x $numCols dense matrix is too large to allocate")
 new DenseMatrix(numRows, numCols, Array.fill(numRows * 
numCols)(rng.nextGaussian()))
   }
 





spark git commit: [SPARK-5660][MLLIB] Make Matrix apply public

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master a052ed425 -> c17161189


[SPARK-5660][MLLIB] Make Matrix apply public

This is #4447 with `override`.

Closes #4447
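
A small sketch of what the now-public `apply` enables from user code (the matrix values are illustrative):

```
import org.apache.spark.mllib.linalg.Matrices

// 2 x 3 dense matrix; values are stored column-major.
val m = Matrices.dense(2, 3, Array(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
println(m(0, 1)) // row 0, column 1 => 3.0
println(m(1, 2)) // row 1, column 2 => 6.0
```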

Author: Joseph K. Bradley 
Author: Xiangrui Meng 

Closes #4462 from mengxr/SPARK-5660 and squashes the following commits:

f82c8d6 [Xiangrui Meng] add override to matrix.apply
91cedde [Joseph K. Bradley] made matrix apply public


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1716118
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1716118
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1716118

Branch: refs/heads/master
Commit: c17161189d57f2e3a8d3550ea59a68edf487c8b7
Parents: a052ed4
Author: Joseph K. Bradley 
Authored: Sun Feb 8 21:07:36 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 21:07:36 2015 -0800

--
 .../main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c1716118/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index 84f8ac2..c8a97b8 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -50,7 +50,7 @@ sealed trait Matrix extends Serializable {
   private[mllib] def toBreeze: BM[Double]
 
   /** Gets the (i, j)-th element. */
-  private[mllib] def apply(i: Int, j: Int): Double
+  def apply(i: Int, j: Int): Double
 
   /** Return the index for the (i, j)-th element in the backing array. */
   private[mllib] def index(i: Int, j: Int): Int
@@ -163,7 +163,7 @@ class DenseMatrix(
 
   private[mllib] def apply(i: Int): Double = values(i)
 
-  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
+  override def apply(i: Int, j: Int): Double = values(index(i, j))
 
   private[mllib] def index(i: Int, j: Int): Int = {
 if (!isTransposed) i + numRows * j else j + numCols * i
@@ -398,7 +398,7 @@ class SparseMatrix(
  }
   }
 
-  private[mllib] def apply(i: Int, j: Int): Double = {
+  override def apply(i: Int, j: Int): Double = {
 val ind = index(i, j)
 if (ind < 0) 0.0 else values(ind)
   }





spark git commit: [SPARK-5660][MLLIB] Make Matrix apply public

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 e1996aafa -> df9b10573


[SPARK-5660][MLLIB] Make Matrix apply public

This is #4447 with `override`.

Closes #4447

Author: Joseph K. Bradley 
Author: Xiangrui Meng 

Closes #4462 from mengxr/SPARK-5660 and squashes the following commits:

f82c8d6 [Xiangrui Meng] add override to matrix.apply
91cedde [Joseph K. Bradley] made matrix apply public

(cherry picked from commit c17161189d57f2e3a8d3550ea59a68edf487c8b7)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df9b1057
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df9b1057
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df9b1057

Branch: refs/heads/branch-1.3
Commit: df9b1057397b0d34fa8f1882651d29f623c7222e
Parents: e1996aa
Author: Joseph K. Bradley 
Authored: Sun Feb 8 21:07:36 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 21:07:45 2015 -0800

--
 .../main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/df9b1057/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
index 84f8ac2..c8a97b8 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala
@@ -50,7 +50,7 @@ sealed trait Matrix extends Serializable {
   private[mllib] def toBreeze: BM[Double]
 
   /** Gets the (i, j)-th element. */
-  private[mllib] def apply(i: Int, j: Int): Double
+  def apply(i: Int, j: Int): Double
 
   /** Return the index for the (i, j)-th element in the backing array. */
   private[mllib] def index(i: Int, j: Int): Int
@@ -163,7 +163,7 @@ class DenseMatrix(
 
   private[mllib] def apply(i: Int): Double = values(i)
 
-  private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j))
+  override def apply(i: Int, j: Int): Double = values(index(i, j))
 
   private[mllib] def index(i: Int, j: Int): Int = {
 if (!isTransposed) i + numRows * j else j + numCols * i
@@ -398,7 +398,7 @@ class SparseMatrix(
  }
   }
 
-  private[mllib] def apply(i: Int, j: Int): Double = {
+  override def apply(i: Int, j: Int): Double = {
 val ind = index(i, j)
 if (ind < 0) 0.0 else values(ind)
   }





spark git commit: [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format.

2015-02-08 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 c515634ef -> e1996aafa


[SPARK-5643][SQL] Add a show method to print the content of a DataFrame in 
tabular format.

An example:
```
year  month  AVG('Adj Close)  MAX('Adj Close)
1980  12     0.503218         0.595103
1981  01     0.523289         0.570307
1982  02     0.436504         0.475256
1983  03     0.410516         0.442194
1984  04     0.450090         0.483521
```
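
A minimal sketch of calling the new method (the table and column names are hypothetical, and `sqlContext` is assumed to be an existing SQLContext):

```
val df = sqlContext.table("stock_quotes")
df.show()                          // prints the first rows in tabular form
df.select("year", "month").show()  // works on any derived DataFrame as well
```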

Author: Reynold Xin 

Closes #4416 from rxin/SPARK-5643 and squashes the following commits:

d0e0d6e [Reynold Xin] [SQL] Minor update to data source and statistics 
documentation.
269da83 [Reynold Xin] Updated isLocal comment.
2cf3c27 [Reynold Xin] Moved logic into optimizer.
1a04d8b [Reynold Xin] [SPARK-5643][SQL] Add a show method to print the content 
of a DataFrame in columnar format.

(cherry picked from commit a052ed42501fee3641348337505b6176426653c4)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1996aaf
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1996aaf
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1996aaf

Branch: refs/heads/branch-1.3
Commit: e1996aafadec95fb365b1ce1b87300441cd272ef
Parents: c515634
Author: Reynold Xin 
Authored: Sun Feb 8 18:56:51 2015 -0800
Committer: Reynold Xin 
Committed: Sun Feb 8 18:57:03 2015 -0800

--
 .../sql/catalyst/optimizer/Optimizer.scala  | 18 ++-
 .../catalyst/plans/logical/LogicalPlan.scala|  7 ++-
 .../optimizer/ConvertToLocalRelationSuite.scala | 57 
 .../scala/org/apache/spark/sql/DataFrame.scala  | 21 +++-
 .../org/apache/spark/sql/DataFrameImpl.scala| 41 --
 .../apache/spark/sql/IncomputableColumn.scala   |  6 ++-
 .../spark/sql/execution/basicOperators.scala|  7 +--
 .../apache/spark/sql/sources/interfaces.scala   | 15 +++---
 8 files changed, 151 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/e1996aaf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 8c8f289..3bc48c9 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -50,7 +50,9 @@ object DefaultOptimizer extends Optimizer {
   CombineFilters,
   PushPredicateThroughProject,
   PushPredicateThroughJoin,
-  ColumnPruning) :: Nil
+  ColumnPruning) ::
+Batch("LocalRelation", FixedPoint(100),
+  ConvertToLocalRelation) :: Nil
 }
 
 /**
@@ -610,3 +612,17 @@ object DecimalAggregates extends Rule[LogicalPlan] {
 DecimalType(prec + 4, scale + 4))
   }
 }
+
+/**
+ * Converts local operations (i.e. ones that don't require data exchange) on 
LocalRelation to
+ * another LocalRelation.
+ *
+ * This is relatively simple as it currently handles only a single case: 
Project.
+ */
+object ConvertToLocalRelation extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case Project(projectList, LocalRelation(output, data)) =>
+  val projection = new InterpretedProjection(projectList, output)
+  LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+  }
+}

http://git-wip-us.apache.org/repos/asf/spark/blob/e1996aaf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 8d30528..7cf4b81 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -29,12 +29,15 @@ import org.apache.spark.sql.catalyst.trees
 /**
  * Estimates of various statistics.  The default estimation logic simply 
lazily multiplies the
  * corresponding statistic produced by the children.  To override this 
behavior, override
- * `statistics` and assign it an overriden version of `Statistics`.
+ * `statistics` and assign it an overridden version of `Statistics`.
  *
- * '''NOTE''': concrete and/or overriden versions of statistics fields should 
pay attention to the
+ * '''NOTE''': concrete and/or overridden versions of statistics fields should 
pay attention to the
  * performance of the implementations.  The reas

spark git commit: [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format.

2015-02-08 Thread rxin
Repository: spark
Updated Branches:
  refs/heads/master 56aff4bd6 -> a052ed425


[SPARK-5643][SQL] Add a show method to print the content of a DataFrame in 
tabular format.

An example:
```
year  month  AVG('Adj Close)  MAX('Adj Close)
1980  12     0.503218         0.595103
1981  01     0.523289         0.570307
1982  02     0.436504         0.475256
1983  03     0.410516         0.442194
1984  04     0.450090         0.483521
```

Author: Reynold Xin 

Closes #4416 from rxin/SPARK-5643 and squashes the following commits:

d0e0d6e [Reynold Xin] [SQL] Minor update to data source and statistics 
documentation.
269da83 [Reynold Xin] Updated isLocal comment.
2cf3c27 [Reynold Xin] Moved logic into optimizer.
1a04d8b [Reynold Xin] [SPARK-5643][SQL] Add a show method to print the content 
of a DataFrame in columnar format.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a052ed42
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a052ed42
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a052ed42

Branch: refs/heads/master
Commit: a052ed42501fee3641348337505b6176426653c4
Parents: 56aff4b
Author: Reynold Xin 
Authored: Sun Feb 8 18:56:51 2015 -0800
Committer: Reynold Xin 
Committed: Sun Feb 8 18:56:51 2015 -0800

--
 .../sql/catalyst/optimizer/Optimizer.scala  | 18 ++-
 .../catalyst/plans/logical/LogicalPlan.scala|  7 ++-
 .../optimizer/ConvertToLocalRelationSuite.scala | 57 
 .../scala/org/apache/spark/sql/DataFrame.scala  | 21 +++-
 .../org/apache/spark/sql/DataFrameImpl.scala| 41 --
 .../apache/spark/sql/IncomputableColumn.scala   |  6 ++-
 .../spark/sql/execution/basicOperators.scala|  7 +--
 .../apache/spark/sql/sources/interfaces.scala   | 15 +++---
 8 files changed, 151 insertions(+), 21 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/a052ed42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
index 8c8f289..3bc48c9 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
@@ -50,7 +50,9 @@ object DefaultOptimizer extends Optimizer {
   CombineFilters,
   PushPredicateThroughProject,
   PushPredicateThroughJoin,
-  ColumnPruning) :: Nil
+  ColumnPruning) ::
+Batch("LocalRelation", FixedPoint(100),
+  ConvertToLocalRelation) :: Nil
 }
 
 /**
@@ -610,3 +612,17 @@ object DecimalAggregates extends Rule[LogicalPlan] {
 DecimalType(prec + 4, scale + 4))
   }
 }
+
+/**
+ * Converts local operations (i.e. ones that don't require data exchange) on 
LocalRelation to
+ * another LocalRelation.
+ *
+ * This is relatively simple as it currently handles only a single case: 
Project.
+ */
+object ConvertToLocalRelation extends Rule[LogicalPlan] {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case Project(projectList, LocalRelation(output, data)) =>
+  val projection = new InterpretedProjection(projectList, output)
+  LocalRelation(projectList.map(_.toAttribute), data.map(projection))
+  }
+}

http://git-wip-us.apache.org/repos/asf/spark/blob/a052ed42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
--
diff --git 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
index 8d30528..7cf4b81 100644
--- 
a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
+++ 
b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala
@@ -29,12 +29,15 @@ import org.apache.spark.sql.catalyst.trees
 /**
  * Estimates of various statistics.  The default estimation logic simply 
lazily multiplies the
  * corresponding statistic produced by the children.  To override this 
behavior, override
- * `statistics` and assign it an overriden version of `Statistics`.
+ * `statistics` and assign it an overridden version of `Statistics`.
  *
- * '''NOTE''': concrete and/or overriden versions of statistics fields should 
pay attention to the
+ * '''NOTE''': concrete and/or overridden versions of statistics fields should 
pay attention to the
  * performance of the implementations.  The reason is that estimations might 
get triggered in
  * performance-critical processes, such as query plan plan

spark git commit: SPARK-5665 [DOCS] Update netlib-java documentation

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 5c299c58f -> 56aff4bd6


SPARK-5665 [DOCS] Update netlib-java documentation

I am the author of netlib-java and I found this documentation to be out of 
date. Some main points:

1. Breeze has not depended on jBLAS for some time
2. netlib-java provides a pure JVM implementation as the fallback (the original 
docs did not appear to be aware of this, claiming that gfortran was necessary)
3. The licensing issue is not just about LGPL: optimised natives have 
proprietary licenses. Building with the LGPL flag turned on really doesn't help 
you get past this.
4. I really think it's best to direct people to my detailed setup guide instead 
of trying to compress it into one sentence. It is different for each 
architecture, each OS, and for each backend.

I hope this helps to clear things up :smile:
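
As a rough sketch of the dependency the updated docs point users at, an application built with sbt might add the optimised wrappers like this (assumption: the `all` artifact is a POM-only aggregator, hence `pomOnly()`; see the netlib-java documentation for the per-platform native setup):

```
// build.sbt
libraryDependencies += "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()
```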

Author: Sam Halliday 
Author: Sam Halliday 

Closes #4448 from fommil/patch-1 and squashes the following commits:

18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr
a35e4a9 [Sam Halliday] reword netlib-java/breeze docs


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/56aff4bd
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/56aff4bd
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/56aff4bd

Branch: refs/heads/master
Commit: 56aff4bd6c7c9d18f4f962025708f20a4a82dcf0
Parents: 5c299c5
Author: Sam Halliday 
Authored: Sun Feb 8 16:34:26 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 16:34:26 2015 -0800

--
 docs/mllib-guide.md | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/56aff4bd/docs/mllib-guide.md
--
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 7779fbc..3d32d03 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -56,25 +56,32 @@ See the **[spark.ml programming guide](ml-guide.html)** for 
more information on
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/),
-which depends on [netlib-java](https://github.com/fommil/netlib-java),
-and [jblas](https://github.com/mikiobraun/jblas). 
-`netlib-java` and `jblas` depend on native Fortran routines.
-You need to install the
+MLlib uses the linear algebra package
+[Breeze](http://www.scalanlp.org/), which depends on
+[netlib-java](https://github.com/fommil/netlib-java) for optimised
+numerical processing. If natives are not available at runtime, you
+will see a warning message and a pure JVM implementation will be used
+instead.
+
+To learn more about the benefits and background of system optimised
+natives, you may wish to watch Sam Halliday's ScalaX talk on
+[High Performance Linear Algebra in 
Scala](http://fommil.github.io/scalax14/#/)).
+
+Due to licensing issues with runtime proprietary binaries, we do not
+include `netlib-java`'s native proxies by default. To configure
+`netlib-java` / Breeze to use system optimised binaries, include
+`com.github.fommil.netlib:all:1.1.2` (or build Spark with
+`-Pnetlib-lgpl`) as a dependency of your project and read the
+[netlib-java](https://github.com/fommil/netlib-java) documentation for
+your platform's additional installation instructions.
+
+MLlib also uses [jblas](https://github.com/mikiobraun/jblas) which
+will require you to install the
 [gfortran runtime 
library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries)
 if it is not already present on your nodes.
-MLlib will throw a linking error if it cannot detect these libraries 
automatically.
-Due to license issues, we do not include `netlib-java`'s native libraries in 
MLlib's
-dependency set under default settings.
-If no native library is available at runtime, you will see a warning message.
-To use native libraries from `netlib-java`, please build Spark with 
`-Pnetlib-lgpl` or
-include `com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
-If you want to use optimized BLAS/LAPACK libraries such as
-[OpenBLAS](http://www.openblas.net/), please link its shared libraries to
-`/usr/lib/libblas.so.3` and `/usr/lib/liblapack.so.3`, respectively.
-BLAS/LAPACK libraries on worker nodes should be built without multithreading.
-
-To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
+
+To use MLlib in Python, you will need [NumPy](http://www.numpy.org)
+version 1.4 or newer.
 
 ---
 





spark git commit: SPARK-5665 [DOCS] Update netlib-java documentation

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 9e4d58fe2 -> c515634ef


SPARK-5665 [DOCS] Update netlib-java documentation

I am the author of netlib-java and I found this documentation to be out of 
date. Some main points:

1. Breeze has not depended on jBLAS for some time
2. netlib-java provides a pure JVM implementation as the fallback (the original 
docs did not appear to be aware of this, claiming that gfortran was necessary)
3. The licensing issue is not just about LGPL: optimised natives have 
proprietary licenses. Building with the LGPL flag turned on really doesn't help 
you get past this.
4. I really think it's best to direct people to my detailed setup guide instead 
of trying to compress it into one sentence. It is different for each 
architecture, each OS, and for each backend.

I hope this helps to clear things up :smile:

Author: Sam Halliday 
Author: Sam Halliday 

Closes #4448 from fommil/patch-1 and squashes the following commits:

18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr
a35e4a9 [Sam Halliday] reword netlib-java/breeze docs

(cherry picked from commit 56aff4bd6c7c9d18f4f962025708f20a4a82dcf0)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c515634e
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c515634e
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c515634e

Branch: refs/heads/branch-1.3
Commit: c515634ef178b49cd4f8ce2c5d08a77054be3a55
Parents: 9e4d58f
Author: Sam Halliday 
Authored: Sun Feb 8 16:34:26 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 16:34:34 2015 -0800

--
 docs/mllib-guide.md | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c515634e/docs/mllib-guide.md
--
diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md
index 7779fbc..3d32d03 100644
--- a/docs/mllib-guide.md
+++ b/docs/mllib-guide.md
@@ -56,25 +56,32 @@ See the **[spark.ml programming guide](ml-guide.html)** for 
more information on
 
 # Dependencies
 
-MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/),
-which depends on [netlib-java](https://github.com/fommil/netlib-java),
-and [jblas](https://github.com/mikiobraun/jblas). 
-`netlib-java` and `jblas` depend on native Fortran routines.
-You need to install the
+MLlib uses the linear algebra package
+[Breeze](http://www.scalanlp.org/), which depends on
+[netlib-java](https://github.com/fommil/netlib-java) for optimised
+numerical processing. If natives are not available at runtime, you
+will see a warning message and a pure JVM implementation will be used
+instead.
+
+To learn more about the benefits and background of system optimised
+natives, you may wish to watch Sam Halliday's ScalaX talk on
+[High Performance Linear Algebra in 
Scala](http://fommil.github.io/scalax14/#/)).
+
+Due to licensing issues with runtime proprietary binaries, we do not
+include `netlib-java`'s native proxies by default. To configure
+`netlib-java` / Breeze to use system optimised binaries, include
+`com.github.fommil.netlib:all:1.1.2` (or build Spark with
+`-Pnetlib-lgpl`) as a dependency of your project and read the
+[netlib-java](https://github.com/fommil/netlib-java) documentation for
+your platform's additional installation instructions.
+
+MLlib also uses [jblas](https://github.com/mikiobraun/jblas) which
+will require you to install the
 [gfortran runtime 
library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries)
 if it is not already present on your nodes.
-MLlib will throw a linking error if it cannot detect these libraries 
automatically.
-Due to license issues, we do not include `netlib-java`'s native libraries in 
MLlib's
-dependency set under default settings.
-If no native library is available at runtime, you will see a warning message.
-To use native libraries from `netlib-java`, please build Spark with 
`-Pnetlib-lgpl` or
-include `com.github.fommil.netlib:all:1.1.2` as a dependency of your project.
-If you want to use optimized BLAS/LAPACK libraries such as
-[OpenBLAS](http://www.openblas.net/), please link its shared libraries to
-`/usr/lib/libblas.so.3` and `/usr/lib/liblapack.so.3`, respectively.
-BLAS/LAPACK libraries on worker nodes should be built without multithreading.
-
-To use MLlib in Python, you will need [NumPy](http://www.numpy.org) version 
1.4 or newer.
+
+To use MLlib in Python, you will need [NumPy](http://www.numpy.org)
+version 1.4 or newer.
 
 ---
 



spark git commit: [SPARK-5598][MLLIB] model save/load for ALS

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 42c56b6f1 -> 9e4d58fe2


[SPARK-5598][MLLIB] model save/load for ALS

following #4233. jkbradley
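
A minimal usage sketch of the save/load pair added here (the toy ratings and output path are placeholders; `sc` is an existing SparkContext):

```
import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

// Train a small model just to have something to persist.
val ratings = sc.parallelize(Seq(
  Rating(1, 1, 5.0), Rating(1, 2, 1.0), Rating(2, 1, 4.0), Rating(2, 2, 2.0)))
val model = ALS.train(ratings, rank = 2, iterations = 5)

// New in this patch: persist the factorization and read it back.
model.save(sc, "target/tmp/alsModel")
val sameModel = MatrixFactorizationModel.load(sc, "target/tmp/alsModel")
```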

Author: Xiangrui Meng 

Closes #4422 from mengxr/SPARK-5598 and squashes the following commits:

a059394 [Xiangrui Meng] SaveLoad not extending Loader
14b7ea6 [Xiangrui Meng] address comments
f487cb2 [Xiangrui Meng] add unit tests
62fc43c [Xiangrui Meng] implement save/load for MFM

(cherry picked from commit 5c299c58fb9a5434a40be82150d4725bba805adf)
Signed-off-by: Xiangrui Meng 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e4d58fe
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e4d58fe
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9e4d58fe

Branch: refs/heads/branch-1.3
Commit: 9e4d58fe27bb3e2aa978a69a73415e23f7fd5de1
Parents: 42c56b6
Author: Xiangrui Meng 
Authored: Sun Feb 8 16:26:20 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 16:26:37 2015 -0800

--
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../MatrixFactorizationModel.scala  | 82 +++-
 .../MatrixFactorizationModelSuite.scala | 19 +
 3 files changed, 100 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/9e4d58fe/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 4bb28d1..caacab9 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.mllib.recommendation
 
 import org.apache.spark.Logging
-import org.apache.spark.annotation.{DeveloperApi, Experimental}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.java.JavaRDD
 import org.apache.spark.ml.recommendation.{ALS => NewALS}
 import org.apache.spark.rdd.RDD

http://git-wip-us.apache.org/repos/asf/spark/blob/9e4d58fe/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
index ed2f8b4..9ff06ac 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
@@ -17,13 +17,17 @@
 
 package org.apache.spark.mllib.recommendation
 
+import java.io.IOException
 import java.lang.{Integer => JavaInteger}
 
+import org.apache.hadoop.fs.Path
 import org.jblas.DoubleMatrix
 
-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}
+import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Row, SQLContext}
 import org.apache.spark.storage.StorageLevel
 
 /**
@@ -41,7 +45,8 @@ import org.apache.spark.storage.StorageLevel
 class MatrixFactorizationModel(
 val rank: Int,
 val userFeatures: RDD[(Int, Array[Double])],
-val productFeatures: RDD[(Int, Array[Double])]) extends Serializable with 
Logging {
+val productFeatures: RDD[(Int, Array[Double])])
+  extends Saveable with Serializable with Logging {
 
   require(rank > 0)
   validateFeatures("User", userFeatures)
@@ -125,6 +130,12 @@ class MatrixFactorizationModel(
 recommend(productFeatures.lookup(product).head, userFeatures, num)
   .map(t => Rating(t._1, product, t._2))
 
+  protected override val formatVersion: String = "1.0"
+
+  override def save(sc: SparkContext, path: String): Unit = {
+MatrixFactorizationModel.SaveLoadV1_0.save(this, path)
+  }
+
   private def recommend(
   recommendToFeatures: Array[Double],
   recommendableFeatures: RDD[(Int, Array[Double])],
@@ -136,3 +147,70 @@ class MatrixFactorizationModel(
 scored.top(num)(Ordering.by(_._2))
   }
 }
+
+object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
+
+  import org.apache.spark.mllib.util.Loader._
+
+  override def load(sc: SparkContext, path: String): MatrixFactorizationModel 
= {
+val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path)
+val classNameV1_0 = SaveLoadV1_0.thisClassName
+(loadedClassName, formatVersion) match {
+  case (className, "1.0") if className == classNameV1_0 =>
+SaveLoadV1_0.load(sc, path)
+  case _ =
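
The save/load methods added here can be exercised with a minimal sketch like the following (assuming a live SparkContext `sc` and a writable output path; the path name is made up):

    import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

    // Train a tiny model, persist it, and load it back with the new API.
    val ratings = sc.parallelize(Seq(Rating(1, 1, 5.0), Rating(1, 2, 1.0), Rating(2, 1, 4.0)))
    val model = ALS.train(ratings, 2 /* rank */, 5 /* iterations */, 0.01 /* lambda */)

    model.save(sc, "target/tmp/myALSModel")
    val sameModel = MatrixFactorizationModel.load(sc, "target/tmp/myALSModel")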

spark git commit: [SPARK-5598][MLLIB] model save/load for ALS

2015-02-08 Thread meng
Repository: spark
Updated Branches:
  refs/heads/master 804949d51 -> 5c299c58f


[SPARK-5598][MLLIB] model save/load for ALS

following #4233. jkbradley

Author: Xiangrui Meng 

Closes #4422 from mengxr/SPARK-5598 and squashes the following commits:

a059394 [Xiangrui Meng] SaveLoad not extending Loader
14b7ea6 [Xiangrui Meng] address comments
f487cb2 [Xiangrui Meng] add unit tests
62fc43c [Xiangrui Meng] implement save/load for MFM


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5c299c58
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5c299c58
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5c299c58

Branch: refs/heads/master
Commit: 5c299c58fb9a5434a40be82150d4725bba805adf
Parents: 804949d
Author: Xiangrui Meng 
Authored: Sun Feb 8 16:26:20 2015 -0800
Committer: Xiangrui Meng 
Committed: Sun Feb 8 16:26:20 2015 -0800

--
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../MatrixFactorizationModel.scala  | 82 +++-
 .../MatrixFactorizationModelSuite.scala | 19 +
 3 files changed, 100 insertions(+), 3 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/5c299c58/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 4bb28d1..caacab9 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -18,7 +18,7 @@
 package org.apache.spark.mllib.recommendation
 
 import org.apache.spark.Logging
-import org.apache.spark.annotation.{DeveloperApi, Experimental}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.java.JavaRDD
 import org.apache.spark.ml.recommendation.{ALS => NewALS}
 import org.apache.spark.rdd.RDD

http://git-wip-us.apache.org/repos/asf/spark/blob/5c299c58/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
index ed2f8b4..9ff06ac 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
@@ -17,13 +17,17 @@
 
 package org.apache.spark.mllib.recommendation
 
+import java.io.IOException
 import java.lang.{Integer => JavaInteger}
 
+import org.apache.hadoop.fs.Path
 import org.jblas.DoubleMatrix
 
-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}
+import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Row, SQLContext}
 import org.apache.spark.storage.StorageLevel
 
 /**
@@ -41,7 +45,8 @@ import org.apache.spark.storage.StorageLevel
 class MatrixFactorizationModel(
 val rank: Int,
 val userFeatures: RDD[(Int, Array[Double])],
-val productFeatures: RDD[(Int, Array[Double])]) extends Serializable with 
Logging {
+val productFeatures: RDD[(Int, Array[Double])])
+  extends Saveable with Serializable with Logging {
 
   require(rank > 0)
   validateFeatures("User", userFeatures)
@@ -125,6 +130,12 @@ class MatrixFactorizationModel(
 recommend(productFeatures.lookup(product).head, userFeatures, num)
   .map(t => Rating(t._1, product, t._2))
 
+  protected override val formatVersion: String = "1.0"
+
+  override def save(sc: SparkContext, path: String): Unit = {
+MatrixFactorizationModel.SaveLoadV1_0.save(this, path)
+  }
+
   private def recommend(
   recommendToFeatures: Array[Double],
   recommendableFeatures: RDD[(Int, Array[Double])],
@@ -136,3 +147,70 @@ class MatrixFactorizationModel(
 scored.top(num)(Ordering.by(_._2))
   }
 }
+
+object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
+
+  import org.apache.spark.mllib.util.Loader._
+
+  override def load(sc: SparkContext, path: String): MatrixFactorizationModel 
= {
+val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path)
+val classNameV1_0 = SaveLoadV1_0.thisClassName
+(loadedClassName, formatVersion) match {
+  case (className, "1.0") if className == classNameV1_0 =>
+SaveLoadV1_0.load(sc, path)
+  case _ =>
+throw new IOException("MatrixFactorizationModel.load did not recognize 
model with" +
+  

svn commit: r1658279 - in /spark: robots.txt sitemap.xml

2015-02-08 Thread matei
Author: matei
Date: Sun Feb  8 23:59:49 2015
New Revision: 1658279

URL: http://svn.apache.org/r1658279
Log:
Add robots.txt and sitemap.xml to top-level folder too so they get generated

Added:
spark/robots.txt
spark/sitemap.xml

Added: spark/robots.txt
URL: http://svn.apache.org/viewvc/spark/robots.txt?rev=1658279&view=auto
==
--- spark/robots.txt (added)
+++ spark/robots.txt Sun Feb  8 23:59:49 2015
@@ -0,0 +1 @@
+Sitemap: http://spark.apache.org/sitemap.xml

Added: spark/sitemap.xml
URL: http://svn.apache.org/viewvc/spark/sitemap.xml?rev=1658279&view=auto
==
--- spark/sitemap.xml (added)
+++ spark/sitemap.xml Sun Feb  8 23:59:49 2015
@@ -0,0 +1,1871 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
+  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+  xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
+    http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
+
+
+  http://spark.apache.org/
+  2015-01-22T00:27:22+00:00
+  daily
+
+
+  http://spark.apache.org/downloads.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/sql/
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/streaming/
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/mllib/
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/graphx/
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/documentation.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/docs/latest/
+  1.0
+  2014-12-19T00:12:40+00:00
+  weekly
+
+
+  http://spark.apache.org/examples.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/community.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/faq.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/news/spark-summit-east-agenda-posted.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/news/spark-1-2-0-released.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/news/spark-1-1-1-released.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  
http://spark.apache.org/news/registration-open-for-spark-summit-east.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/news/index.html
+  2015-01-22T00:27:22+00:00
+  daily
+
+
+  http://spark.apache.org/docs/latest/spark-standalone.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/ec2-scripts.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/quick-start.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/releases/spark-release-1-2-0.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/docs/latest/building-spark.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/sql-programming-guide.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  
http://spark.apache.org/docs/latest/streaming-programming-guide.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/mllib-guide.html
+  2015-01-15T02:38:52+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/graphx-programming-guide.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/1.2.0/
+  2014-11-24T23:38:52+00:00
+  weekly
+  0.5
+
+
+  http://spark.apache.org/docs/1.1.1/
+  2014-11-24T23:38:52+00:00
+  weekly
+  0.5
+
+
+  http://spark.apache.org/docs/1.0.2/
+  2014-08-06T00:40:54+00:00
+  weekly
+  0.5
+
+
+  http://spark.apache.org/docs/0.9.2/
+  2014-07-23T23:08:20+00:00
+  weekly
+  0.4
+
+
+  http://spark.apache.org/docs/0.8.1/
+  2013-12-19T23:20:24+00:00
+  weekly
+  0.4
+
+
+  http://spark.apache.org/docs/0.7.3/
+  2013-08-24T03:23:13+00:00
+  weekly
+  0.3
+
+
+  http://spark.apache.org/docs/0.6.2/
+  2013-08-24T03:23:13+00:00
+  weekly
+  0.3
+
+
+  http://spark.apache.org/screencasts/1-first-steps-with-spark.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  
http://spark.apache.org/screencasts/2-spark-documentation-overview.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  
http://spark.apache.org/screencasts/3-transformations-and-caching.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  
http://spark.apache.org/screencasts/4-a-standalone-job-in-spark.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/research.html
+  2015-01-22T00:27:22+00:00
+  weekly
+
+
+  http://spark.apache.org/docs/latest/index.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/programming-guide.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+  http://spark.apache.org/docs/latest/bagel-programming-guide.html
+  2014-12-19T00:12:40+00:00
+  weekly
+  1.0
+
+
+ 

svn commit: r1658278 - in /spark: ./ _layouts/ graphx/ mllib/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/ sql/ streaming/

2015-02-08 Thread matei
Author: matei
Date: Sun Feb  8 23:58:24 2015
New Revision: 1658278

URL: http://svn.apache.org/r1658278
Log:
Add meta description tags

Modified:
spark/_layouts/global.html
spark/graphx/index.md
spark/index.md
spark/mllib/index.md
spark/site/community.html
spark/site/documentation.html
spark/site/downloads.html
spark/site/examples.html
spark/site/faq.html
spark/site/graphx/index.html
spark/site/index.html
spark/site/mailing-lists.html
spark/site/mllib/index.html
spark/site/news/amp-camp-2013-registration-ope.html
spark/site/news/announcing-the-first-spark-summit.html
spark/site/news/fourth-spark-screencast-published.html
spark/site/news/index.html
spark/site/news/nsdi-paper.html
spark/site/news/proposals-open-for-spark-summit-east.html
spark/site/news/registration-open-for-spark-summit-east.html
spark/site/news/run-spark-and-shark-on-amazon-emr.html
spark/site/news/spark-0-6-1-and-0-5-2-released.html
spark/site/news/spark-0-6-2-released.html
spark/site/news/spark-0-7-0-released.html
spark/site/news/spark-0-7-2-released.html
spark/site/news/spark-0-7-3-released.html
spark/site/news/spark-0-8-0-released.html
spark/site/news/spark-0-8-1-released.html
spark/site/news/spark-0-9-0-released.html
spark/site/news/spark-0-9-1-released.html
spark/site/news/spark-0-9-2-released.html
spark/site/news/spark-1-0-0-released.html
spark/site/news/spark-1-0-1-released.html
spark/site/news/spark-1-0-2-released.html
spark/site/news/spark-1-1-0-released.html
spark/site/news/spark-1-1-1-released.html
spark/site/news/spark-1-2-0-released.html
spark/site/news/spark-accepted-into-apache-incubator.html
spark/site/news/spark-and-shark-in-the-news.html
spark/site/news/spark-becomes-tlp.html
spark/site/news/spark-featured-in-wired.html
spark/site/news/spark-mailing-lists-moving-to-apache.html
spark/site/news/spark-meetups.html
spark/site/news/spark-screencasts-published.html
spark/site/news/spark-summit-2013-is-a-wrap.html
spark/site/news/spark-summit-2014-videos-posted.html
spark/site/news/spark-summit-agenda-posted.html
spark/site/news/spark-summit-east-agenda-posted.html
spark/site/news/spark-tips-from-quantifind.html
spark/site/news/spark-user-survey-and-powered-by-page.html
spark/site/news/spark-version-0-6-0-released.html
spark/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html
spark/site/news/strata-exercises-now-available-online.html
spark/site/news/submit-talks-to-spark-summit-2014.html
spark/site/news/two-weeks-to-spark-summit-2014.html
spark/site/news/video-from-first-spark-development-meetup.html
spark/site/releases/spark-release-0-3.html
spark/site/releases/spark-release-0-5-0.html
spark/site/releases/spark-release-0-5-1.html
spark/site/releases/spark-release-0-5-2.html
spark/site/releases/spark-release-0-6-0.html
spark/site/releases/spark-release-0-6-1.html
spark/site/releases/spark-release-0-6-2.html
spark/site/releases/spark-release-0-7-0.html
spark/site/releases/spark-release-0-7-2.html
spark/site/releases/spark-release-0-7-3.html
spark/site/releases/spark-release-0-8-0.html
spark/site/releases/spark-release-0-8-1.html
spark/site/releases/spark-release-0-9-0.html
spark/site/releases/spark-release-0-9-1.html
spark/site/releases/spark-release-0-9-2.html
spark/site/releases/spark-release-1-0-0.html
spark/site/releases/spark-release-1-0-1.html
spark/site/releases/spark-release-1-0-2.html
spark/site/releases/spark-release-1-1-0.html
spark/site/releases/spark-release-1-1-1.html
spark/site/releases/spark-release-1-2-0.html
spark/site/research.html
spark/site/screencasts/1-first-steps-with-spark.html
spark/site/screencasts/2-spark-documentation-overview.html
spark/site/screencasts/3-transformations-and-caching.html
spark/site/screencasts/4-a-standalone-job-in-spark.html
spark/site/screencasts/index.html
spark/site/sql/index.html
spark/site/streaming/index.html
spark/sql/index.md
spark/streaming/index.md

Modified: spark/_layouts/global.html
URL: 
http://svn.apache.org/viewvc/spark/_layouts/global.html?rev=1658278&r1=1658277&r2=1658278&view=diff
==
--- spark/_layouts/global.html (original)
+++ spark/_layouts/global.html Sun Feb  8 23:58:24 2015
@@ -16,6 +16,10 @@
 
   {% endif %}
 
+  {% if page.description %}
+    <meta name="description" content="{{ page.description }}" />
+  {% endif %}
+
   
   
   

Modified: spark/graphx/index.md
URL: 
http://svn.apache.org/viewvc/spark/graphx/index.md?rev=1658278&r1=1658277&r2=1658278&view=diff
==
--- spark/graphx/index.md (original)
+++ spark/graphx/index.md Sun Feb  8 23:58:24 2015
@@ -2,6 +2,7 @@
 layout: global
 type: "page singular"
 title

spark git commit: [SQL] Set sessionState in QueryExecution.

2015-02-08 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/master 75fdccca3 -> 804949d51


[SQL] Set sessionState in QueryExecution.

This PR sets the SessionState in HiveContext's QueryExecution, so that
SessionState.get returns the correct SessionState every time.
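
The guard being added below boils down to this small pattern (sketch only; SessionState here is Hive's org.apache.hadoop.hive.ql.session.SessionState):

    import org.apache.hadoop.hive.ql.session.SessionState

    // Activate the given Hive session for the current thread unless it is already active.
    def ensureActive(state: SessionState): Unit = {
      if (SessionState.get() != state) {
        SessionState.start(state)
      }
    }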

Author: Yin Huai 

Closes #4445 from yhuai/setSessionState and squashes the following commits:

769c9f1 [Yin Huai] Remove unused import.
439f329 [Yin Huai] Try again.
427a0c9 [Yin Huai] Set SessionState everytime when we create a QueryExecution 
in HiveContext.
a3b7793 [Yin Huai] Set sessionState when dealing with CreateTableAsSelect.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/804949d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/804949d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/804949d5

Branch: refs/heads/master
Commit: 804949d519e2caa293a409d84b4e6190c1105444
Parents: 75fdccc
Author: Yin Huai 
Authored: Sun Feb 8 14:55:07 2015 -0800
Committer: Michael Armbrust 
Committed: Sun Feb 8 14:55:07 2015 -0800

--
 .../src/main/scala/org/apache/spark/sql/hive/HiveContext.scala  | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/804949d5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index ad37b7d..2c00659 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -424,6 +424,11 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) 
{
   /** Extends QueryExecution with hive specific features. */
   protected[sql] class QueryExecution(logicalPlan: LogicalPlan)
 extends super.QueryExecution(logicalPlan) {
+// Like what we do in runHive, makes sure the session represented by the
+// `sessionState` field is activated.
+if (SessionState.get() != sessionState) {
+  SessionState.start(sessionState)
+}
 
 /**
  * Returns the result as a hive compatible sequence of strings.  For 
native commands, the





spark git commit: [SQL] Set sessionState in QueryExecution.

2015-02-08 Thread marmbrus
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 bc55e20fd -> 42c56b6f1


[SQL] Set sessionState in QueryExecution.

This PR sets the SessionState in HiveContext's QueryExecution, so that
SessionState.get returns the correct SessionState every time.

Author: Yin Huai 

Closes #4445 from yhuai/setSessionState and squashes the following commits:

769c9f1 [Yin Huai] Remove unused import.
439f329 [Yin Huai] Try again.
427a0c9 [Yin Huai] Set SessionState everytime when we create a QueryExecution 
in HiveContext.
a3b7793 [Yin Huai] Set sessionState when dealing with CreateTableAsSelect.

(cherry picked from commit 804949d519e2caa293a409d84b4e6190c1105444)
Signed-off-by: Michael Armbrust 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42c56b6f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42c56b6f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42c56b6f

Branch: refs/heads/branch-1.3
Commit: 42c56b6f1820f258c85d799e1acd8ae51fe5196a
Parents: bc55e20
Author: Yin Huai 
Authored: Sun Feb 8 14:55:07 2015 -0800
Committer: Michael Armbrust 
Committed: Sun Feb 8 14:55:16 2015 -0800

--
 .../src/main/scala/org/apache/spark/sql/hive/HiveContext.scala  | 5 +
 1 file changed, 5 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/42c56b6f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git 
a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala 
b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index ad37b7d..2c00659 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -424,6 +424,11 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) 
{
   /** Extends QueryExecution with hive specific features. */
   protected[sql] class QueryExecution(logicalPlan: LogicalPlan)
 extends super.QueryExecution(logicalPlan) {
+// Like what we do in runHive, makes sure the session represented by the
+// `sessionState` field is activated.
+if (SessionState.get() != sessionState) {
+  SessionState.start(sessionState)
+}
 
 /**
  * Returns the result as a hive compatible sequence of strings.  For 
native commands, the





spark git commit: [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 96010faa3 -> bc55e20fd


[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...

...ns avro-mapred for the hadoop 1 API.

This issue had been marked as resolved, but the fix did not work for at least some
builds due to a version conflict: avro-mapred-1.7.5.jar was picked up instead of
avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] 
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli)  spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred dependency
to be picked up.
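
For an application build that hits the same conflict, the equivalent exclusion in sbt looks roughly like this (illustrative sketch; coordinates taken from the dependency tree above, scope and version depend on the build):

    // build.sbt sketch: drop the hadoop1 avro-mapred pulled in transitively by hive-exec
    // so that an explicitly listed hadoop2 avro-mapred wins.
    libraryDependencies +=
      "org.spark-project.hive" % "hive-exec" % "0.13.1a" exclude("org.apache.avro", "avro-mapred")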

Author: medale 

Closes #4315 from medale/avro-hadoop2 and squashes the following commits:

1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into 
avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 
2) contains avro-mapred for hadoop 1 API had been marked as resolved but did 
not work for at least some builds due to version conflicts using 
avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) 
when building for hadoop2.

(cherry picked from commit 75fdccca32972f86a975033d7c4ce576dd79290f)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc55e20f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc55e20f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc55e20f

Branch: refs/heads/branch-1.3
Commit: bc55e20fd5df7eb5254df2206ca4f1469750e6c9
Parents: 96010fa
Author: medale 
Authored: Sun Feb 8 10:35:29 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:35:40 2015 +

--
 pom.xml | 4 
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/bc55e20f/pom.xml
--
diff --git a/pom.xml b/pom.xml
index e0c796b..f6f176d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -975,6 +975,10 @@
         <groupId>com.esotericsoftware.kryo</groupId>
         <artifactId>kryo</artifactId>
       </exclusion>
+      <exclusion>
+        <groupId>org.apache.avro</groupId>
+        <artifactId>avro-mapred</artifactId>
+      </exclusion>
 
   
   





spark git commit: [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 23a99dabf -> 75fdccca3


[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...

...ns avro-mapred for the hadoop 1 API.

This issue had been marked as resolved, but the fix did not work for at least some
builds due to a version conflict: avro-mapred-1.7.5.jar was picked up instead of
avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on 1.7.5:

Building Spark Project Hive 1.2.0
[INFO] 
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli)  spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred dependency
to be picked up.

Author: medale 

Closes #4315 from medale/avro-hadoop2 and squashes the following commits:

1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into 
avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 
2) contains avro-mapred for hadoop 1 API had been marked as resolved but did 
not work for at least some builds due to version conflicts using 
avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) 
when building for hadoop2.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75fdccca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75fdccca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75fdccca

Branch: refs/heads/master
Commit: 75fdccca32972f86a975033d7c4ce576dd79290f
Parents: 23a99da
Author: medale 
Authored: Sun Feb 8 10:35:29 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:35:29 2015 +

--
 pom.xml | 4 
 1 file changed, 4 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/75fdccca/pom.xml
--
diff --git a/pom.xml b/pom.xml
index e0c796b..f6f176d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -975,6 +975,10 @@
         <groupId>com.esotericsoftware.kryo</groupId>
         <artifactId>kryo</artifactId>
       </exclusion>
+      <exclusion>
+        <groupId>org.apache.avro</groupId>
+        <artifactId>avro-mapred</artifactId>
+      </exclusion>
 
   
   





spark git commit: [SPARK-5672][Web UI] Don't return `ERROR 500` when have missing args

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 0f9d76599 -> 96010faa3


[SPARK-5672][Web UI] Don't return `ERROR 500` when have missing args

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing.
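
The fix routes missing-argument errors through IllegalArgumentException so that the servlet layer can answer with 400 instead of 500. A minimal sketch of the pattern on the page side (hypothetical handler, not part of this patch):

    // A required query parameter is read and, if absent, reported as a client error
    // instead of blowing up the responder with an uncaught exception.
    def render(params: Map[String, String]): String = {
      val executorId = params.getOrElse("executorId",
        throw new IllegalArgumentException("Missing executorId parameter"))
      s"thread dump for executor $executorId"
    }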

Author: Kirill A. Korinskiy 

Closes #4239 from catap/ui_500 and squashes the following commits:

520e180 [Kirill A. Korinskiy] [SPARK-5672][Web UI] Return `HTTP ERROR 400` when 
have missing args

(cherry picked from commit 23a99dabf10761b7c8ffc4fddd96bf8b5af13f38)
Signed-off-by: Sean Owen 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96010faa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96010faa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96010faa

Branch: refs/heads/branch-1.3
Commit: 96010faa318a822b3a8d6153ca4c1541a384cc34
Parents: 0f9d765
Author: Kirill A. Korinskiy 
Authored: Sun Feb 8 10:31:46 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:31:58 2015 +

--
 .../scala/org/apache/spark/ui/JettyUtils.scala  | 27 
 .../spark/ui/exec/ExecutorThreadDumpPage.scala  |  2 +-
 .../org/apache/spark/ui/jobs/JobPage.scala  |  5 +++-
 .../org/apache/spark/ui/jobs/PoolPage.scala |  2 ++
 .../org/apache/spark/ui/jobs/StagePage.scala| 10 ++--
 .../org/apache/spark/ui/storage/RDDPage.scala   |  5 +++-
 6 files changed, 35 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala 
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 88fed83..bf4b24e 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -62,17 +62,22 @@ private[spark] object JettyUtils extends Logging {
   securityMgr: SecurityManager): HttpServlet = {
 new HttpServlet {
   override def doGet(request: HttpServletRequest, response: 
HttpServletResponse) {
-if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
-  
response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
-  response.setStatus(HttpServletResponse.SC_OK)
-  val result = servletParams.responder(request)
-  response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
-  response.getWriter.println(servletParams.extractFn(result))
-} else {
-  response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
-  response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
-  response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
-"User is not authorized to access this page.")
+try {
+  if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
+
response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
+response.setStatus(HttpServletResponse.SC_OK)
+val result = servletParams.responder(request)
+response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
+response.getWriter.println(servletParams.extractFn(result))
+  } else {
+response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
+response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
+response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
+  "User is not authorized to access this page.")
+  }
+} catch {
+  case e: IllegalArgumentException =>
+response.sendError(HttpServletResponse.SC_BAD_REQUEST, 
e.getMessage)
 }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala 
b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
index c82730f..f0ae95b 100644
--- a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
@@ -43,7 +43,7 @@ private[ui] class ExecutorThreadDumpPage(parent: 
ExecutorsTab) extends WebUIPage
 }
 id
 }.getOrElse {
-  return Text(s"Missing executorId parameter")
+  throw new IllegalArgumentException(s"Missing executorId parameter")
 }
 val time = System.currentTimeMillis()
 val maybeThreadDump = sc.get.getExecutorThreadDump(executorId)

http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main

spark git commit: [SPARK-5672][Web UI] Don't return `ERROR 500` when have missing args

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 487831369 -> 23a99dabf


[SPARK-5672][Web UI] Don't return `ERROR 500` when have missing args

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing.

Author: Kirill A. Korinskiy 

Closes #4239 from catap/ui_500 and squashes the following commits:

520e180 [Kirill A. Korinskiy] [SPARK-5672][Web UI] Return `HTTP ERROR 400` when 
have missing args


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23a99dab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23a99dab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23a99dab

Branch: refs/heads/master
Commit: 23a99dabf10761b7c8ffc4fddd96bf8b5af13f38
Parents: 4878313
Author: Kirill A. Korinskiy 
Authored: Sun Feb 8 10:31:46 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:31:46 2015 +

--
 .../scala/org/apache/spark/ui/JettyUtils.scala  | 27 
 .../spark/ui/exec/ExecutorThreadDumpPage.scala  |  2 +-
 .../org/apache/spark/ui/jobs/JobPage.scala  |  5 +++-
 .../org/apache/spark/ui/jobs/PoolPage.scala |  2 ++
 .../org/apache/spark/ui/jobs/StagePage.scala| 10 ++--
 .../org/apache/spark/ui/storage/RDDPage.scala   |  5 +++-
 6 files changed, 35 insertions(+), 16 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala 
b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 88fed83..bf4b24e 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -62,17 +62,22 @@ private[spark] object JettyUtils extends Logging {
   securityMgr: SecurityManager): HttpServlet = {
 new HttpServlet {
   override def doGet(request: HttpServletRequest, response: 
HttpServletResponse) {
-if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
-  
response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
-  response.setStatus(HttpServletResponse.SC_OK)
-  val result = servletParams.responder(request)
-  response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
-  response.getWriter.println(servletParams.extractFn(result))
-} else {
-  response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
-  response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
-  response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
-"User is not authorized to access this page.")
+try {
+  if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
+
response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
+response.setStatus(HttpServletResponse.SC_OK)
+val result = servletParams.responder(request)
+response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
+response.getWriter.println(servletParams.extractFn(result))
+  } else {
+response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
+response.setHeader("Cache-Control", "no-cache, no-store, 
must-revalidate")
+response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
+  "User is not authorized to access this page.")
+  }
+} catch {
+  case e: IllegalArgumentException =>
+response.sendError(HttpServletResponse.SC_BAD_REQUEST, 
e.getMessage)
 }
   }
 }

http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
--
diff --git 
a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala 
b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
index c82730f..f0ae95b 100644
--- a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
@@ -43,7 +43,7 @@ private[ui] class ExecutorThreadDumpPage(parent: 
ExecutorsTab) extends WebUIPage
 }
 id
 }.getOrElse {
-  return Text(s"Missing executorId parameter")
+  throw new IllegalArgumentException(s"Missing executorId parameter")
 }
 val time = System.currentTimeMillis()
 val maybeThreadDump = sc.get.getExecutorThreadDump(executorId)

http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
--

spark git commit: [SPARK-5656] Fail gracefully for large values of k and/or n that will ex...

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 6fb141e2a -> 487831369


[SPARK-5656] Fail gracefully for large values of k and/or n that will ex...

...ceed max int.

Large values of k and/or n in EigenValueDecomposition.symmetricEigs will result in an array
allocation whose requested size exceeds Integer.MAX_VALUE in the following:
var v = new Array[Double](n * ncv)
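
A quick back-of-the-envelope shows why the guard is needed (sizes are made up; ncv is the ARPACK work-space dimension, on the order of 2 * k):

    val n = 100000L    // number of columns (hypothetical)
    val ncv = 30000L   // work-space dimension for roughly k = 15000 requested eigenvalues
    println(n * ncv)             // 3000000000
    println(Int.MaxValue.toLong) // 2147483647
    // JVM arrays are indexed by Int, so new Array[Double](n * ncv) cannot be allocated;
    // the new require(...) fails fast with a readable message instead.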

Author: mbittmann 
Author: bittmannm 

Closes #4433 from mbittmann/master and squashes the following commits:

ee56e05 [mbittmann] [SPARK-5656] Combine checks into simple message
e49cbbb [mbittmann] [SPARK-5656] Simply error message
860836b [mbittmann] Array size check updates based on code review
a604816 [bittmannm] [SPARK-5656] Fail gracefully for large values of k and/or n 
that will exceed max int.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48783136
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48783136
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48783136

Branch: refs/heads/master
Commit: 48783136958e76d96f477802805e000ee5da5697
Parents: 6fb141e
Author: mbittmann 
Authored: Sun Feb 8 10:13:29 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:13:29 2015 +

--
 .../org/apache/spark/mllib/linalg/EigenValueDecomposition.scala   | 3 +++
 1 file changed, 3 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/48783136/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
--
diff --git 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
index 3515461..9d6f975 100644
--- 
a/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
@@ -79,6 +79,9 @@ private[mllib] object EigenValueDecomposition {
 // Mode 1: A*x = lambda*x, A symmetric
 iparam(6) = 1
 
+require(n * ncv.toLong <= Integer.MAX_VALUE && ncv * (ncv.toLong + 8) <= 
Integer.MAX_VALUE,
+  s"k = $k and/or n = $n are too large to compute an eigendecomposition")
+
 var ido = new intW(0)
 var info = new intW(0)
 var resid = new Array[Double](n)





spark git commit: [SPARK-5366][EC2] Check the mode of private key

2015-02-08 Thread srowen
Repository: spark
Updated Branches:
  refs/heads/master 5de14cc27 -> 6fb141e2a


[SPARK-5366][EC2] Check the mode of private key

Check the mode of the private key file.

Author: liuchang0812 

Closes #4162 from Liuchang0812/ec2-script and squashes the following commits:

fc37355 [liuchang0812] quota file name
01ed464 [liuchang0812] more output
ce2a207 [liuchang0812] pep8
f44efd2 [liuchang0812] move code to real_main
8475a54 [liuchang0812] fix bug
cd61a1a [liuchang0812] import stat
c106cb2 [liuchang0812] fix trivis bug
89c9953 [liuchang0812] more output about checking private key
1177a90 [liuchang0812] remove commet
41188ab [liuchang0812] check the mode of private key


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fb141e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fb141e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fb141e2

Branch: refs/heads/master
Commit: 6fb141e2a9e728499f8782310560bfaef7a5ed6c
Parents: 5de14cc
Author: liuchang0812 
Authored: Sun Feb 8 10:08:51 2015 +
Committer: Sean Owen 
Committed: Sun Feb 8 10:08:51 2015 +

--
 ec2/spark_ec2.py | 15 +++
 1 file changed, 15 insertions(+)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/6fb141e2/ec2/spark_ec2.py
--
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 3f7242a..725b1e4 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -24,10 +24,12 @@ from __future__ import with_statement
 import hashlib
 import logging
 import os
+import os.path
 import pipes
 import random
 import shutil
 import string
+from stat import S_IRUSR
 import subprocess
 import sys
 import tarfile
@@ -349,6 +351,7 @@ def launch_cluster(conn, opts, cluster_name):
 if opts.identity_file is None:
 print >> stderr, "ERROR: Must provide an identity file (-i) for ssh 
connections."
 sys.exit(1)
+
 if opts.key_pair is None:
 print >> stderr, "ERROR: Must provide a key pair name (-k) to use on 
instances."
 sys.exit(1)
@@ -1007,6 +1010,18 @@ def real_main():
 DeprecationWarning
 )
 
+if opts.identity_file is not None:
+if not os.path.exists(opts.identity_file):
+print >> stderr,\
+"ERROR: The identity file '{f}' doesn't 
exist.".format(f=opts.identity_file)
+sys.exit(1)
+
+file_mode = os.stat(opts.identity_file).st_mode
+if not (file_mode & S_IRUSR) or not oct(file_mode)[-2:] == '00':
+print >> stderr, "ERROR: The identity file must be accessible only 
by you."
+print >> stderr, 'You can fix this with: chmod 400 
"{f}"'.format(f=opts.identity_file)
+sys.exit(1)
+
 if opts.ebs_vol_num > 8:
 print >> stderr, "ebs-vol-num cannot be greater than 8"
 sys.exit(1)
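
For comparison, the same "owner-read-only" condition expressed on the JVM side (illustrative only; the actual check lives in the Python launcher shown above):

    import java.nio.file.{Files, Paths}
    import java.nio.file.attribute.PosixFilePermission._
    import scala.collection.JavaConverters._

    // Mirrors the Python check: the owner must be able to read the key, and group/other
    // must have no permissions at all (what `chmod 400 key.pem` produces).
    def keyPermissionsOk(path: String): Boolean = {
      val perms = Files.getPosixFilePermissions(Paths.get(path)).asScala.toSet
      perms.contains(OWNER_READ) && perms.subsetOf(Set(OWNER_READ, OWNER_WRITE, OWNER_EXECUTE))
    }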

