spark git commit: [SPARK-5539][MLLIB] LDA guide
Repository: spark Updated Branches: refs/heads/branch-1.3 955f2863e -> 5782ee29e [SPARK-5539][MLLIB] LDA guide This is the LDA user guide from jkbradley with Java and Scala code example. Author: Xiangrui Meng Author: Joseph K. Bradley Closes #4465 from mengxr/lda-guide and squashes the following commits: 6dcb7d1 [Xiangrui Meng] update java example in the user guide 76169ff [Xiangrui Meng] update java example 36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed). (cherry picked from commit 855d12ac0a9cdade4cd2cc64c4e7209478be6690) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5782ee29 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5782ee29 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5782ee29 Branch: refs/heads/branch-1.3 Commit: 5782ee29eb273b1f87a07fd624bbf228d2597b98 Parents: 955f286 Author: Xiangrui Meng Authored: Sun Feb 8 23:40:36 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 23:40:44 2015 -0800 -- data/mllib/sample_lda_data.txt | 12 ++ docs/mllib-clustering.md| 129 ++- .../spark/examples/mllib/JavaLDAExample.java| 75 +++ 3 files changed, 215 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/5782ee29/data/mllib/sample_lda_data.txt -- diff --git a/data/mllib/sample_lda_data.txt b/data/mllib/sample_lda_data.txt new file mode 100644 index 000..2e76702 --- /dev/null +++ b/data/mllib/sample_lda_data.txt @@ -0,0 +1,12 @@ +1 2 6 0 2 3 1 1 0 0 3 +1 3 0 1 3 0 0 2 0 0 1 +1 4 1 0 0 4 9 0 1 2 0 +2 1 0 3 0 0 5 0 2 3 9 +3 1 1 9 3 0 2 0 0 1 3 +4 2 0 3 4 5 1 1 1 4 0 +2 1 0 3 0 0 5 0 2 2 9 +1 1 1 9 2 1 2 0 0 1 3 +4 4 0 3 4 2 1 3 0 0 0 +2 8 2 0 3 0 2 0 2 7 2 +1 1 1 9 0 2 2 0 0 3 3 +4 1 0 0 4 5 1 3 0 1 0 http://git-wip-us.apache.org/repos/asf/spark/blob/5782ee29/docs/mllib-clustering.md -- diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md index 1e9ef34..99ed6b6 100644 --- a/docs/mllib-clustering.md +++ b/docs/mllib-clustering.md @@ -55,7 +55,7 @@ has the following parameters: Power iteration clustering is a scalable and efficient algorithm for clustering points given pointwise mutual affinity values. Internally the algorithm: -* accepts a [Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points. +* accepts a [Graph](api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points. * calculates the principal eigenvalue and eigenvector * Clusters each of the input points according to their principal eigenvector component value @@ -71,6 +71,35 @@ Example outputs for a dataset inspired by the paper - but with five clusters ins +### Latent Dirichlet Allocation (LDA) + +[Latent Dirichlet Allocation (LDA)](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) +is a topic model which infers topics from a collection of text documents. +LDA can be thought of as a clustering algorithm as follows: + +* Topics correspond to cluster centers, and documents correspond to examples (rows) in a dataset. +* Topics and documents both exist in a feature space, where feature vectors are vectors of word counts. 
+* Rather than estimating a clustering using a traditional distance, LDA uses a function based + on a statistical model of how text documents are generated. + +LDA takes in a collection of documents as vectors of word counts. +It learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) +on the likelihood function. After fitting on the documents, LDA provides: + +* Topics: Inferred topics, each of which is a probability distribution over terms (words). +* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics. + +LDA takes the following parameters: + +* `k`: Number of topics (i.e., cluster centers) +* `maxIterations`: Limit on the number of iterations of EM used for learning +* `docConcentration`: Hyperparameter for prior over documents' distributions over topics. Currently must be > 1, where larger values encourage smoother inferred distributions. +* `topicConcentration`: Hyperparameter for prior over topics' distributions over terms (words). Currently must be > 1, where larger values encourage smoother inferred distributions.
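The commit bundles Scala and Java code examples with the guide. A minimal Scala sketch in the spirit of the guide's example, fitting LDA on the `sample_lda_data.txt` file added above (MLlib API as of Spark 1.3; the object name is illustrative):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.linalg.Vectors

object LDAGuideExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "LDAGuideExample")

    // Each line of the sample file is one document as a vector of word counts.
    val data = sc.textFile("data/mllib/sample_lda_data.txt")
    val parsedData = data.map(s => Vectors.dense(s.trim.split(' ').map(_.toDouble)))
    // LDA takes an RDD[(Long, Vector)] of (document ID, word-count vector).
    val corpus = parsedData.zipWithIndex.map(_.swap).cache()

    // Cluster the documents into three topics using EM.
    val ldaModel = new LDA().setK(3).run(corpus)

    // Each inferred topic is a distribution over the vocabulary.
    val topics = ldaModel.topicsMatrix
    for (topic <- Range(0, 3)) {
      print(s"Topic $topic:")
      for (word <- Range(0, ldaModel.vocabSize)) {
        print(" " + topics(word, topic))
      }
      println()
    }

    sc.stop()
  }
}
```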
spark git commit: [SPARK-5539][MLLIB] LDA guide
Repository: spark Updated Branches: refs/heads/master 4575c5643 -> 855d12ac0 [SPARK-5539][MLLIB] LDA guide This is the LDA user guide from jkbradley with Java and Scala code example. Author: Xiangrui Meng Author: Joseph K. Bradley Closes #4465 from mengxr/lda-guide and squashes the following commits: 6dcb7d1 [Xiangrui Meng] update java example in the user guide 76169ff [Xiangrui Meng] update java example 36c3ae2 [Xiangrui Meng] Merge remote-tracking branch 'apache/master' into lda-guide c2a1efe [Joseph K. Bradley] Added LDA programming guide, plus Java example (which is in the guide and probably should be removed). Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/855d12ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/855d12ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/855d12ac Branch: refs/heads/master Commit: 855d12ac0a9cdade4cd2cc64c4e7209478be6690 Parents: 4575c56 Author: Xiangrui Meng Authored: Sun Feb 8 23:40:36 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 23:40:36 2015 -0800 -- data/mllib/sample_lda_data.txt | 12 ++ docs/mllib-clustering.md| 129 ++- .../spark/examples/mllib/JavaLDAExample.java| 75 +++ 3 files changed, 215 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/855d12ac/data/mllib/sample_lda_data.txt -- diff --git a/data/mllib/sample_lda_data.txt b/data/mllib/sample_lda_data.txt new file mode 100644 index 000..2e76702 --- /dev/null +++ b/data/mllib/sample_lda_data.txt @@ -0,0 +1,12 @@ +1 2 6 0 2 3 1 1 0 0 3 +1 3 0 1 3 0 0 2 0 0 1 +1 4 1 0 0 4 9 0 1 2 0 +2 1 0 3 0 0 5 0 2 3 9 +3 1 1 9 3 0 2 0 0 1 3 +4 2 0 3 4 5 1 1 1 4 0 +2 1 0 3 0 0 5 0 2 2 9 +1 1 1 9 2 1 2 0 0 1 3 +4 4 0 3 4 2 1 3 0 0 0 +2 8 2 0 3 0 2 0 2 7 2 +1 1 1 9 0 2 2 0 0 3 3 +4 1 0 0 4 5 1 3 0 1 0 http://git-wip-us.apache.org/repos/asf/spark/blob/855d12ac/docs/mllib-clustering.md -- diff --git a/docs/mllib-clustering.md b/docs/mllib-clustering.md index 1e9ef34..99ed6b6 100644 --- a/docs/mllib-clustering.md +++ b/docs/mllib-clustering.md @@ -55,7 +55,7 @@ has the following parameters: Power iteration clustering is a scalable and efficient algorithm for clustering points given pointwise mutual affinity values. Internally the algorithm: -* accepts a [Graph](https://spark.apache.org/docs/0.9.2/api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points. +* accepts a [Graph](api/graphx/index.html#org.apache.spark.graphx.Graph) that represents a normalized pairwise affinity between all input points. * calculates the principal eigenvalue and eigenvector * Clusters each of the input points according to their principal eigenvector component value @@ -71,6 +71,35 @@ Example outputs for a dataset inspired by the paper - but with five clusters ins +### Latent Dirichlet Allocation (LDA) + +[Latent Dirichlet Allocation (LDA)](http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation) +is a topic model which infers topics from a collection of text documents. +LDA can be thought of as a clustering algorithm as follows: + +* Topics correspond to cluster centers, and documents correspond to examples (rows) in a dataset. +* Topics and documents both exist in a feature space, where feature vectors are vectors of word counts. +* Rather than estimating a clustering using a traditional distance, LDA uses a function based + on a statistical model of how text documents are generated. 
+ +LDA takes in a collection of documents as vectors of word counts. +It learns clustering using [expectation-maximization](http://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm) +on the likelihood function. After fitting on the documents, LDA provides: + +* Topics: Inferred topics, each of which is a probability distribution over terms (words). +* Topic distributions for documents: For each document in the training set, LDA gives a probability distribution over topics. + +LDA takes the following parameters: + +* `k`: Number of topics (i.e., cluster centers) +* `maxIterations`: Limit on the number of iterations of EM used for learning +* `docConcentration`: Hyperparameter for prior over documents' distributions over topics. Currently must be > 1, where larger values encourage smoother inferred distributions. +* `topicConcentration`: Hyperparameter for prior over topics' distributions over terms (words). Currently must be > 1, where larger values encourage smoother inferred distributions. +* `checkpointInterval`: If using checkpointing (set in the Spark configuration: `spark.checkpoint.dir`), this parameter specifies how often checkpoints will be created. If `maxIterations` is large, using checkpointing can help reduce shuffle file sizes on disk and help with failure recovery.
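A sketch of how the parameters listed above map onto the builder API, and how the per-document topic distributions are retrieved from the fitted model (a `DistributedLDAModel` in Spark 1.3; the concentration values here are illustrative):

```scala
import org.apache.spark.mllib.clustering.{DistributedLDAModel, LDA}
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.rdd.RDD

// `corpus` is an RDD[(Long, Vector)] of (document ID, word-count vector),
// prepared as in the Scala example above.
def fitAndDescribe(corpus: RDD[(Long, Vector)]): Unit = {
  val model: DistributedLDAModel = new LDA()
    .setK(3)                    // number of topics (cluster centers)
    .setMaxIterations(20)       // cap on EM iterations
    .setDocConcentration(1.1)   // prior over documents' topic distributions (> 1)
    .setTopicConcentration(1.1) // prior over topics' term distributions (> 1)
    .run(corpus)

  // One probability distribution over topics per training document.
  val topicDistributions: RDD[(Long, Vector)] = model.topicDistributions
  topicDistributions.take(3).foreach { case (docId, dist) =>
    println(s"doc $docId -> $dist")
  }
}
```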
spark git commit: [SPARK-5472][SQL] Fix Scala code style
Repository: spark Updated Branches: refs/heads/master 4396dfb37 -> 4575c5643 [SPARK-5472][SQL] Fix Scala code style Fix Scala code style. Author: Hung Lin Closes #4464 from hunglin/SPARK-5472 and squashes the following commits: ef7a3b3 [Hung Lin] SPARK-5472: fix scala style Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4575c564 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4575c564 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4575c564 Branch: refs/heads/master Commit: 4575c5643a82818bf64f9648314bdc2fdc12febb Parents: 4396dfb Author: Hung Lin Authored: Sun Feb 8 22:36:42 2015 -0800 Committer: Reynold Xin Committed: Sun Feb 8 22:36:42 2015 -0800 -- .../org/apache/spark/sql/jdbc/JDBCRDD.scala | 42 ++-- .../apache/spark/sql/jdbc/JDBCRelation.scala| 35 +--- 2 files changed, 41 insertions(+), 36 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4575c564/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala index a2f9467..0bec32c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.jdbc -import java.sql.{Connection, DatabaseMetaData, DriverManager, ResultSet, ResultSetMetaData, SQLException} -import scala.collection.mutable.ArrayBuffer +import java.sql.{Connection, DriverManager, ResultSet, ResultSetMetaData, SQLException} import org.apache.spark.{Logging, Partition, SparkContext, TaskContext} import org.apache.spark.rdd.RDD -import org.apache.spark.util.NextIterator -import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion import org.apache.spark.sql.catalyst.expressions.{Row, SpecificMutableRow} import org.apache.spark.sql.types._ import org.apache.spark.sql.sources._ @@ -100,7 +97,7 @@ private[sql] object JDBCRDD extends Logging { try { val rsmd = rs.getMetaData val ncols = rsmd.getColumnCount -var fields = new Array[StructField](ncols); +val fields = new Array[StructField](ncols) var i = 0 while (i < ncols) { val columnName = rsmd.getColumnName(i + 1) @@ -176,23 +173,27 @@ private[sql] object JDBCRDD extends Logging { * * @return An RDD representing "SELECT requiredColumns FROM fqTable". 
*/ - def scanTable(sc: SparkContext, -schema: StructType, -driver: String, -url: String, -fqTable: String, -requiredColumns: Array[String], -filters: Array[Filter], -parts: Array[Partition]): RDD[Row] = { + def scanTable( + sc: SparkContext, + schema: StructType, + driver: String, + url: String, + fqTable: String, + requiredColumns: Array[String], + filters: Array[Filter], + parts: Array[Partition]): RDD[Row] = { + val prunedSchema = pruneSchema(schema, requiredColumns) -return new JDBCRDD(sc, -getConnector(driver, url), -prunedSchema, -fqTable, -requiredColumns, -filters, -parts) +return new +JDBCRDD( + sc, + getConnector(driver, url), + prunedSchema, + fqTable, + requiredColumns, + filters, + parts) } } @@ -412,6 +413,5 @@ private[sql] class JDBCRDD( gotNext = false nextValue } - } } http://git-wip-us.apache.org/repos/asf/spark/blob/4575c564/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala index e09125e..66ad38e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala @@ -96,7 +96,8 @@ private[sql] class DefaultSource extends RelationProvider { if (driver != null) Class.forName(driver) -if ( partitionColumn != null +if ( + partitionColumn != null && (lowerBound == null || upperBound == null || numPartitions == null)) { sys.error("Partitioning incompletely specified") } @@ -104,30 +105,34 @@ private[sql] class DefaultSource extends RelationProvider { val partitionInfo = if (partitionColumn == null) { null } else { - JDBCPartitioningInfo(partitionCol
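The substance of the patch is mechanical: it moves long parameter lists onto one-parameter-per-line, double-indented signatures, drops unused imports, and replaces a `var` plus stray semicolon with a `val`. A standalone sketch of the signature convention the patch applies (the names merely echo the diff; this is not code from the repository):

```scala
object StyleSketch {
  // Spark style for signatures that do not fit on one line: each parameter on
  // its own line, indented four extra spaces, with the result type and body
  // starting after the closing parenthesis.
  def scanTable(
      fqTable: String,
      requiredColumns: Array[String],
      numPartitions: Int): String = {
    s"SELECT ${requiredColumns.mkString(", ")} FROM $fqTable -- $numPartitions partitions"
  }

  def main(args: Array[String]): Unit = {
    println(scanTable("test.people", Array("name", "age"), 4))
  }
}
```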
spark git commit: [SPARK-5472][SQL] Fix Scala code style
Repository: spark Updated Branches: refs/heads/branch-1.3 fa8ea48f2 -> 955f2863e [SPARK-5472][SQL] Fix Scala code style Fix Scala code style. Author: Hung Lin Closes #4464 from hunglin/SPARK-5472 and squashes the following commits: ef7a3b3 [Hung Lin] SPARK-5472: fix scala style (cherry picked from commit 4575c5643a82818bf64f9648314bdc2fdc12febb) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/955f2863 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/955f2863 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/955f2863 Branch: refs/heads/branch-1.3 Commit: 955f2863e39a96c0b00ad7d3eac972bb1cfcb594 Parents: fa8ea48 Author: Hung Lin Authored: Sun Feb 8 22:36:42 2015 -0800 Committer: Reynold Xin Committed: Sun Feb 8 22:36:51 2015 -0800 -- .../org/apache/spark/sql/jdbc/JDBCRDD.scala | 42 ++-- .../apache/spark/sql/jdbc/JDBCRelation.scala| 35 +--- 2 files changed, 41 insertions(+), 36 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/955f2863/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala index a2f9467..0bec32c 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRDD.scala @@ -17,13 +17,10 @@ package org.apache.spark.sql.jdbc -import java.sql.{Connection, DatabaseMetaData, DriverManager, ResultSet, ResultSetMetaData, SQLException} -import scala.collection.mutable.ArrayBuffer +import java.sql.{Connection, DriverManager, ResultSet, ResultSetMetaData, SQLException} import org.apache.spark.{Logging, Partition, SparkContext, TaskContext} import org.apache.spark.rdd.RDD -import org.apache.spark.util.NextIterator -import org.apache.spark.sql.catalyst.analysis.HiveTypeCoercion import org.apache.spark.sql.catalyst.expressions.{Row, SpecificMutableRow} import org.apache.spark.sql.types._ import org.apache.spark.sql.sources._ @@ -100,7 +97,7 @@ private[sql] object JDBCRDD extends Logging { try { val rsmd = rs.getMetaData val ncols = rsmd.getColumnCount -var fields = new Array[StructField](ncols); +val fields = new Array[StructField](ncols) var i = 0 while (i < ncols) { val columnName = rsmd.getColumnName(i + 1) @@ -176,23 +173,27 @@ private[sql] object JDBCRDD extends Logging { * * @return An RDD representing "SELECT requiredColumns FROM fqTable". 
*/ - def scanTable(sc: SparkContext, -schema: StructType, -driver: String, -url: String, -fqTable: String, -requiredColumns: Array[String], -filters: Array[Filter], -parts: Array[Partition]): RDD[Row] = { + def scanTable( + sc: SparkContext, + schema: StructType, + driver: String, + url: String, + fqTable: String, + requiredColumns: Array[String], + filters: Array[Filter], + parts: Array[Partition]): RDD[Row] = { + val prunedSchema = pruneSchema(schema, requiredColumns) -return new JDBCRDD(sc, -getConnector(driver, url), -prunedSchema, -fqTable, -requiredColumns, -filters, -parts) +return new +JDBCRDD( + sc, + getConnector(driver, url), + prunedSchema, + fqTable, + requiredColumns, + filters, + parts) } } @@ -412,6 +413,5 @@ private[sql] class JDBCRDD( gotNext = false nextValue } - } } http://git-wip-us.apache.org/repos/asf/spark/blob/955f2863/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala -- diff --git a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala index e09125e..66ad38e 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/JDBCRelation.scala @@ -96,7 +96,8 @@ private[sql] class DefaultSource extends RelationProvider { if (driver != null) Class.forName(driver) -if ( partitionColumn != null +if ( + partitionColumn != null && (lowerBound == null || upperBound == null || numPartitions == null)) { sys.error("Partitioning incompletely specified") } @@ -104,30 +105,34 @@ private[sql] class DefaultSource extends RelationProvider { val partiti
svn commit: r7966 - /dev/spark/spark-1.2.1-rc3/ /release/spark/spark-1.2.1/
Author: pwendell Date: Mon Feb 9 06:34:02 2015 New Revision: 7966 Log: Spark release 1.2.1 Added: release/spark/spark-1.2.1/ - copied from r7965, dev/spark/spark-1.2.1-rc3/ Removed: dev/spark/spark-1.2.1-rc3/
svn commit: r7965 - /dev/spark/spark-1.2.1-rc3/
Author: pwendell Date: Mon Feb 9 06:29:04 2015 New Revision: 7965 Log: Adding Spark 1.2.1 RC3 Added: dev/spark/spark-1.2.1-rc3/ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1.tgz.sha dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.3.tgz.sha dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop2.4.tgz.sha dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.asc (with props) dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.md5 dev/spark/spark-1.2.1-rc3/spark-1.2.1.tgz.sha Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz == Binary file - no diff available. Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz -- svn:mime-type = application/x-gzip Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc == Binary file - no diff available. Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.asc -- svn:mime-type = application/pgp-signature Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 == --- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 (added) +++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.md5 Mon Feb 9 06:29:04 2015 @@ -0,0 +1 @@ +spark-1.2.1-bin-cdh4.tgz: 9C 18 E5 43 F9 32 3C 2A 6A A9 C1 0C 11 F9 05 58 Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha == --- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha (added) +++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-cdh4.tgz.sha Mon Feb 9 06:29:04 2015 @@ -0,0 +1,3 @@ +spark-1.2.1-bin-cdh4.tgz: 208BD991 F14AD9A4 54A26F97 64A3AB8D 290E55B4 D1275E51 + CEAC7E11 F797B55D 2B59BE38 F0186E43 A66B5FFE 281C546D + F7C3511B B1FD8A0A B495E5AC AD207A4F Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz == Binary file - no diff available. Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz -- svn:mime-type = application/x-gzip Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc == Binary file - no diff available. 
Propchange: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.asc -- svn:mime-type = application/pgp-signature Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 == --- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 (added) +++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.md5 Mon Feb 9 06:29:04 2015 @@ -0,0 +1,2 @@ +spark-1.2.1-bin-hadoop1-scala2.11.tgz: DE F4 A3 77 D3 41 F7 9F 3A 54 2D 7C CA + 04 0D 88 Added: dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha == --- dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha (added) +++ dev/spark/spark-1.2.1-rc3/spark-1.2.1-bin-hadoop1-scala2.11.tgz.sha Mon Feb 9 06:29:04 201
svn commit: r7964 - /release/spark/spark-1.1.0/
Author: pwendell Date: Mon Feb 9 05:50:16 2015 New Revision: 7964 Log: Removing Spark 1.1.0 release. Removed: release/spark/spark-1.1.0/
spark git commit: SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow
Repository: spark Updated Branches: refs/heads/branch-1.3 df9b10573 -> fa8ea48f2 SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* methods. jkbradley this should be an easy one. Review and/or merge as you see fit. Author: Sean Owen Closes #4461 from srowen/SPARK-4405 and squashes the following commits: c67574e [Sean Owen] Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* methods (cherry picked from commit 4396dfb37f433ef186e3e0a09db9906986ec940b) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fa8ea48f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fa8ea48f Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/fa8ea48f Branch: refs/heads/branch-1.3 Commit: fa8ea48f2d693b1e9db7a7138c23075748b3c0f5 Parents: df9b105 Author: Sean Owen Authored: Sun Feb 8 21:08:50 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 21:08:56 2015 -0800 -- .../org/apache/spark/mllib/linalg/Matrices.scala | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/fa8ea48f/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala index c8a97b8..89b3867 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala @@ -256,8 +256,11 @@ object DenseMatrix { * @param numCols number of columns of the matrix * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): DenseMatrix = + def zeros(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + } /** * Generate a `DenseMatrix` consisting of ones. @@ -265,8 +268,11 @@ object DenseMatrix { * @param numCols number of columns of the matrix * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): DenseMatrix = + def ones(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + } /** * Generate an Identity Matrix in `DenseMatrix` format. 
@@ -291,6 +297,8 @@ object DenseMatrix { * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) */ def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) } @@ -302,6 +310,8 @@ object DenseMatrix { * @param numCols number of columns of the matrix * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) */ def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) }
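Why the guard is needed: `numRows * numCols` is evaluated in `Int` arithmetic and silently wraps before the backing array is allocated. A self-contained sketch of the failure mode and of the `require` the patch adds:

```scala
object OverflowSketch {
  def main(args: Array[String]): Unit = {
    val numRows = 65536
    val numCols = 65536

    // Int multiplication wraps: 65536 * 65536 = 2^32 overflows to 0, so
    // new Array[Double](numRows * numCols) would allocate an empty array.
    println(numRows * numCols)        // 0
    println(numRows.toLong * numCols) // 4294967296

    // The check added by the patch; for these dimensions it throws
    // IllegalArgumentException instead of allocating a wrong-sized array.
    require(numRows.toLong * numCols <= Int.MaxValue,
      s"$numRows x $numCols dense matrix is too large to allocate")
  }
}
```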
spark git commit: SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow
Repository: spark Updated Branches: refs/heads/master c17161189 -> 4396dfb37 SPARK-4405 [MLLIB] Matrices.* construction methods should check for rows x cols overflow Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* methods. jkbradley this should be an easy one. Review and/or merge as you see fit. Author: Sean Owen Closes #4461 from srowen/SPARK-4405 and squashes the following commits: c67574e [Sean Owen] Check that size of dense matrix array is not beyond Int.MaxValue in Matrices.* methods Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4396dfb3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/4396dfb3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/4396dfb3 Branch: refs/heads/master Commit: 4396dfb37f433ef186e3e0a09db9906986ec940b Parents: c171611 Author: Sean Owen Authored: Sun Feb 8 21:08:50 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 21:08:50 2015 -0800 -- .../org/apache/spark/mllib/linalg/Matrices.scala | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/4396dfb3/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala index c8a97b8..89b3867 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala @@ -256,8 +256,11 @@ object DenseMatrix { * @param numCols number of columns of the matrix * @return `DenseMatrix` with size `numRows` x `numCols` and values of zeros */ - def zeros(numRows: Int, numCols: Int): DenseMatrix = + def zeros(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, new Array[Double](numRows * numCols)) + } /** * Generate a `DenseMatrix` consisting of ones. @@ -265,8 +268,11 @@ object DenseMatrix { * @param numCols number of columns of the matrix * @return `DenseMatrix` with size `numRows` x `numCols` and values of ones */ - def ones(numRows: Int, numCols: Int): DenseMatrix = + def ones(numRows: Int, numCols: Int): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(1.0)) + } /** * Generate an Identity Matrix in `DenseMatrix` format. @@ -291,6 +297,8 @@ object DenseMatrix { * @return `DenseMatrix` with size `numRows` x `numCols` and values in U(0, 1) */ def rand(numRows: Int, numCols: Int, rng: Random): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextDouble())) } @@ -302,6 +310,8 @@ object DenseMatrix { * @return `DenseMatrix` with size `numRows` x `numCols` and values in N(0, 1) */ def randn(numRows: Int, numCols: Int, rng: Random): DenseMatrix = { +require(numRows.toLong * numCols <= Int.MaxValue, +s"$numRows x $numCols dense matrix is too large to allocate") new DenseMatrix(numRows, numCols, Array.fill(numRows * numCols)(rng.nextGaussian())) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-5660][MLLIB] Make Matrix apply public
Repository: spark Updated Branches: refs/heads/master a052ed425 -> c17161189 [SPARK-5660][MLLIB] Make Matrix apply public This is #4447 with `override`. Closes #4447 Author: Joseph K. Bradley Author: Xiangrui Meng Closes #4462 from mengxr/SPARK-5660 and squashes the following commits: f82c8d6 [Xiangrui Meng] add override to matrix.apply 91cedde [Joseph K. Bradley] made matrix apply public Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c1716118 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c1716118 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c1716118 Branch: refs/heads/master Commit: c17161189d57f2e3a8d3550ea59a68edf487c8b7 Parents: a052ed4 Author: Joseph K. Bradley Authored: Sun Feb 8 21:07:36 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 21:07:36 2015 -0800 -- .../main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c1716118/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala index 84f8ac2..c8a97b8 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala @@ -50,7 +50,7 @@ sealed trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double + def apply(i: Int, j: Int): Double /** Return the index for the (i, j)-th element in the backing array. */ private[mllib] def index(i: Int, j: Int): Int @@ -163,7 +163,7 @@ class DenseMatrix( private[mllib] def apply(i: Int): Double = values(i) - private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + override def apply(i: Int, j: Int): Double = values(index(i, j)) private[mllib] def index(i: Int, j: Int): Int = { if (!isTransposed) i + numRows * j else j + numCols * i @@ -398,7 +398,7 @@ class SparseMatrix( } } - private[mllib] def apply(i: Int, j: Int): Double = { + override def apply(i: Int, j: Int): Double = { val ind = index(i, j) if (ind < 0) 0.0 else values(ind) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
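With `apply` made public, code outside the `org.apache.spark.mllib` package can index a `Matrix` directly. A minimal sketch (values are stored column-major):

```scala
import org.apache.spark.mllib.linalg.Matrices

object MatrixApplySketch {
  def main(args: Array[String]): Unit = {
    // Column-major layout: column 0 is (1.0, 2.0), column 1 is (3.0, 4.0).
    val m = Matrices.dense(2, 2, Array(1.0, 2.0, 3.0, 4.0))

    // (i, j) indexing now works from user code, not just inside mllib.
    println(m(0, 1)) // 3.0
    println(m(1, 0)) // 2.0
  }
}
```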
spark git commit: [SPARK-5660][MLLIB] Make Matrix apply public
Repository: spark Updated Branches: refs/heads/branch-1.3 e1996aafa -> df9b10573 [SPARK-5660][MLLIB] Make Matrix apply public This is #4447 with `override`. Closes #4447 Author: Joseph K. Bradley Author: Xiangrui Meng Closes #4462 from mengxr/SPARK-5660 and squashes the following commits: f82c8d6 [Xiangrui Meng] add override to matrix.apply 91cedde [Joseph K. Bradley] made matrix apply public (cherry picked from commit c17161189d57f2e3a8d3550ea59a68edf487c8b7) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/df9b1057 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/df9b1057 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/df9b1057 Branch: refs/heads/branch-1.3 Commit: df9b1057397b0d34fa8f1882651d29f623c7222e Parents: e1996aa Author: Joseph K. Bradley Authored: Sun Feb 8 21:07:36 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 21:07:45 2015 -0800 -- .../main/scala/org/apache/spark/mllib/linalg/Matrices.scala| 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/df9b1057/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala -- diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala index 84f8ac2..c8a97b8 100644 --- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala +++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/Matrices.scala @@ -50,7 +50,7 @@ sealed trait Matrix extends Serializable { private[mllib] def toBreeze: BM[Double] /** Gets the (i, j)-th element. */ - private[mllib] def apply(i: Int, j: Int): Double + def apply(i: Int, j: Int): Double /** Return the index for the (i, j)-th element in the backing array. */ private[mllib] def index(i: Int, j: Int): Int @@ -163,7 +163,7 @@ class DenseMatrix( private[mllib] def apply(i: Int): Double = values(i) - private[mllib] def apply(i: Int, j: Int): Double = values(index(i, j)) + override def apply(i: Int, j: Int): Double = values(index(i, j)) private[mllib] def index(i: Int, j: Int): Int = { if (!isTransposed) i + numRows * j else j + numCols * i @@ -398,7 +398,7 @@ class SparseMatrix( } } - private[mllib] def apply(i: Int, j: Int): Double = { + override def apply(i: Int, j: Int): Double = { val ind = index(i, j) if (ind < 0) 0.0 else values(ind) } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
spark git commit: [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format.
Repository: spark Updated Branches: refs/heads/branch-1.3 c515634ef -> e1996aafa [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format. An example:
```
year  month  AVG('Adj Close)  MAX('Adj Close)
1980  12     0.503218         0.595103
1981  01     0.523289         0.570307
1982  02     0.436504         0.475256
1983  03     0.410516         0.442194
1984  04     0.450090         0.483521
```
Author: Reynold Xin Closes #4416 from rxin/SPARK-5643 and squashes the following commits: d0e0d6e [Reynold Xin] [SQL] Minor update to data source and statistics documentation. 269da83 [Reynold Xin] Updated isLocal comment. 2cf3c27 [Reynold Xin] Moved logic into optimizer. 1a04d8b [Reynold Xin] [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in columnar format. (cherry picked from commit a052ed42501fee3641348337505b6176426653c4) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e1996aaf Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e1996aaf Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e1996aaf Branch: refs/heads/branch-1.3 Commit: e1996aafadec95fb365b1ce1b87300441cd272ef Parents: c515634 Author: Reynold Xin Authored: Sun Feb 8 18:56:51 2015 -0800 Committer: Reynold Xin Committed: Sun Feb 8 18:57:03 2015 -0800 -- .../sql/catalyst/optimizer/Optimizer.scala | 18 ++- .../catalyst/plans/logical/LogicalPlan.scala| 7 ++- .../optimizer/ConvertToLocalRelationSuite.scala | 57  .../scala/org/apache/spark/sql/DataFrame.scala | 21 +++- .../org/apache/spark/sql/DataFrameImpl.scala| 41 -- .../apache/spark/sql/IncomputableColumn.scala | 6 ++- .../spark/sql/execution/basicOperators.scala| 7 +-- .../apache/spark/sql/sources/interfaces.scala | 15 +++--- 8 files changed, 151 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/e1996aaf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 8c8f289..3bc48c9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -50,7 +50,9 @@ object DefaultOptimizer extends Optimizer { CombineFilters, PushPredicateThroughProject, PushPredicateThroughJoin, - ColumnPruning) :: Nil + ColumnPruning) :: +Batch("LocalRelation", FixedPoint(100), + ConvertToLocalRelation) :: Nil } /** @@ -610,3 +612,17 @@ object DecimalAggregates extends Rule[LogicalPlan] { DecimalType(prec + 4, scale + 4)) } } + +/** + * Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to + * another LocalRelation. + * + * This is relatively simple as it currently handles only a single case: Project.
+ */ +object ConvertToLocalRelation extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case Project(projectList, LocalRelation(output, data)) => + val projection = new InterpretedProjection(projectList, output) + LocalRelation(projectList.map(_.toAttribute), data.map(projection)) + } +} http://git-wip-us.apache.org/repos/asf/spark/blob/e1996aaf/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index 8d30528..7cf4b81 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -29,12 +29,15 @@ import org.apache.spark.sql.catalyst.trees /** * Estimates of various statistics. The default estimation logic simply lazily multiplies the * corresponding statistic produced by the children. To override this behavior, override - * `statistics` and assign it an overriden version of `Statistics`. + * `statistics` and assign it an overridden version of `Statistics`. * - * '''NOTE''': concrete and/or overriden versions of statistics fields should pay attention to the + * '''NOTE''': concrete and/or overridden versions of statistics fields should pay attention to the * performance of the implementations. The reas
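Usage of the new method, spark-shell style (assumes an existing `SparkContext` named `sc`; the column names and rows are illustrative, echoing the commit message):

```scala
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = sc.parallelize(Seq(
  (1980, 12, 0.503218, 0.595103),
  (1981, 1, 0.523289, 0.570307)
)).toDF("year", "month", "avgAdjClose", "maxAdjClose")

// Prints the first rows of the DataFrame in tabular format.
df.show()
```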
spark git commit: [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format.
Repository: spark Updated Branches: refs/heads/master 56aff4bd6 -> a052ed425 [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in tabular format. An example:
```
year  month  AVG('Adj Close)  MAX('Adj Close)
1980  12     0.503218         0.595103
1981  01     0.523289         0.570307
1982  02     0.436504         0.475256
1983  03     0.410516         0.442194
1984  04     0.450090         0.483521
```
Author: Reynold Xin Closes #4416 from rxin/SPARK-5643 and squashes the following commits: d0e0d6e [Reynold Xin] [SQL] Minor update to data source and statistics documentation. 269da83 [Reynold Xin] Updated isLocal comment. 2cf3c27 [Reynold Xin] Moved logic into optimizer. 1a04d8b [Reynold Xin] [SPARK-5643][SQL] Add a show method to print the content of a DataFrame in columnar format. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a052ed42 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/a052ed42 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/a052ed42 Branch: refs/heads/master Commit: a052ed42501fee3641348337505b6176426653c4 Parents: 56aff4b Author: Reynold Xin Authored: Sun Feb 8 18:56:51 2015 -0800 Committer: Reynold Xin Committed: Sun Feb 8 18:56:51 2015 -0800 -- .../sql/catalyst/optimizer/Optimizer.scala | 18 ++- .../catalyst/plans/logical/LogicalPlan.scala| 7 ++- .../optimizer/ConvertToLocalRelationSuite.scala | 57  .../scala/org/apache/spark/sql/DataFrame.scala | 21 +++- .../org/apache/spark/sql/DataFrameImpl.scala| 41 -- .../apache/spark/sql/IncomputableColumn.scala | 6 ++- .../spark/sql/execution/basicOperators.scala| 7 +-- .../apache/spark/sql/sources/interfaces.scala | 15 +++--- 8 files changed, 151 insertions(+), 21 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/a052ed42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala index 8c8f289..3bc48c9 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala @@ -50,7 +50,9 @@ object DefaultOptimizer extends Optimizer { CombineFilters, PushPredicateThroughProject, PushPredicateThroughJoin, - ColumnPruning) :: Nil + ColumnPruning) :: +Batch("LocalRelation", FixedPoint(100), + ConvertToLocalRelation) :: Nil } /** @@ -610,3 +612,17 @@ object DecimalAggregates extends Rule[LogicalPlan] { DecimalType(prec + 4, scale + 4)) } } + +/** + * Converts local operations (i.e. ones that don't require data exchange) on LocalRelation to + * another LocalRelation. + * + * This is relatively simple as it currently handles only a single case: Project.
+ */ +object ConvertToLocalRelation extends Rule[LogicalPlan] { + def apply(plan: LogicalPlan): LogicalPlan = plan transform { +case Project(projectList, LocalRelation(output, data)) => + val projection = new InterpretedProjection(projectList, output) + LocalRelation(projectList.map(_.toAttribute), data.map(projection)) + } +} http://git-wip-us.apache.org/repos/asf/spark/blob/a052ed42/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala -- diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala index 8d30528..7cf4b81 100644 --- a/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala +++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala @@ -29,12 +29,15 @@ import org.apache.spark.sql.catalyst.trees /** * Estimates of various statistics. The default estimation logic simply lazily multiplies the * corresponding statistic produced by the children. To override this behavior, override - * `statistics` and assign it an overriden version of `Statistics`. + * `statistics` and assign it an overridden version of `Statistics`. * - * '''NOTE''': concrete and/or overriden versions of statistics fields should pay attention to the + * '''NOTE''': concrete and/or overridden versions of statistics fields should pay attention to the * performance of the implementations. The reason is that estimations might get triggered in * performance-critical processes, such as query plan plan
spark git commit: SPARK-5665 [DOCS] Update netlib-java documentation
Repository: spark Updated Branches: refs/heads/master 5c299c58f -> 56aff4bd6 SPARK-5665 [DOCS] Update netlib-java documentation I am the author of netlib-java and I found this documentation to be out of date. Some main points: 1. Breeze has not depended on jBLAS for some time 2. netlib-java provides a pure JVM implementation as the fallback (the original docs did not appear to be aware of this, claiming that gfortran was necessary) 3. The licensing issue is not just about LGPL: optimised natives have proprietary licenses. Building with the LGPL flag turned on really doesn't help you get past this. 4. I really think it's best to direct people to my detailed setup guide instead of trying to compress it into one sentence. It is different for each architecture, each OS, and for each backend. I hope this helps to clear things up :smile: Author: Sam Halliday Author: Sam Halliday Closes #4448 from fommil/patch-1 and squashes the following commits: 18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr a35e4a9 [Sam Halliday] reword netlib-java/breeze docs Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/56aff4bd Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/56aff4bd Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/56aff4bd Branch: refs/heads/master Commit: 56aff4bd6c7c9d18f4f962025708f20a4a82dcf0 Parents: 5c299c5 Author: Sam Halliday Authored: Sun Feb 8 16:34:26 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 16:34:26 2015 -0800 -- docs/mllib-guide.md | 41 - 1 file changed, 24 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/56aff4bd/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 7779fbc..3d32d03 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -56,25 +56,32 @@ See the **[spark.ml programming guide](ml-guide.html)** for more information on # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), -which depends on [netlib-java](https://github.com/fommil/netlib-java), -and [jblas](https://github.com/mikiobraun/jblas). -`netlib-java` and `jblas` depend on native Fortran routines. -You need to install the +MLlib uses the linear algebra package +[Breeze](http://www.scalanlp.org/), which depends on +[netlib-java](https://github.com/fommil/netlib-java) for optimised +numerical processing. If natives are not available at runtime, you +will see a warning message and a pure JVM implementation will be used +instead. + +To learn more about the benefits and background of system optimised +natives, you may wish to watch Sam Halliday's ScalaX talk on +[High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/)). + +Due to licensing issues with runtime proprietary binaries, we do not +include `netlib-java`'s native proxies by default. To configure +`netlib-java` / Breeze to use system optimised binaries, include +`com.github.fommil.netlib:all:1.1.2` (or build Spark with +`-Pnetlib-lgpl`) as a dependency of your project and read the +[netlib-java](https://github.com/fommil/netlib-java) documentation for +your platform's additional installation instructions. + +MLlib also uses [jblas](https://github.com/mikiobraun/jblas) which +will require you to install the [gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries) if it is not already present on your nodes. 
-MLlib will throw a linking error if it cannot detect these libraries automatically. -Due to license issues, we do not include `netlib-java`'s native libraries in MLlib's -dependency set under default settings. -If no native library is available at runtime, you will see a warning message. -To use native libraries from `netlib-java`, please build Spark with `-Pnetlib-lgpl` or -include `com.github.fommil.netlib:all:1.1.2` as a dependency of your project. -If you want to use optimized BLAS/LAPACK libraries such as -[OpenBLAS](http://www.openblas.net/), please link its shared libraries to -`/usr/lib/libblas.so.3` and `/usr/lib/liblapack.so.3`, respectively. -BLAS/LAPACK libraries on worker nodes should be built without multithreading. - -To use MLlib in Python, you will need [NumPy](http://www.numpy.org) -version 1.4 or newer. ---
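For sbt users, pulling in the optimised natives the reworked text points to looks roughly like this `build.sbt` fragment (a minimal sketch, not an official recipe; versions other than netlib-java's `1.1.2`, which the doc names, are illustrative, and `all` is a pom-packaging artifact, hence `pomOnly()`):

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-mllib" % "1.3.0" % "provided",
  // netlib-java's native proxies, so Breeze can call system BLAS/LAPACK.
  "com.github.fommil.netlib" % "all" % "1.1.2" pomOnly()
)
```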
spark git commit: SPARK-5665 [DOCS] Update netlib-java documentation
Repository: spark Updated Branches: refs/heads/branch-1.3 9e4d58fe2 -> c515634ef SPARK-5665 [DOCS] Update netlib-java documentation I am the author of netlib-java and I found this documentation to be out of date. Some main points: 1. Breeze has not depended on jBLAS for some time 2. netlib-java provides a pure JVM implementation as the fallback (the original docs did not appear to be aware of this, claiming that gfortran was necessary) 3. The licensing issue is not just about LGPL: optimised natives have proprietary licenses. Building with the LGPL flag turned on really doesn't help you get past this. 4. I really think it's best to direct people to my detailed setup guide instead of trying to compress it into one sentence. It is different for each architecture, each OS, and for each backend. I hope this helps to clear things up :smile: Author: Sam Halliday Author: Sam Halliday Closes #4448 from fommil/patch-1 and squashes the following commits: 18cda11 [Sam Halliday] remove link to skillsmatters at request of @mengxr a35e4a9 [Sam Halliday] reword netlib-java/breeze docs (cherry picked from commit 56aff4bd6c7c9d18f4f962025708f20a4a82dcf0) Signed-off-by: Xiangrui Meng Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c515634e Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c515634e Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c515634e Branch: refs/heads/branch-1.3 Commit: c515634ef178b49cd4f8ce2c5d08a77054be3a55 Parents: 9e4d58f Author: Sam Halliday Authored: Sun Feb 8 16:34:26 2015 -0800 Committer: Xiangrui Meng Committed: Sun Feb 8 16:34:34 2015 -0800 -- docs/mllib-guide.md | 41 - 1 file changed, 24 insertions(+), 17 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c515634e/docs/mllib-guide.md -- diff --git a/docs/mllib-guide.md b/docs/mllib-guide.md index 7779fbc..3d32d03 100644 --- a/docs/mllib-guide.md +++ b/docs/mllib-guide.md @@ -56,25 +56,32 @@ See the **[spark.ml programming guide](ml-guide.html)** for more information on # Dependencies -MLlib uses the linear algebra package [Breeze](http://www.scalanlp.org/), -which depends on [netlib-java](https://github.com/fommil/netlib-java), -and [jblas](https://github.com/mikiobraun/jblas). -`netlib-java` and `jblas` depend on native Fortran routines. -You need to install the +MLlib uses the linear algebra package +[Breeze](http://www.scalanlp.org/), which depends on +[netlib-java](https://github.com/fommil/netlib-java) for optimised +numerical processing. If natives are not available at runtime, you +will see a warning message and a pure JVM implementation will be used +instead. + +To learn more about the benefits and background of system optimised +natives, you may wish to watch Sam Halliday's ScalaX talk on +[High Performance Linear Algebra in Scala](http://fommil.github.io/scalax14/#/)). + +Due to licensing issues with runtime proprietary binaries, we do not +include `netlib-java`'s native proxies by default. To configure +`netlib-java` / Breeze to use system optimised binaries, include +`com.github.fommil.netlib:all:1.1.2` (or build Spark with +`-Pnetlib-lgpl`) as a dependency of your project and read the +[netlib-java](https://github.com/fommil/netlib-java) documentation for +your platform's additional installation instructions. 
+ +MLlib also uses [jblas](https://github.com/mikiobraun/jblas) which +will require you to install the [gfortran runtime library](https://github.com/mikiobraun/jblas/wiki/Missing-Libraries) if it is not already present on your nodes. -MLlib will throw a linking error if it cannot detect these libraries automatically. -Due to license issues, we do not include `netlib-java`'s native libraries in MLlib's -dependency set under default settings. -If no native library is available at runtime, you will see a warning message. -To use native libraries from `netlib-java`, please build Spark with `-Pnetlib-lgpl` or -include `com.github.fommil.netlib:all:1.1.2` as a dependency of your project. -If you want to use optimized BLAS/LAPACK libraries such as -[OpenBLAS](http://www.openblas.net/), please link its shared libraries to -`/usr/lib/libblas.so.3` and `/usr/lib/liblapack.so.3`, respectively. -BLAS/LAPACK libraries on worker nodes should be built without multithreading. - -To use MLlib in Python, you will need [NumPy](http://www.numpy.org) -version 1.4 or newer. ---
spark git commit: [SPARK-5598][MLLIB] model save/load for ALS
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 42c56b6f1 -> 9e4d58fe2

[SPARK-5598][MLLIB] model save/load for ALS

following #4233. jkbradley

Author: Xiangrui Meng

Closes #4422 from mengxr/SPARK-5598 and squashes the following commits:

a059394 [Xiangrui Meng] SaveLoad not extending Loader
14b7ea6 [Xiangrui Meng] address comments
f487cb2 [Xiangrui Meng] add unit tests
62fc43c [Xiangrui Meng] implement save/load for MFM

(cherry picked from commit 5c299c58fb9a5434a40be82150d4725bba805adf)
Signed-off-by: Xiangrui Meng

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9e4d58fe
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/9e4d58fe
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/9e4d58fe

Branch: refs/heads/branch-1.3
Commit: 9e4d58fe27bb3e2aa978a69a73415e23f7fd5de1
Parents: 42c56b6
Author: Xiangrui Meng
Authored: Sun Feb 8 16:26:20 2015 -0800
Committer: Xiangrui Meng
Committed: Sun Feb 8 16:26:37 2015 -0800

--
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../MatrixFactorizationModel.scala              | 82 +++-
 .../MatrixFactorizationModelSuite.scala         | 19 +
 3 files changed, 100 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/9e4d58fe/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 4bb28d1..caacab9 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -18,7 +18,7 @@ package org.apache.spark.mllib.recommendation

 import org.apache.spark.Logging
-import org.apache.spark.annotation.{DeveloperApi, Experimental}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.java.JavaRDD
 import org.apache.spark.ml.recommendation.{ALS => NewALS}
 import org.apache.spark.rdd.RDD

http://git-wip-us.apache.org/repos/asf/spark/blob/9e4d58fe/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
index ed2f8b4..9ff06ac 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
@@ -17,13 +17,17 @@ package org.apache.spark.mllib.recommendation

+import java.io.IOException
 import java.lang.{Integer => JavaInteger}

+import org.apache.hadoop.fs.Path
 import org.jblas.DoubleMatrix

-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}
+import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Row, SQLContext}
 import org.apache.spark.storage.StorageLevel

 /**
@@ -41,7 +45,8 @@ import org.apache.spark.storage.StorageLevel
 class MatrixFactorizationModel(
     val rank: Int,
     val userFeatures: RDD[(Int, Array[Double])],
-    val productFeatures: RDD[(Int, Array[Double])]) extends Serializable with Logging {
+    val productFeatures: RDD[(Int, Array[Double])])
+  extends Saveable with Serializable with Logging {

   require(rank > 0)
   validateFeatures("User", userFeatures)
@@ -125,6 +130,12 @@ class MatrixFactorizationModel(
     recommend(productFeatures.lookup(product).head, userFeatures, num)
       .map(t => Rating(t._1, product, t._2))

+  protected override val formatVersion: String = "1.0"
+
+  override def save(sc: SparkContext, path: String): Unit = {
+    MatrixFactorizationModel.SaveLoadV1_0.save(this, path)
+  }
+
   private def recommend(
       recommendToFeatures: Array[Double],
       recommendableFeatures: RDD[(Int, Array[Double])],
@@ -136,3 +147,70 @@ class MatrixFactorizationModel(
     scored.top(num)(Ordering.by(_._2))
   }
 }
+
+object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
+
+  import org.apache.spark.mllib.util.Loader._
+
+  override def load(sc: SparkContext, path: String): MatrixFactorizationModel = {
+    val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path)
+    val classNameV1_0 = SaveLoadV1_0.thisClassName
+    (loadedClassName, formatVersion) match {
+      case (className, "1.0") if className == classNameV1_0 =>
+        SaveLoadV1_0.load(sc, path)
+      case _ =
spark git commit: [SPARK-5598][MLLIB] model save/load for ALS
Repository: spark
Updated Branches:
  refs/heads/master 804949d51 -> 5c299c58f

[SPARK-5598][MLLIB] model save/load for ALS

following #4233. jkbradley

Author: Xiangrui Meng

Closes #4422 from mengxr/SPARK-5598 and squashes the following commits:

a059394 [Xiangrui Meng] SaveLoad not extending Loader
14b7ea6 [Xiangrui Meng] address comments
f487cb2 [Xiangrui Meng] add unit tests
62fc43c [Xiangrui Meng] implement save/load for MFM

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/5c299c58
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/5c299c58
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/5c299c58

Branch: refs/heads/master
Commit: 5c299c58fb9a5434a40be82150d4725bba805adf
Parents: 804949d
Author: Xiangrui Meng
Authored: Sun Feb 8 16:26:20 2015 -0800
Committer: Xiangrui Meng
Committed: Sun Feb 8 16:26:20 2015 -0800

--
 .../apache/spark/mllib/recommendation/ALS.scala |  2 +-
 .../MatrixFactorizationModel.scala              | 82 +++-
 .../MatrixFactorizationModelSuite.scala         | 19 +
 3 files changed, 100 insertions(+), 3 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/5c299c58/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
index 4bb28d1..caacab9 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/ALS.scala
@@ -18,7 +18,7 @@ package org.apache.spark.mllib.recommendation

 import org.apache.spark.Logging
-import org.apache.spark.annotation.{DeveloperApi, Experimental}
+import org.apache.spark.annotation.DeveloperApi
 import org.apache.spark.api.java.JavaRDD
 import org.apache.spark.ml.recommendation.{ALS => NewALS}
 import org.apache.spark.rdd.RDD

http://git-wip-us.apache.org/repos/asf/spark/blob/5c299c58/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
index ed2f8b4..9ff06ac 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/recommendation/MatrixFactorizationModel.scala
@@ -17,13 +17,17 @@ package org.apache.spark.mllib.recommendation

+import java.io.IOException
 import java.lang.{Integer => JavaInteger}

+import org.apache.hadoop.fs.Path
 import org.jblas.DoubleMatrix

-import org.apache.spark.Logging
+import org.apache.spark.{Logging, SparkContext}
 import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}
+import org.apache.spark.mllib.util.{Loader, Saveable}
 import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.{Row, SQLContext}
 import org.apache.spark.storage.StorageLevel

 /**
@@ -41,7 +45,8 @@ import org.apache.spark.storage.StorageLevel
 class MatrixFactorizationModel(
     val rank: Int,
     val userFeatures: RDD[(Int, Array[Double])],
-    val productFeatures: RDD[(Int, Array[Double])]) extends Serializable with Logging {
+    val productFeatures: RDD[(Int, Array[Double])])
+  extends Saveable with Serializable with Logging {

   require(rank > 0)
   validateFeatures("User", userFeatures)
@@ -125,6 +130,12 @@ class MatrixFactorizationModel(
     recommend(productFeatures.lookup(product).head, userFeatures, num)
       .map(t => Rating(t._1, product, t._2))

+  protected override val formatVersion: String = "1.0"
+
+  override def save(sc: SparkContext, path: String): Unit = {
+    MatrixFactorizationModel.SaveLoadV1_0.save(this, path)
+  }
+
   private def recommend(
       recommendToFeatures: Array[Double],
       recommendableFeatures: RDD[(Int, Array[Double])],
@@ -136,3 +147,70 @@ class MatrixFactorizationModel(
     scored.top(num)(Ordering.by(_._2))
   }
 }
+
+object MatrixFactorizationModel extends Loader[MatrixFactorizationModel] {
+
+  import org.apache.spark.mllib.util.Loader._
+
+  override def load(sc: SparkContext, path: String): MatrixFactorizationModel = {
+    val (loadedClassName, formatVersion, metadata) = loadMetadata(sc, path)
+    val classNameV1_0 = SaveLoadV1_0.thisClassName
+    (loadedClassName, formatVersion) match {
+      case (className, "1.0") if className == classNameV1_0 =>
+        SaveLoadV1_0.load(sc, path)
+      case _ =>
+        throw new IOException("MatrixFactorizationModel.load did not recognize model with" +
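Both entries above add the same public API: save on the model and load on its companion object. A minimal usage sketch of that API (the path "myModelPath" and the toy ratings are illustrative):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.mllib.recommendation.{ALS, MatrixFactorizationModel, Rating}

    object SaveLoadExample {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("ALSSaveLoad"))
        val ratings = sc.parallelize(Seq(
          Rating(1, 1, 5.0), Rating(1, 2, 1.0), Rating(2, 1, 4.0)))
        val model = ALS.train(ratings, /* rank = */ 10, /* iterations = */ 5)

        // New in this commit: persist the factors plus versioned metadata ...
        model.save(sc, "myModelPath")
        // ... and restore them; load dispatches on (className, formatVersion).
        val sameModel = MatrixFactorizationModel.load(sc, "myModelPath")
        println(sameModel.predict(2, 2))
        sc.stop()
      }
    }

The formatVersion field in the diff is what makes the match in load possible: future on-disk layouts can bump the version string without breaking old saved models.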
svn commit: r1658279 - in /spark: robots.txt sitemap.xml
Author: matei
Date: Sun Feb 8 23:59:49 2015
New Revision: 1658279

URL: http://svn.apache.org/r1658279
Log:
Add robots.txt and sitemap.xml to top-level folder too so they get generated

Added:
    spark/robots.txt
    spark/sitemap.xml

Added: spark/robots.txt
URL: http://svn.apache.org/viewvc/spark/robots.txt?rev=1658279&view=auto
==
--- spark/robots.txt (added)
+++ spark/robots.txt Sun Feb 8 23:59:49 2015
@@ -0,0 +1 @@
+Sitemap: http://spark.apache.org/sitemap.xml

Added: spark/sitemap.xml
URL: http://svn.apache.org/viewvc/spark/sitemap.xml?rev=1658279&view=auto
==
--- spark/sitemap.xml (added)
+++ spark/sitemap.xml Sun Feb 8 23:59:49 2015
@@ -0,0 +1,1871 @@
+<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
+        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+        xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9
+                            http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">
+  <url>
+    <loc>http://spark.apache.org/</loc>
+    <lastmod>2015-01-22T00:27:22+00:00</lastmod>
+    <changefreq>daily</changefreq>
+  </url>
+  <url>
+    <loc>http://spark.apache.org/downloads.html</loc>
+    <lastmod>2015-01-22T00:27:22+00:00</lastmod>
+    <changefreq>weekly</changefreq>
+  </url>
+  <!-- The remaining <url> entries (element markup stripped in the original
+       excerpt) follow the same pattern: the sql/, streaming/, mllib/,
+       graphx/, documentation, examples, community, faq, research, news/*,
+       releases/*, and screencasts/* pages at changefreq weekly; news/index
+       at daily; the docs/latest/* guides at priority 1.0; and older
+       versioned docs from /docs/1.2.0/ down to /docs/0.6.2/ at priorities
+       0.5 to 0.3. The excerpt is truncated in the original message. -->
svn commit: r1658278 - in /spark: ./ _layouts/ graphx/ mllib/ site/ site/graphx/ site/mllib/ site/news/ site/releases/ site/screencasts/ site/sql/ site/streaming/ sql/ streaming/
Author: matei
Date: Sun Feb 8 23:58:24 2015
New Revision: 1658278

URL: http://svn.apache.org/r1658278
Log:
Add meta description tags

Modified:
    spark/_layouts/global.html
    spark/graphx/index.md
    spark/index.md
    spark/mllib/index.md
    spark/site/community.html
    spark/site/documentation.html
    spark/site/downloads.html
    spark/site/examples.html
    spark/site/faq.html
    spark/site/graphx/index.html
    spark/site/index.html
    spark/site/mailing-lists.html
    spark/site/mllib/index.html
    spark/site/news/amp-camp-2013-registration-ope.html
    spark/site/news/announcing-the-first-spark-summit.html
    spark/site/news/fourth-spark-screencast-published.html
    spark/site/news/index.html
    spark/site/news/nsdi-paper.html
    spark/site/news/proposals-open-for-spark-summit-east.html
    spark/site/news/registration-open-for-spark-summit-east.html
    spark/site/news/run-spark-and-shark-on-amazon-emr.html
    spark/site/news/spark-0-6-1-and-0-5-2-released.html
    spark/site/news/spark-0-6-2-released.html
    spark/site/news/spark-0-7-0-released.html
    spark/site/news/spark-0-7-2-released.html
    spark/site/news/spark-0-7-3-released.html
    spark/site/news/spark-0-8-0-released.html
    spark/site/news/spark-0-8-1-released.html
    spark/site/news/spark-0-9-0-released.html
    spark/site/news/spark-0-9-1-released.html
    spark/site/news/spark-0-9-2-released.html
    spark/site/news/spark-1-0-0-released.html
    spark/site/news/spark-1-0-1-released.html
    spark/site/news/spark-1-0-2-released.html
    spark/site/news/spark-1-1-0-released.html
    spark/site/news/spark-1-1-1-released.html
    spark/site/news/spark-1-2-0-released.html
    spark/site/news/spark-accepted-into-apache-incubator.html
    spark/site/news/spark-and-shark-in-the-news.html
    spark/site/news/spark-becomes-tlp.html
    spark/site/news/spark-featured-in-wired.html
    spark/site/news/spark-mailing-lists-moving-to-apache.html
    spark/site/news/spark-meetups.html
    spark/site/news/spark-screencasts-published.html
    spark/site/news/spark-summit-2013-is-a-wrap.html
    spark/site/news/spark-summit-2014-videos-posted.html
    spark/site/news/spark-summit-agenda-posted.html
    spark/site/news/spark-summit-east-agenda-posted.html
    spark/site/news/spark-tips-from-quantifind.html
    spark/site/news/spark-user-survey-and-powered-by-page.html
    spark/site/news/spark-version-0-6-0-released.html
    spark/site/news/spark-wins-daytona-gray-sort-100tb-benchmark.html
    spark/site/news/strata-exercises-now-available-online.html
    spark/site/news/submit-talks-to-spark-summit-2014.html
    spark/site/news/two-weeks-to-spark-summit-2014.html
    spark/site/news/video-from-first-spark-development-meetup.html
    spark/site/releases/spark-release-0-3.html
    spark/site/releases/spark-release-0-5-0.html
    spark/site/releases/spark-release-0-5-1.html
    spark/site/releases/spark-release-0-5-2.html
    spark/site/releases/spark-release-0-6-0.html
    spark/site/releases/spark-release-0-6-1.html
    spark/site/releases/spark-release-0-6-2.html
    spark/site/releases/spark-release-0-7-0.html
    spark/site/releases/spark-release-0-7-2.html
    spark/site/releases/spark-release-0-7-3.html
    spark/site/releases/spark-release-0-8-0.html
    spark/site/releases/spark-release-0-8-1.html
    spark/site/releases/spark-release-0-9-0.html
    spark/site/releases/spark-release-0-9-1.html
    spark/site/releases/spark-release-0-9-2.html
    spark/site/releases/spark-release-1-0-0.html
    spark/site/releases/spark-release-1-0-1.html
    spark/site/releases/spark-release-1-0-2.html
    spark/site/releases/spark-release-1-1-0.html
    spark/site/releases/spark-release-1-1-1.html
    spark/site/releases/spark-release-1-2-0.html
    spark/site/research.html
    spark/site/screencasts/1-first-steps-with-spark.html
    spark/site/screencasts/2-spark-documentation-overview.html
    spark/site/screencasts/3-transformations-and-caching.html
    spark/site/screencasts/4-a-standalone-job-in-spark.html
    spark/site/screencasts/index.html
    spark/site/sql/index.html
    spark/site/streaming/index.html
    spark/sql/index.md
    spark/streaming/index.md

Modified: spark/_layouts/global.html
URL: http://svn.apache.org/viewvc/spark/_layouts/global.html?rev=1658278&r1=1658277&r2=1658278&view=diff
==
--- spark/_layouts/global.html (original)
+++ spark/_layouts/global.html Sun Feb 8 23:58:24 2015
@@ -16,6 +16,10 @@
   {% endif %}

+  {% if page.description %}
+  <meta name="description" content="{{ page.description }}">
+  {% endif %}
+

Modified: spark/graphx/index.md
URL: http://svn.apache.org/viewvc/spark/graphx/index.md?rev=1658278&r1=1658277&r2=1658278&view=diff
==
--- spark/graphx/index.md (original)
+++ spark/graphx/index.md Sun Feb 8 23:58:24 2015
@@ -2,6 +2,7 @@
 layout: global
 type: "page singular"
 title
spark git commit: [SQL] Set sessionState in QueryExecution.
Repository: spark
Updated Branches:
  refs/heads/master 75fdccca3 -> 804949d51

[SQL] Set sessionState in QueryExecution.

This PR sets the SessionState in HiveContext's QueryExecution, so we can make sure that SessionState.get returns the SessionState every time.

Author: Yin Huai

Closes #4445 from yhuai/setSessionState and squashes the following commits:

769c9f1 [Yin Huai] Remove unused import.
439f329 [Yin Huai] Try again.
427a0c9 [Yin Huai] Set SessionState every time we create a QueryExecution in HiveContext.
a3b7793 [Yin Huai] Set sessionState when dealing with CreateTableAsSelect.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/804949d5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/804949d5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/804949d5

Branch: refs/heads/master
Commit: 804949d519e2caa293a409d84b4e6190c1105444
Parents: 75fdccc
Author: Yin Huai
Authored: Sun Feb 8 14:55:07 2015 -0800
Committer: Michael Armbrust
Committed: Sun Feb 8 14:55:07 2015 -0800

--
 .../src/main/scala/org/apache/spark/sql/hive/HiveContext.scala | 5 +
 1 file changed, 5 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/804949d5/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index ad37b7d..2c00659 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -424,6 +424,11 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   /** Extends QueryExecution with hive specific features. */
   protected[sql] class QueryExecution(logicalPlan: LogicalPlan)
     extends super.QueryExecution(logicalPlan) {
+    // Like what we do in runHive, makes sure the session represented by the
+    // `sessionState` field is activated.
+    if (SessionState.get() != sessionState) {
+      SessionState.start(sessionState)
+    }

     /**
      * Returns the result as a hive compatible sequence of strings. For native commands, the
spark git commit: [SQL] Set sessionState in QueryExecution.
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 bc55e20fd -> 42c56b6f1

[SQL] Set sessionState in QueryExecution.

This PR sets the SessionState in HiveContext's QueryExecution, so we can make sure that SessionState.get returns the SessionState every time.

Author: Yin Huai

Closes #4445 from yhuai/setSessionState and squashes the following commits:

769c9f1 [Yin Huai] Remove unused import.
439f329 [Yin Huai] Try again.
427a0c9 [Yin Huai] Set SessionState every time we create a QueryExecution in HiveContext.
a3b7793 [Yin Huai] Set sessionState when dealing with CreateTableAsSelect.

(cherry picked from commit 804949d519e2caa293a409d84b4e6190c1105444)
Signed-off-by: Michael Armbrust

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/42c56b6f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/42c56b6f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/42c56b6f

Branch: refs/heads/branch-1.3
Commit: 42c56b6f1820f258c85d799e1acd8ae51fe5196a
Parents: bc55e20
Author: Yin Huai
Authored: Sun Feb 8 14:55:07 2015 -0800
Committer: Michael Armbrust
Committed: Sun Feb 8 14:55:16 2015 -0800

--
 .../src/main/scala/org/apache/spark/sql/hive/HiveContext.scala | 5 +
 1 file changed, 5 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/42c56b6f/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
--
diff --git a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
index ad37b7d..2c00659 100644
--- a/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
+++ b/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala
@@ -424,6 +424,11 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   /** Extends QueryExecution with hive specific features. */
   protected[sql] class QueryExecution(logicalPlan: LogicalPlan)
     extends super.QueryExecution(logicalPlan) {
+    // Like what we do in runHive, makes sure the session represented by the
+    // `sessionState` field is activated.
+    if (SessionState.get() != sessionState) {
+      SessionState.start(sessionState)
+    }

     /**
      * Returns the result as a hive compatible sequence of strings. For native commands, the
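The guard added in this commit is the usual check-then-activate idiom for thread-local session state (Hive's SessionState.get and SessionState.start). A self-contained sketch of the idiom, with an illustrative Session type standing in for Hive's class:

    object SessionDemo {
      final case class Session(name: String)

      // Stands in for SessionState's internal thread-local holder.
      private val current = new ThreadLocal[Session]

      def get(): Session = current.get()            // may be null, like SessionState.get()
      def start(s: Session): Unit = current.set(s)

      // The commit's pattern: activate the target session only if it is not
      // already the current one for this thread.
      def ensureActive(target: Session): Unit =
        if (get() != target) start(target)

      def main(args: Array[String]): Unit = {
        val hiveSession = Session("hive")
        ensureActive(hiveSession)
        assert(get() == hiveSession)
        ensureActive(hiveSession) // idempotent: no redundant re-activation
      }
    }

Because the state is per-thread, constructing a QueryExecution on a thread that never called start would otherwise observe a null or stale session, which is exactly the failure the guard prevents.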
spark git commit: [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 96010faa3 -> bc55e20fd

[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

The issue had been marked as resolved but did not work for at least some builds, due to version conflicts: avro-mapred-1.7.5.jar was picked up instead of avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2. In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on 1.7.5:

[INFO] Building Spark Project Hive 1.2.0
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred dependency to be picked up.

Author: medale

Closes #4315 from medale/avro-hadoop2 and squashes the following commits:

1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API had been marked as resolved but did not work for at least some builds due to version conflicts using avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

(cherry picked from commit 75fdccca32972f86a975033d7c4ce576dd79290f)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bc55e20f
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/bc55e20f
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/bc55e20f

Branch: refs/heads/branch-1.3
Commit: bc55e20fd5df7eb5254df2206ca4f1469750e6c9
Parents: 96010fa
Author: medale
Authored: Sun Feb 8 10:35:29 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:35:40 2015 +0000

--
 pom.xml | 4
 1 file changed, 4 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/bc55e20f/pom.xml
--
diff --git a/pom.xml b/pom.xml
index e0c796b..f6f176d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -975,6 +975,10 @@
           <exclusion>
             <groupId>com.esotericsoftware.kryo</groupId>
             <artifactId>kryo</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.apache.avro</groupId>
+            <artifactId>avro-mapred</artifactId>
+          </exclusion>
spark git commit: [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contai...
Repository: spark
Updated Branches:
  refs/heads/master 23a99dabf -> 75fdccca3

[SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API

The issue had been marked as resolved but did not work for at least some builds, due to version conflicts: avro-mapred-1.7.5.jar was picked up instead of avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2. In sql/hive/pom.xml, org.spark-project.hive:hive-exec depends on 1.7.5:

[INFO] Building Spark Project Hive 1.2.0
[INFO]
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ spark-hive_2.10 ---
[INFO] org.apache.spark:spark-hive_2.10:jar:1.2.0
[INFO] +- org.spark-project.hive:hive-exec:jar:0.13.1a:compile
[INFO] |  \- org.apache.avro:avro-mapred:jar:1.7.5:compile
[INFO] \- org.apache.avro:avro-mapred:jar:hadoop2:1.7.6:compile
[INFO]

Excluding this dependency allows the explicitly listed avro-mapred dependency to be picked up.

Author: medale

Closes #4315 from medale/avro-hadoop2 and squashes the following commits:

1ab4fa3 [medale] Merge branch 'master' into avro-hadoop2
9d85e2a [medale] Merge remote-tracking branch 'upstream/master' into avro-hadoop2
51b9c2a [medale] [SPARK-3039] [BUILD] Spark assembly for new hadoop API (hadoop 2) contains avro-mapred for hadoop 1 API had been marked as resolved but did not work for at least some builds due to version conflicts using avro-mapred-1.7.5.jar and avro-mapred-1.7.6-hadoop2.jar (the correct version) when building for hadoop2.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/75fdccca
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/75fdccca
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/75fdccca

Branch: refs/heads/master
Commit: 75fdccca32972f86a975033d7c4ce576dd79290f
Parents: 23a99da
Author: medale
Authored: Sun Feb 8 10:35:29 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:35:29 2015 +0000

--
 pom.xml | 4
 1 file changed, 4 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/75fdccca/pom.xml
--
diff --git a/pom.xml b/pom.xml
index e0c796b..f6f176d 100644
--- a/pom.xml
+++ b/pom.xml
@@ -975,6 +975,10 @@
           <exclusion>
             <groupId>com.esotericsoftware.kryo</groupId>
             <artifactId>kryo</artifactId>
           </exclusion>
+          <exclusion>
+            <groupId>org.apache.avro</groupId>
+            <artifactId>avro-mapred</artifactId>
+          </exclusion>
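For readers who track the equivalent in sbt rather than Maven, a minimal sketch of the same exclusion; this is illustrative only, since Spark's build applies it in pom.xml:

    // Hypothetical sbt equivalent of the pom.xml exclusion above: pull in
    // hive-exec but keep its transitive avro-mapred (the 1.7.5 hadoop1
    // flavor) off the classpath, so the explicitly listed hadoop2 artifact
    // is the one that ends up in the assembly.
    libraryDependencies += ("org.spark-project.hive" % "hive-exec" % "0.13.1a")
      .exclude("org.apache.avro", "avro-mapred")

Either way, rerunning the dependency tree (mvn dependency:tree in the Maven case) is the quickest way to confirm that only the 1.7.6-hadoop2 artifact remains.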
spark git commit: [SPARK-5672][Web UI] Don't return `ERROR 500` when args are missing
Repository: spark
Updated Branches:
  refs/heads/branch-1.3 0f9d76599 -> 96010faa3

[SPARK-5672][Web UI] Don't return `ERROR 500` when args are missing

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing; return `HTTP ERROR 400` instead.

Author: Kirill A. Korinskiy

Closes #4239 from catap/ui_500 and squashes the following commits:

520e180 [Kirill A. Korinskiy] [SPARK-5672][Web UI] Return `HTTP ERROR 400` when args are missing

(cherry picked from commit 23a99dabf10761b7c8ffc4fddd96bf8b5af13f38)
Signed-off-by: Sean Owen

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/96010faa
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/96010faa
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/96010faa

Branch: refs/heads/branch-1.3
Commit: 96010faa318a822b3a8d6153ca4c1541a384cc34
Parents: 0f9d765
Author: Kirill A. Korinskiy
Authored: Sun Feb 8 10:31:46 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:31:58 2015 +0000

--
 .../scala/org/apache/spark/ui/JettyUtils.scala  | 27
 .../spark/ui/exec/ExecutorThreadDumpPage.scala  |  2 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |  5 +++-
 .../org/apache/spark/ui/jobs/PoolPage.scala     |  2 ++
 .../org/apache/spark/ui/jobs/StagePage.scala    | 10 ++--
 .../org/apache/spark/ui/storage/RDDPage.scala   |  5 +++-
 6 files changed, 35 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 88fed83..bf4b24e 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -62,17 +62,22 @@ private[spark] object JettyUtils extends Logging {
       securityMgr: SecurityManager): HttpServlet = {
     new HttpServlet {
       override def doGet(request: HttpServletRequest, response: HttpServletResponse) {
-        if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
-          response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
-          response.setStatus(HttpServletResponse.SC_OK)
-          val result = servletParams.responder(request)
-          response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
-          response.getWriter.println(servletParams.extractFn(result))
-        } else {
-          response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
-          response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
-          response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
-            "User is not authorized to access this page.")
+        try {
+          if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
+            response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
+            response.setStatus(HttpServletResponse.SC_OK)
+            val result = servletParams.responder(request)
+            response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+            response.getWriter.println(servletParams.extractFn(result))
+          } else {
+            response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
+            response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+            response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
+              "User is not authorized to access this page.")
+          }
+        } catch {
+          case e: IllegalArgumentException =>
+            response.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage)
         }
       }
     }

http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
index c82730f..f0ae95b 100644
--- a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
@@ -43,7 +43,7 @@ private[ui] class ExecutorThreadDumpPage(parent: ExecutorsTab) extends WebUIPage
       }
       id
     }.getOrElse {
-      return Text(s"Missing executorId parameter")
+      throw new IllegalArgumentException(s"Missing executorId parameter")
     }
     val time = System.currentTimeMillis()
     val maybeThreadDump = sc.get.getExecutorThreadDump(executorId)

http://git-wip-us.apache.org/repos/asf/spark/blob/96010faa/core/src/main
spark git commit: [SPARK-5672][Web UI] Don't return `ERROR 500` when args are missing
Repository: spark
Updated Branches:
  refs/heads/master 487831369 -> 23a99dabf

[SPARK-5672][Web UI] Don't return `ERROR 500` when args are missing

The Spark web UI returns `HTTP ERROR 500` when a GET argument is missing; return `HTTP ERROR 400` instead.

Author: Kirill A. Korinskiy

Closes #4239 from catap/ui_500 and squashes the following commits:

520e180 [Kirill A. Korinskiy] [SPARK-5672][Web UI] Return `HTTP ERROR 400` when args are missing

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/23a99dab
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/23a99dab
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/23a99dab

Branch: refs/heads/master
Commit: 23a99dabf10761b7c8ffc4fddd96bf8b5af13f38
Parents: 4878313
Author: Kirill A. Korinskiy
Authored: Sun Feb 8 10:31:46 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:31:46 2015 +0000

--
 .../scala/org/apache/spark/ui/JettyUtils.scala  | 27
 .../spark/ui/exec/ExecutorThreadDumpPage.scala  |  2 +-
 .../org/apache/spark/ui/jobs/JobPage.scala      |  5 +++-
 .../org/apache/spark/ui/jobs/PoolPage.scala     |  2 ++
 .../org/apache/spark/ui/jobs/StagePage.scala    | 10 ++--
 .../org/apache/spark/ui/storage/RDDPage.scala   |  5 +++-
 6 files changed, 35 insertions(+), 16 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
index 88fed83..bf4b24e 100644
--- a/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
+++ b/core/src/main/scala/org/apache/spark/ui/JettyUtils.scala
@@ -62,17 +62,22 @@ private[spark] object JettyUtils extends Logging {
       securityMgr: SecurityManager): HttpServlet = {
     new HttpServlet {
       override def doGet(request: HttpServletRequest, response: HttpServletResponse) {
-        if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
-          response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
-          response.setStatus(HttpServletResponse.SC_OK)
-          val result = servletParams.responder(request)
-          response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
-          response.getWriter.println(servletParams.extractFn(result))
-        } else {
-          response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
-          response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
-          response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
-            "User is not authorized to access this page.")
+        try {
+          if (securityMgr.checkUIViewPermissions(request.getRemoteUser)) {
+            response.setContentType("%s;charset=utf-8".format(servletParams.contentType))
+            response.setStatus(HttpServletResponse.SC_OK)
+            val result = servletParams.responder(request)
+            response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+            response.getWriter.println(servletParams.extractFn(result))
+          } else {
+            response.setStatus(HttpServletResponse.SC_UNAUTHORIZED)
+            response.setHeader("Cache-Control", "no-cache, no-store, must-revalidate")
+            response.sendError(HttpServletResponse.SC_UNAUTHORIZED,
+              "User is not authorized to access this page.")
+          }
+        } catch {
+          case e: IllegalArgumentException =>
+            response.sendError(HttpServletResponse.SC_BAD_REQUEST, e.getMessage)
         }
       }
     }

http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
--
diff --git a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
index c82730f..f0ae95b 100644
--- a/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
+++ b/core/src/main/scala/org/apache/spark/ui/exec/ExecutorThreadDumpPage.scala
@@ -43,7 +43,7 @@ private[ui] class ExecutorThreadDumpPage(parent: ExecutorsTab) extends WebUIPage
       }
       id
     }.getOrElse {
-      return Text(s"Missing executorId parameter")
+      throw new IllegalArgumentException(s"Missing executorId parameter")
     }
     val time = System.currentTimeMillis()
     val maybeThreadDump = sc.get.getExecutorThreadDump(executorId)

http://git-wip-us.apache.org/repos/asf/spark/blob/23a99dab/core/src/main/scala/org/apache/spark/ui/jobs/JobPage.scala
--
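The convention this change standardizes is simple: pages throw IllegalArgumentException for a missing query parameter, and a single try/catch in JettyUtils maps it to HTTP 400. A self-contained sketch of that flow (the Map-based render below is illustrative, not Spark's API):

    object ParamDemo {
      // Mirrors the pages' side: fail loudly when a required parameter is absent.
      def requiredParam(params: Map[String, String], name: String): String =
        params.getOrElse(name, throw new IllegalArgumentException(s"Missing $name parameter"))

      // Mirrors the servlet's side: one catch turns the failure into a 400.
      def render(params: Map[String, String]): Either[(Int, String), String] =
        try Right(s"thread dump for executor ${requiredParam(params, "executorId")}")
        catch { case e: IllegalArgumentException => Left(400 -> e.getMessage) }

      def main(args: Array[String]): Unit = {
        println(render(Map("executorId" -> "3"))) // Right(thread dump for executor 3)
        println(render(Map.empty))                // Left((400, Missing executorId parameter))
      }
    }

Centralizing the catch is the point of the patch: each page stays a pure renderer, and no page can accidentally leak a stack trace as a 500.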
spark git commit: [SPARK-5656] Fail gracefully for large values of k and/or n that will ex...
Repository: spark
Updated Branches:
  refs/heads/master 6fb141e2a -> 487831369

[SPARK-5656] Fail gracefully for large values of k and/or n that will exceed max int.

Large values of k and/or n in EigenValueDecomposition.symmetricEigs will result in initializing an array to a size larger than Integer.MAX_VALUE in the following:

var v = new Array[Double](n * ncv)

Author: mbittmann
Author: bittmannm

Closes #4433 from mbittmann/master and squashes the following commits:

ee56e05 [mbittmann] [SPARK-5656] Combine checks into simple message
e49cbbb [mbittmann] [SPARK-5656] Simplify error message
860836b [mbittmann] Array size check updates based on code review
a604816 [bittmannm] [SPARK-5656] Fail gracefully for large values of k and/or n that will exceed max int.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/48783136
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/48783136
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/48783136

Branch: refs/heads/master
Commit: 48783136958e76d96f477802805e000ee5da5697
Parents: 6fb141e
Author: mbittmann
Authored: Sun Feb 8 10:13:29 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:13:29 2015 +0000

--
 .../org/apache/spark/mllib/linalg/EigenValueDecomposition.scala | 3 +++
 1 file changed, 3 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/48783136/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
--
diff --git a/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala b/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
index 3515461..9d6f975 100644
--- a/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
+++ b/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
@@ -79,6 +79,9 @@ private[mllib] object EigenValueDecomposition {
     // Mode 1: A*x = lambda*x, A symmetric
     iparam(6) = 1

+    require(n * ncv.toLong <= Integer.MAX_VALUE && ncv * (ncv.toLong + 8) <= Integer.MAX_VALUE,
+      s"k = $k and/or n = $n are too large to compute an eigendecomposition")
+
     var ido = new intW(0)
     var info = new intW(0)
     var resid = new Array[Double](n)
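Why the require promotes one operand to Long: n * ncv is evaluated in Int arithmetic and silently wraps before the array is ever allocated, so a naive Int-only check could pass. A minimal sketch with illustrative values:

    object OverflowDemo {
      def main(args: Array[String]): Unit = {
        val n = 100000
        val ncv = 30000
        // Int multiplication wraps around: prints -1294967296
        println(n * ncv)
        // Promoting one operand to Long keeps the true value: prints 3000000000
        println(n * ncv.toLong)
        // The guard in the patch therefore fails fast here: prints false
        println(n * ncv.toLong <= Integer.MAX_VALUE)
      }
    }

With the guard in place, symmetricEigs raises a clear IllegalArgumentException up front instead of a NegativeArraySizeException (or an undersized buffer) deep inside the ARPACK call.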
spark git commit: [SPARK-5366][EC2] Check the mode of private key
Repository: spark
Updated Branches:
  refs/heads/master 5de14cc27 -> 6fb141e2a

[SPARK-5366][EC2] Check the mode of private key

Check the mode of the private key file.

Author: liuchang0812

Closes #4162 from Liuchang0812/ec2-script and squashes the following commits:

fc37355 [liuchang0812] quote file name
01ed464 [liuchang0812] more output
ce2a207 [liuchang0812] pep8
f44efd2 [liuchang0812] move code to real_main
8475a54 [liuchang0812] fix bug
cd61a1a [liuchang0812] import stat
c106cb2 [liuchang0812] fix travis bug
89c9953 [liuchang0812] more output about checking private key
1177a90 [liuchang0812] remove comment
41188ab [liuchang0812] check the mode of private key

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6fb141e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/6fb141e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/6fb141e2

Branch: refs/heads/master
Commit: 6fb141e2a9e728499f8782310560bfaef7a5ed6c
Parents: 5de14cc
Author: liuchang0812
Authored: Sun Feb 8 10:08:51 2015 +0000
Committer: Sean Owen
Committed: Sun Feb 8 10:08:51 2015 +0000

--
 ec2/spark_ec2.py | 15 +++
 1 file changed, 15 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/spark/blob/6fb141e2/ec2/spark_ec2.py
--
diff --git a/ec2/spark_ec2.py b/ec2/spark_ec2.py
index 3f7242a..725b1e4 100755
--- a/ec2/spark_ec2.py
+++ b/ec2/spark_ec2.py
@@ -24,10 +24,12 @@ from __future__ import with_statement
 import hashlib
 import logging
 import os
+import os.path
 import pipes
 import random
 import shutil
 import string
+from stat import S_IRUSR
 import subprocess
 import sys
 import tarfile
@@ -349,6 +351,7 @@ def launch_cluster(conn, opts, cluster_name):
     if opts.identity_file is None:
         print >> stderr, "ERROR: Must provide an identity file (-i) for ssh connections."
         sys.exit(1)
+
     if opts.key_pair is None:
         print >> stderr, "ERROR: Must provide a key pair name (-k) to use on instances."
         sys.exit(1)
@@ -1007,6 +1010,18 @@ def real_main():
             DeprecationWarning
         )

+    if opts.identity_file is not None:
+        if not os.path.exists(opts.identity_file):
+            print >> stderr, \
+                "ERROR: The identity file '{f}' doesn't exist.".format(f=opts.identity_file)
+            sys.exit(1)
+
+        file_mode = os.stat(opts.identity_file).st_mode
+        if not (file_mode & S_IRUSR) or not oct(file_mode)[-2:] == '00':
+            print >> stderr, "ERROR: The identity file must be accessible only by you."
+            print >> stderr, 'You can fix this with: chmod 400 "{f}"'.format(f=opts.identity_file)
+            sys.exit(1)
+
     if opts.ebs_vol_num > 8:
         print >> stderr, "ebs-vol-num cannot be greater than 8"
         sys.exit(1)
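The same permission check, sketched in Scala with java.nio for comparison (POSIX filesystems only, and the helper name is illustrative; the EC2 script itself does this in Python with os.stat and S_IRUSR):

    import java.nio.file.{Files, Paths}
    import java.nio.file.attribute.PosixFilePermission
    import scala.collection.JavaConverters._

    object KeyModeCheck {
      // True when the owner can read the file and no group/other bits are set,
      // i.e. the file would pass the script's chmod 400 (or 600) expectation.
      def ownerOnly(path: String): Boolean = {
        val perms = Files.getPosixFilePermissions(Paths.get(path)).asScala
        perms.contains(PosixFilePermission.OWNER_READ) &&
          perms.forall(p => p == PosixFilePermission.OWNER_READ ||
                            p == PosixFilePermission.OWNER_WRITE)
      }

      def main(args: Array[String]): Unit =
        args.foreach(p => println(s"$p owner-only readable: ${ownerOnly(p)}"))
    }

The motivation is the same as ssh's own refusal to use world-readable keys: failing fast in the launcher gives a clear "chmod 400" hint instead of a confusing ssh error mid-launch.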