Git Push Summary

2014-11-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 [created] 76386e1a2

Git Push Summary

2014-11-03 Thread pwendell
Repository: spark Updated Tags: refs/tags/v1.1.0-rc4 [deleted] 5918ea4c9

git commit: [EC2] Factor out Mesos spark-ec2 branch

2014-11-03 Thread shivaram
Repository: spark Updated Branches: refs/heads/master 76386e1a2 -> 2aca97c7c [EC2] Factor out Mesos spark-ec2 branch We reference a specific branch in two places. This patch makes it one place. Author: Nicholas Chammas nicholas.cham...@gmail.com Closes #3008 from

git commit: [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/master 2aca97c7c -> 3cca19622 [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample The current way of seed distribution makes the random sequences from partition i and i+1 offset by 1. ~~~ In [14]: import random In
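The overlap is easier to see with a toy model. The sketch below is not the actual pyspark code; the helper names and the hash-based seed mixing are illustrative assumptions, but it shows why skipping `split` draws from one shared stream makes neighboring partitions overlap.

```python
import random

def partition_rng_old(seed, split):
    # Problematic scheme: every partition seeds the *same* stream and
    # skips `split` draws, so partitions i and i+1 walk an identical
    # stream that is merely offset by one value.
    rng = random.Random(seed)
    for _ in range(split):
        rng.random()
    return rng

def partition_rng_new(seed, split):
    # Sketch of a fix: mix the partition index into the seed so each
    # partition gets an independent stream instead of a shifted copy.
    return random.Random(hash((seed, split)))

# Under the old scheme, partition 0's second draw equals partition 1's
# first draw -- exactly the "offset by 1" overlap described above.
r0, r1 = partition_rng_old(42, 0), partition_rng_old(42, 1)
r0.random()                      # discard partition 0's first draw
assert r0.random() == r1.random()
```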

git commit: [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 76386e1a2 -> a68321400 [SPARK-4148][PySpark] fix seed distribution and add some tests for rdd.sample The current way of seed distribution makes the random sequences from partition i and i+1 offset by 1. ~~~ In [14]: import random In

git commit: [SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 a68321400 -> fc782896b [SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1 instead of `hive.version=0.13.1`. e.g. mvn -Phive -Phive-0.13.1 Note: `hive.version=0.13.1a` is the default property value. However, when

git commit: [SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 3cca19622 -> df607da02 [SPARK-4211][Build] Fixes hive.version in Maven profile hive-0.13.1 instead of `hive.version=0.13.1`. e.g. mvn -Phive -Phive-0.13.1 Note: `hive.version=0.13.1a` is the default property value. However, when

git commit: [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 fc782896b -> 292da4ef2 [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL Queries which have 'not like' are not working in Spark SQL: sql("SELECT * FROM records where value not like 'val%'"). The same query

git commit: [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master df607da02 -> 2b6e1ce6e [SPARK-4207][SQL] Query which has syntax like 'not like' is not working in Spark SQL Queries which have 'not like' are not working in Spark SQL: sql("SELECT * FROM records where value not like 'val%'"). The same query works in
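A hypothetical PySpark reproduction of the fixed behavior, assuming the 1.2-era SQLContext, inferSchema, and registerTempTable API; the table name and sample data are made up for illustration:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext, Row

sc = SparkContext(appName="not-like-check")
sqlContext = SQLContext(sc)

rows = sc.parallelize([Row(value="val_1"), Row(value="other")])
sqlContext.inferSchema(rows).registerTempTable("records")

# Before the fix the parser rejected NOT LIKE; afterwards this returns
# only the rows whose value does not start with "val".
print(sqlContext.sql(
    "SELECT * FROM records WHERE value NOT LIKE 'val%'").collect())
```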

git commit: [SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 292da4ef2 -> cc5dc4247 [SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling This patch will try to infer the schema for an RDD whose first row contains empty values (None, [], {}). It will try the first 100 rows and merge the

git commit: [SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 2b6e1ce6e -> 24544fbce [SPARK-3594] [PySpark] [SQL] take more rows to infer schema or sampling This patch will try to infer the schema for an RDD whose first row contains empty values (None, [], {}). It will try the first 100 rows and merge the
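A minimal sketch of sampling-based inference with hypothetical helpers (not the actual pyspark.sql internals): look at up to 100 rows instead of only the first, and merge per-field types so a later row can fill in what an empty first row leaves unknown.

```python
def infer_type(value):
    """Map a value to a crude type tag; empty values stay unknown."""
    if value is None or value == [] or value == {}:
        return None
    return type(value).__name__

def merge_types(a, b):
    """Prefer whichever side is known; real promotion rules for
    conflicting known types are omitted from this sketch."""
    return a if b is None else b if a is None else a

def infer_schema(rows, sample=100):
    schema = {}
    for row in rows[:sample]:            # first 100 rows, not just one
        for key, value in row.items():
            schema[key] = merge_types(schema.get(key), infer_type(value))
    return schema

# The first row alone leaves "tags" unknown; the second resolves it.
print(infer_schema([{"name": "a", "tags": None},
                    {"name": "b", "tags": ["x"]}]))
```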

git commit: [SPARK-4202][SQL] Simple DSL support for Scala UDF

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 24544fbce -> c238fb423 [SPARK-4202][SQL] Simple DSL support for Scala UDF This feature is based on an offline discussion with mengxr; hopefully it can be useful for the new MLlib pipeline API. For the following test snippet ```scala case

git commit: [SPARK-4202][SQL] Simple DSL support for Scala UDF

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 cc5dc4247 -> 572300ba8 [SPARK-4202][SQL] Simple DSL support for Scala UDF This feature is based on an offline discussion with mengxr; hopefully it can be useful for the new MLlib pipeline API. For the following test snippet ```scala

git commit: [SPARK-4152] [SQL] Avoid data change in CTAS while table already existed

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master c238fb423 -> e83f13e8d [SPARK-4152] [SQL] Avoid data change in CTAS while table already existed CREATE TABLE t1 (a String); CREATE TABLE t1 AS SELECT key FROM src; -- throw exception CREATE TABLE if not exists t1 AS SELECT key FROM src;

git commit: [SPARK-4152] [SQL] Avoid data change in CTAS while table already existed

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 572300ba8 -> 6104754f7 [SPARK-4152] [SQL] Avoid data change in CTAS while table already existed CREATE TABLE t1 (a String); CREATE TABLE t1 AS SELECT key FROM src; -- throw exception CREATE TABLE if not exists t1 AS SELECT key FROM

git commit: [SQL] More aggressive defaults

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 6104754f7 -> 51985f78c [SQL] More aggressive defaults - Turns on compression for in-memory cached data by default - Changes the default parquet compression format back to gzip (we have seen more OOMs with production workloads due to

git commit: [SQL] More aggressive defaults

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master e83f13e8d -> 25bef7e69 [SQL] More aggressive defaults - Turns on compression for in-memory cached data by default - Changes the default parquet compression format back to gzip (we have seen more OOMs with production workloads due to the
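For anyone who prefers the old behavior, the new defaults should be overridable per session. A hedged example, assuming the 1.2-era conf keys for in-memory compression and the Parquet codec (the digest truncates before naming them):

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="sql-defaults")
sqlContext = SQLContext(sc)

# Revert to the pre-1.2 behavior if the more aggressive defaults
# cause problems for a particular workload:
sqlContext.sql("SET spark.sql.inMemoryColumnarStorage.compressed=false")
sqlContext.sql("SET spark.sql.parquet.compression.codec=snappy")
```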

git commit: SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader insta...

2014-11-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 25bef7e69 -> 28128150e SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader instantiation. Author: Sandy Ryza sa...@cloudera.com Closes #3045 from sryza/sandy-spark-4178 and squashes the following commits: 8d2e70e

git commit: SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader insta...

2014-11-03 Thread pwendell
Repository: spark Updated Branches: refs/heads/branch-1.2 51985f78c -> fa86d862f SPARK-4178. Hadoop input metrics ignore bytes read in RecordReader instantiation. Author: Sandy Ryza sa...@cloudera.com Closes #3045 from sryza/sandy-spark-4178 and squashes the following commits: 8d2e70e

git commit: [SQL] Convert arguments to Scala UDFs

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 28128150e -> 15b58a223 [SQL] Convert arguments to Scala UDFs Author: Michael Armbrust mich...@databricks.com Closes #3077 from marmbrus/udfsWithUdts and squashes the following commits: 34b5f27 [Michael Armbrust] style 504adef [Michael

git commit: [SQL] Convert arguments to Scala UDFs

2014-11-03 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.2 fa86d862f -> 52db2b942 [SQL] Convert arguments to Scala UDFs Author: Michael Armbrust mich...@databricks.com Closes #3077 from marmbrus/udfsWithUdts and squashes the following commits: 34b5f27 [Michael Armbrust] style 504adef [Michael

git commit: [SPARK-4168][WebUI] web stages number should show correctly when stages are more than 1000

2014-11-03 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 15b58a223 -> 97a466eca [SPARK-4168][WebUI] web stages number should show correctly when stages are more than 1000 The number of completed stages and failed stages shown on the web UI will always be less than 1000. This is really misleading

git commit: [SPARK-611] Display executor thread dumps in web UI

2014-11-03 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 97a466eca -> 4f035dd2c [SPARK-611] Display executor thread dumps in web UI This patch allows executor thread dumps to be collected on-demand and viewed in the Spark web UI. The thread dumps are collected using Thread.getAllStackTraces().
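The Spark implementation collects the dumps on the JVM via Thread.getAllStackTraces(). For intuition only, here is the analogous trick in pure Python (not the Spark code), dumping the stacks of all live threads in the current process on demand:

```python
import sys
import threading
import traceback

def dump_all_threads():
    # Maps thread id -> current stack frame for every live thread.
    frames = sys._current_frames()
    for thread in threading.enumerate():
        print("--- %s ---" % thread.name)
        frame = frames.get(thread.ident)
        if frame is not None:
            traceback.print_stack(frame)

dump_all_threads()
```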

git commit: [FIX][MLLIB] fix seed in BaggedPointSuite

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/master 4f035dd2c -> c5912ecc7 [FIX][MLLIB] fix seed in BaggedPointSuite Saw Jenkins test failures due to random seeds. jkbradley manishamde Author: Xiangrui Meng m...@databricks.com Closes #3084 from mengxr/fix-baggedpoint-suite and squashes

git commit: [FIX][MLLIB] fix seed in BaggedPointSuite

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 52db2b942 -> 0826eed9c [FIX][MLLIB] fix seed in BaggedPointSuite Saw Jenkins test failures due to random seeds. jkbradley manishamde Author: Xiangrui Meng m...@databricks.com Closes #3084 from mengxr/fix-baggedpoint-suite and

git commit: [SPARK-4192][SQL] Internal API for Python UDT

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/master c5912ecc7 -> 04450d115 [SPARK-4192][SQL] Internal API for Python UDT Following #2919, this PR adds Python UDT (for internal use only) with tests under pyspark.tests. Before `SQLContext.applySchema`, we check whether we need to convert
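Since the details are truncated above, here is a rough sketch of what a Python UDT might look like. The import path, the method names (sqlType, serialize, deserialize), and the __UDT__ hook are assumptions modeled on the pyspark.tests examples this PR mentions, not a confirmed interface:

```python
from pyspark.sql import UserDefinedType, ArrayType, DoubleType

class PointUDT(UserDefinedType):
    @classmethod
    def sqlType(cls):
        # Store a point as a fixed-length pair of doubles.
        return ArrayType(DoubleType(), False)

    def serialize(self, obj):
        return [obj.x, obj.y]

    def deserialize(self, datum):
        return Point(datum[0], datum[1])

class Point(object):
    __UDT__ = PointUDT()   # lets applySchema recognize the type

    def __init__(self, x, y):
        self.x, self.y = x, y
```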

git commit: [SPARK-4192][SQL] Internal API for Python UDT

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 0826eed9c -> 42d02db86 [SPARK-4192][SQL] Internal API for Python UDT Following #2919, this PR adds Python UDT (for internal use only) with tests under pyspark.tests. Before `SQLContext.applySchema`, we check whether we need to convert

git commit: [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/master 04450d115 -> 1a9c6cdda [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a
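A hedged sketch of what this enables from PySpark, assuming the 1.2-era inferSchema API: once Vector is a registered UDT, an RDD of LabeledPoint should convert to a SchemaRDD directly.

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.mllib.linalg import Vectors
from pyspark.mllib.regression import LabeledPoint

sc = SparkContext(appName="vector-udt")
sqlContext = SQLContext(sc)

points = sc.parallelize([
    LabeledPoint(1.0, Vectors.dense([0.1, 0.2])),
    LabeledPoint(0.0, Vectors.dense([0.3, 0.4])),
])

# Works once Vector carries UDT metadata:
srdd = sqlContext.inferSchema(points)
srdd.registerTempTable("points")
print(sqlContext.sql("SELECT label FROM points").collect())
```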

git commit: [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD

2014-11-03 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.2 42d02db86 -> 8395e8fbd [SPARK-3573][MLLIB] Make MLlib's Vector compatible with SQL's SchemaRDD Register MLlib's Vector as a SQL user-defined type (UDT) in both Scala and Python. With this PR, we can easily map an RDD[LabeledPoint] to a

git commit: [SPARK-4163][Core] Add a backward compatibility test for FetchFailed

2014-11-03 Thread adav
Repository: spark Updated Branches: refs/heads/master 1a9c6cdda -> 9bdc8412a [SPARK-4163][Core] Add a backward compatibility test for FetchFailed /cc aarondav Author: zsxwing zsxw...@gmail.com Closes #3086 from zsxwing/SPARK-4163-back-comp and squashes the following commits: 21cb2a8

git commit: [SPARK-4166][Core] Add a backward compatibility test for ExecutorLostFailure

2014-11-03 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master 9bdc8412a -> b671ce047 [SPARK-4166][Core] Add a backward compatibility test for ExecutorLostFailure Author: zsxwing zsxw...@gmail.com Closes #3085 from zsxwing/SPARK-4166-back-comp and squashes the following commits: 89329f4 [zsxwing]

git commit: [SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.

2014-11-03 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master b671ce047 -> e4f42631a [SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default. This PR simplifies the serializers: it always uses a batched serializer (AutoBatchedSerializer by default), even when the batch size is 1. Author: Davies
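A minimal sketch of the auto-batching idea (not the actual pyspark.serializers code): grow the batch while serialized chunks stay small and shrink it when they get large, so tiny records end up in big batches and huge records in small ones. The 64 KB target and the doubling rule are assumptions.

```python
import itertools
import pickle

def auto_batched_dumps(items, target_bytes=64 * 1024):
    """Yield pickled chunks, adapting the batch size to the data."""
    items = iter(items)
    batch = 1
    while True:
        chunk = list(itertools.islice(items, batch))
        if not chunk:
            return
        data = pickle.dumps(chunk)
        yield data
        if len(data) < target_bytes:
            batch *= 2          # records are small: batch more of them
        elif batch > 1:
            batch //= 2         # records are large: back off

# Each yielded blob is one pickled batch, sized adaptively.
sizes = [len(blob) for blob in auto_batched_dumps(range(100000))]
```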

git commit: [SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default.

2014-11-03 Thread joshrosen
Repository: spark Updated Branches: refs/heads/branch-1.2 8395e8fbd -> 786e75b33 [SPARK-3886] [PySpark] simplify serializer, use AutoBatchedSerializer by default. This PR simplifies the serializers: it always uses a batched serializer (AutoBatchedSerializer by default), even when the batch size is 1. Author: