git commit: Add caching information to rdd.toDebugString

2014-08-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/master e1b85f310 -> fba8ec39c Add caching information to rdd.toDebugString I find it useful to see where in an RDD's DAG data is cached, so I figured others might too. I've added both the caching level and the actual memory state of the RDD. …
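
A minimal usage sketch of the feature described above, assuming a live SparkContext named `sc`:

```scala
// Cache an RDD, materialize it, then inspect its lineage; after this change,
// toDebugString annotates the DAG with the storage level and memory state.
val rdd = sc.textFile("data.txt").map(_.length).cache()
rdd.count()                // materialize so the cached memory state is populated
println(rdd.toDebugString) // lineage, now annotated with caching information
```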

git commit: SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" without "install"

2014-08-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 500f84e49 -> e1b85f310 SPARK-2955 [BUILD] Test code fails to compile with "mvn compile" without "install" (This is the corrected follow-up to https://issues.apache.org/jira/browse/SPARK-2903) Right now, `mvn compile test-compile` fails to …

git commit: [SPARK-2912] [Spark QA] Include commit hash in Spark QA messages

2014-08-14 Thread pwendell
Repository: spark Updated Branches: refs/heads/master 9422a9b08 -> 500f84e49 [SPARK-2912] [Spark QA] Include commit hash in Spark QA messages You can find the [discussion that motivated this PR here](http://mail-archives.apache.org/mod_mbox/spark-dev/201408.mbox/%3CCABPQxssy0ri2QAz=cc9Tx+EXYW…

[2/5] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/3f23d2a3/core/src/test/resources/netty-test-file.txt -- diff --git a/core/src/test/resources/netty-test-file.txt b/core/src/test/resources/netty-test-file.txt new file mode 100644

[3/5] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/3f23d2a3/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala b/core…

[5/5] git commit: [SPARK-2936] Migrate Netty network module from Java to Scala

2014-08-14 Thread rxin
[SPARK-2936] Migrate Netty network module from Java to Scala The Netty network module was originally written when Scala 2.9.x had a bug that prevented a pure Scala implementation, so a subset of the files was written in Java. We have since upgraded to Scala 2.10 and can now migrate all the Java files …

[1/5] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 72e730e98 -> 3f23d2a38 http://git-wip-us.apache.org/repos/asf/spark/blob/3f23d2a3/core/src/test/scala/org/apache/spark/network/netty/ServerClientIntegrationSuite.scala …

[4/5] git commit: [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
[SPARK-2468] Netty based block server / client module This is a rewrite of the original Netty module that was added about 1.5 years ago. The old code was turned off by default and didn't really work because it lacked a frame decoder (it worked only with very small blocks). For this pull request …
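
The frame-decoder point is worth unpacking: without one, a handler sees arbitrary TCP fragments rather than whole blocks. A minimal Netty sketch of the idea, with illustrative names and an assumed 4-byte length-prefixed framing (not the PR's actual pipeline):

```scala
import io.netty.channel.ChannelInitializer
import io.netty.channel.socket.SocketChannel
import io.netty.handler.codec.LengthFieldBasedFrameDecoder

class BlockChannelInitializer extends ChannelInitializer[SocketChannel] {
  override protected def initChannel(ch: SocketChannel): Unit = {
    // Reassemble length-prefixed frames so each inbound message is one complete
    // block, regardless of how TCP fragments it on the wire. Assumed layout: a
    // 4-byte length header followed by the payload, with the header stripped.
    ch.pipeline.addLast(new LengthFieldBasedFrameDecoder(Int.MaxValue, 0, 4, 0, 4))
    // ... handlers that consume whole-block messages would be added here ...
  }
}
```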

git commit: [SPARK-2736] PySpark converter and example script for reading Avro files

2014-08-14 Thread matei
Repository: spark Updated Branches: refs/heads/master 3a8b68b73 -> 9422a9b08 [SPARK-2736] PySpark converter and example script for reading Avro files JIRA: https://issues.apache.org/jira/browse/SPARK-2736 This patch includes: 1. An Avro converter that converts Avro data types to Python. …

git commit: [SPARK-2736] PySpark converter and example script for reading Avro files

2014-08-14 Thread matei
Repository: spark Updated Branches: refs/heads/branch-1.1 f99e4fc80 -> 72e730e98 [SPARK-2736] PySpark converter and example script for reading Avro files JIRA: https://issues.apache.org/jira/browse/SPARK-2736 This patch includes: 1. An Avro converter that converts Avro data types to Python. …
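
To make the converter idea concrete, here is a toy sketch (a hypothetical class, not the example script's actual code) that flattens a top-level Avro GenericRecord into a java.util.Map for PySpark to pickle; a real converter must also recurse into nested records, arrays, maps, and unions:

```scala
import scala.collection.JavaConverters._
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroWrapper
import org.apache.spark.api.python.Converter

class ToyAvroConverter extends Converter[Any, Any] {
  override def convert(obj: Any): Any = {
    // PySpark hands the converter the raw AvroWrapper produced by the Hadoop
    // input format; unwrap it and copy the fields into a Java map.
    val record = obj.asInstanceOf[AvroWrapper[GenericRecord]].datum()
    record.getSchema.getFields.asScala
      .map(f => f.name -> record.get(f.name))
      .toMap
      .asJava
  }
}
```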

[1/4] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 655699f8b -> 3a8b68b73 http://git-wip-us.apache.org/repos/asf/spark/blob/3a8b68b7/core/src/test/scala/org/apache/spark/network/netty/ServerClientIntegrationSuite.scala -- diff …

[2/4] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/3a8b68b7/core/src/test/resources/netty-test-file.txt -- diff --git a/core/src/test/resources/netty-test-file.txt b/core/src/test/resources/netty-test-file.txt new file mode 100644

[4/4] git commit: [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
[SPARK-2468] Netty based block server / client module This is a rewrite of the original Netty module that was added about 1.5 years ago. The old code was turned off by default and didn't really work because it lacked a frame decoder (it worked only with very small blocks). For this pull request …

[3/4] [SPARK-2468] Netty based block server / client module

2014-08-14 Thread rxin
http://git-wip-us.apache.org/repos/asf/spark/blob/3a8b68b7/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala -- diff --git a/core/src/main/scala/org/apache/spark/storage/BlockNotFoundException.scala b/core…

git commit: [SPARK-3027] TaskContext: tighten visibility and provide Java friendly callback API

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 475a35ba4 -> f99e4fc80 [SPARK-3027] TaskContext: tighten visibility and provide Java friendly callback API Note this also passes the TaskContext itself to the TaskCompletionListener. In the future we can mark TaskContext with the exception …

git commit: [SPARK-3027] TaskContext: tighten visibility and provide Java friendly callback API

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master fa5a08e67 -> 655699f8b [SPARK-3027] TaskContext: tighten visibility and provide Java friendly callback API Note this also passes the TaskContext itself to the TaskCompletionListener. In the future we can mark TaskContext with the exception …
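
The callback shape described above, as a hedged sketch (signatures per the PR's summary; details may differ):

```scala
import org.apache.spark.TaskContext
import org.apache.spark.util.TaskCompletionListener

val listener = new TaskCompletionListener {
  // The listener receives the TaskContext itself, so Java callers can
  // implement this single-method interface without Scala closures.
  override def onTaskCompletion(context: TaskContext): Unit = {
    // Release per-task resources here (close readers, return buffers, etc.).
  }
}
// Inside task code, one would register it on the current context:
// context.addTaskCompletionListener(listener)
```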

git commit: Make dev/mima runnable on Mac OS X.

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 f5d9176fb -> 475a35ba4 Make dev/mima runnable on Mac OS X. Mac OS X's find is from the BSD variant, which doesn't have the -printf option. Author: Reynold Xin Closes #1953 from rxin/mima and squashes the following commits: e284afe [Reynold Xin] …

git commit: Make dev/mima runnable on Mac OS X.

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master a75bc7a21 -> fa5a08e67 Make dev/mima runnable on Mac OS X. Mac OS X's find is from the BSD variant, which doesn't have the -printf option. Author: Reynold Xin Closes #1953 from rxin/mima and squashes the following commits: e284afe [Reynold Xin] …

git commit: SPARK-3009: Reverted readObject method in ApplicationInfo so that Applic...

2014-08-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.1 c39a3f337 -> f5d9176fb SPARK-3009: Reverted readObject method in ApplicationInfo so that ApplicationInfo is initialized properly after deserialization Author: Jacek Lewandowski Closes #1947 from jacek-lewandowski/master and squashes …

git commit: SPARK-3009: Reverted readObject method in ApplicationInfo so that Applic...

2014-08-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/master a7f8a4f5e -> a75bc7a21 SPARK-3009: Reverted readObject method in ApplicationInfo so that ApplicationInfo is initialized properly after deserialization Author: Jacek Lewandowski Closes #1947 from jacek-lewandowski/master and squashes …
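
For readers unfamiliar with the pattern at issue, a generic sketch (not ApplicationInfo's actual code): Java deserialization bypasses the constructor, so any initialization done there must be repeated in readObject:

```scala
import java.io.{IOException, ObjectInputStream}

class Tracker(val name: String) extends Serializable {
  // Transient state is skipped by serialization and left null on the receiver.
  @transient private var listeners: List[String] = _

  private def init(): Unit = { listeners = Nil }
  init() // runs on normal construction, but NOT on deserialization

  @throws(classOf[IOException])
  private def readObject(in: ObjectInputStream): Unit = {
    in.defaultReadObject() // restore the non-transient fields
    init()                 // re-create the transient state
  }
}
```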

git commit: Revert [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 dc8ef9387 -> c39a3f337 Revert [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile Reverts #1924 due to build failures with hadoop 0.23. Author: Michael Armbrust Closes #1949 from marmbrus/revert1924 …

git commit: Revert [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 962210675 -> a7f8a4f5e Revert [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile Reverts #1924 due to build failures with hadoop 0.23. Author: Michael Armbrust Closes #1949 from marmbrus/revert1924 …

git commit: [SPARK-2979][MLlib] Improve the convergence rate by minimizing the condition number

2014-08-14 Thread meng
Repository: spark Updated Branches: refs/heads/branch-1.1 a3dc54fa1 -> dc8ef9387 [SPARK-2979][MLlib] Improve the convergence rate by minimizing the condition number In theory, the scale of your inputs is irrelevant to logistic regression. You can "theoretically" multiply X1 by 1E6 and the estimated …

git commit: [SPARK-2979][MLlib] Improve the convergence rate by minimizing the condition number

2014-08-14 Thread meng
Repository: spark Updated Branches: refs/heads/master eaeb0f76f -> 962210675 [SPARK-2979][MLlib] Improve the convergence rate by minimizing the condition number In theory, the scale of your inputs is irrelevant to logistic regression. You can "theoretically" multiply X1 by 1E6 and the estimated …
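
One standard way to act on this observation, sketched with MLlib's StandardScaler (assuming an RDD[LabeledPoint] named `data`; the PR's own approach may differ):

```scala
import org.apache.spark.mllib.feature.StandardScaler
import org.apache.spark.mllib.regression.LabeledPoint

// Rescale each feature to unit variance: the model can express exactly the
// same decision boundaries, but the optimizer sees a far smaller condition
// number, so gradient methods converge in fewer iterations.
val scaler = new StandardScaler(withMean = false, withStd = true)
  .fit(data.map(_.features))
val scaled = data.map(p => LabeledPoint(p.label, scaler.transform(p.features)))
```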

git commit: Minor cleanup of metrics.Source

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 df25acdf4 -> a3dc54fa1 Minor cleanup of metrics.Source - Added override. - Marked some variables as private. Author: Reynold Xin Closes #1943 from rxin/metricsSource and squashes the following commits: fbfa943 [Reynold Xin] Minor cleanup …

git commit: Minor cleanup of metrics.Source

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 267fdffe2 -> eaeb0f76f Minor cleanup of metrics.Source - Added override. - Marked some variables as private. Author: Reynold Xin Closes #1943 from rxin/metricsSource and squashes the following commits: fbfa943 [Reynold Xin] Minor cleanup …
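
Illustrative only, using a stand-in trait (Spark's own metrics.Source is internal): the kind of cleanup described above:

```scala
import com.codahale.metrics.MetricRegistry

// Stand-in for the real trait, which declares these two abstract members.
trait Source {
  def sourceName: String
  def metricRegistry: MetricRegistry
}

class MySource extends Source {
  override val sourceName: String = "mySource"            // explicit override
  override val metricRegistry = new MetricRegistry        // explicit override
  private val counter = metricRegistry.counter("events")  // narrowed to private
}
```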

git commit: [SPARK-2925] [SQL] Fix spark-sql and start-thriftserver shell bugs when setting --driver-java-options

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 850abaa36 -> df25acdf4 [SPARK-2925] [SQL] Fix spark-sql and start-thriftserver shell bugs when setting --driver-java-options https://issues.apache.org/jira/browse/SPARK-2925 Running a command like the following produces the error: bin/spark-sql --driver-java-options …

git commit: [SPARK-2925] [SQL] Fix spark-sql and start-thriftserver shell bugs when setting --driver-java-options

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master fde692b36 -> 267fdffe2 [SPARK-2925] [SQL] Fix spark-sql and start-thriftserver shell bugs when setting --driver-java-options https://issues.apache.org/jira/browse/SPARK-2925 Running a command like the following produces the error: bin/spark-sql --driver-java-options …

git commit: [SQL] Python JsonRDD UTF8 Encoding Fix

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master add75d483 -> fde692b36 [SQL] Python JsonRDD UTF8 Encoding Fix Only encode unicode objects to UTF-8, not strings. Author: Ahir Reddy Closes #1914 from ahirreddy/json-rdd-unicode-fix1 and squashes the following commits: ca4e9ba [Ahir Reddy] …

git commit: [SQL] Python JsonRDD UTF8 Encoding Fix

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 de501e169 -> 850abaa36 [SQL] Python JsonRDD UTF8 Encoding Fix Only encode unicode objects to UTF-8, not strings. Author: Ahir Reddy Closes #1914 from ahirreddy/json-rdd-unicode-fix1 and squashes the following commits: ca4e9ba [Ahir Reddy] …

git commit: [SPARK-2927][SQL] Add a conf to configure whether we always read Binary columns stored in Parquet as String columns

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 078f3fbda -> add75d483 [SPARK-2927][SQL] Add a conf to configure whether we always read Binary columns stored in Parquet as String columns This PR adds a new conf flag `spark.sql.parquet.binaryAsString`. When it is `true`, if there is no Parquet …

git commit: [SPARK-2927][SQL] Add a conf to configure whether we always read Binary columns stored in Parquet as String columns

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 221c84e6a -> de501e169 [SPARK-2927][SQL] Add a conf to configure whether we always read Binary columns stored in Parquet as String columns This PR adds a new conf flag `spark.sql.parquet.binaryAsString`. When it is `true`, if there is no Parquet …
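
A hedged usage sketch (assuming a SQLContext named `sqlContext`; the flag name is taken from the description above):

```scala
// With the flag on, Parquet BINARY columns that carry no string annotation are
// read back as strings rather than byte arrays.
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
val data = sqlContext.parquetFile("/path/to/table.parquet")
```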

git commit: [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/master 6b8de0e36 -> 078f3fbda [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile Author: Chia-Yung Su Closes #1924 from joesu/bugfix-spark3011 and squashes the following commits: c7e44f2 [Chia-Yung Su] …

git commit: [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile

2014-08-14 Thread marmbrus
Repository: spark Updated Branches: refs/heads/branch-1.1 af809de77 -> 221c84e6a [SPARK-3011][SQL] _temporary directory should be filtered out by sqlContext.parquetFile Author: Chia-Yung Su Closes #1924 from joesu/bugfix-spark3011 and squashes the following commits: c7e44f2 [Chia-Yung Su] …
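
A generic sketch of the filtering idea (not the PR's exact code): skip Hadoop's in-progress _temporary directory when listing the files that make up a Parquet table:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val tablePath = new Path("/path/to/table.parquet")
val fs = FileSystem.get(new Configuration())
// _temporary holds uncommitted task output and must not be read as data.
val partFiles = fs.listStatus(tablePath)
  .filterNot(_.getPath.getName == "_temporary")
```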

git commit: SPARK-3009: Reverted readObject method in ApplicationInfo so that Applic...

2014-08-14 Thread andrewor14
Repository: spark Updated Branches: refs/heads/branch-1.0 9e783a270 -> 6c6409e8b SPARK-3009: Reverted readObject method in ApplicationInfo so that ApplicationInfo is initialized properly after deserialization Author: Jacek Lewandowski Closes #1922 from jacek-lewandowski/branch-1.0 and squashes …

git commit: SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.0 afb56843d -> 9e783a270 SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator The previous behaviour of swallowing ClassNotFound exceptions when running a custom Kryo registrator could lead to difficult-to-debug problems …

git commit: SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master d069c5d9d -> 6b8de0e36 SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator The previous behaviour of swallowing ClassNotFound exceptions when running a custom Kryo registrator could lead to difficult-to-debug problems …

git commit: SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 0cb2b82e0 -> af809de77 SPARK-2893: Do not swallow Exceptions when running a custom kryo registrator The previous behaviour of swallowing ClassNotFound exceptions when running a custom Kryo registrator could lead to difficult-to-debug problems …
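
A generic illustration of the anti-pattern being fixed (assumed names, not the PR's code): rethrow with context instead of silently swallowing the failure:

```scala
import com.esotericsoftware.kryo.Kryo
import org.apache.spark.SparkException
import org.apache.spark.serializer.KryoRegistrator

def runRegistrator(kryo: Kryo, registratorClass: String): Unit =
  try {
    val reg = Class.forName(registratorClass).newInstance().asInstanceOf[KryoRegistrator]
    reg.registerClasses(kryo)
  } catch {
    // Swallowing here (e.g. `case _: Exception => ()`) hides a mistyped class
    // name; wrapping and rethrowing makes the misconfiguration visible.
    case e: Exception =>
      throw new SparkException("Failed to run Kryo registrator " + registratorClass, e)
  }
```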

git commit: [SPARK-3029] Disable local execution of Spark jobs by default

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/branch-1.1 1baf06f4e -> 0cb2b82e0 [SPARK-3029] Disable local execution of Spark jobs by default Currently, local execution of Spark jobs is used only by take(), and it can be problematic as it can load a significant amount of data onto the driver …

git commit: [SPARK-3029] Disable local execution of Spark jobs by default

2014-08-14 Thread rxin
Repository: spark Updated Branches: refs/heads/master 69a57a18e -> d069c5d9d [SPARK-3029] Disable local execution of Spark jobs by default Currently, local execution of Spark jobs is used only by take(), and it can be problematic as it can load a significant amount of data onto the driver. …
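
A hedged sketch of opting back in after this change (the flag name `spark.localExecution.enabled` is an assumption based on SPARK-3029's resolution; verify it against your version's configuration docs):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("example")
  .set("spark.localExecution.enabled", "true") // default is false after this change
val sc = new SparkContext(conf)
sc.parallelize(1 to 1000000).take(5) // take() may now run its first job on the driver
```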