spark git commit: [SPARK-9404][SPARK-9542][SQL] unsafe array data and map data

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 687c8c371 -> 608353c8e [SPARK-9404][SPARK-9542][SQL] unsafe array data and map data This PR adds an UnsafeArrayData; currently we encode it this way: the first 4 bytes are the # elements, then each 4 bytes are the start offset of an element, unle
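The layout in the SPARK-9404 summary can be sketched in Python. It consists of a 4-byte element count followed by one 4-byte start offset per element. This is a minimal illustration and not Spark's UnsafeArrayData. The helper names are hypothetical. The sketch assumes little-endian int32 fields, offsets measured from the start of the buffer, and fixed-width int32 elements. It ignores nulls because the summary is cut off before describing them.

```python
import struct

def encode_int_array(values):
    """Hypothetical encoder: [count][offset_0..offset_n-1][element_0..element_n-1].
    Assumes little-endian int32 fields and no null elements."""
    n = len(values)
    header_size = 4 + 4 * n                      # element count + one offset per element
    offsets = [header_size + 4 * i for i in range(n)]
    buf = struct.pack("<i", n)
    buf += struct.pack(f"<{n}i", *offsets)
    buf += struct.pack(f"<{n}i", *values)
    return buf

def decode_int_array(buf):
    """Read the count, then follow each stored offset to its element."""
    (n,) = struct.unpack_from("<i", buf, 0)
    offsets = struct.unpack_from(f"<{n}i", buf, 4)
    return [struct.unpack_from("<i", buf, off)[0] for off in offsets]
```

With fixed-width elements the offsets are redundant. Their purpose shows up with variable-length elements such as strings, where the stored offset lets a reader jump straight to element i without scanning the elements before it.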

spark git commit: [SPARK-9372] [SQL] Filter nulls in join keys

2015-08-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 4cdd8ecd6 -> 687c8c371 [SPARK-9372] [SQL] Filter nulls in join keys This PR adds an optimization rule, `FilterNullsInJoinKey`, to add `Filter` before join operators to filter out rows having null values for join keys. This optimization is
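The SPARK-9372 filter is safe for the following reason. Under SQL semantics a null join key never equals anything, not even another null. Rows with null keys therefore never appear in an inner equi-join's output, and dropping them before the join only shrinks the input. The small Python sketch below illustrates that argument using hypothetical helpers over (key, value) pairs. It is not Spark's `FilterNullsInJoinKey` rule, and it covers only the inner-join case.

```python
def inner_join(left, right):
    """Inner equi-join of (key, value) pairs with SQL null semantics:
    a None key never matches anything, including another None."""
    index = {}
    for k, v in right:
        if k is not None:
            index.setdefault(k, []).append(v)
    return [(k, lv, rv) for k, lv in left if k is not None for rv in index.get(k, [])]

def filter_null_keys(rows):
    """The pre-join Filter: keep only rows whose join key is not null."""
    return [(k, v) for k, v in rows if k is not None]

left = [(1, "a"), (None, "b"), (2, "c")]
right = [(1, "x"), (None, "y"), (1, "z")]
# Filtering first yields the same join result from smaller inputs.
same = inner_join(filter_null_keys(left), filter_null_keys(right)) == inner_join(left, right)
```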

spark git commit: [SPARK-9536] [SPARK-9537] [SPARK-9538] [ML] [PYSPARK] ml.classification support raw and probability prediction for PySpark

2015-08-02 Thread jkbradley
Repository: spark Updated Branches: refs/heads/master 114ff926f -> 4cdd8ecd6 [SPARK-9536] [SPARK-9537] [SPARK-9538] [ML] [PYSPARK] ml.classification support raw and probability prediction for PySpark Make the following ml.classification classes support raw and probability prediction for PySpar

spark git commit: [SPARK-2205] [SQL] Avoid unnecessary exchange operators in multi-way joins

2015-08-02 Thread yhuai
Repository: spark Updated Branches: refs/heads/master 30e89111d -> 114ff926f [SPARK-2205] [SQL] Avoid unnecessary exchange operators in multi-way joins This PR adds `PartitioningCollection`, which is used to represent the `outputPartitioning` for SparkPlans with multiple children (e.g. `Shuf
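The idea behind `PartitioningCollection` can be shown with a simplified Python model. Consider a join on a.k = b.k in which both sides are hash partitioned. Its output is partitioned by a.k and by b.k at the same time, so a later join on b.k should not need another exchange. Only the name `PartitioningCollection` comes from the summary. The other classes, their fields, the subset-based `satisfies` rule, and the `needs_exchange` helper are assumptions for illustration, not Spark's code. The model also ignores details such as matching partition counts across children.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

@dataclass(frozen=True)
class ClusteredDistribution:
    """Requirement: rows with equal values of `columns` must be co-located."""
    columns: FrozenSet[str]

@dataclass(frozen=True)
class HashPartitioning:
    columns: FrozenSet[str]
    num_partitions: int

    def satisfies(self, required: ClusteredDistribution) -> bool:
        # Simplified rule assumed for this sketch: hashing on a subset of the
        # required columns already co-locates rows that agree on all of them.
        return self.columns <= required.columns

@dataclass(frozen=True)
class PartitioningCollection:
    """Output partitioning of a multi-child operator: the output is
    simultaneously partitioned in each of the listed ways."""
    partitionings: Tuple[HashPartitioning, ...]

    def satisfies(self, required: ClusteredDistribution) -> bool:
        return any(p.satisfies(required) for p in self.partitionings)

def needs_exchange(child_partitioning, required: ClusteredDistribution) -> bool:
    """A shuffle is needed only if the child's partitioning is insufficient."""
    return not child_partitioning.satisfies(required)

# Output of a join on a.k = b.k where both children were hash partitioned.
ab = PartitioningCollection((
    HashPartitioning(frozenset({"a.k"}), 200),
    HashPartitioning(frozenset({"b.k"}), 200),
))
```

In this model, `needs_exchange(ab, ClusteredDistribution(frozenset({"b.k"})))` is false. A plan that tracked only the left child's `HashPartitioning` on a.k would insert an unnecessary shuffle for the same requirement.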

spark git commit: [SPARK-9546][SQL] Centralize orderable data type checking.

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 536d2adc1 -> 30e89111d [SPARK-9546][SQL] Centralize orderable data type checking. This pull request creates two isOrderable functions in RowOrdering that can be used to check whether a data type or a sequence of expressions can be used in

spark git commit: [SPARK-9535][SQL][DOCS] Modify document for codegen.

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 9d03ad910 -> 536d2adc1 [SPARK-9535][SQL][DOCS] Modify document for codegen. #7142 enabled codegen by default, so let's modify the corresponding documents. Closes #7142 Author: KaiXinXiaoLei Author: Kousuke Saruta Closes #7863 from

spark git commit: [SPARK-9543][SQL] Add randomized testing for UnsafeKVExternalSorter.

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 0722f4331 -> 9d03ad910 [SPARK-9543][SQL] Add randomized testing for UnsafeKVExternalSorter. The detailed approach is documented in UnsafeKVExternalSorterSuite.testKVSorter(), working as follows: 1. Create input by generating data randomly
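Only the first step of the SPARK-9543 approach is visible above: create the input by generating data randomly. The remaining steps are cut off. The Python sketch below therefore shows only the generic randomized-testing pattern. It generates seeded random key/value pairs, sorts them with the code under test, and checks the result against a reference. `external_sort_kv` is a hypothetical stand-in that sorts fixed-size runs and k-way merges them. It is not UnsafeKVExternalSorter, and `random_kv_test` is not the suite's `testKVSorter()`.

```python
import heapq
import random

def external_sort_kv(pairs, run_size):
    """Stand-in sorter under test: sort fixed-size runs ("spills") by key,
    then k-way merge the sorted runs."""
    runs = [sorted(pairs[i:i + run_size], key=lambda kv: kv[0])
            for i in range(0, len(pairs), run_size)]
    return list(heapq.merge(*runs, key=lambda kv: kv[0]))

def random_kv_test(seed, n, run_size):
    rng = random.Random(seed)
    # Step 1: generate input randomly; keys come from a small range so duplicates occur.
    pairs = [(rng.randint(0, n // 2), rng.random()) for _ in range(n)]
    result = external_sort_kv(pairs, run_size)
    # Check: same multiset of pairs as the input, and keys in non-decreasing order.
    assert sorted(result) == sorted(pairs)
    assert all(a[0] <= b[0] for a, b in zip(result, result[1:]))
    return len(result)
```

Fixing the seed keeps any failure reproducible, while looping over many seeds covers more input shapes than hand-written cases would.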

spark git commit: [SPARK-7937][SQL] Support comparison on StructType

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 2e981b7bf -> 0722f4331 [SPARK-7937][SQL] Support comparison on StructType This brings #6519 up-to-date with master branch. Closes #6519. Author: Liang-Chi Hsieh Author: Liang-Chi Hsieh Author: Reynold Xin Closes #7877 from rxin/sort-s
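The SPARK-7937 summary says only that comparison on StructType is now supported. The Python sketch below shows one natural way such a comparison can work: lexicographically, field by field, recursing into nested structs. Struct values are modeled as tuples. The nulls-first convention and the helper names are assumptions for this sketch, not a description of Spark's implementation.

```python
from functools import cmp_to_key

def compare_values(a, b):
    """Three-way comparison; None (null) sorts before any non-null value
    (an assumed convention for this sketch)."""
    if a is None and b is None:
        return 0
    if a is None:
        return -1
    if b is None:
        return 1
    if isinstance(a, tuple):              # nested struct: compare field by field
        return compare_structs(a, b)
    return (a > b) - (a < b)

def compare_structs(x, y):
    """Lexicographic comparison of two struct values with the same schema."""
    for fx, fy in zip(x, y):
        c = compare_values(fx, fy)
        if c != 0:
            return c
    return 0

rows = [(2, "b"), (1, None), (1, "a"), (None, "z")]
ordered = sorted(rows, key=cmp_to_key(compare_structs))
# ordered == [(None, 'z'), (1, None), (1, 'a'), (2, 'b')]
```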

spark git commit: [SPARK-9531] [SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter

2015-08-02 Thread joshrosen
Repository: spark Updated Branches: refs/heads/master 66924ffa6 -> 2e981b7bf [SPARK-9531] [SQL] UnsafeFixedWidthAggregationMap.destructAndCreateExternalSorter This pull request adds a destructAndCreateExternalSorter method to UnsafeFixedWidthAggregationMap. The new method does the following:

spark git commit: [SPARK-9527] [MLLIB] add PrefixSpanModel and make PrefixSpan Java friendly

2015-08-02 Thread meng
Repository: spark Updated Branches: refs/heads/master 8eafa2aeb -> 66924ffa6 [SPARK-9527] [MLLIB] add PrefixSpanModel and make PrefixSpan Java friendly 1. Use `PrefixSpanModel` to wrap the frequent sequences. 2. Define `FreqSequence` to wrap each frequent sequence, which contains a Java-frien

spark git commit: [SPARK-9208][SQL] Sort DataFrame functions alphabetically.

2015-08-02 Thread rxin
Repository: spark Updated Branches: refs/heads/master 244016a95 -> 8eafa2aeb [SPARK-9208][SQL] Sort DataFrame functions alphabetically. Author: Reynold Xin Closes #7861 from rxin/api-audit and squashes the following commits: 7200256 [Reynold Xin] [SPARK-9208][SQL] Sort DataFrame functions a

spark git commit: [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection

2015-08-02 Thread srowen
Repository: spark Updated Branches: refs/heads/branch-1.3 047a61365 -> cc5f711c0 [SPARK-9254] [BUILD] [HOTFIX] sbt-launch-lib.bash should support HTTP/HTTPS redirection Target file(s) can be hosted on CDN nodes. HTTP/HTTPS redirection must be supported to download these files. Author: Cheng

spark git commit: [SPARK-9149] [ML] [EXAMPLES] Add an example of spark.ml KMeans

2015-08-02 Thread srowen
Repository: spark Updated Branches: refs/heads/master 9d1c02526 -> 244016a95 [SPARK-9149] [ML] [EXAMPLES] Add an example of spark.ml KMeans [SPARK-9149] Add an example of spark.ml KMeans - ASF JIRA https://issues.apache.org/jira/browse/SPARK-9149 jkbradley Should we support other data format

spark git commit: [SPARK-9521] [BUILD] Require Maven 3.3.3+ in the build

2015-08-02 Thread srowen
Repository: spark Updated Branches: refs/heads/master 16b928c54 -> 9d1c02526 [SPARK-9521] [BUILD] Require Maven 3.3.3+ in the build Enforce Maven 3.3.3+ in the build. (Also update the scala compiler plugin while we're at it.) Author: Sean Owen Closes #7852 from srowen/SPARK-9521 and squash