[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread jkbradley
GitHub user jkbradley opened a pull request: https://github.com/apache/spark/pull/6255 [SPARK-7585] [ml] [doc] VectorIndexer user guide section Added VectorIndexer section to ML user guide. Also added javaCategoryMaps() method and Java unit test for it. CC: @mengxr You

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread ogirardot
Github user ogirardot commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103349381 @marmbrus I may misunderstand the nullable flag, but I can have an empty dataset with a non-nullable column. For example : ``` scala val r =

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103353969 @andrewor14 I can do a local microbenchmark for this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well.

[GitHub] spark pull request: [SPARK-7713] [SQL] Use shared broadcast hadoop...

2015-05-19 Thread andrewor14
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/6252#discussion_r30571665 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/SqlNewHadoopRDD.scala --- @@ -0,0 +1,269 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103353671 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103353670 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-05-19 Thread scwf
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-103362864 ping can you update this?

[GitHub] spark pull request: [SPARK-7678] [ml] Fix default random seed in H...

2015-05-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/6251#issuecomment-103371105 That last update is a bit of a hack; really, we should improve the unit tests so they are not so dependent on the random seed. But I think it's tolerable for now.

[GitHub] spark pull request: [SPARK-7581][ml][doc] User guide for spark.ml ...

2015-05-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6113

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103386299 Merged build triggered.

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103386786 [Test build #33066 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33066/consoleFull) for PR 6242 at commit

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103386352 Merged build started.

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576460 --- Diff: core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala --- @@ -21,14 +21,13 @@ import java.io.IOException import

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576462 --- Diff: core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala --- @@ -95,8 +93,29 @@ private[spark] class TachyonBlockManager() extends

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576706 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -17,7 +17,8 @@ package org.apache.spark.storage -import

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576683 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -485,14 +486,15 @@ private[spark] class BlockManager( if

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103389096 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576891 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -107,13 +137,19 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103389900 [Test build #33066 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33066/consoleFull) for PR 6242 at commit

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103389909 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103389911 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103351384 @yhuai can you try this and see if you can notice the speedup? (Does the closure get serialized correctly?)

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103351169 Merged build started.

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103366308 Merged build triggered.

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103366389 Merged build started.

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-10336 [Test build #33062 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33062/consoleFull) for PR 6257 at commit

[GitHub] spark pull request: SPARK-7723 Fix string interpolation in pipelin...

2015-05-19 Thread tuxdna
GitHub user tuxdna opened a pull request: https://github.com/apache/spark/pull/6258 SPARK-7723 Fix string interpolation in pipeline examples https://issues.apache.org/jira/browse/SPARK-7723 You can merge this pull request into a Git repository by running: $ git pull

[GitHub] spark pull request: SPARK-7723 Fix string interpolation in pipelin...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6258#issuecomment-103379689 Can one of the admins verify this patch?

[GitHub] spark pull request: SPARK-7723 Fix string interpolation in pipelin...

2015-05-19 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/6258#issuecomment-103380565 LGTM, this is well trivial enough that it can just be a PR. It's a doc change, so I'll check it out manually.

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread ogirardot
Github user ogirardot commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103381292 I don't really understand; the Array is empty, not filled with nulls.

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread ogirardot
Github user ogirardot commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103382692 @rxin any input ?

[GitHub] spark pull request: [SQL][SPARK-6785] fix DateUtils.fromJavaDate(j...

2015-05-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6242#issuecomment-103384823 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-6954] [YARN] ExecutorAllocationManager ...

2015-05-19 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/5856#issuecomment-103387411 OK, I'm confused and/or dumb, but this failed because it was already merged, by me: https://github.com/apache/spark/commit/98ac39d2f5828fbdad8c9a4e563ad1169e3b9948

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576923 --- Diff: core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala --- @@ -105,21 +124,35 @@ private[spark] class TachyonBlockManager() extends

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576952 --- Diff: core/src/main/scala/org/apache/spark/storage/TachyonBlockManager.scala --- @@ -105,21 +124,35 @@ private[spark] class TachyonBlockManager() extends

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103351142 Merged build triggered.

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103351266 [Test build #33060 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33060/consoleFull) for PR 6256 at commit

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103363732 Merged build started.

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103363655 Merged build triggered.

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103364309 [Test build #33061 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33061/consoleFull) for PR 6256 at commit

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103365156 @rxin, I eventually found another way to implement rollup and cube that doesn't depend on #5780. Sorry for the delay.

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/6257 [SPARK-7320] [SQL] Add Cube / Rollup for dataframe Add `cube` `rollup` for DataFrame For example: ```scala testData.rollup($a + $b, $b).agg(sum($a - $b)) testData.cube($a +

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-103369377 Merged build triggered.

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-103369402 Merged build started.

[GitHub] spark pull request: [Spark-7511][MLLIB] pyspark ml seed param shou...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6139#issuecomment-103371995 Merged build triggered.

[GitHub] spark pull request: [SPARK-7678] [ml] Fix default random seed in H...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6251#issuecomment-103372014 Merged build started.

[GitHub] spark pull request: [SPARK-7678] [ml] Fix default random seed in H...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6251#issuecomment-103371991 Merged build triggered.

[GitHub] spark pull request: [Spark-7511][MLLIB] pyspark ml seed param shou...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6139#issuecomment-103372035 Merged build started.

[GitHub] spark pull request: [SPARK-7586][ML][doc] Add docs of Word2Vec in ...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6181#issuecomment-103373837 [Test build #830 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/830/consoleFull) for PR 6181 at commit

[GitHub] spark pull request: [SPARK-7681] [MLLIB] remove mima excludes for ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6254#issuecomment-103373526 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-7681] [MLLIB] remove mima excludes for ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6254#issuecomment-103373528 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7681] [MLLIB] remove mima excludes for ...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6254#issuecomment-103373484 [Test build #33056 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33056/consoleFull) for PR 6254 at commit

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103373270 [Test build #829 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/829/consoleFull) for PR 6255 at commit

[GitHub] spark pull request: [SPARK-7586][ML][doc] Add docs of Word2Vec in ...

2015-05-19 Thread yinxusen
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/6181#issuecomment-103382178 @jkbradley Is there more concrete error information for the MiMa error? I searched the console output of the test but could not locate the error.

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103383663 Can we just move the rollup and cube methods to GroupedData, and remove Cube/Rollup?

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576656 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala --- @@ -1197,8 +1199,19 @@ private[spark] class BlockManager( bytes:

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103389098 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103389077 [Test build #33060 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33060/consoleFull) for PR 6256 at commit

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576645 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -40,7 +40,7 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576822 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -62,42 +62,72 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576853 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -107,13 +137,19 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread andrewor14
GitHub user andrewor14 opened a pull request: https://github.com/apache/spark/pull/6256 [SPARK-7718] [SQL] Speed up partitioning by avoiding closure cleaning According to @yhuai we spent 6-7 seconds cleaning closures in a partitioning job that takes 12 seconds. Since we provide

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103349881 Also cc/ @pwendell

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103349557 Merged build started.

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103349531 Merged build triggered.

[GitHub] spark pull request: [SPARK-7713] [SQL] Use shared broadcast hadoop...

2015-05-19 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/6252#discussion_r30571508 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -584,6 +588,34 @@ abstract class HadoopFsRelation

[GitHub] spark pull request: [WIP] [SQL] [PySpark] add reader amd writer AP...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/6238#discussion_r30571643 --- Diff: python/pyspark/sql/readwriter.py --- @@ -0,0 +1,338 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] spark pull request: [SPARK-7497] [PySpark] [Streaming] fix streami...

2015-05-19 Thread tdas
Github user tdas commented on the pull request: https://github.com/apache/spark/pull/6239#issuecomment-103367448 @davies Why are you increasing the batch duration?

[GitHub] spark pull request: [SPARK-7026] [SQL] fix left semi join with equ...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5643#issuecomment-103369808 [Test build #33063 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33063/consoleFull) for PR 5643 at commit

[GitHub] spark pull request: [SPARK-7678] [ml] Fix default random seed in H...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6251#issuecomment-103372395 [Test build #33064 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33064/consoleFull) for PR 6251 at commit

[GitHub] spark pull request: [Spark-7511][MLLIB] pyspark ml seed param shou...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6139#issuecomment-103372480 [Test build #33065 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33065/consoleFull) for PR 6139 at commit

[GitHub] spark pull request: [SPARK-7581][ml][doc] User guide for spark.ml ...

2015-05-19 Thread jkbradley
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/6113#issuecomment-103374138 LGTM @yinxusen Thanks! Merging into master and branch-1.4

[GitHub] spark pull request: [SPARK-7663][MLlib] Add requirement for word2v...

2015-05-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6228#discussion_r30573731 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -158,6 +158,9 @@ class Word2Vec extends Serializable with Logging {

[GitHub] spark pull request: [SPARK-7704] Updating Programming Guides per S...

2015-05-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/6234#discussion_r30573957 --- Diff: docs/programming-guide.md --- @@ -41,14 +41,15 @@ In addition, if you wish to access an HDFS cluster, you need to add a dependency

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103381022 But you will see null outputs: ```scala a.filter(_1 1).groupBy(_1).agg(avg(_1)).collect res5:Array[org.apache.spark.sql.Row] = Array() ```

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103384521 What do other SQL systems do, e.g. Hive or MySQL? It might make sense for sum to return 0, but should avg return null if there is no data?

[GitHub] spark pull request: [SPARK-6411] [SQL] [PySpark] support datetime ...

2015-05-19 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/6250#issuecomment-103387302 @airhorns Maybe long term, but unlikely in the short term since it's super complicated to support them. Of course, if somebody has time to look into scoping this (what's

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread shimingfei
Github user shimingfei commented on the pull request: https://github.com/apache/spark/pull/5908#issuecomment-103387350 @rxin Can this be merged ?

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576550 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -62,42 +62,72 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576577 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -62,42 +62,72 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576614 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -40,7 +40,7 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: [SPARK-7389] [core]Tachyon integration improve...

2015-05-19 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/5908#discussion_r30576753 --- Diff: core/src/main/scala/org/apache/spark/storage/ExternalBlockStore.scala --- @@ -62,42 +62,72 @@ private[spark] class ExternalBlockStore(blockManager:

[GitHub] spark pull request: Fixing a few basic typos in the Programming Gu...

2015-05-19 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/6240

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103397818 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103398194 [Test build #33062 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33062/consoleFull) for PR 6257 at commit

[GitHub] spark pull request: [SPARK-7320] [SQL] Add Cube / Rollup for dataf...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6257#issuecomment-103398212 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7678] [ml] Fix default random seed in H...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6251#issuecomment-103401303 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7696][SQL] Aggregate function's result ...

2015-05-19 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/6237#issuecomment-103402444 From this point of view, I think it's reasonable to say aggregation functions should always be nullable, but it depends on the use scenario. (with / without

[GitHub] spark pull request: [SPARK-7585] [ml] [doc] VectorIndexer user gui...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6255#issuecomment-103403500 [Test build #829 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/829/consoleFull) for PR 6255 at commit

[GitHub] spark pull request: [SPARK-7652][MLlib] Update the implementation ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6189#issuecomment-103404215 Merged build triggered.

[GitHub] spark pull request: [SPARK-7652][MLlib] Update the implementation ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6189#issuecomment-103404312 Merged build started.

[GitHub] spark pull request: [SPARK-7718] [SQL] Speed up partitioning by av...

2015-05-19 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/6256#issuecomment-103407249 Used the same microbenchmark configuration and code as in #6225. Removing the `clean()` call does help and leads to a ~6% performance gain. (Not able to upload

[GitHub] spark pull request: SPARK-7637: SQL O(N) merge implementation for ...

2015-05-19 Thread rowan000
GitHub user rowan000 opened a pull request: https://github.com/apache/spark/pull/6259 SPARK-7637: SQL O(N) merge implementation for StructType merge Contribution is my original work and I license the work to the project under the project's open source license. You can merge this

[GitHub] spark pull request: SPARK-1537 [WiP] Application Timeline Server i...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5423#issuecomment-103415114 [Test build #33071 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33071/consoleFull) for PR 5423 at commit

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-103417784 Merged build started.

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-05-19 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-103418402 [Test build #33072 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/33072/consoleFull) for PR 5688 at commit

[GitHub] spark pull request: [SQL] [WIP] Tries to skip row groups when read...

2015-05-19 Thread liancheng
Github user liancheng closed the pull request at: https://github.com/apache/spark/pull/5334

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-103418411 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-7119][SQL] ScriptTransform should also ...

2015-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5688#issuecomment-103418409 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SQL] [WIP] Tries to skip row groups when read...

2015-05-19 Thread liancheng
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/5334#issuecomment-103418401 Closing this as it's superseded by #6225.
