[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15429 **[Test build #66732 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66732/consoleFull)** for PR 15429 at commit

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14847 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66730/ Test PASSed. ---

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14847 Merged build finished. Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14847 **[Test build #66730 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66730/consoleFull)** for PR 14847 at commit

[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Merged build finished. Test PASSed.

[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14788 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66728/ Test PASSed. ---

[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #66728 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66728/consoleFull)** for PR 14788 at commit

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66738/consoleFull)** for PR 15432 at commit

[GitHub] spark issue #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.dir is r...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15382 **[Test build #3322 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3322/consoleFull)** for PR 15382 at commit

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66735/ Test PASSed. ---

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15432 **[Test build #66737 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66737/consoleFull)** for PR 15432 at commit

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test PASSed.

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66735 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66735/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11119 We should also check that save / load is backward compatible with older versions. It should be, but subtle things can sneak in, so let's be careful about that.

[GitHub] spark pull request #15432: [SPARK-17854][SQL] rand/randn allows null as inpu...

2016-10-11 Thread HyukjinKwon
GitHub user HyukjinKwon opened a pull request: https://github.com/apache/spark/pull/15432 [SPARK-17854][SQL] rand/randn allows null as input seed ## What changes were proposed in this pull request? This PR proposes `rand`/`randn` accept `null` as input. In this case, it

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82751668 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -446,6 +463,20 @@ private[ml] object DefaultParamsReader { val cls =

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82753706 --- Diff: mllib/src/main/scala/org/apache/spark/ml/util/ReadWrite.scala --- @@ -300,15 +301,23 @@ private[ml] object DefaultParamsWriter {

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82752375 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/KMeansSuite.scala --- @@ -137,18 +143,64 @@ class KMeansSuite extends SparkFunSuite with

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82750575 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -333,13 +372,44 @@ class KMeans @Since("1.5.0") ( override def

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82749390 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") ( @Since("1.5.0")

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82749322 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -303,6 +312,20 @@ class KMeans @Since("1.5.0") ( @Since("1.5.0")

[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/11119#discussion_r82749661 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala --- @@ -81,6 +81,13 @@ private[clustering] trait KMeansParams extends Params with

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15431 **[Test build #66736 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66736/consoleFull)** for PR 15431 at commit

[GitHub] spark issue #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/12930 Moved the new fix to #15431, which leverages ```RFormula forceIndexLabel``` and is more succinct.

[GitHub] spark pull request #12930: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiv...

2016-10-11 Thread yanboliang
Github user yanboliang closed the pull request at: https://github.com/apache/spark/pull/12930

[GitHub] spark pull request #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiv...

2016-10-11 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15431 [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes error when label is numeric type ## What changes were proposed in this pull request? Fix SparkR ```spark.naiveBayes``` error when

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750631 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82750409 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test PASSed.

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66733/ Test PASSed. ---

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66733 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66733/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Merged build finished. Test PASSed.

[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15285 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66725/ Test PASSed. ---

[GitHub] spark issue #15285: [SPARK-17711] Compress rolled executor log

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15285 **[Test build #66725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66725/consoleFull)** for PR 15285 at commit

[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15430 Merged build finished. Test PASSed.

[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15430 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66734/ Test PASSed. ---

[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15430 **[Test build #66734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66734/consoleFull)** for PR 15430 at commit

[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15342 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66729/ Test PASSed. ---

[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15342 Merged build finished. Test PASSed.

[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15342 **[Test build #66729 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66729/consoleFull)** for PR 15342 at commit

[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15388 Thanks! @rxin @cloud-fan @hvanhovell @gatorsmile

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11119 I have a few high level questions on this: Params Why are we only setting `k` based on the `initialModel`? I had thought from previous discussion above (it was a while ago now)

[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15295 Merged build finished. Test PASSed.

[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15295 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66722/ Test PASSed. ---

[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15295 **[Test build #66722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66722/consoleFull)** for PR 15295 at commit

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15427 cc @cloud-fan @hvanhovell

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66724/ Test PASSed. ---

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15427 Merged build finished. Test PASSed.

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66735 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66735/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #15427: [SPARK-17866][SPARK-17867][SQL] Fix Dataset.dropduplicat...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15427 **[Test build #66724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66724/consoleFull)** for PR 15427 at commit

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82745403 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/11119 I misread DB's meaning in my previous comment. I agree that the parameter settings of `initialModel`, if set, should take precedence. If it conflicts with an existing `k` then log a warning.
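The precedence rule described in this comment can be sketched roughly as follows. This is only an illustration in plain Python, not the code from the pull request: the names `InitialModel` and `resolve_k` are hypothetical, and the sketch assumes the only conflicting parameter is `k`.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("kmeans_sketch")


@dataclass
class InitialModel:
    """Hypothetical stand-in for a fitted model; only its cluster count matters here."""
    k: int


def resolve_k(configured_k: int, initial_model: Optional[InitialModel]) -> int:
    """Return the k to use, letting the initial model's setting take precedence."""
    if initial_model is None:
        # No initial model was supplied, so the configured value stands.
        return configured_k
    if initial_model.k != configured_k:
        # The two settings conflict: warn, then defer to the initial model.
        logger.warning(
            "k=%d conflicts with initialModel (k=%d); using the initial model's value",
            configured_k,
            initial_model.k,
        )
    return initial_model.k
```

With this shape, a caller who sets `k` and also supplies an initial model gets the model's cluster count plus a warning, rather than a hard failure.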

[GitHub] spark issue #13194: [SPARK-15402] [ML] [PySpark] PySpark ml.evaluation shoul...

2016-10-11 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/13194 Resolved merge conflicts, ping @holdenk @MLnick @srowen @jkbradley to take a look when you are available. This just adds a Python API and should be very straightforward to move ahead. Thanks.

[GitHub] spark issue #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python API for...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15430 **[Test build #66734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66734/consoleFull)** for PR 15430 at commit

[GitHub] spark pull request #15423: [SPARK-17860][SQL] SHOW COLUMN's database conflic...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15423#discussion_r82744230 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -168,17 +168,7 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark issue #15405: [SPARK-15917][CORE] Added support for number of executor...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15405 **[Test build #3323 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3323/consoleFull)** for PR 15405 at commit

[GitHub] spark pull request #15430: [SPARK-15957][Follow-up][ML][PySpark] Add Python ...

2016-10-11 Thread yanboliang
GitHub user yanboliang opened a pull request: https://github.com/apache/spark/pull/15430 [SPARK-15957][Follow-up][ML][PySpark] Add Python API for RFormula forceIndexLabel. ## What changes were proposed in this pull request? Follow-up work of #13675, add Python API for

[GitHub] spark pull request #15072: [SPARK-17123][SQL] Use type-widened encoder for D...

2016-10-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/15072#discussion_r82743571 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -53,7 +53,15 @@ import org.apache.spark.util.Utils private[sql]

[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-11 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82743168 --- Diff: core/src/main/scala/org/apache/spark/shuffle/ShuffleManager.scala --- @@ -48,7 +48,8 @@ private[spark] trait ShuffleManager { handle:

[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...

2016-10-11 Thread VinceShieh
Github user VinceShieh commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82743072 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0")

[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-11 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82742902 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -687,18 +691,21 @@ private[spark] object MapOutputTracker extends Logging {

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14847 @viirya can you try to create a new operator for this optimization and make it work with whole-stage-codegen? thanks!

[GitHub] spark pull request #15388: [SPARK-17821][SQL] Support And and Or in Expressi...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15388

[GitHub] spark pull request #15297: [WIP][SPARK-9862]Handling data skew

2016-10-11 Thread witgo
Github user witgo commented on a diff in the pull request: https://github.com/apache/spark/pull/15297#discussion_r82742056 --- Diff: core/src/main/scala/org/apache/spark/MapOutputTracker.scala --- @@ -138,13 +138,16 @@ private[spark] abstract class MapOutputTracker(conf:

[GitHub] spark issue #15388: [SPARK-17821][SQL] Support And and Or in Expression Cano...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15388 thanks, merging to master!

[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741203 --- Diff: mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala --- @@ -270,10 +270,10 @@ private[ml] trait HasFitIntercept extends Params

[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741770 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -128,8 +145,9 @@ object Bucketizer extends

[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15428#discussion_r82741459 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -73,15 +78,27 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0")

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66733 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66733/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBU...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15429 **[Test build #66732 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66732/consoleFull)** for PR 15429 at commit

[GitHub] spark issue #15425: [SPARK-17816] [Core] [Branch-2.0] Fix ConcurrentModifica...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15425 **[Test build #3321 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3321/consoleFull)** for PR 15425 at commit

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14847 This would really only be interesting if it works with whole stage code gen; otherwise it is not really interesting. In addition, it'd make sense to have an explicit operator for this, e.g.

[GitHub] spark pull request #15429: [SPARK-17840] [DOCS] Add some pointers for wiki/C...

2016-10-11 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/15429 [SPARK-17840] [DOCS] Add some pointers for wiki/CONTRIBUTING.md in README.md and some warnings in PULL_REQUEST_TEMPLATE ## What changes were proposed in this pull request? Link to

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66731/ Test FAILed. ---

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15428 Merged build finished. Test FAILed.

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66731/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #15428: [SPARK-17219][ML] enchanced NaN value handling in Bucket...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15428 **[Test build #66731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66731/consoleFull)** for PR 15428 at commit

[GitHub] spark issue #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.dir is r...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15382 **[Test build #3322 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3322/consoleFull)** for PR 15382 at commit

[GitHub] spark pull request #15428: [SPARK-17219][ML] enchanced NaN value handling in...

2016-10-11 Thread VinceShieh
GitHub user VinceShieh opened a pull request: https://github.com/apache/spark/pull/15428 [SPARK-17219][ML] enchanced NaN value handling in Bucketizer ## What changes were proposed in this pull request? This PR is an enhancement of PR with commit

[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15342 **[Test build #66729 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66729/consoleFull)** for PR 15342 at commit

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14847 **[Test build #66730 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66730/consoleFull)** for PR 14847 at commit

[GitHub] spark pull request #15072: [SPARK-17123][SQL] Use type-widened encoder for D...

2016-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15072#discussion_r82738407 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -53,7 +53,15 @@ import org.apache.spark.util.Utils private[sql]

[GitHub] spark issue #15342: [SPARK-11560] [MLLIB] Optimize KMeans implementation / r...

2016-10-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15342 Yeah, but now we have yet 2 more overloads. I had intended to point people to 1 new overload, but I guess it's weird to make people specify the seed arg. And optional args, the normal solution,

[GitHub] spark pull request #14847: [SPARK-17254][SQL] Filter can stop when the condi...

2016-10-11 Thread viirya
GitHub user viirya reopened a pull request: https://github.com/apache/spark/pull/14847 [SPARK-17254][SQL] Filter can stop when the condition is false if the child output is sorted ## What changes were proposed in this pull request? From

[GitHub] spark issue #14847: [SPARK-17254][SQL] Filter can stop when the condition is...

2016-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14847 Re-open it and see if we can have some consensus about this direction.

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Merged build finished. Test FAILed.

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14702 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66723/ Test FAILed.

[GitHub] spark issue #14702: [SPARK-15694] Implement ScriptTransformation in sql/core...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14702 **[Test build #66723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66723/consoleFull)** for PR 14702 at commit

[GitHub] spark pull request #15072: [SPARK-17123][SQL] Use type-widened encoder for D...

2016-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15072#discussion_r82737858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -53,7 +53,15 @@ import org.apache.spark.util.Utils private[sql]

[GitHub] spark pull request #15426: [SPARK-17864][SQL] Mark data type APIs as stable ...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15426

[GitHub] spark issue #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-10-11 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/9 +1 on what @sethah proposed. We can log with warn when k is modified by setting the initial model. Thanks.

[GitHub] spark issue #11459: [SPARK-13025] Allow users to set initial model in logist...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11459 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66726/ Test PASSed.

[GitHub] spark issue #11459: [SPARK-13025] Allow users to set initial model in logist...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11459 Merged build finished. Test PASSed.

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82737475 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,129 @@ +/* + * Licensed under the Apache License,

[GitHub] spark issue #11459: [SPARK-13025] Allow users to set initial model in logist...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11459 **[Test build #66726 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66726/consoleFull)** for PR 11459 at commit

[GitHub] spark issue #15426: [SPARK-17864][SQL] Mark data type APIs as stable (not De...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15426 merging to master!

[GitHub] spark issue #15426: [SPARK-17864][SQL] Mark data type APIs as stable (not De...

2016-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15426 LGTM

[GitHub] spark pull request #15408: [SPARK-17839][CORE] Use Nio's directbuffer instea...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15408#discussion_r82737052 --- Diff: core/src/main/java/org/apache/spark/io/NioBufferedFileInputStream.java --- @@ -0,0 +1,129 @@ +/* + * Licensed under the Apache License,

[GitHub] spark pull request #14788: [SPARK-17174][SQL] Add the support for TimestampT...

2016-10-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/14788#discussion_r82737073 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/functions.scala --- @@ -2374,14 +2374,14 @@ object functions { * @group datetime_funcs *

[GitHub] spark issue #14788: [SPARK-17174][SQL] Add the support for TimestampType for...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14788 **[Test build #66728 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66728/consoleFull)** for PR 14788 at commit
