[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12819 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65591/ Test FAILed.

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12819 Merged build finished. Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12819 **[Test build #65591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65591/consoleFull)** for PR 12819 at commit

[GitHub] spark issue #15149: [SPARK-17057] [ML] ProbabilisticClassifierModels' thresh...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15149 **[Test build #65593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65593/consoleFull)** for PR 15149 at commit

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12819 **[Test build #65591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65591/consoleFull)** for PR 12819 at commit

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit

[GitHub] spark issue #14650: [SPARK-17062][MESOS] add conf option to mesos dispatcher

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14650 **[Test build #65594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65594/consoleFull)** for PR 14650 at commit

[GitHub] spark pull request #15145: [SPARK-17589] [TEST] [2.0] Fix test case `create ...

2016-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15145#discussion_r79375663 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala --- @@ -509,7 +509,7 @@ class MetastoreDataSourcesSuite

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Merged build finished. Test PASSed.

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14731 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65592/ Test PASSed.

[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15051 **[Test build #65596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65596/consoleFull)** for PR 15051 at commit

[GitHub] spark issue #15149: [SPARK-17057] [ML] ProbabilisticClassifierModels' thresh...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15149 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65593/ Test FAILed.

[GitHub] spark issue #15149: [SPARK-17057] [ML] ProbabilisticClassifierModels' thresh...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15149 **[Test build #65593 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65593/consoleFull)** for PR 15149 at commit

[GitHub] spark issue #15149: [SPARK-17057] [ML] ProbabilisticClassifierModels' thresh...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15149 Merged build finished. Test FAILed.

[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...

2016-09-19 Thread koeninger
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 I'm not concerned about people deleting partitions before messages have been processed, because they can take care of that problem themselves, by not deleting things until consuming has

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #65595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65595/consoleFull)** for PR 14803 at commit

[GitHub] spark issue #14731: [SPARK-17159] [streaming]: optimise check for new files ...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14731 **[Test build #65592 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65592/consoleFull)** for PR 14731 at commit

[GitHub] spark pull request #14467: [SPARK-16861][PYSPARK][CORE] Refactor PySpark acc...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14467#discussion_r79369158 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -866,11 +866,14 @@ class BytesToString extends

[GitHub] spark issue #15051: [SPARK-17499][SparkR][ML][MLLib] make the default params...

2016-09-19 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15051 I think that's ok. We have similar restrictions in our cases.

[GitHub] spark pull request #14650: [SPARK-17062][MESOS] add conf option to mesos dis...

2016-09-19 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/14650#discussion_r79376179 --- Diff: mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcherArguments.scala --- @@ -18,23 +18,43 @@ package

[GitHub] spark pull request #14650: [SPARK-17062][MESOS] add conf option to mesos dis...

2016-09-19 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/14650#discussion_r79376168 --- Diff: mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcherArguments.scala --- @@ -73,37 +94,55 @@ private[mesos] class

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread phalodi
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15134 @jerryshao OK, I will change it as you suggested, but if a UUID is not a good name then I think we should also change it for SparkSession; as you can see below, SparkSession generates a UUID while not

[GitHub] spark issue #15140: [SPARK-17585][PySpark][Core] PySpark SparkContext.addFil...

2016-09-19 Thread yanboliang
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15140 cc @rxin @davies @srowen

[GitHub] spark pull request #15148: Spark 5992 yunn lsh

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79338430 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Merged build finished. Test PASSed.

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79335325 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsColumnSuite.scala --- @@ -0,0 +1,228 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #15053: [Doc] improve python API docstrings

2016-09-19 Thread holdenk
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/15053 Oh yes I agree that's a much smaller change - I was just explaining the motivation behind my initial comment as mortada asked me to elaborate.

[GitHub] spark issue #15148: Spark 5992 yunn lsh

2016-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15148 @Yunni Please use a proper title as "[SPARK-5992][ML] ...".

[GitHub] spark pull request #15148: Spark 5992 yunn lsh

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79339839 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15146: [SPARK-17590][SQL] Analyze CTE definitions at onc...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15146#discussion_r79340534 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -262,7 +262,7 @@ ctes ; namedQuery -

[GitHub] spark pull request #15054: [SPARK-17502] [SQL] Fix Multiple Bugs in DDL Stat...

2016-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15054#discussion_r79340506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/SessionCatalog.scala --- @@ -246,30 +246,42 @@ class SessionCatalog( }

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread phalodi
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15134 @sadikovi Yes, it will. If we do not define the app name in the SparkConf object or in spark-submit, it throws an exception. In this PR a random UUID is generated for the app name if it is not

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79331978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark issue #15148: Spark 5992 yunn lsh

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15148 Can one of the admins verify this patch?

[GitHub] spark issue #15024: [SPARK-17470][SQL] unify path for data source table and ...

2016-09-19 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15024 In [DataSource.scala](https://github.com/cloud-fan/spark/blob/9ab4b8ce3dd7c41edb0681ff903d218bad2e4225/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/DataSource.scala),

[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15131 **[Test build #65590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65590/consoleFull)** for PR 15131 at commit

[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15131 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65590/ Test PASSed.

[GitHub] spark issue #15148: Spark 5992 yunn lsh

2016-09-19 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15148 Fix the title please? https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark

[GitHub] spark pull request #15148: Spark 5992 yunn lsh

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r79337558 --- Diff: mllib/src/main/scala/org/apache/spark/ml/lsh/LSH.scala --- @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under

[GitHub] spark pull request #15146: [SPARK-17590][SQL] Analyze CTE definitions at onc...

2016-09-19 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15146#discussion_r79340032 --- Diff: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4 --- @@ -262,7 +262,7 @@ ctes ; namedQuery -

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79330943 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala --- @@ -0,0 +1,209 @@ +/* + * Licensed to the

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79331719 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark issue #15024: [SPARK-17470][SQL] unify path for data source table and ...

2016-09-19 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15024 In the

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 @phalodi we don't restrict user to have to set an app name either for SparkContext or SparkSession. You could refer to this code in SparkSubmit: ``` // Set name from main class if

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79331568 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79332105 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread phalodi
Github user phalodi commented on the issue: https://github.com/apache/spark/pull/15134 @jerryshao Yes, you are right, Jerry: for spark-submit and the launcher it works, but users often also have use cases where they just use Spark locally, such as reading a file faster with Spark, even we run

[GitHub] spark issue #15134: [SPARK-17580][CORE]Add random UUID as app name while app...

2016-09-19 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/15134 My concern is that previously Spark would throw an exception if the app name was not set, while in 2.0 we brought in SparkSession, which breaks the convention, so do we need to let SparkSession be

[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79331169 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsSuite.scala --- @@ -101,4 +101,47 @@ class StatisticsSuite extends QueryTest with

[GitHub] spark pull request #15148: Spark 5992 yunn lsh

2016-09-19 Thread Yunni
GitHub user Yunni opened a pull request: https://github.com/apache/spark/pull/15148 Spark 5992 yunn lsh ## What changes were proposed in this pull request? Implement Locality Sensitive Hashing along with approximate nearest neighbors and approximate similarity join based

[GitHub] spark pull request #15150: [SPARK-17595] [MLLib] Use a bounded priority queu...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15150#discussion_r79412313 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -580,7 +581,15 @@ class Word2VecModel private[spark] ( ind +=

[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15041#discussion_r79416925 --- Diff: core/src/main/scala/org/apache/spark/util/collection/Utils.scala --- @@ -30,10 +34,22 @@ private[spark] object Utils { * Returns the first

[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14444 Merged build finished. Test FAILed.

[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14444 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65600/ Test FAILed.

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/12819 @yanboliang What went into the decision to use RDD based aggregation? Just curious, thanks!

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79415724 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } }

[GitHub] spark issue #15150: [SPARK-17595] [MLLib] Use a bounded priority queue to fi...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15150 **[Test build #65601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65601/consoleFull)** for PR 15150 at commit

[GitHub] spark issue #15122: [SPARK-17569] Make StructuredStreaming FileStreamSource ...

2016-09-19 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/15122 @yhuai The suggestions are purely for testing purposes, to make sure that StructuredStreaming doesn't check for file existence twice.

[GitHub] spark pull request #14784: [SPARK-17210][SPARKR] sparkr.zip is not distribut...

2016-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14784#discussion_r79363809 --- Diff: R/pkg/R/sparkR.R --- @@ -369,6 +372,24 @@ sparkR.session <- function( overrideEnvs(sparkConfigMap, paramMap) } --- End diff

[GitHub] spark pull request #14784: [SPARK-17210][SPARKR] sparkr.zip is not distribut...

2016-09-19 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/14784#discussion_r79363920 --- Diff: R/pkg/R/sparkR.R --- @@ -369,6 +372,24 @@ sparkR.session <- function( overrideEnvs(sparkConfigMap, paramMap) } + #

[GitHub] spark pull request #14650: [SPARK-17062][MESOS] add conf option to mesos dis...

2016-09-19 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/14650#discussion_r79376216 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -40,12 +40,12 @@ import org.apache.ivy.plugins.matcher.GlobPatternMatcher

[GitHub] spark pull request #15150: [SPARK-17595] [MLLib] Use a bounded priority queu...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15150#discussion_r79412206 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/Word2Vec.scala --- @@ -580,7 +581,15 @@ class Word2VecModel private[spark] ( ind +=

[GitHub] spark pull request #15149: [SPARK-17057] [ML] ProbabilisticClassifierModels'...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15149#discussion_r79412702 --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala --- @@ -56,6 +56,21 @@ class

[GitHub] spark pull request #14650: [SPARK-17062][MESOS] add conf option to mesos dis...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14650#discussion_r79417055 --- Diff: mesos/src/main/scala/org/apache/spark/deploy/mesos/MesosClusterDispatcherArguments.scala --- @@ -18,23 +18,43 @@ package

[GitHub] spark pull request #14650: [SPARK-17062][MESOS] add conf option to mesos dis...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/14650#discussion_r79417214 --- Diff: core/src/main/scala/org/apache/spark/util/Executable.scala --- @@ -0,0 +1,25 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79419642 --- Diff: core/src/main/scala/org/apache/spark/scheduler/Task.scala --- @@ -54,7 +54,10 @@ private[spark] abstract class Task[T]( val partitionId:

[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14444 **[Test build #65600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65600/consoleFull)** for PR 14444 at commit

[GitHub] spark pull request #15145: [SPARK-17589] [TEST] [2.0] Fix test case `create ...

2016-09-19 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15145#discussion_r79413782 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/MetastoreDataSourcesSuite.scala --- @@ -509,7 +509,7 @@ class MetastoreDataSourcesSuite

[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15041#discussion_r79413763 --- Diff: core/src/main/scala/org/apache/spark/util/collection/Utils.scala --- @@ -30,10 +34,22 @@ private[spark] object Utils { * Returns the first

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65599/ Test PASSed.

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12819 Merged build finished. Test PASSed.

[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-19 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15041#discussion_r79419305 --- Diff: core/src/main/scala/org/apache/spark/util/collection/Utils.scala --- @@ -30,10 +34,22 @@ private[spark] object Utils { * Returns the first

[GitHub] spark issue #14827: [SPARK-17259] [build] Hadoop 2.7 profile to depend on Ha...

2016-09-19 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14827 Go ahead and close this one but I think you deserve 'credit' for the JIRA change, if that makes any difference.

[GitHub] spark issue #15115: [SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7....

2016-09-19 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15115 The only difference was that you're also making 2.7 the default, which isn't bad or anything. Otherwise I think it was just an oversight. There's way too much traffic to keep track of unfortunately,

[GitHub] spark pull request #15041: [SPARK-17488][CORE] TakeAndOrder will OOM when th...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15041#discussion_r79416074 --- Diff: core/src/main/scala/org/apache/spark/util/collection/Utils.scala --- @@ -30,10 +34,22 @@ private[spark] object Utils { * Returns the first

[GitHub] spark issue #15145: [SPARK-17589] [TEST] [2.0] Fix test case `create externa...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15145 **[Test build #65602 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65602/consoleFull)** for PR 15145 at commit

[GitHub] spark issue #12819: [SPARK-14077][ML] Refactor NaiveBayes to support weighte...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12819 **[Test build #65599 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65599/consoleFull)** for PR 12819 at commit

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79419042 --- Diff: core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala --- @@ -42,7 +42,10 @@ import org.apache.spark.rdd.RDD *

[GitHub] spark pull request #14659: [SPARK-16757] Set up Spark caller context to HDFS

2016-09-19 Thread tgravescs
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/14659#discussion_r79421792 --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala --- @@ -2420,6 +2420,44 @@ private[spark] object Utils extends Logging { } }

[GitHub] spark issue #15150: [SPARK-17595] [MLLib] Use a bounded priority queue to fi...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15150 Merged build finished. Test PASSed.

[GitHub] spark issue #15150: [SPARK-17595] [MLLib] Use a bounded priority queue to fi...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15150 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65601/ Test PASSed.

[GitHub] spark issue #15122: [SPARK-17569] Make StructuredStreaming FileStreamSource ...

2016-09-19 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/15122 @petermaxlee Thank you for the suggestions for testing. I will try out Option 1, since Option 2 is a bit too much work for a minor PR such as this.

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14803 **[Test build #65623 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65623/consoleFull)** for PR 14803 at commit

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14803 Merged build finished. Test PASSed.

[GitHub] spark issue #14803: [SPARK-17153][SQL] Should read partition data when readi...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14803 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65623/ Test PASSed.

[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-19 Thread dbtsai
Github user dbtsai commented on the issue: https://github.com/apache/spark/pull/14834 Merged into master. Thanks.

[GitHub] spark issue #15147: [SPARK-17545] [SQL] Handle additional time offset format...

2016-09-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/15147 @nbeyer Thanks for your investigation. I think that sounds reasonable though I think it might be arguable because adding more cases virtually means more time and computation to parse/infer

[GitHub] spark issue #15126: [SPARK-17513][SQL] Make StreamExecution garbage-collect ...

2016-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15126 Since @frreiss hasn't updated the pr yet, I'm going to merge this one and assign the jira ticket to Fred.

[GitHub] spark issue #15126: [SPARK-17513][SQL] Make StreamExecution garbage-collect ...

2016-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15126 Merging in master/2.0.

[GitHub] spark issue #15067: [SPARK-17513] [STREAMING] [SQL] Make StreamExecution gar...

2016-09-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15067 @frreiss can you close this now?

[GitHub] spark issue #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-09-19 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15148 A few high-level comments/questions: * Should this go into the `feature` package as a feature estimator/transformer? That is where other dimensionality reduction techniques have gone and

[GitHub] spark pull request #14803: [SPARK-17153][SQL] Should read partition data whe...

2016-09-19 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/14803#discussion_r79526950 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSourceSuite.scala --- @@ -608,6 +608,34 @@ class FileStreamSourceSuite extends

[GitHub] spark issue #15146: [SPARK-17590][SQL] Analyze CTE definitions at once and a...

2016-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15146 @hvanhovell This is an analyzer change that adds the CTE-in-CTE feature. I don't expect any performance improvement.

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65628/ Test FAILed. ---

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13513 **[Test build #65628 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65628/consoleFull)** for PR 13513 at commit

[GitHub] spark issue #15146: [SPARK-17590][SQL] Analyze CTE definitions at once and a...

2016-09-19 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15146 I guess if using the same analyzed plan increases the chance to reuse exchange, then it may improve the performance. Anyway, it is not the purpose of this change. Because the analyzed subquery plan

[GitHub] spark issue #13513: [SPARK-15698][SQL][Streaming] Add the ability to remove ...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/13513 Merged build finished. Test FAILed.

[GitHub] spark issue #14784: [SPARK-17210][SPARKR] sparkr.zip is not distributed to e...

2016-09-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14784 Merged build finished. Test PASSed.

[GitHub] spark issue #15146: [SPARK-17590][SQL] Analyze CTE definitions at once and a...

2016-09-19 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15146 **[Test build #65630 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65630/consoleFull)** for PR 15146 at commit

[GitHub] spark pull request #15158: [SPARK-17603] [SQL] Utilize Hive-generated Statis...

2016-09-19 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/15158 [SPARK-17603] [SQL] Utilize Hive-generated Statistics For Partitioned Tables ### What changes were proposed in this pull request? For non-partitioned tables, Hive-generated statistics are
