[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14959 **[Test build #66758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66758/consoleFull)** for PR 14959 at commit

[GitHub] spark issue #14959: [SPARK-17387][PYSPARK] Creating SparkContext() from pyth...

2016-10-11 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/14959 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark pull request #13194: [SPARK-15402] [ML] [PySpark] PySpark ml.evaluatio...

2016-10-11 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/13194#discussion_r82877590 --- Diff: python/pyspark/ml/evaluation.py --- @@ -311,19 +330,25 @@ def setParams(self, predictionCol="prediction", labelCol="label", if

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82866433 --- Diff: sbin/spark-daemon.sh --- @@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPARK_NICENESS=0 fi

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82874206 --- Diff: sbin/spark-daemon.sh --- @@ -146,13 +176,11 @@ run_command() { case "$mode" in (class) - nohup nice -n

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82874084 --- Diff: sbin/spark-daemon.sh --- @@ -146,13 +176,11 @@ run_command() { case "$mode" in (class) - nohup nice -n

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82875044 --- Diff: sbin/spark-daemon.sh --- @@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPARK_NICENESS=0 fi

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82871978 --- Diff: sbin/spark-daemon.sh --- @@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPARK_NICENESS=0 fi

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Build finished. Test FAILed.

[GitHub] spark pull request #15338: [SPARK-11653][Deploy] Allow spark-daemon.sh to ru...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15338#discussion_r82872476 --- Diff: sbin/spark-daemon.sh --- @@ -122,6 +123,35 @@ if [ "$SPARK_NICENESS" = "" ]; then export SPARK_NICENESS=0 fi

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15307 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66757/ Test FAILed. ---

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66757 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66757/consoleFull)** for PR 15307 at commit

[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-11 Thread ajbozarth
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/15410 Just to raise an idea that would possibly mean less code change, would simply having a flag that causes a "currently processing applications" type message to display without an actual count with

[GitHub] spark issue #15307: [SPARK-17731][SQL][STREAMING] Metrics for structured str...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15307 **[Test build #66757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66757/consoleFull)** for PR 15307 at commit

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876529 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry:

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876442 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -412,6 +419,63 @@ class UDFRegistration private[sql] (functionRegistry:

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876419 --- Diff: python/pyspark/sql/context.py --- @@ -202,6 +202,26 @@ def registerFunction(self, name, f, returnType=StringType()): """

[GitHub] spark pull request #9766: [SPARK-11775][PYSPARK][SQL] Allow PySpark to regis...

2016-10-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/9766#discussion_r82876509 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/UDFRegistration.scala --- @@ -17,9 +17,15 @@ package org.apache.spark.sql +

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-11 Thread ericl
Github user ericl commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82875191 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec( }

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66750/ Test FAILed. ---

[GitHub] spark issue #15425: [SPARK-17816] [Core] [Branch-2.0] Fix ConcurrentModifica...

2016-10-11 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15425 LGTM. Merging to 2.0. Thanks!

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15375 Merged build finished. Test FAILed.

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #66750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66750/consoleFull)** for PR 15375 at commit

[GitHub] spark pull request #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiv...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15431

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15431 I'll go ahead and merge with master. Thanks!

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread brkyvz
Github user brkyvz commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82872469 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -176,7 +184,9 @@ class StreamExecution(

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66751/consoleFull)** for PR 14690 at commit

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Merged build finished. Test FAILed.

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14690 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66751/ Test FAILed. ---

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Merged build finished. Test FAILed.

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15408 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66749/ Test FAILed. ---

[GitHub] spark issue #15408: [SPARK-17839][CORE] Use Nio's directbuffer instead of Bu...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15408 **[Test build #66749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66749/consoleFull)** for PR 15408 at commit

[GitHub] spark pull request #15148: [SPARK-5992][ML] Locality Sensitive Hashing

2016-10-11 Thread Yunni
Github user Yunni commented on a diff in the pull request: https://github.com/apache/spark/pull/15148#discussion_r82871238 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/RandomProjection.scala --- @@ -0,0 +1,146 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82869819 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"longer

[GitHub] spark pull request #15382: [SPARK-17810] [SQL] Default spark.sql.warehouse.d...

2016-10-11 Thread avulanov
Github user avulanov commented on a diff in the pull request: https://github.com/apache/spark/pull/15382#discussion_r82869165 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -757,7 +758,10 @@ private[sql] class SQLConf extends Serializable with

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 It might not be an R-specific issue. I am trying to create a test case on the Scala side in SQLUtilsSuite.scala.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 I think we should find the root cause of the negative length of the "NA" field. Yesterday, I debugged the R side and have not found the reason yet.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 New test on Mac: > df <- data.frame(Date = as.Date(c(rep("2016-01-10", 10), "NA", "NA")), id = 1:12) > > dim(createDataFrame(df)) 16/10/11 12:10:30 ERROR Executor:

[GitHub] spark issue #13440: [SPARK-15699] [ML] Implement a Chi-Squared test statisti...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/13440 **[Test build #66756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66756/consoleFull)** for PR 13440 at commit

[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-11 Thread tgravescs
Github user tgravescs commented on the issue: https://github.com/apache/spark/pull/15410 Ah, yeah, startup would definitely be a good case for this and, like I mentioned, it's better than nothing, so I'm OK with the concept. I wonder for the other use case where it hasn't looked in ~

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82865118 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,24 @@ private[spark] object SerDe { } def

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 MacBook Pro (Retina, 15-inch, Mid 2015). This is the machine that I tested the patch on.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15421 **[Test build #66755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66755/consoleFull)** for PR 15421 at commit

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 @falaki I saw the exception on Mac too, but I haven't found the root cause of the negative length in the input stream. Catching the exception will solve the problem. Do you want to explore the reason

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/15421 @wangmiao1981 thanks for testing on Windows. I added a check for this. Would you please try again and let me know? Unfortunately, I don't have access to a Windows box for testing.

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82864117 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,24 @@ private[spark] object SerDe { } def

[GitHub] spark pull request #15335: [SPARK-17769][Core][Scheduler]Some FetchFailure r...

2016-10-11 Thread markhamstra
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/15335#discussion_r82864060 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1255,27 +1255,46 @@ class DAGScheduler( s"longer

[GitHub] spark pull request #15421: [SPARK-17811] SparkR cannot parallelize data.fram...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on a diff in the pull request: https://github.com/apache/spark/pull/15421#discussion_r82863921 --- Diff: core/src/main/scala/org/apache/spark/api/r/SerDe.scala --- @@ -125,15 +125,24 @@ private[spark] object SerDe { } def

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 >> Finally, this would require us to read the schema files. That's something I'm trying to avoid in this patch. > Not sure what you mean here, but the parquet change should be execution

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread jkbradley
Github user jkbradley commented on the issue: https://github.com/apache/spark/pull/15431 LGTM2 Thanks! Is it fine with you if this just gets fixed in master, not branch-2.0 (since the other PR is not in branch-2.0 since it adds a new public API)?

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread wangmiao1981
Github user wangmiao1981 commented on the issue: https://github.com/apache/spark/pull/15421 @falaki I patched your fix to a clean build. I still see the following error: > df <- data.frame(Date = as.Date(c(rep("2016-01-10", 10), "NA", "NA")), id = 1:12) > >

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11336 Merged build finished. Test FAILed.

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11336 **[Test build #66753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66753/consoleFull)** for PR 11336 at commit

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/11336 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66753/ Test FAILed. ---

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15421 hmm, still the same error in the new test case in appveyor ``` Failed - 1. Error: SPARK-17811: can create

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread ericl
Github user ericl commented on the issue: https://github.com/apache/spark/pull/14690 > For one thing, a ListingFileCatalog performs a file tree traversal right off the bat. However, the external catalog returns the locations of partitions as part of the listPartitionsByFilter call. I

[GitHub] spark issue #15431: [SPARK-15153] [ML] [SparkR] Fix SparkR spark.naiveBayes ...

2016-10-11 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/15431 LGTM

[GitHub] spark pull request #15389: [SPARK-17817][PySpark] PySpark RDD Repartitioning...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15389

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15421 Merged build finished. Test PASSed.

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15421 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66746/ Test PASSed. ---

[GitHub] spark issue #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15295 LGTM except that one comment on naming.

[GitHub] spark pull request #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15295#discussion_r82860005 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala --- @@ -132,4 +136,9 @@ class RuntimeConfig private[sql](sqlConf: SQLConf = new

[GitHub] spark pull request #15295: [SPARK-17720][SQL] introduce static SQL conf

2016-10-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/15295#discussion_r82859976 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RuntimeConfig.scala --- @@ -36,6 +37,7 @@ class RuntimeConfig private[sql](sqlConf: SQLConf = new

[GitHub] spark issue #15421: [SPARK-17811] SparkR cannot parallelize data.frame with ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15421 **[Test build #66746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66746/consoleFull)** for PR 15421 at commit

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/66747/ Test PASSed. ---

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14719 Merged build finished. Test PASSed.

[GitHub] spark issue #14719: [SPARK-17154][SQL] Wrong result can be returned or Analy...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14719 **[Test build #66747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66747/consoleFull)** for PR 14719 at commit

[GitHub] spark issue #15432: [SPARK-17854][SQL] rand/randn allows null as input seed

2016-10-11 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15432 hm - maybe we should just cast any NullType input into some concrete type defined by an ExpectsInputTypes expression?

[GitHub] spark issue #15437: [SPARK-17876] Write StructuredStreaming WAL to a stream ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15437 **[Test build #66754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66754/consoleFull)** for PR 15437 at commit

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82858522 --- Diff: python/pyspark/sql/streaming.py --- @@ -189,6 +189,282 @@ def resetTerminated(self): self._jsqm.resetTerminated() +class

[GitHub] spark pull request #15437: [SPARK-17876] Write StructuredStreaming WAL to a ...

2016-10-11 Thread brkyvz
GitHub user brkyvz opened a pull request: https://github.com/apache/spark/pull/15437 [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes

[GitHub] spark issue #15375: [SPARK-17790][SPARKR] Support for parallelizing R data.f...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15375 **[Test build #3324 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3324/consoleFull)** for PR 15375 at commit

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread koeninger
Github user koeninger commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82857280 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82856942 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -530,7 +692,7 @@ class StreamExecution( case

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82857011 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +563,127 @@ class StreamExecution(

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82856924 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -221,8 +247,15 @@ class StreamExecution(

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82856802 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -105,11 +105,21 @@ class StreamExecution( var

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82856363 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamMetrics.scala --- @@ -0,0 +1,244 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 I believe that using a method like `TableFileCatalog.filterPartitions` to build a new file catalog restricted to some pruned partitions is a sound approach, however I'm starting to reconsider the

[GitHub] spark issue #11336: [SPARK-9325][SPARK-R] collect() head() and show() for Co...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/11336 **[Test build #66753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66753/consoleFull)** for PR 11336 at commit

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82855608 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -516,12 +563,127 @@ class StreamExecution(

[GitHub] spark pull request #15307: [SPARK-17731][SQL][STREAMING] Metrics for structu...

2016-10-11 Thread tdas
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/15307#discussion_r82855520 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala --- @@ -176,7 +184,9 @@ class StreamExecution( //

[GitHub] spark pull request #15436: [SPARK-17875] [BUILD] Remove unneeded direct depe...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15436#discussion_r82854783 --- Diff: NOTICE --- @@ -162,7 +162,7 @@ Please visit the Netty web site for more information: * http://netty.io/ -Copyright 2011 The

[GitHub] spark pull request #15436: [SPARK-17875] [BUILD] Remove unneeded direct depe...

2016-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/15436#discussion_r82854900 --- Diff: dev/deps/spark-deps-hadoop-2.3 --- @@ -130,7 +130,6 @@ metrics-json-3.1.2.jar metrics-jvm-3.1.2.jar minlog-1.3.0.jar mx4j-3.0.2.jar

[GitHub] spark issue #15436: [SPARK-17875] [BUILD] Remove unneeded direct dependence ...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15436 **[Test build #66752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66752/consoleFull)** for PR 15436 at commit

[GitHub] spark pull request #15436: [SPARK-17875] [BUILD] Remove unneeded direct depe...

2016-10-11 Thread srowen
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/15436 [SPARK-17875] [BUILD] Remove unneeded direct dependence on Netty 3.x ## What changes were proposed in this pull request? Remove unneeded direct dependency on Netty 3.x. I left the

[GitHub] spark issue #15190: [SPARK-17620][SQL] Determine Serde by hive.default.filef...

2016-10-11 Thread dilipbiswal
Github user dilipbiswal commented on the issue: https://github.com/apache/spark/pull/15190 @yhuai We will use Parquet format in your example. We look at the `spark.sql.sources.default` configuration to decide on the format to use? Here is the output for your perusal.

[GitHub] spark issue #15410: [SPARK-17843][Web UI] Indicate event logs pending for pr...

2016-10-11 Thread vijoshi
Github user vijoshi commented on the issue: https://github.com/apache/spark/pull/15410 @tgravescs - you're right - for newer logs that are generated, there could be a window of time (10 secs or whatever the user configures) where the new logs are not picked up for replay and the UI

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/15422 > For example, in MR you have the ability to even set the percentage of bad records you want to tolerate (we don't have that in Spark). I may be wrong. But in MR, I think bad records just

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14690 **[Test build #66751 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/66751/consoleFull)** for PR 14690 at commit

[GitHub] spark issue #14690: [SPARK-16980][SQL] Load only catalog table partition met...

2016-10-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/14690 Ah cripes. I committed something I didn't want to. I'm rebasing again in a few...

[GitHub] spark issue #15435: [SPARK-17139][ML] Add model summary for MultinomialLogis...

2016-10-11 Thread sethah
Github user sethah commented on the issue: https://github.com/apache/spark/pull/15435 I'll try to take a look before too long. For now, I see there are no tests, could you please add tests, using the summary tests for binary classification as a guide? Thanks!

[GitHub] spark pull request #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic dele...

2016-10-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15384

[GitHub] spark issue #15384: [SPARK-17346][SQL][Tests]Fix the flaky topic deletion in...

2016-10-11 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/15384 Merging it to master, and branch 2.0

[GitHub] spark pull request #14690: [SPARK-16980][SQL] Load only catalog table partit...

2016-10-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/14690#discussion_r82850487 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -225,13 +225,16 @@ case class FileSourceScanExec( }

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15422 @mridulm for the scenario you're imagining, maybe the data is OK, sure. That doesn't mean it's true in all cases. Yeah, this is really to work around bad input, which you can to some degree do at

[GitHub] spark issue #13194: [SPARK-15402] [ML] [PySpark] PySpark ml.evaluation shoul...

2016-10-11 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/13194 Just one question, otherwise LGTM

[GitHub] spark pull request #13194: [SPARK-15402] [ML] [PySpark] PySpark ml.evaluatio...

2016-10-11 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/13194#discussion_r82849461 --- Diff: python/pyspark/ml/evaluation.py --- @@ -21,7 +21,8 @@ from pyspark.ml.wrapper import JavaParams from pyspark.ml.param import Param,

[GitHub] spark pull request #15398: [SPARK-17647][SQL] Fix backslash escaping in 'LIK...

2016-10-11 Thread jodersky
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/15398#discussion_r82849408 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/StringUtils.scala --- @@ -25,26 +25,25 @@ object StringUtils { //

[GitHub] spark issue #15422: [SPARK-17850][Core]HadoopRDD should not catch EOFExcepti...

2016-10-11 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/15422 @marmbrus +1 on logging, that is definitely something which was probably missed here.
