[GitHub] spark issue #19571: [SPARK-15474][SQL] Write and read back non-empty schema ...

2017-10-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19571 @gatorsmile and @cloud-fan . For ORC compatibility, I checked the ORC code, but it's not clearly tested. I'll try to add some suite as a separate issue.

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #83084 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83084/testReport)** for PR 19480 at commit

[GitHub] spark pull request #19577: [SPARK-22355][SQL] Dataset.collect is not threads...

2017-10-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19577#discussion_r147194202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -2661,7 +2657,12 @@ class Dataset[T] private[sql]( */ def

[GitHub] spark issue #19577: [SPARK-22355][SQL] Dataset.collect is not threadsafe

2017-10-26 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19577 Good catch, LGTM. Here is my observation. This problem occurs since (this

[GitHub] spark issue #19578: [SPARK-21983][SQL] Fix Antlr 4.7 deprecation warnings

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19578 **[Test build #83086 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83086/testReport)** for PR 19578 at commit

[GitHub] spark issue #19578: [SPARK-21983][SQL] Fix Antlr 4.7 deprecation warnings

2017-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19578 Jenkins, add to whitelist --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19383: [SPARK-20643][core] Add listener implementation t...

2017-10-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19383

[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...

2017-10-26 Thread Tagar
Github user Tagar commented on the issue: https://github.com/apache/spark/pull/19553 @Whoosh can you please check if there is a similar problem that affects

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-26 Thread sathiyapk
Github user sathiyapk commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r147190951 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/ReplaceExceptWithFilter.scala --- @@ -0,0 +1,111 @@ +/* + *

[GitHub] spark issue #19383: [SPARK-20643][core] Add listener implementation to colle...

2017-10-26 Thread squito
Github user squito commented on the issue: https://github.com/apache/spark/pull/19383 merged to master. thanks @vanzin

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread susanxhuynh
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147162791 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread susanxhuynh
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147165029 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +177,88 @@

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread susanxhuynh
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147165077 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerBackendUtil.scala --- @@ -173,6 +177,88 @@

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread susanxhuynh
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147161356 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] spark pull request #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread susanxhuynh
Github user susanxhuynh commented on a diff in the pull request: https://github.com/apache/spark/pull/19437#discussion_r147159788 --- Diff: docs/running-on-mesos.md --- @@ -485,39 +485,87 @@ See the [configuration page](configuration.html) for information on Spark config

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19437 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83085/ Test PASSed.

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19437 Merged build finished. Test PASSed.

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19437 **[Test build #83085 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83085/testReport)** for PR 19437 at commit

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19437 **[Test build #83085 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83085/testReport)** for PR 19437 at commit

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r147179694 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -118,7 +118,8 @@ private [sql] object

[GitHub] spark issue #16479: [SPARK-19085][SQL] cleanup OutputWriterFactory and Outpu...

2017-10-26 Thread lokkju
Github user lokkju commented on the issue: https://github.com/apache/spark/pull/16479 So it turns out just copying the conversion code doesn't work, as seen in spark-avro/#240 - and now I'm running into the same thing writing my own datasource. As a datasource in the end requires

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19570 Merged build finished. Test PASSed.

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19570 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83083/ Test PASSed.

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19570 **[Test build #83083 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83083/testReport)** for PR 19570 at commit

[GitHub] spark pull request #19553: [SPARK-22330][CORE] Linear containsKey operation ...

2017-10-26 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19553#discussion_r147161700 --- Diff: core/src/main/scala/org/apache/spark/api/java/JavaUtils.scala --- @@ -43,10 +43,15 @@ private[spark] object JavaUtils { override def

[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19553 **[Test build #3962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3962/testReport)** for PR 19553 at commit

[GitHub] spark issue #19553: [SPARK-22330][CORE] Linear containsKey operation for ser...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19553 **[Test build #3962 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3962/testReport)** for PR 19553 at commit

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-26 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19563#discussion_r147145637 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/HashExpressionsSuite.scala --- @@ -639,6 +639,63 @@ class

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #83084 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83084/testReport)** for PR 19480 at commit

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Merged build finished. Test PASSed.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83081/ Test PASSed.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83081 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83081/testReport)** for PR 19559 at commit

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19563 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83082/ Test PASSed.

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19563 Merged build finished. Test PASSed.

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19563 **[Test build #83082 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r147126818 --- Diff: python/pyspark/sql/types.py --- @@ -1619,11 +1619,38 @@ def to_arrow_type(dt): arrow_type = pa.decimal(dt.precision, dt.scale)

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r147126889 --- Diff: python/pyspark/sql/dataframe.py --- @@ -1880,11 +1880,13 @@ def toPandas(self): import pandas as pd if

[GitHub] spark pull request #18664: [SPARK-21375][PYSPARK][SQL] Add Date and Timestam...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18664#discussion_r147127081 --- Diff: python/pyspark/sql/types.py --- @@ -1619,11 +1619,38 @@ def to_arrow_type(dt): arrow_type = pa.decimal(dt.precision, dt.scale)

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 Yes, it changed the probability of samples indeed compared with current code. But according to the comments coming from @jkbradley in #18924 , "in order to make **corpusSize**, batchSize,

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-26 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19563#discussion_r147129483 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -389,9 +389,10 @@ abstract class HashExpression[E]

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83080 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83080/testReport)** for PR 19559 at commit

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 And the empty docs were not explicitly filtered out. They've just been ignored in `submitMiniBatch`.

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 I'm saying they are not the same, but for larger datasets this should not matter. There is a change in logic. The hack with `val batchSize = (miniBatchFraction *

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Merged build finished. Test FAILed.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83080/ Test FAILed.

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19565 I agree, they're the same. You said at https://github.com/apache/spark/pull/19565#issuecomment-339638791 that they weren't. But if you're saying the code already filters out empty docs

[GitHub] spark pull request #19510: [SPARK-22292][Mesos] Added spark.mem.max support ...

2017-10-26 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19510#discussion_r147121858 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -64,6 +64,7 @@

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 Consider the following scenario. Let `docs` be an RDD containing 1000 empty documents and 1000 non-empty documents and let `miniBatchFraction = 0.05`. Assume we use

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19570 **[Test build #83083 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83083/testReport)** for PR 19570 at commit

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19570 LGTM

[GitHub] spark issue #19570: [SPARK-22335][SQL] Clarify union behavior on Dataset of ...

2017-10-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19570 @HyukjinKwon Thanks for reviewing.

[GitHub] spark pull request #19551: [SPARK-17902][R] Revive stringsAsFactors option f...

2017-10-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19551

[GitHub] spark issue #19551: [SPARK-17902][R] Revive stringsAsFactors option for coll...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19551 Thank you for review @falaki and @felixcheung.

[GitHub] spark issue #19551: [SPARK-17902][R] Revive stringsAsFactors option for coll...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19551 Merged to master, branch-2.2 and branch-2.1.

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

2017-10-26 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19570#discussion_r147119002 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql]( * * Also as

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

2017-10-26 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19570#discussion_r147118974 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql]( * * Also as

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19565 I assume not-selecting a record in a sample is cheaper than just about any other operation, including filtering on a predicate. All else equal, I'd rather sample, then evaluate a predicate on only

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19570#discussion_r147117969 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql]( * * Also

[GitHub] spark pull request #19570: [SPARK-22335][SQL] Clarify union behavior on Data...

2017-10-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19570#discussion_r147117297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1753,6 +1753,27 @@ class Dataset[T] private[sql]( * * Also

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 @WeichenXu123, yes there indeed is a difference in logic. Eventually it boils down to semantics of `miniBatchFraction`. If it is a fraction of non-empty documents being sampled, the version with

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19480 Merged build finished. Test FAILed.

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19480 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83079/ Test FAILed.

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #83079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83079/testReport)** for PR 19480 at commit

[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

2017-10-26 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19573 @DonnyZone Thanks for taking a look. I think not quite the same. After https://github.com/apache/spark/pull/18270, all `grouping__id` are transformed to be `GroupingID` , which makes

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-26 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19563#discussion_r147106199 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -389,9 +389,15 @@ abstract class HashExpression[E] extends

[GitHub] spark issue #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit problem i...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19563 **[Test build #83082 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83082/testReport)** for PR 19563 at commit

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83081 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83081/testReport)** for PR 19559 at commit

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-10-26 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 @hvanhovell @tejasapatil would it be possible to review this? Or, do you know who is appropriate for reviewing this?

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-26 Thread thunterdb
Github user thunterdb commented on the issue: https://github.com/apache/spark/pull/19439 @hhbyyh I recall now the reason for an extra `origin` field, which is to get around the standard issue of many small image files in S3 or other distributed file systems. It is standard to compact

[GitHub] spark issue #9282: [SPARK-10986][Mesos] Set the context class loader in the ...

2017-10-26 Thread klion26
Github user klion26 commented on the issue: https://github.com/apache/spark/pull/9282 Received a ClassNotFound error in yarn-cluster mode (Spark 1.6.2); _doesn't reproduce the problem_. The error message is shown below: ``` [2017-10-26 16:53:18,274] ERROR Error while

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19565 Filtering after sampling makes more sense. Though sampling isn't deterministic, it doesn't change the probability that any particular sample is produced.
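
The point above can be checked with a small simulation. This is a minimal sketch, not the code from the pull request: plain Python lists stand in for the RDD, the helper names `bernoulli_sample`, `sample_then_filter`, `filter_then_sample` and `average_over_trials` are hypothetical, and sampling is assumed to be independent per document (Bernoulli, without replacement). The corpus sizes and the fraction 0.05 are borrowed from the scenario akopich describes in this thread.

```python
import random


def is_non_empty(doc):
    """Predicate standing in for the 'document has at least one term' check."""
    return len(doc) > 0


def bernoulli_sample(docs, fraction, rng):
    """Keep each document independently with probability `fraction`."""
    return [doc for doc in docs if rng.random() < fraction]


def sample_then_filter(docs, fraction, rng):
    """Sample first, then drop empty documents from the sample only."""
    sampled = bernoulli_sample(docs, fraction, rng)
    kept = [doc for doc in sampled if is_non_empty(doc)]
    return kept, len(sampled)  # predicate ran once per sampled document


def filter_then_sample(docs, fraction, rng):
    """Drop empty documents from the whole corpus, then sample."""
    non_empty = [doc for doc in docs if is_non_empty(doc)]
    kept = bernoulli_sample(non_empty, fraction, rng)
    return kept, len(docs)  # predicate ran once per document in the corpus


def average_over_trials(strategy, docs, fraction, trials, seed):
    """Return (mean number of kept documents, mean predicate evaluations)."""
    rng = random.Random(seed)
    total_kept = 0
    total_evals = 0
    for _ in range(trials):
        kept, evals = strategy(docs, fraction, rng)
        total_kept += len(kept)
        total_evals += evals
    return total_kept / trials, total_evals / trials


# 1000 empty and 1000 non-empty documents, as in the scenario from the thread.
DOCS = [[] for _ in range(1000)] + [[i] for i in range(1000)]
MINI_BATCH_FRACTION = 0.05

if __name__ == "__main__":
    for strategy in (sample_then_filter, filter_then_sample):
        mean_kept, mean_evals = average_over_trials(
            strategy, DOCS, MINI_BATCH_FRACTION, trials=200, seed=42
        )
        print(strategy.__name__, mean_kept, mean_evals)
```

Under these assumptions each non-empty document survives with probability 0.05 in either order, so both strategies keep about 0.05 × 1000 = 50 documents on average. The difference is cost: sampling first evaluates the predicate on roughly 0.05 × 2000 = 100 documents instead of all 2000.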

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Merged build finished. Test FAILed.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19559 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83078/ Test FAILed.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83078/testReport)** for PR 19559 at commit

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83080 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83080/testReport)** for PR 19559 at commit

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19565 @akopich IMO the filter won't cost too much, don't worry about the performance. (Or you can make a test to make sure)

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/19565 I am sure that caching may be avoided here. Hence, it should not be used. @srowen, maybe I don't get something, but I'm afraid that currently the lineage for a single mini-batch submission

[GitHub] spark issue #19480: [SPARK-22226][SQL] splitExpression can create too many m...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19480 **[Test build #83079 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83079/testReport)** for PR 19480 at commit

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147087721 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -175,17 +176,36 @@ trait

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147087477 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -451,15 +465,20 @@ trait

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread skonto
Github user skonto commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147087405 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -349,13 +349,22

[GitHub] spark issue #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should filter ou...

2017-10-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19565 Regarding caching: I think that can be ignored for purposes of this change. All this does is add a filter, and it doesn't cause an RDD to be computed more than it was before. The only question

[GitHub] spark pull request #19563: [SPARK-22284][SQL] Fix 64KB JVM bytecode limit pr...

2017-10-26 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19563#discussion_r147084523 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/hash.scala --- @@ -389,9 +389,15 @@ abstract class HashExpression[E]

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-26 Thread DaimonPl
Github user DaimonPl commented on the issue: https://github.com/apache/spark/pull/16578 @mallman how about finalizing it as is? IMHO performance improvements are worth more than (possibly) redundant workaround - it could be cleaned later

[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

2017-10-26 Thread DonnyZone
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19573 Is it similar to the below issue? https://github.com/apache/spark/pull/19178

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r147075121 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,258 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147069130 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -427,10 +441,10 @@ trait

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread DonnyZone
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/19559 @gatorsmile There are still two issues that need to be figured out. (1) It will be complicated to determine whether a literal function should be resolved as Expression or

[GitHub] spark pull request #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, ...

2017-10-26 Thread DonnyZone
Github user DonnyZone commented on a diff in the pull request: https://github.com/apache/spark/pull/19559#discussion_r147068164 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -783,6 +783,25 @@ class Analyzer( }

[GitHub] spark pull request #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, ...

2017-10-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19559#discussion_r147068227 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -844,7 +863,12 @@ class Analyzer(

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147067951 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosCoarseGrainedSchedulerBackend.scala --- @@ -349,13 +349,22

[GitHub] spark pull request #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, ...

2017-10-26 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19559#discussion_r147067888 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -783,6 +783,25 @@ class Analyzer( }

[GitHub] spark pull request #19519: [SPARK-21840][core] Add trait that allows conf to...

2017-10-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19519

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147067641 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -451,15 +465,20 @@ trait

[GitHub] spark pull request #19390: [SPARK-18935][MESOS] Fix dynamic reservations on ...

2017-10-26 Thread ArtRand
Github user ArtRand commented on a diff in the pull request: https://github.com/apache/spark/pull/19390#discussion_r147066762 --- Diff: resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala --- @@ -175,17 +176,36 @@ trait

[GitHub] spark issue #19519: [SPARK-21840][core] Add trait that allows conf to be dir...

2017-10-26 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19519 LGTM, merging to master.

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19559 **[Test build #83078 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83078/testReport)** for PR 19559 at commit

[GitHub] spark issue #19559: [SPARK-22333][SQL]timeFunctionCall(CURRENT_DATE, CURRENT...

2017-10-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19559 retest this please

[GitHub] spark pull request #19529: [SPARK-22308] Support alternative unit testing st...

2017-10-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19529

[GitHub] spark issue #19529: [SPARK-22308] Support alternative unit testing styles in...

2017-10-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19529 Thanks! Merged to master.
