[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19916 @cloud-fan I think a lot of caller-side changes would be needed as well, since now we are not passing any variable name when we are referencing with `addReferenceMinorObj`. I am not sure that

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156077696 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark issue #19928: [SPARK-22267][SQL][TEST] Spark SQL incorrectly reads ORC...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19928 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156077470 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156077509 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156077428 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19792 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84708/ Test PASSed. ---

[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19792 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19792 **[Test build #84708 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84708/testReport)** for PR 19792 at commit

[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19916 adding a local variable to generate more readable code is better, but that needs a lot of caller-side change, which may not worth. ---

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156074485 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -105,6 +105,11 @@ abstract class Expression extends

[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19940#discussion_r156073714 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -184,6 +190,27 @@ class CodegenContext {

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19940 Oh, the initialization is not right away in declaration. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19940 Shall we make them as `final` variables to guarantee this? I think this is an important requirement to prevent something wrong when wrongly using the shared global variables. ---

[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-12-11 Thread gberger
Github user gberger commented on the issue: https://github.com/apache/spark/pull/19792 Good catch @HyukjinKwon! I reverted those changes and added a test to cover this regression. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19940 @viirya this is the requirement I followed in this change which ensures that is it safe to share the variable across all the operators, since all the access are read only and there cannot be

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19940 @mgaido91 Do you mean the shared global variables are required to be only assigned once (initialization) and never changed again? ---

[GitHub] spark issue #19792: [SPARK-22566][PYTHON] Better error message for `_merge_t...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19792 **[Test build #84708 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84708/testReport)** for PR 19792 at commit

[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19942#discussion_r156069510 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -291,6 +291,18 @@ private[deploy] class SparkSubmitArguments(args:

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156067549 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156066515 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156066138 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156065731 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark issue #19942: [SPARK-22754][DEPLOY] Check whether spark.executor.heart...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19942 Can one of the admins verify this patch? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156064910 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19942: [SPARK-22754][DEPLOY] Check whether spark.executo...

2017-12-11 Thread caneGuy
GitHub user caneGuy opened a pull request: https://github.com/apache/spark/pull/19942 [SPARK-22754][DEPLOY] Check whether spark.executor.heartbeatInterval bigger… … than spark.network.timeout or not ## What changes were proposed in this pull request? If

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156062977 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/ExpressionCodegen.scala --- @@ -0,0 +1,246 @@ +/* + *

[GitHub] spark pull request #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluato...

2017-12-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19676 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-11 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19893 The `TestHiveContext` is shared among test suites, maybe it's not a good time to change this for now. Could we create a new class that inherit from `SparkFunSuite` that examines leaking

[GitHub] spark issue #19676: [SPARK-14516][FOLLOWUP] Adding ClusteringEvaluator to ex...

2017-12-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19676 Merged to master --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156060138 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer(

[GitHub] spark issue #19919: [SPARK-22727] spark.executor.instances's default value s...

2017-12-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19919 If the default is correctly described as 2, then I think there is nothing to do here and this should be closed. This change will cause other problems, at least. ---

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156059425 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer(

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156059043 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer(

[GitHub] spark issue #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAndRead

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19941 **[Test build #84707 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84707/testReport)** for PR 19941 at commit

[GitHub] spark pull request #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when w...

2017-12-11 Thread hvanhovell
Github user hvanhovell commented on a diff in the pull request: https://github.com/apache/spark/pull/19193#discussion_r156058341 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -1920,7 +1927,34 @@ class Analyzer(

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156057964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -105,6 +105,11 @@ abstract class Expression

[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19940#discussion_r156057818 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -184,6 +190,27 @@ class CodegenContext

[GitHub] spark pull request #19941: [SPARK-22753][SQL] Get rid of dataSource.writeAnd...

2017-12-11 Thread xuanyuanking
GitHub user xuanyuanking opened a pull request: https://github.com/apache/spark/pull/19941 [SPARK-22753][SQL] Get rid of dataSource.writeAndRead ## What changes were proposed in this pull request? As the discussion in https://github.com/apache/spark/pull/16481 and

[GitHub] spark issue #19805: [SPARK-22649][PYTHON][SQL] Adding localCheckpoint to Dat...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19805 **[Test build #84706 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84706/testReport)** for PR 19805 at commit

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156057427 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1001,16 +1039,25 @@ class

[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19916 @cloud-fan thanks for your answer. I don't think that something like: ``` ((MyClass) reference[3] /* variableName */).doSomething(); ``` would be readable. But it is just my opinion.

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156056362 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala --- @@ -105,6 +105,11 @@ abstract class Expression

[GitHub] spark pull request #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nest...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19813#discussion_r156055243 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -55,8 +55,42 @@ import

[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19940#discussion_r156054632 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -184,6 +190,27 @@ class

[GitHub] spark issue #19916: [SPARK-22716][SQL] Replace addReferenceObj to reduce the...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19916 In a real application I think the difference is much smaller than 2.4%. So I think it's ok to remove `addReferenceObj`. One problem is readability of the generated code, may be we can

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19937 **[Test build #84705 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84705/testReport)** for PR 19937 at commit

[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14151 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84704/ Test FAILed. ---

[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14151 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14151 **[Test build #84704 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84704/testReport)** for PR 14151 at commit

[GitHub] spark issue #19193: [WIP][SPARK-21896][SQL] Fix Stack Overflow when window f...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19193 cc @hvanhovell --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19940#discussion_r156052035 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -184,6 +190,27 @@ class CodegenContext

[GitHub] spark pull request #19940: [SPARK-22750][SQL] Reuse mutable states when poss...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19940#discussion_r156051417 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -184,6 +190,27 @@ class

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19937 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19937 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84703/ Test FAILed. ---

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19937 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19937 **[Test build #84703 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84703/testReport)** for PR 19937 at commit

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19940 @viirya we have seen that using arrays affects performance. Thus if we can reduce their usage it is better. I don't think that debugging is harder. These variables I made shared are never

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156037616 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -48,9 +48,26 @@ object ExtractPythonUDFFromAggregate

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156028776 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala --- @@ -32,7 +31,5 @@ case class PythonUDF(

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156034167 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/AggregateInPandasExec.scala --- @@ -0,0 +1,143 @@ +/* + * Licensed to the

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156037157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ExtractPythonUDFs.scala --- @@ -48,9 +48,26 @@ object ExtractPythonUDFFromAggregate

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156031375 --- Diff: python/pyspark/sql/tests.py --- @@ -4016,6 +4016,124 @@ def test_unsupported_types(self): with self.assertRaisesRegexp(Exception,

[GitHub] spark pull request #19872: WIP: [SPARK-22274][PySpark] User-defined aggregat...

2017-12-11 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/19872#discussion_r156038036 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala --- @@ -15,10 +15,9 @@ * limitations under the

[GitHub] spark issue #19813: [SPARK-22600][SQL] Fix 64kb limit for deeply nested expr...

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19813 ping @cloud-fan Previous comments are all addressed. Please review this again. Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19940 A high level question is, do we need to share mutable status, if we can compact global variables into array later? Will sharing mutable status increase the difficulty of debugging codegen in

[GitHub] spark issue #19929: [SPARK-22629][PYTHON] Add deterministic flag to pyspark ...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19929 @gatorsmile sorry, I saw that you did the path for scala UDF. Might you help reviewing this please? Thanks. --- - To

[GitHub] spark issue #19940: [SPARK-22750][SQL] Reuse mutable states when possible

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19940 @cloud-fan @kiszk @viirya might you please help reviewing this? Thanks. --- - To unsubscribe, e-mail:

[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19811 @kiszk I see, thanks. Wouldn't be better anyway to store all the initialization in the same array so that we ensure that the ordering is the same as before this patch? ---

[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19938 Good point. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19893: [SPARK-16139][TEST] Add logging functionality for leaked...

2017-12-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19893 I've analysed the hive related test flow and found SparkSession and SQLContext sharing between suites as you mentioned. Here is the execution flow: 1. The first hive test suite

[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-11 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19811 Good question, got it. There are some cases that we have to ensure order of declarations. The current code asks developers to ensure it by using `inline = true`. We can see [an

[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19811 @kiszk yes, I know, but I meant a different thing. I will try to explain with an example, let me know if I am not clear enough. Since we are not defining everything as an array and we are

[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14151 **[Test build #84704 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84704/testReport)** for PR 14151 at commit

[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-11 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19811 @mgaido91 The latest code does not generate a loop for initializations. This is an optimization to reduce the JVM bytecode size. As @cloud-fan pointed out

[GitHub] spark issue #19911: [SPARK-22729][SQL] Add getTruncateQuery to JdbcDialect

2017-12-11 Thread danielvdende
Github user danielvdende commented on the issue: https://github.com/apache/spark/pull/19911 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19715: [SPARK-22397][ML]add multiple columns support to ...

2017-12-11 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19715#discussion_r156010806 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/QuantileDiscretizer.scala --- @@ -129,34 +156,102 @@ final class QuantileDiscretizer

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19937 LGTM too --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark issue #19811: [SPARK-18016][SQL] Code Generation: Constant Pool Limit ...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19811 hi @kiszk, have you already checked why there are python failures? I wanted to share a concern I have with the current implementation. Maybe, it is not a problem, but I think that currently we are

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19937 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156010028 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -513,26 +513,28 @@ case class SortMergeJoinExec(

[GitHub] spark issue #19937: [SPARK-22746][SQL] Avoid the generation of useless mutab...

2017-12-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19937 **[Test build #84703 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84703/testReport)** for PR 19937 at commit

[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156007853 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -513,26 +513,28 @@ case class SortMergeJoinExec(

[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156005276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -513,26 +513,28 @@ case class SortMergeJoinExec(

[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156005086 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -617,6 +619,7 @@ case class SortMergeJoinExec(

[GitHub] spark pull request #19937: [SPARK-22746][SQL] Avoid the generation of useles...

2017-12-11 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19937#discussion_r156004320 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/SortMergeJoinExec.scala --- @@ -513,26 +513,28 @@ case class SortMergeJoinExec(

[GitHub] spark pull request #19811: [SPARK-18016][SQL] Code Generation: Constant Pool...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19811#discussion_r156003506 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -177,11 +189,64 @@ class

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r156002166 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -805,26 +908,44 @@ case class

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r15593 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -1070,6 +1071,24 @@ class

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r156003342 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -256,6 +258,85 @@ case class

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r156000426 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/CodeGenerationSuite.scala --- @@ -380,4 +380,19 @@ class

[GitHub] spark pull request #19082: [SPARK-21870][SQL] Split aggregation code into sm...

2017-12-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19082#discussion_r156002134 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/HashAggregateExec.scala --- @@ -863,25 +984,43 @@ case class

[GitHub] spark issue #19938: [SPARK-22747][SQL] Localize lifetime of mutable states i...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19938 actually there is one real problem: after we fold many global variables into an array, the variable name may become something like `arr[1]`, which can't be used as the parameter name.

[GitHub] spark issue #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH TIME ZON...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19939 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19939: [SPARK-20557] [SQL] Only support TIMESTAMP WITH T...

2017-12-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19939#discussion_r156002148 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -235,6 +239,61 @@ class

<    1   2   3   4   5