[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143810176 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143813871 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143814023 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143817365 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -646,18 +648,14 @@ class CoarseGrainedSchedulerB

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143817162 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -601,40 +602,41 @@ class CoarseGrainedSchedulerB

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143816092 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143819863 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala --- @@ -237,6 +246,43 @@ class BlockManagerMasterEndpoint(

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143818025 --- Diff: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala --- @@ -698,6 +696,11 @@ class CoarseGrainedSchedulerBa

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143802830 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143803799 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143799387 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143801381 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143816714 --- Diff: core/src/main/scala/org/apache/spark/internal/config/package.scala --- @@ -76,6 +76,14 @@ package object config { .timeConf(TimeUnit.MILLI

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143815711 --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala --- @@ -416,63 +423,52 @@ private[spark] class ExecutorAllocationManager(

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143799104 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #19041: [SPARK-21097][CORE] Add option to recover cached ...

2017-10-10 Thread vanzin
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/19041#discussion_r143800553 --- Diff: core/src/main/scala/org/apache/spark/CacheRecoveryManager.scala --- @@ -0,0 +1,250 @@ +/* + * Licensed to the Apache Software Foundation (A

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143819790 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18966 LGTM pending Jenkins --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: revi

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19082 I agree with you. #18810 compares the following two code. 1. Interpreter execution of Java code by whole-stage codegen with passing row data in scalar values 2. JITted execution of Java code by

[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18966 **[Test build #82598 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82598/testReport)** for PR 18966 at commit [`516a72a`](https://github.com/apache/spark/commit/51

[GitHub] spark issue #19309: [SPARK-19558][sql] Add config key to register QueryExecu...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19309 **[Test build #82597 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82597/testReport)** for PR 19309 at commit [`bcc44e9`](https://github.com/apache/spark/commit/bc

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82599 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82599/testReport)** for PR 18732 at commit [`dc1d406`](https://github.com/apache/spark/commit/dc

[GitHub] spark issue #19309: [SPARK-19558][sql] Add config key to register QueryExecu...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19309 Let me summarize the PR. Please correct me if anything is missing. **Background** Currently, our users can register a listener one by one during the executions by calling the Spa

[GitHub] spark issue #19309: [SPARK-19558][sql] Add config key to register QueryExecu...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19309 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143813642 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed t

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143812619 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +73,18 @@ case class ArrowEvalPythonExec

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143812311 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,35 @@ class RelationalGroupedDataset protected[s

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19082 You know, when we disable the whole-stage codegen, we still do the expression codegen, which byte code size is smaller than 8K at most cases. Thus, it could be faster than whole-stage codegen. Th

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143810948 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143810736 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143810539 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143810355 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143804627 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143803904 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143810078 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143808170 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143809031 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143810175 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143805303 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread gengliangwang
Github user gengliangwang commented on a diff in the pull request: https://github.com/apache/spark/pull/19451#discussion_r143803469 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1242,6 +1243,53 @@ object ReplaceIntersectWithS

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143809711 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r143808338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -89,6 +89,14 @@ object CodeFormatter

[GitHub] spark issue #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions counts ...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18966 **[Test build #82596 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82596/testReport)** for PR 18966 at commit [`4c47802`](https://github.com/apache/spark/commit/4c

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82587/ Test PASSed. ---

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82587 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82587/testReport)** for PR 18732 at commit [`9c2b10e`](https://github.com/apache/spark/commit/9

[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19250 **[Test build #82595 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82595/testReport)** for PR 19250 at commit [`5607160`](https://github.com/apache/spark/commit/56

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82591/ Test PASSed. ---

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19439 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #82591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82591/testReport)** for PR 19439 at commit [`119bf35`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18732 I had some minor comments on the docs, otherwise LGTM! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143803982 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/FlatMapGroupsInPandasExec.scala --- @@ -0,0 +1,103 @@ +/* + * Licensed

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143802697 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +73,18 @@ case class ArrowEvalPythonExe

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-10-10 Thread eyalfa
Github user eyalfa commented on the issue: https://github.com/apache/spark/pull/19181 @hvanhovell , thanks :+1: --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: rev

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143802019 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/RelationalGroupedDataset.scala --- @@ -435,6 +435,35 @@ class RelationalGroupedDataset protected[

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19424 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82586/ Test PASSed. ---

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19181 **[Test build #82594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82594/testReport)** for PR 19181 at commit [`6b901ee`](https://github.com/apache/spark/commit/6b

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19424 **[Test build #82586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82586/testReport)** for PR 19424 at commit [`e8e8fee`](https://github.com/apache/spark/commit/e

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82585/ Test PASSed. ---

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-10-10 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19181 I will merge this when it passes tests. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comm

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82585 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82585/testReport)** for PR 18732 at commit [`b88a4d8`](https://github.com/apache/spark/commit/b

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143800589 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143800072 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82584/ Test PASSed. ---

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18732 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143799780 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143799505 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143794449 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143799083 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,133 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread MrBago
Github user MrBago commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143798662 --- Diff: python/pyspark/ml/image.py --- @@ -0,0 +1,133 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82584/testReport)** for PR 18732 at commit [`a036f70`](https://github.com/apache/spark/commit/a

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143799187 --- Diff: python/pyspark/sql/group.py --- @@ -192,7 +193,84 @@ def pivot(self, pivot_col, values=None): jgd = self._jgd.pivot(pivot_col)

[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-10-10 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r143798289 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -769,16 +769,21 @@ class CodegenContext {

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19082 The whole-stage codegen has two advantages according to [this paper](http://www.vldb.org/pvldb/vol4/p539-neumann.pdf). 1. enable compiler optimizations among operations (3. in page 2) 2. pass d

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82589/ Test FAILed. ---

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19269 **[Test build #82589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82589/testReport)** for PR 19269 at commit [`3e855a5`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82588/ Test FAILed. ---

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19269 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional comma

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19269 **[Test build #82588 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82588/testReport)** for PR 19269 at commit [`12bb97f`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #19424: [SPARK-22197][SQL] push down operators to data source be...

2017-10-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19424 What are the guarantees made by the previous batches in the optimizer? The work done by `FilterAndProject` seems redundant to me because the optimizer should already push filters below projection. Is

[GitHub] spark issue #19438: [SPARK-22208] [SQL] Improve percentile_approx by not rou...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19438 **[Test build #82593 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82593/testReport)** for PR 19438 at commit [`1180265`](https://github.com/apache/spark/commit/11

[GitHub] spark pull request #19459: [SPARK-20791][PYSPARK] Use Arrow to create Spark ...

2017-10-10 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/19459#discussion_r143788090 --- Diff: python/pyspark/sql/session.py --- @@ -510,9 +511,43 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark issue #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-10 Thread rdblue
Github user rdblue commented on the issue: https://github.com/apache/spark/pull/19269 > There is no restriction to let the output of data writers be visible to other writers, so it's possible to launch a write task just for cleaning up the data of other writers. Agreed. Other

[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19250 **[Test build #82592 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82592/testReport)** for PR 19250 at commit [`5c03e07`](https://github.com/apache/spark/commit/5c

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19439 **[Test build #82591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82591/testReport)** for PR 19439 at commit [`119bf35`](https://github.com/apache/spark/commit/11

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19082 @kiszk Thanks for your summary. I had a few related discussions with @rednaxelafx and @liancheng in the recent weeks. Vertical cuts like https://github.com/apache/spark/pull/19082 is pretty promi

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on the issue: https://github.com/apache/spark/pull/19439 @viirya thank you for the great comments, I've updated the PR. I'm waiting to hear back from @dakirsa on the source of the two BGR and BGRA images. --- -

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143783349 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143782986 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143782832 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143782452 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143782366 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143781910 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark issue #19222: [SPARK-10399][CORE][SQL] Introduce multiple MemoryBlocks...

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19222 ping @hvanhovell @tejasapatil --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread imatiach-msft
Github user imatiach-msft commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143779796 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Found

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19451 **[Test build #82590 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82590/testReport)** for PR 19451 at commit [`1baecfd`](https://github.com/apache/spark/commit/1

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19451 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82590/ Test FAILed. ---

<    1   2   3   4   5   6   >