[GitHub] spark issue #13501: [SPARK-15759] [SQL] Fallback to non-codegen when fail to...

2016-06-03 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13501 cc @rxin @marmbrus @sameeragarwal --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65754375 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65753944 --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java --- @@ -221,7 +221,8 @@ public BytesToBytesMap( SparkEnv.get

[GitHub] spark pull request #13318: [SPARK-15391] [SQL] manage the temporary memory o...

2016-06-03 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r65753668 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -70,9 +70,14 @@ public int compare(PackedRecordPointer left

[GitHub] spark pull request #13501: [SPARK-15759] [SQL] Fallback to non-codegen when ...

2016-06-03 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13501 [SPARK-15759] [SQL] Fallback to non-codegen when fail to compile generated code ## What changes were proposed in this pull request? In case of any bugs in whole-stage codegen, the

[GitHub] spark pull request #13418: [SPARK-15677][SQL] Query with scalar sub-query in...

2016-06-01 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65441184 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1468,7 +1468,8 @@ object DecimalAggregates extends

[GitHub] spark issue #13443: [SPARK-15671] performance regression CoalesceRDD.pickBin...

2016-06-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13443 Merging this into master and 2.0, thanks!

[GitHub] spark issue #13443: [SPARK-15671] performance regression CoalesceRDD.pickBin...

2016-06-01 Thread davies
Github user davies commented on the issue: https://github.com/apache/spark/pull/13443 LGTM.

[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

2016-05-31 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13421#discussion_r65285433 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -720,15 +721,23 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15680][SQL] Disable comments in generated code in...

2016-05-31 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13421#discussion_r65278850 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -720,15 +721,23 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15557][SQL] cast the string into DoubleType when ...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13368 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-15327] [SQL] fix split expression in whole stage ...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13235 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-15557][SQL] expression ((cast(99 as decimal) + '...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13368 @dilipbiswal Could you update the title of this PR to say what's the actual change in this PR (something like `cast the string into DoubleType when it's used together wi

[GitHub] spark pull request: [SPARK-15557][SQL] expression ((cast(99 as decimal) + '...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13368 In MySQL ``` mysql> select cast(99 as decimal(19, 6)) + '3.001'; +--+ | cast(99 as

[GitHub] spark pull request: [SPARK-15677][SQL] Query with scalar sub-query in the SE...

2016-05-31 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13418#discussion_r65273025 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1468,7 +1468,8 @@ object DecimalAggregates extends

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memory of timso...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13318 @ericl Comments addressed, could you take another look?

[GitHub] spark pull request: [SPARK-15438] [SQL] improve explain of whole stage codeg...

2016-05-31 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13204 @yucai It's true that the case you posted is a little confusing, you can see the expected boundary on Spark UI. The other one is too verbose to see the plan (the logical parts), ma

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-27 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64961859 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15441][SQL] support null object in oute...

2016-05-27 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13322#discussion_r64935159 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala --- @@ -170,6 +174,17 @@ object ExpressionEncoder

[GitHub] spark pull request: [SPARK-15441][SQL] support null object in oute...

2016-05-27 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13322#discussion_r64934548 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala --- @@ -763,6 +763,22 @@ class DatasetSuite extends QueryTest with

[GitHub] spark pull request: [SPARK-15140][SPARK-15441][SQL][WIP] support n...

2016-05-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13322#issuecomment-222023561 @cloud-fan Maybe it's not that easy to propagate the special column all the way down, we could just use this trick to fix the outer join issue?

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13155#issuecomment-222022339 @frreiss Thanks for working on this. Had left some comments on how to rewrite the subquery, let me know what you think, thanks.

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64836855 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64835290 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64835067 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64834697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64834409 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64834162 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64833603 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64832620 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

2016-05-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13107#issuecomment-221993686 @rxin After fixing those two, we still have some other limits (the number of elements should be less than 512 mm), especially for on-heap mode. There are: 1) the

[GitHub] spark pull request: [SPARK-8428][SPARK-13850] Fix integer overflow...

2016-05-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13336#issuecomment-221991388 LGTM

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r64805207 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -101,14 +94,17 @@ public void expandPointerArray(LongArray

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r64782151 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -101,14 +94,17 @@ public void expandPointerArray(LongArray

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r64781946 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/UnsafeKVExternalSorter.java --- @@ -86,14 +88,16 @@ public UnsafeKVExternalSorter

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-26 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13318#discussion_r64780973 --- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleInMemorySorter.java --- @@ -101,14 +94,17 @@ public void expandPointerArray(LongArray

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-26 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221793398 It was 0.70 (corrected), it's 30% lower after this patch. For the simplest aggregate (one integer key and one integer value), the key-value pair needs 40 bytes

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13318#issuecomment-221789570 cc @ericl

[GitHub] spark pull request: [SPARK-15391] [SQL] manage the temporary memor...

2016-05-25 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13318 [SPARK-15391] [SQL] manage the temporary memory of timsort ## What changes were proposed in this pull request? Currently, the memory for temporary buffer used by TimSort is always

[GitHub] spark pull request: [MINOR][PYSPARK][EXAMPLES] Changed examples to...

2016-05-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13303#issuecomment-221713325 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [MINOR][PYSPARK][EXAMPLES] Changed examples to...

2016-05-25 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13303#issuecomment-221711298 LGTM

[GitHub] spark pull request: [SPARK-13850] Force the sorter to Spill when n...

2016-05-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13107#issuecomment-221476424 TimSort requires a temporary buffer to store the shorter part, which could be half of the size of pointer array in the worst case. This depends on the original order of rows

[GitHub] spark pull request: [SPARK-15370] [SQL] Update RewriteCorrelatedSc...

2016-05-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13155#discussion_r64491862 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -1695,16 +1695,176 @@ object

[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13273#issuecomment-221365653 LGTM

[GitHub] spark pull request: [SPARK-15433][PySpark] PySpark core test shoul...

2016-05-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13214#issuecomment-221339762 LGTM, Merging this into master, thanks!

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-24 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-221338736 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-24 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64432741 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -49,6 +49,24 @@ object CodeFormatter

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64332536 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -49,6 +49,24 @@ object CodeFormatter

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64332407 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -49,6 +49,24 @@ object CodeFormatter

[GitHub] spark pull request: [SPARK-15498][TESTS] fix slow tests

2016-05-23 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13273#discussion_r64331268 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -27,12 +25,12 @@ import

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13230#issuecomment-220761614 Could you close this?

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13230#issuecomment-220736515 LGTM, merging into 1.6 branch.

[GitHub] spark pull request: [SPARK-15327] [SQL] fix split expression in wh...

2016-05-20 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13235 [SPARK-15327] [SQL] fix split expression in whole stage codegen ## What changes were proposed in this pull request? Right now, we will split the code for expressions into multiple functions

[GitHub] spark pull request: [SPARK-14031] [SQL] speedup CSV writer

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13229#issuecomment-220732619 cc @liancheng @falaki

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220731956 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13230#discussion_r64104163 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala --- @@ -2028,13 +2028,268 @@ class SQLQuerySuite extends QueryTest with

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220709836 LGTM

[GitHub] spark pull request: [SPARK-14031] [SQL] speedup CSV writer

2016-05-20 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13229 [SPARK-14031] [SQL] speedup CSV writer ## What changes were proposed in this pull request? Currently, we create a CSVWriter for every row, it's very expensive and memory hungry,

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13192#issuecomment-220678036 @dongjoon-hyun Maybe we could have a method Expression.genCodeWithComment() that is used by generated projections and operators, it requires changing more places, not

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-20 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r64082929 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -39,6 +40,23 @@ object CodeFormatter

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220677426 Maybe 1.5 and 1.6 could share the same PR (if there are not many conflicts)

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220675619 @sarutak Could you create another PR for 1.6? (If we have not fixed the security bug in 1.6)

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220675159 LGTM, Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-20 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220672478 @mengxr Disable this test in master and 2.0.

[GitHub] spark pull request: [SPARK-15438] [SQL] improve explain of whole s...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220527920 @yhuai @marmbrus Updated to #3.

[GitHub] spark pull request: explain of whole stage codegen

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220505833 @yucai Yes, corrected, thanks!

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13188#issuecomment-220505375 Could you remove ss_max, otherwise LGTM

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63985324 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/benchmark/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63971900 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63971344 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q31.sql --- @@ -0,0 +1,60 @@ +WITH ss AS +(SELECT

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63970987 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q78.sql --- @@ -0,0 +1,64 @@ +WITH ws AS +(SELECT

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63970138 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q16.sql --- @@ -0,0 +1,21 @@ +SELECT count(DISTINCT

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63970061 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q14a.sql --- @@ -0,0 +1,121 @@ +WITH cross_items AS

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63969828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q77.sql --- @@ -0,0 +1,100 @@ +WITH ss AS

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63968968 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/q95.sql --- @@ -0,0 +1,28 @@ +WITH ws_wh AS

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63968907 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/queries/ss_max.sql --- @@ -0,0 +1,14 @@ +SELECT --- End

[GitHub] spark pull request: [SPARK-15078][SQL] Add all TPCDS 1.4 benchmark...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13188#discussion_r63968750 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/tpcds/TPCDSQueryBenchmark.scala --- @@ -0,0 +1,132

[GitHub] spark pull request: explain of whole stage codegen

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220466852 Another proposal is to have a special prefix for the operators that are part of whole stage codegen. ``` >>> df = sqlCtx.range(1000);df2 = sqlCtx.r

[GitHub] spark pull request: explain of whole stage codegen

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13204#issuecomment-220465870 cc @marmbrus @rxin

[GitHub] spark pull request: explain of whole stage codegen

2016-05-19 Thread davies
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/13204 explain of whole stage codegen ## What changes were proposed in this pull request? Currently, the explain of a query with whole-stage codegen looks like this ``` >&g

[GitHub] spark pull request: [SPARK-15417][SQL][Python] PySpark shell alway...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13203#issuecomment-220463575 @andrewor14 So the problem is that getOrCreate() is not aware of the configuration (which could be different from the existing one), should we show a warning on that

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63943856 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -706,6 +711,35 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63944461 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -78,8 +77,9 @@ trait CodegenSupport extends SparkPlan

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63944155 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -717,6 +751,20 @@ abstract class

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63943606 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -706,6 +711,35 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63943394 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/GenerateColumnAccessor.scala --- @@ -224,7 +224,9 @@ object GenerateColumnAccessor

[GitHub] spark pull request: [SPARK-15390] fix broadcast with 100 millions ...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13182#issuecomment-220416710 Merging this into master and 2.0, thanks!

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63920473 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -706,6 +711,60 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63919915 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -740,6 +813,9 @@ abstract class

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63916719 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -706,6 +711,60 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-4452][SPARK-11293][Core][BRANCH-1.6] Sh...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13027#issuecomment-220390625 @lianhuiwang This PR is useful; other people could easily patch it by themselves. Thanks for it.

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63916876 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -706,6 +711,60 @@ class CodegenContext

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63917072 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -740,6 +813,9 @@ abstract class

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/12979#issuecomment-220390039 @sarutak I like this idea, could you simplify it and minimize the changes?

[GitHub] spark pull request: [SPARK-15165][SPARK-15205][SQL] Introduce plac...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/12979#discussion_r63915953 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeGenerator.scala --- @@ -717,6 +776,20 @@ abstract class

[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

2016-05-19 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/13189#issuecomment-220385998 @cloud-fan Could you add a screenshot of the metrics for BroadcastExchange?

[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13189#discussion_r63914891 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/ui/SQLListener.scala --- @@ -360,17 +370,27 @@ private[spark] class SQLHistoryListener(conf

[GitHub] spark pull request: [SPARK-14670][SQL][WIP] allow updating driver ...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13189#discussion_r63914100 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/exchange/BroadcastExchangeExec.scala --- @@ -66,25 +67,25 @@ case class

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r63913511 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateMutableProjection.scala --- @@ -124,6 +124,7 @@ object

[GitHub] spark pull request: [SPARK-13135][SQL] Don't print expressions rec...

2016-05-19 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/13192#discussion_r63913005 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/CodeFormatter.scala --- @@ -24,13 +24,13 @@ package
