[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20513

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166244254

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
     }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }

+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

We confirmed performance improvements in several cases. I will revisit this with more cases after the 2.3 release.
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166211814

I think it is safe to keep the same behavior as 2.2. I'm not sure whether enabling whole-stage codegen can hurt performance for scan nodes, by the way. We can revisit this, of course.
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166192327

Yeah, we can do more performance measurement after the 2.3 release.
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20513#discussion_r166184445

In 2.4 we should look into this. My gut feeling is that we don't need to enable whole-stage codegen for scan nodes that output data as rows.
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/20513

[SPARK-23312][SQL][followup] add a config to turn off vectorized cache reader

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/20483 tried to provide a way to turn off the new columnar cache reader, to restore the 2.2 behavior. However, even if we turn off that config, the behavior still differs from 2.2: if the output data are rows, we still enable whole-stage codegen for the scan node, which 2.2 did not. This PR fixes that as well.

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark cache

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/20513.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #20513

commit 8525b2c7e540991c75c8d61bfc5a8361cae78c7b
Author: Wenchen Fan
Date: 2018-02-06T04:17:03Z

    followup
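The one-line change under discussion (`override def supportCodegen: Boolean = supportsBatch`) ties whole-stage codegen support to whether the scan emits columnar batches. A minimal, self-contained sketch of that gating logic is below; `InMemoryScan`, `vectorizedReaderEnabled`, and `fieldsOk` are hypothetical stand-ins for Spark's `InMemoryTableScanExec`, its reader config, and its schema checks, not the real API:

```scala
// Simplified model of the CodegenSupport contract: operators opt in
// to whole-stage codegen via supportCodegen (default true).
trait CodegenSupport {
  def supportCodegen: Boolean = true
}

// Hypothetical stand-in for InMemoryTableScanExec. Whether the scan
// outputs columnar batches depends on a config flag and a schema check.
case class InMemoryScan(vectorizedReaderEnabled: Boolean, fieldsOk: Boolean)
    extends CodegenSupport {
  def supportsBatch: Boolean = vectorizedReaderEnabled && fieldsOk

  // The followup change: only support codegen when output is columnar,
  // restoring the 2.2 behavior for row-based output.
  override def supportCodegen: Boolean = supportsBatch
}

val rowScan   = InMemoryScan(vectorizedReaderEnabled = false, fieldsOk = true)
val batchScan = InMemoryScan(vectorizedReaderEnabled = true,  fieldsOk = true)
println(rowScan.supportCodegen)   // false: row output, no whole-stage codegen
println(batchScan.supportCodegen) // true: columnar batches, codegen stays on
```

In Spark 2.3 itself, the config added by the earlier PR is `spark.sql.inMemoryColumnarStorage.enableVectorizedReader`; setting it to `false` disables the vectorized cache reader, and with this followup the scan node's codegen is switched off along with it.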