[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/20513


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-06 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166244254
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

We confirmed that we got performance improvements in several cases.
I will revisit this with more cases after the 2.3 release.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-06 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166211814
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

I think it is safe to keep the same behavior as 2.2.

By the way, I'm not sure whether enabling whole-stage codegen can hurt performance for scan nodes. We can revisit this, of course.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166192327
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

Yeah, we can do more performance measurements after the 2.3 release.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/20513#discussion_r166184445
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryTableScanExec.scala ---
@@ -61,6 +61,9 @@ case class InMemoryTableScanExec(
 }) && !WholeStageCodegenExec.isTooManyFields(conf, relation.schema)
   }
 
+  // TODO: revisit this. Shall we always turn off whole stage codegen if the output data are rows?
+  override def supportCodegen: Boolean = supportsBatch
--- End diff --

In 2.4 we should look into this. My gut feeling is that we don't need to enable whole-stage codegen for scan nodes that output data as rows.


---




[GitHub] spark pull request #20513: [SPARK-23312][SQL][followup] add a config to turn...

2018-02-05 Thread cloud-fan
GitHub user cloud-fan opened a pull request:

https://github.com/apache/spark/pull/20513

[SPARK-23312][SQL][followup] add a config to turn off vectorized cache reader

## What changes were proposed in this pull request?

https://github.com/apache/spark/pull/20483 tried to provide a way to turn off the new columnar cache reader, to restore the behavior of 2.2. However, even if we turn off that config, the behavior is still different from 2.2.

If the output data are rows, we still enable whole-stage codegen for the scan node, which differs from 2.2. We should fix this as well.
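A simplified Scala model of the decision described above may help. It is a sketch under stated assumptions, not Spark's code: `ColType`, `ScanSettings`, `CachedScanModel`, and `CachedScanDemo` are hypothetical names, and it assumes the elided part of `supportsBatch` requires every cached column to have a primitive type. It shows that once `supportCodegen` simply returns `supportsBatch`, turning off the vectorized cache reader (or caching a schema the batch reader cannot handle) also keeps the scan out of whole-stage codegen, matching 2.2.

```scala
// Hypothetical, simplified model of the cached-scan decision discussed in
// SPARK-23312. None of these names are Spark APIs; they only mirror the shape
// of InMemoryTableScanExec.supportsBatch and supportCodegen.

// Column types of the cached relation (a tiny stand-in for Spark's DataType).
sealed trait ColType
case object IntCol extends ColType
case object LongCol extends ColType
case object DoubleCol extends ColType
case object StringCol extends ColType

// Hypothetical settings standing in for the relevant SQLConf values.
final case class ScanSettings(
    vectorizedReaderEnabled: Boolean, // the switch added in PR #20483
    maxNumFields: Int                 // the whole-stage codegen field limit
)

object CachedScanModel {
  // Assumption: only fixed-width primitive columns can be read as batches.
  private def isPrimitive(t: ColType): Boolean = t match {
    case IntCol | LongCol | DoubleCol => true
    case StringCol                    => false
  }

  // True when the scan would output columnar batches instead of rows.
  def supportsBatch(s: ScanSettings, schema: Seq[ColType]): Boolean =
    s.vectorizedReaderEnabled &&
      schema.forall(isPrimitive) &&
      schema.size <= s.maxNumFields // i.e. not "too many fields"

  // Mirrors `override def supportCodegen: Boolean = supportsBatch`: a scan
  // that outputs rows stays out of whole-stage codegen, as in 2.2.
  def supportCodegen(s: ScanSettings, schema: Seq[ColType]): Boolean =
    supportsBatch(s, schema)
}

object CachedScanDemo {
  def main(args: Array[String]): Unit = {
    val on      = ScanSettings(vectorizedReaderEnabled = true, maxNumFields = 100)
    val off     = on.copy(vectorizedReaderEnabled = false)
    val numeric = Seq(IntCol, DoubleCol)

    println(CachedScanModel.supportCodegen(on, numeric))                 // true
    println(CachedScanModel.supportCodegen(off, numeric))                // false
    println(CachedScanModel.supportCodegen(on, Seq(IntCol, StringCol)))  // false
  }
}
```

Tying `supportCodegen` directly to `supportsBatch` keeps a single source of truth: any condition that forces row output (the config, an unsupported column type, or too many fields) automatically disables whole-stage codegen for the scan, which is exactly the open question in the TODO.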

## How was this patch tested?

Existing tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/cloud-fan/spark cache

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/20513.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20513


commit 8525b2c7e540991c75c8d61bfc5a8361cae78c7b
Author: Wenchen Fan 
Date:   2018-02-06T04:17:03Z

followup




---
