[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/22865


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228776973
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

SGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228776300
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

SGTM


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread bersprockets
Github user bersprockets commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228771361
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

I see, because of this check:

https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L338
So when the data contains a Map column, for example., the vectorized reader 
is not used, even though spark.sql.parquet.enableVectorizedReader=true.

How about something like:

"If true, enables Parquet's native record-level filtering using the pushed 
down filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' is enabled *and the vectorized reader is not 
used. You can ensure the vectorized reader is not used by setting 
'spark.sql.parquet.enableVectorizedReader' to false*"






---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-28 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228755473
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

cc @cloud-fan 


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731568
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

Btw vectorized reader at parquet side is in progress at Parquet side. Maybe 
ideally 3.0 is a good target to replace.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/22865#discussion_r228731385
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala ---
@@ -462,7 +462,7 @@ object SQLConf {
   val PARQUET_RECORD_FILTER_ENABLED = 
buildConf("spark.sql.parquet.recordLevelFilter.enabled")
 .doc("If true, enables Parquet's native record-level filtering using 
the pushed down " +
   "filters. This configuration only has an effect when 
'spark.sql.parquet.filterPushdown' " +
-  "is enabled.")
+  "is enabled and spark.sql.parquet.enableVectorizedReader is 
disabled.")
--- End diff --

I didn't document this intentionally because 
`spark.sql.parquet.enableVectorizedReader` can be disabled for other conditions 
even if it's enabled (like for complex types).

Shall we clearly document this?


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #22865: [DOC] Fix doc for spark.sql.parquet.recordLevelFi...

2018-10-27 Thread bersprockets
GitHub user bersprockets opened a pull request:

https://github.com/apache/spark/pull/22865

[DOC] Fix doc for spark.sql.parquet.recordLevelFilter.enabled

## What changes were proposed in this pull request?

Updated the doc string value for 
spark.sql.parquet.recordLevelFilter.enabled to indicate that 
spark.sql.parquet.enableVectorizedReader must be disabled.

The code in ParquetFileFormat uses 
spark.sql.parquet.recordLevelFilter.enabled only after falling back to 
parquet-mr (see else for this if statement): 
https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L412

https://github.com/apache/spark/blob/d5573c578a1eea9ee04886d9df37c7178e67bb30/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala#L427-L430

Tests also bear this out.

## How was this patch tested?

This is just a doc string fix: I built Spark and ran a single test.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bersprockets/spark confdocfix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/22865.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22865


commit af8a85ae4a1e477801bf104af6d4909cd822ba01
Author: Bruce Robbins 
Date:   2018-10-27T21:47:50Z

update doc string for spark.sql.parquet.recordLevelFilter.enabled




---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org