[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-15 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-574544365
 
 
   jenkins, retest this, please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-13 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-573649775
 
 
   @cloud-fan I am still worried about how this behaves in different modes, 
especially in PERMISSIVE mode. Let me write tests for that. In PERMISSIVE 
mode, we return `null`s if we cannot convert/parse a CSV field. What happens 
if I apply predicates to those `null`s? I guess this should be handled properly 
by the expression implementations.
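A minimal plain-Python model of the concern follows. It assumes SQL three-valued logic: a field that fails to parse in PERMISSIVE mode becomes `null` (`None` here), a comparison with a `null` operand yields `null` rather than false, and a filter keeps a row only when its predicate is exactly true. The function names are hypothetical and this is not Spark's implementation; it only illustrates why the expression semantics, not the CSV reader, should decide what happens to such rows.

```python
from typing import Callable, List, Optional


def parse_int_permissive(field: str) -> Optional[int]:
    """Hypothetical PERMISSIVE-style parser: a malformed field becomes None (SQL NULL)."""
    try:
        return int(field)
    except ValueError:
        return None


def greater_than(value: Optional[int], threshold: int) -> Optional[bool]:
    """SQL-style comparison: a NULL operand yields NULL (unknown), not False."""
    if value is None:
        return None
    return value > threshold


def is_null(value: Optional[int]) -> bool:
    """IS NULL is never NULL itself: it is True or False."""
    return value is None


def filter_rows(fields: List[str], predicate: Callable[[Optional[int]], Optional[bool]]) -> List[str]:
    """Keep a row only when the predicate evaluates to exactly True (NULL counts as 'drop')."""
    return [f for f in fields if predicate(parse_int_permissive(f)) is True]


fields = ["5", "42", "abc"]
print(filter_rows(fields, lambda v: greater_than(v, 10)))  # ['42']
print(filter_rows(fields, is_null))                         # ['abc']
```

The malformed field `"abc"` is dropped by `> 10` but kept by `IS NULL`, which is the behavior the tests for PERMISSIVE mode would need to confirm.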





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-11 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-573391098
 
 
   And in `pushedFilters`, I store only the filters that refer to the schema and 
that I can handle.
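To make the selection rule concrete, here is a plain-Python sketch. It assumes filters are simple records with an operator name and a column; the set of operators the reader "can handle" is hypothetical (the names are borrowed from Spark's `org.apache.spark.sql.sources` filter classes, but which of them the CSV reader supports is not stated here).

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, List

# Hypothetical set of operators the reader knows how to evaluate.
HANDLED_OPS: FrozenSet[str] = frozenset(
    {"EqualTo", "GreaterThan", "LessThan", "IsNull", "IsNotNull"}
)


@dataclass(frozen=True)
class Filter:
    op: str            # operator name, e.g. "EqualTo"
    column: str        # column the filter refers to
    value: Any = None  # literal compared against the column, if any


def select_pushed_filters(filters: List[Filter], schema_columns: FrozenSet[str]) -> List[Filter]:
    """Keep only filters whose column is in the schema and whose operator is handled."""
    return [f for f in filters if f.column in schema_columns and f.op in HANDLED_OPS]


schema = frozenset({"a", "b"})
candidates = [
    Filter("EqualTo", "a", 1),           # kept
    Filter("GreaterThan", "c", 2),       # dropped: column "c" is not in the schema
    Filter("StringContains", "b", "x"),  # dropped: operator is not in HANDLED_OPS
]
print(select_pushed_filters(candidates, schema))
```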





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-11 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-573390974
 
 
   @cloud-fan The problem was in the line 
https://github.com/apache/spark/pull/26973/commits/f0aa0a88bfa0c87007f8781ba7fac8f9cd3057ba#diff-faa3cfad03552057c3cb431c5ce87f03L52
 where I returned an empty array.
   
   The comment for `pushFilters` says that it should return the filters that need 
to be evaluated after scanning:
   
https://github.com/apache/spark/blob/053dd858d38e6107bc71e0aa3a4954291b74f8c8/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/SupportsPushDownFilters.java#L31
   I assumed that if I consumed all the filters, I should return nothing. But 
looking at ORC and Parquet, I noticed that they return all filters:
   
https://github.com/apache/spark/blob/5114389aef2cacaacc82e6025696b33d6d20b2a6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetScanBuilder.scala#L65
   
https://github.com/apache/spark/blob/5114389aef2cacaacc82e6025696b33d6d20b2a6/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/orc/OrcScanBuilder.scala#L65
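A plain-Python sketch of the two strategies, assuming a simplified builder that mirrors the shape of `pushFilters`/`pushedFilters`. The class and method names are hypothetical, not Spark's API. Per the interface comment, whatever `push_filters` returns is evaluated again after scanning, so returning every filter (as the Parquet and ORC builders linked above do) keeps results correct however the reader applies the pushed ones, whereas returning an empty list makes correctness depend entirely on the reader.

```python
from typing import Callable, List, Tuple

Filter = Tuple[str, str, object]  # (operator, column, value): a simplified stand-in


class ScanBuilderModel:
    """Hypothetical model of the pushFilters/pushedFilters contract."""

    def __init__(self, can_handle: Callable[[Filter], bool], return_all: bool):
        self.can_handle = can_handle
        self.return_all = return_all
        self._pushed: List[Filter] = []

    def push_filters(self, filters: List[Filter]) -> List[Filter]:
        # Remember the filters the reader will use to skip data early.
        self._pushed = [f for f in filters if self.can_handle(f)]
        if self.return_all:
            # Parquet/ORC style: every filter is evaluated again after the scan.
            return list(filters)
        # Only unhandled filters are re-evaluated; if all are handled, this is empty.
        return [f for f in filters if not self.can_handle(f)]

    def pushed_filters(self) -> List[Filter]:
        return list(self._pushed)


def handles_equal_to(f: Filter) -> bool:
    return f[0] == "EqualTo"


filters = [("EqualTo", "a", 1), ("StringContains", "b", "x")]
safe = ScanBuilderModel(handles_equal_to, return_all=True)
trusting = ScanBuilderModel(handles_equal_to, return_all=False)
print(safe.push_filters(filters))      # both filters are evaluated after the scan
print(trusting.push_filters(filters))  # only the StringContains filter is
```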





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-11 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-573387323
 
 
   @HyukjinKwon @dongjoon-hyun @cloud-fan Could you review this PR, please?





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-10 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572996859
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-09 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-572448533
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-06 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-571239614
 
 
   @hvanhovell WDYT?





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2020-01-02 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-570321110
 
 
   @HyukjinKwon @cloud-fan @dongjoon-hyun @maropu Could you take a look at the 
PR, please?





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-27 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-569323494
 
 
   >  If the pushed filters are converted back to Spark's expressions, I think 
there's no point in doing it (as Spark's optimizer should do that instead).
   
   @HyukjinKwon Spark's optimizer applies filters to entire rows that have already 
been fully converted to the desired types, whereas pushed filters are applied to 
only some parts of a row. So we can skip some value conversions.
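A minimal plain-Python sketch of the idea, assuming a row arrives as already-tokenized string fields and there is a single pushed predicate on one column: only that column is converted first, and the remaining fields are converted only for rows that pass. The function name and the conversion counter are hypothetical, for illustration only.

```python
from typing import Callable, Dict, List, Optional, Sequence

Converter = Callable[[str], object]


def parse_with_pushed_filter(
    tokens: Sequence[str],
    converters: Sequence[Converter],
    filter_index: int,
    predicate: Callable[[object], bool],
    stats: Dict[str, int],
) -> Optional[List[object]]:
    # Convert only the field the pushed filter refers to.
    key = converters[filter_index](tokens[filter_index])
    stats["conversions"] += 1
    if not predicate(key):
        return None  # row rejected: the other fields are never converted
    row: List[object] = []
    for i, (token, convert) in enumerate(zip(tokens, converters)):
        if i == filter_index:
            row.append(key)  # reuse the value converted above
        else:
            row.append(convert(token))
            stats["conversions"] += 1
    return row


stats = {"conversions": 0}
lines = [["1", "x", "2.5"], ["20", "y", "3.0"]]
converters = [int, str, float]
results = []
for tokens in lines:
    row = parse_with_pushed_filter(tokens, converters, 0, lambda v: v > 10, stats)
    if row is not None:
        results.append(row)
print(results, stats)
```

Here the first row is rejected after a single conversion, so two rows cost four conversions instead of the six that converting every field up front would take.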





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-24 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568717093
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-23 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568680164
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-23 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568447761
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-23 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568409809
 
 
   @HyukjinKwon I have updated the PR description. Please tell me if anything 
is unclear.





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-22 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568302637
 
 
   jenkins, retest this, please





[GitHub] [spark] MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown in CSV datasource

2019-12-21 Thread GitBox
MaxGekk commented on issue #26973: [SPARK-30323][SQL] Support filters pushdown 
in CSV datasource
URL: https://github.com/apache/spark/pull/26973#issuecomment-568181247
 
 
   @HyukjinKwon @dongjoon-hyun @cloud-fan May I ask you to review this PR?

