[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-12-08 Thread Apache Arrow JIRA Bot (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644922#comment-17644922 ] Apache Arrow JIRA Bot commented on ARROW-17313: --- This issue was last updated over 90 days

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-10 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578080#comment-17578080 ] Weston Pace commented on ARROW-17313: - There is also ARROW-15589, which I had referenced above, also

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-10 Thread David Li (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577977#comment-17577977 ] David Li commented on ARROW-17313: -- ARROW-17159 is a very similar issue, except motivated by

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576937#comment-17576937 ] Weston Pace commented on ARROW-17313: - That sounds good to me. Even if we end up later unifying

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576926#comment-17576926 ] Ziheng Wang commented on ARROW-17313: - There is no physical way you can do this with a .csv.gz file

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576914#comment-17576914 ] Antoine Pitrou commented on ARROW-17313: Well, if arbitrary byte ranges need to be supported,

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576900#comment-17576900 ] Weston Pace commented on ARROW-17313: - Yes. I think the original Substrait use case was based on

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576889#comment-17576889 ] Antoine Pitrou commented on ARROW-17313: Hmm... so this is partitioning on the client side by

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576885#comment-17576885 ] Weston Pace commented on ARROW-17313: - I don't think I've been explaining myself well. Let's

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576881#comment-17576881 ] Antoine Pitrou commented on ARROW-17313: What I strive to understand is why the Substrait

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576880#comment-17576880 ] Weston Pace commented on ARROW-17313: - Would it help to think of these not as byte ranges but as

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576858#comment-17576858 ] Antoine Pitrou commented on ARROW-17313: The intent of datasets has always been that each file

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-08 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576852#comment-17576852 ] Weston Pace commented on ARROW-17313: - I think the {{FileFragment}} would be a good place for this.

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-06 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576154#comment-17576154 ] Antoine Pitrou commented on ARROW-17313: Ok, so perhaps byte ranges should actually be provided

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576079#comment-17576079 ] Ziheng Wang commented on ARROW-17313: - Ideally we update the Dataset Scanner to be able to take in

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575967#comment-17575967 ] Ziheng Wang commented on ARROW-17313: - Also this will not support compressed formats, at least at

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575937#comment-17575937 ] Weston Pace commented on ARROW-17313: - > We should reject a partial read if newlines_in_values is

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575928#comment-17575928 ] Antoine Pitrou commented on ARROW-17313: Nothing. The Substrait producer should produce valid

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Weston Pace (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575930#comment-17575930 ] Weston Pace commented on ARROW-17313: - > It's not too late to change the Substrait spec, is it? > Or

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Ziheng Wang (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575916#comment-17575916 ] Ziheng Wang commented on ARROW-17313: - Ah I meant what we should do about the line breaks and quotes

[jira] [Commented] (ARROW-17313) [C++] Add Byte Range to CSV Reader ReadOptions

2022-08-05 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575902#comment-17575902 ] Antoine Pitrou commented on ARROW-17313: There's not much to elaborate.