[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17644922#comment-17644922
]
Apache Arrow JIRA Bot commented on ARROW-17313:
---
This issue was last updated over 90 days
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578080#comment-17578080
]
Weston Pace commented on ARROW-17313:
-
There is also ARROW-15589, which I had referenced above, also
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577977#comment-17577977
]
David Li commented on ARROW-17313:
--
ARROW-17159 is a very similar issue, except motivated by
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576937#comment-17576937
]
Weston Pace commented on ARROW-17313:
-
That sounds good to me. Even if we end up later unifying
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576926#comment-17576926
]
Ziheng Wang commented on ARROW-17313:
-
There is no physical way you can do this with a .csv.gz file
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576914#comment-17576914
]
Antoine Pitrou commented on ARROW-17313:
Well, if arbitrary byte ranges need to be supported,
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576900#comment-17576900
]
Weston Pace commented on ARROW-17313:
-
Yes. I think the original Substrait use case was based on
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576889#comment-17576889
]
Antoine Pitrou commented on ARROW-17313:
Hmm... so this is partitioning on the client side by
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576885#comment-17576885
]
Weston Pace commented on ARROW-17313:
-
I don't think I've been explaining myself well. Let's
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576881#comment-17576881
]
Antoine Pitrou commented on ARROW-17313:
What I strive to understand is why the Substrait
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576880#comment-17576880
]
Weston Pace commented on ARROW-17313:
-
Would it help to think of these not as byte ranges but as
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576858#comment-17576858
]
Antoine Pitrou commented on ARROW-17313:
The intent of datasets has always been that each file
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576852#comment-17576852
]
Weston Pace commented on ARROW-17313:
-
I think the {{FileFragment}} would be a good place for this.
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576154#comment-17576154
]
Antoine Pitrou commented on ARROW-17313:
Ok, so perhaps byte ranges should actually be provided
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17576079#comment-17576079
]
Ziheng Wang commented on ARROW-17313:
-
Ideally we update the Dataset Scanner to be able to take in
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575967#comment-17575967
]
Ziheng Wang commented on ARROW-17313:
-
Also this will not support compressed formats, at least at
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575937#comment-17575937
]
Weston Pace commented on ARROW-17313:
-
> We should reject a partial read if newlines_in_values is
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575928#comment-17575928
]
Antoine Pitrou commented on ARROW-17313:
Nothing. The Substrait producer should produce valid
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575930#comment-17575930
]
Weston Pace commented on ARROW-17313:
-
> It's not too late to change the Substrait spec, is it?
> Or
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575916#comment-17575916
]
Ziheng Wang commented on ARROW-17313:
-
Ah I meant what we should do about the line breaks and quotes
[
https://issues.apache.org/jira/browse/ARROW-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17575902#comment-17575902
]
Antoine Pitrou commented on ARROW-17313:
There's not much to elaborate.