[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-03-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293914#comment-17293914 ] Antoine Pitrou commented on ARROW-10308: That said, feel free to post your chunker

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-03-02 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17293912#comment-17293912 ] Antoine Pitrou commented on ARROW-10308: Hmm, I'm a bit surprised by your assessment of

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-03-01 Thread Dror Speiser (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292895#comment-17292895 ] Dror Speiser commented on ARROW-10308: Yeah for sure; I went into the open registry when you

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-03-01 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292874#comment-17292874 ] Antoine Pitrou commented on ARROW-10308: [~drorspei] The data is very interesting, thank you. I

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-03-01 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292872#comment-17292872 ] Antoine Pitrou commented on ARROW-10308: "NUMA", as in "non-uniform memory access" means that

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-02-27 Thread Dror Speiser (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17292214#comment-17292214 ] Dror Speiser commented on ARROW-10308: Hi Diana, Cool! I've created a small benchmark that spins

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-02-25 Thread Diana Clarke (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17291016#comment-17291016 ] Diana Clarke commented on ARROW-10308: Hi Dror, Profiling different file sizes, compositions, and

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2021-02-23 Thread Dror Speiser (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289489#comment-17289489 ] Dror Speiser commented on ARROW-10308: How about I run a benchmark of multiple file sizes,

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-18 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216196#comment-17216196 ] Wes McKinney commented on ARROW-10308: It occurred to me that the 48 vcpu machine is likely a

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-17 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17216056#comment-17216056 ] Antoine Pitrou commented on ARROW-10308: For the record, another set of CSV datasets here:

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215519#comment-17215519 ] Antoine Pitrou commented on ARROW-10308: > Antoine, do you think this is a good idea? Do you

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Dror Speiser (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215483#comment-17215483 ] Dror Speiser commented on ARROW-10308: Yeah, Azure doesn't tell me how many physical cores are at

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215407#comment-17215407 ] Antoine Pitrou commented on ARROW-10308: For the record, on a 12-core 24-thread CPU, I get

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Antoine Pitrou (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215401#comment-17215401 ] Antoine Pitrou commented on ARROW-10308: "vcpu" doesn't mean anything precise unfortunately.

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Wes McKinney (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215396#comment-17215396 ] Wes McKinney commented on ARROW-10308: I do think we should be doing better here than we are so it

[jira] [Commented] (ARROW-10308) [Python] read_csv from python is slow on some work loads

2020-10-16 Thread Dror Speiser (Jira)
[ https://issues.apache.org/jira/browse/ARROW-10308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215393#comment-17215393 ] Dror Speiser commented on ARROW-10308: Thanks for the suggestions :) I am indeed getting the files