[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7305:
--
Comment: was deleted
(was: I don't think memory_profile is registering memory usage here
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000928#comment-17000928
]
Bogdan Klichuk commented on ARROW-7305:
---
Looking at a bigger example
{code:java}
df =
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000908#comment-17000908
]
Bogdan Klichuk commented on ARROW-7305:
---
I don't think memory_profile is registering memory usage
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17000893#comment-17000893
]
Bogdan Klichuk commented on ARROW-7305:
---
I have tried this in an Ubuntu Docker container and results for 0.14.1
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7305:
--
Attachment: 50mb.csv.gz
> [Python] High memory usage writing pyarrow.Table with large strings
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16998688#comment-16998688
]
Bogdan Klichuk commented on ARROW-7305:
---
Sorry for the delay, attaching a gzipped 50 MB CSV file with
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7305:
--
Comment: was deleted
(was: Seems like it's the transformation of pandas to pyarrow.Table.
If you
[
https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16992058#comment-16992058
]
Bogdan Klichuk commented on ARROW-7305:
---
Seems like it's the transformation of pandas to pyarrow.Table.
Bogdan Klichuk created ARROW-7305:
-
Summary: High memory usage writing pyarrow.Table to parquet
Key: ARROW-7305
URL: https://issues.apache.org/jira/browse/ARROW-7305
Project: Apache Arrow
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16976379#comment-16976379
]
Bogdan Klichuk commented on ARROW-7150:
---
Yeah, it's not that simple on my end. It's just one of json
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974110#comment-16974110
]
Bogdan Klichuk commented on ARROW-7150:
---
I've always been thinking of Avro as an alternative since I pretty
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974108#comment-16974108
]
Bogdan Klichuk commented on ARROW-7150:
---
[~emkornfi...@gmail.com] Yeah, I'm not too familiar with
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Environment: Mac OS X (was: Mac OS X. Pyarrow==0.14.1)
> [Python] Explain parquet file size
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Affects Version/s: (was: 0.14.1)
0.15.1
> [Python] Explain parquet
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Description:
Having columnar storage format in mind, with gzip compression enabled, I can't
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Attachment: 820.parquet
> [Python] Explain parquet file size growth
>
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973501#comment-16973501
]
Bogdan Klichuk commented on ARROW-7150:
---
Hello. Tried 0.15.1, got the same results. I managed to
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973501#comment-16973501
]
Bogdan Klichuk edited comment on ARROW-7150 at 11/13/19 4:31 PM:
-
Hello.
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Environment: Mac OS X. Pyarrow==0.14.1 (was: Mac OS X. Pyarrow==0.15.1)
> [Python] Explain
Bogdan Klichuk created ARROW-7150:
-
Summary: [Python] Explain parquet file size growth
Key: ARROW-7150
URL: https://issues.apache.org/jira/browse/ARROW-7150
Project: Apache Arrow
Issue Type:
[
https://issues.apache.org/jira/browse/ARROW-7150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-7150:
--
Description:
Having columnar storage format in mind, with gzip compression enabled, I can't
[
https://issues.apache.org/jira/browse/ARROW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-6481:
--
Summary: [Python] Bad performance of read_csv() with column_types (was:
Bad performance of
[
https://issues.apache.org/jira/browse/ARROW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-6481:
--
Environment: ubuntu xenial, python2.7 (was: ubuntu xenial)
> Bad performance of read_csv()
[
https://issues.apache.org/jira/browse/ARROW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16925007#comment-16925007
]
Bogdan Klichuk commented on ARROW-6481:
---
I don't think hashtable lookup on each column has to make
[
https://issues.apache.org/jira/browse/ARROW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-6481:
--
Description:
Case: Dataset with 20k columns. The number of rows can be 0.
[
https://issues.apache.org/jira/browse/ARROW-6481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-6481:
--
Description:
Case: Dataset with 20k columns. The number of rows can be 0.
Bogdan Klichuk created ARROW-6481:
-
Summary: Bad performance of read_csv() with column_types
Key: ARROW-6481
URL: https://issues.apache.org/jira/browse/ARROW-6481
Project: Apache Arrow
Issue
[
https://issues.apache.org/jira/browse/ARROW-6301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16916798#comment-16916798
]
Bogdan Klichuk commented on ARROW-6301:
---
Bumping this thread with a related segfault, that
[
https://issues.apache.org/jira/browse/ARROW-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16876769#comment-16876769
]
Bogdan Klichuk commented on ARROW-5791:
---
Thanks a lot!
> [Python] pyarrow.csv.read_csv hangs +
[
https://issues.apache.org/jira/browse/ARROW-5811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-5811:
--
Summary: [Python] pyarrow.csv.read_csv: Ability to not infer column types.
(was:
Bogdan Klichuk created ARROW-5811:
-
Summary: pyarrow.csv.read_csv: Ability to not infer column types.
Key: ARROW-5811
URL: https://issues.apache.org/jira/browse/ARROW-5811
Project: Apache Arrow
[
https://issues.apache.org/jira/browse/ARROW-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875626#comment-16875626
]
Bogdan Klichuk commented on ARROW-5791:
---
Just to point out, I can successfully convert a dataframe (if
[
https://issues.apache.org/jira/browse/ARROW-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875624#comment-16875624
]
Bogdan Klichuk commented on ARROW-5791:
---
[~bhulette] It's a shame I threw away the idea of "maybe
Bogdan Klichuk created ARROW-5791:
-
Summary: pyarrow.csv.read_csv hangs + eats all RAM
Key: ARROW-5791
URL: https://issues.apache.org/jira/browse/ARROW-5791
Project: Apache Arrow
Issue Type:
[
https://issues.apache.org/jira/browse/ARROW-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Bogdan Klichuk updated ARROW-5791:
--
Description:
I have quite a sparse dataset in CSV format. A wide table that has several rows
35 matches