[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296890#comment-16296890 ]

ASF GitHub Bot commented on ARROW-232:
--

wesm commented on issue #1425: ARROW-232: [Python] Add unit test for writing Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425#issuecomment-352774917

+1

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> C++/Parquet: Support writing chunked arrays as part of a table
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Assignee: Wes McKinney
> Labels: pull-request-available
> Fix For: 0.9.0
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296892#comment-16296892 ]

ASF GitHub Bot commented on ARROW-232:
--

wesm closed pull request #1425: ARROW-232: [Python] Add unit test for writing Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py
index fc8c8f0c9..c2bb31c9b 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -118,6 +118,24 @@ def test_pandas_parquet_2_0_rountrip(tmpdir):
     tm.assert_frame_equal(df, df_read)
 
 
+@parquet
+def test_chunked_table_write(tmpdir):
+    # ARROW-232
+    df = alltypes_sample(size=10)
+
+    # The nanosecond->ms conversion is a nuisance, so we just avoid it here
+    del df['datetime']
+
+    batch = pa.RecordBatch.from_pandas(df)
+    table = pa.Table.from_batches([batch] * 3)
+    _check_roundtrip(table, version='2.0')
+
+    df, _ = dataframe_with_lists()
+    batch = pa.RecordBatch.from_pandas(df)
+    table = pa.Table.from_batches([batch] * 3)
+    _check_roundtrip(table, version='2.0')
+
+
 @parquet
 def test_pandas_parquet_datetime_tz():
     import pyarrow.parquet as pq
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292063#comment-16292063 ]

ASF GitHub Bot commented on ARROW-232:
--

wesm opened a new pull request #1425: ARROW-232: [Python] Add unit test for writing Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425

This requires PARQUET-1092
https://github.com/apache/parquet-cpp/pull/426
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215946#comment-16215946 ]

Wes McKinney commented on ARROW-232:

I took a look at how to do this in parquet-cpp. This is going to require some work to make {{FileWriter::Impl::WriteColumnChunk}} less monolithic, since a column chunk in Parquet may consist of data coming from multiple chunks in a table column.

> C++/Parquet: Support writing chunked arrays as part of a table
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Assignee: Wes McKinney
> Fix For: 0.8.0
>
>
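The point about {{WriteColumnChunk}} can be made concrete. A Parquet column chunk covers a contiguous range of rows in one row group, while an Arrow table column is a list of arrays (chunks), so writing a single column chunk may require slices from several Arrow chunks. The sketch below is a hypothetical, pure-Python illustration of that bookkeeping, not parquet-cpp code: `slices_for_row_range` is an invented name, and the chunk lengths are assumed to be known up front.

```python
from typing import List, Tuple


def slices_for_row_range(
    chunk_lengths: List[int], start: int, length: int
) -> List[Tuple[int, int, int]]:
    """Return (chunk_index, offset_in_chunk, slice_length) triples that cover
    rows [start, start + length) of a column split into chunks of the given lengths.

    Hypothetical helper for illustration only; not part of parquet-cpp or Arrow.
    """
    total = sum(chunk_lengths)
    if start < 0 or length < 0 or start + length > total:
        raise ValueError("row range out of bounds")
    end = start + length
    result = []
    chunk_start = 0  # absolute row index at which the current chunk begins
    for i, n in enumerate(chunk_lengths):
        chunk_end = chunk_start + n
        # Overlap between the requested range and this chunk's range
        lo = max(start, chunk_start)
        hi = min(end, chunk_end)
        if lo < hi:
            result.append((i, lo - chunk_start, hi - lo))
        chunk_start = chunk_end
    return result
```

For example, a column made of three 10-row chunks (the layout `pa.Table.from_batches([batch] * 3)` produces from a 10-row batch, as in the unit test from PR #1425) written as a hypothetical 25-row row group needs all of chunks 0 and 1 plus the first 5 rows of chunk 2: `slices_for_row_range([10, 10, 10], 0, 25)` returns `[(0, 0, 10), (1, 0, 10), (2, 0, 5)]`.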
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160042#comment-16160042 ]

Wes McKinney commented on ARROW-232:

I will defer this to 0.8.0 since there's more work that needs to be done in parquet-cpp for this. The record batch iterator can be used to solve other outstanding problems (like writing chunked tables to stream format). Patch forthcoming today.
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159863#comment-16159863 ]

Uwe L. Korn commented on ARROW-232:
---

> This can be solved by creating a record batch iterator for tables

This does not look like a solution to me. As far as I understand chunking, the chunk sizes do not need to match between different columns inside a Table.

> C++/Parquet: Support writing chunked arrays as part of a table
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Assignee: Wes McKinney
> Fix For: 0.7.0
>
>
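Uwe's objection holds if the iterator emits one record batch per chunk: with mismatched boundaries there is no single chunk index that lines up across columns. One way to reconcile the two comments, sketched here in pure Python under the assumption that the iterator is allowed to re-slice chunks, is to cut the table at the union of every column's chunk boundaries, so that each emitted batch maps to a contiguous slice of exactly one chunk per column. `aligned_batch_bounds` is a hypothetical name, not an Arrow API.

```python
from itertools import accumulate
from typing import Dict, List, Tuple


def aligned_batch_bounds(columns: Dict[str, List[int]]) -> List[Tuple[int, int]]:
    """Given per-column chunk lengths, return (start, length) row ranges such that
    no range crosses a chunk boundary in any column.

    Hypothetical helper for illustration only; not an Arrow API.
    """
    totals = {sum(lengths) for lengths in columns.values()}
    if len(totals) > 1:
        raise ValueError("all columns must have the same number of rows")
    # The union of every column's cumulative chunk end offsets gives the split points.
    cuts = set()
    for lengths in columns.values():
        cuts.update(accumulate(lengths))
    cuts.discard(0)  # leading zero-length chunks would contribute an offset of 0
    bounds = []
    prev = 0
    for cut in sorted(cuts):
        bounds.append((prev, cut - prev))
        prev = cut
    return bounds
```

With column `a` split as `[4, 6]` and column `b` as `[3, 3, 4]`, the cuts fall at rows 3, 4, 6 and 10, giving the batches `[(0, 3), (3, 1), (4, 2), (6, 4)]`. The trade-off is that heavily misaligned columns yield many small batches.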
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113313#comment-16113313 ]

Wes McKinney commented on ARROW-232:

Moving this to 0.7.0; there are a number of interrelated issues involving chunked tables.

> C++/Parquet: Support writing chunked arrays as part of a table
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Fix For: 0.7.0
>
>
[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table
[ https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090172#comment-16090172 ]

Wes McKinney commented on ARROW-232:

This can be solved by creating a record batch iterator for tables.

> C++/Parquet: Support writing chunked arrays as part of a table
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Uwe L. Korn
> Fix For: 0.6.0
>
>