[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296890#comment-16296890
 ] 

ASF GitHub Bot commented on ARROW-232:
--

wesm commented on issue #1425: ARROW-232: [Python] Add unit test for writing 
Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425#issuecomment-352774917
 
 
   +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> C++/Parquet: Support writing chunked arrays as part of a table 
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
>  Labels: pull-request-available
> Fix For: 0.9.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-12-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16296892#comment-16296892
 ] 

ASF GitHub Bot commented on ARROW-232:
--

wesm closed pull request #1425: ARROW-232: [Python] Add unit test for writing 
Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:


diff --git a/python/pyarrow/tests/test_parquet.py b/python/pyarrow/tests/test_parquet.py
index fc8c8f0c9..c2bb31c9b 100644
--- a/python/pyarrow/tests/test_parquet.py
+++ b/python/pyarrow/tests/test_parquet.py
@@ -118,6 +118,24 @@ def test_pandas_parquet_2_0_rountrip(tmpdir):
     tm.assert_frame_equal(df, df_read)
 
 
+@parquet
+def test_chunked_table_write(tmpdir):
+    # ARROW-232
+    df = alltypes_sample(size=10)
+
+    # The nanosecond->ms conversion is a nuisance, so we just avoid it here
+    del df['datetime']
+
+    batch = pa.RecordBatch.from_pandas(df)
+    table = pa.Table.from_batches([batch] * 3)
+    _check_roundtrip(table, version='2.0')
+
+    df, _ = dataframe_with_lists()
+    batch = pa.RecordBatch.from_pandas(df)
+    table = pa.Table.from_batches([batch] * 3)
+    _check_roundtrip(table, version='2.0')
+
+
 @parquet
 def test_pandas_parquet_datetime_tz():
     import pyarrow.parquet as pq


 




[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16292063#comment-16292063
 ] 

ASF GitHub Bot commented on ARROW-232:
--

wesm opened a new pull request #1425: ARROW-232: [Python] Add unit test for 
writing Parquet file from chunked table
URL: https://github.com/apache/arrow/pull/1425
 
 
   This requires PARQUET-1092 https://github.com/apache/parquet-cpp/pull/426




[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-10-23 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16215946#comment-16215946
 ] 

Wes McKinney commented on ARROW-232:


I took a look at how to do this in parquet-cpp. This is going to require some 
work to make {{FileWriter::Impl::WriteColumnChunk}} less monolithic, since a 
column chunk in Parquet may consist of data coming from multiple chunks in a 
table column.
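
To illustrate why a single Parquet column chunk may draw on several chunks of a table column, here is a minimal pure-Python sketch. It is not the parquet-cpp implementation: `split_into_row_groups` and its arguments are hypothetical names, and plain lists stand in for Arrow arrays. It walks the chunks of one column and emits fixed-size row groups, each assembled from slices of one or more input chunks.

```python
from typing import Iterator, List, Sequence


def split_into_row_groups(
    chunks: Sequence[Sequence[int]], row_group_size: int
) -> Iterator[List[List[int]]]:
    """Yield row groups; each is a list of slices taken from the input chunks.

    Hypothetical helper: plain lists stand in for the chunks of one table column.
    """
    if row_group_size <= 0:
        raise ValueError("row_group_size must be positive")

    current: List[List[int]] = []  # slices collected for the row group in progress
    remaining = row_group_size     # rows still needed to fill that row group

    for chunk in chunks:
        offset = 0
        while offset < len(chunk):
            # Take as much of this chunk as the current row group can still hold.
            take = min(remaining, len(chunk) - offset)
            current.append(list(chunk[offset:offset + take]))
            offset += take
            remaining -= take
            if remaining == 0:
                yield current
                current = []
                remaining = row_group_size

    if current:  # final, possibly short, row group
        yield current


# Three input chunks of lengths 3, 2 and 4, written as row groups of 4 rows.
column_chunks = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
row_groups = list(split_into_row_groups(column_chunks, row_group_size=4))
```

In this sketch the first two row groups each contain slices from two different input chunks, which is the situation a monolithic per-array write routine cannot express.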

> C++/Parquet: Support writing chunked arrays as part of a table 
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
> Fix For: 0.8.0
>
>






[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-09-09 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16160042#comment-16160042
 ] 

Wes McKinney commented on ARROW-232:


I will defer this to 0.8.0 since there's more work that needs to be done in 
parquet-cpp for this. The record batch iterator can be used to solve other 
outstanding problems (like writing chunked tables to stream format). Patch 
forthcoming today.



[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-09-09 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159863#comment-16159863
 ] 

Uwe L. Korn commented on ARROW-232:
---

> This can be solved by creating a record batch iterator for tables

This does not look like a solution to me. As far as I understand chunking, the 
chunk sizes do not need to match between different columns inside a Table.
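
The concern above can be made concrete with a small pure-Python sketch. Assuming columns with the same total length but different chunk layouts, a record batch iterator would need to cut batches at the union of all chunk boundaries so that each batch maps to a slice of exactly one chunk per column. `aligned_batch_ranges` is a hypothetical helper, not an Arrow API, and chunk layouts are given simply as lists of chunk lengths.

```python
from itertools import accumulate
from typing import List, Sequence, Tuple


def aligned_batch_ranges(
    chunk_lengths_per_column: Sequence[Sequence[int]],
) -> List[Tuple[int, int]]:
    """Return (start, stop) row ranges that never straddle a chunk boundary.

    Hypothetical helper: each inner sequence lists the chunk lengths of one column.
    """
    totals = {sum(lengths) for lengths in chunk_lengths_per_column}
    if len(totals) > 1:
        raise ValueError("all columns must have the same number of rows")

    # Collect the end offset of every chunk in every column.
    boundaries = set()
    for lengths in chunk_lengths_per_column:
        boundaries.update(accumulate(lengths))
    boundaries.discard(0)  # zero-length leading chunks add no boundary

    ranges = []
    start = 0
    for stop in sorted(boundaries):
        ranges.append((start, stop))
        start = stop
    return ranges


# Column "a" is chunked as 4 + 6 rows, column "b" as 5 + 5 rows.
ranges = aligned_batch_ranges([[4, 6], [5, 5]])
```

In this sketch the two layouts produce three row ranges rather than two, so mismatched chunk sizes would show up as smaller batches rather than as an error.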

> C++/Parquet: Support writing chunked arrays as part of a table 
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Wes McKinney
> Fix For: 0.7.0
>
>






[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-08-03 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16113313#comment-16113313
 ] 

Wes McKinney commented on ARROW-232:


Moving this to 0.7.0; there are a number of interrelated issues involving 
chunked tables.

> C++/Parquet: Support writing chunked arrays as part of a table 
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
> Fix For: 0.7.0
>
>






[jira] [Commented] (ARROW-232) C++/Parquet: Support writing chunked arrays as part of a table

2017-07-17 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090172#comment-16090172
 ] 

Wes McKinney commented on ARROW-232:


This can be solved by creating a record batch iterator for tables

> C++/Parquet: Support writing chunked arrays as part of a table 
> ---
>
> Key: ARROW-232
> URL: https://issues.apache.org/jira/browse/ARROW-232
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
> Fix For: 0.6.0
>
>



