[jira] [Assigned] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-09-16  Wes McKinney (Jira)


 [ https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-5630:
---

Assignee: Wes McKinney

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Assignee: Wes McKinney
>Priority: Major
>  Labels: parquet
> Fix For: 0.15.0
>
>
> This is pyarrow 0.13 on Windows.
> {code:python}
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> def make_table(num_rows):
>     typ = pa.list_(pa.field("item", pa.float32(), False))
>     return pa.Table.from_arrays([
>         pa.array([[0] * (i % 10) for i in range(0, num_rows)], type=typ),
>         pa.array([[0] * ((i + 5) % 10) for i in range(0, num_rows)], type=typ)
>     ], ['a', 'b'])
>
> pq.write_table(make_table(100), 'test.parquet')
> pq.read_table('test.parquet')
> {code}
> The last line throws the following exception:
> {noformat}
> ---------------------------------------------------------------------------
> ArrowInvalid                              Traceback (most recent call last)
> <ipython-input-...> in <module>
> ----> 1 pq.read_table('full.parquet')
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read_table(source, columns, use_threads, metadata, use_pandas_metadata, memory_map, filesystem)
>    1150         return fs.read_parquet(path, columns=columns,
>    1151                                use_threads=use_threads, metadata=metadata,
> -> 1152                                use_pandas_metadata=use_pandas_metadata)
>    1153 
>    1154     pf = ParquetFile(source, metadata=metadata)
> ~\Anaconda3\lib\site-packages\pyarrow\filesystem.py in read_parquet(self, path, columns, metadata, schema, use_threads, use_pandas_metadata)
>     179                                  filesystem=self)
>     180         return dataset.read(columns=columns, use_threads=use_threads,
> --> 181                             use_pandas_metadata=use_pandas_metadata)
>     182 
>     183     def open(self, path, mode='rb'):
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, use_threads, use_pandas_metadata)
>    1012             table = piece.read(columns=columns, use_threads=use_threads,
>    1013                                partitions=self.partitions,
> -> 1014                                use_pandas_metadata=use_pandas_metadata)
>    1015             tables.append(table)
>    1016 
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, use_threads, partitions, open_file_func, file, use_pandas_metadata)
>     562             table = reader.read_row_group(self.row_group, **options)
>     563         else:
> --> 564             table = reader.read(**options)
>     565 
>     566         if len(self.partition_keys) > 0:
> ~\Anaconda3\lib\site-packages\pyarrow\parquet.py in read(self, columns, use_threads, use_pandas_metadata)
>     212                 columns, use_pandas_metadata=use_pandas_metadata)
>     213         return self.reader.read_all(column_indices=column_indices,
> --> 214                                     use_threads=use_threads)
>     215 
>     216     def scan_contents(self, columns=None, batch_size=65536):
> ~\Anaconda3\lib\site-packages\pyarrow\_parquet.pyx in pyarrow._parquet.ParquetReader.read_all()
> ~\Anaconda3\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowInvalid: Column 1 named b expected length 932066 but got length 932063
> {noformat}
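
The property the report exercises can be stated as an explicit check. The sketch below is an illustration rather than code from the report: it builds a single list<float32 not null> column (the reporter used two), and the helper name and file path are arbitrary choices. The final loop asserts the same invariant the ArrowInvalid message complains about, namely that every column read back has the table's row count.

{code:python}
# Sketch only (not from the original report): the round-trip property at issue.
import pyarrow as pa
import pyarrow.parquet as pq

def assert_round_trip(table, path='roundtrip.parquet'):
    # Write the table to Parquet and read it back; under this bug the
    # read_table call itself raises ArrowInvalid.
    pq.write_table(table, path)
    restored = pq.read_table(path)
    assert restored.num_rows == table.num_rows
    # Every column must have the same length as the table, which is the
    # invariant behind "expected length ... but got length ...".
    for col in restored.columns:
        assert len(col) == restored.num_rows

# A nested column of non-nullable float32 lists, mirroring the reporter's setup.
typ = pa.list_(pa.field("item", pa.float32(), False))
table = pa.Table.from_arrays(
    [pa.array([[0.0] * (i % 10) for i in range(100)], type=typ)],
    names=['a'])

# Fails on pyarrow 0.13 per the report; should pass on a release with the fix.
assert_round_trip(table)
{code}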



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-08-20  Francois Saint-Jacques (Jira)


 [ https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Saint-Jacques reassigned ARROW-5630:
-

Assignee: (was: Francois Saint-Jacques)

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Priority: Major
>  Labels: parquet
> Fix For: 1.0.0
>
>



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Assigned] (ARROW-5630) [Python] Table of nested arrays doesn't round trip

2019-06-25  Francois Saint-Jacques (JIRA)


 [ https://issues.apache.org/jira/browse/ARROW-5630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Francois Saint-Jacques reassigned ARROW-5630:
-

Assignee: Francois Saint-Jacques

> [Python] Table of nested arrays doesn't round trip
> --
>
> Key: ARROW-5630
> URL: https://issues.apache.org/jira/browse/ARROW-5630
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
> Environment: pyarrow 0.13, Windows 10
>Reporter: Philip Felton
>Assignee: Francois Saint-Jacques
>Priority: Major
>  Labels: parquet
> Fix For: 0.14.0
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)