[jira] [Assigned] (ARROW-5241) [Python] Add option to disable writing statistics to parquet file

2019-06-18 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5241: Assignee: Joris Van den Bossche > [Python] Add option to disable writing

[jira] [Commented] (ARROW-5122) [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory

2019-05-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836455#comment-16836455 ] Joris Van den Bossche commented on ARROW-5122: -- Thanks for testing again! > [Python]

[jira] [Created] (ARROW-5295) [Python] accept pyarrow values / scalars in constructor functions ?

2019-05-09 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5295: Summary: [Python] accept pyarrow values / scalars in constructor functions ? Key: ARROW-5295 URL: https://issues.apache.org/jira/browse/ARROW-5295

[jira] [Commented] (ARROW-2667) [C++/Python] Add pandas-like take method to Array/Column/ChunkedArray

2019-05-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836268#comment-16836268 ] Joris Van den Bossche commented on ARROW-2667: -- {quote}Note that pandas' `take` is a bit

[jira] [Commented] (ARROW-5293) [C++] Take kernel on DictionaryArray does not preserve ordered flag

2019-05-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836263#comment-16836263 ] Joris Van den Bossche commented on ARROW-5293: -- cc [~bkietz] > [C++] Take kernel on

[jira] [Created] (ARROW-5291) [Python] Add wrapper for "take" kernel on Array

2019-05-09 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5291: Summary: [Python] Add wrapper for "take" kernel on Array Key: ARROW-5291 URL: https://issues.apache.org/jira/browse/ARROW-5291 Project: Apache Arrow

[jira] [Commented] (ARROW-2103) [C++] Implement take kernel functions - string/binary value type

2019-05-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836254#comment-16836254 ] Joris Van den Bossche commented on ARROW-2103: -- [~bkietz] I think you already implemented

[jira] [Created] (ARROW-5293) [C++] Take kernel on DictionaryArray does not preserve ordered flag

2019-05-09 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5293: Summary: [C++] Take kernel on DictionaryArray does not preserve ordered flag Key: ARROW-5293 URL: https://issues.apache.org/jira/browse/ARROW-5293

[jira] [Comment Edited] (ARROW-2667) [C++/Python] Add pandas-like take method to Array/Column/ChunkedArray

2019-05-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16836268#comment-16836268 ] Joris Van den Bossche edited comment on ARROW-2667 at 5/9/19 10:19 AM:

[jira] [Created] (ARROW-5301) [Python] parquet documentation outdated on nthreads argument

2019-05-11 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5301: Summary: [Python] parquet documentation outdated on nthreads argument Key: ARROW-5301 URL: https://issues.apache.org/jira/browse/ARROW-5301 Project:

[jira] [Assigned] (ARROW-5286) [Python] support Structs in Table.from_pandas given a known schema

2019-05-13 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5286: Assignee: Joris Van den Bossche > [Python] support Structs in

[jira] [Commented] (ARROW-5286) [Python] support Structs in Table.from_pandas given a known schema

2019-05-13 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838389#comment-16838389 ] Joris Van den Bossche commented on ARROW-5286: -- Actually, also converting from dicts

[jira] [Commented] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2019-05-13 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838442#comment-16838442 ] Joris Van den Bossche commented on ARROW-3424: -- Currently, a list of files is already

[jira] [Commented] (ARROW-4516) [Python] Error while creating a ParquetDataset on a path without `_common_dataset` but with an empty `_tempfile`

2019-05-13 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838604#comment-16838604 ] Joris Van den Bossche commented on ARROW-4516: -- Similarly to ARROW-1079 /

[jira] [Created] (ARROW-5311) [C++] Return more specific invalid Status in Take kernel

2019-05-13 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5311: Summary: [C++] Return more specific invalid Status in Take kernel Key: ARROW-5311 URL: https://issues.apache.org/jira/browse/ARROW-5311 Project:

[jira] [Created] (ARROW-5310) [Python] better error message on creating ParquetDataset from empty directory

2019-05-13 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5310: Summary: [Python] better error message on creating ParquetDataset from empty directory Key: ARROW-5310 URL: https://issues.apache.org/jira/browse/ARROW-5310

[jira] [Updated] (ARROW-5293) [C++] Take kernel on DictionaryArray does not preserve ordered flag

2019-05-13 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5293: - Fix Version/s: 0.14.0 > [C++] Take kernel on DictionaryArray does not preserve

[jira] [Commented] (ARROW-5349) [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData

2019-05-22 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845670#comment-16845670 ] Joris Van den Bossche commented on ARROW-5349: -- [~mdurant] fastparquet only sets the

[jira] [Commented] (ARROW-5349) [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData

2019-05-22 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845883#comment-16845883 ] Joris Van den Bossche commented on ARROW-5349: -- Given that, for the API, it might make more

[jira] [Commented] (ARROW-5349) [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData

2019-05-22 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845879#comment-16845879 ] Joris Van den Bossche commented on ARROW-5349: -- Thanks. It is actually also quite clear in

[jira] [Updated] (ARROW-5271) [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5271: - Description: Related to ARROW-2428, which describes the issue to convert back to

[jira] [Closed] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-4814. Resolution: Resolved I opened ARROW-5286 to also support specifying a struct type

[jira] [Updated] (ARROW-5286) [Python] support Structs in Table.from_pandas given a known schema

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5286: - Fix Version/s: 0.14.0 > [Python] support Structs in Table.from_pandas given a

[jira] [Commented] (ARROW-5287) [Python] automatic type inference for arrays of tuples

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835730#comment-16835730 ] Joris Van den Bossche commented on ARROW-5287: -- Yes, I understand the "ambiguous" reason,

[jira] [Commented] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835671#comment-16835671 ] Joris Van den Bossche commented on ARROW-4814: -- This is actually a different issue (not

[jira] [Created] (ARROW-5286) [Python] support Structs in Table.from_pandas given a known schema

2019-05-08 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5286: Summary: [Python] support Structs in Table.from_pandas given a known schema Key: ARROW-5286 URL: https://issues.apache.org/jira/browse/ARROW-5286

[jira] [Updated] (ARROW-5287) [Python] automatic type inference for arrays of tuples

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5287: - Description: Arrays of tuples are supported to be converted to either ListArray or

[jira] [Commented] (ARROW-2667) [C++/Python] Add pandas-like take method to Array/Column/ChunkedArray

2019-05-21 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16845046#comment-16845046 ] Joris Van den Bossche commented on ARROW-2667: -- I only added it to Array, not yet to

[jira] [Created] (ARROW-5379) [Python] support pandas' nullable Integer type in from_pandas

2019-05-20 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5379: Summary: [Python] support pandas' nullable Integer type in from_pandas Key: ARROW-5379 URL: https://issues.apache.org/jira/browse/ARROW-5379 Project:

[jira] [Created] (ARROW-5349) [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData

2019-05-16 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5349: Summary: [Python/C++] Provide a way to specify the file path in parquet ColumnChunkMetaData Key: ARROW-5349 URL: https://issues.apache.org/jira/browse/ARROW-5349

[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-16 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841417#comment-16841417 ] Joris Van den Bossche commented on ARROW-1983: -- Copying the questions that [~pearu]

[jira] [Commented] (ARROW-1983) [Python] Add ability to write parquet `_metadata` file

2019-05-16 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16841424#comment-16841424 ] Joris Van den Bossche commented on ARROW-1983: -- > what would be desired the interface for

[jira] [Commented] (ARROW-5311) [C++] Return more specific invalid Status in Take kernel

2019-05-15 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840410#comment-16840410 ] Joris Van den Bossche commented on ARROW-5311: -- With IndexError, we could match typical

[jira] [Closed] (ARROW-2709) [Python] write_to_dataset poor performance when splitting

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-2709. Resolution: Duplicate > [Python] write_to_dataset poor performance when splitting

[jira] [Commented] (ARROW-2709) [Python] write_to_dataset poor performance when splitting

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829173#comment-16829173 ] Joris Van den Bossche commented on ARROW-2709: -- This seems a duplicate of ARROW-2628, so

[jira] [Commented] (ARROW-2079) [Python] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829034#comment-16829034 ] Joris Van den Bossche commented on ARROW-2079: -- It seems to me that this is closed by

[jira] [Updated] (ARROW-2628) [Python] parquet.write_to_dataset is memory-hungry on large DataFrames

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2628: - Component/s: C++ > [Python] parquet.write_to_dataset is memory-hungry on large

[jira] [Created] (ARROW-5237) [Python] pandas_version key in pandas metadata no longer populated

2019-04-29 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5237: Summary: [Python] pandas_version key in pandas metadata no longer populated Key: ARROW-5237 URL: https://issues.apache.org/jira/browse/ARROW-5237

[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2882: - Component/s: C++ > [C++][Python] Support AWS Firehose partition_scheme

[jira] [Commented] (ARROW-2079) [Python] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829355#comment-16829355 ] Joris Van den Bossche commented on ARROW-2079: -- But the {{_common_metadata}} we could

[jira] [Commented] (ARROW-2628) [Python] parquet.write_to_dataset is memory-hungry on large DataFrames

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829198#comment-16829198 ] Joris Van den Bossche commented on ARROW-2628: -- Similar report in ARROW-2709, which had a PR

[jira] [Commented] (ARROW-3806) [Python] When converting nested types to pandas, use tuples

2019-04-29 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16829231#comment-16829231 ] Joris Van den Bossche commented on ARROW-3806: -- There is currently no automatic type

[jira] [Updated] (ARROW-5241) [Python] Add option to disable writing statistics to parquet file

2019-04-30 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5241: - Summary: [Python] Add option to disable writing statistics to parquet file

[jira] [Updated] (ARROW-5241) [Python] Add option to disable writing statistics

2019-04-30 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5241: - Labels: parquet (was: ) > [Python] Add option to disable writing statistics >

[jira] [Commented] (ARROW-2428) [Python] Support ExtensionArrays in to_pandas conversion

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-2428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834100#comment-16834100 ] Joris Van den Bossche commented on ARROW-2428: -- [~xhochy] did you have already a specific

[jira] [Updated] (ARROW-4723) Skip _files when reading a directory containing parquet files

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4723: - Labels: parquet (was: ) > Skip _files when reading a directory containing

[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4633: - Labels: newbie parquet (was: newbie) > [Python]

[jira] [Updated] (ARROW-4823) [Python] read_csv shouldn't close file handles it doesn't own

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4823: - Labels: csv (was: ) > [Python] read_csv shouldn't close file handles it doesn't

[jira] [Updated] (ARROW-4001) [Python] Create Parquet Schema in python

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4001: - Labels: parquet (was: ) > [Python] Create Parquet Schema in python >

[jira] [Created] (ARROW-5271) [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array

2019-05-06 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5271: Summary: [Python] Interface for converting pandas ExtensionArray / other custom array objects to pyarrow Array Key: ARROW-5271 URL:

[jira] [Updated] (ARROW-4398) [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY serialization (read and write)

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4398: - Labels: parquet (was: ) > [Python] Add benchmarks for Arrow<>Parquet BYTE_ARRAY

[jira] [Updated] (ARROW-4076) [Python] schema validation and filters

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4076?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4076: - Labels: easyfix parquet pull-request-available (was: easyfix

[jira] [Commented] (ARROW-4505) [C++] Nicer PrettyPrint for date32

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16834054#comment-16834054 ] Joris Van den Bossche commented on ARROW-4505: -- And the same for TimestampArray: {code} In

[jira] [Updated] (ARROW-5086) [Python] Space leak in ParquetFile.read_row_group()

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5086: - Labels: parquet (was: ) > [Python] Space leak in ParquetFile.read_row_group()

[jira] [Updated] (ARROW-5122) [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5122: - Labels: parquet (was: ) > [Python] pyarrow.parquet.read_table raises non-file

[jira] [Updated] (ARROW-3947) [Python] query distinct values of a given partition from a ParquetDataset

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3947: - Labels: parquet (was: ) > [Python] query distinct values of a given partition

[jira] [Updated] (ARROW-5072) [Python] write_table fails silently on S3 errors

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5072: - Labels: parquet (was: ) > [Python] write_table fails silently on S3 errors >

[jira] [Updated] (ARROW-5028) [Python][C++] Arrow to Parquet conversion drops and corrupts values

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5028: - Labels: parquet (was: ) > [Python][C++] Arrow to Parquet conversion drops and

[jira] [Updated] (ARROW-4883) [Python] read_csv() returns garbage if given file object in text mode

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4883: - Labels: csv (was: ) > [Python] read_csv() returns garbage if given file object

[jira] [Updated] (ARROW-4885) [Python] read_csv() can't handle decimal128 columns

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4885: - Labels: csv (was: ) > [Python] read_csv() can't handle decimal128 columns >

[jira] [Updated] (ARROW-4516) [Python] Error while creating a ParquetDataset on a path without `_common_dataset` but with an empty `_tempfile`

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4516: - Labels: parquet (was: ) > [Python] Error while creating a ParquetDataset on a

[jira] [Updated] (ARROW-4470) [Python] Pyarrow using considerable more memory when reading partitioned Parquet file

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4470: - Labels: parquet (was: ) > [Python] Pyarrow using considerable more memory when

[jira] [Comment Edited] (ARROW-5104) [Python/C++] Schema for empty tables include index column as integer

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835375#comment-16835375 ] Joris Van den Bossche edited comment on ARROW-5104 at 5/8/19 7:01 AM:

[jira] [Commented] (ARROW-5104) [Python/C++] Schema for empty tables include index column as integer

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835375#comment-16835375 ] Joris Van den Bossche commented on ARROW-5104: -- [~fjetter] thanks for the report! The

[jira] [Commented] (ARROW-3448) [Python] Pandas roundtrip doesn't preserve list of datetime objects

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835411#comment-16835411 ] Joris Van den Bossche commented on ARROW-3448: -- Is this something we would like to fix? In

[jira] [Commented] (ARROW-4432) [Python][Hypothesis] Empty table - pandas roundtrip produces inequal tables

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4432?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835422#comment-16835422 ] Joris Van den Bossche commented on ARROW-4432: -- With latest master, the only difference is

[jira] [Commented] (ARROW-5122) [Python] pyarrow.parquet.read_table raises non-file path error when given a windows path to a directory

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835438#comment-16835438 ] Joris Van den Bossche commented on ARROW-5122: -- [~IlyaOrson] could you try again? >

[jira] [Updated] (ARROW-4967) Object type and stats lost when using 96-bit timestamps

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4967: - Component/s: C++ > Object type and stats lost when using 96-bit timestamps >

[jira] [Updated] (ARROW-4967) [C++] Parquet: Object type and stats lost when using 96-bit timestamps

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4967: - Summary: [C++] Parquet: Object type and stats lost when using 96-bit timestamps

[jira] [Commented] (ARROW-4967) Object type and stats lost when using 96-bit timestamps

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835477#comment-16835477 ] Joris Van den Bossche commented on ARROW-4967: -- [~yiannisliodakis] Regarding the logical

[jira] [Commented] (ARROW-3654) [Python] Column with CategoricalIndex fails to be read back

2019-05-02 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16831599#comment-16831599 ] Joris Van den Bossche commented on ARROW-3654: -- [~aberres] I cannot reproduce this with

[jira] [Updated] (ARROW-3779) [C++/Python] Validate timezone passed to pa.timestamp

2019-05-02 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3779: - Summary: [C++/Python] Validate timezone passed to pa.timestamp (was: [Python]

[jira] [Created] (ARROW-5248) [Python] support dateutil timezones

2019-05-02 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5248: Summary: [Python] support dateutil timezones Key: ARROW-5248 URL: https://issues.apache.org/jira/browse/ARROW-5248 Project: Apache Arrow

[jira] [Updated] (ARROW-4359) [Python][Parquet] Column metadata is not saved or loaded in parquet

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4359: - Description: Hi all, a while ago I posted this issue: ARROW-3866 While working

[jira] [Commented] (ARROW-4359) [Python] Column metadata is not saved or loaded in parquet

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833585#comment-16833585 ] Joris Van den Bossche commented on ARROW-4359: -- The arrow field metadata could in principle

[jira] [Updated] (ARROW-4359) [Python][Parquet] Column metadata is not saved or loaded in parquet

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4359: - Description: Hi all, a while ago I posted this issue:

[jira] [Updated] (ARROW-5264) [Java] Allow enabling/disabling boundary checking dynamically in the code

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5264: - Summary: [Java] Allow enabling/disabling boundary checking dynamically in the

[jira] [Updated] (ARROW-4359) [Python] Column metadata is not saved or loaded in parquet

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4359: - Summary: [Python] Column metadata is not saved or loaded in parquet (was:

[jira] [Updated] (ARROW-3866) [Python] Column metadata is not transferred to tables in pyarrow

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3866: - Description: Hello everyone, transferring this from Github for Pyarrow. While

[jira] [Commented] (ARROW-4492) [Python] Failure reading Parquet column as pandas Categorical in 0.12

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833590#comment-16833590 ] Joris Van den Bossche commented on ARROW-4492: -- [~gsakkis] What error did you get exactly?

[jira] [Updated] (ARROW-5258) [C++/Python] Expose file metadata of dataset pieces to caller

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5258: - Labels: parquet (was: ) > [C++/Python] Expose file metadata of dataset pieces

[jira] [Comment Edited] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833719#comment-16833719 ] Joris Van den Bossche edited comment on ARROW-5138 at 5/6/19 11:29 AM:

[jira] [Commented] (ARROW-5139) [Python/C++] Empty column selection no longer restores index

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833597#comment-16833597 ] Joris Van den Bossche commented on ARROW-5139: -- [~fjetter] thanks for the report! A little

[jira] [Comment Edited] (ARROW-5139) [Python/C++] Empty column selection no longer restores index

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833597#comment-16833597 ] Joris Van den Bossche edited comment on ARROW-5139 at 5/6/19 9:37 AM:

[jira] [Commented] (ARROW-5139) [Python/C++] Empty column selection no longer restores index

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833656#comment-16833656 ] Joris Van den Bossche commented on ARROW-5139: -- A "quick and dirty" fix would be to special

[jira] [Commented] (ARROW-842) [Python] Handle more kinds of null sentinel objects from pandas 0.x

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833703#comment-16833703 ] Joris Van den Bossche commented on ARROW-842: - {{pd.NaT}} is one that is not recognized, but,

[jira] [Commented] (ARROW-5138) [Python/C++] Row group retrieval doesn't restore index properly

2019-05-06 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16833719#comment-16833719 ] Joris Van den Bossche commented on ARROW-5138: -- The issue here is that there is a mismatch

[jira] [Updated] (ARROW-4350) [Python] nested numpy arrays

2019-05-03 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4350: - Summary: [Python] nested numpy arrays (was: [Python] pyarrow table convert to

[jira] [Assigned] (ARROW-5238) [Python] Improve usability of pyarrow.dictionary function

2019-05-03 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5238: Assignee: Joris Van den Bossche > [Python] Improve usability of

[jira] [Commented] (ARROW-5208) [Python] Inconsistent resulting type during casting in pa.array() when mask is present

2019-04-26 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16826904#comment-16826904 ] Joris Van den Bossche commented on ARROW-5208: -- To get started, I think the developer docs

[jira] [Updated] (ARROW-5089) [C++/Python] Writing dictionary encoded columns to parquet is extremely slow when using chunk size

2019-04-26 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5089: - Labels: parquet performance (was: performance) > [C++/Python] Writing

[jira] [Updated] (ARROW-5085) [Python/C++] Conversion of dict encoded null column fails in parquet writing when using RowGroups

2019-04-26 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5085: - Labels: parquet (was: ) > [Python/C++] Conversion of dict encoded null column

[jira] [Commented] (ARROW-3210) [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835519#comment-16835519 ] Joris Van den Bossche commented on ARROW-3210: -- This has been fixed by ARROW-2891 (ensuring

[jira] [Comment Edited] (ARROW-3210) [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16835519#comment-16835519 ] Joris Van den Bossche edited comment on ARROW-3210 at 5/8/19 11:34 AM:

[jira] [Closed] (ARROW-3210) [Python] Creating ParquetDataset creates partitioned ParquetFiles with mismatched Parquet schemas

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-3210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-3210. Resolution: Fixed Fix Version/s: (was: 0.14.0) 0.10.0

[jira] [Updated] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet

2019-05-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-4814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4814: - Labels: pandas (was: pandas parquet) > [Python] Exception when writing nested

[jira] [Created] (ARROW-5287) [Python] automatic type inference for arrays of tuples

2019-05-08 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5287: Summary: [Python] automatic type inference for arrays of tuples Key: ARROW-5287 URL: https://issues.apache.org/jira/browse/ARROW-5287 Project: Apache

[jira] [Updated] (ARROW-5853) [Python] Expose boolean filter kernel on Array

2019-07-04 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5853: - Description: Expose the filter kernel

[jira] [Updated] (ARROW-5853) [Python] Expose boolean filter kernel on Array

2019-07-04 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5853: - Description: Expose the filter kernel

[jira] [Created] (ARROW-5857) [Python] converting multidimensional numpy arrays to nested list type

2019-07-04 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5857: Summary: [Python] converting multidimensional numpy arrays to nested list type Key: ARROW-5857 URL: https://issues.apache.org/jira/browse/ARROW-5857
