[jira] [Updated] (ARROW-6114) [Python] Datatypes are not preserved when a pandas dataframe partitioned and saved as parquet file using pyarrow

2019-08-02 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6114: - Summary: [Python] Datatypes are not preserved when a pandas dataframe

[jira] [Commented] (ARROW-5480) [Python] Pandas categorical type doesn't survive a round-trip through parquet

2019-08-02 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16898725#comment-16898725 ] Joris Van den Bossche commented on ARROW-5480: -- {quote}One slightly higher level issue is

[jira] [Comment Edited] (ARROW-6173) [Python] error loading csv submodule

2019-08-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903046#comment-16903046 ] Joris Van den Bossche edited comment on ARROW-6173 at 8/8/19 2:54 PM:

[jira] [Commented] (ARROW-6176) [Python] Allow to subclass ExtensionArray to attach to custom extension type

2019-08-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903064#comment-16903064 ] Joris Van den Bossche commented on ARROW-6176: -- This might be done by adding a

[jira] [Commented] (ARROW-6173) [Python] error loading csv submodule

2019-08-08 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903046#comment-16903046 ] Joris Van den Bossche commented on ARROW-6173: -- The IO modules are not imported by default,

[jira] [Created] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-08 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-6179: Summary: [C++] ExtensionType subclass for "unknown" types? Key: ARROW-6179 URL: https://issues.apache.org/jira/browse/ARROW-6179 Project: Apache Arrow

[jira] [Created] (ARROW-6176) [Python] Allow to subclass ExtensionArray to attach to custom extension type

2019-08-08 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-6176: Summary: [Python] Allow to subclass ExtensionArray to attach to custom extension type Key: ARROW-6176 URL: https://issues.apache.org/jira/browse/ARROW-6176

[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903935#comment-16903935 ] Joris Van den Bossche commented on ARROW-6179: -- I suppose, if we go for this, it would

[jira] [Created] (ARROW-6187) [C++] fallback to storage type when writing ExtensionType to Parquet

2019-08-09 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-6187: Summary: [C++] fallback to storage type when writing ExtensionType to Parquet Key: ARROW-6187 URL: https://issues.apache.org/jira/browse/ARROW-6187

[jira] [Commented] (ARROW-6179) [C++] ExtensionType subclass for "unknown" types?

2019-08-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16903969#comment-16903969 ] Joris Van den Bossche commented on ARROW-6179: -- The bigquery usage of this, is that open

[jira] [Commented] (ARROW-6081) FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmptb2ao6te_job_6e0a8ca1.parquet'

2019-07-31 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16897125#comment-16897125 ] Joris Van den Bossche commented on ARROW-6081: -- The final error comes from bigquery, so you

[jira] [Created] (ARROW-6082) [Python] create pa.dictionary() type with non-integer indices type crashes

2019-07-31 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-6082: Summary: [Python] create pa.dictionary() type with non-integer indices type crashes Key: ARROW-6082 URL: https://issues.apache.org/jira/browse/ARROW-6082

[jira] [Assigned] (ARROW-6082) [Python] create pa.dictionary() type with non-integer indices type crashes

2019-08-01 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6082: Assignee: Joris Van den Bossche > [Python] create pa.dictionary() type

[jira] [Assigned] (ARROW-6642) [Python] chained access of ParquetDataset's metadata segfaults

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6642: Assignee: Joris Van den Bossche > [Python] chained access of

[jira] [Commented] (ARROW-6620) [Python][CI] pandas-master build failing due to removal of "to_sparse" method

2019-09-19 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933625#comment-16933625 ] Joris Van den Bossche commented on ARROW-6620: -- [~wesm] I think this should already be

[jira] [Assigned] (ARROW-6652) [Python] to_pandas conversion removes timezone from type

2019-09-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6652: Assignee: Joris Van den Bossche > [Python] to_pandas conversion removes

[jira] [Commented] (ARROW-6652) [Python] to_pandas conversion removes timezone from type

2019-09-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935106#comment-16935106 ] Joris Van den Bossche commented on ARROW-6652: -- This should be an easy fix. It seems that

[jira] [Commented] (ARROW-6652) [Python] to_pandas conversion removes timezone from type

2019-09-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16935111#comment-16935111 ] Joris Van den Bossche commented on ARROW-6652: -- Quickly did a PR

[jira] [Commented] (ARROW-4359) [Python] Column metadata is not saved or loaded in parquet

2019-09-19 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16933389#comment-16933389 ] Joris Van den Bossche commented on ARROW-4359: -- It could also be an option to use the

[jira] [Commented] (ARROW-6623) [CI][Python] Dask docker integration test broken perhaps by statistics-related change

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934278#comment-16934278 ] Joris Van den Bossche commented on ARROW-6623: -- I elaborated a bit more on the dask issue,

[jira] [Created] (ARROW-6642) [Python] chained access of ParquetDataset's metadata segfaults

2019-09-20 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6642: Summary: [Python] chained access of ParquetDataset's metadata segfaults Key: ARROW-6642 URL: https://issues.apache.org/jira/browse/ARROW-6642

[jira] [Created] (ARROW-6704) [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps

2019-09-26 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6704: Summary: [C++] Cast from timestamp to higher resolution does not check out of bounds timestamps Key: ARROW-6704 URL:

[jira] [Updated] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6737: - Description: {code} from pyarrow import json import pyarrow.parquet as pq r =

[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941683#comment-16941683 ] Joris Van den Bossche commented on ARROW-6737: -- [~harish1792] would you be able to provide a

[jira] [Created] (ARROW-6749) [Python] Conversion of non-ns timestamp array to numpy gives wrong values

2019-10-01 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6749: Summary: [Python] Conversion of non-ns timestamp array to numpy gives wrong values Key: ARROW-6749 URL: https://issues.apache.org/jira/browse/ARROW-6749

[jira] [Commented] (ARROW-6749) [Python] Conversion of non-ns timestamp array to numpy gives wrong values

2019-10-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941724#comment-16941724 ] Joris Van den Bossche commented on ARROW-6749: -- So the reason for this is that we are taking

[jira] [Commented] (ARROW-6749) [Python] Conversion of non-ns timestamp array to numpy gives wrong values

2019-10-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16941725#comment-16941725 ] Joris Van den Bossche commented on ARROW-6749: -- We seem to explicitly test this right now

[jira] [Commented] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16939148#comment-16939148 ] Joris Van den Bossche commented on ARROW-6719: -- Are you able to share some data or a script

[jira] [Updated] (ARROW-6713) [Python] Getting "ArrowIOError: Corrupted file, smaller than file footer" when reading large number of parquet files to ParquetDataset()

2019-09-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6713: - Labels: parquet (was: ) > [Python] Getting "ArrowIOError: Corrupted file,

[jira] [Updated] (ARROW-6719) Parquet read_table error in Python3.7: pyarrow.lib.ArrowInvalid: Column data for field with type list<...> is inconsistent with schema list<...>

2019-09-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6719: - Labels: parquet (was: ) > Parquet read_table error in Python3.7:

[jira] [Assigned] (ARROW-6674) [Python] Fix or ignore the test warnings

2019-09-24 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6674: Assignee: Joris Van den Bossche > [Python] Fix or ignore the test

[jira] [Created] (ARROW-6674) [Python] Fix or ignore the test warnings

2019-09-24 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6674: Summary: [Python] Fix or ignore the test warnings Key: ARROW-6674 URL: https://issues.apache.org/jira/browse/ARROW-6674 Project: Apache Arrow

[jira] [Assigned] (ARROW-6158) [Python] possible to create StructArray with type that conflicts with child array's types

2019-09-24 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-6158: Assignee: Joris Van den Bossche > [Python] possible to create StructArray

[jira] [Commented] (ARROW-6551) [Python] Dask Parquet integration test failure

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934196#comment-16934196 ] Joris Van den Bossche commented on ARROW-6551: -- This failed again in the last nightly run

[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942560#comment-16942560 ] Joris Van den Bossche commented on ARROW-6737: -- Thanks for providing the sample file. This

[jira] [Updated] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6760: - Description: When a column accidentally changes type in a JSON file (which is

[jira] [Commented] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942582#comment-16942582 ] Joris Van den Bossche commented on ARROW-6760: -- Indeed, a better error message would be

[jira] [Updated] (ARROW-6760) [C++] JSON: improve error message when column changed type

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6760: - Summary: [C++] JSON: improve error message when column changed type (was: JSON

[jira] [Created] (ARROW-6763) [Python] Parquet s3 tests are skipped because dependencies are not installed

2019-10-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6763: Summary: [Python] Parquet s3 tests are skipped because dependencies are not installed Key: ARROW-6763 URL: https://issues.apache.org/jira/browse/ARROW-6763

[jira] [Closed] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-6737. Resolution: Duplicate > Nested column branch had multiple children >

[jira] [Created] (ARROW-6762) [C++] JSON reader segfaults on newline

2019-10-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-6762: Summary: [C++] JSON reader segfaults on newline Key: ARROW-6762 URL: https://issues.apache.org/jira/browse/ARROW-6762 Project: Apache Arrow

[jira] [Commented] (ARROW-6737) Nested column branch had multiple children

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942567#comment-16942567 ] Joris Van den Bossche commented on ARROW-6737: -- I noticed that reading this file on master

[jira] [Commented] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16942841#comment-16942841 ] Joris Van den Bossche commented on ARROW-5655: -- [~kszucs] I think this might already be

[jira] [Assigned] (ARROW-5855) [Python] Add support for Duration type

2019-10-02 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5855: Assignee: Joris Van den Bossche > [Python] Add support for Duration type

[jira] [Comment Edited] (ARROW-6623) [CI][Python] Dask docker integration test broken perhaps by statistics-related change

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934241#comment-16934241 ] Joris Van den Bossche edited comment on ARROW-6623 at 9/20/19 9:33 AM:

[jira] [Commented] (ARROW-6623) [CI][Python] Dask docker integration test broken perhaps by statistics-related change

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934241#comment-16934241 ] Joris Van den Bossche commented on ARROW-6623: -- I opened an issue on the dask tracker:

[jira] [Commented] (ARROW-6551) [Python] Dask Parquet integration test failure

2019-09-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6551?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16934238#comment-16934238 ] Joris Van den Bossche commented on ARROW-6551: -- OK, I see there is already ARROW-6623 for

[jira] [Created] (ARROW-5603) [Python] register pytest markers to avoid warnings

2019-06-14 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5603: Summary: [Python] register pytest markers to avoid warnings Key: ARROW-5603 URL: https://issues.apache.org/jira/browse/ARROW-5603 Project: Apache

[jira] [Updated] (ARROW-5603) [Python] register pytest markers to avoid warnings

2019-06-14 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5603: - Description: Currently the python test suite gives warnings like: {code}

[jira] [Commented] (ARROW-5888) [Python][C++] Parquet write metadata not roundtrip safe for timezone timestamps

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881503#comment-16881503 ] Joris Van den Bossche commented on ARROW-5888: -- The Parquet file format has no notion of

[jira] [Assigned] (ARROW-5873) [Python][C++] Segmentation fault when comparing schema with None

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5873: Assignee: Joris Van den Bossche > [Python][C++] Segmentation fault when

[jira] [Commented] (ARROW-5889) [Python][C++] Parquet backwards compat for timestamps without timezone broken

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881495#comment-16881495 ] Joris Van den Bossche commented on ARROW-5889: -- This is very much related to ARROW-5878 /

[jira] [Created] (ARROW-5890) [C++][Python] Support ExtensionType arrays in more kernels

2019-07-09 Thread Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5890: Summary: [C++][Python] Support ExtensionType arrays in more kernels Key: ARROW-5890 URL: https://issues.apache.org/jira/browse/ARROW-5890 Project:

[jira] [Updated] (ARROW-5890) [C++][Python] Support ExtensionType arrays in more kernels

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5890: - Fix Version/s: 1.0.0 > [C++][Python] Support ExtensionType arrays in more

[jira] [Updated] (ARROW-5450) [Python] TimestampArray.to_pylist() fails with OverflowError: Python int too large to convert to C long

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5450: - Fix Version/s: 1.0.0 > [Python] TimestampArray.to_pylist() fails with

[jira] [Updated] (ARROW-5889) [Python][C++] Parquet backwards compat for timestamps without timezone broken

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5889: - Fix Version/s: 0.14.1 > [Python][C++] Parquet backwards compat for timestamps

[jira] [Commented] (ARROW-5889) [Python][C++] Parquet backwards compat for timestamps without timezone broken

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881492#comment-16881492 ] Joris Van den Bossche commented on ARROW-5889: -- [~fjetter] thanks for the testing and the

[jira] [Closed] (ARROW-5857) [Python] converting multidimensional numpy arrays to nested list type

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-5857. Resolution: Duplicate There was actually already an open issue for this:

[jira] [Assigned] (ARROW-5790) [Python] Passing zero-dim numpy array to pa.array causes segfault

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-5790: Assignee: Joris Van den Bossche > [Python] Passing zero-dim numpy array

[jira] [Commented] (ARROW-5895) [Python] New version stores timestamps as epoch ms instead of ISO timestamp string

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881549#comment-16881549 ] Joris Van den Bossche commented on ARROW-5895: -- [~johwilso1] Thanks for the report. Can you

[jira] [Commented] (ARROW-5610) [Python] Define extension type API in Python to "receive" or "send" a foreign extension type

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881581#comment-16881581 ] Joris Van den Bossche commented on ARROW-5610: -- I am trying to wrap my head around what is

[jira] [Commented] (ARROW-5895) [Python] New version stores timestamps as epoch ms instead of ISO timestamp string

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881585#comment-16881585 ] Joris Van den Bossche commented on ARROW-5895: -- So what changed in 0.14.0 compared to 0.13

[jira] [Comment Edited] (ARROW-5895) [Python] New version stores timestamps as epoch ms instead of ISO timestamp string

2019-07-09 Thread Joris Van den Bossche (JIRA)
[ https://issues.apache.org/jira/browse/ARROW-5895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881585#comment-16881585 ] Joris Van den Bossche edited comment on ARROW-5895 at 7/9/19 10:07 PM:

[jira] [Created] (ARROW-7027) [Python] pa.table(..) returns instead of raises error if passing invalid object

2019-10-30 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7027: Summary: [Python] pa.table(..) returns instead of raises error if passing invalid object Key: ARROW-7027 URL: https://issues.apache.org/jira/browse/ARROW-7027

[jira] [Assigned] (ARROW-7027) [Python] pa.table(..) returns instead of raises error if passing invalid object

2019-10-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7027: Assignee: Joris Van den Bossche > [Python] pa.table(..) returns instead

[jira] [Commented] (ARROW-7022) [Python] __arrow_array__ does not work for ExtensionTypes in Table.from_pandas

2019-10-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16962854#comment-16962854 ] Joris Van den Bossche commented on ARROW-7022: -- In the end, this appears not related to the

[jira] [Commented] (ARROW-6820) [C++] [Doc] [Format] Map specification and implementation inconsistent

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968370#comment-16968370 ] Joris Van den Bossche commented on ARROW-6820: -- To see the description in the (old) docs,

[jira] [Commented] (ARROW-7071) [Python] Add Array convenience method to create "masked" view with different validity bitmap

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968345#comment-16968345 ] Joris Van den Bossche commented on ARROW-7071: -- > NB: I'm not sure what kind of pitfalls

[jira] [Commented] (ARROW-7071) [Python] Add Array convenience method to create "masked" view with different validity bitmap

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968354#comment-16968354 ] Joris Van den Bossche commented on ARROW-7071: -- Now, I think the main question is: what API

[jira] [Created] (ARROW-7068) [C++] Expose the offsets of a ListArray as a Int32Array

2019-11-05 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7068: Summary: [C++] Expose the offsets of a ListArray as a Int32Array Key: ARROW-7068 URL: https://issues.apache.org/jira/browse/ARROW-7068 Project: Apache

[jira] [Comment Edited] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968398#comment-16968398 ] Joris Van den Bossche edited comment on ARROW-7076 at 11/6/19 2:31 PM:

[jira] [Commented] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968398#comment-16968398 ] Joris Van den Bossche commented on ARROW-7076: -- There are not yet binary wheels available

[jira] [Commented] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

2019-11-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16968417#comment-16968417 ] Joris Van den Bossche commented on ARROW-7076: -- See ARROW-6920 for wheels for Python 3.8 (I

[jira] [Commented] (ARROW-7039) [Python] Typecheck expects pandas to be installed

2019-10-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16964041#comment-16964041 ] Joris Van den Bossche commented on ARROW-7039: -- Ah, this was probably never covered by the

[jira] [Updated] (ARROW-6579) [Python] Parallel pyarrow.parquet.write_to_dataset

2019-10-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6579: - Component/s: Python > [Python] Parallel pyarrow.parquet.write_to_dataset >

[jira] [Updated] (ARROW-6579) [Python] Parallel pyarrow.parquet.write_to_dataset

2019-10-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6579: - Labels: dataset (was: ) > [Python] Parallel pyarrow.parquet.write_to_dataset >

[jira] [Created] (ARROW-7031) [Python] Expose the offsets of a ListArray in python

2019-10-30 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7031: Summary: [Python] Expose the offsets of a ListArray in python Key: ARROW-7031 URL: https://issues.apache.org/jira/browse/ARROW-7031 Project: Apache

[jira] [Commented] (ARROW-7031) [Python] Expose the offsets of a ListArray in python

2019-10-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16963184#comment-16963184 ] Joris Van den Bossche commented on ARROW-7031: -- While looking at this, I bumped into the

[jira] [Commented] (ARROW-6820) [C++] [Doc] [Format] Map specification and implementation inconsistent

2019-11-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973192#comment-16973192 ] Joris Van den Bossche commented on ARROW-6820: -- If both C++ and Java use "entries", we can

[jira] [Created] (ARROW-7154) [C++] Build error when building tests but not with snappy

2019-11-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7154: Summary: [C++] Build error when building tests but not with snappy Key: ARROW-7154 URL: https://issues.apache.org/jira/browse/ARROW-7154 Project:

[jira] [Updated] (ARROW-7154) [C++] Build error when building tests but not with snappy

2019-11-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7154: - Description: Since the docker-compose PR landed, I am having build errors like:

[jira] [Commented] (ARROW-7154) [C++] Build error when building tests but not with snappy

2019-11-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16973461#comment-16973461 ] Joris Van den Bossche commented on ARROW-7154: -- Creating a new conda env from scratch (which

[jira] [Updated] (ARROW-7168) [Python] pa.array() doesn't respect specified dictionary type

2019-11-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7168: - Summary: [Python] pa.array() doesn't respect specified dictionary type (was:

[jira] [Commented] (ARROW-7168) [Python] pa.array() doesn't respect provided dictionary type with all NaNs

2019-11-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16974511#comment-16974511 ] Joris Van den Bossche commented on ARROW-7168: -- [~buhrmann] thanks for the report. When

[jira] [Updated] (ARROW-7168) [Python] pa.array() doesn't respect provided dictionary type with all NaNs

2019-11-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7168: - Summary: [Python] pa.array() doesn't respect provided dictionary type with all

[jira] [Resolved] (ARROW-7023) [Python] pa.array does not use "from_pandas" semantics for pd.Index

2019-11-05 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7023. -- Resolution: Fixed Issue resolved by pull request 5753

[jira] [Created] (ARROW-7066) [Python] support returning ChunkedArray from __arrow_array__ ?

2019-11-05 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7066: Summary: [Python] support returning ChunkedArray from __arrow_array__ ? Key: ARROW-7066 URL: https://issues.apache.org/jira/browse/ARROW-7066

[jira] [Commented] (ARROW-7063) [C++] Schema print method prints too much metadata

2019-11-05 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16967438#comment-16967438 ] Joris Van den Bossche commented on ARROW-7063: -- I also ran into this recently when looking

[jira] [Assigned] (ARROW-3444) [Python] Table.nbytes attribute

2019-11-08 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-3444: Assignee: Joris Van den Bossche > [Python] Table.nbytes attribute >

[jira] [Commented] (ARROW-7222) [Python] Wipe any existing generated Python API documentation when updating website

2019-11-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979135#comment-16979135 ] Joris Van den Bossche commented on ARROW-7222: -- It's indeed a different problem (and solving

[jira] [Commented] (ARROW-7222) [Python] Wipe any existing generated Python API documentation when updating website

2019-11-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979100#comment-16979100 ] Joris Van den Bossche commented on ARROW-7222: -- It could also be an option to keep older

[jira] [Commented] (ARROW-1644) [C++][Parquet] Read and write nested Parquet data with a mix of struct and list nesting levels

2019-11-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979326#comment-16979326 ] Joris Van den Bossche commented on ARROW-1644: -- [~RinkeHoekstra] that looks unrelated (the

[jira] [Commented] (ARROW-7226) [JSON][Python] Json loader fails on example in documentation.

2019-11-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16979375#comment-16979375 ] Joris Van den Bossche commented on ARROW-7226: -- So this may not be adequately documented,

[jira] [Assigned] (ARROW-7296) [Python] Add ORC api documentation

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7296: Assignee: Joris Van den Bossche > [Python] Add ORC api documentation >

[jira] [Updated] (ARROW-7305) [Python] High memory usage writing pyarrow.Table to parquet

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7305: - Labels: parquet (was: ) > [Python] High memory usage writing pyarrow.Table to

[jira] [Updated] (ARROW-7305) [Python] High memory usage writing pyarrow.Table with large strings to parquet

2019-12-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7305: - Summary: [Python] High memory usage writing pyarrow.Table with large strings to

[jira] [Updated] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7345: - Labels: dataset parquet (was: parquet) > [Python] Writing partitions with NaNs

[jira] [Updated] (ARROW-7345) [Python] Writing partitions with NaNs silently drops data

2019-12-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7345: - Labels: parquet (was: ) > [Python] Writing partitions with NaNs silently drops
