[jira] [Commented] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17086429#comment-17086429 ] Joris Van den Bossche commented on ARROW-8498: -- [~uwe] fixed this recently (ARROW-8159), so

[jira] [Resolved] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-8498. -- Fix Version/s: 0.17.0 Assignee: Uwe Korn Resolution: Fixed >

[jira] [Created] (ARROW-8345) [Python] feather.read_table should not require pandas

2020-04-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8345: Summary: [Python] feather.read_table should not require pandas Key: ARROW-8345 URL: https://issues.apache.org/jira/browse/ARROW-8345 Project: Apache

[jira] [Closed] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8418. Resolution: Not A Problem > [Python] partition_filename_cb in write_to_dataset

[jira] [Comment Edited] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073116#comment-17073116 ] Joris Van den Bossche edited comment on ARROW-2882 at 4/14/20, 7:46 AM:

[jira] [Updated] (ARROW-5205) [Python][C++] Improved error messages when user erroneously uses a non-local resource URI to open a file

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5205: - Fix Version/s: 0.17.0 > [Python][C++] Improved error messages when user

[jira] [Reopened] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reopened ARROW-8414: -- More failures (eg

[jira] [Commented] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082940#comment-17082940 ] Joris Van den Bossche commented on ARROW-8418: -- [~varunbpatil] I think using a partial

[jira] [Closed] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8208. Resolution: Fixed > [PYTHON] Row Group Filtering With ParquetDataset >

[jira] [Updated] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8208: - Labels: dataset (was: dataset dataset-parquet-read) > [PYTHON] Row Group

[jira] [Reopened] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reopened ARROW-8208: -- > [PYTHON] Row Group Filtering With ParquetDataset >

[jira] [Created] (ARROW-8439) [Python] Filesystem docs are outdated

2020-04-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8439: Summary: [Python] Filesystem docs are outdated Key: ARROW-8439 URL: https://issues.apache.org/jira/browse/ARROW-8439 Project: Apache Arrow

[jira] [Commented] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082957#comment-17082957 ] Joris Van den Bossche commented on ARROW-3424: -- The new dataset API supports creating a

[jira] [Commented] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082951#comment-17082951 ] Joris Van den Bossche commented on ARROW-8418: -- OK, thanks for the quick answer! > [Python]

[jira] [Commented] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082959#comment-17082959 ] Joris Van den Bossche commented on ARROW-2882: -- With ARROW-8039, this is now also exposed in

[jira] [Created] (ARROW-8442) [Python] NullType.to_pandas_dtype inconsistent with dtype returned in to_pandas/to_numpy

2020-04-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8442: Summary: [Python] NullType.to_pandas_dtype inconsistent with dtype returned in to_pandas/to_numpy Key: ARROW-8442 URL:

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083022#comment-17083022 ] Joris Van den Bossche commented on ARROW-2444: -- Opened ARROW-8442 specifically for the

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083021#comment-17083021 ] Joris Van den Bossche commented on ARROW-2444: -- There is still this behaviour of

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17083010#comment-17083010 ] Joris Van den Bossche commented on ARROW-2444: -- I didn't read the full discussion on the

[jira] [Commented] (ARROW-6976) Possible memory leak in pyarrow read_parquet

2020-04-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17088344#comment-17088344 ] Joris Van den Bossche commented on ARROW-6976: -- [~Athlete_369] that can be possible,

[jira] [Commented] (ARROW-8378) [Python] "empty" dtype metadata leads to wrong Parquet column type

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079297#comment-17079297 ] Joris Van den Bossche commented on ARROW-8378: -- [~yiannisliodakis] I think the "bug" lies in

[jira] [Commented] (ARROW-8364) [Python] Get Access to the type_to_type_id dictionary

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079301#comment-17079301 ] Joris Van den Bossche commented on ARROW-8364: -- [~archybald] can you give a concrete code

[jira] [Commented] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082123#comment-17082123 ] Joris Van den Bossche commented on ARROW-8414: -- Pull request:

[jira] [Resolved] (ARROW-8408) [Python] Add memory_map= toggle to pyarrow.feather.read_feather

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-8408. -- Resolution: Fixed Issue resolved by pull request 6905

[jira] [Assigned] (ARROW-8416) [Python] Provide a "feather" alias in the dataset API

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8416: Assignee: Joris Van den Bossche > [Python] Provide a "feather" alias in

[jira] [Created] (ARROW-8416) [Python] Provide a "feather" alias in the dataset API

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8416: Summary: [Python] Provide a "feather" alias in the dataset API Key: ARROW-8416 URL: https://issues.apache.org/jira/browse/ARROW-8416 Project: Apache

[jira] [Assigned] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8290: Assignee: Joris Van den Bossche > [Python][Dataset] Improve ergonomy of

[jira] [Created] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8414: Summary: [Python] Non-deterministic row order failure in test_parquet.py Key: ARROW-8414 URL: https://issues.apache.org/jira/browse/ARROW-8414

[jira] [Updated] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8414: - Labels: pull-request-available (was: ) > [Python] Non-deterministic row order

[jira] [Commented] (ARROW-8406) [Python] FileSystem.from_uri erases the drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082178#comment-17082178 ] Joris Van den Bossche commented on ARROW-8406: -- AFAIK, on CI this happened on Azure (but not

[jira] [Updated] (ARROW-8406) [Python] FileSystem.from_uri erases the drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8406: - Fix Version/s: 0.17.0 > [Python] FileSystem.from_uri erases the drive on Windows

[jira] [Commented] (ARROW-8406) [Python] test_fs fails when run from a different drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082191#comment-17082191 ] Joris Van den Bossche commented on ARROW-8406: -- [~apitrou] I am not sure it is just the

[jira] [Commented] (ARROW-8406) [Python] test_fs fails when run from a different drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082224#comment-17082224 ] Joris Van den Bossche commented on ARROW-8406: -- > I cannot reproduce the test_parquet

[jira] [Comment Edited] (ARROW-8364) [Python] Get Access to the type_to_type_id dictionary

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17079392#comment-17079392 ] Joris Van den Bossche edited comment on ARROW-8364 at 4/9/20, 2:35 PM:

[jira] [Commented] (ARROW-8106) [Python] Builds on master broken by pandas 1.0.2 release

2020-03-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058522#comment-17058522 ] Joris Van den Bossche commented on ARROW-8106: -- Did a quick PR to "fix" the test, can leave

[jira] [Assigned] (ARROW-8106) [Python] Builds on master broken by pandas 1.0.2 release

2020-03-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8106: Assignee: Joris Van den Bossche > [Python] Builds on master broken by

[jira] [Created] (ARROW-8314) [Python] Provide a method to select a subset of columns of a Table

2020-04-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8314: Summary: [Python] Provide a method to select a subset of columns of a Table Key: ARROW-8314 URL: https://issues.apache.org/jira/browse/ARROW-8314

[jira] [Commented] (ARROW-7009) [C++] Refactor filter/take kernels to use Datum instead of overloads

2020-04-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17075080#comment-17075080 ] Joris Van den Bossche commented on ARROW-7009: -- At least for {{filter}} it seems this has

[jira] [Created] (ARROW-8342) [Python] dask and kartothek integration tests are failing

2020-04-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8342: Summary: [Python] dask and kartothek integration tests are failing Key: ARROW-8342 URL: https://issues.apache.org/jira/browse/ARROW-8342 Project:

[jira] [Commented] (ARROW-8342) [Python] dask and kartothek integration tests are failing

2020-04-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17076082#comment-17076082 ] Joris Van den Bossche commented on ARROW-8342: -- So some things that are failing right now

[jira] [Closed] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8208. Resolution: Implemented > [PYTHON] Row Group Filtering With ParquetDataset >

[jira] [Commented] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067016#comment-17067016 ] Joris Van den Bossche commented on ARROW-8208: -- This is now implemented, and also already

[jira] [Updated] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8213: - Summary: [Python][Dataset] Opening a dataset with a local incorrect path gives

[jira] [Updated] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8208: - Description: Hello, I tried to use the row_group filtering at the file level

[jira] [Commented] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067018#comment-17067018 ] Joris Van den Bossche commented on ARROW-8208: -- [~cclienti] feedback on those new

[jira] [Commented] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067021#comment-17067021 ] Joris Van den Bossche commented on ARROW-8208: -- And also related to ARROW-8063 and

[jira] [Created] (ARROW-8220) [Python] Make dataset FileFormat objects serializable

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8220: Summary: [Python] Make dataset FileFormat objects serializable Key: ARROW-8220 URL: https://issues.apache.org/jira/browse/ARROW-8220 Project: Apache

[jira] [Updated] (ARROW-8221) [Python][Dataset] Expose schema inference / validation options in the factory

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8221: - Description: ARROW-8058 added options related to schema inference / validation

[jira] [Created] (ARROW-8221) [Python][Dataset] Expose schema inference / validation options in the factory

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8221: Summary: [Python][Dataset] Expose schema inference / validation options in the factory Key: ARROW-8221 URL: https://issues.apache.org/jira/browse/ARROW-8221

[jira] [Commented] (ARROW-8235) [C++][Compute] Filter out nulls by default

2020-03-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067954#comment-17067954 ] Joris Van den Bossche commented on ARROW-8235: -- Is the "KEEP" option useful? I am not aware

[jira] [Commented] (ARROW-8235) [C++][Compute] Filter out nulls by default

2020-03-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067960#comment-17067960 ] Joris Van den Bossche commented on ARROW-8235: -- Yes, base R emits null. (but you could maybe

[jira] [Commented] (ARROW-5176) [Python] Automate formatting of python files

2020-03-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17067854#comment-17067854 ] Joris Van den Bossche commented on ARROW-5176: -- Shall we "just do it" ? I just needed to

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-26 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Fix Version/s: 1.0.0 > [C++][Dataset] Handling of duplicate columns in Dataset

[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Component/s: Python > [Python][Parquet] Add `write_to_dataset` option to

[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Labels: parquet (was: ) > [Python][Parquet] Add `write_to_dataset` option to

[jira] [Updated] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8244: - Fix Version/s: 0.17.0 > [Python][Parquet] Add `write_to_dataset` option to

[jira] [Commented] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17068794#comment-17068794 ] Joris Van den Bossche commented on ARROW-8244: -- Thanks for opening the issue [~rjzamora]

[jira] [Updated] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8213: - Description: Even after the previous PRs related to local paths

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072558#comment-17072558 ] Joris Van den Bossche commented on ARROW-8245: -- [~coverman] would you be able to do a PR

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072548#comment-17072548 ] Joris Van den Bossche commented on ARROW-8245: -- I just checked, and the C++ Datasets API

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Description: While testing duplicate column names, I ran into multiple issues:

[jira] [Created] (ARROW-8210) [C++]

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8210: Summary: [C++] Key: ARROW-8210 URL: https://issues.apache.org/jira/browse/ARROW-8210 Project: Apache Arrow Issue Type: Bug

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Component/s: C++ > [C++][Dataset] Handling of duplicate columns in Dataset

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Summary: [C++][Dataset] Handling of duplicate columns in Dataset factory and

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Description: While testing duplicate column names, I ran into multiple issues:

[jira] [Created] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8213: Summary: [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message Key: ARROW-8213 URL:

[jira] [Updated] (ARROW-8210) [C++][Dataset] Handling of duplicate columns in Dataset factory and scanning

2020-03-25 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8210: - Component/s: C++ - Dataset > [C++][Dataset] Handling of duplicate columns in

[jira] [Created] (ARROW-8209) [Python] Accessing duplicate column of Table by name gives wrong error

2020-03-25 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8209: Summary: [Python] Accessing duplicate column of Table by name gives wrong error Key: ARROW-8209 URL: https://issues.apache.org/jira/browse/ARROW-8209

[jira] [Commented] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071845#comment-17071845 ] Joris Van den Bossche commented on ARROW-8244: -- So to summarize the issue: the python

[jira] [Updated] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8286: - Labels: dataset (was: ) > [Python] Creating dataset from pathlib results in

[jira] [Updated] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8286: - Component/s: Python > [Python] Creating dataset from pathlib results in

[jira] [Created] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8286: Summary: [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset Key: ARROW-8286 URL:

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Docs Text: (was: In a dataset, one can timestamp columns with different

[jira] [Updated] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8283: - Description: When passing a list of files to the constructor of

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Description: In a dataset, one can have timestamp columns with different

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Description: In a dataset, one can have timestamp columns with different resolutions.

[jira] [Created] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8290: Summary: [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor Key: ARROW-8290 URL: https://issues.apache.org/jira/browse/ARROW-8290

[jira] [Updated] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8290: - Labels: dataset (was: ) > [Python][Dataset] Improve ergonomy of the

[jira] [Commented] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071871#comment-17071871 ] Joris Van den Bossche commented on ARROW-8283: -- [~npr] I am not fully sure it is a

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Description: This is already a very simple fix to allow manually specifying the

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Parent: ARROW-8221 Issue Type: Sub-task (was: Task) > [Python][Dataset]

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Fix Version/s: 0.17.0 > [Python][Dataset] Passthrough schema to Factory.finish()

[jira] [Assigned] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8292: Assignee: Joris Van den Bossche > [Python][Dataset] Passthrough schema to

[jira] [Created] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8292: Summary: [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function Key: ARROW-8292 URL: https://issues.apache.org/jira/browse/ARROW-8292

[jira] [Updated] (ARROW-8221) [Python][Dataset] Expose schema inference / validation options in the factory

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8221: - Fix Version/s: (was: 0.17.0) 1.0.0 > [Python][Dataset]

[jira] [Updated] (ARROW-7965) [Python] Hold a reference to the dataset factory for later reuse

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7965: - Fix Version/s: 0.17.0 > [Python] Hold a reference to the dataset factory for

[jira] [Commented] (ARROW-7965) [Python] Hold a reference to the dataset factory for later reuse

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072496#comment-17072496 ] Joris Van den Bossche commented on ARROW-7965: -- This depends on ARROW-8164, but if that gets

[jira] [Resolved] (ARROW-7907) [Python] Conversion to pandas of empty table with timestamp type aborts

2020-03-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7907. -- Resolution: Fixed This is fixed now by ARROW-8142 (and in the PR a test was

[jira] [Created] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-03-30 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8276: Summary: [C++][Dataset] Scanning a Fragment does not take into account the partition columns Key: ARROW-8276 URL: https://issues.apache.org/jira/browse/ARROW-8276

[jira] [Commented] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-03-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071140#comment-17071140 ] Joris Van den Bossche commented on ARROW-8276: -- Reproducer in python: {code} import pyarrow

[jira] [Comment Edited] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-03-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17071140#comment-17071140 ] Joris Van den Bossche edited comment on ARROW-8276 at 3/30/20, 4:52 PM:

[jira] [Assigned] (ARROW-8063) [Python] Add user guide documentation for Datasets API

2020-03-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8063: Assignee: Joris Van den Bossche > [Python] Add user guide documentation

[jira] [Updated] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8283: - Fix Version/s: 1.0.0 > [C++/Python][Dataset] Non-existent files are silently

[jira] [Commented] (ARROW-2860) [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072799#comment-17072799 ] Joris Van den Bossche commented on ARROW-2860: -- So once we use the datasets API under the

[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17073012#comment-17073012 ] Joris Van den Bossche commented on ARROW-8307: -- In the user-facing parquet APIs, the option

[jira] [Commented] (ARROW-2860) [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072797#comment-17072797 ] Joris Van den Bossche commented on ARROW-2860: -- So several things here: - The original root

[jira] [Commented] (ARROW-2366) [Python][C++][Parquet] Support reading Parquet files having a permutation of column order

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072810#comment-17072810 ] Joris Van den Bossche commented on ARROW-2366: -- This is now implemented in the C++ Datasets

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072793#comment-17072793 ] Joris Van den Bossche commented on ARROW-8245: -- You can already start with a PR fixing it in

[jira] [Updated] (ARROW-8039) [Python][Dataset] Support using dataset API in pyarrow.parquet with a minimal ParquetDataset shim

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8039: - Summary: [Python][Dataset] Support using dataset API in pyarrow.parquet with a

[jira] [Commented] (ARROW-2659) [Python] More graceful reading of empty String columns in ParquetDataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17072806#comment-17072806 ] Joris Van den Bossche commented on ARROW-2659: -- The original root cause of this issue
