[jira] [Comment Edited] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-03-30 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071140#comment-17071140 ] Joris Van den Bossche edited comment on ARROW-8276 at 3/30/20, 4:52 PM: ---

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Docs Text: (was: In a dataset, one can timestamp columns with different resolut

[jira] [Updated] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8283: - Description: When passing a list of files to the constructor of {{pyarrow.datase

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Description: In a dataset, one can have timestamp columns with different resoluti

[jira] [Updated] (ARROW-8284) [C++][Dataset] Schema evolution for timestamp columns

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8284: - Description: In a dataset, one can have timestamp columns with different resolutions.

[jira] [Created] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8286: Summary: [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset Key: ARROW-8286 URL: https://issues.apache.org/jira/browse/ARR

[jira] [Updated] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8286: - Labels: dataset (was: ) > [Python] Creating dataset from pathlib results in Unio

[jira] [Updated] (ARROW-8286) [Python] Creating dataset from pathlib results in UnionDataset instead of FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8286: - Component/s: Python > [Python] Creating dataset from pathlib results in UnionData

[jira] [Commented] (ARROW-8244) [Python][Parquet] Add `write_to_dataset` option to populate the "file_path" metadata fields

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071845#comment-17071845 ] Joris Van den Bossche commented on ARROW-8244: -- So to summarize the issue: t

[jira] [Created] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8290: Summary: [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor Key: ARROW-8290 URL: https://issues.apache.org/jira/browse/ARROW-8290

[jira] [Updated] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8290: - Labels: dataset (was: ) > [Python][Dataset] Improve ergonomy of the FileSystemDa

[jira] [Commented] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17071871#comment-17071871 ] Joris Van den Bossche commented on ARROW-8283: -- [~npr] I am not fully sure i

[jira] [Created] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8292: Summary: [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function Key: ARROW-8292 URL: https://issues.apache.org/jira/browse/ARROW-8292

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Description: This is already a very simple fix to allow manually specifying the s

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Parent: ARROW-8221 Issue Type: Sub-task (was: Task) > [Python][Dataset]

[jira] [Updated] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8292: - Fix Version/s: 0.17.0 > [Python][Dataset] Passthrough schema to Factory.finish()

[jira] [Assigned] (ARROW-8292) [Python][Dataset] Passthrough schema to Factory.finish() in dataset() function

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8292: Assignee: Joris Van den Bossche > [Python][Dataset] Passthrough schema to

[jira] [Updated] (ARROW-8221) [Python][Dataset] Expose schema inference / validation options in the factory

2020-03-31 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8221: - Fix Version/s: (was: 0.17.0) 1.0.0 > [Python][Dataset] Exp

[jira] [Updated] (ARROW-7965) [Python] Hold a reference to the dataset factory for later reuse

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7965: - Fix Version/s: 0.17.0 > [Python] Hold a reference to the dataset factory for late

[jira] [Commented] (ARROW-7965) [Python] Hold a reference to the dataset factory for later reuse

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072496#comment-17072496 ] Joris Van den Bossche commented on ARROW-7965: -- This depends on ARROW-8164,

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072548#comment-17072548 ] Joris Van den Bossche commented on ARROW-8245: -- I just checked, and the C++

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072558#comment-17072558 ] Joris Van den Bossche commented on ARROW-8245: -- [~coverman] would you be abl

[jira] [Updated] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8213: - Description: Even after the previous PRs related to local paths (https://github.

[jira] [Commented] (ARROW-8245) [Python][Parquet] Skip hidden directories when reading partitioned parquet files

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072793#comment-17072793 ] Joris Van den Bossche commented on ARROW-8245: -- You can already start with a

[jira] [Commented] (ARROW-2860) [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072797#comment-17072797 ] Joris Van den Bossche commented on ARROW-2860: -- So several things here: - T

[jira] [Commented] (ARROW-2860) [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072799#comment-17072799 ] Joris Van den Bossche commented on ARROW-2860: -- So once we use the datasets

[jira] [Commented] (ARROW-2659) [Python] More graceful reading of empty String columns in ParquetDataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072806#comment-17072806 ] Joris Van den Bossche commented on ARROW-2659: -- The original root cause of t

[jira] [Updated] (ARROW-8039) [Python][Dataset] Support using dataset API in pyarrow.parquet with a minimal ParquetDataset shim

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8039: - Summary: [Python][Dataset] Support using dataset API in pyarrow.parquet with a mi

[jira] [Commented] (ARROW-2366) [Python][C++][Parquet] Support reading Parquet files having a permutation of column order

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17072810#comment-17072810 ] Joris Van den Bossche commented on ARROW-2366: -- This is now implemented in t

[jira] [Updated] (ARROW-8283) [C++/Python][Dataset] Non-existent files are silently dropped in pa.dataset.FileSystemDataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8283: - Fix Version/s: 1.0.0 > [C++/Python][Dataset] Non-existent files are silently drop

[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073012#comment-17073012 ] Joris Van den Bossche commented on ARROW-8307: -- In the user-facing parquet

[jira] [Commented] (ARROW-8213) [Python][Dataset] Opening a dataset with a local incorrect path gives confusing error message

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073051#comment-17073051 ] Joris Van den Bossche commented on ARROW-8213: -- Hmm, I would like to avoid t

[jira] [Commented] (ARROW-8251) [Python] pandas.ExtensionDtype does not survive round trip with write_to_dataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073080#comment-17073080 ] Joris Van den Bossche commented on ARROW-8251: -- [~Ged.Steponavicius] Thanks

[jira] [Commented] (ARROW-8251) [Python] pandas.ExtensionDtype does not survive round trip with write_to_dataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073084#comment-17073084 ] Joris Van den Bossche commented on ARROW-8251: -- ARROW-7782 might be related

[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073100#comment-17073100 ] Joris Van den Bossche commented on ARROW-8307: -- The {{memory_map}} is around

[jira] [Commented] (ARROW-2801) [Python][C++][Dataset] Implement splt_row_groups for ParquetDataset

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073103#comment-17073103 ] Joris Van den Bossche commented on ARROW-2801: -- The datasets API now provide

[jira] [Commented] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073116#comment-17073116 ] Joris Van den Bossche commented on ARROW-2882: -- This kind of partitioning sc

[jira] [Commented] (ARROW-8307) [Python] Expose use_memory_map option in pyarrow.feather APIs

2020-04-01 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073148#comment-17073148 ] Joris Van den Bossche commented on ARROW-8307: -- cc [~apitrou] > [Python] Ex

[jira] [Created] (ARROW-8314) [Python] Provide a method to select a subset of columns of a Table

2020-04-02 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8314: Summary: [Python] Provide a method to select a subset of columns of a Table Key: ARROW-8314 URL: https://issues.apache.org/jira/browse/ARROW-8314 Proj

[jira] [Commented] (ARROW-7009) [C++] Refactor filter/take kernels to use Datum instead of overloads

2020-04-04 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17075080#comment-17075080 ] Joris Van den Bossche commented on ARROW-7009: -- At least for {{filter}} it s

[jira] [Created] (ARROW-8342) [Python] dask and kartothek integration tests are failing

2020-04-05 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8342: Summary: [Python] dask and kartothek integration tests are failing Key: ARROW-8342 URL: https://issues.apache.org/jira/browse/ARROW-8342 Project: Apach

[jira] [Commented] (ARROW-8342) [Python] dask and kartothek integration tests are failing

2020-04-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076082#comment-17076082 ] Joris Van den Bossche commented on ARROW-8342: -- So some things that are fail

[jira] [Commented] (ARROW-8340) [Documentation] Sphinx documentation does not build with just-released Sphinx 3.0.0

2020-04-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076341#comment-17076341 ] Joris Van den Bossche commented on ARROW-8340: -- One problem I noticed in my

[jira] [Commented] (ARROW-8340) [Documentation] Sphinx documentation does not build with just-released Sphinx 3.0.0

2020-04-06 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076343#comment-17076343 ] Joris Van den Bossche commented on ARROW-8340: -- Actually, there are more bre

[jira] [Created] (ARROW-8345) [Python] feather.read_table should not require pandas

2020-04-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8345: Summary: [Python] feather.read_table should not require pandas Key: ARROW-8345 URL: https://issues.apache.org/jira/browse/ARROW-8345 Project: Apache Ar

[jira] [Commented] (ARROW-8378) [Python] "empty" dtype metadata leads to wrong Parquet column type

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079297#comment-17079297 ] Joris Van den Bossche commented on ARROW-8378: -- [~yiannisliodakis] I think t

[jira] [Commented] (ARROW-8364) [Python] Get Access to the type_to_type_id dictionary

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079301#comment-17079301 ] Joris Van den Bossche commented on ARROW-8364: -- [~archybald] can you give a

[jira] [Comment Edited] (ARROW-8364) [Python] Get Access to the type_to_type_id dictionary

2020-04-09 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17079392#comment-17079392 ] Joris Van den Bossche edited comment on ARROW-8364 at 4/9/20, 2:35 PM:

[jira] [Created] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8414: Summary: [Python] Non-deterministic row order failure in test_parquet.py Key: ARROW-8414 URL: https://issues.apache.org/jira/browse/ARROW-8414 Project

[jira] [Commented] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082123#comment-17082123 ] Joris Van den Bossche commented on ARROW-8414: -- Pull request: https://github

[jira] [Updated] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8414: - Labels: pull-request-available (was: ) > [Python] Non-deterministic row order fa

[jira] [Resolved] (ARROW-8408) [Python] Add memory_map= toggle to pyarrow.feather.read_feather

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-8408. -- Resolution: Fixed Issue resolved by pull request 6905 [https://github.com/apach

[jira] [Assigned] (ARROW-8416) [Python] Provide a "feather" alias in the dataset API

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8416: Assignee: Joris Van den Bossche > [Python] Provide a "feather" alias in th

[jira] [Created] (ARROW-8416) [Python] Provide a "feather" alias in the dataset API

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8416: Summary: [Python] Provide a "feather" alias in the dataset API Key: ARROW-8416 URL: https://issues.apache.org/jira/browse/ARROW-8416 Project: Apache Ar

[jira] [Assigned] (ARROW-8290) [Python][Dataset] Improve ergonomy of the FileSystemDataset constructor

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8290: Assignee: Joris Van den Bossche > [Python][Dataset] Improve ergonomy of th

[jira] [Commented] (ARROW-8406) [Python] FileSystem.from_uri erases the drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082178#comment-17082178 ] Joris Van den Bossche commented on ARROW-8406: -- AFAIK, on CI this happened o

[jira] [Updated] (ARROW-8406) [Python] FileSystem.from_uri erases the drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8406: - Fix Version/s: 0.17.0 > [Python] FileSystem.from_uri erases the drive on Windows

[jira] [Commented] (ARROW-8406) [Python] test_fs fails when run from a different drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082191#comment-17082191 ] Joris Van den Bossche commented on ARROW-8406: -- [~apitrou] I am not sure it

[jira] [Commented] (ARROW-8406) [Python] test_fs fails when run from a different drive on Windows

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082224#comment-17082224 ] Joris Van den Bossche commented on ARROW-8406: -- > I cannot reproduce the tes

[jira] [Assigned] (ARROW-8427) [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8427: Assignee: Ben Kietzman > [C++][Dataset] Do not ignore file paths with unde

[jira] [Created] (ARROW-8427) [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8427: Summary: [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified Key: ARROW-8427 URL: https://issues.apache.org/jira/browse/ARROW-

[jira] [Reopened] (ARROW-8414) [Python] Non-deterministic row order failure in test_parquet.py

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reopened ARROW-8414: -- More failures (eg https://github.com/apache/arrow/pull/6919/checks?check_run_id=58

[jira] [Created] (ARROW-8439) [Python] Filesystem docs are outdated

2020-04-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8439: Summary: [Python] Filesystem docs are outdated Key: ARROW-8439 URL: https://issues.apache.org/jira/browse/ARROW-8439 Project: Apache Arrow Is

[jira] [Commented] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082940#comment-17082940 ] Joris Van den Bossche commented on ARROW-8418: -- [~varunbpatil] I think using

[jira] [Commented] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082951#comment-17082951 ] Joris Van den Bossche commented on ARROW-8418: -- OK, thanks for the quick ans

[jira] [Closed] (ARROW-8418) [Python] partition_filename_cb in write_to_dataset should be passed additional keyword arguments rather than just keys

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8418. Resolution: Not A Problem > [Python] partition_filename_cb in write_to_dataset shou

[jira] [Commented] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082957#comment-17082957 ] Joris Van den Bossche commented on ARROW-3424: -- The new dataset API supports

[jira] [Comment Edited] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073116#comment-17073116 ] Joris Van den Bossche edited comment on ARROW-2882 at 4/14/20, 7:46 AM: ---

[jira] [Commented] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17082959#comment-17082959 ] Joris Van den Bossche commented on ARROW-2882: -- With ARROW-8039, this is now

[jira] [Updated] (ARROW-5205) [Python][C++] Improved error messages when user erroneously uses a non-local resource URI to open a file

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5205: - Fix Version/s: 0.17.0 > [Python][C++] Improved error messages when user erroneous

[jira] [Updated] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8208: - Labels: dataset (was: dataset dataset-parquet-read) > [PYTHON] Row Group Filteri

[jira] [Reopened] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reopened ARROW-8208: -- > [PYTHON] Row Group Filtering With ParquetDataset > --

[jira] [Closed] (ARROW-8208) [PYTHON] Row Group Filtering With ParquetDataset

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8208. Resolution: Fixed > [PYTHON] Row Group Filtering With ParquetDataset >

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083010#comment-17083010 ] Joris Van den Bossche commented on ARROW-2444: -- I didn't read the full discu

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083021#comment-17083021 ] Joris Van den Bossche commented on ARROW-2444: -- There is still this behaviou

[jira] [Created] (ARROW-8442) [Python] NullType.to_pandas_dtype inconsisent with dtype returned in to_pandas/to_numpy

2020-04-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8442: Summary: [Python] NullType.to_pandas_dtype inconsisent with dtype returned in to_pandas/to_numpy Key: ARROW-8442 URL: https://issues.apache.org/jira/browse/ARROW-8

[jira] [Commented] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17083022#comment-17083022 ] Joris Van den Bossche commented on ARROW-2444: -- Opened ARROW-8442 specifical

[jira] [Created] (ARROW-8446) [Python][Dataset] Detect and use _metadata file in a list of file paths

2020-04-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8446: Summary: [Python][Dataset] Detect and use _metadata file in a list of file paths Key: ARROW-8446 URL: https://issues.apache.org/jira/browse/ARROW-8446

[jira] [Updated] (ARROW-8447) [C++][Dataset] Ensure Scanner::ToTable preserve ordering of ScanTasks

2020-04-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8447: - Fix Version/s: 1.0.0 > [C++][Dataset] Ensure Scanner::ToTable preserve ordering o

[jira] [Commented] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2020-04-15 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084305#comment-17084305 ] Joris Van den Bossche commented on ARROW-7385: -- No specific update. A PR is

[jira] [Commented] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-04-15 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17084332#comment-17084332 ] Joris Van den Bossche commented on ARROW-8276: -- Fully agreed. In my mind, a

[jira] [Commented] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17086429#comment-17086429 ] Joris Van den Bossche commented on ARROW-8498: -- [~uwe] fixed this recently (

[jira] [Resolved] (ARROW-8498) [Python] Schema.from_pandas fails on extension type, while Table.from_pandas works

2020-04-18 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-8498. -- Fix Version/s: 0.17.0 Assignee: Uwe Korn Resolution: Fixed > [P

[jira] [Commented] (ARROW-6976) Possible memory leak in pyarrow read_parquet

2020-04-20 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17088344#comment-17088344 ] Joris Van den Bossche commented on ARROW-6976: -- [~Athlete_369] that can be p

[jira] [Commented] (ARROW-8545) [Python] Allow fast writing of Decimal column to parquet

2020-04-23 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17090385#comment-17090385 ] Joris Van den Bossche commented on ARROW-8545: -- As [~jacek.pliszka] says, th

[jira] [Updated] (ARROW-8074) [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset?

2020-04-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8074: - Fix Version/s: 1.0.0 > [C++][Dataset] Support for file-like objects (buffers) in

[jira] [Commented] (ARROW-7076) `pip install pyarrow` with python 3.8 fail with message : Could not build wheels for pyarrow which use PEP 517 and cannot be installed directly

2020-04-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094192#comment-17094192 ] Joris Van den Bossche commented on ARROW-7076: -- [~ManthanAdmane] you will ne

[jira] [Updated] (ARROW-8610) [Rust] DivideByZero when running arrow crate when simd feature is disabled

2020-04-27 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8610: - Summary: [Rust] DivideByZero when running arrow crate when simd feature is disabl

[jira] [Updated] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3861: - Fix Version/s: 1.0.0 > [Python] ParquetDataset().read columns argument always ret

[jira] [Assigned] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-3861: Assignee: Joris Van den Bossche > [Python] ParquetDataset().read columns a

[jira] [Updated] (ARROW-6114) [Python] Datatypes are not preserved when a pandas dataframe partitioned and saved as parquet file using pyarrow

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-6114: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python] D

[jira] [Updated] (ARROW-3388) [Python] boolean Partition keys in ParquetDataset are reconstructed as string

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3388: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python] b

[jira] [Commented] (ARROW-3388) [Python] boolean Partition keys in ParquetDataset are reconstructed as string

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094329#comment-17094329 ] Joris Van den Bossche commented on ARROW-3388: -- An update on this issue: als

[jira] [Commented] (ARROW-3388) [Python] boolean Partition keys in ParquetDataset are reconstructed as string

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094334#comment-17094334 ] Joris Van den Bossche commented on ARROW-3388: -- Correction: it actually does

[jira] [Comment Edited] (ARROW-3388) [Python] boolean Partition keys in ParquetDataset are reconstructed as string

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17094334#comment-17094334 ] Joris Van den Bossche edited comment on ARROW-3388 at 4/28/20, 9:36 AM: ---

[jira] [Updated] (ARROW-3388) [Python] boolean Partition keys in ParquetDataset are reconstructed as string

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3388: - Fix Version/s: 1.0.0 > [Python] boolean Partition keys in ParquetDataset are reco

[jira] [Created] (ARROW-8613) [C++][Dataset] Raise error for unparsable partition value

2020-04-28 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8613: Summary: [C++][Dataset] Raise error for unparsable partition value Key: ARROW-8613 URL: https://issues.apache.org/jira/browse/ARROW-8613 Project: Apach

[jira] [Assigned] (ARROW-8251) [Python] pandas.ExtensionDtype does not survive round trip with write_to_dataset

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8251: Assignee: Joris Van den Bossche > [Python] pandas.ExtensionDtype does not

[jira] [Updated] (ARROW-8251) [Python] pandas.ExtensionDtype does not survive round trip with write_to_dataset

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8251: - Fix Version/s: 1.0.0 > [Python] pandas.ExtensionDtype does not survive round trip

[jira] [Updated] (ARROW-7782) [Python] Losing index information when using write_to_dataset with partition_cols

2020-04-28 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7782: - Fix Version/s: 1.0.0 > [Python] Losing index information when using write_to_data
