[jira] [Created] (ARROW-7638) [Python] Segfault when inspecting dataset.Source with invalid file/partitioning

2020-01-21 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7638: Summary: [Python] Segfault when inspecting dataset.Source with invalid file/partitioning Key: ARROW-7638 URL: https://issues.apache.org/jira/browse/ARROW-7638

[jira] [Updated] (ARROW-7638) [Python] Segfault when inspecting dataset.Source with invalid file/partitioning

2020-01-21 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7638: - Description: Getting a segfault with: {code} In [1]: import pyarrow.dataset as

[jira] [Created] (ARROW-7547) [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat

2020-01-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7547: Summary: [C++] [Python] [Dataset] Additional reader options in ParquetFileFormat Key: ARROW-7547 URL: https://issues.apache.org/jira/browse/ARROW-7547

[jira] [Created] (ARROW-7545) [C++] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7545: Summary: [C++] Scanning dataset with dictionary type hangs Key: ARROW-7545 URL: https://issues.apache.org/jira/browse/ARROW-7545 Project: Apache Arrow

[jira] [Updated] (ARROW-7545) [C++] [Dataset] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7545: - Summary: [C++] [Dataset] Scanning dataset with dictionary type hangs (was:

[jira] [Commented] (ARROW-7545) [C++] [Dataset] Scanning dataset with dictionary type hangs

2020-01-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17012681#comment-17012681 ] Joris Van den Bossche commented on ARROW-7545: -- So if the table has a single dictionary

[jira] [Commented] (ARROW-7413) [Python][Dataset] Add tests for PartitionSchemeDiscovery

2020-01-08 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17010713#comment-17010713 ] Joris Van den Bossche commented on ARROW-7413: -- [~bkietz] I suppose you are not working on

[jira] [Updated] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5757: - Fix Version/s: (was: 2.0.0) 1.0.0 > [Python] Stop

[jira] [Commented] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014965#comment-17014965 ] Joris Van den Bossche commented on ARROW-5757: -- Wes: {quote}We probably need to discuss

[jira] [Updated] (ARROW-7561) [Doc][Python] fix conda environment command

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7561: - Fix Version/s: 0.16.0 > [Doc][Python] fix conda environment command >

[jira] [Commented] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014961#comment-17014961 ] Joris Van den Bossche commented on ARROW-7555: -- With conda, on the other hand, Python 2.7 is

[jira] [Commented] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014963#comment-17014963 ] Joris Van den Bossche commented on ARROW-7555: -- But closing this as a duplicate of

[jira] [Closed] (ARROW-7555) [Python] Drop support for python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-7555. Resolution: Duplicate > [Python] Drop support for python 2.7 >

[jira] [Commented] (ARROW-5757) [Python] Stop supporting Python 2.7

2020-01-14 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17014964#comment-17014964 ] Joris Van den Bossche commented on ARROW-5757: -- Some discussion happened in ARROW-7555

[jira] [Created] (ARROW-7569) [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions

2020-01-14 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7569: Summary: [Python] Add API to map Arrow types to pandas ExtensionDtypes for to_pandas conversions Key: ARROW-7569 URL:

[jira] [Created] (ARROW-7497) [Python] pandas master failures: pandas.util.testing is deprecated

2020-01-06 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-7497: Summary: [Python] pandas master failures: pandas.util.testing is deprecated Key: ARROW-7497 URL: https://issues.apache.org/jira/browse/ARROW-7497

[jira] [Resolved] (ARROW-7087) [Python] Table Metadata disappear when we write a partitioned dataset

2020-01-07 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7087. -- Resolution: Fixed Issue resolved by pull request 6127

[jira] [Assigned] (ARROW-7087) [Python] Table Metadata disappear when we write a partitioned dataset

2020-01-07 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-7087: Assignee: François Blanchard > [Python] Table Metadata disappear when we

[jira] [Updated] (ARROW-7512) [C++] Dictionary memo missing elements in id_to_dictionary_ map after deserialization

2020-01-08 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7512: - Summary: [C++] Dictionary memo missing elements in id_to_dictionary_ map after

[jira] [Updated] (ARROW-8088) [C++][Dataset] Partition columns with specified dictionary type result in all nulls

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8088: - Description: When specifying an explicit schema for the Partitioning, and when

[jira] [Commented] (ARROW-3391) [Python] Support \0 characters in binary Parquet predicate values

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057780#comment-17057780 ] Joris Van den Bossche commented on ARROW-3391: -- [~uwe] What kind of wrong results do you

[jira] [Updated] (ARROW-8087) [C++][Dataset] Order of keys with HivePartitioning is lost in resulting schema

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8087: - Fix Version/s: 0.17.0 > [C++][Dataset] Order of keys with HivePartitioning is

[jira] [Created] (ARROW-8087) [C++][Dataset] Order of keys with HivePartitioning is lost in resulting schema

2020-03-12 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8087: Summary: [C++][Dataset] Order of keys with HivePartitioning is lost in resulting schema Key: ARROW-8087 URL: https://issues.apache.org/jira/browse/ARROW-8087

[jira] [Created] (ARROW-8088) [C++][Dataset] Partition columns with specified dictionary type result in all nulls

2020-03-12 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8088: Summary: [C++][Dataset] Partition columns with specified dictionary type result in all nulls Key: ARROW-8088 URL: https://issues.apache.org/jira/browse/ARROW-8088

[jira] [Updated] (ARROW-7858) [C++][Python] Support casting an Extension type to its storage type

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7858: - Issue Type: Improvement (was: Test) > [C++][Python] Support casting an

[jira] [Closed] (ARROW-5379) [Python] support pandas' nullable Integer type in from_pandas

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-5379. Resolution: Fixed > [Python] support pandas' nullable Integer type in from_pandas

[jira] [Commented] (ARROW-5379) [Python] support pandas' nullable Integer type in from_pandas

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057723#comment-17057723 ] Joris Van den Bossche commented on ARROW-5379: -- With the latest releases of pandas and

[jira] [Commented] (ARROW-8066) [Python] Specify behavior for converting tz-aware datetime.datetime objects to Arrow format

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057712#comment-17057712 ] Joris Van den Bossche commented on ARROW-8066: -- At least we should normalize to UTC, I think
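The comment above suggests that tz-aware `datetime.datetime` objects should at least be normalized to UTC. Below is a minimal sketch of what that normalization means, using only the Python standard library. It assumes the target is Arrow's timestamp type, which stores values relative to the UTC epoch. The helpers `to_utc_naive` and `to_utc_micros` are hypothetical names for illustration. They are not pyarrow API, and this is not how pyarrow implements the conversion.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as an aware datetime, so subtraction works across time zones.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_naive(dt: datetime) -> datetime:
    """Hypothetical helper: shift an aware datetime to UTC and drop tzinfo."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return dt.astimezone(timezone.utc).replace(tzinfo=None)


def to_utc_micros(dt: datetime) -> int:
    """Hypothetical helper: microseconds since the UTC epoch (integer math)."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return (dt - EPOCH) // timedelta(microseconds=1)


cet = timezone(timedelta(hours=1))           # fixed UTC+01:00 offset
local = datetime(2020, 3, 12, 10, 0, tzinfo=cet)
print(to_utc_naive(local))                   # 2020-03-12 09:00:00
```

The same instant expressed in different zones maps to the same UTC value, which is the property a normalization step needs to preserve.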

[jira] [Resolved] (ARROW-7986) [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche resolved ARROW-7986. -- Resolution: Fixed > [Python] pa.Array.from_pandas cannot convert pandas.Series

[jira] [Closed] (ARROW-7986) [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-7986. > [Python] pa.Array.from_pandas cannot convert pandas.Series containing >

[jira] [Commented] (ARROW-7986) [Python] pa.Array.from_pandas cannot convert pandas.Series containing pyspark.ml.linalg.SparseVector

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17057704#comment-17057704 ] Joris Van den Bossche commented on ARROW-7986: -- OK, closing the issue here then, since the

[jira] [Created] (ARROW-8074) [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset?

2020-03-11 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8074: Summary: [C++][Dataset] Support for file-like objects (buffers) in FileSystemDataset? Key: ARROW-8074 URL: https://issues.apache.org/jira/browse/ARROW-8074

[jira] [Updated] (ARROW-4633) [Python] ParquetFile.read(use_threads=False) creates ThreadPool anyway

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-4633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-4633: - Labels: dataset-parquet-read newbie parquet (was: newbie parquet) > [Python]

[jira] [Updated] (ARROW-2860) [Python][Parquet][C++] Null values in a single partition of Parquet dataset, results in invalid schema on read

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2860: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-2728) [Python][C++][Dataset] Support partitioned Parquet datasets using glob-style file paths

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2728: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-2659) [Python] More graceful reading of empty String columns in ParquetDataset

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2659: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-2098) [Python] Implement "errors as null" option when coercing Python object arrays to Arrow format

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2098: - Labels: (was: parquet) > [Python] Implement "errors as null" option when

[jira] [Updated] (ARROW-1956) [Python] Support reading specific partitions from a partitioned parquet dataset

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-1956: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] Support reading

[jira] [Updated] (ARROW-2079) [Python][C++] Possibly use `_common_metadata` for schema if `_metadata` isn't available

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2079: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-2366) [Python][C++][Parquet] Support reading Parquet files having a permutation of column order

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2366: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-2444) [Python][C++] Better handle reading empty parquet files

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2444: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-1848) [Python] Add documentation examples for reading single Parquet files and datasets from HDFS

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-1848: - Labels: dataset-parquet-read filesystem parquet (was: filesystem parquet) >

[jira] [Updated] (ARROW-1682) [Python] Add documentation / example for reading a directory of Parquet files on S3

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-1682: - Labels: dataset-parquet-read filesystem parquet (was: filesystem parquet) >

[jira] [Updated] (ARROW-2077) [Python] Document on how to use Storefact & Arrow to read Parquet from S3/Azure/...

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2077: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] Document on how

[jira] [Updated] (ARROW-5825) [Python] Exceptions swallowed in ParquetManifest._visit_directories

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5825: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] Exceptions

[jira] [Updated] (ARROW-2801) [Python][C++][Dataset] Implement splt_row_groups for ParquetDataset

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2801: - Labels: dataset dataset-parquet-read parquet pull-request-available (was:

[jira] [Updated] (ARROW-5310) [Python] better error message on creating ParquetDataset from empty directory

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5310: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2882: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) >

[jira] [Updated] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7385: - Labels: dataset parquet parquet-read (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7385: - Labels: dataset parquet (was: parquet) > [Python] ParquetDataset deadlock with

[jira] [Updated] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7385: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet

[jira] [Updated] (ARROW-3424) [Python] Improved workflow for loading an arbitrary collection of Parquet files

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3424: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-3391) [Python] Support \0 characters in binary Parquet predicate values

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3391: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python] 

[jira] [Updated] (ARROW-3705) [Python] Add "nrows" argument to parquet.read_table read indicated number of rows from file instead of whole file

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3705: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-3861) [Python] ParquetDataset().read columns argument always returns partition column

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3861: - Labels: dataset dataset-parquet-read parquet python (was: dataset parquet

[jira] [Updated] (ARROW-3245) [Python] Infer index and/or filtering from parquet column statistics

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3245: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] Infer index

[jira] [Updated] (ARROW-3245) [Python] Infer index and/or filtering from parquet column statistics

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3245: - Labels: dataset dataset-parquet-read parquet (was: dataset-parquet-read

[jira] [Updated] (ARROW-3244) [Python] Multi-file parquet loading without scan

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3244: - Labels: dataset dataset-parquet-read parquet (was: dataset parquet) > [Python]

[jira] [Updated] (ARROW-5666) [Python] Underscores in partition (string) values are dropped when reading dataset

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5666: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] Underscores in

[jira] [Updated] (ARROW-5572) [Python] raise error message when passing invalid filter in parquet reading

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-5572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-5572: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] raise error

[jira] [Updated] (ARROW-3947) [Python] query distinct values of a given partition from a ParquetDataset

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3947: - Labels: dataset-parquet-read parquet (was: parquet) > [Python] query distinct

[jira] [Updated] (ARROW-7996) [Python] Error serializing empty pandas DataFrame with pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7996: - Labels: serialization (was: ) > [Python] Error serializing empty pandas

[jira] [Commented] (ARROW-8004) [Python] Define API for user-defined conversions of array cell values in pyarrow.array

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8004?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055776#comment-17055776 ] Joris Van den Bossche commented on ARROW-8004: -- For a more limited use case than general

[jira] [Updated] (ARROW-8010) [Python] Fixed size list not convertible to Numpy Array / pandas Series

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8010: - Summary: [Python] Fixed size list not convertible to Numpy Array / pandas Series

[jira] [Commented] (ARROW-7680) [C++][Dataset] Partition discovery is not working with windows path

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055828#comment-17055828 ] Joris Van den Bossche commented on ARROW-7680: -- Indeed, we are still getting the same error

[jira] [Commented] (ARROW-8010) [Python] Fixed size list not convertible to Numpy Array / pandas Series

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055833#comment-17055833 ] Joris Van den Bossche commented on ARROW-8010: -- [~balancap] Thanks for the report! I think

[jira] [Commented] (ARROW-7996) [Python] Error serializing empty pandas DataFrame with pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055760#comment-17055760 ] Joris Van den Bossche commented on ARROW-7996: -- The error comes from deserializing the

[jira] [Commented] (ARROW-7680) [C++][Dataset] Partition discovery is not working with windows path

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055815#comment-17055815 ] Joris Van den Bossche commented on ARROW-7680: -- Since ARROW-7677 is not yet resolved, I

[jira] [Comment Edited] (ARROW-7677) [C++] Handle Windows file paths with backslashes in GetTargetStats

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055814#comment-17055814 ] Joris Van den Bossche edited comment on ARROW-7677 at 3/10/20, 10:56 AM:

[jira] [Commented] (ARROW-7677) [C++] Handle Windows file paths with backslashes in GetTargetStats

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055814#comment-17055814 ] Joris Van den Bossche commented on ARROW-7677: -- It came up in a partitioned parquet dataset
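ARROW-7677 and ARROW-7680 concern Windows paths with backslashes breaking partition discovery. Below is a minimal sketch of the idea, assuming the paths are first converted to forward-slash form and then scanned for Hive-style `key=value` directory segments. It uses only the standard library (`pathlib.PureWindowsPath.as_posix`). The helpers `normalize_windows_path` and `hive_partition_keys` are hypothetical and do not reflect the actual Arrow C++ fix.

```python
from pathlib import PureWindowsPath


def normalize_windows_path(path: str) -> str:
    """Hypothetical helper: convert a backslash-separated Windows path to '/' form."""
    return PureWindowsPath(path).as_posix()


def hive_partition_keys(path: str) -> dict:
    """Hypothetical helper: collect key=value directory segments (file name excluded)."""
    segments = normalize_windows_path(path).split("/")
    keys = {}
    for segment in segments[:-1]:            # last segment is the file itself
        if "=" in segment:
            key, value = segment.split("=", 1)
            keys[key] = value
    return keys


p = r"C:\data\year=2020\month=3\part-0.parquet"
print(normalize_windows_path(p))  # C:/data/year=2020/month=3/part-0.parquet
print(hive_partition_keys(p))     # {'year': '2020', 'month': '3'}
```

Normalizing separators before parsing keeps the partition parser itself platform-independent.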

[jira] [Closed] (ARROW-8010) [Python] Fixed size list not convertible to Numpy Array / pandas Series

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8010. Resolution: Duplicate > [Python] Fixed size list not convertible to Numpy Array /

[jira] [Updated] (ARROW-2728) [Python][C++][Dataset] Support partitioned Parquet datasets using glob-style file paths

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-2728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-2728: - Component/s: C++ - Dataset > [Python][C++][Dataset] Support partitioned Parquet

[jira] [Updated] (ARROW-3154) [Python][C++] Document how to write _metadata, _common_metadata files with Parquet datasets

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-3154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-3154: - Component/s: C++ - Dataset > [Python][C++] Document how to write _metadata,

[jira] [Commented] (ARROW-7997) [Python] Schema equals method with inconsistent docs in pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055766#comment-17055766 ] Joris Van den Bossche commented on ARROW-7997: -- [~otaviocv] Thanks for the report! That is

[jira] [Updated] (ARROW-7997) [Python] Schema equals method with inconsistent docs in pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7997: - Component/s: Python > [Python] Schema equals method with inconsistent docs in

[jira] [Updated] (ARROW-7997) [Python] Schema equals method with inconsistent docs in pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7997: - Summary: [Python] Schema equals method with inconsistent docs in pyarrow (was:

[jira] [Commented] (ARROW-8052) [Python] requirements-test.txt cannot be used with conda install --file

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055830#comment-17055830 ] Joris Van den Bossche commented on ARROW-8052: -- I don't think this should be expected to

[jira] [Commented] (ARROW-8093) [CI][Crossbow] Pandas integration test fails

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058256#comment-17058256 ] Joris Van den Bossche commented on ARROW-8093: -- This is a duplicate of ARROW-7857 (sorry, I

[jira] [Closed] (ARROW-8093) [CI][Crossbow] Pandas integration test fails

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche closed ARROW-8093. Fix Version/s: (was: 0.17.0) Resolution: Duplicate > [CI][Crossbow]

[jira] [Updated] (ARROW-7857) [Python] Failing test with pandas master for extension type conversion

2020-03-12 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7857: - Fix Version/s: 0.17.0 > [Python] Failing test with pandas master for extension

[jira] [Commented] (ARROW-7996) Error serializing empty pandas DataFrame with pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055756#comment-17055756 ] Joris Van den Bossche commented on ARROW-7996: -- [~jdavidagudelo] Thanks for the report! A

[jira] [Updated] (ARROW-7996) [Python] Error serializing empty pandas DataFrame with pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-7996: - Summary: [Python] Error serializing empty pandas DataFrame with pyarrow (was:

[jira] [Commented] (ARROW-7956) [Python] Memory leak in pyarrow functions .ipc.serialize_pandas/deserialize_pandas

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17055783#comment-17055783 ] Joris Van den Bossche commented on ARROW-7956: -- [~wesm] I think this was closed by

[jira] [Updated] (ARROW-8060) [Python] Make dataset Expression objects serializable

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8060: - Fix Version/s: 0.17.0 > [Python] Make dataset Expression objects serializable >

[jira] [Created] (ARROW-8059) [Python] Make FileSystem objects serializable

2020-03-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8059: Summary: [Python] Make FileSystem objects serializable Key: ARROW-8059 URL: https://issues.apache.org/jira/browse/ARROW-8059 Project: Apache Arrow

[jira] [Updated] (ARROW-8059) [Python] Make FileSystem objects serializable

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche updated ARROW-8059: - Fix Version/s: 0.17.0 > [Python] Make FileSystem objects serializable >

[jira] [Created] (ARROW-8060) [Python] Make dataset Expression objects serializable

2020-03-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8060: Summary: [Python] Make dataset Expression objects serializable Key: ARROW-8060 URL: https://issues.apache.org/jira/browse/ARROW-8060 Project: Apache

[jira] [Created] (ARROW-8062) [C++][Dataset] Parquet Dataset factory from a _metadata/_common_metadata file

2020-03-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8062: Summary: [C++][Dataset] Parquet Dataset factory from a _metadata/_common_metadata file Key: ARROW-8062 URL: https://issues.apache.org/jira/browse/ARROW-8062

[jira] [Commented] (ARROW-8047) [Python][Documentation] Document migration from ParquetDataset to pyarrow.datasets

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056235#comment-17056235 ] Joris Van den Bossche commented on ARROW-8047: -- I also created ARROW-8063 for general user

[jira] [Commented] (ARROW-8061) [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support row groups)

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056248#comment-17056248 ] Joris Van den Bossche commented on ARROW-8061: -- > Note that parallelism of RowGroup is

[jira] [Created] (ARROW-8061) [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support row groups)

2020-03-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8061: Summary: [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support row groups) Key: ARROW-8061 URL:

[jira] [Commented] (ARROW-8061) [C++][Dataset] Ability to specify granularity of ParquetFileFragment (support row groups)

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056201#comment-17056201 ] Joris Van den Bossche commented on ARROW-8061: -- Example use case for this: for Dask, which

[jira] [Created] (ARROW-8063) [Python] Add user guide documentation for Datasets API

2020-03-10 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8063: Summary: [Python] Add user guide documentation for Datasets API Key: ARROW-8063 URL: https://issues.apache.org/jira/browse/ARROW-8063 Project: Apache

[jira] [Commented] (ARROW-7997) [Python] Schema equals method with inconsistent docs in pyarrow

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056241#comment-17056241 ] Joris Van den Bossche commented on ARROW-7997: -- Actually, there is just today work going on

[jira] [Commented] (ARROW-8059) [Python] Make FileSystem objects serializable

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056251#comment-17056251 ] Joris Van den Bossche commented on ARROW-8059: -- Specifically for Dask's use case, it might

[jira] [Commented] (ARROW-8039) [C++][Python][Dataset] Assemble a minimal ParquetDataset shim

2020-03-10 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17056284#comment-17056284 ] Joris Van den Bossche commented on ARROW-8039: -- > We might focus this by saying that the

[jira] [Assigned] (ARROW-8427) [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified

2020-04-13 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joris Van den Bossche reassigned ARROW-8427: Assignee: Ben Kietzman > [C++][Dataset] Do not ignore file paths with

[jira] [Created] (ARROW-8427) [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified

2020-04-13 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-8427: Summary: [C++][Dataset] Do not ignore file paths with underscore/dot when full path was specified Key: ARROW-8427 URL:

[jira] [Commented] (ARROW-8276) [C++][Dataset] Scanning a Fragment does not take into account the partition columns

2020-04-15 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-8276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084332#comment-17084332 ] Joris Van den Bossche commented on ARROW-8276: -- Fully agreed. In my mind, all the

[jira] [Commented] (ARROW-7385) [Python] ParquetDataset deadlock with different metadata_nthreads values

2020-04-15 Thread Joris Van den Bossche (Jira)
[ https://issues.apache.org/jira/browse/ARROW-7385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17084305#comment-17084305 ] Joris Van den Bossche commented on ARROW-7385: -- No specific update. A PR is certainly
