[I] Fail to open partitioned parquet with s3fs + pyarrow [arrow]

2023-11-20 Thread via GitHub
yf-yang opened a new issue, #38794: URL: https://github.com/apache/arrow/issues/38794 ### Describe the bug, including details regarding any error messages, version, and platform. ## How the parquet is created: ``` python import polars as pl import pyarrow.dataset as ds

[PR] add csv custom quote(and escape) file [arrow-testing]

2023-11-20 Thread via GitHub
Asura7969 opened a new pull request, #97: URL: https://github.com/apache/arrow-testing/pull/97 Part of [arrow-datafusion#8251](https://github.com/apache/arrow-datafusion/pull/8251) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to

Re: [I] ci: add pipeline that fails PR until a milestone is assigned [arrow-adbc]

2023-11-20 Thread via GitHub
lidavidm closed issue #1296: ci: add pipeline that fails PR until a milestone is assigned URL: https://github.com/apache/arrow-adbc/issues/1296

Re: [I] [Java] Reword warning about required `--add-opens=java.base/java.nio=ALL-UNNAMED` [arrow]

2023-11-20 Thread via GitHub
lidavidm closed issue #38764: [Java] Reword warning about required `--add-opens=java.base/java.nio=ALL-UNNAMED` URL: https://github.com/apache/arrow/issues/38764

[I] Data race of calling `GetToTimeFunc` of fixed timestamp data type [arrow]

2023-11-20 Thread via GitHub
7phs opened a new issue, #38795: URL: https://github.com/apache/arrow/issues/38795 ### Describe the bug, including details regarding any error messages, version, and platform. Hello, I found a data race in `databricks-sql-go`. The root cause of the data race is timestamp fields of

Re: [I] [Java][FlightRPC] Add support for Array parameter binding in JDBC driver [arrow]

2023-11-20 Thread via GitHub
lidavidm closed issue #38732: [Java][FlightRPC] Add support for Array parameter binding in JDBC driver URL: https://github.com/apache/arrow/issues/38732

[I] [Integration] Enable C Data Interface integration testing on Rust [arrow]

2023-11-20 Thread via GitHub
pitrou opened a new issue, #38798: URL: https://github.com/apache/arrow/issues/38798 ### Describe the enhancement requested Arrow Rust has added entrypoints for C Data Interface integration testing, so this can now be enabled on our side:

[I] [R] Embedded nuls in Parquets are not correctly signalled [arrow]

2023-11-20 Thread via GitHub
lgaborini opened a new issue, #38804: URL: https://github.com/apache/arrow/issues/38804 ### Describe the bug, including details regarding any error messages, version, and platform. With {arrow} 14.0.0, I was able to import a large number of CSVs, merge the schemas, establish a

[I] [Python] duration[arrow] support in pandas [arrow]

2023-11-20 Thread via GitHub
sergun opened a new issue, #38805: URL: https://github.com/apache/arrow/issues/38805 ### Describe the usage question you have. Please include as many useful details as possible. Is type duration[pyarrow] supported in pandas? This piece of code creates pd.DataFrame with

[I] pyarrow.concat_tables did not really enable the unify_schemas, it only supply unify_schemas options but not actually enable it [arrow]

2023-11-20 Thread via GitHub
braindevices opened a new issue, #38809: URL: https://github.com/apache/arrow/issues/38809 ### Describe the bug, including details regarding any error messages, version, and platform. ### Describe the enhancement requested in R there is `concat_tables(..., unify_schemas =

Re: [I] please expose the unify_schemas option in pyarrow.concat_tables [arrow]

2023-11-20 Thread via GitHub
braindevices closed issue #38807: please expose the unify_schemas option in pyarrow.concat_tables URL: https://github.com/apache/arrow/issues/38807

[I] [R] cmake download doesn't work on macOS [arrow]

2023-11-20 Thread via GitHub
jonkeane opened a new issue, #38811: URL: https://github.com/apache/arrow/issues/38811 ### Describe the bug, including details regarding any error messages, version, and platform. When downloading from

[I] Exception on csv.read_csv() leaves opened handle on Windows [arrow]

2023-11-20 Thread via GitHub
INRIX-Mark-Gershaft opened a new issue, #38812: URL: https://github.com/apache/arrow/issues/38812 ### Describe the bug, including details regarding any error messages, version, and platform. On Windows via pyarrow reading CSV file with a very, very, very long field (several million

[I] go/adbc/driver/snowflake: GetObjects API inconsistent case sensitivity for patterns [arrow-adbc]

2023-11-20 Thread via GitHub
ryan-syed opened a new issue, #1314: URL: https://github.com/apache/arrow-adbc/issues/1314 The spec doesn't seem clear for case-sensitivity. Similarly, in the `getObjectsDbSchemas` driver implementation it uses `LIKE` whereas `getObjectsTables` uses `ILIKE`. What should be the

[I] pyarrow loading the parquet does not follow the saved row order [arrow]

2023-11-20 Thread via GitHub
braindevices opened a new issue, #38808: URL: https://github.com/apache/arrow/issues/38808 ### Describe the bug, including details regarding any error messages, version, and platform. When we load the partitioned Parquet dataset, we would expect the row order to be the same as the

[I] please expose the uni [arrow]

2023-11-20 Thread via GitHub
braindevices opened a new issue, #38807: URL: https://github.com/apache/arrow/issues/38807 ### Describe the enhancement requested in R there is `concat_tables(..., unify_schemas = TRUE)`(https://arrow.apache.org/docs/r/reference/concat_tables.html) In python this function does not

[I] Errors using pyarrow Dataset with adbc_ingest() for adbc_driver_postgres() [arrow-adbc]

2023-11-20 Thread via GitHub
DanteOz opened a new issue, #1310: URL: https://github.com/apache/arrow-adbc/issues/1310 I'm running into errors when trying to bulk load data into postgres using `adbc_driver_postgres` with a `pyarrow.dataset`. The dataset is composed of partitioned csv files. I have verified that using

[I] [Python] Build in Amazon Linux 2023 fails [arrow]

2023-11-20 Thread via GitHub
bascheibler opened a new issue, #38810: URL: https://github.com/apache/arrow/issues/38810 ### Describe the bug, including details regarding any error messages, version, and platform. I'm trying to build a slim version of PyArrow, so that it fits in an AWS Lambda function. The base

[I] [Python] Add function/method to deepcopy a `pa.Table` [arrow]

2023-11-20 Thread via GitHub
hendrikmakait opened a new issue, #38806: URL: https://github.com/apache/arrow/issues/38806 ### Describe the enhancement requested # Problem In Dask, we need to force deep-copies of `pa.Table`s to ensure that views/slices sever references to the original buffers and allow us

[I] [C#] [arrow]

2023-11-20 Thread via GitHub
CurtHagenlocher opened a new issue, #38816: URL: https://github.com/apache/arrow/issues/38816 ### Describe the bug, including details regarding any error messages, version, and platform. The IArrowRecord implementation on StructArray should go through the Fields property instead of

[I] Parquet dictionary filter pushdown slow [arrow]

2023-11-20 Thread via GitHub
alippai opened a new issue, #38818: URL: https://github.com/apache/arrow/issues/38818 ### Describe the enhancement requested I have a sorted column of type `pa.utf8()`. If I write it directly and set the parquet `use_dictionary`, the `read_table(..., filters=[('column', '=',

[I] [C++][Parquet] Update parquet.thrift to sync with 2.10.0 [arrow]

2023-11-20 Thread via GitHub
wgtmac opened a new issue, #38814: URL: https://github.com/apache/arrow/issues/38814 ### Describe the enhancement requested Apache Parquet has just released its format version 2.10.0: https://parquet.apache.org/blog/2023/11/20/2.10.0/. We need to update parquet.thrift and its

[I] [CI][C++] arrow-dataset-dataset-writer-test in the ASAN UBSAN job failed [arrow]

2023-11-20 Thread via GitHub
kou opened a new issue, #38817: URL: https://github.com/apache/arrow/issues/38817 ### Describe the bug, including details regarding any error messages, version, and platform. Since https://github.com/apache/arrow/commit/7df1cdd0fe364f86eca15bfc482210994ba8e2f0 but the change isn't

Re: [I] [C++] Implement file writes for Azure filesystem [arrow]

2023-11-20 Thread via GitHub
kou closed issue #38333: [C++] Implement file writes for Azure filesystem URL: https://github.com/apache/arrow/issues/38333