[jira] [Created] (ARROW-16424) [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles
Ariana Villegas created ARROW-16424:

Summary: [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles
Key: ARROW-16424
URL: https://issues.apache.org/jira/browse/ARROW-16424
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: Ariana Villegas

The FromProto function in {{arrow/engine/substrait/relation_internal.cc}} parses {{uri_path}} with {{string_view}} utilities. However, this should be done with the {{Uri}} class from {{arrow/util/uri.h}}.

{code:c++}
// excerpt; the preceding branches are omitted
} else if (util::string_view{path}.ends_with(".arrow")) {
  format = std::make_shared<dataset::IpcFileFormat>();
} else if (util::string_view{path}.ends_with(".feather")) {
  format = std::make_shared<dataset::IpcFileFormat>();
}
{code}

-- This message was sent by Atlassian Jira (v8.20.7#820007)
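For ARROW-16424, a minimal sketch of why suffix matching on the raw URI string is fragile: if the URI carried a query string or fragment, {{ends_with}} on the whole string would no longer see the extension. This is std-only C++17; {{UriPath}}, {{EndsWith}}, {{FormatFromUri}} and {{FileFormatKind}} are hypothetical names, and the hand-rolled parsing is only a stand-in for the {{Uri}} class the issue proposes. It understands just {{scheme://authority/path}} and does no percent-decoding.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical stand-in for the file formats FromProto chooses between.
enum class FileFormatKind { kParquet, kIpc };

// Hypothetical helper: return the path component of a URI such as
// "file:///data/part-0.feather?x=1#frag". Only "scheme://authority/path"
// plus an optional query and fragment is understood, with no
// percent-decoding; in Arrow this job belongs to the Uri class.
std::string UriPath(std::string_view uri) {
  uri = uri.substr(0, uri.find('#'));  // drop the fragment, if any
  uri = uri.substr(0, uri.find('?'));  // drop the query, if any
  const auto scheme_end = uri.find("://");
  if (scheme_end == std::string_view::npos) {
    return std::string(uri);  // no scheme: treat the whole string as a path
  }
  const std::string_view rest = uri.substr(scheme_end + 3);
  const auto path_start = rest.find('/');
  if (path_start == std::string_view::npos) {
    return "";  // authority only, empty path
  }
  return std::string(rest.substr(path_start));
}

bool EndsWith(std::string_view s, std::string_view suffix) {
  return s.size() >= suffix.size() &&
         s.substr(s.size() - suffix.size()) == suffix;
}

// Choose the format from the extension of the parsed path, not the raw URI.
std::optional<FileFormatKind> FormatFromUri(std::string_view uri) {
  const std::string path = UriPath(uri);
  if (EndsWith(path, ".parquet")) return FileFormatKind::kParquet;
  if (EndsWith(path, ".arrow") || EndsWith(path, ".feather")) {
    return FileFormatKind::kIpc;
  }
  return std::nullopt;  // unknown extension
}
```

With a raw-string check, {{"file:///data/part-0.parquet?x=1"}} does not end with {{.parquet}}; once the path component is extracted, it does.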
[GitHub] [arrow-julia] pcjentsch opened a new issue, #320: Versions in footer and message do not agree, this causes issues reading Arrow files with other libraries (such as `arrow-rs`).
pcjentsch opened a new issue, #320:

URL: https://github.com/apache/arrow-julia/issues/320

Here the version is set to V5:
https://github.com/apache/arrow-julia/blob/a3f6da7c1d59f8321315f4955a3b45c48d38aab4/src/write.jl#L374
and here to V4:
https://github.com/apache/arrow-julia/blob/a3f6da7c1d59f8321315f4955a3b45c48d38aab4/src/write.jl#L374

This causes problems when files written with Arrow.jl are read by other languages' Arrow implementations.
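For issue #320, a minimal sketch in C++ (not Julia), assuming a writer that should stamp a single metadata version everywhere. The {{MetadataVersion}} values are those of the Arrow format's Schema.fbs (V4 = 3, V5 = 4). Deriving both the footer and every message header from one constant rules out the mismatch reported here, and a reader can check for it. {{kWriteVersion}} and {{VersionsAgree}} are hypothetical names, not Arrow.jl or arrow-rs code.

```cpp
#include <cstdint>
#include <vector>

// Values of the MetadataVersion enum in the Arrow format's Schema.fbs.
enum class MetadataVersion : int16_t { V1 = 0, V2 = 1, V3 = 2, V4 = 3, V5 = 4 };

// Hypothetical writer-side constant: if the footer and every message header
// are stamped from this one value, they cannot disagree.
constexpr MetadataVersion kWriteVersion = MetadataVersion::V5;

// Hypothetical reader-side check: true when every message header carries the
// same metadata version as the file footer.
bool VersionsAgree(MetadataVersion footer,
                   const std::vector<MetadataVersion>& message_versions) {
  for (const MetadataVersion v : message_versions) {
    if (v != footer) return false;
  }
  return true;
}
```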
[jira] [Created] (ARROW-16423) R arrow/dplyr: simple join and collect crashes session
Andrew C Thomas created ARROW-16423:

Summary: R arrow/dplyr: simple join and collect crashes session
Key: ARROW-16423
URL: https://issues.apache.org/jira/browse/ARROW-16423
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 7.0.0
Reporter: Andrew C Thomas

When I do an inner-join-style filter on an {{open_dataset()}} result, R crashes, though not reliably on the first attempt; it sometimes takes a couple of tries. Reprex:

{code:r}
library(arrow)
library(dplyr)
library(tidyr)

DataSet <- expand_grid(A = 1:10, B = 1:10, C = 1:1) %>%
  group_by(A, B)
write_dataset(DataSet, "TestBreakData")

for (DoThisUntilItBreaks in 1:100) {
  message(DoThisUntilItBreaks)
  D2 <- open_dataset("TestBreakData") %>%
    inner_join(data.frame(A = 1L, B = 1:5)) %>%
    collect()
}
{code}
[jira] [Created] (ARROW-16422) CentOS 8 package "No match for argument: arrow-devel"
Chandler Sobel-Sorenson created ARROW-16422:

Summary: CentOS 8 package "No match for argument: arrow-devel"
Key: ARROW-16422
URL: https://issues.apache.org/jira/browse/ARROW-16422
Project: Apache Arrow
Issue Type: Bug
Components: C++
Affects Versions: 7.0.0
Reporter: Chandler Sobel-Sorenson

I'm running the commands to install the C++ and GLib (C) packages for CentOS 8 and get the error "No match for argument: arrow-devel":

{noformat}
[root@EagI NanoPlot-1.33.0]# dnf install -y epel-release
Last metadata expiration check: 2:00:55 ago on Fri Apr 29 10:30:24 2022.
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
Package epel-release-8-15.el8.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!
[root@EagI NanoPlot-1.33.0]# dnf install -y https://apache.jfrog.io/artifactory/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1)/apache-arrow-release-latest.rpm
Last metadata expiration check: 2:01:08 ago on Fri Apr 29 10:30:24 2022.
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
apache-arrow-release-latest.rpm 138 kB/s | 60 kB 00:00
Package apache-arrow-release-6.0.1-1.el8.noarch is already installed.
Dependencies resolved.
Nothing to do.
Complete!
[root@EagI NanoPlot-1.33.0]# dnf config-manager --set-enabled epel
[root@EagI NanoPlot-1.33.0]# dnf config-manager --set-enabled powertools
[root@EagI NanoPlot-1.33.0]# dnf install -y arrow-devel
CentOS Linux 8 - PowerTools 50 kB/s | 4.3 kB 00:00
Extra Packages for Enterprise Linux 8 - x86_64 38 kB/s | 8.0 kB 00:00
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
Module yaml error: Unexpected key in data: static_context [line 9 col 3]
No match for argument: arrow-devel
Error: Unable to find a match: arrow-devel
[root@EagI NanoPlot-1.33.0]#
{noformat}
[jira] [Created] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset
Will Jones created ARROW-16421:

Summary: [R] Permission error on Windows when deleting file in dataset
Key: ARROW-16421
URL: https://issues.apache.org/jira/browse/ARROW-16421
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 7.0.0
Reporter: Will Jones
Assignee: Will Jones

On Windows this fails:

{code:r}
library(arrow)

write_dataset(iris, "test_dataset")
con <- open_dataset("test_dataset") |> to_duckdb()
file.remove("test_dataset/part-0.parquet")
#> Warning in file.remove("test_dataset/part-0.parquet"): cannot remove file
#> 'test_dataset/part-0.parquet', reason 'Permission denied'
#> [1] FALSE
{code}

But on macOS it succeeds:

{code:r}
library(arrow)

write_dataset(iris, "test_dataset")
con <- open_dataset("test_dataset") |> to_duckdb()
file.remove("test_dataset/part-0.parquet")
#> [1] TRUE
{code}
[jira] [Created] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning
David Li created ARROW-16420:

Summary: [Python] pq.write_to_dataset always ignores partitioning
Key: ARROW-16420
URL: https://issues.apache.org/jira/browse/ARROW-16420
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 8.0.0
Reporter: David Li

The code unconditionally sets {{partitioning}} to None, so the user-supplied partitioning is ignored.

https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146
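The bug pattern in ARROW-16420 is not Python-specific, so here is a C++ sketch of it using {{std::optional}}. {{ResolveBuggy}} mirrors discarding the caller's argument unconditionally, while {{Resolve}} falls back to a default only when the caller passed nothing. Both functions and the {{Partitioning}} alias are hypothetical illustrations; this is not the pyarrow implementation, and the choice of fallback is an assumption.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a partitioning spec: the partition column names.
using Partitioning = std::vector<std::string>;

// Pattern described in the issue: whatever the caller supplied is discarded
// and the fallback is always used.
Partitioning ResolveBuggy(const std::optional<Partitioning>& /*user*/,
                          const Partitioning& fallback) {
  return fallback;
}

// Sketch of the intended behavior: use the fallback only when the caller
// did not supply a partitioning.
Partitioning Resolve(const std::optional<Partitioning>& user,
                     const Partitioning& fallback) {
  return user.value_or(fallback);
}
```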
[jira] [Created] (ARROW-16419) [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish
David Li created ARROW-16419:

Summary: [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish
Key: ARROW-16419
URL: https://issues.apache.org/jira/browse/ARROW-16419
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 8.0.0
Reporter: David Li

It calls {{StopProducing()}} but doesn't actually wait for {{finished()}}. This tends to cause "Plan was destroyed before finishing" to be printed.
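For ARROW-16419, a minimal sketch of the stop-then-wait pattern using only the C++ standard library. {{ToyPlan}} and {{StopAndWait}} are hypothetical; the toy only mimics the shape of an exec plan ({{StopProducing()}} requests a stop, {{finished()}} completes later) and is not Arrow's {{ExecPlan}}. The caller blocks on {{finished()}} after {{StopProducing()}}, which is the step the issue reports as missing.

```cpp
#include <atomic>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical toy plan: a worker thread that "produces" until asked to stop,
// then marks itself finished. Not Arrow's ExecPlan.
class ToyPlan {
 public:
  void StartProducing() {
    worker_ = std::thread([this] {
      while (!stop_requested_.load()) {
        std::this_thread::sleep_for(std::chrono::milliseconds(1));  // simulated work
      }
      finished_.set_value();  // completion is signalled only after the loop exits
    });
  }

  // Only requests a stop; the worker may still be running when this returns.
  void StopProducing() { stop_requested_.store(true); }

  // Becomes ready once the worker has actually finished.
  std::shared_future<void> finished() const { return finished_future_; }

  // Joining here keeps the toy well-defined even if a caller forgets to wait.
  ~ToyPlan() {
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::atomic<bool> stop_requested_{false};
  std::promise<void> finished_;
  std::shared_future<void> finished_future_ = finished_.get_future().share();
  std::thread worker_;
};

// The pattern the issue asks for: stop, then wait for finished() before the
// plan is destroyed.
bool StopAndWait() {
  ToyPlan plan;
  plan.StartProducing();
  plan.StopProducing();
  plan.finished().wait();  // the wait the issue says execplan skips
  return plan.finished().wait_for(std::chrono::seconds(0)) ==
         std::future_status::ready;
}
```

Because {{StopProducing()}} only sets a flag, returning right after it says nothing about whether the worker is done; only the future tells you that.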
[jira] [Created] (ARROW-16418) [R] Refactor the difftime() and as.difftime() bindings
Dragoș Moldovan-Grünfeld created ARROW-16418:

Summary: [R] Refactor the difftime() and as.difftime() bindings
Key: ARROW-16418
URL: https://issues.apache.org/jira/browse/ARROW-16418
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 8.0.0
Reporter: Dragoș Moldovan-Grünfeld
Fix For: 9.0.0

ARROW-16060 has been resolved, and these two functions have high cyclomatic complexity.
[jira] [Created] (ARROW-16417) [C++][Python] Segfault in test_exec_plan.py / test_joins
David Li created ARROW-16417:

Summary: [C++][Python] Segfault in test_exec_plan.py / test_joins
Key: ARROW-16417
URL: https://issues.apache.org/jira/browse/ARROW-16417
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Affects Versions: 8.0.0
Reporter: David Li

This occurs during wheel verification, and it also happens on master. The failure is intermittent but reproduces fairly reliably. test_joins is parameterized; the parameter combination it fails on varies, but it consistently fails in that test.

The backtrace reaches into malloc_consolidate. MALLOC_CHECK doesn't help. However, with this gdb setup:

{noformat}
(gdb) b main
Breakpoint 1 at 0x11ea20: file /home/conda/feedstock_root/build_artifacts/python-split_1625973859697/work/Programs/python.c, line 15.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>call mcheck(0)
>continue
>end
{noformat}

the run fairly consistently fails with "memory clobbered before allocated block", though the location varies. This may be a red herring.

I also tried using LD_PRELOAD to load a secure build of mimalloc to see whether it would catch any heap corruption, but instead the tests pass consistently with mimalloc.
[jira] [Created] (ARROW-16416) [C++] Support cast-function in Substrait
Yaron Gvili created ARROW-16416:

Summary: [C++] Support cast-function in Substrait
Key: ARROW-16416
URL: https://issues.apache.org/jira/browse/ARROW-16416
Project: Apache Arrow
Issue Type: Improvement
Reporter: Yaron Gvili

The cast function is special in Arrow because its behavior is determined by its output type rather than only by its input arguments, so supporting it in Substrait requires special handling.
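To illustrate ARROW-16416, a minimal sketch: for an ordinary function the result type follows from the argument types, but a cast from the same {{int32}} input can produce different types, so the target type has to be carried into the call separately (here via an options struct). {{TypeId}}, {{Value}}, {{CastOptions}} and {{Cast}} are hypothetical toys, only loosely modelled on the idea behind Arrow's cast options, and only {{int32}} input is handled.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical toy type system and value representation.
enum class TypeId { kInt32, kFloat64, kString };
using Value = std::variant<int32_t, double, std::string>;

// The target type travels alongside the argument: the input alone cannot say
// whether int32 should become float64 or string.
struct CastOptions {
  TypeId to_type;
};

// Sketch: only int32 input is handled; std::get throws
// std::bad_variant_access for any other input alternative.
Value Cast(const Value& input, const CastOptions& options) {
  const int32_t v = std::get<int32_t>(input);
  switch (options.to_type) {
    case TypeId::kInt32:
      return v;
    case TypeId::kFloat64:
      return static_cast<double>(v);
    case TypeId::kString:
      return std::to_string(v);
  }
  throw std::invalid_argument("unsupported target type");
}
```

A consumer translating a Substrait plan would therefore have to read the target type from the plan and build the options from it, rather than looking up a function by its argument types alone.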
[jira] [Created] (ARROW-16415) [R] Update strptime and fast_strptime bindings to use tz
Dragoș Moldovan-Grünfeld created ARROW-16415:

Summary: [R] Update strptime and fast_strptime bindings to use tz
Key: ARROW-16415
URL: https://issues.apache.org/jira/browse/ARROW-16415
Project: Apache Arrow
Issue Type: Improvement
Components: R
Affects Versions: 7.0.0
Reporter: Dragoș Moldovan-Grünfeld
Fix For: 9.0.0

Both bindings state that they do not support {{tz}}, the timezone argument. Now that ARROW-12820 has been addressed, the binding definitions need updating.
[jira] [Created] (ARROW-16414) [R] Remove ARROW_R_WITH_ARROW and arrow_available()
Neal Richardson created ARROW-16414:

Summary: [R] Remove ARROW_R_WITH_ARROW and arrow_available()
Key: ARROW-16414
URL: https://issues.apache.org/jira/browse/ARROW-16414
Project: Apache Arrow
Issue Type: Improvement
Components: R
Reporter: Neal Richardson
Fix For: 9.0.0

Follow-up to ARROW-15294. We could redefine {{arrow_available()}} in R as:

{code:r}
arrow_available <- function() {
  .Deprecated(msg = "arrow_available() is always TRUE now")
  TRUE
}
{code}

Remove everything else, including the {{test_that}} wrapping.
[jira] [Created] (ARROW-16413) [C++][Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem
Joris Van den Bossche created ARROW-16413:

Summary: [C++][Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem
Key: ARROW-16413
URL: https://issues.apache.org/jira/browse/ARROW-16413
Project: Apache Arrow
Issue Type: Improvement
Components: C++, Python
Reporter: Joris Van den Bossche
Fix For: 8.0.0

See https://github.com/dask/dask/pull/8993 for details.

When using an fsspec filesystem (or possibly any PyFileSystem), inspecting a file with {{FileFormat.inspect}} hangs. This happens, for example, in {{ParquetDatasetFactory}}.