[jira] [Created] (ARROW-16424) [C++] Add support for all options specified by substrait::ReadRel::LocalFiles::FileOrFiles

2022-04-29 Thread Ariana Villegas (Jira)
Ariana Villegas created ARROW-16424:
---

 Summary: [C++] Add support for all options specified by 
substrait::ReadRel::LocalFiles::FileOrFiles
 Key: ARROW-16424
 URL: https://issues.apache.org/jira/browse/ARROW-16424
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Ariana Villegas


The FromProto function in {{arrow/engine/substrait/relation_internal.cc}} parses 
{{uri_path}} with {{string_view}} utilities. However, this should be done with 
the {{Uri}} class from {{arrow/util/uri.h}}.
{code:c++}
else if (util::string_view{path}.ends_with(".arrow")) {
  format = std::make_shared<dataset::IpcFileFormat>();
} else if (util::string_view{path}.ends_with(".feather")) {
  format = std::make_shared<dataset::IpcFileFormat>();
{code}
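
To illustrate why a URI parser is preferable to suffix checks on the raw string, here is a minimal sketch in Python using only the standard library. It is not the Arrow C++ code: {{guess_format}} and the format labels are hypothetical names chosen for illustration. A raw suffix test is fooled by a query string or fragment, while a check on the parsed path component is not.

```python
from urllib.parse import urlparse
import posixpath

# Hypothetical mapping from file extension to a format label (illustration only).
_EXTENSION_TO_FORMAT = {
    ".parquet": "parquet",
    ".arrow": "ipc",
    ".feather": "ipc",
}


def guess_format(uri: str) -> str:
    """Pick a format label from the path component of a URI."""
    path = urlparse(uri).path  # drops scheme, host, query and fragment
    _, ext = posixpath.splitext(path)
    try:
        return _EXTENSION_TO_FORMAT[ext.lower()]
    except KeyError:
        raise ValueError(f"unrecognized file extension in {uri!r}") from None


uri = "file:///tmp/data.feather?version=2"
print(uri.endswith(".feather"))  # suffix test on the raw string
print(guess_format(uri))         # extension taken from the parsed path
```

The same idea would presumably carry over to C++ by inspecting the path returned by the {{Uri}} class instead of the whole {{uri_path}} string.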



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[GitHub] [arrow-julia] pcjentsch opened a new issue, #320: Versions in footer and message do not agree, this causes issues reading Arrow files with other libraries (such as `arrow-rs`).

2022-04-29 Thread GitBox


pcjentsch opened a new issue, #320:
URL: https://github.com/apache/arrow-julia/issues/320

   Here the version is set to V5:
   
https://github.com/apache/arrow-julia/blob/a3f6da7c1d59f8321315f4955a3b45c48d38aab4/src/write.jl#L374
   
   and here to V4:
   
https://github.com/apache/arrow-julia/blob/a3f6da7c1d59f8321315f4955a3b45c48d38aab4/src/write.jl#L374
   
   This causes problems when files written with Arrow.jl are read by Arrow 
implementations in other languages.





[jira] [Created] (ARROW-16423) R arrow/dplyr: simple join and collect crashes session

2022-04-29 Thread Andrew C Thomas (Jira)
Andrew C Thomas created ARROW-16423:
---

 Summary: R arrow/dplyr: simple join and collect crashes session
 Key: ARROW-16423
 URL: https://issues.apache.org/jira/browse/ARROW-16423
 Project: Apache Arrow
  Issue Type: Bug
  Components: R
Affects Versions: 7.0.0
Reporter: Andrew C Thomas


An inner-join-style filter on a dataset opened with open_dataset crashes the R 
session, though not reliably on the first attempt; sometimes it takes a few 
tries.

Reprex follows.

--

library (arrow)
library (dplyr)
library (tidyr)

DataSet <- expand_grid (A = 1:10, B = 1:10, C = 1:1) %>%
  group_by (A, B)
write_dataset(DataSet, "TestBreakData")

for (DoThisUntilItBreaks in 1:100) {
  message (DoThisUntilItBreaks)
  D2 <- open_dataset("TestBreakData") %>%
    inner_join (data.frame (A=1L, B=1:5)) %>%
    collect
}





[jira] [Created] (ARROW-16422) CentOS 8 package "No match for argument: arrow-devel"

2022-04-29 Thread Chandler Sobel-Sorenson (Jira)
Chandler Sobel-Sorenson created ARROW-16422:
---

 Summary: CentOS 8 package "No match for argument: arrow-devel"
 Key: ARROW-16422
 URL: https://issues.apache.org/jira/browse/ARROW-16422
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 7.0.0
Reporter: Chandler Sobel-Sorenson


I'm running the commands to install C++ and GLib (C) Packages for CentOS 8 and 
getting the error "No match for argument: arrow-devel":

{{[root@EagI NanoPlot-1.33.0]# dnf install -y epel-release}}
{{Last metadata expiration check: 2:00:55 ago on Fri Apr 29 10:30:24 2022.}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{Package epel-release-8-15.el8.noarch is already installed.}}
{{Dependencies resolved.}}
{{Nothing to do.}}
{{Complete!}}
{{[root@EagI NanoPlot-1.33.0]# dnf install -y https://apache.jfrog.io/artifactory/arrow/centos/$(cut -d: -f5 /etc/system-release-cpe | cut -d. -f1)/apache-arrow-release-latest.rpm}}
{{Last metadata expiration check: 2:01:08 ago on Fri Apr 29 10:30:24 2022.}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{apache-arrow-release-latest.rpm  138 kB/s |  60 kB  00:00}}
{{Package apache-arrow-release-6.0.1-1.el8.noarch is already installed.}}
{{Dependencies resolved.}}
{{Nothing to do.}}
{{Complete!}}
{{[root@EagI NanoPlot-1.33.0]# dnf config-manager --set-enabled epel}}
{{[root@EagI NanoPlot-1.33.0]# dnf config-manager --set-enabled powertools}}
{{[root@EagI NanoPlot-1.33.0]# dnf install -y arrow-devel}}
{{CentOS Linux 8 - PowerTools  50 kB/s | 4.3 kB  00:00}}
{{Extra Packages for Enterprise Linux 8 - x86_64  38 kB/s | 8.0 kB  00:00}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{Module yaml error: Unexpected key in data: static_context [line 9 col 3]}}
{{No match for argument: arrow-devel}}
{{Error: Unable to find a match: arrow-devel}}
{{[root@EagI NanoPlot-1.33.0]# }}





[jira] [Created] (ARROW-16421) [R] Permission error on Windows when deleting file in dataset

2022-04-29 Thread Will Jones (Jira)
Will Jones created ARROW-16421:
--

 Summary: [R] Permission error on Windows when deleting file in 
dataset
 Key: ARROW-16421
 URL: https://issues.apache.org/jira/browse/ARROW-16421
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 7.0.0
Reporter: Will Jones
Assignee: Will Jones


On Windows this fails: 
{code:r}
library(arrow)

write_dataset(iris, "test_dataset")

con <- open_dataset("test_dataset") |> to_duckdb()

file.remove("test_dataset/part-0.parquet")
#> Warning in file.remove("test_dataset/part-0.parquet"): cannot remove file
#> 'test_dataset/part-0.parquet', reason 'Permission denied'
#> [1] FALSE
{code}

But on macOS it does not:

{code:R}
library(arrow)

write_dataset(iris, "test_dataset")

con <- open_dataset("test_dataset") |> to_duckdb()

file.remove("test_dataset/part-0.parquet")
#> [1] TRUE
{code}





[jira] [Created] (ARROW-16420) [Python] pq.write_to_dataset always ignores partitioning

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16420:


 Summary: [Python] pq.write_to_dataset always ignores partitioning
 Key: ARROW-16420
 URL: https://issues.apache.org/jira/browse/ARROW-16420
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: David Li


The code unconditionally sets {{partitioning}} to None, so the user-supplied 
partitioning is ignored. 

https://github.com/apache/arrow/blob/edf7334fc38ec9bc2e019bf400403e7c61fb585e/python/pyarrow/parquet/__init__.py#L3143-L3146





[jira] [Created] (ARROW-16419) [Python] pyarrow._exec_plan.execplan doesn't wait for plan to finish

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16419:


 Summary: [Python] pyarrow._exec_plan.execplan doesn't wait for 
plan to finish
 Key: ARROW-16419
 URL: https://issues.apache.org/jira/browse/ARROW-16419
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 8.0.0
Reporter: David Li


It calls StopProducing but doesn't actually wait for finished(). This tends to 
cause "Plan was destroyed before finishing" to get printed.
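
The pattern can be sketched with a toy example in plain Python. This is not the pyarrow or Arrow C++ API: {{ToyPlan}}, {{stop_producing}} and {{run_and_close}} are hypothetical names, and the worker thread merely stands in for a running plan. The point is only that requesting a stop returns immediately, so the caller still has to wait on the completion signal before the plan is destroyed.

```python
import threading
import time


class ToyPlan:
    """Hypothetical stand-in for an execution plan; not the pyarrow API."""

    def __init__(self) -> None:
        self._stop = threading.Event()
        self.finished = threading.Event()  # set once the worker has exited
        self._worker = threading.Thread(target=self._run)

    def start(self) -> None:
        self._worker.start()

    def _run(self) -> None:
        while not self._stop.is_set():
            time.sleep(0.01)  # pretend to produce batches
        self.finished.set()

    def stop_producing(self) -> None:
        self._stop.set()  # only a request; returns immediately


def run_and_close(plan: ToyPlan) -> bool:
    plan.start()
    plan.stop_producing()
    # The step reported as missing: block until the plan signals completion
    # before it is allowed to go away.
    return plan.finished.wait(timeout=5.0)


print(run_and_close(ToyPlan()))
```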





[jira] [Created] (ARROW-16418) [R] Refactor the difftime() and as.difftime() bindings

2022-04-29 Thread Jira
Dragoș Moldovan-Grünfeld created ARROW-16418:


 Summary: [R] Refactor the difftime() and as.difftime() bindings
 Key: ARROW-16418
 URL: https://issues.apache.org/jira/browse/ARROW-16418
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 8.0.0
Reporter: Dragoș Moldovan-Grünfeld
 Fix For: 9.0.0


ARROW-16060 is resolved, and these two functions have high cyclomatic 
complexity, so their bindings should be refactored.





[jira] [Created] (ARROW-16417) [C++][Python] Segfault in test_exec_plan.py / test_joins

2022-04-29 Thread David Li (Jira)
David Li created ARROW-16417:


 Summary: [C++][Python] Segfault in test_exec_plan.py / test_joins
 Key: ARROW-16417
 URL: https://issues.apache.org/jira/browse/ARROW-16417
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 8.0.0
Reporter: David Li


This occurs during wheel verification and also happens on master. The failure 
is intermittent but fairly easy to reproduce. test_joins is parameterized; the 
parameters it fails on vary, but the failure consistently occurs in that test.

The backtrace reaches into malloc_consolidate. MALLOC_CHECK doesn't help. 
However:
{noformat}
(gdb) b main
Breakpoint 1 at 0x11ea20: file /home/conda/feedstock_root/build_artifacts/python-split_1625973859697/work/Programs/python.c, line 15.
(gdb) command 1
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>call mcheck(0)
>continue
>end
{noformat}
This fairly consistently fails with "memory clobbered before allocated block" 
but the location varies. 

This may be a red herring, though. I also tried LD_PRELOAD-ing a secure build 
of mimalloc to see whether it would catch any heap corruption, but the tests 
pass consistently with mimalloc.

 





[jira] [Created] (ARROW-16416) [C++] Support cast-function in Substrait

2022-04-29 Thread Yaron Gvili (Jira)
Yaron Gvili created ARROW-16416:
---

 Summary: [C++] Support cast-function in Substrait
 Key: ARROW-16416
 URL: https://issues.apache.org/jira/browse/ARROW-16416
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Yaron Gvili


The cast-function is special in Arrow, because its operation is determined by 
its output type rather than just by its parameter, and so it requires special 
handling in Substrait to support it.
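
A toy illustration of the difference, in plain Python rather than Arrow or Substrait code (the kernel tables and the {{call_negate}} / {{call_cast}} helpers are hypothetical): an ordinary function can be dispatched from its argument type alone, whereas a cast also needs the requested output type to pick a kernel.

```python
from typing import Any, Callable, Dict, Tuple

# Ordinary function: the kernel is chosen from the argument type alone.
_NEGATE_KERNELS: Dict[type, Callable[[Any], Any]] = {
    int: lambda x: -x,
    float: lambda x: -x,
}

# Cast: the kernel is chosen from the (input type, requested output type) pair.
_CAST_KERNELS: Dict[Tuple[type, type], Callable[[Any], Any]] = {
    (int, float): float,
    (int, str): str,
    (str, int): int,
    (float, int): int,
}


def call_negate(value: Any) -> Any:
    return _NEGATE_KERNELS[type(value)](value)


def call_cast(value: Any, to_type: type) -> Any:
    # Without `to_type` the call is ambiguous: the argument alone does not
    # say which conversion is wanted.
    return _CAST_KERNELS[(type(value), to_type)](value)


print(call_negate(3))
print(call_cast(3, float))
print(call_cast(3, str))
```

In this sketch the output type is passed as an extra argument; how it should be carried in a Substrait plan is the open question of this issue.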





[jira] [Created] (ARROW-16415) [R] Update strptime and fast_strptime bindings to use tz

2022-04-29 Thread Jira
Dragoș Moldovan-Grünfeld created ARROW-16415:


 Summary: [R] Update strptime and fast_strptime bindings to use tz 
 Key: ARROW-16415
 URL: https://issues.apache.org/jira/browse/ARROW-16415
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Affects Versions: 7.0.0
Reporter: Dragoș Moldovan-Grünfeld
 Fix For: 9.0.0


Both functions are documented as not supporting {{tz}}, the timezone argument. 
ARROW-12820 has been addressed, so the binding definitions need updating.





[jira] [Created] (ARROW-16414) [R] Remove ARROW_R_WITH_ARROW and arrow_available()

2022-04-29 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-16414:
---

 Summary: [R] Remove ARROW_R_WITH_ARROW and arrow_available()
 Key: ARROW-16414
 URL: https://issues.apache.org/jira/browse/ARROW-16414
 Project: Apache Arrow
  Issue Type: Improvement
  Components: R
Reporter: Neal Richardson
 Fix For: 9.0.0


Followup to ARROW-15294. Possibly we should re-define arrow_available in R like:

{code:R}
arrow_available <- function() {
  .Deprecated(msg = "arrow_available() is always TRUE now")
  TRUE
}
{code}

Pull out everything else, including the {{test_that}} wrapping.





[jira] [Created] (ARROW-16413) [C++][Python] FileFormat::GetReaderAsync hangs with an fsspec filesystem

2022-04-29 Thread Joris Van den Bossche (Jira)
Joris Van den Bossche created ARROW-16413:
-

 Summary: [C++][Python] FileFormat::GetReaderAsync hangs with an 
fsspec filesystem
 Key: ARROW-16413
 URL: https://issues.apache.org/jira/browse/ARROW-16413
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Joris Van den Bossche
 Fix For: 8.0.0


See https://github.com/dask/dask/pull/8993 for details. 

When using an fsspec filesystem (or perhaps any PyFileSystem more generally), 
inspecting a file through FileFormat.inspect hangs (this happens, e.g., in 
ParquetDatasetFactory).


