[jira] [Created] (ARROW-13687) [Ruby] Add support for loading table by Arrow Dataset

2021-08-20 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-13687:


 Summary: [Ruby] Add support for loading table by Arrow Dataset
 Key: ARROW-13687
 URL: https://issues.apache.org/jira/browse/ARROW-13687
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-13686) [Python] Update deprecated pytest yield_fixture functions

2021-08-20 Thread Eduardo Ponce (Jira)
Eduardo Ponce created ARROW-13686:

 Summary: [Python] Update deprecated pytest yield_fixture functions
 Key: ARROW-13686
 URL: https://issues.apache.org/jira/browse/ARROW-13686
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Eduardo Ponce
 Fix For: 6.0.0


[Since pytest 3.0, fixture functions support the *yield* statement as a 
replacement for 
*yield_fixture*|https://docs.pytest.org/en/6.2.x/yieldfixture.html]. When 
pytest is run for PyArrow the following deprecation warning is shown:

{code}
pyarrow/tests/test_serialization.py:283: PytestDeprecationWarning: 
@pytest.yield_fixture is deprecated. Use @pytest.fixture instead; they are the 
same.
{code}





[jira] [Created] (ARROW-13685) [Python] Cannot write dataset to S3FileSystem if bucket already exists

2021-08-20 Thread Caleb Overman (Jira)
Caleb Overman created ARROW-13685:

 Summary: [Python] Cannot write dataset to S3FileSystem if bucket 
already exists
 Key: ARROW-13685
 URL: https://issues.apache.org/jira/browse/ARROW-13685
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 5.0.0
Reporter: Caleb Overman


I'm trying to write a parquet file to an existing S3 bucket using the new 
S3FileSystem interface. However, this fails with an AWS Access Denied 
error (I do have the necessary access). It appears to be trying to recreate 
the bucket, which already exists.
{code:python}
import numpy as np
import pyarrow as pa
from pyarrow import fs
import pyarrow.dataset as ds

s3 = fs.S3FileSystem(region="us-west-2")
table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})
ds.write_dataset(
    table,
    "my-bucket/test.parquet",
    format="parquet",
    filesystem=s3,
)
{code}
{code}
OSError: When creating bucket 'my-bucket': AWS Error [code 15]: Access Denied
{code}
I'm seeing the same behavior using `S3FileSystem.create_dir` when 
`recursive=True`.
{code:python}
s3.create_dir("my-bucket/test_dir/", recursive=True) # Fails
s3.create_dir("my-bucket/test_dir/", recursive=False) # Succeeds
{code}
 





[jira] [Created] (ARROW-13684) [C++][Compute] Strftime kernel follow-up

2021-08-20 Thread Rok Mihevc (Jira)
Rok Mihevc created ARROW-13684:

 Summary: [C++][Compute] Strftime kernel follow-up
 Key: ARROW-13684
 URL: https://issues.apache.org/jira/browse/ARROW-13684
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Rok Mihevc
Assignee: Rok Mihevc
 Fix For: 6.0.0


As per ARROW-13174 
[comments|https://github.com/apache/arrow/pull/10647#issuecomment-901783928] we 
should:
 * Correct default format string for non-UTC timestamps
 * Allow non-zoned timestamps to be printed
 * Better document %S flag behavior





[jira] [Created] (ARROW-13683) [R][CI] Test with Windows UCRT R

2021-08-20 Thread Neal Richardson (Jira)
Neal Richardson created ARROW-13683:

 Summary: [R][CI] Test with Windows UCRT R
 Key: ARROW-13683
 URL: https://issues.apache.org/jira/browse/ARROW-13683
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration, R
Reporter: Neal Richardson


We're already building Arrow C++ with the UCRT toolchain (including gcc 10) in 
the Rtools40 build, but we aren't testing the R bindings and the whole package 
with it. https://github.com/r-windows/docs/blob/master/ucrt.md has instructions 
for how to download the UCRT R; any download and setup probably should happen 
in (or be upstreamed to) https://github.com/r-lib/actions/issues/340. 

There is already a CRAN check for UCRT, so we have some limited exposure by not 
having CI for it. Adding a job would also allow us to reproduce the link error 
Jeroen observed on ARROW-9616. 





[jira] [Created] (ARROW-13682) [C++] Add TDigest::Merge(const TDigest&)

2021-08-20 Thread David Li (Jira)
David Li created ARROW-13682:


 Summary: [C++] Add TDigest::Merge(const TDigest&)
 Key: ARROW-13682
 URL: https://issues.apache.org/jira/browse/ARROW-13682
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: David Li
Assignee: Yibo Cai
 Fix For: 6.0.0


Currently it's inconvenient to merge a single TDigest, but this is useful in 
contexts like the aggregate kernels. Follow-up from ARROW-13520.





[jira] [Created] (ARROW-13681) pyarrow.compute.list_parent_indices only computes for first chunk

2021-08-20 Thread Tor Eivind McKenzie-Syvertsen (Jira)
Tor Eivind McKenzie-Syvertsen created ARROW-13681:

 Summary: pyarrow.compute.list_parent_indices only computes for 
first chunk
 Key: ARROW-13681
 URL: https://issues.apache.org/jira/browse/ARROW-13681
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Tor Eivind McKenzie-Syvertsen


I came across this issue due to very unexpected behaviour from the "explode" 
function obtained here:
https://issues.apache.org/jira/browse/ARROW-12099

The relevant call is:
indices = pc.list_parent_indices(table[col_name])

If table[col_name] contains several chunks, the indices look perfectly fine 
for the first chunk, but the results for the second chunk are erratic and 
unexpected. No warning or info is given either.

A workaround that solved the problem for me is:
{code:python}
indices = pc.list_parent_indices(table.combine_chunks()[col_name])
{code}
The behaviour then changes dramatically.

I'm assuming this isn't expected and should be fixed?
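
For reference, a pure-Python sketch of the result one would expect for a multi-chunk column, assuming the intended semantics are parent indices counted across the whole column (which is what the combine_chunks() workaround yields). The helper `parent_indices` is hypothetical and is not Arrow's implementation; chunks are modelled as plain Python lists.

```python
def parent_indices(chunks):
    """For every list element, return the row index of its parent list,
    counting rows across all chunks rather than restarting per chunk."""
    indices = []
    row = 0
    for chunk in chunks:
        for values in chunk:
            if values is not None:          # a null list contributes no elements
                indices.extend([row] * len(values))
            row += 1                        # every list, null or not, is one row
    return indices


# Two chunks of list values, standing in for a chunked list column.
chunks = [
    [[1, 2], [3]],       # rows 0 and 1
    [[4], [5, 6, 7]],    # rows 2 and 3
]
print(parent_indices(chunks))  # [0, 0, 1, 2, 3, 3, 3]
```

The indices for the second chunk continue from row 2, so they still line up with the rows of the original table when used to repeat the other columns.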


