[jira] [Created] (ARROW-13687) [Ruby] Add support for loading table by Arrow Dataset
Kouhei Sutou created ARROW-13687:

Summary: [Ruby] Add support for loading table by Arrow Dataset
Key: ARROW-13687
URL: https://issues.apache.org/jira/browse/ARROW-13687
Project: Apache Arrow
Issue Type: Improvement
Components: Ruby
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou
[jira] [Created] (ARROW-13686) [Python] Update deprecated pytest yield_fixture functions
Eduardo Ponce created ARROW-13686:

Summary: [Python] Update deprecated pytest yield_fixture functions
Key: ARROW-13686
URL: https://issues.apache.org/jira/browse/ARROW-13686
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Reporter: Eduardo Ponce
Fix For: 6.0.0

[Since pytest 3.0, fixture functions support the *yield* statement as a replacement for *yield_fixture*|https://docs.pytest.org/en/6.2.x/yieldfixture.html]. When pytest is run for PyArrow, the following deprecation warning is shown:

{code}
pyarrow/tests/test_serialization.py:283: PytestDeprecationWarning: @pytest.yield_fixture is deprecated. Use @pytest.fixture instead; they are the same.
{code}
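For reference, a minimal sketch of the migration, assuming a fixture that sets up and tears down a resource (the fixture name and body here are hypothetical, not taken from the PyArrow test suite):

{code}
import pytest

# Deprecated style flagged by the warning above:
#
#     @pytest.yield_fixture
#     def large_buffer():
#         ...
#
# Since pytest 3.0, @pytest.fixture accepts yield directly, so only the
# decorator needs to change.
@pytest.fixture
def large_buffer():
    buf = bytearray(1024)  # hypothetical setup
    yield buf              # the test runs while the fixture is suspended here
    buf.clear()            # teardown after the test finishes
{code}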
[jira] [Created] (ARROW-13685) [Python] Cannot write dataset to S3FileSystem if bucket already exists
Caleb Overman created ARROW-13685:

Summary: [Python] Cannot write dataset to S3FileSystem if bucket already exists
Key: ARROW-13685
URL: https://issues.apache.org/jira/browse/ARROW-13685
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 5.0.0
Reporter: Caleb Overman

I'm trying to write a parquet file to an existing S3 bucket using the new S3FileSystem interface. However, this fails with an AWS Access Denied error (I do have the necessary access). It appears to be trying to recreate the bucket, which already exists.

{code:java}
import numpy as np
import pyarrow as pa
from pyarrow import fs
import pyarrow.dataset as ds

s3 = fs.S3FileSystem(region="us-west-2")
table = pa.table({"a": range(10), "b": np.random.randn(10), "c": [1, 2] * 5})

ds.write_dataset(
    table,
    "my-bucket/test.parquet",
    format="parquet",
    filesystem=s3,
)
{code}

{code:java}
OSError: When creating bucket 'my-bucket': AWS Error [code 15]: Access Denied
{code}

I'm seeing the same behavior with `S3FileSystem.create_dir` when `recursive=True`:

{code:java}
s3.create_dir("my-bucket/test_dir/", recursive=True)   # Fails
s3.create_dir("my-bucket/test_dir/", recursive=False)  # Succeeds
{code}
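Based on the reporter's observation that the non-recursive `create_dir` succeeds, one workaround worth trying is to create the target prefix first and then write into it. This is an unverified sketch: whether `write_dataset` still attempts recursive directory creation internally (and therefore still hits the bucket-creation call) is not confirmed here.

{code:java}
import pyarrow as pa
import pyarrow.dataset as ds
from pyarrow import fs

s3 = fs.S3FileSystem(region="us-west-2")
table = pa.table({"a": range(10)})  # placeholder data

# Create the target prefix non-recursively so the bucket itself is never
# (re)created, then write the dataset into the existing prefix.
s3.create_dir("my-bucket/test.parquet", recursive=False)
ds.write_dataset(table, "my-bucket/test.parquet", format="parquet", filesystem=s3)
{code}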
[jira] [Created] (ARROW-13684) [C++][Compute] Strftime kernel follow-up
Rok Mihevc created ARROW-13684:

Summary: [C++][Compute] Strftime kernel follow-up
Key: ARROW-13684
URL: https://issues.apache.org/jira/browse/ARROW-13684
Project: Apache Arrow
Issue Type: Improvement
Reporter: Rok Mihevc
Assignee: Rok Mihevc
Fix For: 6.0.0

As per the ARROW-13174 [comments|https://github.com/apache/arrow/pull/10647#issuecomment-901783928], we should:
* Correct the default format string for non-UTC timestamps
* Allow non-zoned timestamps to be printed
* Better document the %S flag behavior
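For context, a minimal Python sketch of the behavior these follow-up items refer to, assuming a pyarrow build that already includes the strftime kernel from ARROW-13174:

{code:java}
from datetime import datetime

import pyarrow as pa
import pyarrow.compute as pc

# Zoned timestamps can already be formatted.
zoned = pa.array([datetime(2021, 8, 19, 12, 30, 45)], type=pa.timestamp("s", tz="UTC"))
print(pc.strftime(zoned, format="%Y-%m-%d %H:%M:%S"))

# Non-zoned timestamps are what the second follow-up item is about:
# the kernel should be able to print these as well.
naive = pa.array([datetime(2021, 8, 19, 12, 30, 45)], type=pa.timestamp("s"))
# pc.strftime(naive)  # currently rejected; to be allowed by this follow-up
{code}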
[jira] [Created] (ARROW-13683) [R][CI] Test with Windows UCRT R
Neal Richardson created ARROW-13683:

Summary: [R][CI] Test with Windows UCRT R
Key: ARROW-13683
URL: https://issues.apache.org/jira/browse/ARROW-13683
Project: Apache Arrow
Issue Type: Improvement
Components: Continuous Integration, R
Reporter: Neal Richardson

We're already building Arrow C++ with the UCRT toolchain (including gcc 10) in the Rtools40 build, but we aren't testing the R bindings and the whole package with it. https://github.com/r-windows/docs/blob/master/ucrt.md has instructions for downloading the UCRT R; any download and setup should probably happen in (or be upstreamed to) https://github.com/r-lib/actions/issues/340. There is already a CRAN check for UCRT, so we have some limited exposure by not having CI for it. Adding a job would also allow us to reproduce the link error Jeroen observed on ARROW-9616.
[jira] [Created] (ARROW-13682) [C++] Add TDigest::Merge(const TDigest&)
David Li created ARROW-13682:

Summary: [C++] Add TDigest::Merge(const TDigest&)
Key: ARROW-13682
URL: https://issues.apache.org/jira/browse/ARROW-13682
Project: Apache Arrow
Issue Type: Improvement
Components: C++
Reporter: David Li
Assignee: Yibo Cai
Fix For: 6.0.0

Currently it's inconvenient to merge a single TDigest, but this is useful in contexts like the aggregate kernels. Follow-up from ARROW-13520.
[jira] [Created] (ARROW-13681) pyarrow.compute.list_parent_indices only computes for the first chunk
Tor Eivind McKenzie-Syvertsen created ARROW-13681:

Summary: pyarrow.compute.list_parent_indices only computes for the first chunk
Key: ARROW-13681
URL: https://issues.apache.org/jira/browse/ARROW-13681
Project: Apache Arrow
Issue Type: Bug
Reporter: Tor Eivind McKenzie-Syvertsen

I came across this issue due to very unexpected behaviour from the "explode" function obtained here: https://issues.apache.org/jira/browse/ARROW-12099

{code:java}
indices = pc.list_parent_indices(table[col_name])
{code}

If table[col_name] in this example contains several chunks, the indices look perfectly fine for the first chunk, but the results for the second chunk are erratic and unexpected. No warning or other indication is given either.

A workaround that solved the problem for me is:

{code:java}
indices = pc.list_parent_indices(table.combine_chunks()[col_name])
{code}

The behaviour then changes dramatically. I'm assuming this isn't expected and should be fixed?
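A minimal, self-contained sketch of the reported behaviour (the column name and values are made up for illustration; the expected table-wide parent indices are noted in the comments):

{code:java}
import pyarrow as pa
import pyarrow.compute as pc

# A list column split across two chunks.
col = pa.chunked_array([
    pa.array([[1, 2], [3]]),   # chunk 0: global parent indices 0, 0, 1
    pa.array([[4, 5, 6]]),     # chunk 1: global parent indices should be 2, 2, 2
])
table = pa.table({"col": col})

# Reported behaviour: indices for the second chunk do not line up with the
# table-wide row numbers.
print(pc.list_parent_indices(table["col"]))

# Workaround from the report: flatten to a single chunk first.
print(pc.list_parent_indices(table.combine_chunks()["col"]))
{code}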