[jira] [Created] (ARROW-16931) [Ruby] Add support for nullable in Arrow::Field
Kouhei Sutou created an issue Apache Arrow / ARROW-16931 [Ruby] Add support for nullable in Arrow::Field Issue Type: Improvement Assignee: Kouhei Sutou Components: Ruby Created: 29/Jun/22 01:47 Priority: Major Reporter: Kouhei Sutou
This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)
[jira] [Created] (ARROW-16930) [Java] Consolidate ORC code
David Dali Susanibar Arce created an issue Apache Arrow / ARROW-16930 [Java] Consolidate ORC code Issue Type: Sub-task Assignee: David Dali Susanibar Arce Components: Java Created: 29/Jun/22 00:38 Priority: Minor Reporter: David Dali Susanibar Arce
Move ORC adaptor C++ code to arrow/java/adaptor/orc/src/main/cpp. CC Larry White
[jira] [Created] (ARROW-16929) [C++] Remove ExecBatchIterator
Wes McKinney created an issue Apache Arrow / ARROW-16929 [C++] Remove ExecBatchIterator Issue Type: Improvement Assignee: Unassigned Components: C++ Created: 28/Jun/22 19:48 Fix Versions: 9.0.0 Priority: Major Reporter: Wes McKinney
The only place left using it is in GroupBy in arrow/compute/exec/aggregate.cc. This can be refactored to use ExecSpan. As part of this removal, we should adapt the benchmarks for ExecSpanIterator to demonstrate the performance improvement there.
[jira] [Created] (ARROW-16928) [C++] Reconsider filesystem equality
Antoine Pitrou created an issue Apache Arrow / ARROW-16928 [C++] Reconsider filesystem equality Issue Type: Task Assignee: Unassigned Components: C++ Created: 28/Jun/22 17:30 Priority: Minor Reporter: Antoine Pitrou
Filesystems support an equality method to compare filesystem instances. The original idea was that all filesystem parameters should be transparent and easily read back, so it should be possible to support equality (similarly, it was envisioned to allow roundtripping filesystems through URIs, though the filesystem-to-URI direction was never implemented). However, along the way, filesystems like S3 grew increasingly complex and opaque modes of configuration where equality can only be approximated. It can also be costly to compute: for example, S3Options::Equals involves fetching the actual secret key and session token, which can take some time; these operations alone consume 5 seconds in the PyArrow test suite. Right now, filesystem equality is only used for testing on the Python side (to validate filesystem pickling). We should decide whether we want to continue supporting filesystem equality and, if so, what the semantics are (is approximate equality useful?).
[jira] [Created] (ARROW-16927) GitHub action fails on forks (R)
Todd Farmer created an issue Apache Arrow / ARROW-16927 GitHub action fails on forks (R) Issue Type: Bug Assignee: Unassigned Components: Developer Tools, R Created: 28/Jun/22 16:54 Priority: Minor Reporter: Todd Farmer
Starting on 2022-06-25, I began to receive daily emails from my GitHub fork of Arrow, alerting me to the following failing action: Upload R Nightly builds. The action fails in Download Artifacts, with the following log content:

```
Run if [ -z $PREFIX ]; then
nightly-packaging-2022-06-28-0
fatal: not a git repository (or any of the parent directories): .git
Downloading nightly-packaging-2022-06-28-0's artifacts.
Destination directory is /home/runner/work/arrow/arrow/binaries/nightly-packaging-2022-06-28-0

[ state] Task / Branch Artifacts
---
[FAILURE] r-nightly-packages uploaded 0 / 9
  └ https://github.com/ursacomputing/crossbow/runs/7091119162?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7090086432?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7090086299?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7090086198?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7090086062?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7089567242?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7088905795?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7088905652?check_suite_focus=true
  └ https://github.com/ursacomputing/crossbow/runs/7088905551?check_suite_focus=true
  └
```
[jira] [Created] (ARROW-16926) csv reader errors clobbered by subsequent reads
Whispell Whispell created an issue Apache Arrow / ARROW-16926 csv reader errors clobbered by subsequent reads Issue Type: Bug Assignee: Unassigned Components: Go Created: 28/Jun/22 15:25 Priority: Minor Reporter: Whispell Whispell Original Estimate: 168h Remaining Estimate: 168h
Currently you can reproduce this issue by reading a CSV file with garbage string values where float64 values are expected. If you place the bad data in the first part of the file, subsequent calls to r.r.Read() will clobber the parse error that was set inside r.read(rec). At the bottom of the loop body, r.read(rec) is called and we end up in func (r *Reader) parseFloat64(field array.Builder, str string), which encounters an error and sets err on the reader:

```go
v, err := strconv.ParseFloat(str, 64)
if err != nil && r.err == nil {
	r.err = err
	field.AppendNull()
	return
}
```

However, when that call returns, we advance in the for loop without checking the error, and the subsequent call to r.r.Read() clobbers r.err. This means that if the last chunk has no error, calls to r.Err() on the reader will return nil after reading the CSV, even though an error took place during parsing.
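The fix the report implies is a "first error wins" rule: once a parse error has been recorded, later reads must not overwrite it. A language-neutral sketch of that pattern (in Python, with a hypothetical ChunkReader standing in for the Go csv.Reader):

```python
# Hypothetical reader illustrating the "first error wins" pattern:
# once an error is recorded, subsequent read() calls return nothing
# and never overwrite the stored error.
class ChunkReader:
    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._err = None

    def err(self):
        """Return the first error encountered, if any."""
        return self._err

    def read(self):
        """Return the next chunk, or None at EOF or after an error."""
        if self._err is not None:  # don't clobber the first error
            return None
        try:
            chunk = next(self._chunks)
        except StopIteration:
            return None
        if chunk == "garbage":  # stand-in for a float64 parse failure
            self._err = ValueError("bad float: " + chunk)
            return None
        return chunk


reader = ChunkReader(["1.5", "garbage", "2.5"])
while True:
    chunk = reader.read()
    if chunk is None:
        break
# The error from the middle chunk survives later read() calls:
assert isinstance(reader.err(), ValueError)
```

The bug described above is the absence of the `if self._err is not None` guard before advancing to the next chunk.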
[jira] [Created] (ARROW-16925) [R] Investigate other (undefined) approaches
Dragoș Moldovan-Grünfeld created an issue Apache Arrow / ARROW-16925 [R] Investigate other (undefined) approaches Issue Type: Sub-task Affects Versions: 8.0.0 Assignee: Dragoș Moldovan-Grünfeld Components: R Created: 28/Jun/22 14:58 Priority: Major Reporter: Dragoș Moldovan-Grünfeld
[jira] [Created] (ARROW-16923) [R] Investigate injection with rlang::inject()
Dragoș Moldovan-Grünfeld created an issue Apache Arrow / ARROW-16923 [R] Investigate injection with rlang::inject() Issue Type: Sub-task Affects Versions: 8.0.0 Assignee: Dragoș Moldovan-Grünfeld Components: R Created: 28/Jun/22 14:48 Priority: Major Reporter: Dragoș Moldovan-Grünfeld
[jira] [Created] (ARROW-16924) [R] Investigate the data mask levels approach
Dragoș Moldovan-Grünfeld created an issue Apache Arrow / ARROW-16924 [R] Investigate the data mask levels approach Issue Type: Sub-task Affects Versions: 8.0.0 Assignee: Dragoș Moldovan-Grünfeld Components: R Created: 28/Jun/22 14:50 Priority: Major Reporter: Dragoș Moldovan-Grünfeld
[jira] [Created] (ARROW-16922) [R] Investigate simple injection with !!
Dragoș Moldovan-Grünfeld created an issue Apache Arrow / ARROW-16922 [R] Investigate simple injection with !! Issue Type: Sub-task Affects Versions: 8.0.0 Assignee: Dragoș Moldovan-Grünfeld Components: R Created: 28/Jun/22 14:28 Priority: Major Reporter: Dragoș Moldovan-Grünfeld
[GitHub] [arrow-testing] pitrou commented on pull request #56: ARROW-11417: Add integration files for buffer compression
pitrou commented on PR #56: URL: https://github.com/apache/arrow-testing/pull/56#issuecomment-1168754007 @liukun4515 The compression file was generated using Arrow C++ IIRC. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (ARROW-16921) [C#] Add decompression support for Record Batches
Rishabh Rana created an issue Apache Arrow / ARROW-16921 [C#] Add decompression support for Record Batches Issue Type: New Feature Assignee: Rishabh Rana Components: C# Created: 28/Jun/22 12:41 Priority: Major Reporter: Rishabh Rana
The C# implementation does not support reading batches written by other Arrow implementations when compression is specified in the IPC write options. E.g. reading a batch written like this from pyarrow will fail in C#: pyarrow.ipc.RecordBatchStreamWriter(sink, schema, options=pyarrow.ipc.IpcWriteOptions(compression="lz4")) This issue is to support decompression (lz4 & zstd) in the C# implementation.
[GitHub] [arrow-julia] tpgillam opened a new issue, #327: DST ambiguities in ZonedDateTime not supported
tpgillam opened a new issue, #327: URL: https://github.com/apache/arrow-julia/issues/327
It seems like the ArrowTypes representation of ZonedDateTime doesn't include enough information to resolve ambiguities around DST, e.g.:

```julia
julia> zdt = ZonedDateTime(DateTime(2020, 11, 1, 6), tz"America/New_York"; from_utc=true)
2020-11-01T01:00:00-05:00

julia> arrow_zdt = ArrowTypes.toarrow(zdt)
Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")}(160419240)

julia> ArrowTypes.fromarrow(ZonedDateTime, arrow_zdt)
ERROR: AmbiguousTimeError: Local DateTime 2020-11-01T01:00:00 is ambiguous within America/New_York
Stacktrace:
 [1] (::TimeZones.var"#construct#8"{DateTime, VariableTimeZone})(T::Type{Local})
   @ TimeZones ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:46
 [2] #ZonedDateTime#7
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:50 [inlined]
 [3] ZonedDateTime
   @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:37 [inlined]
 [4] convert(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:265
 [5] fromarrow(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
   @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:300
 [6] top-level scope
   @ REPL[16]:1
```

Knowing very little about how we're constrained within the arrow spec, can this be fixed by storing the UTC timestamp? I'm _guessing_ we're running into this as we're storing a local timestamp + the timezone (?), which isn't quite enough information.
[jira] [Created] (ARROW-16920) [Java]: DictionaryProvider leaks memory while adding dictionaries with duplicate encoding
Vimal Varghese created an issue Apache Arrow / ARROW-16920 [Java]: DictionaryProvider leaks memory while adding dictionaries with duplicate encoding Issue Type: Bug Affects Versions: 7.0.0 Assignee: Unassigned Components: Java Created: 28/Jun/22 09:03 Priority: Major Reporter: Vimal Varghese
DictionaryProvider leaks memory while adding dictionaries with duplicate encoding. Is this expected? Should the provider release the memory of the existing dictionary vector if it accepts another one with the same encoding id? Sample code:

```scala
"dictionaryProvider" should "not leak memory while adding dictionaries with duplicate encoding" in {
  val allocator: RootAllocator = new RootAllocator()
  val vector: ListVector = ListVector.empty("vector", allocator)
  val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
  val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)

  val writer1: UnionListWriter = vector.getWriter
  writer1.allocate
  writer1.setValueCount(1)

  val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
  dictWriter1.allocate
  dictWriter1.setValueCount(1)

  val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
  dictWriter2.allocate
  dictWriter2.setValueCount(1)

  val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, None.orNull))
  val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, None.orNull))

  val provider = new DictionaryProvider.MapDictionaryProvider
  provider.put(dictionary1)
  provider.put(dictionary2)
```
[jira] [Created] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64
Raúl Cumplido created an issue Apache Arrow / ARROW-16919 [C++] Flight integration tests fail on verify rc nightly on linux amd64 Issue Type: Bug Assignee: Unassigned Components: C++, Continuous Integration, FlightRPC Created: 28/Jun/22 08:32 Fix Versions: 9.0.0 Labels: Nightly Priority: Critical Reporter: Raúl Cumplido
Some of our nightly builds to verify the release are failing:
- verify-rc-source-integration-linux-almalinux-8-amd64
- verify-rc-source-integration-linux-ubuntu-18.04-amd64
- verify-rc-source-integration-linux-ubuntu-20.04-amd64
- verify-rc-source-integration-linux-ubuntu-22.04-amd64
with the following:

```
# FAILURES
# FAILED TEST: middleware C++ producing, C++ consuming
1 failures
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', '-host', 'localhost', '-port=36719', '-scenario', 'middleware']' died with .
During handling of the above exception, another
```
[jira] [Created] (ARROW-16918) Adding UTC and local time zone conversion functions to Gandiva
Palak Pariawala created an issue Apache Arrow / ARROW-16918 Adding UTC and local time zone conversion functions to Gandiva Issue Type: New Feature Assignee: Unassigned Components: C++ - Gandiva Created: 28/Jun/22 07:46 Labels: pull-request-available newbie Priority: Minor Reporter: Palak Pariawala Original Estimate: 168h Remaining Estimate: 168h
Adding functions in Gandiva to convert timestamps between UTC and local time zones: to_utc_timestamp(timestamp, timezone name) and from_utc_timestamp(timestamp, timezone name).
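These names usually follow Hive/Spark semantics. A stdlib Python sketch of that behavior (only an illustration of the intended semantics, not the Gandiva implementation):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_utc_timestamp(ts: datetime, tz_name: str) -> datetime:
    """Interpret a naive timestamp as wall-clock time in tz_name and
    return the equivalent naive UTC timestamp."""
    return (
        ts.replace(tzinfo=ZoneInfo(tz_name))
        .astimezone(timezone.utc)
        .replace(tzinfo=None)
    )


def from_utc_timestamp(ts: datetime, tz_name: str) -> datetime:
    """Interpret a naive timestamp as UTC and return the equivalent
    naive wall-clock timestamp in tz_name."""
    return (
        ts.replace(tzinfo=timezone.utc)
        .astimezone(ZoneInfo(tz_name))
        .replace(tzinfo=None)
    )


# 12:00 in New York in January (EST, UTC-5) is 17:00 UTC, and back.
assert to_utc_timestamp(datetime(2022, 1, 1, 12), "America/New_York") == datetime(2022, 1, 1, 17)
assert from_utc_timestamp(datetime(2022, 1, 1, 17), "America/New_York") == datetime(2022, 1, 1, 12)
```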
[jira] [Created] (ARROW-16917) Add a Secondary Cache to cache gandiva object code
Siddhant Rao created an issue Apache Arrow / ARROW-16917 Add a Secondary Cache to cache gandiva object code Issue Type: New Feature Assignee: Unassigned Components: C++ - Gandiva Created: 28/Jun/22 07:42 Priority: Minor Reporter: Siddhant Rao
Gandiva has a primary cache, but this cache doesn't persist across restarts. Integrate a new API into the Projector and Filter Make calls that allows the user to specify the implementation of the secondary cache, providing a C++ and Java interface to this persistent cache.