[jira] [Created] (ARROW-16931) [Ruby] Add support for nullable in Arrow::Field

2022-06-28 Thread Kouhei Sutou (Jira)
Kouhei Sutou created an issue

Apache Arrow / ARROW-16931
[Ruby] Add support for nullable in Arrow::Field

Issue Type: Improvement
Assignee: Kouhei Sutou
Components: Ruby
Created: 29/Jun/22 01:47
Priority: Major
Reporter: Kouhei Sutou

This message was sent by Atlassian Jira (v8.20.10#820010-sha1:ace47f9)



[jira] [Created] (ARROW-16930) [Java] Consolidate ORC code

2022-06-28 Thread David Dali Susanibar Arce (Jira)
David Dali Susanibar Arce created an issue

Apache Arrow / ARROW-16930
[Java] Consolidate ORC code

Issue Type: Sub-task
Assignee: David Dali Susanibar Arce
Components: Java
Created: 29/Jun/22 00:38
Priority: Minor
Reporter: David Dali Susanibar Arce

Move the ORC adaptor C++ code to arrow/java/adaptor/orc/src/main/cpp. CC Larry White

[jira] [Created] (ARROW-16929) [C++] Remove ExecBatchIterator

2022-06-28 Thread Wes McKinney (Jira)
Wes McKinney created an issue

Apache Arrow / ARROW-16929
[C++] Remove ExecBatchIterator

Issue Type: Improvement
Assignee: Unassigned
Components: C++
Created: 28/Jun/22 19:48
Fix Versions: 9.0.0
Priority: Major
Reporter: Wes McKinney

The only place left using it is GroupBy in arrow/compute/exec/aggregate.cc. This can be refactored to use ExecSpan. As part of this removal, we should adapt the benchmarks for ExecSpanIterator to demonstrate the performance improvement there.

[jira] [Created] (ARROW-16928) [C++] Reconsider filesystem equality

2022-06-28 Thread Antoine Pitrou (Jira)
Antoine Pitrou created an issue

Apache Arrow / ARROW-16928
[C++] Reconsider filesystem equality

Issue Type: Task
Assignee: Unassigned
Components: C++
Created: 28/Jun/22 17:30
Priority: Minor
Reporter: Antoine Pitrou

Filesystems support an equality method to compare filesystem instances. The original idea was that all filesystem parameters should be transparent and easily read back, so it should be possible to support equality (similarly, it was envisioned to allow roundtripping filesystems through URIs, though the filesystem-to-URI direction was never implemented).

However, along the way, filesystems like S3 grew increasingly complex and opaque modes of configuration where equality can only be approximated. It can also be costly to compute: for example, S3Options::Equals involves fetching the actual secret key and session token, which can take some time (these operations alone consume 5 seconds in the PyArrow test suite).

Right now, filesystem equality is used only for testing on the Python side (to try to validate filesystem pickling). We should decide whether we want to continue supporting filesystem equality and, if so, what the semantics are (is approximate equality useful?).
 

[jira] [Created] (ARROW-16927) GitHub action fails on forks (R)

2022-06-28 Thread Todd Farmer (Jira)
Todd Farmer created an issue

Apache Arrow / ARROW-16927
GitHub action fails on forks (R)

Issue Type: Bug
Assignee: Unassigned
Components: Developer Tools, R
Created: 28/Jun/22 16:54
Priority: Minor
Reporter: Todd Farmer

Starting on 2022-06-25, I began to receive daily emails from my GitHub fork of Arrow, alerting me to the following failing action: Upload R Nightly builds. The action fails in Download Artifacts, with the following log content:

 

Run if [ -z $PREFIX ]; then
nightly-packaging-2022-06-28-0
fatal: not a git repository (or any of the parent directories): .git
Downloading nightly-packaging-2022-06-28-0's artifacts.
Destination directory is /home/runner/work/arrow/arrow/binaries/nightly-packaging-2022-06-28-0

[  state] Task / Branch   Artifacts
---
[FAILURE] r-nightly-packages uploaded 0 / 9
 └ https://github.com/ursacomputing/crossbow/runs/7091119162?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7090086432?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7090086299?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7090086198?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7090086062?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7089567242?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7088905795?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7088905652?check_suite_focus=true
 └ https://github.com/ursacomputing/crossbow/runs/7088905551?check_suite_focus=true
 └

[jira] [Created] (ARROW-16926) csv reader errors clobbered by subsequent reads

2022-06-28 Thread Whispell Whispell (Jira)
Whispell Whispell created an issue

Apache Arrow / ARROW-16926
csv reader errors clobbered by subsequent reads

Issue Type: Bug
Assignee: Unassigned
Components: Go
Created: 28/Jun/22 15:25
Priority: Minor
Reporter: Whispell Whispell
Original Estimate: 168h
Remaining Estimate: 168h

Currently you can reproduce this issue by reading a CSV file with garbage string values where float64 values are expected. If you place the bad data in the first part of the file, then a subsequent r.r.Read() will clobber the parse error that was set inside r.read(rec).

At the bottom of the loop body, r.read(rec) is called and we end up in func (r *Reader) parseFloat64(field array.Builder, str string), which encounters an error and sets err on the reader:

    v, err := strconv.ParseFloat(str, 64)
    if err != nil && r.err == nil {
        r.err = err
        field.AppendNull()
        return
    }

However, when we come back out of the call to the loop, we advance in the for loop without checking the err, and on the subsequent call to r.r.Read() we clobber r.err. This means that if the last chunk has no error, then after we read the CSV, calls to r.Err() on the reader will return nil, even though an error took place during parsing.
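
The clobbering pattern itself is not specific to Go. The following is a minimal Python sketch of it, not the Arrow Go reader: the function name read_all, the keep_first_error flag, and the read_err placeholder are all hypothetical and exist only to contrast an unconditional error assignment with a "first error wins" assignment.

```python
def read_all(rows, keep_first_error):
    """Parse rows of strings into floats; return (parsed_rows, recorded_error)."""
    err = None
    parsed = []
    for row in rows:
        read_err = None  # hypothetical: the low-level read of this row succeeded
        if keep_first_error:
            # Fixed variant: record the read result only if nothing is recorded yet.
            if err is None:
                err = read_err
        else:
            # Buggy variant: unconditional assignment clobbers an earlier parse error.
            err = read_err
        out = []
        for cell in row:
            try:
                out.append(float(cell))
            except ValueError as exc:
                if err is None:
                    err = exc  # remember the first parse error
                out.append(None)  # stand-in for appending a null
        parsed.append(out)
    return parsed, err


rows = [["abc"], ["1.5"]]  # bad value first, clean value last
_, buggy_err = read_all(rows, keep_first_error=False)
_, fixed_err = read_all(rows, keep_first_error=True)
print(buggy_err, type(fixed_err).__name__)
```

With the bad value in the first row and a clean last row, the buggy variant reports no error at all, which mirrors the behavior described above.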
 

  
 
 
 
 

 
  

[jira] [Created] (ARROW-16925) [R] Investigate other (undefined) approaches

2022-06-28 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created an issue

Apache Arrow / ARROW-16925
[R] Investigate other (undefined) approaches

Issue Type: Sub-task
Affects Versions: 8.0.0
Assignee: Dragoș Moldovan-Grünfeld
Components: R
Created: 28/Jun/22 14:58
Priority: Major
Reporter: Dragoș Moldovan-Grünfeld

[jira] [Created] (ARROW-16923) [R] Investigate injection with rlang::inject()

2022-06-28 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created an issue

Apache Arrow / ARROW-16923
[R] Investigate injection with rlang::inject()

Issue Type: Sub-task
Affects Versions: 8.0.0
Assignee: Dragoș Moldovan-Grünfeld
Components: R
Created: 28/Jun/22 14:48
Priority: Major
Reporter: Dragoș Moldovan-Grünfeld

[jira] [Created] (ARROW-16924) [R] Investigate the data mask levels approach

2022-06-28 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created an issue

Apache Arrow / ARROW-16924
[R] Investigate the data mask levels approach

Issue Type: Sub-task
Affects Versions: 8.0.0
Assignee: Dragoș Moldovan-Grünfeld
Components: R
Created: 28/Jun/22 14:50
Priority: Major
Reporter: Dragoș Moldovan-Grünfeld

[jira] [Created] (ARROW-16922) [R] Investigate simple injection with !!

2022-06-28 Thread Dragoș Moldovan-Grünfeld (Jira)
Dragoș Moldovan-Grünfeld created an issue

Apache Arrow / ARROW-16922
[R] Investigate simple injection with !!

Issue Type: Sub-task
Affects Versions: 8.0.0
Assignee: Dragoș Moldovan-Grünfeld
Components: R
Created: 28/Jun/22 14:28
Priority: Major
Reporter: Dragoș Moldovan-Grünfeld

[GitHub] [arrow-testing] pitrou commented on pull request #56: ARROW-11417: Add integration files for buffer compression

2022-06-28 Thread GitBox


pitrou commented on PR #56:
URL: https://github.com/apache/arrow-testing/pull/56#issuecomment-1168754007

   @liukun4515 The compression file was generated using Arrow C++ IIRC.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (ARROW-16921) [C#] Add decompression support for Record Batches

2022-06-28 Thread Rishabh Rana (Jira)
Rishabh Rana created an issue

Apache Arrow / ARROW-16921
[C#] Add decompression support for Record Batches

Issue Type: New Feature
Assignee: Rishabh Rana
Components: C#
Created: 28/Jun/22 12:41
Priority: Major
Reporter: Rishabh Rana

The C# implementation does not support reading record batches written by other Arrow implementations when compression is specified in the IPC write options. For example, reading a batch written from pyarrow as follows will fail in C#:

    pyarrow.ipc.RecordBatchStreamWriter(sink, schema, options=pyarrow.ipc.IpcWriteOptions(compression="lz4"))

This issue is to support decompression (lz4 and zstd) in the C# implementation.

[GitHub] [arrow-julia] tpgillam opened a new issue, #327: DST ambiguities in ZonedDateTime not supported

2022-06-28 Thread GitBox


tpgillam opened a new issue, #327:
URL: https://github.com/apache/arrow-julia/issues/327

   It seems like the ArrowTypes representation of ZonedDateTime doesn't include enough information to resolve ambiguities around DST, e.g.:
   
   ```julia
   julia> zdt = ZonedDateTime(DateTime(2020, 11, 1, 6), tz"America/New_York"; from_utc=true)
   2020-11-01T01:00:00-05:00
   
   julia> arrow_zdt = ArrowTypes.toarrow(zdt)
   Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")}(160419240)
   
   julia> ArrowTypes.fromarrow(ZonedDateTime, arrow_zdt)
   ERROR: AmbiguousTimeError: Local DateTime 2020-11-01T01:00:00 is ambiguous within America/New_York
   Stacktrace:
    [1] (::TimeZones.var"#construct#8"{DateTime, VariableTimeZone})(T::Type{Local})
      @ TimeZones ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:46
    [2] #ZonedDateTime#7
      @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:50 [inlined]
    [3] ZonedDateTime
      @ ~/.julia/packages/TimeZones/X0cjt/src/types/zoneddatetime.jl:37 [inlined]
    [4] convert(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
      @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:265
    [5] fromarrow(#unused#::Type{ZonedDateTime}, x::Arrow.Timestamp{Arrow.Flatbuf.TimeUnits.MILLISECOND, Symbol("America/New_York")})
      @ Arrow ~/.julia/packages/Arrow/ZlMFU/src/eltypes.jl:300
    [6] top-level scope
      @ REPL[16]:1
   ```
   
   Knowing very little about how we're constrained within the arrow spec, can this be fixed by storing the UTC timestamp? I'm _guessing_ we're running into this as we're storing a local timestamp + the timezone (?), which isn't quite enough information.
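
The ambiguity can be shown without any Arrow code. The sketch below uses Python's standard library; it does not reproduce what Arrow.jl stores. The two fixed offsets named EDT and EST are assumptions standing in for America/New_York on either side of the 2020-11-01 transition, and wall_clock is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for America/New_York around the 2020-11-01 transition.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def wall_clock(utc_instant, offset):
    """Return the naive local time observed at `utc_instant` under `offset`."""
    return utc_instant.astimezone(offset).replace(tzinfo=None)


utc_before = datetime(2020, 11, 1, 5, 0, tzinfo=timezone.utc)  # still on EDT
utc_after = datetime(2020, 11, 1, 6, 0, tzinfo=timezone.utc)   # first hour on EST

local_before = wall_clock(utc_before, EDT)
local_after = wall_clock(utc_after, EST)

# Two different instants share one local timestamp, so a local value plus
# a zone name cannot say which instant was meant.
print(local_before, local_after, local_before == local_after)
```

Both instants map to the local time 2020-11-01T01:00:00, while the UTC values stay distinct, which is why storing the UTC instant would round-trip unambiguously.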





[jira] [Created] (ARROW-16920) [Java]: DictionaryProvider leaks memory while adding dictionaries with duplicate encoding

2022-06-28 Thread Vimal Varghese (Jira)
Vimal Varghese created an issue

Apache Arrow / ARROW-16920
[Java]: DictionaryProvider leaks memory while adding dictionaries with duplicate encoding

Issue Type: Bug
Affects Versions: 7.0.0
Assignee: Unassigned
Components: Java
Created: 28/Jun/22 09:03
Priority: Major
Reporter: Vimal Varghese

DictionaryProvider leaks memory while adding dictionaries with duplicate encoding. Is this expected? Should the provider release the memory of the existing dictionary vector if it accepts another one with the same encoding id? Sample code:

 

"dictionaryProvider" should " not leak memory while adding dictionaries with duplicate encoding" in {

  val allocator: RootAllocator = new RootAllocator()

  val vector: ListVector = ListVector.empty("vector", allocator)
  val dictionaryVector1: ListVector = ListVector.empty("dict1", allocator)
  val dictionaryVector2: ListVector = ListVector.empty("dict2", allocator)

  val writer1: UnionListWriter = vector.getWriter
  writer1.allocate
  writer1.setValueCount(1)

  val dictWriter1: UnionListWriter = dictionaryVector1.getWriter
  dictWriter1.allocate
  dictWriter1.setValueCount(1)

  val dictWriter2: UnionListWriter = dictionaryVector2.getWriter
  dictWriter2.allocate
  dictWriter2.setValueCount(1)

  val dictionary1: Dictionary = new Dictionary(dictionaryVector1, new DictionaryEncoding(1L, false, None.orNull))
  val dictionary2: Dictionary = new Dictionary(dictionaryVector2, new DictionaryEncoding(1L, false, None.orNull))

  val provider = new DictionaryProvider.MapDictionaryProvider
  provider.put(dictionary1)
  provider.put(dictionary2)

  

[jira] [Created] (ARROW-16919) [C++] Flight integration tests fail on verify rc nightly on linux amd64

2022-06-28 Thread Raúl Cumplido (Jira)
Raúl Cumplido created an issue

Apache Arrow / ARROW-16919
[C++] Flight integration tests fail on verify rc nightly on linux amd64

Issue Type: Bug
Assignee: Unassigned
Components: C++, Continuous Integration, FlightRPC
Created: 28/Jun/22 08:32
Fix Versions: 9.0.0
Labels: Nightly
Priority: Critical
Reporter: Raúl Cumplido

Some of our nightly builds to verify the release are failing:
- verify-rc-source-integration-linux-almalinux-8-amd64
- verify-rc-source-integration-linux-ubuntu-18.04-amd64
- verify-rc-source-integration-linux-ubuntu-20.04-amd64
- verify-rc-source-integration-linux-ubuntu-22.04-amd64
with the following:

 

 # FAILURES #
FAILED TEST: middleware C++ producing,  C++ consuming
1 failures
  File "/arrow/dev/archery/archery/integration/util.py", line 139, in run_cmd
    output = subprocess.check_output(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.8/subprocess.py", line 411, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.8/subprocess.py", line 512, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/tmp/arrow-HEAD.PZocX/cpp-build/release/flight-test-integration-client', '-host', 'localhost', '-port=36719', '-scenario', 'middleware']' died with .
During handling of the above exception, another 

[jira] [Created] (ARROW-16918) Adding UTC and local time zone conversion functions to Gandiva

2022-06-28 Thread Palak Pariawala (Jira)
Palak Pariawala created an issue

Apache Arrow / ARROW-16918
Adding UTC and local time zone conversion functions to Gandiva

Issue Type: New Feature
Assignee: Unassigned
Components: C++ - Gandiva
Created: 28/Jun/22 07:46
Labels: pull-request-available, newbie
Priority: Minor
Reporter: Palak Pariawala
Original Estimate: 168h
Remaining Estimate: 168h

Add functions to Gandiva to convert timestamps between UTC and local time zones:
to_utc_timestamp(timestamp, timezone name)
from_utc_timestamp(timestamp, timezone name)
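
The issue does not spell out the semantics. As a rough sketch, and assuming they follow the functions of the same names in Hive and Spark SQL, the two conversions could behave as below. This is Python rather than Gandiva code, and the FIXED_OFFSETS table is a hypothetical stand-in for a real time zone database lookup (it ignores DST, which is safe for the zones listed).

```python
from datetime import datetime, timedelta

# Hypothetical lookup table; a real implementation would consult the tz database.
FIXED_OFFSETS = {
    "UTC": timedelta(0),
    "Asia/Kolkata": timedelta(hours=5, minutes=30),
}


def from_utc_timestamp(ts, tz_name):
    """Treat naive `ts` as UTC and return the wall-clock time in `tz_name`."""
    return ts + FIXED_OFFSETS[tz_name]


def to_utc_timestamp(ts, tz_name):
    """Treat naive `ts` as wall-clock time in `tz_name` and return it in UTC."""
    return ts - FIXED_OFFSETS[tz_name]


utc = datetime(2022, 6, 28, 7, 46)
local = from_utc_timestamp(utc, "Asia/Kolkata")
print(local, to_utc_timestamp(local, "Asia/Kolkata") == utc)
```

Under these assumptions the two functions are inverses of each other for zones without DST transitions; zones with DST need the ambiguity handling that a fixed offset table cannot express.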
 

  
 
 
 
 

 
 
 

 
 
 

[jira] [Created] (ARROW-16917) Add a Secondary Cache to cache gandiva object code

2022-06-28 Thread Siddhant Rao (Jira)
Siddhant Rao created an issue

Apache Arrow / ARROW-16917
Add a Secondary Cache to cache gandiva object code

Issue Type: New Feature
Assignee: Unassigned
Components: C++ - Gandiva
Created: 28/Jun/22 07:42
Priority: Minor
Reporter: Siddhant Rao

Arrow Gandiva has a primary cache, but this cache doesn't persist across restarts. Integrate a new API into the project and filter make calls that allows the user to specify the implementation of the secondary cache by providing a C++ and Java interface to this persistent cache.
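
One possible shape for such an interface is sketched below in Python for brevity; it is not the proposed C++ or Java API. The names SecondaryCache, DirectoryCache, and TwoLevelCache, and the build callback standing in for compilation, are all hypothetical.

```python
import hashlib
import os
import tempfile
from abc import ABC, abstractmethod


class SecondaryCache(ABC):
    """Hypothetical user-supplied persistent cache for compiled object code."""

    @abstractmethod
    def get(self, key):
        """Return the cached bytes for `key`, or None on a miss."""

    @abstractmethod
    def set(self, key, value):
        """Persist `value` under `key`."""


class DirectoryCache(SecondaryCache):
    """Example backend: one file per key inside a directory."""

    def __init__(self, directory):
        self._dir = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        return os.path.join(self._dir, hashlib.sha256(key).hexdigest())

    def get(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def set(self, key, value):
        with open(self._path(key), "wb") as f:
            f.write(value)


class TwoLevelCache:
    """In-memory primary cache that falls back to a secondary cache, then to `build`."""

    def __init__(self, secondary, build):
        self._primary = {}
        self._secondary = secondary
        self._build = build

    def get(self, key):
        if key in self._primary:
            return self._primary[key]
        value = self._secondary.get(key)
        if value is None:
            value = self._build(key)  # stand-in for the expensive compile step
            self._secondary.set(key, value)
        self._primary[key] = value
        return value


cache_dir = tempfile.mkdtemp()
cache = TwoLevelCache(DirectoryCache(cache_dir), build=lambda key: b"obj:" + key)
print(cache.get(b"a + b"))
```

Creating a second TwoLevelCache over the same directory simulates a restart: the primary cache starts empty, but the lookup is served from the directory without calling build again.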
 

  
 
 
 
 

 
 
 

 
 