[jira] [Created] (ARROW-18253) [C++][Parquet] Improve bounds checking on some inputs
Micah Kornfield created ARROW-18253: --- Summary: [C++][Parquet] Improve bounds checking on some inputs Key: ARROW-18253 URL: https://issues.apache.org/jira/browse/ARROW-18253 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield In some cases we don't check for a lower bound of 0; on some non-performance-critical paths we only have DCHECKs; and, while unlikely, in some cases we cast from size_t to int32, which can overflow. Adding some safety checks here would be useful. -- This message was sent by Atlassian Jira (v8.20.10#820010)
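The checks described above are straightforward to sketch. A minimal illustration in Python (the actual fix would live in the C++ Parquet code; `checked_int32` is a hypothetical name):

```python
INT32_MAX = 2**31 - 1

def checked_int32(n):
    # Sketch of the proposed check: reject negative lengths and values that
    # would overflow a signed 32-bit int, instead of relying on a DCHECK
    # that compiles away in release builds.
    if not (0 <= n <= INT32_MAX):
        raise ValueError("value %d out of range for int32" % n)
    return n
```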
[jira] [Created] (ARROW-17535) [Python] List arrays aren't supported in to_pandas calls
Micah Kornfield created ARROW-17535: --- Summary: [Python] List arrays aren't supported in to_pandas calls Key: ARROW-17535 URL: https://issues.apache.org/jira/browse/ARROW-17535 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield EXTENSION is not in the list of types allowed. I think in order to enable EXTENSION we need to be able to call to_pylist or similar on the original extension array from C++ code, in case there were user-provided overrides. Off the top of my head, one way of doing this would be to pass through an additional std::unordered_map whose values are the bound to_pylist Python functions as PyObject pointers. Are there other alternatives that might be cleaner? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (ARROW-16484) [Go][Parquet] Ensure a WriterVersion is written out in parquet go.
Micah Kornfield created ARROW-16484: --- Summary: [Go][Parquet] Ensure a WriterVersion is written out in parquet go. Key: ARROW-16484 URL: https://issues.apache.org/jira/browse/ARROW-16484 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Micah Kornfield Assignee: Matthew Topol We should ensure unique writer version information is populated for parquet files. I tried searching the Go code but could only find where the version is read back. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16326) [C++][Python] Add GCS Timeout parameter for GCS FileSystem.
Micah Kornfield created ARROW-16326: --- Summary: [C++][Python] Add GCS Timeout parameter for GCS FileSystem. Key: ARROW-16326 URL: https://issues.apache.org/jira/browse/ARROW-16326 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Micah Kornfield Assignee: Micah Kornfield Follow-up from [https://github.com/apache/arrow/pull/12763]: if the gcs testbench isn't installed properly, the failure mode is test timeouts because the connection hangs. We should add a timeout parameter to prevent this. -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Created] (ARROW-16270) [C++][Python][FileSystem] Make directory paths returned uniform
Micah Kornfield created ARROW-16270: --- Summary: [C++][Python][FileSystem] Make directory paths returned uniform Key: ARROW-16270 URL: https://issues.apache.org/jira/browse/ARROW-16270 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Micah Kornfield Depending on whether paths are selected with recursion or without, the returned directory paths either include a trailing slash or not (see the code linked below). It would be nice to provide consistent output here. It isn't clear if the breaking change is worthwhile. [1] https://github.com/apache/arrow/blob/3eaa7dd0e8b3dabc5438203331f05e3e6c011e37/python/pyarrow/tests/test_fs.py#L688 [2] https://github.com/apache/arrow/blob/3eaa7dd0e8b3dabc5438203331f05e3e6c011e37/cpp/src/arrow/filesystem/test_util.cc#L767 -- This message was sent by Atlassian Jira (v8.20.7#820007)
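One way to make the output uniform without touching both code paths separately is to normalize trailing slashes at the boundary. A minimal sketch (hypothetical helper, not existing Arrow API):

```python
def normalize_dir_path(path):
    # Strip a single trailing slash so recursive and non-recursive
    # selection report directory paths identically; leave the root "/"
    # alone.
    if len(path) > 1 and path.endswith("/"):
        return path[:-1]
    return path
```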
[jira] [Created] (ARROW-16227) [Archery] Make cpp argument list keyword only
Micah Kornfield created ARROW-16227: --- Summary: [Archery] Make cpp argument list keyword only Key: ARROW-16227 URL: https://issues.apache.org/jira/browse/ARROW-16227 Project: Apache Arrow Issue Type: Improvement Components: Archery Reporter: Micah Kornfield Assignee: Micah Kornfield cpp params should be keyword-only. See [https://github.com/apache/arrow/pull/12763/files#r852112789] (i.e. adding *, before all keyword options). -- This message was sent by Atlassian Jira (v8.20.1#820001)
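The change amounts to inserting a bare `*` in the signature so every cpp option must be passed by keyword. A sketch with made-up parameter names (not Archery's actual option list):

```python
def cpp(*, build_type="release", cmake_extras=None, with_gcs=False):
    # The bare "*" makes every parameter keyword-only: cpp("debug") now
    # raises TypeError, while cpp(build_type="debug") still works.
    return {"build_type": build_type, "cmake_extras": cmake_extras,
            "with_gcs": with_gcs}
```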
[jira] [Created] (ARROW-16226) [C++] Add better coverage for filesystem tell.
Micah Kornfield created ARROW-16226: --- Summary: [C++] Add better coverage for filesystem tell. Key: ARROW-16226 URL: https://issues.apache.org/jira/browse/ARROW-16226 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield Add a generic C++ filesystem test that writes N bytes to a file, then seeks to N/2 and reads the remainder. Verify that the remainder is N/2 bytes and matches the bytes written. -- This message was sent by Atlassian Jira (v8.20.1#820001)
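The proposed test can be sketched with Python's built-in file API standing in for the Arrow filesystem interface (the real test would go through arrow::fs streams):

```python
import os
import tempfile

def check_seek_tell_contract(n=1024):
    # Write n bytes, seek to n/2, then verify Tell() and that the
    # remaining n/2 bytes match what was written.
    data = bytes(i % 256 for i in range(n))
    path = os.path.join(tempfile.mkdtemp(), "tell_test.bin")
    with open(path, "wb") as out:
        out.write(data)
    with open(path, "rb") as f:
        f.seek(n // 2)
        assert f.tell() == n // 2
        remainder = f.read()
        assert remainder == data[n // 2:]
    return len(remainder)
```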
[jira] [Created] (ARROW-16160) [C++] IPC Stream Reader doesn't check if extra fields are present for RecordBatches
Micah Kornfield created ARROW-16160: --- Summary: [C++] IPC Stream Reader doesn't check if extra fields are present for RecordBatches Key: ARROW-16160 URL: https://issues.apache.org/jira/browse/ARROW-16160 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Affects Versions: 6.0.1 Reporter: Micah Kornfield I looked through recent commits and I don't think this issue has been patched since:
{code:title=test.py|borderStyle=solid}
import pyarrow as pa

# Definitions added for a runnable reproduction; rb1/rb2 were not shown
# in the original report.
rb1 = pa.record_batch([pa.array([1])], names=["c1"])
rb2 = pa.record_batch([pa.array([1]), pa.array([1])], names=["c1", "c2"])

with pa.output_stream("/tmp/f1") as sink:
    with pa.RecordBatchStreamWriter(sink, rb1.schema) as writer:
        writer.write(rb1)
        end_rb1 = sink.tell()

with pa.output_stream("/tmp/f2") as sink:
    with pa.RecordBatchStreamWriter(sink, rb2.schema) as writer:
        writer.write(rb2)
        start_rb2_only = sink.tell()
        writer.write(rb2)
        end_rb2 = sink.tell()

# Stitch together rb1.schema, rb1, and rb2 without its schema.
with pa.output_stream("/tmp/f3") as sink:
    with pa.input_stream("/tmp/f1") as inp:
        sink.write(inp.read(end_rb1))
    with pa.input_stream("/tmp/f2") as inp:
        inp.seek(start_rb2_only)
        sink.write(inp.read(end_rb2 - start_rb2_only))

with pa.ipc.open_stream("/tmp/f3") as source:
    print(source.read_all())
{code}
Yields:
{code}
pyarrow.Table
c1: int64
c1: [[1],[1]]
{code}
I would expect this to error because the second stitched-in record batch has more fields than necessary, but it appears to load just fine. Is this intended behavior? -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16102) [C++] Builds that use cpp/cmake_modules/FindgRPCAlt.cmake cannot build GCS support
Micah Kornfield created ARROW-16102: --- Summary: [C++] Builds that use cpp/cmake_modules/FindgRPCAlt.cmake cannot build GCS support Key: ARROW-16102 URL: https://issues.apache.org/jira/browse/ARROW-16102 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield cpp/cmake_modules/FindgRPCAlt.cmake somehow exposes the same libraries defined by build_absl_once (defined in cpp/cmake_modules/ThirdpartyToolchain.cmake), causing CMake to fail when the GCS client is enabled for building. I tried playing around with various options, but given my limited CMake skills I could not figure out an easy solution to this. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-16048) [PyArrow] Null buffers with Pickle protocol.
Micah Kornfield created ARROW-16048: --- Summary: [PyArrow] Null buffers with Pickle protocol. Key: ARROW-16048 URL: https://issues.apache.org/jira/browse/ARROW-16048 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Micah Kornfield Assignee: Micah Kornfield When underlying buffers are null, the buffer protocol's ".buf" value is populated with a null pointer. In some cases this can violate contracts [asserted in cpython|https://github.com/python/cpython/blob/882d8096c262a5945e0cfdd706e5db3ad2b73543/Modules/_pickle.c#L1072]. It might be best to always return an empty non-null buffer when the underlying buffer is null. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15783) [Python] Converting arrow MonthDayNanoInterval to pandas fails DCHECK
Micah Kornfield created ARROW-15783: --- Summary: [Python] Converting arrow MonthDayNanoInterval to pandas fails DCHECK Key: ARROW-15783 URL: https://issues.apache.org/jira/browse/ARROW-15783 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield Assignee: Micah Kornfield InitPandasStaticData is only called on the Python/pandas -> Arrow path and not the reverse path. This causes the DCHECK that ensures the pandas type is not null to fail if the import code is never used. A workaround for users of the library is to call pa.array([1]), which avoids this issue. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15728) [Python] Zstd IPC test is flaky.
Micah Kornfield created ARROW-15728: --- Summary: [Python] Zstd IPC test is flaky. Key: ARROW-15728 URL: https://issues.apache.org/jira/browse/ARROW-15728 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Micah Kornfield Assignee: Micah Kornfield Our internal CI system shows flakes on the test at approximately a 2% rate. By reducing the integer range we can make this much less flaky (zero observed flakes in 5000 runs). -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15727) [Python] Lists of MonthDayNano Interval can't be converted to Pandas
Micah Kornfield created ARROW-15727: --- Summary: [Python] Lists of MonthDayNano Interval can't be converted to Pandas Key: ARROW-15727 URL: https://issues.apache.org/jira/browse/ARROW-15727 Project: Apache Arrow Issue Type: Bug Components: Python Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15596) thrift_internal.h assumes shared_ptr type in some cases
Micah Kornfield created ARROW-15596: --- Summary: thrift_internal.h assumes shared_ptr type in some cases Key: ARROW-15596 URL: https://issues.apache.org/jira/browse/ARROW-15596 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Assignee: Micah Kornfield Thrift can still be built with boost shared_ptrs, so we need to be pointer-agnostic. -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-15511) [Python] GIL not held for Ndarray1DIndexer on some code paths
Micah Kornfield created ARROW-15511: --- Summary: [Python] GIL not held for Ndarray1DIndexer on some code paths Key: ARROW-15511 URL: https://issues.apache.org/jira/browse/ARROW-15511 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 6.0.1 Reporter: Micah Kornfield [In _ndarray_to_array the call to NdarrayToArrow|https://github.com/apache/arrow/blob/658bec37aa5cbdd53b5e4cdc81b8ba3962e67f11/python/pyarrow/array.pxi#L82] is explicitly excluded from the GIL. In some code paths Ndarray1DIndexer is instantiated, which will try to do Py_INCREF and Py_DECREF on initialization and destruction. These code paths do not appear to acquire the GIL. I'm not sure what the best fix is:
# Acquire the GIL as part of Ndarray1DIndexer construction.
# Eliminate the nogil block in _ndarray_to_array.
# Eliminate the incref and decref calls in Ndarray1DIndexer.
# Something else?
-- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Created] (ARROW-14156) StructArray::Flatten is incorrect in some cases
Micah Kornfield created ARROW-14156: --- Summary: StructArray::Flatten is incorrect in some cases Key: ARROW-14156 URL: https://issues.apache.org/jira/browse/ARROW-14156 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 5.0.0 Reporter: Micah Kornfield When trying to flatten a struct that has children that were sliced, we see incorrect results.
{code:title=repro.py|borderStyle=solid}
import pyarrow as pa
a = pa.array([1, 2, 3])
sliceds = a.slice(1)
composed_struct = pa.StructArray.from_buffers(
    pa.struct([pa.field("a", sliceds.type)]),
    len(sliceds),
    [pa.array([True, False]).buffers()[1]],
    children=[sliceds])
>>> composed_struct
-- is_valid:
  [
    true,
    false
  ]
-- child 0 type: int64
  [
    2,
    3
  ]
>>> composed_struct.flatten()
[
  [
    null,
    null
  ]
]
{code}
I believe the problem is [here|https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/array/array_nested.cc#L572]: the copy does not account for the child array offset. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13809) [C ABI] Add support for Month, Day, Nanosecond interval type to C-ABI
Micah Kornfield created ARROW-13809: --- Summary: [C ABI] Add support for Month, Day, Nanosecond interval type to C-ABI Key: ARROW-13809 URL: https://issues.apache.org/jira/browse/ARROW-13809 Project: Apache Arrow Issue Type: New Feature Components: C Reporter: Micah Kornfield Assignee: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should support transport of the new type via the C ABI bindings. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13808) [Ruby] Add bindings for Month, Day, Nano Interval Type
Micah Kornfield created ARROW-13808: --- Summary: [Ruby] Add bindings for Month, Day, Nano Interval Type Key: ARROW-13808 URL: https://issues.apache.org/jira/browse/ARROW-13808 Project: Apache Arrow Issue Type: New Feature Components: Ruby Reporter: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should support conversion to and from this type for standard Ruby types (or custom types) if possible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13807) [R] Add bindings for Month, Day, Nanos Interval Type
Micah Kornfield created ARROW-13807: --- Summary: [R] Add bindings for Month, Day, Nanos Interval Type Key: ARROW-13807 URL: https://issues.apache.org/jira/browse/ARROW-13807 Project: Apache Arrow Issue Type: New Feature Components: R Reporter: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should support conversion to and from canonical R types if available. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13806) [Python] Add conversion to/from Pandas/Python for Month, Day Nano Interval Type
Micah Kornfield created ARROW-13806: --- Summary: [Python] Add conversion to/from Pandas/Python for Month, Day Nano Interval Type Key: ARROW-13806 URL: https://issues.apache.org/jira/browse/ARROW-13806 Project: Apache Arrow Issue Type: New Feature Components: Python Reporter: Micah Kornfield Assignee: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should support conversion to and from this type for standard Python surface areas. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13805) [C#] Add support for Month, Day, Nano Interval Type
Micah Kornfield created ARROW-13805: --- Summary: [C#] Add support for Month, Day, Nano Interval Type Key: ARROW-13805 URL: https://issues.apache.org/jira/browse/ARROW-13805 Project: Apache Arrow Issue Type: New Feature Reporter: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should support the new type in C#. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13804) [Go] Add Support for Interval Type Month, Day, Nano
Micah Kornfield created ARROW-13804: --- Summary: [Go] Add Support for Interval Type Month, Day, Nano Key: ARROW-13804 URL: https://issues.apache.org/jira/browse/ARROW-13804 Project: Apache Arrow Issue Type: New Feature Components: Go Reporter: Micah Kornfield Now that [https://github.com/apache/arrow/pull/10177] has been merged, we should ensure Go supports the new type and enable integration tests for it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13690) [Python] Use IPC writing code for pickling RecordBatches
Micah Kornfield created ARROW-13690: --- Summary: [Python] Use IPC writing code for pickling RecordBatches Key: ARROW-13690 URL: https://issues.apache.org/jira/browse/ARROW-13690 Project: Apache Arrow Issue Type: Improvement Components: Python Reporter: Micah Kornfield For wide schemas in particular, the recursive nature of the current pickling algorithm for record batches makes it less efficient than using the IPC format (which can be done entirely in C++). Consider switching the mechanism to use the IPC format. I think this can be a backwards-compatible change if we leave _reconstruct_record_batch in place, if we care about that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType
Micah Kornfield created ARROW-13672: --- Summary: [C++] BinaryBuilder doesn't preserve passed in DataType Key: ARROW-13672 URL: https://issues.apache.org/jira/browse/ARROW-13672 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 5.0.0 Reporter: Micah Kornfield There is a [constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56] that takes a datatype for BinaryBuilder, but it is discarded. When constructing an Array, the type is always the value returned from type() [binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390]. If a consumer of the API wants to have an extension array, this prevents them from passing the extension type through. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13673) [C++] BinaryBuilder doesn't preserve passed in DataType
Micah Kornfield created ARROW-13673: --- Summary: [C++] BinaryBuilder doesn't preserve passed in DataType Key: ARROW-13673 URL: https://issues.apache.org/jira/browse/ARROW-13673 Project: Apache Arrow Issue Type: Bug Components: C++ Affects Versions: 5.0.0 Reporter: Micah Kornfield There is a [constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56] that takes a datatype for BinaryBuilder, but it is discarded. When constructing an Array, the type is always the value returned from type() [binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390]. If a consumer of the API wants to have an extension array, this prevents them from passing the extension type through. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13670) Do a round of compiler warning cleanups
Micah Kornfield created ARROW-13670: --- Summary: Do a round of compiler warning cleanups Key: ARROW-13670 URL: https://issues.apache.org/jira/browse/ARROW-13670 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield Assignee: Micah Kornfield During a build I found several classes without virtual destructors and some out of order initialization. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13669) Variant emplace methods appear to be missing curly braces.
Micah Kornfield created ARROW-13669: --- Summary: Variant emplace methods appear to be missing curly braces. Key: ARROW-13669 URL: https://issues.apache.org/jira/browse/ARROW-13669 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13628) [Format] Add MonthDayNano interval type.
Micah Kornfield created ARROW-13628: --- Summary: [Format] Add MonthDayNano interval type. Key: ARROW-13628 URL: https://issues.apache.org/jira/browse/ARROW-13628 Project: Apache Arrow Issue Type: Bug Components: C++, Format, Java Reporter: Micah Kornfield Assignee: Micah Kornfield Add type definition to fbs files with initial IPC implementations for Java and C++ (as discussed on the mailing list). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-13012) [C++] Add ability for retrieving dictionary and indices separately for ColumnReader
Micah Kornfield created ARROW-13012: --- Summary: [C++] Add ability for retrieving dictionary and indices separately for ColumnReader Key: ARROW-13012 URL: https://issues.apache.org/jira/browse/ARROW-13012 Project: Apache Arrow Issue Type: New Feature Reporter: Micah Kornfield In some contexts it is useful to be able to retrieve these separately instead of decoding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12907) [Java] Memory leak possible when exception reading from channel happens
Micah Kornfield created ARROW-12907: --- Summary: [Java] Memory leak possible when exception reading from channel happens Key: ARROW-12907 URL: https://issues.apache.org/jira/browse/ARROW-12907 Project: Apache Arrow Issue Type: Bug Components: Java Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12769) [Python] Negative out of range slices yield invalid arrays
Micah Kornfield created ARROW-12769: --- Summary: [Python] Negative out of range slices yield invalid arrays Key: ARROW-12769 URL: https://issues.apache.org/jira/browse/ARROW-12769 Project: Apache Arrow Issue Type: Bug Components: C++, Python Affects Versions: 4.0.0, 2.0.0 Reporter: Micah Kornfield Fix For: 5.0.0, 4.0.1 Tested on pyarrow 2.0 and pyarrow 4.0 wheels. The errors are slightly different between the two versions; below is a transcript from 4.0. This is taken from the result of test_slice_array.
{code}
>>> import pyarrow as pa
>>> pa.array(range(0, 10))
[
  0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9
]
>>> a = pa.array(range(0, 10))
>>> a[-9:-20]
[]
>>> len(a[-9:-20])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError: returned NULL without setting an error
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
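For reference, Python's own sequence slicing defines the expected behavior: an out-of-range negative slice clamps and yields an empty result with a valid length:

```python
# Expected semantics, shown on a plain Python list: -20 clamps to 0,
# so [-9:-20] becomes [1:0], an empty slice with len() == 0.
a = list(range(10))
empty = a[-9:-20]
```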
[jira] [Created] (ARROW-12340) [Java] Avro to Arrow converter doesn't appear to generate valid arrow data
Micah Kornfield created ARROW-12340: --- Summary: [Java] Avro to Arrow converter doesn't appear to generate valid arrow data Key: ARROW-12340 URL: https://issues.apache.org/jira/browse/ARROW-12340 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield I think this is related to how Unions are handled (I had thought unions with a null and one other type would get converted to the nullable type, but that is a separate issue). I haven't had time to fully diagnose, but remnants of the code I tried to use are at [https://gist.github.com/emkornfield/efd3a4c3c1012dc19cf9769198e3bffe] And the CSV file from https://issues.apache.org/jira/browse/ARROW-11629?jql=text%20~%20%22arrow%20drill%20parquet%20dictionary%22 produces data that isn't readable by the C++ implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12196) [C++] C++ IPC reading looks like it doesn't support uncompressed buffers
Micah Kornfield created ARROW-12196: --- Summary: [C++] C++ IPC reading looks like it doesn't support uncompressed buffers Key: ARROW-12196 URL: https://issues.apache.org/jira/browse/ARROW-12196 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.cc#L411 does not seem to check for this case (I'm not sure if this is the right code, though): the uncompressed length may be set to -1 to indicate that the data that follows is not compressed, which can be useful for cases where compression does not yield appreciable savings. https://github.com/apache/arrow/blob/5cabd31c90dbb32d87074928f68bf5d6e97e37c6/format/Message.fbs#L59 -- This message was sent by Atlassian Jira (v8.3.4#803005)
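The Message.fbs rule quoted above can be sketched independently of the Arrow codebase; `read_maybe_compressed` is a hypothetical helper illustrating the -1 sentinel check the reader appears to be missing:

```python
import struct

def read_maybe_compressed(body, decompress):
    # Per the quoted Message.fbs comment: each buffer is prefixed with an
    # int64 uncompressed length, and -1 means the bytes that follow are
    # stored uncompressed and must be passed through as-is.
    (length,) = struct.unpack_from("<q", body, 0)
    payload = body[8:]
    if length == -1:
        return payload
    return decompress(payload)
```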
[jira] [Created] (ARROW-12195) [Archery][Integration] Support round trip tests for compression
Micah Kornfield created ARROW-12195: --- Summary: [Archery][Integration] Support round trip tests for compression Key: ARROW-12195 URL: https://issues.apache.org/jira/browse/ARROW-12195 Project: Apache Arrow Issue Type: Improvement Components: Archery, Integration Reporter: Micah Kornfield Archery and the corresponding language bindings should support round-trip testing for compression. Today we only have checks on generated "gold" files from C++; we should also support round-trip testing, now that there is a Java implementation and a WIP Go implementation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12164) [Java] Make BaseAllocator.Config public
Micah Kornfield created ARROW-12164: --- Summary: [Java] Make BaseAllocator.Config public Key: ARROW-12164 URL: https://issues.apache.org/jira/browse/ARROW-12164 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield Fix For: 4.0.0 Alternatively we could make RootAllocator take immutable config. The problem is that default config cannot be used from shaded binaries because it is package private. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12163) [Java] Make compression levels configurable.
Micah Kornfield created ARROW-12163: --- Summary: [Java] Make compression levels configurable. Key: ARROW-12163 URL: https://issues.apache.org/jira/browse/ARROW-12163 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield Today we use default compression levels in compressors; these should be configurable via the constructor. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12115) [Java] Rename compression classes
Micah Kornfield created ARROW-12115: --- Summary: [Java] Rename compression classes Key: ARROW-12115 URL: https://issues.apache.org/jira/browse/ARROW-12115 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield Zstd isn't using the commons codec, so we should rename CommonsCompressionFactory to something more generic, and the existing LZ4 implementation to something potentially more generic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12110) [Java] Implement ZSTD buffer compression for java
Micah Kornfield created ARROW-12110: --- Summary: [Java] Implement ZSTD buffer compression for java Key: ARROW-12110 URL: https://issues.apache.org/jira/browse/ARROW-12110 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12035) [Developer Tools] Update merge tool to populate component if not present based on PR title
Micah Kornfield created ARROW-12035: --- Summary: [Developer Tools] Update merge tool to populate component if not present based on PR title Key: ARROW-12035 URL: https://issues.apache.org/jira/browse/ARROW-12035 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools Reporter: Micah Kornfield If a user forgets to set a component in Jira, the PR merge tool highlights this. If the component is specified in the PR title, we should provide committers the option of automatically setting the component when updating the JIRA, so they don't need to log on separately. -- This message was sent by Atlassian Jira (v8.3.4#803005)
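Extracting the component from a PR title is simple. A sketch (hypothetical helper, not the merge tool's actual code) that pulls the leading bracketed tags from titles like "[C++][Parquet] Fix bounds check":

```python
import re

def components_from_title(title):
    # Collect only the leading run of [Component] tags; anything after
    # the first non-bracket text is part of the title proper.
    m = re.match(r"^(\s*\[[^\]]+\])+", title)
    if not m:
        return []
    return re.findall(r"\[([^\]]+)\]", m.group(0))
```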
[jira] [Created] (ARROW-12034) [Docs] Formalize Trivial PRs
Micah Kornfield created ARROW-12034: --- Summary: [Docs] Formalize Trivial PRs Key: ARROW-12034 URL: https://issues.apache.org/jira/browse/ARROW-12034 Project: Apache Arrow Issue Type: Improvement Components: Developer Tools, Documentation Reporter: Micah Kornfield Assignee: Micah Kornfield Fix For: 4.0.0 Based on ML discussion: [https://lists.apache.org/x/thread.html/rd032058727d1b61f46813f25db586e320004fe4ccbf9cdfa13df44e8@%3Cdev.arrow.apache.org%3E] Update the relevant components so we can make trivial PRs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-12022) [C++][Parquet] StatisticsAsScalars doesn't support Decimal conversion for int primitives
Micah Kornfield created ARROW-12022: --- Summary: [C++][Parquet] StatisticsAsScalars doesn't support Decimal conversion for int primitives Key: ARROW-12022 URL: https://issues.apache.org/jira/browse/ARROW-12022 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield Logical decimal types that are stored as int primitives are not properly handled in the code today. Also, some clarification of the StatisticsAsScalars contract is required: for FLBA and ByteArray decimals, the smallest decimal type that will fit the data is used; it isn't clear whether that is desired or whether it should also use information from the serialized Arrow schema if present. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11828) Expose CSVWriter object in api
Micah Kornfield created ARROW-11828: --- Summary: Expose CSVWriter object in api Key: ARROW-11828 URL: https://issues.apache.org/jira/browse/ARROW-11828 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield Assignee: Micah Kornfield Based on feedback from initial CSV PR this is likely the preferred API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11829) [C++] Update developer style guide on usage of shared_ptr
Micah Kornfield created ARROW-11829: --- Summary: [C++] Update developer style guide on usage of shared_ptr Key: ARROW-11829 URL: https://issues.apache.org/jira/browse/ARROW-11829 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-11353) [C++][Python][Parquet] We should allow for overriding to large types by providing a schema
Micah Kornfield created ARROW-11353: --- Summary: [C++][Python][Parquet] We should allow for overriding to large types by providing a schema Key: ARROW-11353 URL: https://issues.apache.org/jira/browse/ARROW-11353 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield The following shouldn't throw:
{code}
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> schema = pa.schema([pa.field("utf8", pa.utf8())])
>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
>>> pq.write_table(table, "/tmp/example.parquet")
>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
>>> ds.dataset("/tmp/example.parquet", schema=large_schema, format="parquet").to_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: fields had matching names but differing types.
From: utf8: string To: utf8: large_string
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10784) [Python] Loading pyarrow.compute isn't thread safe
Micah Kornfield created ARROW-10784: --- Summary: [Python] Loading pyarrow.compute isn't thread safe Key: ARROW-10784 URL: https://issues.apache.org/jira/browse/ARROW-10784 Project: Apache Arrow Issue Type: Bug Components: Python Affects Versions: 2.0.0 Reporter: Micah Kornfield When using Arrow in a multithreaded environment it is possible to trigger an initialization race on the pyarrow.compute module when calling Array.flatten. Flatten calls _pc(), which imports pyarrow.compute, but if two threads call flatten at the same time it is possible that the global initialization of functions from the registry will be incomplete and therefore cause an AttributeError. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10608) [Python] Decimal256: finish off full support for conversion to/from decimal types
Micah Kornfield created ARROW-10608: --- Summary: [Python] Decimal256: finish off full support for conversion to/from decimal types Key: ARROW-10608 URL: https://issues.apache.org/jira/browse/ARROW-10608 Project: Apache Arrow Issue Type: Improvement Components: C++, Python Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10607) [C++][Parquet] Support Reading/Writing Decimal256 type in Parquet
Micah Kornfield created ARROW-10607: --- Summary: [C++][Parquet] Support Reading/Writing Decimal256 type in Parquet Key: ARROW-10607 URL: https://issues.apache.org/jira/browse/ARROW-10607 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10606) [C++][Compute] Support casts to and from Decimal256 type.
Micah Kornfield created ARROW-10606: --- Summary: [C++][Compute] Support casts to and from Decimal256 type. Key: ARROW-10606 URL: https://issues.apache.org/jira/browse/ARROW-10606 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10605) [C++][Gandiva] Support Decimal256 type in gandiva computation.
Micah Kornfield created ARROW-10605: --- Summary: [C++][Gandiva] Support Decimal256 type in gandiva computation. Key: ARROW-10605 URL: https://issues.apache.org/jira/browse/ARROW-10605 Project: Apache Arrow Issue Type: Improvement Components: C++ - Gandiva Reporter: Micah Kornfield There might be a lot of work here, so sub-jiras might be added once scope is determined. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10604) [Ruby] Support Decimal256 type
Micah Kornfield created ARROW-10604: --- Summary: [Ruby] Support Decimal256 type Key: ARROW-10604 URL: https://issues.apache.org/jira/browse/ARROW-10604 Project: Apache Arrow Issue Type: Improvement Components: Ruby Reporter: Micah Kornfield The C++ implementation now supports it. We need to ensure the Ruby/GObject bindings do as well. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10603) [Javascript] Support Decimal type with 256 bits of precision
Micah Kornfield created ARROW-10603: --- Summary: [Javascript] Support Decimal type with 256 bits of precision Key: ARROW-10603 URL: https://issues.apache.org/jira/browse/ARROW-10603 Project: Apache Arrow Issue Type: Improvement Components: JavaScript Reporter: Micah Kornfield The specification now supports it and there are basic implementations in C++/Java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10602) [Rust] Implement support for Decimal with 256 bits of precision.
Micah Kornfield created ARROW-10602: --- Summary: [Rust] Implement support for Decimal with 256 bits of precision. Key: ARROW-10602 URL: https://issues.apache.org/jira/browse/ARROW-10602 Project: Apache Arrow Issue Type: Improvement Components: Rust Reporter: Micah Kornfield The specification now supports it and there are basic implementations in C++/Java -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10601) [C++] CSV Reader should support Decimal256 type
Micah Kornfield created ARROW-10601: --- Summary: [C++] CSV Reader should support Decimal256 type Key: ARROW-10601 URL: https://issues.apache.org/jira/browse/ARROW-10601 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10600) [Go] Support Decimal256 type
Micah Kornfield created ARROW-10600: --- Summary: [Go] Support Decimal256 type Key: ARROW-10600 URL: https://issues.apache.org/jira/browse/ARROW-10600 Project: Apache Arrow Issue Type: Improvement Components: Go Reporter: Micah Kornfield Decimal with 256 bit precision is now allowed in the spec with a basic implementation in Java and C++. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10447) [C++][Python] Python compute kernel tests assume C++ is built with utf8proc
Micah Kornfield created ARROW-10447: --- Summary: [C++][Python] Python compute kernel tests assume C++ is built with utf8proc Key: ARROW-10447 URL: https://issues.apache.org/jira/browse/ARROW-10447 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield Not sure if this is something we want to fix, but I discovered this when building without utf8proc installed. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10446) [C++][Python] Timezone-aware pd.Timestamp's are incorrectly converted to Timestamp arrays
Micah Kornfield created ARROW-10446: --- Summary: [C++][Python] Timezone-aware pd.Timestamp's are incorrectly converted to Timestamp arrays Key: ARROW-10446 URL: https://issues.apache.org/jira/browse/ARROW-10446 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Assignee: Micah Kornfield Fix For: 2.0.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10408) [Java] Upgrade Avro dependency to 1.10
Micah Kornfield created ARROW-10408: --- Summary: [Java] Upgrade Avro dependency to 1.10 Key: ARROW-10408 URL: https://issues.apache.org/jira/browse/ARROW-10408 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield Assignee: Fokko Driesprong -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10367) [Archery] unit tests seem to be broken due to "patch" argument not being on constructor
Micah Kornfield created ARROW-10367: --- Summary: [Archery] unit tests seem to be broken due to "patch" argument not being on constructor Key: ARROW-10367 URL: https://issues.apache.org/jira/browse/ARROW-10367 Project: Apache Arrow Issue Type: Bug Components: Archery Reporter: Micah Kornfield Assignee: Micah Kornfield {{Patch is not a keyword on Version.}} {{[https://github.com/apache/arrow/pull/8475/checks?check_run_id=1290624398]}} {{cls = , version = '0.17.1'}}{{@classmethod}} {{ def parse(cls, version):}} {{ """}} {{ Parse version string to a VersionInfo instance.}} {{ :param version: version string}} {{ :return: a :class:`VersionInfo` instance}} {{ :raises: :class:`ValueError`}} {{ :rtype: :class:`VersionInfo`}} {{ .. versionchanged:: 2.11.0}} {{ Changed method from static to classmethod to}} {{ allow subclasses.}} {{ >>> semver.VersionInfo.parse('3.4.5-pre.2+build.4')}} {{ VersionInfo(major=3, minor=4, patch=5, \}} {{ prerelease='pre.2', build='build.4')}} {{ """}} {{ match = cls._REGEX.match(ensure_str(version))}} {{ if match is None:}} {{ raise ValueError("%s is not valid SemVer string" % version)}} {{ version_parts = match.groupdict()}} {{ version_parts["major"] = int(version_parts["major"])}} {{ version_parts["minor"] = int(version_parts["minor"])}} {{ version_parts["patch"] = int(version_parts["patch"])}} {{> return cls(**version_parts)}} {{E TypeError: __init__() got an unexpected keyword argument 'patch'}} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10229) [C++][Parquet] Remove left over ARROW_LOG statement.
Micah Kornfield created ARROW-10229: --- Summary: [C++][Parquet] Remove left over ARROW_LOG statement. Key: ARROW-10229 URL: https://issues.apache.org/jira/browse/ARROW-10229 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10203) Capture guidance for endianness support in contributors guide.
Micah Kornfield created ARROW-10203: --- Summary: Capture guidance for endianness support in contributors guide. Key: ARROW-10203 URL: https://issues.apache.org/jira/browse/ARROW-10203 Project: Apache Arrow Issue Type: Improvement Components: Documentation Reporter: Micah Kornfield Assignee: Micah Kornfield https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3ccak7z5t--hhhr9dy43pyhd6m-xou4qogwqvlwzsg-koxxjpt...@mail.gmail.com%3e -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10127) [Format] Update specification to support 256-bit Decimal types
Micah Kornfield created ARROW-10127: --- Summary: [Format] Update specification to support 256-bit Decimal types Key: ARROW-10127 URL: https://issues.apache.org/jira/browse/ARROW-10127 Project: Apache Arrow Issue Type: Improvement Components: Format Reporter: Micah Kornfield Assignee: Micah Kornfield This will require a vote to approve merging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10077) [C++] Potential overflow in bit_stream_utils.h multiplication.
Micah Kornfield created ARROW-10077: --- Summary: [C++] Potential overflow in bit_stream_utils.h multiplication. Key: ARROW-10077 URL: https://issues.apache.org/jira/browse/ARROW-10077 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield We use a literal "8" for BitsPerByte, which is interpreted as int32_t and can overflow. -- This message was sent by Atlassian Jira (v8.3.4#803005)
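To make the hazard concrete: with both operands 32-bit, `num_bytes * 8` wraps once the byte count exceeds `INT32_MAX / 8`. A sketch of the guard, written in Python for illustration since the affected code is C++ (the function name is hypothetical):

```python
INT32_MAX = 2**31 - 1
BITS_PER_BYTE = 8  # in C++ the literal "8" defaults to int32_t

def bytes_to_bits_checked(num_bytes: int) -> int:
    """Check the bound before multiplying; in C++ the equivalent fix
    is widening one operand to int64_t before the multiply."""
    if num_bytes > INT32_MAX // BITS_PER_BYTE:
        raise OverflowError("bit count would exceed INT32_MAX")
    return num_bytes * BITS_PER_BYTE
```

For example, 268435455 bytes (the largest safe count) yields 2147483640 bits, while one byte more would overflow a signed 32-bit result and is rejected.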
[jira] [Created] (ARROW-10076) [C++] Use TemporaryDir for all tests that don't already use it.
Micah Kornfield created ARROW-10076: --- Summary: [C++] Use TemporaryDir for all tests that don't already use it. Key: ARROW-10076 URL: https://issues.apache.org/jira/browse/ARROW-10076 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield This ensures all files are cleaned up, and for some build systems it avoids requiring the ability to write to the source/build path when running the tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10075) [C++] Don't use nonstd::nullopt; it breaks our vendoring abstraction.
Micah Kornfield created ARROW-10075: --- Summary: [C++] Don't use nonstd::nullopt; it breaks our vendoring abstraction. Key: ARROW-10075 URL: https://issues.apache.org/jira/browse/ARROW-10075 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10074) [C++] Don't use string_view.to_string()
Micah Kornfield created ARROW-10074: --- Summary: [C++] Don't use string_view.to_string() Key: ARROW-10074 URL: https://issues.apache.org/jira/browse/ARROW-10074 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield This makes our non-standard string_view incompatible with std::string_view when we eventually upgrade to C++17. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-10009) [C++] LeastSignficantBitMask has typo in name.
Micah Kornfield created ARROW-10009: --- Summary: [C++] LeastSignficantBitMask has typo in name. Key: ARROW-10009 URL: https://issues.apache.org/jira/browse/ARROW-10009 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield We should fix the typo. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9985) [C++][Parquet] Add bitmap based validity bitmap and nested reconstruction
Micah Kornfield created ARROW-9985: -- Summary: [C++][Parquet] Add bitmap based validity bitmap and nested reconstruction Key: ARROW-9985 URL: https://issues.apache.org/jira/browse/ARROW-9985 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield For low levels of nesting this will likely be more performant than the existing rep/def level based reconstruction. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9810) [C++][Parquet] Generalize existing null bitmap generation
Micah Kornfield created ARROW-9810: -- Summary: [C++][Parquet] Generalize existing null bitmap generation Key: ARROW-9810 URL: https://issues.apache.org/jira/browse/ARROW-9810 Project: Apache Arrow Issue Type: Sub-task Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield Right now null bitmap generation assumes only list nesting. Generalize and refactor the existing code, without changing existing functionality, to accept additional parameters to support Arrow nested types: 1. Repeated ancestor def level 2. Null slot usage (for fixed size lists) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9796) [C++][Parquet] Support reading FixedSizeLists when null values are present.
Micah Kornfield created ARROW-9796: -- Summary: [C++][Parquet] Support reading FixedSizeLists when null values are present. Key: ARROW-9796 URL: https://issues.apache.org/jira/browse/ARROW-9796 Project: Apache Arrow Issue Type: New Feature Components: C++ Reporter: Micah Kornfield This won't be handled on the first pass for ARROW-1644 because FixedSizeLists are not very common and some more in-depth refactoring is needed to support it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9794) [C++] Add functionality to cpu_info to discriminate between Intel vs AMD x86
Micah Kornfield created ARROW-9794: -- Summary: [C++] Add functionality to cpu_info to discriminate between Intel vs AMD x86 Key: ARROW-9794 URL: https://issues.apache.org/jira/browse/ARROW-9794 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield This is needed to do runtime dispatches for places where pext/pdep can be used. These perform poorly on AMD. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9767) [Python][Parquet] Expose EngineVersion in python arrow reader properties
Micah Kornfield created ARROW-9767: -- Summary: [Python][Parquet] Expose EngineVersion in python arrow reader properties Key: ARROW-9767 URL: https://issues.apache.org/jira/browse/ARROW-9767 Project: Apache Arrow Issue Type: Sub-task Components: C++ Reporter: Micah Kornfield It probably also pays to have the default selectable by an environment variable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9766) [C++][Parquet] Add EngineVersion to properties to allow for toggling new vs old logic
Micah Kornfield created ARROW-9766: -- Summary: [C++][Parquet] Add EngineVersion to properties to allow for toggling new vs old logic Key: ARROW-9766 URL: https://issues.apache.org/jira/browse/ARROW-9766 Project: Apache Arrow Issue Type: Sub-task Reporter: Micah Kornfield Assignee: Micah Kornfield This will provide an escape hatch in case the new logic somehow has unusable bugs in it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9765) [C++][CI][Windows] link errors on windows when using testing::HasSubstr matcher
Micah Kornfield created ARROW-9765: -- Summary: [C++][CI][Windows] link errors on windows when using testing::HasSubstr matcher Key: ARROW-9765 URL: https://issues.apache.org/jira/browse/ARROW-9765 Project: Apache Arrow Issue Type: Bug Components: C++, CI Reporter: Micah Kornfield I tried using testing::HasSubstr in a test in cpp/src/parquet/arrow/arrow_schema_test.cc and it resulted in the appveyor CI failing to link on windows. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9747) [C++][Java][Format] Support Decimal256 Type
Micah Kornfield created ARROW-9747: -- Summary: [C++][Java][Format] Support Decimal256 Type Key: ARROW-9747 URL: https://issues.apache.org/jira/browse/ARROW-9747 Project: Apache Arrow Issue Type: Improvement Components: C++, Format, Java Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9710) [C++] Generalize Decimal ToString in preparation for Decimal256
Micah Kornfield created ARROW-9710: -- Summary: [C++] Generalize Decimal ToString in preparation for Decimal256 Key: ARROW-9710 URL: https://issues.apache.org/jira/browse/ARROW-9710 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Mingyu Zhong Generalize Decimal ToString method in preparation for introducing Decimal256 bit type (and other bit widths as needed). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9671) Arrow BasicDecimal128 constructor interprets uint64_t integers with highest bit set as negative
Micah Kornfield created ARROW-9671: -- Summary: Arrow BasicDecimal128 constructor interprets uint64_t integers with highest bit set as negative Key: ARROW-9671 URL: https://issues.apache.org/jira/browse/ARROW-9671 Project: Apache Arrow Issue Type: Improvement Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
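The behavior ARROW-9671 describes is the usual two's-complement reinterpretation: a uint64_t whose high bit is set, implicitly converted to a signed 64-bit value, comes out negative. A small Python illustration of that conversion (not the Arrow code itself):

```python
def as_int64(u: int) -> int:
    """Reinterpret an unsigned 64-bit value as signed int64, mirroring
    what an implicit uint64_t -> int64_t conversion does in C++."""
    return int.from_bytes(u.to_bytes(8, "little"), "little", signed=True)
```

Any value at or above 2**63 flips sign under this reinterpretation, which is why a constructor taking the value through a signed path misreads large unsigned inputs.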
[jira] [Created] (ARROW-9614) [Java] JDBC to Arrow converter iterator should reuse the same VectorSchemaRoot
Micah Kornfield created ARROW-9614: -- Summary: [Java] JDBC to Arrow converter iterator should reuse the same VectorSchemaRoot Key: ARROW-9614 URL: https://issues.apache.org/jira/browse/ARROW-9614 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield When originally reviewing the code I suggested a new VectorSchemaRoot on each call to the iterator. After further discussions on the mailing list, it seems that this is an anti-pattern for working with VectorSchemaRoot; we should update the code to reuse a single VectorSchemaRoot. After this change it should be easier to use the JDBC converter with other components of the library (i.e. the file writer) which also make use of a single VectorSchemaRoot. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9613) [Java] Avro to Arrow converter should reuse the same VectorSchemaRoot
Micah Kornfield created ARROW-9613: -- Summary: [Java] Avro to Arrow converter should reuse the same VectorSchemaRoot Key: ARROW-9613 URL: https://issues.apache.org/jira/browse/ARROW-9613 Project: Apache Arrow Issue Type: Improvement Components: Java Reporter: Micah Kornfield When originally reviewing the code I suggested a new VectorSchemaRoot on each call to the iterator. After further discussions on the mailing list, it seems that this is an anti-pattern for working with VectorSchemaRoot; we should update the code to own a single VectorSchemaRoot and return it each time with new records. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9603) [C++][Parquet] Write Arrow relies on unspecified behavior for nested types
Micah Kornfield created ARROW-9603: -- Summary: [C++][Parquet] Write Arrow relies on unspecified behavior for nested types Key: ARROW-9603 URL: https://issues.apache.org/jira/browse/ARROW-9603 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield parquet/column_writer.cc WriteArrow implementations at certain points check null counts/required data and pass through the null bitmap for encoding. This only works for nested data types if a null slot on a parent implies a null slot on the leaf. This relationship is not required by the specifications. Most paths for creating arrays follow this pattern so it would be esoteric to hit this bug, but we should still fix it. All branches that rely on reading nullness should generate a new null bitmap based on definition levels if the column is nested, and decisions should be based off of that. -- This message was sent by Atlassian Jira (v8.3.4#803005)
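The fix ARROW-9603 proposes, deriving leaf validity from definition levels rather than trusting parent bitmaps, rests on one invariant: a leaf value is present exactly when its definition level reaches the column's maximum. A simplified sketch of that derivation (real code also handles repeated ancestors and operates on packed bitmaps rather than Python lists):

```python
def null_bitmap_from_def_levels(def_levels, max_def_level):
    """One validity flag per leaf slot: True means the value is defined
    (its definition level reached the column's maximum)."""
    return [level == max_def_level for level in def_levels]
```

Because the levels encode nullness at every ancestor, a bitmap built this way is correct even when a parent's null slot does not imply a null leaf slot.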
[jira] [Created] (ARROW-9598) [C++][Parquet] Spaced definition levels are not assigned correctly.
Micah Kornfield created ARROW-9598: -- Summary: [C++][Parquet] Spaced definition levels are not assigned correctly. Key: ARROW-9598 URL: https://issues.apache.org/jira/browse/ARROW-9598 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield The existing code assumes that there is only a single repeated parent. The code needs to backtrack until a null or a repeated parent. Unfortunately, without the ability to read paths that mix struct/repeated values, we can't fully test the fix. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9528) [Python] Honor tzinfo information when converting from datetime to pyarrow
Micah Kornfield created ARROW-9528: -- Summary: [Python] Honor tzinfo information when converting from datetime to pyarrow Key: ARROW-9528 URL: https://issues.apache.org/jira/browse/ARROW-9528 Project: Apache Arrow Issue Type: Bug Components: C++, Python Reporter: Micah Kornfield Fix For: 1.0.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9310) Use feature enum in java
Micah Kornfield created ARROW-9310: -- Summary: Use feature enum in java Key: ARROW-9310 URL: https://issues.apache.org/jira/browse/ARROW-9310 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9311) [Javascript] Use feature enum in javascript
Micah Kornfield created ARROW-9311: -- Summary: [Javascript] Use feature enum in javascript Key: ARROW-9311 URL: https://issues.apache.org/jira/browse/ARROW-9311 Project: Apache Arrow Issue Type: Sub-task Components: Java Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9309) Start writing out feature enums to value (umbrella issue)
Micah Kornfield created ARROW-9309: -- Summary: Start writing out feature enums to value (umbrella issue) Key: ARROW-9309 URL: https://issues.apache.org/jira/browse/ARROW-9309 Project: Apache Arrow Issue Type: Improvement Reporter: Micah Kornfield Proposed logic: 1. Add a flag, where appropriate, for supporting dictionary replacement if there is a possibility it can be used. 2. Only add compressed buffers when requested. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9314) [Go] Use Feature enum
Micah Kornfield created ARROW-9314: -- Summary: [Go] Use Feature enum Key: ARROW-9314 URL: https://issues.apache.org/jira/browse/ARROW-9314 Project: Apache Arrow Issue Type: Sub-task Components: Go Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9313) [Rust] Use feature enum
Micah Kornfield created ARROW-9313: -- Summary: [Rust] Use feature enum Key: ARROW-9313 URL: https://issues.apache.org/jira/browse/ARROW-9313 Project: Apache Arrow Issue Type: Sub-task Components: Rust Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9312) [C++] Use feature enum
Micah Kornfield created ARROW-9312: -- Summary: [C++] Use feature enum Key: ARROW-9312 URL: https://issues.apache.org/jira/browse/ARROW-9312 Project: Apache Arrow Issue Type: Sub-task Components: C++ Reporter: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9308) Add Feature enum to schema.fbs for forward compatibility
Micah Kornfield created ARROW-9308: -- Summary: Add Feature enum to schema.fbs for forward compatibility Key: ARROW-9308 URL: https://issues.apache.org/jira/browse/ARROW-9308 Project: Apache Arrow Issue Type: Improvement Components: Format Reporter: Micah Kornfield Assignee: Micah Kornfield -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9264) [C++] Cleanup Parquet Arrow Schema code
Micah Kornfield created ARROW-9264: -- Summary: [C++] Cleanup Parquet Arrow Schema code Key: ARROW-9264 URL: https://issues.apache.org/jira/browse/ARROW-9264 Project: Apache Arrow Issue Type: Sub-task Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield We need a function/class that can take the parquet schema and a proposed arrow schema (potentially retrieved from parquet metadata) and outputs a data structure that contains all of the information in "SchemaField" and the following additional options: 1. Corresponding Definition level for nullability (wouldn't be populated for non-null arrays). 2. Corresponding Repetition level for lists (wouldn't be populated for non-lists). 3. Definition level for "empty lists" (wouldn't be populated for legacy two-level encoded lists). One option is to augment and populate these on the SchemaField. -- This message was sent by Atlassian Jira (v8.3.4#803005)
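One possible shape for the augmented structure the ticket describes, sketched in Python (the actual code is C++, and all field names here are hypothetical, not the real Arrow API):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class LevelInfo:
    # Definition level for nullability; None for non-nullable fields.
    null_def_level: Optional[int] = None
    # Repetition level; None for non-list fields.
    rep_level: Optional[int] = None
    # Definition level marking an "empty list"; None for legacy
    # two-level encoded lists, where it cannot be distinguished.
    empty_list_def_level: Optional[int] = None

@dataclass
class SchemaField:
    name: str
    levels: LevelInfo = field(default_factory=LevelInfo)
    children: List["SchemaField"] = field(default_factory=list)
```

This follows the "augment SchemaField" option: the three optional levels live alongside each field rather than in a parallel structure, so reconstruction code can walk one tree.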
[jira] [Created] (ARROW-9223) Fix to_pandas() export for timestamps within structs
Micah Kornfield created ARROW-9223: -- Summary: Fix to_pandas() export for timestamps within structs Key: ARROW-9223 URL: https://issues.apache.org/jira/browse/ARROW-9223 Project: Apache Arrow Issue Type: Bug Reporter: Micah Kornfield Currently timestamps within structs unilaterally have their timezone discarded for backwards compatibility reasons. There is a TODO in the code to come up with a better solution. This Jira tracks the solution. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-7955) [Java] Support large buffer for file/stream IPC
[ https://issues.apache.org/jira/browse/ARROW-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-7955. Resolution: Fixed > [Java] Support large buffer for file/stream IPC > --- > > Key: ARROW-7955 > URL: https://issues.apache.org/jira/browse/ARROW-7955 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Time Spent: 1h 10m > Remaining Estimate: 0h > > After supporting 64-bit ArrowBuf, we need to make file/stream IPC work. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (ARROW-9049) [C++] Add a Result<> returning method for constructing a dictionary
Micah Kornfield created ARROW-9049: -- Summary: [C++] Add a Result<> returning method for constructing a dictionary Key: ARROW-9049 URL: https://issues.apache.org/jira/browse/ARROW-9049 Project: Apache Arrow Issue Type: Bug Components: C++ Reporter: Micah Kornfield Assignee: Micah Kornfield Dictionary types require a signed integer index type. Today there is a DCHECK that this is the case in the constructor. When reading data from an unknown source it is possible due to corruption (or user error) that the dictionary index type is not signed. We should add a method that checks for signedness and use that at all system boundaries to validate input data. -- This message was sent by Atlassian Jira (v8.3.4#803005)
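The proposal amounts to replacing a debug-only DCHECK with a constructor that validates and reports an error. A Python sketch of the idea (the type names and return shape are illustrative only; the real method would return an Arrow `Result<>`):

```python
SIGNED_INDEX_TYPES = {"int8", "int16", "int32", "int64"}

def make_dictionary_type(index_type: str, value_type: str) -> dict:
    """Result<>-style factory: validate at system boundaries instead of
    relying on a DCHECK, so corrupted or user-supplied input fails
    cleanly rather than invoking undefined behavior in release builds."""
    if index_type not in SIGNED_INDEX_TYPES:
        raise ValueError(
            f"dictionary index type must be signed, got {index_type}")
    return {"index": index_type, "value": value_type}
```

Callers at trust boundaries (IPC readers, file readers) would use this checked path, while internal hot paths could keep the unchecked constructor.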
[jira] [Comment Edited] (ARROW-9039) py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions
[ https://issues.apache.org/jira/browse/ARROW-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126325#comment-17126325 ] Micah Kornfield edited comment on ARROW-9039 at 6/5/20, 2:59 AM: - Thank you for the report. This is intended behavior; the documentation was clarified I think as of 0.16 or 0.15 ([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]). Serialize/Deserialize do not provide backward compatibility. You need to use the IPC functionality ([https://arrow.apache.org/docs/python/ipc.html#streaming-serialization-and-ipc]) for compatibility guarantees (0.11 is quite old but I don't think anything should have been broken between versions). was (Author: emkornfi...@gmail.com): Thank you for the report. This is intended behavior; the documentation was clarified I think as of 0.16 or 0.15 ([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]). Serialize/Deserialize do not provide backward compatibility. You need to use the IPC functionality for compatibility guarantees (0.11 is quite old but I don't think anything should have been broken between versions). > py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent > versions > - > > Key: ARROW-9039 > URL: https://issues.apache.org/jira/browse/ARROW-9039 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.11.1, 0.15.1 > Environment: python, windows >Reporter: Yoav Git >Priority: Minor > > I have been saving dataframes into mongodb using: > {{import pandas as pd; import pyarrow as pa}} > {{df = pd.DataFrame([[1,2,3],[2,3,4]], columns = ['x','y','z'])}} > {{byte = pa.serialize(df).to_buffer().to_pybytes()}} > and then reading back using: > {{df = pa.deserialize(pa.py_buffer(memoryview(byte)))}} > However, pyarrow is not back-compatible. i.e. both versions 0.11.1 and 0.15.1 > can read their own pybytes created by it. Alas, they cannot read each other. 
> -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Comment Edited] (ARROW-4144) [Java] Arrow-to-JDBC
[ https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123402#comment-17123402 ] Micah Kornfield edited comment on ARROW-4144 at 6/2/20, 6:56 AM: - [~uwe] have you come across a use-case for writing to JDBC sources? was (Author: emkornfi...@gmail.com): @uwe have you come across a use-case for writing to JDBC sources? > [Java] Arrow-to-JDBC > > > Key: ARROW-4144 > URL: https://issues.apache.org/jira/browse/ARROW-4144 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Michael Pigott >Assignee: Chen >Priority: Major > > ARROW-1780 reads a query from a JDBC data source and converts the ResultSet > to an Arrow VectorSchemaRoot. However, there is no built-in adapter for > writing an Arrow VectorSchemaRoot back to the database. > ARROW-3966 adds JDBC field metadata: > * The Catalog Name > * The Table Name > * The Field Name > * The Field Type > We can use this information to ask for the field information from the > database via the > [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html] > object. We can then create INSERT or UPDATE statements based on the [list > of primary > keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)] > in the table: > * If the value in the VectorSchemaRoot corresponding to the primary key is > NULL, insert that record into the database. > * If the value in the VectorSchemaRoot corresponding to the primary key is > not NULL, update the existing record in the database. > We can also perform the same data conversion in reverse based on the field > types queried from the database. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4144) [Java] Arrow-to-JDBC
[ https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123402#comment-17123402 ] Micah Kornfield commented on ARROW-4144: @uwe have you come across a use-case for writing to JDBC sources? > [Java] Arrow-to-JDBC > > > Key: ARROW-4144 > URL: https://issues.apache.org/jira/browse/ARROW-4144 > Project: Apache Arrow > Issue Type: Improvement > Components: Java >Reporter: Michael Pigott >Assignee: Chen >Priority: Major > > ARROW-1780 reads a query from a JDBC data source and converts the ResultSet > to an Arrow VectorSchemaRoot. However, there is no built-in adapter for > writing an Arrow VectorSchemaRoot back to the database. > ARROW-3966 adds JDBC field metadata: > * The Catalog Name > * The Table Name > * The Field Name > * The Field Type > We can use this information to ask for the field information from the > database via the > [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html] > object. We can then create INSERT or UPDATE statements based on the [list > of primary > keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)] > in the table: > * If the value in the VectorSchemaRoot corresponding to the primary key is > NULL, insert that record into the database. > * If the value in the VectorSchemaRoot corresponding to the primary key is > not NULL, update the existing record in the database. > We can also perform the same data conversion in reverse based on the field > types queried from the database. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-8972) [Java] Support range value comparison for large varchar/varbinary vectors
[ https://issues.apache.org/jira/browse/ARROW-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield resolved ARROW-8972. Resolution: Fixed > [Java] Support range value comparison for large varchar/varbinary vectors > - > > Key: ARROW-8972 > URL: https://issues.apache.org/jira/browse/ARROW-8972 > Project: Apache Arrow > Issue Type: New Feature > Components: Java >Reporter: Liya Fan >Assignee: Liya Fan >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Support comparing a range of values for LargeVarCharVector and > LargeVarBinaryVector. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-9000) Java build crashes with JDK14
[ https://issues.apache.org/jira/browse/ARROW-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-9000: --- Component/s: Java > Java build crashes with JDK14 > - > > Key: ARROW-9000 > URL: https://issues.apache.org/jira/browse/ARROW-9000 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Current master tree does not build with JDK14. The issue seems to be caused > by error prone plugin: > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile > (default-compile) on project arrow-memory: Compilation failure > [ERROR] > /Users/laurent/devel/arrow/java/memory/src/main/java/org/apache/arrow/memory/BufferLedger.java:[545,15] > error: An unhandled exception was thrown by the Error Prone static analysis > plugin. > [ERROR] Please report this at > https://github.com/google/error-prone/issues/new and include the following: > [ERROR] > [ERROR] error-prone version: 2.3.3 > [ERROR] BugPattern: TypeParameterUnusedInFormals > [ERROR] Stack Trace: > [ERROR] java.lang.NoSuchFieldError: bound > [ERROR] at > com.google.errorprone.bugpatterns.TypeParameterUnusedInFormals.matchMethod(TypeParameterUnusedInFormals.java:71) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.processMatchers(ErrorProneScanner.java:433) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:725) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCMethodDecl.accept(JCTree.java:916) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71) > [ERROR] at 
com.google.errorprone.scanner.Scanner.scan(Scanner.java:45) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:90) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.visitClass(TreeScanner.java:187) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:535) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:823) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.visitCompilationUnit(TreeScanner.java:144) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:546) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:603) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:56) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:55) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScannerTransformer.apply(ErrorProneScannerTransformer.java:43) > [ERROR] at > com.google.errorprone.ErrorProneAnalyzer.finished(ErrorProneAnalyzer.java:151) > [ERROR] at > 
jdk.compiler/com.sun.tools.javac.api.MultiTaskListener.finished(MultiTaskListener.java:132) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1423) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1370) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:316) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:176) > [ERROR] at
[jira] [Updated] (ARROW-9000) [Java] build crashes with JDK14
[ https://issues.apache.org/jira/browse/ARROW-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Micah Kornfield updated ARROW-9000: --- Summary: [Java] build crashes with JDK14 (was: Java build crashes with JDK14) > [Java] build crashes with JDK14 > --- > > Key: ARROW-9000 > URL: https://issues.apache.org/jira/browse/ARROW-9000 > Project: Apache Arrow > Issue Type: Bug > Components: Java >Reporter: Laurent Goujon >Assignee: Laurent Goujon >Priority: Minor > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Current master tree does not build with JDK14. The issue seems to be caused > by error prone plugin: > {noformat} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile > (default-compile) on project arrow-memory: Compilation failure > [ERROR] > /Users/laurent/devel/arrow/java/memory/src/main/java/org/apache/arrow/memory/BufferLedger.java:[545,15] > error: An unhandled exception was thrown by the Error Prone static analysis > plugin. 
> [ERROR] Please report this at > https://github.com/google/error-prone/issues/new and include the following: > [ERROR] > [ERROR] error-prone version: 2.3.3 > [ERROR] BugPattern: TypeParameterUnusedInFormals > [ERROR] Stack Trace: > [ERROR] java.lang.NoSuchFieldError: bound > [ERROR] at > com.google.errorprone.bugpatterns.TypeParameterUnusedInFormals.matchMethod(TypeParameterUnusedInFormals.java:71) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.processMatchers(ErrorProneScanner.java:433) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:725) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCMethodDecl.accept(JCTree.java:916) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:90) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.visitClass(TreeScanner.java:187) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:535) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:823) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45) > [ERROR] at > 
jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113) > [ERROR] at > jdk.compiler/com.sun.source.util.TreeScanner.visitCompilationUnit(TreeScanner.java:144) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:546) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:150) > [ERROR] at > jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:603) > [ERROR] at > jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:56) > [ERROR] at com.google.errorprone.scanner.Scanner.scan(Scanner.java:55) > [ERROR] at > com.google.errorprone.scanner.ErrorProneScannerTransformer.apply(ErrorProneScannerTransformer.java:43) > [ERROR] at > com.google.errorprone.ErrorProneAnalyzer.finished(ErrorProneAnalyzer.java:151) > [ERROR] at > jdk.compiler/com.sun.tools.javac.api.MultiTaskListener.finished(MultiTaskListener.java:132) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1423) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1370) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959) > [ERROR] at > jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:316) > [ERROR] at >