[jira] [Created] (ARROW-18253) [C++][Parquet] Improve bounds checking on some inputs

2022-11-04 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-18253:
---

 Summary: [C++][Parquet] Improve bounds checking on some inputs
 Key: ARROW-18253
 URL: https://issues.apache.org/jira/browse/ARROW-18253
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


In some cases we don't check for a lower bound of 0.  On some 
non-performance-critical paths we only have DCHECKs.  And, while unlikely, in 
some cases we cast from size_t to int32, which can overflow.  Adding some 
safety checks here would be useful.
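
As a rough illustration of the kind of check meant here, the sketch below 
models an always-on range check in front of a size_t -> int32 narrowing.  It is 
plain Python rather than the C++ Parquet code, and checked_int32 is a 
hypothetical helper name, not an existing Arrow function.

```python
INT32_MAX = 2**31 - 1


def checked_int32(value: int, name: str = "value") -> int:
    """Return value unchanged if it fits a non-negative int32, else raise.

    Hypothetical helper: it models a check that is always on (unlike a
    debug-only DCHECK) in front of a size_t -> int32 narrowing cast.
    """
    if value < 0:
        # Lower bound: lengths and offsets must never be negative.
        raise ValueError(f"{name} must be >= 0, got {value}")
    if value > INT32_MAX:
        # Upper bound: the narrowing cast would overflow.
        raise OverflowError(f"{name}={value} does not fit in int32")
    return value
```

The point of the design is that the check runs in release builds too, which is 
affordable on paths that are not performance critical.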



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (ARROW-17535) [Python] List arrays aren't supported in to_pandas calls

2022-08-25 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-17535:
---

 Summary: [Python] List arrays aren't supported in 
to_pandas calls
 Key: ARROW-17535
 URL: https://issues.apache.org/jira/browse/ARROW-17535
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Micah Kornfield


EXTENSION is not in the list of types allowed.  I think in order to enable 
EXTENSION we need to be able to call to_pylist or similar on the original 
extension array from C++ code, in case there were user-provided overrides.  Off 
the top of my head, one way of doing this would be to pass through an additional 
std::unordered_map where PyObject is the bound to_pylist 
Python function.  Are there other alternatives that might be cleaner?





[jira] [Created] (ARROW-16484) [Go][Parquet] Ensure a WriterVersion is written out in parquet go.

2022-05-05 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16484:
---

 Summary: [Go][Parquet] Ensure a WriterVersion is written out in 
parquet go.
 Key: ARROW-16484
 URL: https://issues.apache.org/jira/browse/ARROW-16484
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Micah Kornfield
Assignee: Matthew Topol


We should ensure that unique version information is populated for Parquet 
files.  I tried searching the Go code but could only find code that reads the 
version back.





[jira] [Created] (ARROW-16326) [C++][Python] Add GCS Timeout parameter for GCS FileSystem.

2022-04-25 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16326:
---

 Summary: [C++][Python] Add GCS Timeout parameter for GCS 
FileSystem.
 Key: ARROW-16326
 URL: https://issues.apache.org/jira/browse/ARROW-16326
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Follow-up from [https://github.com/apache/arrow/pull/12763]: if the GCS 
testbench isn't installed properly, the failure mode is test timeouts because 
the connection hangs.  We should add a timeout parameter to prevent this.





[jira] [Created] (ARROW-16270) [C++][Python][FileSystem] Make directory paths returned uniform

2022-04-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16270:
---

 Summary: [C++][Python][FileSystem] Make directory paths returned 
uniform
 Key: ARROW-16270
 URL: https://issues.apache.org/jira/browse/ARROW-16270
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Micah Kornfield


Depending on whether paths are selected with or without recursion, the returned 
directories either include a trailing slash or omit it (see the code linked 
below).  It would be nice to provide consistent output here.  It isn't clear if 
the breaking change is worthwhile.
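
One possible shape for "consistent output" is sketched below in plain Python.  
It assumes the convention chosen is "no trailing slash"; normalize_dir_path is 
a hypothetical helper, not part of the Arrow filesystem API.

```python
def normalize_dir_path(path: str) -> str:
    """Strip trailing slashes so a directory is always reported one way.

    Hypothetical helper; assumes the "no trailing slash" convention.
    """
    stripped = path.rstrip("/")
    # Keep a bare root ("/") intact instead of turning it into "".
    return stripped if stripped else path[:1]


# Both spellings of the same directory collapse to one form.
paths = [normalize_dir_path(p) for p in ["bucket/dir/", "bucket/dir"]]
```

Applying something like this at the point where results are returned would 
make the recursive and non-recursive listings agree, at the cost of the 
breaking change mentioned above.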

 

 [1] 
https://github.com/apache/arrow/blob/3eaa7dd0e8b3dabc5438203331f05e3e6c011e37/python/pyarrow/tests/test_fs.py#L688

  [2] 
https://github.com/apache/arrow/blob/3eaa7dd0e8b3dabc5438203331f05e3e6c011e37/cpp/src/arrow/filesystem/test_util.cc#L767





[jira] [Created] (ARROW-16227) [Archery] Make cpp argument list keyword only

2022-04-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16227:
---

 Summary: [Archery] Make cpp argument list keyword only
 Key: ARROW-16227
 URL: https://issues.apache.org/jira/browse/ARROW-16227
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery
Reporter: Micah Kornfield
Assignee: Micah Kornfield


cpp params should be keyword-only, i.e. add *, before all keyword options.  See 
[https://github.com/apache/arrow/pull/12763/files#r852112789].
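
For reference, a minimal sketch of what the bare * does in a Python signature.  
configure_cpp and its option names are hypothetical stand-ins, not the actual 
Archery function.

```python
def configure_cpp(src_dir, *, with_gcs=False, with_parquet=True):
    """Hypothetical stand-in for an Archery function taking cpp options.

    Everything after the bare "*" can only be passed by keyword.
    """
    return {
        "src_dir": src_dir,
        "with_gcs": with_gcs,
        "with_parquet": with_parquet,
    }


# OK: options are named explicitly at the call site.
config = configure_cpp("cpp", with_gcs=True)

# configure_cpp("cpp", True) would raise TypeError, so inserting or
# reordering options later cannot silently change what a caller passes.
```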





[jira] [Created] (ARROW-16226) [C++] Add better coverage for filesystem tell.

2022-04-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16226:
---

 Summary: [C++] Add better coverage for filesystem tell.
 Key: ARROW-16226
 URL: https://issues.apache.org/jira/browse/ARROW-16226
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Add a generic C++ file system test that writes N bytes to a file, then seeks to 
N/2 and reads the remainder.  Verify that the remainder is N/2 bytes long and 
matches the bytes written.
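
The shape of such a test, sketched in Python against a local temporary file 
rather than the C++ generic filesystem test harness.  check_seek_and_tell is a 
hypothetical name, and N // 2 is used so that odd sizes also work.

```python
import os
import tempfile


def check_seek_and_tell(n: int = 64) -> None:
    """Write n bytes, seek to n // 2, and verify the remainder."""
    payload = bytes(i % 256 for i in range(n))
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(payload)
            assert out.tell() == n  # position after writing n bytes
        with open(path, "rb") as inp:
            inp.seek(n // 2)
            assert inp.tell() == n // 2  # tell reflects the seek
            remainder = inp.read()
            assert inp.tell() == n  # reading to the end advances tell
    finally:
        os.remove(path)
    # The remainder has the expected size and content.
    assert len(remainder) == n - n // 2
    assert remainder == payload[n // 2:]
```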





[jira] [Created] (ARROW-16160) [C++] IPC Stream Reader doesn't check if extra fields are present for RecordBatches

2022-04-08 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16160:
---

 Summary: [C++] IPC Stream Reader doesn't check if extra fields are 
present for RecordBatches
 Key: ARROW-16160
 URL: https://issues.apache.org/jira/browse/ARROW-16160
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Affects Versions: 6.0.1
Reporter: Micah Kornfield


I looked through recent commits and I don't think this issue has been patched 
since:

{code:title=test.python|borderStyle=solid}
import pyarrow as pa

# Assumed inputs (the original report does not define them): rb2 carries one
# more field than rb1.
rb1 = pa.record_batch([pa.array([1])], names=["c1"])
rb2 = pa.record_batch([pa.array([1]), pa.array([2])], names=["c1", "c2"])

with pa.output_stream("/tmp/f1") as sink:
  with pa.RecordBatchStreamWriter(sink, rb1.schema) as writer:
    writer.write(rb1)
    end_rb1 = sink.tell()

with pa.output_stream("/tmp/f2") as sink:
  with pa.RecordBatchStreamWriter(sink, rb2.schema) as writer:
    writer.write(rb2)
    start_rb2_only = sink.tell()
    writer.write(rb2)
    end_rb2 = sink.tell()

# Stitch together rb1.schema, rb1 and rb2 without its schema.
with pa.output_stream("/tmp/f3") as sink:
  with pa.input_stream("/tmp/f1") as inp:
    sink.write(inp.read(end_rb1))
  with pa.input_stream("/tmp/f2") as inp:
    inp.seek(start_rb2_only)
    sink.write(inp.read(end_rb2 - start_rb2_only))

with pa.ipc.open_stream("/tmp/f3") as reader:
  print(reader.read_all())
{code}
Yields:
{code}
pyarrow.Table
c1: int64

c1: [[1],[1]]
{code}

I would expect this to error because the second, stitched-in record batch has 
more fields than necessary, but it appears to load just fine.  

Is this intended behavior?





[jira] [Created] (ARROW-16102) [C++] Builds that use cpp/cmake_modules/FindgRPCAlt.cmake cannot build GCS support

2022-04-03 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16102:
---

 Summary: [C++] Builds that use cpp/cmake_modules/FindgRPCAlt.cmake 
cannot build GCS support
 Key: ARROW-16102
 URL: https://issues.apache.org/jira/browse/ARROW-16102
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield


cpp/cmake_modules/FindgRPCAlt.cmake somehow exposes the same libraries defined 
in build_absl_once (defined in cpp/cmake_modules/ThirdpartyToolchain.cmake) 
causing CMake to fail when GCS client is enabled for building.

I tried playing around with various options but given my limited CMake skills I 
could not figure out an easy solution to this.





[jira] [Created] (ARROW-16048) [PyArrow] Null buffers with Pickle protocol.

2022-03-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-16048:
---

 Summary: [PyArrow] Null buffers with Pickle protocol.
 Key: ARROW-16048
 URL: https://issues.apache.org/jira/browse/ARROW-16048
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield


When underlying buffers are null they populate the buffer protocol ".buf" value 
with a null value.  In some cases this can violate contracts [asserted in 
cpython|https://github.com/python/cpython/blob/882d8096c262a5945e0cfdd706e5db3ad2b73543/Modules/_pickle.c#L1072].
  It might be best to always return an empty non-null buffer when the 
underlying buffer is null.





[jira] [Created] (ARROW-15783) [Python] Converting arrow MonthDayNanoInterval to pandas fails DCHECK

2022-02-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-15783:
---

 Summary: [Python] Converting arrow MonthDayNanoInterval to pandas 
fails DCHECK
 Key: ARROW-15783
 URL: https://issues.apache.org/jira/browse/ARROW-15783
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield


InitPandasStaticData is only called on the Python/pandas -> Arrow path and not 
on the reverse path.

This causes the DCHECK that verifies the pandas type is not null to fail if the 
import code is never used.

A workaround for users of the library is to call pa.array([1]) first, which 
avoids this issue.





[jira] [Created] (ARROW-15728) [Python] Zstd IPC test is flaky.

2022-02-17 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-15728:
---

 Summary: [Python] Zstd IPC test is flaky.
 Key: ARROW-15728
 URL: https://issues.apache.org/jira/browse/ARROW-15728
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Our internal CI system shows flakes on the test at approximately a 2% rate.  By 
reducing the integer range we can make this much less flaky (zero observed 
flakes in 5000 runs).
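
A quick back-of-the-envelope check of how strong that evidence is, assuming 
runs are independent (an assumption, not something the CI data confirms):

```python
flake_rate = 0.02  # flake rate observed before the change (about 2%)
runs = 5000        # runs observed after reducing the integer range

# Chance of seeing zero flakes in every run if the rate were still 2%.
p_zero_flakes = (1 - flake_rate) ** runs

# "Rule of three": with zero failures in n runs, an approximate 95% upper
# bound on the true failure rate is 3 / n.
upper_bound = 3 / runs

print(f"P(no flakes at 2%) = {p_zero_flakes:.3e}")
print(f"approx. 95% upper bound on new rate = {upper_bound:.4%}")
```

So zero flakes in 5000 runs is essentially impossible at the old rate, and it 
bounds the new rate well below it, though it does not prove the rate is zero.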





[jira] [Created] (ARROW-15727) [Python] Lists of MonthDayNano Interval can't be converted to Pandas

2022-02-17 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-15727:
---

 Summary: [Python] Lists of MonthDayNano Interval can't be 
converted to Pandas
 Key: ARROW-15727
 URL: https://issues.apache.org/jira/browse/ARROW-15727
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-15596) thrift_internal.h assumes shared_ptr type in some cases

2022-02-06 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-15596:
---

 Summary: thrift_internal.h assumes shared_ptr type in some cases
 Key: ARROW-15596
 URL: https://issues.apache.org/jira/browse/ARROW-15596
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Thrift can still be built with boost shared_ptrs so we need to be pointer 
agnostic.





[jira] [Created] (ARROW-15511) [Python] GIL not held for Ndarray1DIndexer on

2022-01-31 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-15511:
---

 Summary: [Python] GIL not held for Ndarray1DIndexer on
 Key: ARROW-15511
 URL: https://issues.apache.org/jira/browse/ARROW-15511
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 6.0.1
Reporter: Micah Kornfield


[In _ndarray_to_array the call to 
NdarrayToArrow|https://github.com/apache/arrow/blob/658bec37aa5cbdd53b5e4cdc81b8ba3962e67f11/python/pyarrow/array.pxi#L82]
 is explicitly excluded from the GIL.  In some code paths Ndarray1DIndexer is 
instantiated, which will try to call Py_INCREF and Py_DECREF on initialization 
and destruction.  These code paths do not appear to acquire the GIL.

 

I'm not sure what the best fix is:
 # Acquire GIL as part of Ndarray1DIndexer construction.
 # Eliminate the nogil block in _ndarray_to_array
 # Eliminate the incref and decref calls in Ndarray1DIndexer
 # Something else?





[jira] [Created] (ARROW-14156) StructArray::Flatten is incorrect in some cases

2021-09-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-14156:
---

 Summary: StructArray::Flatten is incorrect in some cases
 Key: ARROW-14156
 URL: https://issues.apache.org/jira/browse/ARROW-14156
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 5.0.0
Reporter: Micah Kornfield


When trying to flatten a struct that has children that were sliced we see 
incorrect results.

 
{code:title=Bar.java|borderStyle=solid}
import pyarrow as pa

a = pa.array([1, 2, 3])

sliceds = a.slice(1)

# Struct validity is [True, False]; the child is a slice with offset 1.
composed_struct = pa.StructArray.from_buffers(
    pa.struct([pa.field("a", sliceds.type)]),
    len(sliceds),
    [pa.array([True, False]).buffers()[1]],
    children=[sliceds],
)

>>> composed_struct

-- is_valid:
  [
    true,
    false
  ]

-- child 0 type: int64
  [
    2,
    3
  ]

>>> composed_struct.flatten()
[
[
  null,
  null
]]
{code}
 
I believe the problem is 
[here|https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/array/array_nested.cc#L572]:
 the copy does not account for the child array offset.
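
To make the expected behavior concrete, here is a small pure-Python model of 
flattening one child.  It is not the C++ implementation; flatten_child is a 
hypothetical name and plain lists stand in for Arrow buffers.

```python
def flatten_child(parent_valid, child_values, child_offset, length):
    """Model flattening one struct child using plain Python lists.

    parent_valid: struct validity, one flag per struct slot.
    child_values: the child's full backing storage (before slicing).
    child_offset: where the sliced child starts inside child_values.
    """
    out = []
    for i in range(length):
        # Slot i of the struct maps to backing index child_offset + i.
        value = child_values[child_offset + i]
        # A flattened value is valid only if parent and child both are.
        out.append(value if parent_valid[i] and value is not None else None)
    return out


# The example above: child [1, 2, 3] sliced at offset 1, struct validity
# [True, False].
print(flatten_child([True, False], [1, 2, 3], 1, 2))  # [2, None]
```

Under this model the first flattened slot should stay valid, which is not what 
the session above shows.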





[jira] [Created] (ARROW-13809) [C ABI] Add support for Month, Day, Nanosecond interval type to C-ABI

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13809:
---

 Summary: [C ABI] Add support for Month, Day, Nanosecond interval 
type to C-ABI
 Key: ARROW-13809
 URL: https://issues.apache.org/jira/browse/ARROW-13809
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Now that [https://github.com/apache/arrow/pull/10177] has been merged, we 
should support transport of the new type via the C ABI bindings.





[jira] [Created] (ARROW-13808) [Ruby] Add bindings for Month, Day, Nano Interval Type

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13808:
---

 Summary: [Ruby] Add bindings for Month, Day, Nano Interval Type
 Key: ARROW-13808
 URL: https://issues.apache.org/jira/browse/ARROW-13808
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Ruby
Reporter: Micah Kornfield


Now that [https://github.com/apache/arrow/pull/10177] has been merged, we 
should support conversion to and from this type for standard Ruby types (or 
custom types) if possible.





[jira] [Created] (ARROW-13807) [R] Add bindings for Month, Day, Nanos Interval Type

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13807:
---

 Summary: [R] Add bindings for Month, Day, Nanos Interval Type
 Key: ARROW-13807
 URL: https://issues.apache.org/jira/browse/ARROW-13807
 Project: Apache Arrow
  Issue Type: New Feature
  Components: R
Reporter: Micah Kornfield


Now that [https://github.com/apache/arrow/pull/10177] has been merged, we 
should support conversion to and from canonical R types if available.





[jira] [Created] (ARROW-13806) [Python] Add conversion to/from Pandas/Python for Month, Day Nano Interval Type

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13806:
---

 Summary: [Python] Add conversion to/from Pandas/Python for Month, 
Day Nano Interval Type
 Key: ARROW-13806
 URL: https://issues.apache.org/jira/browse/ARROW-13806
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Python
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Now that [https://github.com/apache/arrow/pull/10177] has been merged, we 
should support conversion to and from this type for standard Python surface 
areas.





[jira] [Created] (ARROW-13805) [C#] https://github.com/apache/arrow/pull/10177

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13805:
---

 Summary: [C#] https://github.com/apache/arrow/pull/10177
 Key: ARROW-13805
 URL: https://issues.apache.org/jira/browse/ARROW-13805
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Micah Kornfield


https://github.com/apache/arrow/pull/10177





[jira] [Created] (ARROW-13804) [Go] Add Support for Interval Type Month, Day, Nano

2021-08-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13804:
---

 Summary: [Go] Add Support for Interval Type Month, Day, Nano
 Key: ARROW-13804
 URL: https://issues.apache.org/jira/browse/ARROW-13804
 Project: Apache Arrow
  Issue Type: New Feature
  Components: Go
Reporter: Micah Kornfield


Now that [https://github.com/apache/arrow/pull/10177] has been merged, we 
should ensure Go supports the new type and enable integration tests for it.





[jira] [Created] (ARROW-13690) [Python] Use IPC writing code for pickling RecordBatches

2021-08-22 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13690:
---

 Summary: [Python] Use IPC writing code for pickling RecordBatches
 Key: ARROW-13690
 URL: https://issues.apache.org/jira/browse/ARROW-13690
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Micah Kornfield


For wide schemas in particular, the recursive nature of the current pickling 
algorithm for record batches makes it less efficient than using the IPC format 
(which can be done entirely in C++).

Consider switching the mechanism to use the IPC format.  I think this can be a 
backwards-compatible change, if we care about that, by leaving the current 
_reconstruct_record_batch in place.





[jira] [Created] (ARROW-13672) [C++] BinaryBuilder doesn't preserve passed in DataType

2021-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13672:
---

 Summary: [C++] BinaryBuilder doesn't preserve passed in DataType
 Key: ARROW-13672
 URL: https://issues.apache.org/jira/browse/ARROW-13672
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 5.0.0
Reporter: Micah Kornfield


There is a 
[constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56]
 that takes a datatype for binary builder but it is discarded.  When 
constructing an Array the type is always the value returned from type() 
[binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390]

If a consumer of the API wants to have an extension array, this prevents them 
from passing the extension type through.





[jira] [Created] (ARROW-13673) [C++] BinaryBuilder doesn't preserve passed in DataType

2021-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13673:
---

 Summary: [C++] BinaryBuilder doesn't preserve passed in DataType
 Key: ARROW-13673
 URL: https://issues.apache.org/jira/browse/ARROW-13673
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Affects Versions: 5.0.0
Reporter: Micah Kornfield


There is a 
[constructor|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L56]
 that takes a datatype for binary builder but it is discarded.  When 
constructing an Array the type is always the value returned from type() 
[binary|https://github.com/apache/arrow/blob/1430c93f68960e10a50d27f465eb174e76ac06b2/cpp/src/arrow/array/builder_binary.h#L390]

If a consumer of the API wants to have an extension array, this prevents them 
from passing the extension type through.





[jira] [Created] (ARROW-13670) Do a round of compiler warning cleanups

2021-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13670:
---

 Summary: Do a round of compiler warning cleanups
 Key: ARROW-13670
 URL: https://issues.apache.org/jira/browse/ARROW-13670
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield
Assignee: Micah Kornfield


During a build I found several classes without virtual destructors and some out 
of order initialization.





[jira] [Created] (ARROW-13669) Variant emplace methods appear to be missing curly braces.

2021-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13669:
---

 Summary: Variant emplace methods appear to be missing curly braces.
 Key: ARROW-13669
 URL: https://issues.apache.org/jira/browse/ARROW-13669
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-13628) [Format] Add MonthDayNano interval type.

2021-08-13 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13628:
---

 Summary: [Format] Add MonthDayNano interval type.
 Key: ARROW-13628
 URL: https://issues.apache.org/jira/browse/ARROW-13628
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Format, Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Add type definition to fbs files with initial IPC implementations for Java and 
C++ (as discussed on the mailing list).





[jira] [Created] (ARROW-13012) [C++] Add ability for retrieving dictionary and indices separately for ColumnReader

2021-06-08 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-13012:
---

 Summary: [C++] Add ability  for retrieving dictionary and indices 
separately for ColumnReader
 Key: ARROW-13012
 URL: https://issues.apache.org/jira/browse/ARROW-13012
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Micah Kornfield


In some contexts it is useful to be able to retrieve these separately instead 
of decoding.





[jira] [Created] (ARROW-12907) [Java] Memory possible when exception reading from channel happens

2021-05-29 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12907:
---

 Summary: [Java] Memory possible when exception reading from 
channel happens
 Key: ARROW-12907
 URL: https://issues.apache.org/jira/browse/ARROW-12907
 Project: Apache Arrow
  Issue Type: Bug
  Components: Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-12769) [Python] Negative out of range slices yield invalid arrays

2021-05-12 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12769:
---

 Summary: [Python] Negative out of range slices yield invalid arrays
 Key: ARROW-12769
 URL: https://issues.apache.org/jira/browse/ARROW-12769
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Affects Versions: 4.0.0, 2.0.0
Reporter: Micah Kornfield
 Fix For: 5.0.0, 4.0.1


Tested on pyarrow 2.0 and pyarrow 4.0 wheels.  The errors differ slightly 
between the two versions.  Below is a session from 4.0.

 

This is taken from the result of test_slice_array

{code}
>>> import pyarrow as pa
>>> pa.array(range(0,10))
[
  0,
  1,
  2,
  3,
  4,
  5,
  6,
  7,
  8,
  9
]
>>> a=pa.array(range(0,10))
>>> a[-9:-20]
[]
>>> len(a[-9:-20])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
SystemError:  returned NULL without setting an error
{code}
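
For comparison, this is how Python itself normalizes the same out-of-range 
negative slice for a sequence of length 10.  It is a stdlib-only sketch; 
normalized_slice_length is a hypothetical helper and says nothing about how 
pyarrow computes the slice internally.

```python
def normalized_slice_length(key: slice, length: int) -> int:
    """Length of key applied to a sequence of the given length."""
    # slice.indices clamps out-of-range bounds into [0, length].
    start, stop, step = key.indices(length)
    return len(range(start, stop, step))


print(slice(-9, -20).indices(10))  # (1, 0, 1)
print(normalized_slice_length(slice(-9, -20), 10))  # 0
```

After clamping, stop is less than start, so the result is an empty sequence 
with a well-defined length of 0.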





[jira] [Created] (ARROW-12340) [Java] Avro to Arrow converter doesn't appear to generate valid arrow data

2021-04-12 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12340:
---

 Summary: [Java] Avro to Arrow converter doesn't appear to generate 
valid arrow data
 Key: ARROW-12340
 URL: https://issues.apache.org/jira/browse/ARROW-12340
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield


I think this is related to how Unions are handled (I had thought that a union 
of null and one other type would be converted to the corresponding nullable 
type, but that is a separate issue).

I haven't had time to fully diagnose this, but remnants of the code I tried to 
use are at [https://gist.github.com/emkornfield/efd3a4c3c1012dc19cf9769198e3bffe]

That code, together with the CSV file from 
https://issues.apache.org/jira/browse/ARROW-11629?jql=text%20~%20%22arrow%20drill%20parquet%20dictionary%22
produces data that isn't readable by the C++ implementation.





[jira] [Created] (ARROW-12196) [C++] C++ IPC reading looks like it doesn't support uncompressed buffers

2021-04-04 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12196:
---

 Summary: [C++] C++ IPC reading looks like it doesn't support 
uncompressed buffers 
 Key: ARROW-12196
 URL: https://issues.apache.org/jira/browse/ARROW-12196
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield


https://github.com/apache/arrow/blob/master/cpp/src/arrow/ipc/reader.cc#L411 
does not seem to check for the following case (I'm not sure if this is the 
right code though):
  uncompressed length may be set to -1 to indicate that the data that follows 
is not compressed, which can be useful for cases where compression does not 
yield appreciable savings.

https://github.com/apache/arrow/blob/5cabd31c90dbb32d87074928f68bf5d6e97e37c6/format/Message.fbs#L59
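
A sketch of the check in question, in Python rather than the C++ reader.  It 
assumes each body buffer starts with the uncompressed length as a 64-bit 
little-endian signed integer (my reading of Message.fbs), and it uses zlib 
purely as a stand-in codec because the Arrow codecs are not in the Python 
standard library.  decode_body_buffer is a hypothetical name.

```python
import struct
import zlib


def decode_body_buffer(raw: bytes) -> bytes:
    """Decode one buffer that carries an 8-byte length prefix."""
    if len(raw) < 8:
        raise ValueError("buffer too short for the 8-byte length prefix")
    (uncompressed_length,) = struct.unpack("<q", raw[:8])
    payload = raw[8:]
    if uncompressed_length == -1:
        # Sentinel from the format comment: payload is stored uncompressed.
        return payload
    if uncompressed_length < 0:
        raise ValueError(f"invalid uncompressed length {uncompressed_length}")
    data = zlib.decompress(payload)  # stand-in codec, not an Arrow codec
    if len(data) != uncompressed_length:
        raise ValueError("decompressed size does not match the prefix")
    return data
```

The branch on -1 is the part that appears to be missing; without it a reader 
would hand raw bytes to the decompressor.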





[jira] [Created] (ARROW-12195) [Archery][Integration] Support round trip tests for compression

2021-04-04 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12195:
---

 Summary: [Archery][Integration] Support round trip tests for 
compression
 Key: ARROW-12195
 URL: https://issues.apache.org/jira/browse/ARROW-12195
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Archery, Integration
Reporter: Micah Kornfield


Archery and corresponding language bindings should support round trip testing 
for compression.  

Today we only have checks on generated "gold" files from C++.  We should also 
support round trip testing, now that there is a Java implementation and a WIP 
Go implementation.





[jira] [Created] (ARROW-12164) [Java] Make BaseAllocator.Config public

2021-03-31 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12164:
---

 Summary: [Java] Make BaseAllocator.Config public
 Key: ARROW-12164
 URL: https://issues.apache.org/jira/browse/ARROW-12164
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield
 Fix For: 4.0.0


Alternatively we could make RootAllocator take immutable config.  The problem 
is that default config cannot be used from shaded binaries because it is 
package private.





[jira] [Created] (ARROW-12163) [Java] Make compression levels configurable.

2021-03-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12163:
---

 Summary: [Java] Make compression levels configurable.
 Key: ARROW-12163
 URL: https://issues.apache.org/jira/browse/ARROW-12163
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield


Today we use default compression levels in compressors; these should be 
configurable via the constructor.





[jira] [Created] (ARROW-12115) [Java] Rename compression classes

2021-03-26 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12115:
---

 Summary: [Java] Rename compression classes
 Key: ARROW-12115
 URL: https://issues.apache.org/jira/browse/ARROW-12115
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield


Zstd isn't using the commons codec, so we should rename 
CommonsCompressionFactory to something more generic, and the existing LZ4 
implementation to something potentially more generic.





[jira] [Created] (ARROW-12110) [Java] Implement ZSTD buffer compression for java

2021-03-26 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12110:
---

 Summary: [Java] Implement ZSTD buffer compression for java
 Key: ARROW-12110
 URL: https://issues.apache.org/jira/browse/ARROW-12110
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-12035) [Developer Tools] Update merge tool to populate component if not present based on PR title

2021-03-20 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12035:
---

 Summary: [Developer Tools] Update merge tool to populate component 
if not present based on PR title
 Key: ARROW-12035
 URL: https://issues.apache.org/jira/browse/ARROW-12035
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools
Reporter: Micah Kornfield


If a user forgets to set a component in Jira, the PR merge tool highlights 
this.  If the component is specified in the PR title, we should offer 
committers the option of automatically setting the component when updating the 
JIRA, so they don't need to log on separately.
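
A minimal sketch of the title parsing this would need, assuming components 
appear as bracketed tags in the PR title.  The function names are hypothetical 
and this is not the existing merge tool code.

```python
import re
from typing import List

# Matches each bracketed tag, e.g. "[C++]" or "[Developer Tools]".
_COMPONENT_RE = re.compile(r"\[([^\]]+)\]")


def components_from_title(title: str) -> List[str]:
    """Return bracketed tags such as 'C++' or 'Parquet' from a PR title."""
    return _COMPONENT_RE.findall(title)


def pick_components(jira_components: List[str], pr_title: str) -> List[str]:
    """Prefer components already set in Jira, else fall back to the title."""
    return jira_components or components_from_title(pr_title)


print(components_from_title("ARROW-18253: [C++][Parquet] Improve bounds checking"))
# ['C++', 'Parquet']
```

A real implementation would still need to map tags to valid Jira component 
names and ignore tags that are not components.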





[jira] [Created] (ARROW-12034) [Docs] Formalize Trivial PRs

2021-03-20 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12034:
---

 Summary: [Docs] Formalize Trivial PRs
 Key: ARROW-12034
 URL: https://issues.apache.org/jira/browse/ARROW-12034
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Developer Tools, Documentation
Reporter: Micah Kornfield
Assignee: Micah Kornfield
 Fix For: 4.0.0


Based on ML discussion: 
[https://lists.apache.org/x/thread.html/rd032058727d1b61f46813f25db586e320004fe4ccbf9cdfa13df44e8@%3Cdev.arrow.apache.org%3E]

 

Update relevant components so we can make trivial PRs





[jira] [Created] (ARROW-12022) [C++][Parquet] StatisticsAsScalars doesn't support Decimal conversion for int primitives

2021-03-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-12022:
---

 Summary: [C++][Parquet] StatisticsAsScalars doesn't support 
Decimal conversion for int primitives
 Key: ARROW-12022
 URL: https://issues.apache.org/jira/browse/ARROW-12022
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield


Logical decimal types that are stored as int primitives are not properly 
handled in the code today.  Also, some clarification of the StatisticsAsScalars 
contract is required.  For FLBA and ByteArray decimals, the smallest decimal 
type that will fit the data is used; it isn't clear if that is desired or if it 
should also include information from the serialized Arrow schema, if present.
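
For the int-primitive case, the conversion itself is small: the stored integer 
is the unscaled value, and the logical type supplies the scale.  A Python 
sketch with a hypothetical helper name (the real fix would be in the C++ 
StatisticsAsScalars path):

```python
from decimal import Decimal


def unscaled_int_to_decimal(unscaled: int, scale: int) -> Decimal:
    """Interpret an integer statistic as a decimal with the given scale."""
    # Shift the decimal point left by `scale` digits.
    return Decimal(unscaled).scaleb(-scale)


print(unscaled_int_to_decimal(12345, 2))  # 123.45
```

A min or max statistic stored as int32 or int64 would go through the same 
step, using the column's declared scale.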





[jira] [Created] (ARROW-11828) Expose CSVWriter object in api

2021-02-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-11828:
---

 Summary: Expose CSVWriter object in api
 Key: ARROW-11828
 URL: https://issues.apache.org/jira/browse/ARROW-11828
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Based on feedback from initial CSV PR this is likely the preferred API.





[jira] [Created] (ARROW-11829) [C++] Update developer style guide on usage of shared_ptr

2021-02-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-11829:
---

 Summary: [C++] Update developer style guide on usage of shared_ptr
 Key: ARROW-11829
 URL: https://issues.apache.org/jira/browse/ARROW-11829
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-11353) [C++][Python][Parquet] We should allow for overriding to large types by providing a schema

2021-01-22 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-11353:
---

 Summary: [C++][Python][Parquet] We should allow for overriding to 
large types by providing a schema
 Key: ARROW-11353
 URL: https://issues.apache.org/jira/browse/ARROW-11353
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Micah Kornfield


The following shouldn't throw:

>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pyarrow.dataset as ds
>>> pa.__version__
'2.0.0'
>>> schema = pa.schema([pa.field("utf8", pa.utf8())])
>>> table = pa.Table.from_pydict({"utf8": ["foo", "bar"]}, schema)
>>> pq.write_table(table, "/tmp/example.parquet")
>>> large_schema = pa.schema([pa.field("utf8", pa.large_utf8())])
>>> ds.dataset("/tmp/example.parquet", schema=large_schema,
...            format="parquet").to_table()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/_dataset.pyx", line 405, in pyarrow._dataset.Dataset.to_table
  File "pyarrow/_dataset.pyx", line 2262, in pyarrow._dataset.Scanner.to_table
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 107, in pyarrow.lib.check_status
pyarrow.lib.ArrowTypeError: fields had matching names but differing types.
From: utf8: string To: utf8: large_string





[jira] [Created] (ARROW-10784) [Python] Loading pyarrow.compute isn't thread safe

2020-12-01 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10784:
---

 Summary: [Python] Loading pyarrow.compute isn't thread safe
 Key: ARROW-10784
 URL: https://issues.apache.org/jira/browse/ARROW-10784
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Affects Versions: 2.0.0
Reporter: Micah Kornfield


When using Arrow in a multithreaded environment it is possible to trigger an 
initialization race on the pyarrow.compute module when calling Array.flatten.

 

Flatten calls _pc(), which imports pyarrow.compute, but if two threads call 
flatten at the same time it is possible that the global initialization of 
functions from the registry will be incomplete and therefore cause an 
AttributeError.
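
One generic way to avoid this kind of race is to guard the one-time initialization with a lock and publish the result only once it is fully built. The sketch below is illustrative only: it uses a hypothetical registry dict rather than pyarrow internals, and it is not necessarily the fix adopted for this issue.

```python
import threading

_lock = threading.Lock()
_registry = None  # stands in for the lazily populated compute functions


def _build_registry():
    # Placeholder for the expensive, multi-step global initialization.
    return {"flatten": lambda values: [x for sub in values for x in sub]}


def get_registry():
    """Return the registry, initializing it at most once across threads."""
    global _registry
    if _registry is None:          # fast path once initialized
        with _lock:
            if _registry is None:  # re-check while holding the lock
                # Assign only the fully built object, never a partial one.
                _registry = _build_registry()
    return _registry


results = []
threads = [
    threading.Thread(
        target=lambda: results.append(get_registry()["flatten"]([[1], [2, 3]]))
    )
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key property is that other threads either see None and wait on the lock, or see a complete registry; they never observe a half-populated module namespace.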





[jira] [Created] (ARROW-10608) [Python] Decimal256 Support finish off full support for conversion to/from decimal types

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10608:
---

 Summary: [Python] Decimal256 Support finish off full support for 
conversion to/from decimal types
 Key: ARROW-10608
 URL: https://issues.apache.org/jira/browse/ARROW-10608
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Python
Reporter: Micah Kornfield








[jira] [Created] (ARROW-10607) [C++][Parquet] Support Reading/Writing Decimal256 type in Parquet

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10607:
---

 Summary: [C++][Parquet] Support Reading/Writing Decimal256 type in 
Parquet
 Key: ARROW-10607
 URL: https://issues.apache.org/jira/browse/ARROW-10607
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield








[jira] [Created] (ARROW-10606) [C++][Compute] Support casts to and from Decimal256 type.

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10606:
---

 Summary: [C++][Compute] Support casts to and from Decimal256 type.
 Key: ARROW-10606
 URL: https://issues.apache.org/jira/browse/ARROW-10606
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield








[jira] [Created] (ARROW-10605) [C++][Gandiva] Support Decimal256 type in gandiva computation.

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10605:
---

 Summary: [C++][Gandiva] Support Decimal256 type in gandiva 
computation.
 Key: ARROW-10605
 URL: https://issues.apache.org/jira/browse/ARROW-10605
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva
Reporter: Micah Kornfield


There might be a lot of work here, so sub-jiras might be added once scope is 
determined.





[jira] [Created] (ARROW-10604) [Ruby] Support Decimal256 type

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10604:
---

 Summary: [Ruby] Support Decimal256 type
 Key: ARROW-10604
 URL: https://issues.apache.org/jira/browse/ARROW-10604
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Ruby
Reporter: Micah Kornfield


The C++ implementation now supports it.  We need to ensure the Ruby/GObject 
bindings do as well.





[jira] [Created] (ARROW-10603) [Javascript] Support Decimal type with 256 bits of precision

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10603:
---

 Summary: [Javascript] Support Decimal type with 256 bits of 
precision
 Key: ARROW-10603
 URL: https://issues.apache.org/jira/browse/ARROW-10603
 Project: Apache Arrow
  Issue Type: Improvement
  Components: JavaScript
Reporter: Micah Kornfield


The specification now supports it and there are basic implementations in 
C++/Java





[jira] [Created] (ARROW-10602) [Rust] Implement support for Decimal with 256 bits of precision.

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10602:
---

 Summary: [Rust] Implement support for Decimal with 256 bits of 
precision.
 Key: ARROW-10602
 URL: https://issues.apache.org/jira/browse/ARROW-10602
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Rust
Reporter: Micah Kornfield


The specification now supports it and there are basic implementations in 
C++/Java





[jira] [Created] (ARROW-10601) [C++] CSV Reader should support Decimal256 type

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10601:
---

 Summary: [C++] CSV Reader should support Decimal256 type
 Key: ARROW-10601
 URL: https://issues.apache.org/jira/browse/ARROW-10601
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield








[jira] [Created] (ARROW-10600) [Go] Support Decimal256 type

2020-11-15 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10600:
---

 Summary: [Go] Support Decimal256 type
 Key: ARROW-10600
 URL: https://issues.apache.org/jira/browse/ARROW-10600
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Go
Reporter: Micah Kornfield


Decimal with 256 bit precision is now allowed in the spec with a basic 
implementation in Java and C++.





[jira] [Created] (ARROW-10447) [C++][Python] Python compute kernel tests assume C++ is built with utf8proc

2020-10-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10447:
---

 Summary: [C++][Python] Python compute kernel tests assume C++ is 
built with utf8proc
 Key: ARROW-10447
 URL: https://issues.apache.org/jira/browse/ARROW-10447
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield


Not sure if this is something we want to fix, but I discovered this when 
building without utf8proc installed.





[jira] [Created] (ARROW-10446) [C++][Python] Timezone aware pd.Timestamp's are incorrectly converted to Timestamp arrays

2020-10-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10446:
---

 Summary: [C++][Python] Timezone aware pd.Timestamp's are 
incorrectly converted to Timestamp arrays
 Key: ARROW-10446
 URL: https://issues.apache.org/jira/browse/ARROW-10446
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield
Assignee: Micah Kornfield
 Fix For: 2.0.1








[jira] [Created] (ARROW-10408) [Java] Upgrade Avro dependency to 1.10

2020-10-27 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10408:
---

 Summary: [Java] Upgrade Avro dependency to 1.10
 Key: ARROW-10408
 URL: https://issues.apache.org/jira/browse/ARROW-10408
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield
Assignee: Fokko Driesprong








[jira] [Created] (ARROW-10367) [Archery] unit tests seem to be broken due to "patch" argument not being on constructor

2020-10-21 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10367:
---

 Summary: [Archery] unit tests seem to be broken due to "patch" 
argument not being on constructor
 Key: ARROW-10367
 URL: https://issues.apache.org/jira/browse/ARROW-10367
 Project: Apache Arrow
  Issue Type: Bug
  Components: Archery
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Patch is not a keyword on Version.

https://github.com/apache/arrow/pull/8475/checks?check_run_id=1290624398

cls = , version = '0.17.1'

    @classmethod
    def parse(cls, version):
        """
        Parse version string to a VersionInfo instance.

        :param version: version string
        :return: a :class:`VersionInfo` instance
        :raises: :class:`ValueError`
        :rtype: :class:`VersionInfo`

        .. versionchanged:: 2.11.0
           Changed method from static to classmethod to
           allow subclasses.

        >>> semver.VersionInfo.parse('3.4.5-pre.2+build.4')
        VersionInfo(major=3, minor=4, patch=5, \
                    prerelease='pre.2', build='build.4')
        """
        match = cls._REGEX.match(ensure_str(version))
        if match is None:
            raise ValueError("%s is not valid SemVer string" % version)

        version_parts = match.groupdict()

        version_parts["major"] = int(version_parts["major"])
        version_parts["minor"] = int(version_parts["minor"])
        version_parts["patch"] = int(version_parts["patch"])

>       return cls(**version_parts)
E       TypeError: __init__() got an unexpected keyword argument 'patch'
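
The failure mode in the traceback can be reproduced in isolation. The following is a minimal sketch with hypothetical stand-in classes (BaseVersion, BrokenVersion and FixedVersion are not the real semver or Archery classes): a base-class parse() passes keyword arguments to cls(...), so a subclass whose constructor does not accept those keywords fails.

```python
class BaseVersion:
    """Stand-in for a third-party version class (hypothetical)."""

    def __init__(self, major, minor, patch):
        self.major, self.minor, self.patch = major, minor, patch

    @classmethod
    def parse(cls, text):
        major, minor, patch = (int(p) for p in text.split("."))
        # Keyword names are chosen by the base class, not the subclass.
        return cls(major=major, minor=minor, patch=patch)


class BrokenVersion(BaseVersion):
    # The constructor does not accept `patch`, so parse() fails.
    def __init__(self, major, minor):
        super().__init__(major, minor, 0)


class FixedVersion(BaseVersion):
    # Accepting and forwarding all keywords keeps parse() working.
    def __init__(self, **kwargs):
        super().__init__(**kwargs)


try:
    BrokenVersion.parse("0.17.1")
    error = None
except TypeError as exc:
    error = str(exc)

fixed = FixedVersion.parse("0.17.1")
```

Whether the actual breakage came from a subclass constructor or from a changed upstream signature is not stated in the issue; the sketch only shows why the keyword mismatch surfaces as a TypeError.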





[jira] [Created] (ARROW-10229) [C++][Parquet] Remove left over ARROW_LOG statement.

2020-10-07 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10229:
---

 Summary: [C++][Parquet] Remove left over ARROW_LOG statement.
 Key: ARROW-10229
 URL: https://issues.apache.org/jira/browse/ARROW-10229
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-10203) Capture guidance for endianness support in contributors guide.

2020-10-06 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10203:
---

 Summary: Capture guidance for endianness support in contributors 
guide.
 Key: ARROW-10203
 URL: https://issues.apache.org/jira/browse/ARROW-10203
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Micah Kornfield
Assignee: Micah Kornfield


https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3ccak7z5t--hhhr9dy43pyhd6m-xou4qogwqvlwzsg-koxxjpt...@mail.gmail.com%3e





[jira] [Created] (ARROW-10127) [Format] Update specification to support 256-bit Decimal types

2020-09-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10127:
---

 Summary: [Format] Update specification to support 256-bit Decimal 
types
 Key: ARROW-10127
 URL: https://issues.apache.org/jira/browse/ARROW-10127
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Micah Kornfield
Assignee: Micah Kornfield


This will require a vote to approve merging.





[jira] [Created] (ARROW-10077) [C++] Potential overflow in bit_stream_utils.h multiplication.

2020-09-23 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10077:
---

 Summary: [C++] Potential overflow in bit_stream_utils.h 
multiplication.
 Key: ARROW-10077
 URL: https://issues.apache.org/jira/browse/ARROW-10077
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


We use a literal "8" for BitsPerByte, which is interpreted as int32_t and can 
therefore overflow.
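
To make the arithmetic concrete, here is a small Python sketch that emulates the 32-bit product with ctypes. It is an illustration only: the function names are hypothetical, and in C++ signed overflow is undefined behavior rather than guaranteed wraparound.

```python
import ctypes

BITS_PER_BYTE = 8


def bits_as_int32(num_bytes: int) -> int:
    """Emulate `num_bytes * 8` truncated to a 32-bit signed integer."""
    return ctypes.c_int32(num_bytes * BITS_PER_BYTE).value


def bits_as_int64(num_bytes: int) -> int:
    """Emulate the same product when computed in 64 bits."""
    return ctypes.c_int64(num_bytes * BITS_PER_BYTE).value


num_bytes = 300_000_000  # fits in int32, but the product does not
wrapped = bits_as_int32(num_bytes)
widened = bits_as_int64(num_bytes)
```

Here the byte count itself is well within int32 range, yet the bit count exceeds it, which is why widening one operand to 64 bits before multiplying is the usual remedy.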





[jira] [Created] (ARROW-10076) [C++] Use TemporaryDir for all tests that don't already use it.

2020-09-23 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10076:
---

 Summary: [C++] Use TemporaryDir for all tests that don't already 
use it.
 Key: ARROW-10076
 URL: https://issues.apache.org/jira/browse/ARROW-10076
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


This ensures all files are cleaned up, and for some build systems it avoids the 
issue of requiring the ability to write to the source/build path when running 
the test.





[jira] [Created] (ARROW-10075) [C++] Don't use nonstd::nullopt this breaks our vendoring abstraction.

2020-09-23 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10075:
---

 Summary: [C++] Don't use nonstd::nullopt this breaks our vendoring 
abstraction.
 Key: ARROW-10075
 URL: https://issues.apache.org/jira/browse/ARROW-10075
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield








[jira] [Created] (ARROW-10074) [C++] Don't use string_view.to_string()

2020-09-23 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10074:
---

 Summary: [C++] Don't use string_view.to_string()
 Key: ARROW-10074
 URL: https://issues.apache.org/jira/browse/ARROW-10074
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


This makes our non-standard string_view incompatible with std::string_view when 
we eventually upgrade to C++17.





[jira] [Created] (ARROW-10009) [C++] LeastSignficantBitMask has typo in name.

2020-09-14 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10009:
---

 Summary: [C++] LeastSignficantBitMask has typo in name.
 Key: ARROW-10009
 URL: https://issues.apache.org/jira/browse/ARROW-10009
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


We should fix the typo.





[jira] [Created] (ARROW-9985) [C++][Parquet] Add bitmap based validity bitmap and nested reconstruction

2020-09-12 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9985:
--

 Summary: [C++][Parquet] Add bitmap based validity bitmap and 
nested reconstruction
 Key: ARROW-9985
 URL: https://issues.apache.org/jira/browse/ARROW-9985
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


For low levels of nesting this will likely be more performant than existing 
rep/level based reconstruction.





[jira] [Created] (ARROW-9810) [C++][Parquet] Generalize existing null bitmap generation

2020-08-20 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9810:
--

 Summary: [C++][Parquet] Generalize existing null bitmap generation 
 Key: ARROW-9810
 URL: https://issues.apache.org/jira/browse/ARROW-9810
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Right now null bitmap generation assumes only list nesting.  Generalize and 
refactor existing code, without changing existing functionality, to accept 
additional parameters to support Arrow nested types:

 

1.  Repeated ancestor def level

2.  Null slot usage (for fixed size lists)

 

 





[jira] [Created] (ARROW-9796) [C++][Parquet] Support reading FixedSizeLists when null values are present.

2020-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9796:
--

 Summary: [C++][Parquet] Support reading FixedSizeLists when null 
values are present.
 Key: ARROW-9796
 URL: https://issues.apache.org/jira/browse/ARROW-9796
 Project: Apache Arrow
  Issue Type: New Feature
  Components: C++
Reporter: Micah Kornfield


This won't be handled on the first pass for ARROW-1644 because FixedSizeLists 
are not very common and some more in-depth refactoring is needed to support it.





[jira] [Created] (ARROW-9794) [C++] Add functionality to cpu_info to discriminate between Intel vs AMD x86

2020-08-18 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9794:
--

 Summary: [C++] Add functionality to cpu_info to discriminate 
between Intel vs AMD x86
 Key: ARROW-9794
 URL: https://issues.apache.org/jira/browse/ARROW-9794
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield


This is needed to do runtime dispatches for places where pext/pdep can be used. 
 These perform poorly on AMD.





[jira] [Created] (ARROW-9767) [Python][Parquet] Expose EngineVersion in python arrow reader properties

2020-08-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9767:
--

 Summary: [Python][Parquet] Expose EngineVersion in python arrow 
reader properties
 Key: ARROW-9767
 URL: https://issues.apache.org/jira/browse/ARROW-9767
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield


Probably also pays to have the default selectable by environment variable.





[jira] [Created] (ARROW-9766) [C++][Parquet] Add EngineVersion to properties to allow for toggling new vs old logic

2020-08-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9766:
--

 Summary: [C++][Parquet] Add EngineVersion to properties to allow 
for toggling new vs old logic
 Key: ARROW-9766
 URL: https://issues.apache.org/jira/browse/ARROW-9766
 Project: Apache Arrow
  Issue Type: Sub-task
Reporter: Micah Kornfield
Assignee: Micah Kornfield


This will provide an escape hatch in case the new logic somehow has bugs that 
make it unusable.





[jira] [Created] (ARROW-9765) [C++][CI][Windows] link errors on windows when using testing::HasSubstr match

2020-08-16 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9765:
--

 Summary: [C++][CI][Windows] link errors on windows when using 
testing::HasSubstr match
 Key: ARROW-9765
 URL: https://issues.apache.org/jira/browse/ARROW-9765
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, CI
Reporter: Micah Kornfield


I tried using testing::HasSubstr in a test in 
cpp/src/parquet/arrow/arrow_schema_test.cc and it resulted in AppVeyor CI 
failing to link on Windows.





[jira] [Created] (ARROW-9747) [C++][Java][Format] Support Decimal256 Type

2020-08-14 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9747:
--

 Summary: [C++][Java][Format] Support Decimal256 Type
 Key: ARROW-9747
 URL: https://issues.apache.org/jira/browse/ARROW-9747
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Format, Java
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-9710) [C++] Generalize Decimal ToString in preparation for Decimal256

2020-08-12 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9710:
--

 Summary: [C++] Generalize Decimal ToString in preparation for 
Decimal256
 Key: ARROW-9710
 URL: https://issues.apache.org/jira/browse/ARROW-9710
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Mingyu Zhong


Generalize Decimal ToString method in preparation for introducing Decimal256 
bit type (and other bit widths as needed).  

 

 





[jira] [Created] (ARROW-9671) Arrow BasicDecimal128 constructor interprets uint64_t integers with highest bit set as negative

2020-08-07 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9671:
--

 Summary: Arrow BasicDecimal128 constructor interprets uint64_t 
integers with highest bit set as negative
 Key: ARROW-9671
 URL: https://issues.apache.org/jira/browse/ARROW-9671
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-9614) [Java] JDBC to Arrow converter iterator should reuse the same VectorSchemaRoot

2020-07-31 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9614:
--

 Summary: [Java] JDBC to Arrow converter iterator should reuse the 
same VectorSchemaRoot
 Key: ARROW-9614
 URL: https://issues.apache.org/jira/browse/ARROW-9614
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield


When originally reviewing the code I suggested a new VectorSchemaRoot on each 
call to the iterator.  After further discussions on the mailing list, it seems 
that this is an anti-pattern for working with VectorSchemaRoot.  We should 
update the code to update a single VectorSchemaRoot.

After this change it should be easier to use the JDBC converter with other 
components of the library (e.g. the file writer), which also make use of a 
single VectorSchemaRoot.





[jira] [Created] (ARROW-9613) [Java] Avro to Arrow converter should reuse the same VectorSchemaRoot

2020-07-31 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9613:
--

 Summary: [Java] Avro to Arrow converter should reuse the same 
VectorSchemaRoot
 Key: ARROW-9613
 URL: https://issues.apache.org/jira/browse/ARROW-9613
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Java
Reporter: Micah Kornfield


When originally reviewing the code I suggested a new VectorSchemaRoot on each 
call to the iterator.  After further discussions on the mailing list, it seems 
that this is an anti-pattern for working with VectorSchemaRoot.  We should 
update the code to own a single VectorSchemaRoot and return it each time with 
new records.





[jira] [Created] (ARROW-9603) [C++][Parquet] Write Arrow relies on unspecified behavior for nested types

2020-07-30 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9603:
--

 Summary: [C++][Parquet] Write Arrow relies on unspecified behavior 
for nested types
 Key: ARROW-9603
 URL: https://issues.apache.org/jira/browse/ARROW-9603
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield


The WriteArrow implementations in parquet/column_writer.cc at certain points 
check null counts/required data and pass through the null bitmap for encoding.  
This only works for nested data types if a null slot on a parent implies a 
null slot on the leaf.  This relationship is not required by the 
specifications.

 

Most paths for creating arrays follow this pattern so it would be esoteric to 
hit this bug, but we should still fix it.

 

All branches that rely on reading nullness should generate a new null bitmap 
based on definition levels if the column is nested, and decisions should be 
based off of that.
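
A minimal sketch of deriving validity from definition levels, assuming a column with no repeated ancestors (the function name and the example levels are hypothetical, and this is not the column_writer.cc implementation):

```python
def validity_from_def_levels(def_levels, level_if_present):
    """Return one validity flag per slot.

    A slot is valid when its definition level reaches `level_if_present`,
    meaning the node at that depth and all of its ancestors are defined.
    """
    return [level >= level_if_present for level in def_levels]


# Hypothetical column: optional struct `s` (level 1) with an optional
# leaf `x` (level 2).  Level 0 means `s` itself is null.
def_levels = [2, 1, 0, 2]
struct_validity = validity_from_def_levels(def_levels, 1)
leaf_validity = validity_from_def_levels(def_levels, 2)
null_count = leaf_validity.count(False)
```

In this example the third slot is null at the parent, and the derived leaf bitmap marks it null regardless of what the leaf array's own bitmap says, which is the property the issue asks for. Repeated fields would need additional handling not shown here.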





[jira] [Created] (ARROW-9598) [C++][Parquet] Spaced definition levels is not assigned correctly.

2020-07-29 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9598:
--

 Summary: [C++][Parquet]  Spaced definition levels is not assigned 
correctly.
 Key: ARROW-9598
 URL: https://issues.apache.org/jira/browse/ARROW-9598
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


The existing code assumes that there is only a single repeated parent.  Code 
needs to backtrack until null or a repeated parent.  Unfortunately, without a 
read path that can read mixed struct/repeated values we can't fully test the 
fix.





[jira] [Created] (ARROW-9528) [Python] Honor tzinfo information when converting from datetime to pyarrow

2020-07-20 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9528:
--

 Summary: [Python] Honor tzinfo information when converting from 
datetime to pyarrow
 Key: ARROW-9528
 URL: https://issues.apache.org/jira/browse/ARROW-9528
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Micah Kornfield
 Fix For: 1.0.0








[jira] [Created] (ARROW-9310) Use feature enum in java

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9310:
--

 Summary: Use feature enum in java
 Key: ARROW-9310
 URL: https://issues.apache.org/jira/browse/ARROW-9310
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Micah Kornfield








[jira] [Created] (ARROW-9311) [Javascript] Use feature enum in javascript

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9311:
--

 Summary: [Javascript] Use feature enum in javascript
 Key: ARROW-9311
 URL: https://issues.apache.org/jira/browse/ARROW-9311
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Java
Reporter: Micah Kornfield








[jira] [Created] (ARROW-9309) Start writing out feature enums to value (umbrella issue)

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9309:
--

 Summary: Start writing out feature enums to value (umbrella issue)
 Key: ARROW-9309
 URL: https://issues.apache.org/jira/browse/ARROW-9309
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Micah Kornfield


Proposed logic:

1.  Add the flag for dictionary replacement support, where appropriate, if 
there is a possibility it can be used.

2.  Only add compressed buffers when requested.





[jira] [Created] (ARROW-9314) [Go] Use Feature enum

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9314:
--

 Summary: [Go] Use Feature enum
 Key: ARROW-9314
 URL: https://issues.apache.org/jira/browse/ARROW-9314
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Go
Reporter: Micah Kornfield








[jira] [Created] (ARROW-9313) [Rust] Use feature enum

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9313:
--

 Summary: [Rust] Use feature enum
 Key: ARROW-9313
 URL: https://issues.apache.org/jira/browse/ARROW-9313
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Reporter: Micah Kornfield








[jira] [Created] (ARROW-9312) [C++] Use feature enum

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9312:
--

 Summary: [C++] Use feature enum
 Key: ARROW-9312
 URL: https://issues.apache.org/jira/browse/ARROW-9312
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield








[jira] [Created] (ARROW-9308) Add Feature enum to schema.fbs for forward compatibility

2020-07-02 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9308:
--

 Summary: Add Feature enum to schema.fbs for forward compatibility
 Key: ARROW-9308
 URL: https://issues.apache.org/jira/browse/ARROW-9308
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Format
Reporter: Micah Kornfield
Assignee: Micah Kornfield








[jira] [Created] (ARROW-9264) [C++] Cleanup Parquet Arrow Schema code

2020-06-28 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9264:
--

 Summary: [C++] Cleanup Parquet Arrow Schema code
 Key: ARROW-9264
 URL: https://issues.apache.org/jira/browse/ARROW-9264
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


We need a function/class that can take the parquet schema and a proposed arrow 
schema (potentially retrieved from parquet metadata) and output a data 
structure that contains all of the information in "SchemaField" and the 
following additional options:

 

1.  Corresponding definition level for nullability (wouldn't be populated for 
non-null arrays).

2.  Corresponding repetition level for lists (wouldn't be populated for 
non-lists).

3.  Definition level for "empty lists" (wouldn't be populated for legacy 
two-level encoded lists).

 

One option is to augment and populate these on the SchemaField.
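
As a rough sketch of how such levels fall out of a schema walk (plain Python with a hypothetical Node class, not the parquet-cpp schema types): an optional node adds one definition level, a repeated node adds one definition level and one repetition level, and a required node adds neither.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    repetition: str  # "required", "optional" or "repeated"
    children: List["Node"] = field(default_factory=list)


def leaf_levels(node, def_level=0, rep_level=0, path=""):
    """Map each leaf path to (max definition level, max repetition level)."""
    if node.repetition == "optional":
        def_level += 1
    elif node.repetition == "repeated":
        def_level += 1
        rep_level += 1
    full = f"{path}.{node.name}" if path else node.name
    if not node.children:
        return {full: (def_level, rep_level)}
    levels = {}
    for child in node.children:
        levels.update(leaf_levels(child, def_level, rep_level, full))
    return levels


# Three-level list encoding: optional group, repeated group, optional element.
schema = Node(
    "a", "optional", [Node("list", "repeated", [Node("element", "optional")])]
)
levels = leaf_levels(schema)
```

A fuller version would also record the per-node levels listed above (nullability level, list repetition level, empty-list level) on each intermediate node rather than only the leaf maxima.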





[jira] [Created] (ARROW-9223) Fix to_pandas() export for timestamps within structs

2020-06-24 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9223:
--

 Summary: Fix to_pandas() export for timestamps within structs
 Key: ARROW-9223
 URL: https://issues.apache.org/jira/browse/ARROW-9223
 Project: Apache Arrow
  Issue Type: Bug
Reporter: Micah Kornfield


Currently timestamps within structs unilaterally have their timezone discarded 
for backwards compatibility reasons.  There is a TODO in the code to come up 
with a better solution.  This Jira tracks the solution.





[jira] [Resolved] (ARROW-7955) [Java] Support large buffer for file/stream IPC

2020-06-11 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-7955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-7955.

Resolution: Fixed

> [Java] Support large buffer for file/stream IPC
> ---
>
> Key: ARROW-7955
> URL: https://issues.apache.org/jira/browse/ARROW-7955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> After supporting 64-bit ArrowBuf, we need to make file/stream IPC work.





[jira] [Created] (ARROW-9049) [C++] Add a Result<> returning method for constructing a dictionary

2020-06-06 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-9049:
--

 Summary: [C++] Add a Result<> returning method for 
constructing a dictionary
 Key: ARROW-9049
 URL: https://issues.apache.org/jira/browse/ARROW-9049
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++
Reporter: Micah Kornfield
Assignee: Micah Kornfield


Dictionary types require a signed integer index type.  Today there is a DCHECK 
that this is the case in the constructor.  

When reading data from an unknown source it is possible due to corruption (or 
user error) that the dictionary index type is not signed. We should add a 
method that checks for signedness and use that at all system boundaries to 
validate input data.
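
The idea above can be illustrated with a minimal sketch. It is written in Python rather than C++, and every name in it (`Result`, `DictionaryType`, `make_dictionary_type`, the string type names) is hypothetical and not part of the Arrow API. It only shows the pattern being proposed: a factory that reports an invalid index type as an error value instead of relying on a debug-only assertion.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins for Arrow's signed integer types.
SIGNED_INDEX_TYPES = {"int8", "int16", "int32", "int64"}


@dataclass(frozen=True)
class DictionaryType:
    index_type: str
    value_type: str


@dataclass(frozen=True)
class Result:
    """Holds either a value or an error message, never both."""
    value: Optional[DictionaryType] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.error is None


def make_dictionary_type(index_type: str, value_type: str) -> Result:
    """Check signedness up front and return an error instead of aborting."""
    if index_type not in SIGNED_INDEX_TYPES:
        return Result(
            error=f"dictionary index type must be a signed integer, got {index_type}"
        )
    return Result(value=DictionaryType(index_type, value_type))


# A reader at a system boundary would check the result before using it.
good = make_dictionary_type("int32", "utf8")
bad = make_dictionary_type("uint8", "utf8")
```

With this shape, code that parses untrusted input can propagate `bad.error` to the caller, while the constructor itself can keep its debug check for internal callers.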





[jira] [Comment Edited] (ARROW-9039) py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions

2020-06-04 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126325#comment-17126325
 ] 

Micah Kornfield edited comment on ARROW-9039 at 6/5/20, 2:59 AM:
-

Thank you for the report.  This is intended behavior; I think the documentation 
was clarified as of 0.16 or 0.15 
([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]).
  Serialize/Deserialize do not provide backward compatibility.  You need to 
use IPC functionality 
([https://arrow.apache.org/docs/python/ipc.html#streaming-serialization-and-ipc])
 for compatibility guarantees (0.11 is quite old, but I don't think anything 
should have been broken between versions).


was (Author: emkornfi...@gmail.com):
Thank you for the report.  This is intended behavior; I think the documentation 
was clarified as of 0.16 or 0.15 
([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]).
  Serialize/Deserialize do not provide backward compatibility.  You need to 
use IPC functionality for compatibility guarantees (0.11 is quite old, but I 
don't think anything should have been broken between versions).

> py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent 
> versions
> -
>
> Key: ARROW-9039
> URL: https://issues.apache.org/jira/browse/ARROW-9039
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.15.1
> Environment: python, windows
>Reporter: Yoav Git
>Priority: Minor
>
> I have been saving dataframes into mongodb using:
> {{import pandas as pd; import pyarrow as pa}}
> {{df = pd.DataFrame([[1,2,3],[2,3,4]], columns = ['x','y','z'])}}
> {{byte = pa.serialize(df).to_buffer().to_pybytes()}}
> and then reading back using:
> {{df = pa.deserialize(pa.py_buffer(memoryview(byte)))}}
> However, pyarrow is not backward compatible: versions 0.11.1 and 0.15.1 can 
> each read the pybytes they created themselves, but they cannot read each other's. 
>  





[jira] [Comment Edited] (ARROW-9039) py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions

2020-06-04 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126325#comment-17126325
 ] 

Micah Kornfield edited comment on ARROW-9039 at 6/5/20, 2:58 AM:
-

Thank you for the report.  This is intended behavior; I think the documentation 
was clarified as of 0.16 or 0.15 
([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]).
  Serialize/Deserialize do not provide backward compatibility.  You need to 
use IPC functionality for compatibility guarantees (0.11 is quite old, but I 
don't think anything should have been broken between versions).


was (Author: emkornfi...@gmail.com):
This is intended behavior; I think the documentation was clarified as of 0.16 
or 0.15 
([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]).
  Serialize/Deserialize do not provide backward compatibility.  You need to 
use IPC functionality for compatibility guarantees (0.11 is quite old, but I 
don't think anything should have been broken between versions).

> py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent 
> versions
> -
>
> Key: ARROW-9039
> URL: https://issues.apache.org/jira/browse/ARROW-9039
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.15.1
> Environment: python, windows
>Reporter: Yoav Git
>Priority: Minor
>
> I have been saving dataframes into mongodb using:
> {{import pandas as pd; import pyarrow as pa}}
> {{df = pd.DataFrame([[1,2,3],[2,3,4]], columns = ['x','y','z'])}}
> {{byte = pa.serialize(df).to_buffer().to_pybytes()}}
> and then reading back using:
> {{df = pa.deserialize(pa.py_buffer(memoryview(byte)))}}
> However, pyarrow is not backward compatible: versions 0.11.1 and 0.15.1 can 
> each read the pybytes they created themselves, but they cannot read each other's. 
>  





[jira] [Commented] (ARROW-9039) py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent versions

2020-06-04 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-9039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17126325#comment-17126325
 ] 

Micah Kornfield commented on ARROW-9039:


This is intended behavior; I think the documentation was clarified as of 0.16 
or 0.15 
([https://arrow.apache.org/docs/python/generated/pyarrow.serialize.html#pyarrow.serialize]).
  Serialize/Deserialize do not provide backward compatibility.  You need to 
use IPC functionality for compatibility guarantees (0.11 is quite old, but I 
don't think anything should have been broken between versions).

> py_bytes created by pyarrow 0.11.1 cannot be deserialized by more recent 
> versions
> -
>
> Key: ARROW-9039
> URL: https://issues.apache.org/jira/browse/ARROW-9039
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.11.1, 0.15.1
> Environment: python, windows
>Reporter: Yoav Git
>Priority: Minor
>
> I have been saving dataframes into mongodb using:
> {{import pandas as pd; import pyarrow as pa}}
> {{df = pd.DataFrame([[1,2,3],[2,3,4]], columns = ['x','y','z'])}}
> {{byte = pa.serialize(df).to_buffer().to_pybytes()}}
> and then reading back using:
> {{df = pa.deserialize(pa.py_buffer(memoryview(byte)))}}
> However, pyarrow is not backward compatible: versions 0.11.1 and 0.15.1 can 
> each read the pybytes they created themselves, but they cannot read each other's. 
>  





[jira] [Comment Edited] (ARROW-4144) [Java] Arrow-to-JDBC

2020-06-02 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123402#comment-17123402
 ] 

Micah Kornfield edited comment on ARROW-4144 at 6/2/20, 6:56 AM:
-

[~uwe] have you come across a use-case for writing to JDBC sources?


was (Author: emkornfi...@gmail.com):
@uwe have you come across a use-case for writing to JDBC sources?

> [Java] Arrow-to-JDBC
> 
>
> Key: ARROW-4144
> URL: https://issues.apache.org/jira/browse/ARROW-4144
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Chen
>Priority: Major
>
> ARROW-1780 reads a query from a JDBC data source and converts the ResultSet 
> to an Arrow VectorSchemaRoot.  However, there is no built-in adapter for 
> writing an Arrow VectorSchemaRoot back to the database.
> ARROW-3966 adds JDBC field metadata:
>  * The Catalog Name
>  * The Table Name
>  * The Field Name
>  * The Field Type
> We can use this information to ask for the field information from the 
> database via the 
> [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html]
>  object.  We can then create INSERT or UPDATE statements based on the [list 
> of primary 
> keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)]
>  in the table:
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> NULL, insert that record into the database.
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> not NULL, update the existing record in the database.
> We can also perform the same data conversion in reverse based on the field 
> types queried from the database.
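
The insert-or-update rule described in the quoted issue can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's standard `sqlite3` module instead of JDBC, plain dictionaries instead of an Arrow VectorSchemaRoot, and a hypothetical helper named `write_rows`. The table and column names are assumed to be trusted identifiers, since they are interpolated into the SQL text.

```python
import sqlite3


def write_rows(conn, table, primary_key, rows):
    """Insert rows whose primary key is None; update all other rows."""
    for row in rows:
        columns = [c for c in row if c != primary_key]
        values = [row[c] for c in columns]
        if row.get(primary_key) is None:
            # No key yet: let the database assign one.
            placeholders = ", ".join("?" for _ in columns)
            sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
            conn.execute(sql, values)
        else:
            # Key present: overwrite the existing record.
            assignments = ", ".join(f"{c} = ?" for c in columns)
            sql = f"UPDATE {table} SET {assignments} WHERE {primary_key} = ?"
            conn.execute(sql, values + [row[primary_key]])
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO people (id, name) VALUES (1, 'old')")

write_rows(conn, "people", "id", [
    {"id": 1, "name": "updated"},      # existing key, so UPDATE
    {"id": None, "name": "inserted"},  # NULL key, so INSERT
])

result = conn.execute("SELECT id, name FROM people ORDER BY id").fetchall()
```

A real adapter would obtain the primary key columns from the database metadata rather than taking them as an argument, and would batch the statements instead of executing them row by row.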





[jira] [Commented] (ARROW-4144) [Java] Arrow-to-JDBC

2020-06-02 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17123402#comment-17123402
 ] 

Micah Kornfield commented on ARROW-4144:


@uwe have you come across a use-case for writing to JDBC sources?

> [Java] Arrow-to-JDBC
> 
>
> Key: ARROW-4144
> URL: https://issues.apache.org/jira/browse/ARROW-4144
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Java
>Reporter: Michael Pigott
>Assignee: Chen
>Priority: Major
>
> ARROW-1780 reads a query from a JDBC data source and converts the ResultSet 
> to an Arrow VectorSchemaRoot.  However, there is no built-in adapter for 
> writing an Arrow VectorSchemaRoot back to the database.
> ARROW-3966 adds JDBC field metadata:
>  * The Catalog Name
>  * The Table Name
>  * The Field Name
>  * The Field Type
> We can use this information to ask for the field information from the 
> database via the 
> [DatabaseMetaData|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html]
>  object.  We can then create INSERT or UPDATE statements based on the [list 
> of primary 
> keys|https://docs.oracle.com/javase/7/docs/api/java/sql/DatabaseMetaData.html#getPrimaryKeys(java.lang.String,%20java.lang.String,%20java.lang.String)]
>  in the table:
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> NULL, insert that record into the database.
>  * If the value in the VectorSchemaRoot corresponding to the primary key is 
> not NULL, update the existing record in the database.
> We can also perform the same data conversion in reverse based on the field 
> types queried from the database.





[jira] [Resolved] (ARROW-8972) [Java] Support range value comparison for large varchar/varbinary vectors

2020-06-02 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-8972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield resolved ARROW-8972.

Resolution: Fixed

> [Java] Support range value comparison for large varchar/varbinary vectors
> -
>
> Key: ARROW-8972
> URL: https://issues.apache.org/jira/browse/ARROW-8972
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Java
>Reporter: Liya Fan
>Assignee: Liya Fan
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Support comparing a range of values for LargeVarCharVector and 
> LargeVarBinaryVector.





[jira] [Updated] (ARROW-9000) Java build crashes with JDK14

2020-06-02 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-9000:
---
Component/s: Java

> Java build crashes with JDK14
> -
>
> Key: ARROW-9000
> URL: https://issues.apache.org/jira/browse/ARROW-9000
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current master tree does not build with JDK14. The issue seems to be caused 
> by the Error Prone plugin:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile 
> (default-compile) on project arrow-memory: Compilation failure
> [ERROR] 
> /Users/laurent/devel/arrow/java/memory/src/main/java/org/apache/arrow/memory/BufferLedger.java:[545,15]
>  error: An unhandled exception was thrown by the Error Prone static analysis 
> plugin.
> [ERROR]  Please report this at 
> https://github.com/google/error-prone/issues/new and include the following:
> [ERROR]   
> [ERROR]  error-prone version: 2.3.3
> [ERROR]  BugPattern: TypeParameterUnusedInFormals
> [ERROR]  Stack Trace:
> [ERROR]  java.lang.NoSuchFieldError: bound
> [ERROR]   at 
> com.google.errorprone.bugpatterns.TypeParameterUnusedInFormals.matchMethod(TypeParameterUnusedInFormals.java:71)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.processMatchers(ErrorProneScanner.java:433)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:725)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCMethodDecl.accept(JCTree.java:916)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:90)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.visitClass(TreeScanner.java:187)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:535)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:823)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.visitCompilationUnit(TreeScanner.java:144)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:546)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:603)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:56)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:55)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScannerTransformer.apply(ErrorProneScannerTransformer.java:43)
> [ERROR]   at 
> com.google.errorprone.ErrorProneAnalyzer.finished(ErrorProneAnalyzer.java:151)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.api.MultiTaskListener.finished(MultiTaskListener.java:132)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1423)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1370)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:316)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:176)
> [ERROR]   at 

[jira] [Updated] (ARROW-9000) [Java] build crashes with JDK14

2020-06-02 Thread Micah Kornfield (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Micah Kornfield updated ARROW-9000:
---
Summary: [Java] build crashes with JDK14  (was: Java build crashes with 
JDK14)

> [Java] build crashes with JDK14
> ---
>
> Key: ARROW-9000
> URL: https://issues.apache.org/jira/browse/ARROW-9000
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Java
>Reporter: Laurent Goujon
>Assignee: Laurent Goujon
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The current master tree does not build with JDK14. The issue seems to be caused 
> by the Error Prone plugin:
> {noformat}
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.6.2:compile 
> (default-compile) on project arrow-memory: Compilation failure
> [ERROR] 
> /Users/laurent/devel/arrow/java/memory/src/main/java/org/apache/arrow/memory/BufferLedger.java:[545,15]
>  error: An unhandled exception was thrown by the Error Prone static analysis 
> plugin.
> [ERROR]  Please report this at 
> https://github.com/google/error-prone/issues/new and include the following:
> [ERROR]   
> [ERROR]  error-prone version: 2.3.3
> [ERROR]  BugPattern: TypeParameterUnusedInFormals
> [ERROR]  Stack Trace:
> [ERROR]  java.lang.NoSuchFieldError: bound
> [ERROR]   at 
> com.google.errorprone.bugpatterns.TypeParameterUnusedInFormals.matchMethod(TypeParameterUnusedInFormals.java:71)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.processMatchers(ErrorProneScanner.java:433)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:725)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitMethod(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCMethodDecl.accept(JCTree.java:916)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:90)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.visitClass(TreeScanner.java:187)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:535)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitClass(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCClassDecl.accept(JCTree.java:823)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:82)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:71)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:45)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scan(TreeScanner.java:105)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.scanAndReduce(TreeScanner.java:113)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreeScanner.visitCompilationUnit(TreeScanner.java:144)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:546)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScanner.visitCompilationUnit(ErrorProneScanner.java:150)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.tree.JCTree$JCCompilationUnit.accept(JCTree.java:603)
> [ERROR]   at 
> jdk.compiler/com.sun.source.util.TreePathScanner.scan(TreePathScanner.java:56)
> [ERROR]   at com.google.errorprone.scanner.Scanner.scan(Scanner.java:55)
> [ERROR]   at 
> com.google.errorprone.scanner.ErrorProneScannerTransformer.apply(ErrorProneScannerTransformer.java:43)
> [ERROR]   at 
> com.google.errorprone.ErrorProneAnalyzer.finished(ErrorProneAnalyzer.java:151)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.api.MultiTaskListener.finished(MultiTaskListener.java:132)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1423)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.flow(JavaCompiler.java:1370)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.JavaCompiler.compile(JavaCompiler.java:959)
> [ERROR]   at 
> jdk.compiler/com.sun.tools.javac.main.Main.compile(Main.java:316)
> [ERROR]   at 
> 
