[jira] [Created] (ARROW-10203) Capture guidance for endianness support in contributors guide.

2020-10-06 Thread Micah Kornfield (Jira)
Micah Kornfield created ARROW-10203:
---

 Summary: Capture guidance for endianness support in contributors 
guide.
 Key: ARROW-10203
 URL: https://issues.apache.org/jira/browse/ARROW-10203
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Documentation
Reporter: Micah Kornfield
Assignee: Micah Kornfield


https://mail-archives.apache.org/mod_mbox/arrow-dev/202009.mbox/%3ccak7z5t--hhhr9dy43pyhd6m-xou4qogwqvlwzsg-koxxjpt...@mail.gmail.com%3e



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (ARROW-10202) [CI][Windows] Use sf.net mirror for MSYS2

2020-10-06 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10202:


 Summary: [CI][Windows] Use sf.net mirror for MSYS2
 Key: ARROW-10202
 URL: https://issues.apache.org/jira/browse/ARROW-10202
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-10201) [C++][CI] Disable S3 in arm64 job on Travis CI

2020-10-06 Thread Kouhei Sutou (Jira)
Kouhei Sutou created ARROW-10201:


 Summary: [C++][CI] Disable S3 in arm64 job on Travis CI
 Key: ARROW-10201
 URL: https://issues.apache.org/jira/browse/ARROW-10201
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++, Continuous Integration
Reporter: Kouhei Sutou
Assignee: Kouhei Sutou








[jira] [Created] (ARROW-10200) [Java][CI] Fix failure of Java CI on s390x

2020-10-06 Thread Kazuaki Ishizaki (Jira)
Kazuaki Ishizaki created ARROW-10200:


 Summary: [Java][CI] Fix failure of Java CI on s390x
 Key: ARROW-10200
 URL: https://issues.apache.org/jira/browse/ARROW-10200
 Project: Apache Arrow
  Issue Type: Bug
Affects Versions: 2.0.0
Reporter: Kazuaki Ishizaki


ARROW-9701 causes the following failure because the library 
{{libprotobuf.so.18}} is missing.
{code:java}
...
[ERROR] PROTOC FAILED: 
/arrow/java/flight/flight-core/target/protoc-plugins/protoc-3.7.1-linux-s390_64.exe:
 error while loading shared libraries: libprotobuf.so.18: cannot open shared 
object file: No such file or directory
[ERROR] /arrow/java/flight/flight-core/../../../format/Flight.proto [0:0]: 
/arrow/java/flight/flight-core/target/protoc-plugins/protoc-3.7.1-linux-s390_64.exe:
 error while loading shared libraries: libprotobuf.so.18: cannot open shared 
object file: No such file or directory
...
{code}





[jira] [Created] (ARROW-10199) [Rust][Parquet] Release Parquet at crates.io to remove debug prints

2020-10-06 Thread Jira
Krzysztof Stanisławek created ARROW-10199:
-

 Summary: [Rust][Parquet] Release Parquet at crates.io to remove 
debug prints
 Key: ARROW-10199
 URL: https://issues.apache.org/jira/browse/ARROW-10199
 Project: Apache Arrow
  Issue Type: Wish
  Components: Rust
Affects Versions: 1.0.1
Reporter: Krzysztof Stanisławek


The version of Parquet released to docs.rs and crates.io contains debug prints in 
[https://docs.rs/crate/parquet/1.0.1/source/src/column/writer.rs] (line 30). 
They were hard to track down, so I suggest considering a logging crate in 
the future. When will the new version be released? Is there a stable release 
schedule I can rely on?

Is it recommended to use the current snapshot directly from GitHub instead of 
crates.io?





[jira] [Created] (ARROW-10198) [Dev] Python merge script doesn't close PRs if not merged on master

2020-10-06 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-10198:
--

 Summary: [Dev] Python merge script doesn't close PRs if not merged 
on master
 Key: ARROW-10198
 URL: https://issues.apache.org/jira/browse/ARROW-10198
 Project: Apache Arrow
  Issue Type: Bug
  Components: Developer Tools
Affects Versions: 1.0.1
Reporter: Neville Dipale


When the merge script is used to merge a PR into a branch other than master, 
the PR on GitHub is not closed.
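One possible fix, sketched here under the assumption that the script should close the PR itself instead of relying on GitHub to do so, is a {{PATCH}} request to the GitHub REST API endpoint {{/repos/{owner}/{repo}/pulls/{pull_number}}} with the body {{{"state": "closed"}}}. The helper {{build_close_pr_request}} and the PR number are hypothetical, and the sketch only builds the request without sending it.

{code:python}
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_close_pr_request(owner, repo, pr_number, token):
    """Build (but do not send) a request that closes a GitHub pull request.

    Hypothetical helper: the existing merge script has no such function.
    """
    url = f"{GITHUB_API}/repos/{owner}/{repo}/pulls/{pr_number}"
    body = json.dumps({"state": "closed"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Accept": "application/vnd.github.v3+json",
            "Authorization": f"token {token}",
            "Content-Type": "application/json",
        },
    )


# 1234 is a placeholder PR number.
req = build_close_pr_request("apache", "arrow", 1234, "dummy-token")
print(req.get_method(), req.full_url)
# Sending it would be: urllib.request.urlopen(req)
{code}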





[jira] [Created] (ARROW-10197) [Gandiva][python] Execute expression on filtered data

2020-10-06 Thread Kirill Lykov (Jira)
Kirill Lykov created ARROW-10197:


 Summary: [Gandiva][python] Execute expression on filtered data
 Key: ARROW-10197
 URL: https://issues.apache.org/jira/browse/ARROW-10197
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++ - Gandiva, Python
Reporter: Kirill Lykov


There appears to be no way to execute an expression on filtered data in Python: 
I cannot pass a `SelectionVector` to the projector's `evaluate` method.

{code:python}
import pyarrow as pa
import pyarrow.gandiva as gandiva

table = pa.Table.from_arrays([pa.array([1., 31., 46., 3., 57., 44., 22.]),
                              pa.array([5., 45., 36., 73.,
                                        83., 23., 76.])],
                             ['a', 'b'])

builder = gandiva.TreeExprBuilder()
node_a = builder.make_field(table.schema.field("a"))
node_b = builder.make_field(table.schema.field("b"))
fifty = builder.make_literal(50.0, pa.float64())
eleven = builder.make_literal(11.0, pa.float64())

# Condition: (a < 50 AND a > b) OR b < 11
cond_1 = builder.make_function("less_than", [node_a, fifty], pa.bool_())
cond_2 = builder.make_function("greater_than", [node_a, node_b],
                               pa.bool_())
cond_3 = builder.make_function("less_than", [node_b, eleven], pa.bool_())
cond = builder.make_or([builder.make_and([cond_1, cond_2]), cond_3])
condition = builder.make_condition(cond)

row_filter = gandiva.make_filter(table.schema, condition)
# filter_result has type SelectionVector
filter_result = row_filter.evaluate(table.to_batches()[0],
                                    pa.default_memory_pool())
print(filter_result)

sum_node = builder.make_function("add", [node_a, node_b], pa.float64())
field_result = pa.field("c", pa.float64())
expr = builder.make_expression(sum_node, field_result)
projector = gandiva.make_projector(
    table.schema, [expr], pa.default_memory_pool())

# Problem: there is no way to pass filter_result to the projector.
# This is the call I would like to make:
r, = projector.evaluate(table.to_batches()[0], filter_result)
{code}

In C++, a SelectionVector can be passed as the second argument to 
Projector::Evaluate: 
[https://github.com/apache/arrow/blob/c5fa23ea0e15abe47b35524fa6a79c7b8c160fa0/cpp/src/gandiva/tests/filter_project_test.cc#L270]

The Python binding in `gandiva.pyx`, however, does not appear to expose it: 
[https://github.com/apache/arrow/blob/a4eb08d54ee0d4c0d0202fa0a2dfa8af7aad7a05/python/pyarrow/gandiva.pyx#L154]
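To make the request concrete, the plain-Python model below (not the Gandiva API) shows what filter-then-project evaluation computes: the filter yields the indices of matching rows, which is what a SelectionVector holds, and the projection is evaluated only at those indices. The function names {{evaluate_filter}} and {{project_add}} are hypothetical; the data and expressions mirror the reproducer in this issue.

{code:python}
def evaluate_filter(a, b):
    """Return the row indices where (a < 50 and a > b) or b < 11."""
    return [i for i, (x, y) in enumerate(zip(a, b))
            if (x < 50 and x > y) or y < 11]


def project_add(a, b, selection=None):
    """Evaluate a + b on every row, or only on the selected row indices."""
    rows = range(len(a)) if selection is None else selection
    return [a[i] + b[i] for i in rows]


a = [1., 31., 46., 3., 57., 44., 22.]
b = [5., 45., 36., 73., 83., 23., 76.]

selection = evaluate_filter(a, b)
print(selection)                       # [0, 2, 5]
print(project_add(a, b, selection))    # [6.0, 82.0, 67.0]
{code}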





[jira] [Created] (ARROW-10196) [C++] Add Future::DeferNotOk()

2020-10-06 Thread Ben Kietzman (Jira)
Ben Kietzman created ARROW-10196:


 Summary: [C++] Add Future::DeferNotOk()
 Key: ARROW-10196
 URL: https://issues.apache.org/jira/browse/ARROW-10196
 Project: Apache Arrow
  Issue Type: Improvement
  Components: C++
Affects Versions: 1.0.1
Reporter: Ben Kietzman
Assignee: Ben Kietzman
 Fix For: 2.0.0


Provide a static method mapping Result<Future<T>> -> Future<T>. If the Result 
is an error, a finished future containing its Status will be constructed.
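The intended semantics can be modelled in Python with {{concurrent.futures.Future}}. This is an analogy only, not the Arrow C++ API: a failed Result is represented here by an exception instance, and {{defer_not_ok}} is a hypothetical name mirroring the proposed C++ method.

{code:python}
from concurrent.futures import Future


def defer_not_ok(maybe_future):
    """Map a 'Result<Future>' to a 'Future'.

    maybe_future is either a Future (the OK case) or an exception
    (standing in for an error Status).
    """
    if isinstance(maybe_future, BaseException):
        # Error case: build an already-finished future carrying the error.
        failed = Future()
        failed.set_exception(maybe_future)
        return failed
    # OK case: pass the wrapped future through unchanged.
    return maybe_future


ok = Future()
ok.set_result(42)
same = defer_not_ok(ok)

err = defer_not_ok(IOError("could not open file"))
print(err.done(), err.exception())  # True could not open file
{code}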





[jira] [Created] (ARROW-10195) [C++] Add string struct extract kernel using re2

2020-10-06 Thread Maarten Breddels (Jira)
Maarten Breddels created ARROW-10195:


 Summary: [C++] Add string struct extract kernel using re2
 Key: ARROW-10195
 URL: https://issues.apache.org/jira/browse/ARROW-10195
 Project: Apache Arrow
  Issue Type: New Feature
Reporter: Maarten Breddels
Assignee: Maarten Breddels


Similar to pandas' str.extract, provide a way to convert a string to a struct of 
strings using the re2 regex library (when the pattern has named capture groups). 
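For illustration, the per-row behaviour resembles the plain-Python sketch below, which uses the standard {{re}} module rather than re2 (both accept the {{(?P<name>...)}} named-group syntax). Each string maps to a dict keyed by group name, standing in for a struct, and a null or non-matching string maps to {{None}}. The function name and pattern are hypothetical.

{code:python}
import re


def extract_struct(strings, pattern):
    """Map each string to {group_name: captured_text}, or None if no match."""
    regex = re.compile(pattern)
    out = []
    for s in strings:
        m = None if s is None else regex.search(s)
        out.append(m.groupdict() if m else None)
    return out


rows = extract_struct(["a1", "b2", "3"], r"(?P<letter>[ab])(?P<digit>\d)")
print(rows)
# [{'letter': 'a', 'digit': '1'}, {'letter': 'b', 'digit': '2'}, None]
{code}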





[jira] [Created] (ARROW-10194) [Python] Array.to_numpy() with type fixed_size_list(int64(), 1) doesn't roundtrip for large integer values

2020-10-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-10194:
---

 Summary: [Python] Array.to_numpy() with type 
fixed_size_list(int64(), 1) doesn't roundtrip for large integer values
 Key: ARROW-10194
 URL: https://issues.apache.org/jira/browse/ARROW-10194
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs


Reproducer:

{code:python}
import pyarrow as pa

data = [None, [9007199254740993]]
arr = pa.array(data, type=pa.list_(pa.uint64(), 1))
ndarray = arr.to_numpy(zero_copy_only=False)
restored = pa.array(ndarray, type=arr.type)
assert restored.equals(arr)
{code}

Error:

{code}
E   assert False
E+  where False = (\n[\n  
null,\n  [\n90071992547409944\n  ]\n])
E+where  = \n[\n  null,\n  [\n90071992547409952\n  ]\n].equals
{code}

The inner NumPy array ({{ndarray[1]}}) has a float64 dtype, so the integer is 
rounded: a float64 has a 53-bit significand and cannot represent 
9007199254740993 (2^53 + 1) exactly.
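The precision loss can be reproduced without Arrow: converting 2**53 + 1 to a float64 rounds it to the nearest representable value, 2**53, so the integer does not survive the roundtrip, while 2**53 itself still does.

{code:python}
value = 9007199254740993
assert value == 2**53 + 1

as_float = float(value)            # what a float64 dtype stores
print(int(as_float))               # 9007199254740992
print(int(as_float) == value)      # False: the last bit is lost

# Integers up to 2**53 still roundtrip exactly.
print(int(float(2**53)) == 2**53)  # True
{code}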





[jira] [Created] (ARROW-10193) [Python] Segfault when converting to fixed size binary array

2020-10-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-10193:
---

 Summary: [Python] Segfault when converting to fixed size binary 
array
 Key: ARROW-10193
 URL: https://issues.apache.org/jira/browse/ARROW-10193
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Krisztian Szucs
Assignee: Krisztian Szucs
 Fix For: 2.0.0


Reproducer:
{code:python}
import pyarrow as pa

data = [b'\x19h\r\x9e\x00\x00\x00\x00\x01\x9b\x9fA']
assert len(data[0]) == 12
ty = pa.binary(12)
arr = pa.array(data, type=ty)
{code}

Trace:
{code}
pyarrow/tests/test_convert_builtin.py::test_fixed_size_binary_length_check 
../src/arrow/array/builder_binary.cc:53:  Check failed: (size) == (byte_width_) 
Appending wrong size to FixedSizeBinaryBuilder
0   libarrow.200.0.0.dylib  0x00010e7f9704 
_ZN5arrow4util7CerrLog14PrintBackTraceEv + 52
1   libarrow.200.0.0.dylib  0x00010e7f9622 
_ZN5arrow4util7CerrLogD2Ev + 98
2   libarrow.200.0.0.dylib  0x00010e7f9585 
_ZN5arrow4util7CerrLogD1Ev + 21
3   libarrow.200.0.0.dylib  0x00010e7f95ac 
_ZN5arrow4util7CerrLogD0Ev + 28
4   libarrow.200.0.0.dylib  0x00010e7f9492 
_ZN5arrow4util8ArrowLogD2Ev + 82
5   libarrow.200.0.0.dylib  0x00010e7f94c5 
_ZN5arrow4util8ArrowLogD1Ev + 21
6   libarrow.200.0.0.dylib  0x00010e303ec1 
_ZN5arrow22FixedSizeBinaryBuilder14CheckValueSizeEx + 209
7   libarrow.200.0.0.dylib  0x00010e30c361 
_ZN5arrow22FixedSizeBinaryBuilder12UnsafeAppendEN6nonstd7sv_lite17basic_string_viewIcNSt3__111char_traitsIc
 + 49
8   libarrow_python.200.0.0.dylib   0x00010b4efa7d 
_ZN5arrow2py20PyPrimitiveConverterINS_19FixedSizeBinaryTypeEvE6AppendEP7_object 
+ 813
{code}

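The input {{const char*}} value is implicitly converted to {{string_view}}, whose 
constructor determines the length with {{strlen}}. Because the data contains a 
NUL byte at offset 4, the computed length is 4 instead of 12, which makes the 
length check fail in debug builds.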
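The truncation can be illustrated from Python with {{ctypes}}: reading the same bytes as a NUL-terminated C string (the {{strlen}} semantics used when a {{const char*}} is converted to a {{string_view}}) stops at the first {{\x00}}, yielding 4 bytes instead of 12. This only models the C++ behaviour and does not call into Arrow.

{code:python}
import ctypes

data = b'\x19h\r\x9e\x00\x00\x00\x00\x01\x9b\x9fA'

# Length as Python (and the type pa.binary(12)) sees it.
print(len(data))                       # 12

# Length as a NUL-terminated C string sees it (strlen semantics).
as_c_string = ctypes.c_char_p(data).value
print(as_c_string, len(as_c_string))   # b'\x19h\r\x9e' 4
{code}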





[jira] [Created] (ARROW-10192) [C++][Python] Segfault when converting nested struct array with dictionary field to pandas series

2020-10-06 Thread Krisztian Szucs (Jira)
Krisztian Szucs created ARROW-10192:
---

 Summary: [C++][Python] Segfault when converting nested struct 
array with dictionary field to pandas series
 Key: ARROW-10192
 URL: https://issues.apache.org/jira/browse/ARROW-10192
 Project: Apache Arrow
  Issue Type: Bug
  Components: C++, Python
Reporter: Krisztian Szucs


Reproducer:

{code:python}
import pyarrow as pa


def test_struct_array_with_dictionary_field_to_pandas():
    ty = pa.struct([
        pa.field('dict', pa.dictionary(pa.int64(), pa.int32())),
    ])
    data = [
        {'dict': -1859762450}
    ]
    arr = pa.array(data, type=ty)
    arr.to_pandas()
{code}

Raises SIGSTOP:
{code}
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x7fff6e2b733a libsystem_kernel.dylib`__pthread_kill + 10
frame #1: 0x7fff6e373e60 libsystem_pthread.dylib`pthread_kill + 430
frame #2: 0x7fff6e1ce93e libsystem_c.dylib`raise + 26
frame #3: 0x7fff6e3685fd libsystem_platform.dylib`_sigtramp + 29
frame #4: 0x00011517adfd 
libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x7f84fc5a0230,
 data=0x7f84fc59ef18, out_values=0x7f84fc53d140) at 
arrow_to_pandas.cc:685:54
frame #5: 0x00011514c642 
libarrow_python.200.0.0.dylib`arrow::py::ObjectWriterVisitor::Visit(this=0x7ffee06a1a88,
 type=0x7f84fc5a00e8) at arrow_to_pandas.cc:1031:12
frame #6: 0x0001151499c4 libarrow_python.200.0.0.dylib`arrow::Status 
arrow::VisitTypeInline(type=0x7f84fc5a00e8, 
visitor=0x7ffee06a1a88) at visitor_inline.h:88:5
frame #7: 0x000115149305 
libarrow_python.200.0.0.dylib`arrow::py::ObjectWriter::CopyInto(this=0x7f84fc5a0228,
 data=std::__1::shared_ptr::element_type @ 
0x7f84fc59ef18 strong=2 weak=1, rel_placement=0) at arrow_to_pandas.cc:1055:12
{code}

{code:cpp}
frame #4: 0x00011517adfd 
libarrow_python.200.0.0.dylib`arrow::py::ConvertStruct(options=0x7f84fc5a0230,
 data=0x7f84fc59ef18, out_values=0x7f84fc53d140) at 
arrow_to_pandas.cc:685:54
   682        if (!arr->field(static_cast(field_idx))->IsNull(i)) {
   683          // Value exists in child array, obtain it
   684          auto array = reinterpret_cast(fields_data[field_idx].obj());
-> 685          auto ptr = reinterpret_cast(PyArray_GETPTR1(array, i));
   686          field_value.reset(PyArray_GETITEM(array, ptr));
   687          RETURN_IF_PYERROR();
   688        } else {
{code}





[jira] [Created] (ARROW-10191) [Rust] [Parquet] Add roundtrip tests for single column batches

2020-10-06 Thread Neville Dipale (Jira)
Neville Dipale created ARROW-10191:
--

 Summary: [Rust] [Parquet] Add roundtrip tests for single column 
batches
 Key: ARROW-10191
 URL: https://issues.apache.org/jira/browse/ARROW-10191
 Project: Apache Arrow
  Issue Type: Sub-task
  Components: Rust
Affects Versions: 1.0.1
Reporter: Neville Dipale


To improve test coverage and catch information loss in Parquet and Arrow 
roundtrips, we can add tests asserting that every supported Arrow data type can 
be written and read back correctly.





[jira] [Created] (ARROW-10190) [Website] Add Jorge to list of committers

2020-10-06 Thread Jira
Jorge Leitão created ARROW-10190:


 Summary: [Website] Add Jorge to list of committers
 Key: ARROW-10190
 URL: https://issues.apache.org/jira/browse/ARROW-10190
 Project: Apache Arrow
  Issue Type: Improvement
Reporter: Jorge Leitão
Assignee: Jorge Leitão





