[jira] [Commented] (ARROW-6566) Implement VarChar in Scala

2019-09-18 Thread Micah Kornfield (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932144#comment-16932144
 ] 

Micah Kornfield commented on ARROW-6566:


It would help if you could clarify how this is failing. One thing that springs 
to mind is that you may want to use 
[setSafe|https://arrow.apache.org/docs/java/org/apache/arrow/vector/BaseVariableWidthVector.html#setSafe-int-byte:A-]
instead of set.

> Implement VarChar in Scala
> --
>
> Key: ARROW-6566
> URL: https://issues.apache.org/jira/browse/ARROW-6566
> Project: Apache Arrow
>  Issue Type: Test
>  Components: Java
>Affects Versions: 0.14.1
>Reporter: Boris V.Kuznetsov
>Priority: Major
>
> Hello
> I'm trying to write and read a zio.Chunk of strings, which is essentially an 
> array of strings.
> My implementation fails the test; how should I fix that?
> [Writer|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L48]
>  code
> [Reader|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L108]
>  code
> [Test|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/test/scala/arrow/Base.scala#L115]
>  code
> Any help, links and advice are highly appreciated
> Thank you!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6578) [Python] Casting int64 to string columns

2019-09-18 Thread Igor Yastrebov (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932161#comment-16932161
 ] 

Igor Yastrebov commented on ARROW-6578:
---

[~pitrou] When I benchmarked it on 16 csv files ~200 MB in size, using 
read+cast(safe=False) was >10% faster than read with ConvertOptions.

This doesn't account for string->int64->string conversions since they aren't 
implemented :)

> [Python] Casting int64 to string columns
> 
>
> Key: ARROW-6578
> URL: https://issues.apache.org/jira/browse/ARROW-6578
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Affects Versions: 0.14.1
>Reporter: Igor Yastrebov
>Priority: Major
>
> I wanted to cast a list of tables to the same schema so I could use 
> concat_tables later. However, I encountered ArrowNotImplementedError:
> {code:java}
> ---
> ArrowNotImplementedError  Traceback (most recent call last)
>  in 
> > 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
>  in (.0)
> > 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi
>  in itercolumns()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi
>  in pyarrow.lib.Column.cast()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\error.pxi
>  in pyarrow.lib.check_status()
> ArrowNotImplementedError: No cast implemented from int64 to string
> {code}
> Some context: I want to read and concatenate a bunch of csv files that come 
> from partitioning of the same table. Using cast after reading csv is usually 
> significantly faster than specifying column_types in ConvertOptions. There 
> are string columns that are mostly populated with integer-like values so a 
> particular file can have an integer-only column. This situation is rather 
> common so having an option to cast int64 column to string column would be 
> helpful.
>  





[jira] [Commented] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932188#comment-16932188
 ] 

Antoine Pitrou commented on ARROW-6003:
---

Can you post the first lines of the CSV file?

> [C++] Better input validation and error messaging in CSV reader
> ---
>
> Key: ARROW-6003
> URL: https://issues.apache.org/jira/browse/ARROW-6003
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: csv
>
> Followup to https://issues.apache.org/jira/browse/ARROW-5747. The error 
> message(s) are not great when you give bad input. For example, if I give too 
> many or too few {{column_names}}, the error I get is {{Invalid: Empty CSV 
> file}}. In fact, that's about the only error message I've seen from the CSV 
> reader, no matter what I've thrown at it.
> It would be better if error messages were more specific so that I as a user 
> might know how to fix my bad input.





[jira] [Commented] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932190#comment-16932190
 ] 

Antoine Pitrou commented on ARROW-6003:
---

Here is an example in Python:
{code:python}
>>> import io
>>> from pyarrow import csv
>>> s = b"""a,b,c\n1,2,3\n"""
>>> csv.read_csv(io.BytesIO(s))
pyarrow.Table
a: int64
b: int64
c: int64
>>> options = csv.ReadOptions(column_names=['a', 'b'])
>>> csv.read_csv(io.BytesIO(s), read_options=options)
Traceback (most recent call last):
  File "", line 1, in 
    csv.read_csv(io.BytesIO(s), read_options=options)
  File "pyarrow/_csv.pyx", line 541, in pyarrow._csv.read_csv
    check_status(reader.get().Read(&table))
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
ArrowInvalid: CSV parse error: Expected 2 columns, got 3
{code}
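
A check that yields a message of this shape can be sketched in a few lines of C++. This is an illustration only, not Arrow's CSV parser: CountFields and CheckColumnCount are hypothetical names, and the field counter ignores quoting and escaping.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Counts comma-separated fields in one line.
// Sketch only: quoted fields and escaped commas are not handled.
std::size_t CountFields(const std::string& line) {
  std::size_t n = 1;
  for (char c : line) {
    if (c == ',') ++n;
  }
  return n;
}

// Returns an empty string when the row has as many fields as there are
// column names, and a specific error message otherwise.
std::string CheckColumnCount(const std::vector<std::string>& column_names,
                             const std::string& first_row) {
  const std::size_t got = CountFields(first_row);
  if (got == column_names.size()) return "";
  return "CSV parse error: Expected " + std::to_string(column_names.size()) +
         " columns, got " + std::to_string(got);
}
```

Called with the column names a, b and the row 1,2,3, CheckColumnCount returns "CSV parse error: Expected 2 columns, got 3"; a row with a matching field count yields an empty string.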


> [C++] Better input validation and error messaging in CSV reader
> ---
>
> Key: ARROW-6003
> URL: https://issues.apache.org/jira/browse/ARROW-6003
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Neal Richardson
>Assignee: Neal Richardson
>Priority: Major
>  Labels: csv
>





[jira] [Commented] (ARROW-6445) [CI][Crossbow] Nightly Gandiva jar trusty job fails

2019-09-18 Thread Prudhvi Porandla (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932198#comment-16932198
 ] 

Prudhvi Porandla commented on ARROW-6445:
-

Resolved

Build is not failing https://travis-ci.org/ursa-labs/crossbow/builds/585944097

 

> [CI][Crossbow] Nightly Gandiva jar trusty job fails
> ---
>
> Key: ARROW-6445
> URL: https://issues.apache.org/jira/browse/ARROW-6445
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Continuous Integration, Packaging
>Reporter: Neal Richardson
>Assignee: Praveen Kumar Desabandu
>Priority: Blocker
> Fix For: 0.15.0
>
>
> https://travis-ci.org/ursa-labs/crossbow/builds/580192384. Error looks like 
> something to do with doubleconversion.





[jira] [Commented] (ARROW-4018) [C++] RLE decoder may not big-endian compatible

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932199#comment-16932199
 ] 

Antoine Pitrou commented on ARROW-4018:
---

> Are we actually clean in terms of endianness in other places?

Presumably not, because we're reinterpreting array bytes as larger types such as 
int64_t etc. And we're also serializing those bytes directly to disk or over the wire.

> It sounds strange to be slicing a long like Coverity describes. Have you 
> looked to see if this is intended?

I think it's just a dirty implementation shortcut. Instead of doing:
{code:cpp}
bool result =
    bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
                              reinterpret_cast<T*>(&current_value_));
{code}
The code could presumably be written as:
{code:cpp}
T value;
bool result =
    bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)), &value);
current_value_ = static_cast<uint64_t>(value);
{code}
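
The idea of reading into a correctly typed local and then widening by value can be sketched outside Arrow. In the block below, GetAlignedSketch is a hypothetical stand-in for the reader call and ReadWidened is a hypothetical wrapper; neither is Arrow code. The sketch only avoids writing a 64-bit object through a narrower pointer. It does not byte-swap, so a little-endian stream read on a big-endian host would still need a separate conversion.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a reader that copies `num_bytes` raw bytes
// from `src` into *out. It only mimics the shape of GetAligned.
template <typename T>
bool GetAlignedSketch(const std::uint8_t* src, int num_bytes, T* out) {
  if (num_bytes <= 0 || num_bytes > static_cast<int>(sizeof(T))) return false;
  T tmp = 0;
  std::memcpy(&tmp, src, static_cast<std::size_t>(num_bytes));
  *out = tmp;
  return true;
}

// Reads into a local of the narrow type T, then widens by value.
// The 64-bit destination is never accessed through a T* pointer.
template <typename T>
bool ReadWidened(const std::uint8_t* src, int num_bytes,
                 std::uint64_t* current_value) {
  T value = 0;
  if (!GetAlignedSketch<T>(src, num_bytes, &value)) return false;
  *current_value = static_cast<std::uint64_t>(value);
  return true;
}
```

Because the widening is a value conversion, the upper bytes of the destination are always zeroed, whatever the host byte order.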



> [C++] RLE decoder may not big-endian compatible
> ---
>
> Key: ARROW-4018
> URL: https://issues.apache.org/jira/browse/ARROW-4018
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.11.1
>Reporter: Antoine Pitrou
>Priority: Major
> Fix For: 1.0.0
>
>
> This issue was found by Coverity. The {{RleDecoder::NextCounts}} method has 
> the following code to fetch the repeated literal in repeated runs:
> {code:c++}
> bool result =
>     bit_reader_.GetAligned<T>(static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
>                               reinterpret_cast<T*>(&current_value_));
> {code}
> Coverity says this:
> bq. Pointer "&this->current_value_" points to an object whose effective type 
> is "unsigned long long" (64 bits, unsigned) but is dereferenced as a narrower 
> "unsigned int" (32 bits, unsigned). This may lead to unexpected results 
> depending on machine endianness.
> bq. 
> In addition, it's not obvious whether {{current_value_}} also needs 
> byte-swapping (presumably, at least in the Parquet file format, it's supposed 
> to be stored in little-endian format in the RLE bitstream).





[jira] [Created] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Antoine Pitrou (Jira)
Antoine Pitrou created ARROW-6597:
-

 Summary: [Python] Segfault in test_pandas with Python 2.7
 Key: ARROW-6597
 URL: https://issues.apache.org/jira/browse/ARROW-6597
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
Reporter: Antoine Pitrou
Assignee: Antoine Pitrou


I get a segfault in test_pandas with Python 2.7.

gdb stack trace (excerpt):
{code}
Thread 27 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb7fff700 (LWP 17725)]
0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, 
unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at 
../src/arrow/python/datetime.cc:229
229   *out = PyDate_FromDate(static_cast(year), 
static_cast(month),
(gdb) bt
#0  0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, 
unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at 
../src/arrow/python/datetime.cc:229
#1  0x7fffcabaed34 in arrow::Status 
arrow::py::ConvertDates(arrow::py::PandasOptions const&, 
arrow::ChunkedArray const&, _object**)::{lambda(int, 
_object**)#1}::operator()(int, _object**) const (this=0x7fffb7ffde90, 
value=10957, out=0x55e1b9b0) at ../src/arrow/python/arrow_to_pandas.cc:657
#2  0x7fffcabaeb8c in arrow::Status 
arrow::py::ConvertAsPyObjects(arrow::py::PandasOptions const&, 
arrow::ChunkedArray const&, _object**)::{lambda(int, 
_object**)#1}&>(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, 
arrow::Status 
arrow::py::ConvertDates(arrow::py::PandasOptions const&, 
arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&, 
_object**)::{lambda(int const&, _object**)#1}::operator()(int const, _object**) 
const (this=0x7fffb7ffdd88, value=@0x7fffb7ffdcbc: 10957, 
out_values=0x55e1b9b0)
at ../src/arrow/python/arrow_to_pandas.cc:417
{code}






[jira] [Updated] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6597:
--
Fix Version/s: 0.15.0

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
> Fix For: 0.15.0
>
>





[jira] [Updated] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated ARROW-6597:
--
Labels: pull-request-available  (was: )

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>





[jira] [Comment Edited] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932293#comment-16932293
 ] 

Krisztian Szucs edited comment on ARROW-6597 at 9/18/19 10:47 AM:
--

[~pitrou] why wasn't it caught by the CI?


was (Author: kszucs):
[~pitrou] why wasn't it cached by the CI?

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Commented] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Krisztian Szucs (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932293#comment-16932293
 ] 

Krisztian Szucs commented on ARROW-6597:


[~pitrou] why wasn't it caught by the CI?

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Commented] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932294#comment-16932294
 ] 

Antoine Pitrou commented on ARROW-6597:
---

I have no idea :-/ Perhaps you don't install Pandas on the 2.7 builder?

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>





[jira] [Commented] (ARROW-1013) [C++] Add asynchronous RecordBatchStreamWriter

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932297#comment-16932297
 ] 

Antoine Pitrou commented on ARROW-1013:
---

Ah, right. I suppose so. This should be designed depending on the intended use 
cases.

> [C++] Add asynchronous RecordBatchStreamWriter
> --
>
> Key: ARROW-1013
> URL: https://issues.apache.org/jira/browse/ARROW-1013
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
>
> We may want to provide an option to limit the queuing depth. The async writer 
> can be initialized from a synchronous writer
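
One possible shape for such a writer, sketched under assumptions: AsyncWriterSketch is a hypothetical class, std::string stands in for a record batch, and the synchronous writer is reduced to a callback. Producers block once max_depth batches are queued, which is one way to limit the queuing depth; this is not a proposed Arrow API.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <string>
#include <thread>
#include <utility>

// Hypothetical sketch: runs a synchronous write callback on a background
// thread and bounds the number of queued batches.
class AsyncWriterSketch {
 public:
  AsyncWriterSketch(std::function<void(const std::string&)> sync_write,
                    std::size_t max_depth)
      : sync_write_(std::move(sync_write)),
        max_depth_(max_depth == 0 ? 1 : max_depth),  // a depth of 0 would deadlock
        worker_([this] { Run(); }) {}

  ~AsyncWriterSketch() { Close(); }

  // Blocks while the queue is full, which bounds memory use.
  void Write(std::string batch) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < max_depth_; });
    queue_.push_back(std::move(batch));
    not_empty_.notify_one();
  }

  // Lets the worker drain the queue, then joins it. Safe to call twice.
  void Close() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      closed_ = true;
    }
    not_empty_.notify_one();
    if (worker_.joinable()) worker_.join();
  }

 private:
  void Run() {
    for (;;) {
      std::string batch;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        not_empty_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return;  // closed and fully drained
        batch = std::move(queue_.front());
        queue_.pop_front();
      }
      not_full_.notify_one();
      sync_write_(batch);  // the blocking write runs off the caller's thread
    }
  }

  std::function<void(const std::string&)> sync_write_;
  std::size_t max_depth_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
  std::deque<std::string> queue_;
  bool closed_ = false;
  std::thread worker_;  // declared last so the other members exist before it starts
};
```

A single worker thread keeps batches in submission order. Whether a full queue should block the caller, as here, or return an error is a design choice that depends on the intended use cases.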





[jira] [Updated] (ARROW-1013) [C++] Add asynchronous RecordBatchStreamWriter

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-1013:
--
Fix Version/s: 2.0.0

> [C++] Add asynchronous RecordBatchStreamWriter
> --
>
> Key: ARROW-1013
> URL: https://issues.apache.org/jira/browse/ARROW-1013
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> We may want to provide an option to limit the queuing depth. The async writer 
> can be initialized from a synchronous writer





[jira] [Updated] (ARROW-2229) [C++] Write CSV files from RecordBatch, Table

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2229:
--
Fix Version/s: 2.0.0

> [C++] Write CSV files from RecordBatch, Table
> -
>
> Key: ARROW-2229
> URL: https://issues.apache.org/jira/browse/ARROW-2229
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Jun
>Priority: Major
> Fix For: 2.0.0
>
>
> I did a search through JIRA and didn't find this. Is support for CSV 
> file reading/writing available in Arrow?
> I can go through pandas.read_csv and then convert to an Arrow table of course, 
> but I would also like to use a native Arrow API for schema-driven CSV 
> reading/writing.





[jira] [Resolved] (ARROW-5273) [C++] Valgrind failures in JSON tests

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5273.
---
Resolution: Cannot Reproduce

I can't reproduce anymore, closing.

> [C++] Valgrind failures in JSON tests
> -
>
> Key: ARROW-5273
> URL: https://issues.apache.org/jira/browse/ARROW-5273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Antoine Pitrou
>Priority: Major
>
> I get the following failures with Valgrind:
> {code}
> ==12630== Memcheck, a memory error detector
> ==12630== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
> ==12630== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
> ==12630== Command: 
> /home/antoine/arrow/dev/cpp/build-test/debug//arrow-json-chunker-test
> ==12630== 
> Running main() from 
> /home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc
> [==] Running 12 tests from 3 test cases.
> [--] Global test environment set-up.
> [--] 4 tests from ChunkerTest
> [ RUN  ] ChunkerTest.PrettyPrinted
> ==12630== Conditional jump or move depends on uninitialised value(s)
> ==12630==at 0x15757F: 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::ScanCopyUnescapedString(arrow::rapidjson::GenericStringStream
>  >&, arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::StackStream&) (reader.h:942)
> ==12630==by 0x155FAA: void 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::ParseStringToStream<0u, 
> arrow::rapidjson::UTF8, arrow::rapidjson::UTF8, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::StackStream 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::StackStream&) (reader.h:856)
> ==12630==by 0x1537E0: void 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::ParseString<0u, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator> 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>&, bool) (reader.h:827)
> ==12630==by 0x152141: void 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::ParseValue<0u, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator> 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>&) (reader.h:1397)
> ==12630==by 0x153CB4: void 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::CrtAllocator>::ParseObject<0u, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator> 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>&) (reader.h:621)
> ==12630==by 0x15215A: void 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::ParseValue<0u, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator> 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>&) (reader.h:1398)
> ==12630==by 0x1503CC: arrow::rapidjson::ParseResult 
> arrow::rapidjson::GenericReader, 
> arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::Parse<0u, 
> arrow::rapidjson::GenericStringStream >, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator> 
> >(arrow::rapidjson::GenericStringStream >&, 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>&) (reader.h:501)
> ==12630==by 0x14E385: 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>& 
> arrow::rapidjson::GenericDocument, 
> arrow::rapidjson::MemoryPoolAllocator, 
> arrow::rapidjson::CrtAllocator>::ParseStream<0u, 
> arrow::rapidjson::UTF8, 
> arrow::rapidjson::GenericStringStre

[jira] [Updated] (ARROW-981) [C++] Write comparable columnar serialization benchmarks versus Protocol Buffers / gRPC

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-981:
-
Fix Version/s: 2.0.0

> [C++] Write comparable columnar serialization benchmarks versus Protocol 
> Buffers / gRPC
> ---
>
> Key: ARROW-981
> URL: https://issues.apache.org/jira/browse/ARROW-981
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> This will help with demonstrating quantifiable gains in data serialization 
> beyond the benefits of columnar layout (which can also be implemented in 
> traditional serialization tools like Protobuf, Thrift, etc.)





[jira] [Updated] (ARROW-5121) [C++] arrow::internal::make_unique conflicts std::make_unique on MSVC

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5121:
--
Fix Version/s: 1.0.0

> [C++] arrow::internal::make_unique conflicts std::make_unique on MSVC
> -
>
> Key: ARROW-5121
> URL: https://issues.apache.org/jira/browse/ARROW-5121
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
> Fix For: 1.0.0
>
>
> MSVC appears to implement C++20 ADL, which includes function templates with 
> explicit template arguments (previously these were not looked up through ADL):
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/23604480/job/psvu16jasktacvy2#L2097





[jira] [Updated] (ARROW-5327) [C++] allow construction of ArrayBuilders from existing arrays

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5327:
--
Fix Version/s: 2.0.0

> [C++] allow construction of ArrayBuilders from existing arrays
> --
>
> Key: ARROW-5327
> URL: https://issues.apache.org/jira/browse/ARROW-5327
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
> Fix For: 2.0.0
>
>
> After calling Finish it may become necessary to append further elements to an 
> array, which we don't currently support. One way to support this would be 
> consuming the array to produce a builder with the array's elements 
> pre-inserted.
> {code}
> std::shared_ptr<Array> array = get_array();
> std::unique_ptr<ArrayBuilder> builder;
> RETURN_NOT_OK(MakeBuilder(std::move(*array), &builder));
> {code}
> This will be efficient if we cannibalize the array's buffers and child data 
> when constructing the builder, which will require that the consumed array is 
> uniquely owned.





[jira] [Resolved] (ARROW-5618) [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers integer overflow in some cases

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5618.
---
Resolution: Duplicate

> [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers 
> integer overflow in some cases
> -
>
> Key: ARROW-5618
> URL: https://issues.apache.org/jira/browse/ARROW-5618
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: TP Boudreau
>Assignee: TP Boudreau
>Priority: Minor
>  Labels: parquet, pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> When storing Arrow timestamps in Parquet files using the Int96 storage 
> format, certain combinations of array lengths and validity bitmasks cause an 
> integer overflow error on read.  It's not immediately clear whether the 
> Arrow/Parquet writer is storing zeroes when it should be storing positive 
> values or the reader is attempting to calculate a nanoseconds value 
> inappropriately from zeroed inputs (perhaps missing the null bit flag).  Also 
> not immediately clear why only certain length columns seem to be affected.
> Probably the quickest way to reproduce this undefined behavior is to alter 
> the existing unit test UseDeprecatedInt96 (in file 
> .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling 
> its column lengths (repeating the same values), followed by 'make unittest' 
> using clang-7 with sanitizers enabled.  (Here's a patch applicable to current 
> master that changes the test as described: [1]; I used the following cmake 
> command to build my environment: [2].)  You should get a log something like 
> [3].  If requested, I'll see if I can put together a stand-alone minimal test 
> case that induces the behavior.
> The quick-hack at [4] will prevent integer overflows, but this is only 
> included to confirm the proximate cause of the bug: the Julian days field of 
> the Int96 appears to be zero, when a strictly positive number is expected.
> I've assigned the issue to myself and I'll start looking into the root cause 
> of this.
> [1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e
> [2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9
> [3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d
> [4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f
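
To illustrate why a zero Julian days field can lead to an overflow, here is a minimal Python sketch of the arithmetic. It assumes the commonly used Int96 layout (8 bytes of nanoseconds within the day followed by a 4-byte Julian day, little-endian) and a signed 64-bit nanosecond result. The function names are hypothetical and this is not the Parquet reader's actual code.

```python
import struct

JULIAN_DAY_OF_UNIX_EPOCH = 2440588          # Julian day number of 1970-01-01
NANOS_PER_DAY = 86_400 * 1_000_000_000
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def int96_to_unix_nanos(raw: bytes) -> int:
    """Decode 12 bytes (assumed layout: nanos-of-day, then Julian day)."""
    nanos_of_day, julian_day = struct.unpack("<qi", raw)
    return (julian_day - JULIAN_DAY_OF_UNIX_EPOCH) * NANOS_PER_DAY + nanos_of_day


def fits_in_int64(value: int) -> bool:
    """Python integers are unbounded, so the range check is explicit."""
    return INT64_MIN <= value <= INT64_MAX


# A fully zeroed Int96 (for example a null slot) has Julian day 0.
zeroed = int96_to_unix_nanos(bytes(12))
# A well-formed value on the Unix epoch decodes to 0 nanoseconds.
epoch = int96_to_unix_nanos(struct.pack("<qi", 0, JULIAN_DAY_OF_UNIX_EPOCH))

print(epoch, fits_in_int64(epoch))
print(zeroed, fits_in_int64(zeroed))
```

Under these assumptions a zeroed value lies roughly 2.4 million days before the epoch, which is far outside what a 64-bit nanosecond count can represent, so any reader that computes it without first checking validity would overflow.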





[jira] [Updated] (ARROW-5915) [C++] [Python] Set up testing for backwards compatibility of the parquet reader

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5915:
--
Fix Version/s: 1.0.0

> [C++] [Python] Set up testing for backwards compatibility of the parquet 
> reader
> ---
>
> Key: ARROW-5915
> URL: https://issues.apache.org/jira/browse/ARROW-5915
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++, Python
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: parquet
> Fix For: 1.0.0
>
>
> Given the recent parquet compat problems, we should have better testing for 
> this.
> For easy testing of backwards compatibility, we could add some files (with 
> different types) written with older versions, and ensure they are read 
> correctly with the current version.
> This is similar to what Kartothek is doing: 
> https://github.com/JDASoftwareGroup/kartothek/tree/master/reference-data/arrow-compat
> An easy way would be to do that in pyarrow and add them to 
> /pyarrow/tests/data/parquet (we already have some files from 0.7 there). 





[jira] [Resolved] (ARROW-5354) [C++] allow Array to have null buffers when all elements are null

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-5354.
---
Resolution: Won't Fix

As discussed in the linked PR, this isn't currently desirable.

> [C++] allow Array to have null buffers when all elements are null
> -
>
> Key: ARROW-5354
> URL: https://issues.apache.org/jira/browse/ARROW-5354
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> In the case of all elements of an array being null, no buffers whatsoever 
> *need* to be allocated (similar to NullArray). This is a more extreme case of 
> the optimization which allows the null bitmap buffer to be null if all 
> elements are valid. Currently {{arrow::Array}} requires at least a null 
> bitmap buffer to be allocated (and all bits set to 0).
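
As a rough illustration of the existing optimization mentioned above, the following Python sketch builds a validity bitmap using least-significant-bit numbering and skips the allocation when every element is valid. The helper name `validity_bitmap` is hypothetical; this is a toy model, not the `arrow::Array` implementation.

```python
def validity_bitmap(values):
    """Return a validity bitmap (1 = valid, 0 = null), or None if all valid."""
    if all(v is not None for v in values):
        return None  # existing optimization: no bitmap when nothing is null
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # least-significant bit first
    return bytes(bitmap)


print(validity_bitmap([1, 2, 3]))
print(validity_bitmap([1, None, 3]))
# The all-null case still allocates a bitmap whose bits are all 0.
print(validity_bitmap([None] * 10))
```

The all-null case is the one the issue proposed to special-case: the bitmap carries no information beyond the length, yet it is still allocated and zeroed.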





[jira] [Updated] (ARROW-6398) [C++] consolidate ScanOptions and ScanContext

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6398:
--
Fix Version/s: 1.0.0

> [C++] consolidate ScanOptions and ScanContext
> -
>
> Key: ARROW-6398
> URL: https://issues.apache.org/jira/browse/ARROW-6398
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
>  Labels: dataset, pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently ScanOptions has two distinct responsibilities: it contains the data 
> selector (and eventually projection schema) for the current scan and it 
> serves as the base class for format specific scan options.
> In addition, we have ScanContext which holds the memory pool for the current 
> scan.
> I think these classes should be rearranged as follows: ScanOptions will be 
> removed and FileScanOptions will be the abstract base class for format 
> specific scan options. ScanContext will be a concrete struct and contain the 
> data selector, projection schema, a vector of FileScanOptions, and any other 
> shared scan state.
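
The proposed arrangement can be sketched as follows. This is a Python model of the shape of the classes only; the real code is C++, and `ParquetScanOptions`, `batch_size`, and all field names here are hypothetical placeholders rather than Arrow's API.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, List, Optional


class FileScanOptions(ABC):
    """Abstract base for format-specific scan options."""

    @abstractmethod
    def file_type(self) -> str:
        ...


@dataclass
class ParquetScanOptions(FileScanOptions):
    """Hypothetical format-specific options."""

    batch_size: int = 1024  # hypothetical setting

    def file_type(self) -> str:
        return "parquet"


@dataclass
class ScanContext:
    """Concrete holder for everything shared by one scan."""

    selector: Any = None                          # data selector
    projection_schema: Optional[List[str]] = None
    file_options: List[FileScanOptions] = field(default_factory=list)
    memory_pool: Any = None                       # previously ScanContext's only job


context = ScanContext(file_options=[ParquetScanOptions()])
print([options.file_type() for options in context.file_options])
```

The point of the split is that the base class no longer carries per-scan state, so each format only has to describe its own options.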





[jira] [Commented] (ARROW-5575) [C++] arrowConfig.cmake includes uninstalled targets

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932310#comment-16932310
 ] 

Antoine Pitrou commented on ARROW-5575:
---

[~kou]

 

> [C++] arrowConfig.cmake includes uninstalled targets
> 
>
> Key: ARROW-5575
> URL: https://issues.apache.org/jira/browse/ARROW-5575
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0, 0.14.0, 0.14.1
>Reporter: Matthijs Brobbel
>Priority: Minor
>
> I'm building a CMake project against arrow and I'm using:
> {code:java}
> find_package(arrow 0.13 CONFIG REQUIRED)
> {code}
> to get the arrow_shared target in scope. This works for me on macOS. I 
> installed apache-arrow with:
> {code:java}
> brew install apache-arrow{code}
> However, when I attempt to build the project in a ubuntu xenial container, I 
> get the following CMake error:
> {code:java}
> CMake Error at /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake:151 
> (message):
> The imported target "arrow_cuda_shared" references the file
> "/usr/lib/x86_64-linux-gnu/libarrow_cuda.so.13.0.0"
> but this file does not exist. Possible reasons include:
> * The file was deleted, renamed, or moved to another location.
> * An install or uninstall procedure did not complete successfully.
> * The installation package was faulty and contained
> "/usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake"
> but not all the files it references.
> Call Stack (most recent call first):
> /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowConfig.cmake:61 (include)
> CMakeLists.txt:15 (find_package)
> {code}
> I installed arrow with:
> {code:java}
> curl -sSL "https://dist.apache.org/repos/dist/dev/arrow/KEYS" | apt-key add -
> echo "deb [arch=amd64] https://dl.bintray.com/apache/arrow/ubuntu/ xenial 
> main" | tee -a /etc/apt/sources.list
> apt-get update
> apt-get install -y libarrow-dev=0.13.0-1
> {code}
> I can also install libarrow-cuda-dev, but I don't want to because I don't 
> need it.





[jira] [Updated] (ARROW-5575) [C++] arrowConfig.cmake includes uninstalled targets

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5575:
--
Fix Version/s: 1.0.0

> [C++] arrowConfig.cmake includes uninstalled targets
> 
>
> Key: ARROW-5575
> URL: https://issues.apache.org/jira/browse/ARROW-5575
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0, 0.14.0, 0.14.1
>Reporter: Matthijs Brobbel
>Priority: Minor
> Fix For: 1.0.0
>
>
> I'm building a CMake project against arrow and I'm using:
> {code:java}
> find_package(arrow 0.13 CONFIG REQUIRED)
> {code}
> to get the arrow_shared target in scope. This works for me on macOS. I 
> installed apache-arrow with:
> {code:java}
> brew install apache-arrow{code}
> However, when I attempt to build the project in a ubuntu xenial container, I 
> get the following CMake error:
> {code:java}
> CMake Error at /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake:151 
> (message):
> The imported target "arrow_cuda_shared" references the file
> "/usr/lib/x86_64-linux-gnu/libarrow_cuda.so.13.0.0"
> but this file does not exist. Possible reasons include:
> * The file was deleted, renamed, or moved to another location.
> * An install or uninstall procedure did not complete successfully.
> * The installation package was faulty and contained
> "/usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake"
> but not all the files it references.
> Call Stack (most recent call first):
> /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowConfig.cmake:61 (include)
> CMakeLists.txt:15 (find_package)
> {code}
> I installed arrow with:
> {code:java}
> curl -sSL "https://dist.apache.org/repos/dist/dev/arrow/KEYS" | apt-key add -
> echo "deb [arch=amd64] https://dl.bintray.com/apache/arrow/ubuntu/ xenial 
> main" | tee -a /etc/apt/sources.list
> apt-get update
> apt-get install -y libarrow-dev=0.13.0-1
> {code}
> I can also install libarrow-cuda-dev, but I don't want to because I don't 
> need it.





[jira] [Updated] (ARROW-6187) [C++] fallback to storage type when writing ExtensionType to Parquet

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6187:
--
Fix Version/s: 1.0.0

> [C++] fallback to storage type when writing ExtensionType to Parquet
> 
>
> Key: ARROW-6187
> URL: https://issues.apache.org/jira/browse/ARROW-6187
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Joris Van den Bossche
>Priority: Major
>  Labels: parquet
> Fix For: 1.0.0
>
>
> Writing a table that contains an ExtensionType array to a parquet file is not 
> yet implemented. It currently raises "ArrowNotImplementedError: Unhandled 
> type for Arrow to Parquet schema conversion: 
> extension" (for a PyExtensionType in this case).
> I think minimal support can consist of writing the storage type / array. 
> We also might want to save the extension name and metadata in the parquet 
> FileMetadata. 
> Later on, this could potentially be used to restore the extension type 
> when reading. This is related to other issues that need to save the arrow 
> schema (categorical: ARROW-5480, time zones: ARROW-5888). Only in this case, 
> we probably want to store the serialised type in addition to the schema 
> (which only has the extension type's name). 





[jira] [Resolved] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined

2019-09-18 Thread Krisztian Szucs (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Krisztian Szucs resolved ARROW-6339.

Resolution: Fixed

Issue resolved by pull request 5403
[https://github.com/apache/arrow/pull/5403]

> [Python][C++] Rowgroup statistics for pd.NaT array ill defined
> --
>
> Key: ARROW-6339
> URL: https://issues.apache.org/jira/browse/ARROW-6339
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.14.1
>Reporter: Florian Jetter
>Assignee: Uwe L. Korn
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When initialising an array with NaT-only values, the row group statistics are 
> corrupt: reading them either returns random values or raises integer 
> out-of-bounds exceptions.
> {code:python}
> import io
> import pandas as pd
> import pyarrow as pa
> import pyarrow.parquet as pq
> df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")})
> buf = pa.BufferOutputStream()
> pq.write_table(pa.Table.from_pandas(df), buf, version="2.0")
> buf = io.BytesIO(buf.getvalue().to_pybytes())
> parquet_file = pq.ParquetFile(buf)
> # Asserting behaviour is difficult since it is random and the state is
> # ill-defined. After a few iterations an exception is raised.
> while True:
>     parquet_file.metadata.row_group(0).column(0).statistics.max
> {code}





[jira] [Updated] (ARROW-1318) [C++] hdfs access with auth

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-1318:
--
Fix Version/s: 2.0.0

> [C++] hdfs access with auth
> ---
>
> Key: ARROW-1318
> URL: https://issues.apache.org/jira/browse/ARROW-1318
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Martin Durant
>Priority: Major
> Fix For: 2.0.0
>
>
> A wide variety of authentication schemes are available in hadoop.
> This issue is to track whether libhdfs can successfully operate with them. 
> The list includes:
> - user/password
> - basic kerberos (via kinit and via keytabs)
> - kerberos with active directory and single-sign-on
> - "privacy" and "integrity" modes
> - access with hdfs delegation token
> - probably others...





[jira] [Updated] (ARROW-4864) [C++] gandiva-micro_benchmarks is broken in MSVC build

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4864:
--
Fix Version/s: 1.0.0

> [C++] gandiva-micro_benchmarks is broken in MSVC build
> --
>
> Key: ARROW-4864
> URL: https://issues.apache.org/jira/browse/ARROW-4864
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Assignee: Pindikura Ravindra
>Priority: Major
> Fix For: 1.0.0
>
>
> Not a blocking issue for 0.13. I encountered this when debugging the CMake 
> refactor branch with Visual Studio 2015





[jira] [Updated] (ARROW-5570) [C++] Update Avro C++ code to conform to Arrow style guide and get it compiling.

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5570:
--
Fix Version/s: 2.0.0

> [C++] Update Avro C++ code to conform to Arrow style guide and get it 
> compiling.
> 
>
> Key: ARROW-5570
> URL: https://issues.apache.org/jira/browse/ARROW-5570
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
> Fix For: 2.0.0
>
>






[jira] [Updated] (ARROW-3201) [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes available

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3201:
--
Fix Version/s: 2.0.0

> [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes 
> available
> 
>
> Key: ARROW-3201
> URL: https://issues.apache.org/jira/browse/ARROW-3201
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, FlightRPC
>Reporter: Wes McKinney
>Priority: Major
>  Labels: flight
> Fix For: 2.0.0
>
>
> This has been discussed for a couple of years now; perhaps with Abseil this 
> could happen at some point:
> https://github.com/protocolbuffers/protobuf/issues/1896
> Using zero-copy proto parsing (which is standard practice inside Google, but 
> not available in open source protocol buffers) would obviate the need for the 
> zero-copy workaround that I'm going to implement for C++ Flight RPCs





[jira] [Updated] (ARROW-5569) [C++] import avro C++ code to code base.

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5569:
--
Fix Version/s: 2.0.0

> [C++] import avro C++ code to code base.
> 
>
> Key: ARROW-5569
> URL: https://issues.apache.org/jira/browse/ARROW-5569
> Project: Apache Arrow
>  Issue Type: Sub-task
>  Components: C++
>Reporter: Micah Kornfield
>Assignee: Micah Kornfield
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> The goal here is to take code as is without compiling it, but flattening it 
> to conform with Arrow's code base standards.  This will give a basis for 
> future PRs.





[jira] [Updated] (ARROW-5933) [C++] [Documentation] add discussion of Union.typeIds to Layout.rst

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5933:
--
Fix Version/s: 1.0.0

> [C++] [Documentation] add discussion of Union.typeIds to Layout.rst 
> 
>
> Key: ARROW-5933
> URL: https://issues.apache.org/jira/browse/ARROW-5933
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Documentation
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> Union.typeIds is poorly documented and the corresponding property in 
> UnionType is confusingly named type_codes. In particular, Layout.rst doesn't 
> include an explanation of Union.typeIds and implies that an element of a 
> union array's type_ids buffer is always the index of a child array.
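
The distinction described above can be shown with a small Python model of a dense union. The codes, child data, and the `resolve` helper are made up for illustration; the only point is that the per-slot buffer stores type codes, which must be mapped to a child index before use.

```python
type_codes = [5, 10]        # hypothetical codes declared on the union type
children = [[1, 2], ["a"]]  # child 0 holds ints, child 1 holds strings
type_ids = [5, 10, 5]       # per-slot buffer: codes, not child indices
offsets = [0, 0, 1]         # dense union: position inside the chosen child


def resolve(type_codes, children, type_ids, offsets):
    """Look up each slot via the code -> child index mapping."""
    code_to_child = {code: index for index, code in enumerate(type_codes)}
    return [
        children[code_to_child[code]][offset]
        for code, offset in zip(type_ids, offsets)
    ]


print(resolve(type_codes, children, type_ids, offsets))
```

Treating `type_ids` as child indices directly would fail here, since neither 5 nor 10 is a valid index into a two-child union.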





[jira] [Updated] (ARROW-4966) [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4966:
--
Fix Version/s: 2.0.0

> [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00
> -
>
> Key: ARROW-4966
> URL: https://issues.apache.org/jira/browse/ARROW-4966
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.12.0
>Reporter: Peter Wicks
>Priority: Major
> Fix For: 2.0.0
>
>
> When reading some ORC files, pyarrow orc throws the following error on 
> `read()`: 
> {code:java}
> o = pf.read(){code}
> {{terminate called after throwing an instance of 'orc::TimezoneError'}}
>  {{what(): Can't open /usr/share/zoneinfo/GMT-00:00}}
> While it's true this folder does not exist, I don't think it normally does. 
> Our server has folders for `GMT`, `GMT0`, `GMT-0`, and `GMT+0`.
> ORC file was created using HIVE, compressed with Snappy. Other files from the 
> same table/partition do not throw this error. Files can be read with Hive.
> We created a soft link from the existing `GMT` timezone to this one, and it 
> fixed the issue. Then shortly I got the same error, but for `GMT+00:00`... :D 
> Soft link fixed this one also.





[jira] [Updated] (ARROW-6479) [C++] inline errors from external projects' build logs

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6479:
--
Fix Version/s: 1.0.0

> [C++] inline errors from external projects' build logs
> --
>
> Key: ARROW-6479
> URL: https://issues.apache.org/jira/browse/ARROW-6479
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Minor
> Fix For: 1.0.0
>
>
> Currently when an external project build fails, we get a very uninformative 
> message:
> {code}
> [88/543] Performing build step for 'flatbuffers_ep'
> FAILED: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc 
> flatbuffers_ep-prefix/src/flatbuffers_ep-install/lib/libflatbuffers.a 
> cd /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-build && 
> /usr/bin/cmake -P 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake
>  && /usr/bin/cmake -E touch 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build
> CMake Error at 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake:16
>  (message):
>   Command failed: 1
>'/usr/bin/cmake' '--build' '.'
>   See also
> 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log
> {code}
> It would be far more useful if the error were caught and the relevant section (or 
> even the entirety) of {{ 
> /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log}}
>  were output instead. This is doubly the case on CI, where accessing those 
> logs is nontrivial.





[jira] [Commented] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932314#comment-16932314
 ] 

Antoine Pitrou commented on ARROW-4917:
---

Do we actually care about this?

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}





[jira] [Updated] (ARROW-5423) [C++] implement partial schema class to extend JSON conversion

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5423:
--
Fix Version/s: 2.0.0

> [C++] implement partial schema class to extend JSON conversion
> --
>
> Key: ARROW-5423
> URL: https://issues.apache.org/jira/browse/ARROW-5423
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
> Fix For: 2.0.0
>
>
> Currently the JSON parser supports only basic conversion rules such as 
> parsing a number to {{int64}}. In general users will want more capable 
> conversions like parsing a base64 string into binary or parsing a column of 
> objects to {{map}} instead of {{struct}}. This 
> will require extension of {{arrow::json::ParseOptions::explicit_schema}} to 
> something analogous to a schema but which supports mapping to more than a 
> simple output type.
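
One way to picture such a "partial schema" is a per-field table of conversion rules applied on top of the default parse. The sketch below uses only the Python standard library; `CONVERTERS` and `parse_row` are hypothetical names and do not correspond to anything in `arrow::json`.

```python
import base64
import json

# Hypothetical partial schema: only the fields that need a non-default
# conversion are listed; everything else keeps the parser's default type.
CONVERTERS = {"payload": base64.b64decode}


def parse_row(line, converters):
    """Parse one JSON object and apply per-field conversion rules."""
    row = json.loads(line)
    return {
        name: converters[name](value) if name in converters else value
        for name, value in row.items()
    }


print(parse_row('{"id": 1, "payload": "aGVsbG8="}', CONVERTERS))
```

Here `id` is left as an integer by the default rule while `payload` is decoded from a base64 string into binary, which is the kind of mapping a plain output-type schema cannot express.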





[jira] [Updated] (ARROW-5745) [C++] properties of Map(Array|Type) are confusingly named

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5745:
--
Fix Version/s: 1.0.0

> [C++] properties of Map(Array|Type) are confusingly named
> -
>
> Key: ARROW-5745
> URL: https://issues.apache.org/jira/browse/ARROW-5745
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> In the context of ListArrays, "values" indicates the elements in a slot of 
> the ListArray. Since MapArray is a ListArray, "values" indicates the same 
> thing and the elements are key-item pairs. This naming scheme is not 
> idiomatic; these *should* be called key-value pairs but that would require 
> propagating the renaming down to ListArray.
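
A toy Python model may help show why the naming is awkward: a map is stored like a list whose elements are pairs, so the list-level "values" of a slot are the pairs themselves. The buffers and the `map_slot` helper below are invented for illustration.

```python
offsets = [0, 2, 2, 3]   # list-style offsets: slot i spans offsets[i]:offsets[i+1]
keys = ["a", "b", "c"]   # flattened keys of all slots
items = [1, 2, 3]        # flattened items of all slots


def map_slot(i):
    """Return the list-level "values" of slot i: its key-item pairs."""
    start, end = offsets[i], offsets[i + 1]
    return list(zip(keys[start:end], items[start:end]))


for i in range(len(offsets) - 1):
    print(i, map_slot(i))
```

In this model "values" means a whole slot's pairs, while the second member of each pair has to be called an "item" to avoid clashing with that term.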





[jira] [Updated] (ARROW-6226) [C++] refactor Diff and PrettyPrint to share code

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6226:
--
Fix Version/s: 1.0.0

> [C++] refactor Diff and PrettyPrint to share code
> -
>
> Key: ARROW-6226
> URL: https://issues.apache.org/jira/browse/ARROW-6226
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Minor
> Fix For: 1.0.0
>
>
> Diff reimplements a lot of PrettyPrint which didn't quite fit the former's 
> required use case and slightly changes the output format. Extract the shared 
> code to a header {{pretty_print_internal.h}} which can be used by both and 
> update the pretty print tests to reflect the new format.





[jira] [Updated] (ARROW-4706) [C++] shared conversion framework for JSON/CSV parsers

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4706:
--
Fix Version/s: 2.0.0

> [C++] shared conversion framework for JSON/CSV parsers
> --
>
> Key: ARROW-4706
> URL: https://issues.apache.org/jira/browse/ARROW-4706
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
> Fix For: 2.0.0
>
>
> CSV and JSON both convert strings to values in an Array but there is little 
> code sharing beyond {{arrow::util::StringConverter}}.
> It would be advantageous if a single interface could be shared between CSV 
> and JSON to do the heavy lifting of conversion consistently. This would 
> simplify addition of new parsers as well as allowing all parsers to 
> immediately take advantage of a new conversion strategy.
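
The idea of a single conversion interface shared by both parsers can be sketched in Python with the standard library. `CONVERTERS`, `convert_column`, and the two reader functions are hypothetical; the sketch only shows both front ends handing raw strings to the same conversion step.

```python
import csv
import io
import json

# Shared conversion rules: adding an entry here benefits every parser.
CONVERTERS = {"int64": int, "float64": float, "string": str}


def convert_column(raw_strings, type_name):
    """The one place where strings become typed values."""
    convert = CONVERTERS[type_name]
    return [convert(s) for s in raw_strings]


def read_csv_column(text, name, type_name):
    rows = csv.DictReader(io.StringIO(text))
    return convert_column([row[name] for row in rows], type_name)


def read_json_column(text, name, type_name):
    # parse_int/parse_float=str keeps numbers as raw strings so that the
    # shared converter, not the JSON parser, decides the output type.
    rows = [
        json.loads(line, parse_int=str, parse_float=str)
        for line in text.splitlines()
        if line.strip()
    ]
    return convert_column([row[name] for row in rows], type_name)


print(read_csv_column("x\n1\n2\n", "x", "int64"))
print(read_json_column('{"x": 1}\n{"x": 2}\n', "x", "int64"))
```

With this shape, a new conversion strategy is registered once and a new parser only needs to produce raw strings per column.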





[jira] [Updated] (ARROW-4548) [C++] run-cmake-format.py is not supported on Windows

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4548:
--
Priority: Minor  (was: Major)

> [C++] run-cmake-format.py is not supported on Windows
> -
>
> Key: ARROW-4548
> URL: https://issues.apache.org/jira/browse/ARROW-4548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Minor
>
> I tried to fix it but no matter what option I pass for {{--line-ending}} to 
> {{cmake-format}} it converts LF line endings to CRLF. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4548) [C++] run-cmake-format.py is not supported on Windows

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4548:
--
Fix Version/s: 1.0.0

> [C++] run-cmake-format.py is not supported on Windows
> -
>
> Key: ARROW-4548
> URL: https://issues.apache.org/jira/browse/ARROW-4548
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Continuous Integration
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 1.0.0
>
>
> I tried to fix it but no matter what option I pass for {{--line-ending}} to 
> {{cmake-format}} it converts LF line endings to CRLF. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6213) [C++] tests fail for AVX512

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6213:
--
Fix Version/s: 2.0.0

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Uwe L. Korn (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932315#comment-16932315
 ] 

Uwe L. Korn commented on ARROW-4917:


[~mdeepak] [~owen.omalley] might care. I guess ORC is not testing on Alpine.

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Major
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}
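
The first error in the log above is about the identifier uint, which is not a standard C++ type name; some C library headers provide it as an extension, and it is apparently not visible in this Alpine build. A minimal sketch of one possible fix, assuming a fixed-width type from the standard header is acceptable here. The function name ReadNameStart is hypothetical and this is not the actual ORC code; only the indexing expression is taken from the log.

```cpp
#include <cstdint>

// Hypothetical stand-in for the failing line: reads the one-byte name index
// stored at offset 5 of a 6-byte variant record.
std::uint64_t ReadNameStart(const unsigned char* ptr,
                            std::uint64_t variantOffset,
                            std::uint64_t variant) {
  // std::uint64_t comes from <cstdint>, so it does not depend on the
  // non-standard 'uint' alias being declared by the C library.
  const std::uint64_t nameStart = ptr[variantOffset + 6 * variant + 5];
  return nameStart;
}
```

Using the same 64-bit type as the other operands also avoids mixed-width comparisons in the later check against nameCount, which matters when building with -Wconversion -Werror as in the log.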



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932316#comment-16932316
 ] 

Antoine Pitrou commented on ARROW-6213:
---

[~coulombec] would you be willing to investigate this? I don't think any of us 
has access to an AVX512 machine.

[~wesmckinn]

> [C++] tests fail for AVX512
> ---
>
> Key: ARROW-6213
> URL: https://issues.apache.org/jira/browse/ARROW-6213
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.1
> Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) 
> avx512
>Reporter: Charles Coulombe
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, 
> arrow-0.14.1-c++-failed-tests.txt
>
>
> When building libraries for avx512 with GCC 7.3.0, two C++ tests fail.
> {noformat}
> The following tests FAILED: 
>   28 - arrow-compute-compare-test (Failed) 
>   30 - arrow-compute-filter-test (Failed) 
> Errors while running CTest{noformat}
> while for avx2 they pass.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4917:
--
Fix Version/s: 1.0.0

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Minor
> Fix For: 1.0.0
>
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4917:
--
Priority: Minor  (was: Major)

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Minor
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4917:
--
Fix Version/s: (was: 1.0.0)
   2.0.0

> [C++] orc_ep fails in cpp-alpine docker
> ---
>
> Key: ARROW-4917
> URL: https://issues.apache.org/jira/browse/ARROW-4917
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Uwe L. Korn
>Priority: Minor
> Fix For: 2.0.0
>
>
> Failure:
> {code:java}
> FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o
> /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include 
> -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem 
> /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem 
> c++/libs/thirdparty/zlib_ep-install/include -isystem 
> c++/libs/thirdparty/lz4_ep-install/include -isystem 
> /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always 
> -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror 
> -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o 
> c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c 
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function 
> 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, 
> uint64_t, uint64_t, uint64_t)':
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' 
> was not declared in this scope
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: 
> suggested alternative: 'rint'
> uint nameStart = ptr[variantOffset + 6 * variant + 5];
> ^~~~
> rint
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: 
> 'nameStart' was not declared in this scope
> if (nameStart >= nameCount) {
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: 
> suggested alternative: 'nameCount'
> if (nameStart >= nameCount) {
> ^
> nameCount
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: 
> 'nameStart' was not declared in this scope
> + nameOffset + nameStart);
> ^
> /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: 
> suggested alternative: 'nameCount'
> + nameOffset + nameStart);
> ^
> nameCount{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5932) [C++] undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5932:
--
Priority: Major  (was: Critical)

> [C++] undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'
> -
>
> Key: ARROW-5932
> URL: https://issues.apache.org/jira/browse/ARROW-5932
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.14.0
> Environment: Linux Mint 19.1 Tessa
> g++-6
>Reporter: Cong Ding
>Priority: Major
>
> I was installing Apache Arrow on my Linux Mint 19.1 Tessa server. I followed 
> the instructions on the official Arrow website (using the Ubuntu 18.04 
> method). However, when I tried to compile the examples, the g++ compiler 
> reported some errors.
> I have updated my g++ to g++-6, updated my libstdc++ library, and added the flag 
> -lstdc++, but it still didn't work.
>  
> {code:java}
> // code placeholder
> g++-6 -std=c++11 -larrow -lparquet main.cpp -lstdc++ 
> {code}
> The error message:
> /usr/lib/x86_64-linux-gnu/libarrow.so: undefined reference to 
> `__cxa_init_primary_exception@CXXABI_1.3.11'
> /usr/lib/x86_64-linux-gnu/libarrow.so: undefined reference to 
> `std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11'
> collect2: error: ld returned 1 exit status.
>  
> I do not know what to do at this point. Can anyone help me?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932318#comment-16932318
 ] 

Antoine Pitrou commented on ARROW-6436:
---

See also ARROW-3802. NumPy has dedicated float16 routines that we could reuse. 
Given that it's NumPy, it's probably well-maintained.

There may be other projects around.

> [C++] vendor a half precision floating point library
> 
>
> Key: ARROW-6436
> URL: https://issues.apache.org/jira/browse/ARROW-6436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Major
>
> Clang and GCC provide _Float16 and there are numerous polyfills which can 
> emulate a 16 bit float for other platforms. This would fill a hole in the 
> kernels and other code which don't currently support HALF_FLOAT.
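
As an illustration of what such a polyfill does, here is a minimal sketch of decoding IEEE 754 binary16 bits into a 32-bit float using only integer operations. It assumes float is IEEE 754 binary32; the function name HalfBitsToFloat is hypothetical and not taken from any existing library.

```cpp
#include <cstdint>
#include <cstring>

// Decodes IEEE 754 binary16 bits (1 sign, 5 exponent, 10 mantissa bits)
// into a binary32 float.
float HalfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exp = (h >> 10) & 0x1Fu;
  std::uint32_t mant = h & 0x3FFu;
  std::uint32_t bits = 0;
  if (exp == 0) {
    if (mant == 0) {
      bits = sign;  // signed zero
    } else {
      // Subnormal half: shift until the implicit leading one appears.
      std::uint32_t shift = 0;
      while ((mant & 0x400u) == 0) {
        mant <<= 1;
        ++shift;
      }
      mant &= 0x3FFu;
      // The value is 1.m * 2^(-14 - shift); the float exponent bias is 127.
      bits = sign | ((113u - shift) << 23) | (mant << 13);
    }
  } else if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);  // infinity or NaN
  } else {
    bits = sign | ((exp + 112u) << 23) | (mant << 13);  // rebias 15 -> 127
  }
  float out;
  std::memcpy(&out, &bits, sizeof(out));  // bit cast without aliasing issues
  return out;
}
```

Widening is exact because every binary16 value is representable in binary32; the opposite direction needs a rounding policy, which is where a vendored library or hardware conversion would do most of the work.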



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6436) [C++] vendor a half precision floating point library

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6436:
--
Fix Version/s: 1.0.0

> [C++] vendor a half precision floating point library
> 
>
> Key: ARROW-6436
> URL: https://issues.apache.org/jira/browse/ARROW-6436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> Clang and GCC provide _Float16 and there are numerous polyfills which can 
> emulate a 16 bit float for other platforms. This would fill a hole in the 
> kernels and other code which don't currently support HALF_FLOAT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932321#comment-16932321
 ] 

Antoine Pitrou commented on ARROW-6436:
---

As for built-in float16 types it looks a bit more complicated, e.g. clang:

{quote}Clang supports two half-precision (16-bit) floating point types: 
{{__fp16}} and {{_Float16}}. These types are supported in all language modes.

{{__fp16}} is supported on every target, as it is purely a storage format; see 
below. {{_Float16}} is currently only supported on the following targets, with 
further targets pending ABI standardization:
 * 32-bit ARM
 * 64-bit ARM (AArch64)
 * SPIR
{quote}

> [C++] vendor a half precision floating point library
> 
>
> Key: ARROW-6436
> URL: https://issues.apache.org/jira/browse/ARROW-6436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> Clang and GCC provide _Float16 and there are numerous polyfills which can 
> emulate a 16 bit float for other platforms. This would fill a hole in the 
> kernels and other code which don't currently support HALF_FLOAT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932329#comment-16932329
 ] 

Antoine Pitrou commented on ARROW-6436:
---

There are also conversion intrinsics on recent x86 CPUs:

[https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=half-precision]

[https://en.wikipedia.org/wiki/F16C]

 

> [C++] vendor a half precision floating point library
> 
>
> Key: ARROW-6436
> URL: https://issues.apache.org/jira/browse/ARROW-6436
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Benjamin Kietzman
>Priority: Major
> Fix For: 1.0.0
>
>
> Clang and GCC provide _Float16 and there are numerous polyfills which can 
> emulate a 16 bit float for other platforms. This would fill a hole in the 
> kernels and other code which don't currently support HALF_FLOAT.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-6597.
---
Resolution: Fixed

Issue resolved by pull request 5416
[https://github.com/apache/arrow/pull/5416]

> [Python] Segfault in test_pandas with Python 2.7
> 
>
> Key: ARROW-6597
> URL: https://issues.apache.org/jira/browse/ARROW-6597
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Antoine Pitrou
>Assignee: Antoine Pitrou
>Priority: Major
>  Labels: pull-request-available
> Fix For: 0.15.0
>
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> I get a segfault in test_pandas with Python 2.7.
> gdb stack trace (excerpt):
> {code}
> Thread 27 "python" received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7fffb7fff700 (LWP 17725)]
> 0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, 
> unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at 
> ../src/arrow/python/datetime.cc:229
> 229 *out = PyDate_FromDate(static_cast(year), 
> static_cast(month),
> (gdb) bt
> #0  0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, 
> unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at 
> ../src/arrow/python/datetime.cc:229
> #1  0x7fffcabaed34 in arrow::Status 
> arrow::py::ConvertDates(arrow::py::PandasOptions const&, 
> arrow::ChunkedArray const&, _object**)::{lambda(int, 
> _object**)#1}::operator()(int, _object**) const (this=0x7fffb7ffde90, 
> value=10957, out=0x55e1b9b0) at ../src/arrow/python/arrow_to_pandas.cc:657
> #2  0x7fffcabaeb8c in arrow::Status 
> arrow::py::ConvertAsPyObjects arrow::py::ConvertDates(arrow::py::PandasOptions const&, 
> arrow::ChunkedArray const&, _object**)::{lambda(int, 
> _object**)#1}&>(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, 
> arrow::Status 
> arrow::py::ConvertDates(arrow::py::PandasOptions const&, 
> arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&, 
> _object**)::{lambda(int const&, _object**)#1}::operator()(int const, 
> _object**) const (this=0x7fffb7ffdd88, value=@0x7fffb7ffdcbc: 10957, 
> out_values=0x55e1b9b0)
> at ../src/arrow/python/arrow_to_pandas.cc:417
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6393) [C++]Add EqualOptions support in SparseTensor::Equals

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6393:
--
Fix Version/s: 1.0.0

> [C++]Add EqualOptions support in SparseTensor::Equals
> -
>
> Key: ARROW-6393
> URL: https://issues.apache.org/jira/browse/ARROW-6393
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++
>Reporter: Kenta Murata
>Assignee: Kenta Murata
>Priority: Major
> Fix For: 1.0.0
>
>
> SparseTensor::Equals should take EqualOptions argument as Tensor::Equals does.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4784) [C++][CI] Re-enable flaky mingw tests.

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4784:
--
Fix Version/s: 2.0.0

> [C++][CI] Re-enable flaky mingw tests.
> --
>
> Key: ARROW-4784
> URL: https://issues.apache.org/jira/browse/ARROW-4784
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration
>Reporter: Micah Kornfield
>Priority: Major
>  Labels: ci-failure
> Fix For: 2.0.0
>
>
> There is no {{--exclude-regex}} option for {{ctest}} in 
> {{ci/appveyor-cpp-build-mingw.bat}} when we resolve this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5578) [C++][Flight] Flight does not build out of the box on Alpine Linux

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5578:
--
Fix Version/s: 2.0.0

> [C++][Flight] Flight does not build out of the box on Alpine Linux
> --
>
> Key: ARROW-5578
> URL: https://issues.apache.org/jira/browse/ARROW-5578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
> Fix For: 2.0.0
>
>
> Fails with SSL linking errors.
> I am disabling in the Dockerfile for now in ARROW-5577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5593) [C++][Fuzzing] Test fuzzers against arrow-testing corpus

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5593:
--
Fix Version/s: 1.0.0

> [C++][Fuzzing] Test fuzzers against arrow-testing corpus
> 
>
> Key: ARROW-5593
> URL: https://issues.apache.org/jira/browse/ARROW-5593
> Project: Apache Arrow
>  Issue Type: Test
>  Components: C++
>Reporter: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Fix For: 1.0.0
>
>
> All fuzzers should be run against the corpus in 
> [arrow-testing|https://github.com/apache/arrow-testing] to prevent 
> regressions. The arrow CI should download the current corpus and run the 
> fuzzers exactly once against each applicable corpus file. The fuzzers 
> must be built with address sanitizer enabled.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6273) [C++][Fuzzing] Add fuzzer for parquet->arrow read path

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6273:
--
Fix Version/s: 1.0.0

> [C++][Fuzzing] Add fuzzer for parquet->arrow read path
> --
>
> Key: ARROW-6273
> URL: https://issues.apache.org/jira/browse/ARROW-6273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Fix For: 1.0.0
>
>
> The parquet to arrow read path is likely the most commonly used one (esp. by 
> pyarrow) and is a closed step that should allow us to fuzz the reading of 
> untrusted parquet files into memory. This complements the existing arrow ipc 
> fuzzer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5578) [C++][Flight] Flight does not build out of the box on Alpine Linux

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5578:
--
Priority: Minor  (was: Major)

> [C++][Flight] Flight does not build out of the box on Alpine Linux
> --
>
> Key: ARROW-5578
> URL: https://issues.apache.org/jira/browse/ARROW-5578
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Wes McKinney
>Priority: Minor
>
> Fails with SSL linking errors.
> I am disabling in the Dockerfile for now in ARROW-5577



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6273) [C++][Fuzzing] Add fuzzer for parquet->arrow read path

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6273:
--
Component/s: Developer Tools
 Continuous Integration

> [C++][Fuzzing] Add fuzzer for parquet->arrow read path
> --
>
> Key: ARROW-6273
> URL: https://issues.apache.org/jira/browse/ARROW-6273
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Continuous Integration, Developer Tools
>Reporter: Marco Neumann
>Assignee: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Fix For: 1.0.0
>
>
> The parquet to arrow read path is likely the most commonly used one (esp. by 
> pyarrow) and is a closed step that should allow us to fuzz the reading of 
> untrusted parquet files into memory. This complements the existing arrow ipc 
> fuzzer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6386) [C++][Documentation] Explicit documentation of null slot interpretation

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6386:
--
Fix Version/s: 1.0.0

> [C++][Documentation] Explicit documentation of null slot interpretation
> ---
>
> Key: ARROW-6386
> URL: https://issues.apache.org/jira/browse/ARROW-6386
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Benjamin Kietzman
>Assignee: Benjamin Kietzman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> To my knowledge, there isn't explicit documentation on how null slots in an 
> array should be interpreted. SQL uses Kleene logic, wherein a null is 
> explicitly an unknown rather than a special value. This yields for example 
> `(null AND false) -> false`, since `(x AND false) -> false` for all possible 
> values of x. This is also the behavior of Gandiva's boolean expressions.
> By contrast the boolean kernels implement something closer to the behavior of 
> NaN: `(null AND false) -> null`. I think this is simply an error in the 
> boolean kernels but in any case I think explicit documentation should be 
> added to prevent future confusion.
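
The two interpretations described above can be written out as a minimal sketch, assuming C++17 and using std::optional<bool> with std::nullopt standing in for a null slot. The names NullableBool, KleeneAnd and PropagatingAnd are hypothetical; this is not the code of the Arrow kernels or of Gandiva.

```cpp
#include <optional>

// std::nullopt stands for a null slot.
using NullableBool = std::optional<bool>;

// Kleene AND: null means "unknown", so a known false on either side
// decides the result regardless of the other operand.
NullableBool KleeneAnd(NullableBool a, NullableBool b) {
  if ((a.has_value() && !*a) || (b.has_value() && !*b)) return false;
  if (a.has_value() && b.has_value()) return true;
  return std::nullopt;
}

// Null-propagating AND: any null operand makes the result null,
// similar to how NaN propagates through arithmetic.
NullableBool PropagatingAnd(NullableBool a, NullableBool b) {
  if (!a.has_value() || !b.has_value()) return std::nullopt;
  return *a && *b;
}
```

For the operands (null, false), KleeneAnd returns false while PropagatingAnd returns null; for (null, true) both return null, so the two interpretations only differ when a known false is combined with a null.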



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-6148) [C++][Packaging] Improve aarch64 support

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6148:
--
Fix Version/s: 2.0.0

> [C++][Packaging]  Improve aarch64 support
> -
>
> Key: ARROW-6148
> URL: https://issues.apache.org/jira/browse/ARROW-6148
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Packaging
>Reporter: Francois Saint-Jacques
>Assignee: Marcin Juszkiewicz
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 8h
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4248) [C++][Plasma] Build on Windows / Visual Studio

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4248:
--
Fix Version/s: 2.0.0

> [C++][Plasma] Build on Windows / Visual Studio
> --
>
> Key: ARROW-4248
> URL: https://issues.apache.org/jira/browse/ARROW-4248
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> See https://github.com/apache/arrow/issues/3391



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-3226) [C++][Plasma] Plasma Store will crash using small memory

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3226:
--
Fix Version/s: 2.0.0

> [C++][Plasma] Plasma Store will crash using small memory
> 
>
> Key: ARROW-3226
> URL: https://issues.apache.org/jira/browse/ARROW-3226
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Reporter: Yuhong Guo
>Assignee: Yuhong Guo
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Plasma Store evicts objects when there is not enough memory for an allocation. 
> When a smaller store limit is specified, Plasma Store crashes once the 
> memory limit is reached. 





[jira] [Updated] (ARROW-2045) [C++][Plasma] More primitive operations on plasma store

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2045:
--
Fix Version/s: 2.0.0

> [C++][Plasma] More primitive operations on plasma store
> ---
>
> Key: ARROW-2045
> URL: https://issues.apache.org/jira/browse/ARROW-2045
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++ - Plasma
>Reporter: Yuxin Wu
>Priority: Minor
> Fix For: 2.0.0
>
>
> Hi Developers,
> I found plasma store very useful – it's fast and simple to use. However, I 
> think there are more operations that can make it a more general IPC/messaging 
> tool and potentially helpful in more scenarios.
> Conceptually, an object store can support the following "put" methods:
>  # Evict when full
>  # Wait for space when full, perhaps with a timeout (i.e. blocking)
>  # Return failure when full (i.e. non-blocking)
> And the following "get" methods:
>  # Wait for the object to appear (i.e. blocking)
>  # Return failure when object doesn't exist (i.e. non-blocking)
>  # Remove the object after get
> Some of the above features can be implemented with others. But some of them 
> are primitives (e.g. return failure when full) that need to be supported.
>  
> My use case: I wanted to use plasma to send/recv large buffers between 
> processes, i.e. build a message passing interface on top of shared memory. 
> Plasma has made it quite easy (only have to send/recv the id) and efficient 
> (faster than a Unix pipe). But "evict when full" is currently the only available 
> "put" method, which could cause a lot of trouble if I want to ensure message 
> delivery.
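
To make the "put" and "get" variants listed above concrete, here is a minimal Python sketch of a bounded in-memory store. It is purely illustrative: `BoundedStore`, the `policy` argument and the oldest-first eviction order are assumptions made for this example, and none of it is Plasma's actual API or eviction policy.

```python
import threading
from collections import OrderedDict


class BoundedStore:
    """Toy object store with a byte capacity (hypothetical, not Plasma's API)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.objects = OrderedDict()  # object id -> bytes, oldest first
        self.cond = threading.Condition()

    def put(self, object_id, data, policy="evict", timeout=None):
        """Store data under object_id; return True on success, False otherwise.

        policy is one of:
          "evict": drop the oldest objects until the new one fits
          "wait":  block until space is available, or until timeout expires
          "fail":  return False immediately when the store is full
        """
        size = len(data)
        if size > self.capacity:
            return False  # can never fit, whatever the policy
        with self.cond:
            if object_id in self.objects:
                return False  # assumption: objects are immutable once stored
            if policy == "evict":
                while self.used + size > self.capacity:
                    _, old = self.objects.popitem(last=False)
                    self.used -= len(old)
            elif policy == "wait":
                fits = lambda: self.used + size <= self.capacity
                if not self.cond.wait_for(fits, timeout):
                    return False
            elif policy == "fail":
                if self.used + size > self.capacity:
                    return False
            else:
                raise ValueError("unknown policy: %r" % (policy,))
            self.objects[object_id] = data
            self.used += size
            self.cond.notify_all()  # wake up blocked get() callers
            return True

    def get(self, object_id, block=False, timeout=None, remove=False):
        """Return the stored bytes, or None if the object is not available."""
        with self.cond:
            if block:
                present = lambda: object_id in self.objects
                if not self.cond.wait_for(present, timeout):
                    return None
            elif object_id not in self.objects:
                return None
            if remove:
                data = self.objects.pop(object_id)
                self.used -= len(data)
                self.cond.notify_all()  # wake up blocked put() callers
                return data
            return self.objects[object_id]
```

With something like this, a message-passing layer could use `put(..., policy="wait")` on the sender side and `get(..., block=True, remove=True)` on the receiver side, so that undelivered messages are never evicted.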





[jira] [Updated] (ARROW-4829) [C++][Plasma] plasma-serialization_tests fails in release builds

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4829:
--
Fix Version/s: 2.0.0

> [C++][Plasma] plasma-serialization_tests fails in release builds
> 
>
> Key: ARROW-4829
> URL: https://issues.apache.org/jira/browse/ARROW-4829
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++ - Plasma
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 2.0.0
>
>
> On Ubuntu 18.10 with conda-forge toolchain (gcc 7.3.x)
> {code}
> $ ./release/plasma-serialization_tests 
> Running main() from gmock_main.cc
> [==] Running 14 tests from 1 test case.
> [--] Global test environment set-up.
> [--] 14 tests from PlasmaSerialization
> [ RUN  ] PlasmaSerialization.CreateRequest
> [   OK ] PlasmaSerialization.CreateRequest (0 ms)
> [ RUN  ] PlasmaSerialization.CreateReply
> [   OK ] PlasmaSerialization.CreateReply (0 ms)
> [ RUN  ] PlasmaSerialization.SealRequest
> [   OK ] PlasmaSerialization.SealRequest (1 ms)
> [ RUN  ] PlasmaSerialization.SealReply
> [   OK ] PlasmaSerialization.SealReply (0 ms)
> [ RUN  ] PlasmaSerialization.GetRequest
> [   OK ] PlasmaSerialization.GetRequest (0 ms)
> [ RUN  ] PlasmaSerialization.GetReply
> ../src/plasma/test/serialization_tests.cc:191: Failure
> Expected equality of these values:
>   memcmp(&plasma_objects[object_ids[0]], &plasma_objects_return[0], 
> sizeof(PlasmaObject))
> Which is: 127
>   0
> [  FAILED  ] PlasmaSerialization.GetReply (0 ms)
> [ RUN  ] PlasmaSerialization.ReleaseRequest
> [   OK ] PlasmaSerialization.ReleaseRequest (0 ms)
> [ RUN  ] PlasmaSerialization.ReleaseReply
> [   OK ] PlasmaSerialization.ReleaseReply (0 ms)
> [ RUN  ] PlasmaSerialization.DeleteRequest
> [   OK ] PlasmaSerialization.DeleteRequest (0 ms)
> [ RUN  ] PlasmaSerialization.DeleteReply
> [   OK ] PlasmaSerialization.DeleteReply (0 ms)
> [ RUN  ] PlasmaSerialization.EvictRequest
> [   OK ] PlasmaSerialization.EvictRequest (0 ms)
> [ RUN  ] PlasmaSerialization.EvictReply
> [   OK ] PlasmaSerialization.EvictReply (0 ms)
> [ RUN  ] PlasmaSerialization.DataRequest
> [   OK ] PlasmaSerialization.DataRequest (0 ms)
> [ RUN  ] PlasmaSerialization.DataReply
> [   OK ] PlasmaSerialization.DataReply (0 ms)
> [--] 14 tests from PlasmaSerialization (1 ms total)
> [--] Global test environment tear-down
> [==] 14 tests from 1 test case ran. (2 ms total)
> [  PASSED  ] 13 tests.
> [  FAILED  ] 1 test, listed below:
> [  FAILED  ] PlasmaSerialization.GetReply
>  1 FAILED TEST
> {code}





[jira] [Updated] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5607:
--
Fix Version/s: 2.0.0

> [C++][Fuzzing] arrow-ipc-fuzzing-test crash 
> 607e9caa76863a97f2694a769a1ae2fb83c55e02
> 
>
> Key: ARROW-5607
> URL: https://issues.apache.org/jira/browse/ARROW-5607
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Fix For: 2.0.0
>
> Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
>
>
> {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with
> {code}
> arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
> {code}





[jira] [Updated] (ARROW-2260) [C++][Plasma] plasma_store should show usage

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2260:
--
Fix Version/s: 2.0.0

> [C++][Plasma] plasma_store should show usage
> 
>
> Key: ARROW-2260
> URL: https://issues.apache.org/jira/browse/ARROW-2260
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++ - Plasma
>Affects Versions: 0.8.0
>Reporter: Antoine Pitrou
>Priority: Minor
> Fix For: 2.0.0
>
>
> Currently the options exposed by the {{plasma_store}} executable aren't very 
> discoverable:
> {code:bash}
> $ plasma_store -h
> please specify socket for incoming connections with -s switch
> Abandon
> (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting 
> *)$ plasma_store 
> please specify socket for incoming connections with -s switch
> Abandon
> (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting 
> *)$ plasma_store --help
> plasma_store: invalid option -- '-'
> {code}





[jira] [Updated] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5607:
--
Fix Version/s: (was: 2.0.0)

> [C++][Fuzzing] arrow-ipc-fuzzing-test crash 
> 607e9caa76863a97f2694a769a1ae2fb83c55e02
> 
>
> Key: ARROW-5607
> URL: https://issues.apache.org/jira/browse/ARROW-5607
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
>
>
> {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with
> {code}
> arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
> {code}





[jira] [Updated] (ARROW-2358) [C++][Python] API for Writing to Multiple Feather Files

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2358:
--
Fix Version/s: 2.0.0

> [C++][Python] API for Writing to Multiple Feather Files
> ---
>
> Key: ARROW-2358
> URL: https://issues.apache.org/jira/browse/ARROW-2358
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C, C++, Python
>Affects Versions: 0.9.0
>Reporter: Dhruv Madeka
>Priority: Minor
> Fix For: 2.0.0
>
>
> It would be really great to have an API which can write a Table to a 
> `FeatherDataset`. Essentially, given a file name, it would split the 
> table into N equal parts (where N could be determined by the user or the code) 
> and then write the data to N files with a suffix (which is `_part` by default 
> but could be user-specified).
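
As a rough illustration of the splitting described above, the following Python sketch only plans the parts: it computes a file name, a row offset and a row count for each of the N files. The function name `plan_parts`, the `<name><suffix><index>.feather` naming pattern and the choice to give any leftover rows to the first parts are assumptions made for this example, not an existing Arrow API.

```python
def plan_parts(num_rows, n_parts, base_name, suffix="_part"):
    """Return a list of (file_name, offset, length) for n_parts near-equal slices.

    Hypothetical helper: the naming pattern and remainder handling are
    illustrative choices, not something Arrow defines.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(num_rows, n_parts)
    plan = []
    offset = 0
    for i in range(n_parts):
        # The first `extra` parts take one additional row each.
        length = base + (1 if i < extra else 0)
        plan.append(("%s%s%d.feather" % (base_name, suffix, i), offset, length))
        offset += length
    return plan
```

Each `(offset, length)` pair could then be handed to a slicing call such as pyarrow's `Table.slice(offset, length)` before writing the slice to the corresponding file.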





[jira] [Updated] (ARROW-6463) [C++][Python] Rename arrow::fs::Selector to FileSelector

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6463:
--
Fix Version/s: 1.0.0

> [C++][Python] Rename arrow::fs::Selector to FileSelector
> 
>
> Key: ARROW-6463
> URL: https://issues.apache.org/jira/browse/ARROW-6463
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: filesystem
> Fix For: 1.0.0
>
>
> In both the C++ implementation and the python binding.





[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-2882:
--
Fix Version/s: 2.0.0

> [C++][Python] Support AWS Firehose partition_scheme implementation for 
> Parquet datasets
> ---
>
> Key: ARROW-2882
> URL: https://issues.apache.org/jira/browse/ARROW-2882
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++, Python
>Reporter: Pablo Javier Takara
>Priority: Major
>  Labels: dataset, parquet
> Fix For: 2.0.0
>
>
> I'd like to be able to read a ParquetDataset generated by AWS Firehose.
> The only implementation at the time of writing was the partition scheme 
> created by Hive (year=2018/month=01/day=11).
> The AWS Firehose partition scheme is a little bit different (2018/01/11).
>  
> Thanks
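
The difference between the two layouts can be sketched in a few lines of Python. Both helper names below are hypothetical and are not part of pyarrow; the field names passed for the positional layout (`year`, `month`, `day`) are an assumption taken from the example paths in the description.

```python
def parse_hive_partition(path):
    """Parse 'year=2018/month=01/day=11' into {'year': '2018', ...}."""
    parts = {}
    for segment in path.strip("/").split("/"):
        key, sep, value = segment.partition("=")
        if not sep:
            raise ValueError("not a key=value segment: %r" % (segment,))
        parts[key] = value
    return parts


def parse_positional_partition(path, names):
    """Parse '2018/01/11' using caller-supplied field names, in order."""
    segments = path.strip("/").split("/")
    if len(segments) != len(names):
        raise ValueError("expected %d segments, got %d" % (len(names), len(segments)))
    return dict(zip(names, segments))
```

The Hive layout is self-describing, while the positional layout needs the field names from the caller, which is why it would need a separate partition scheme implementation.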





[jira] [Updated] (ARROW-1975) [C++] Add abi-compliance-checker to build process

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-1975:
--
Fix Version/s: 2.0.0

> [C++] Add abi-compliance-checker to build process
> -
>
> Key: ARROW-1975
> URL: https://issues.apache.org/jira/browse/ARROW-1975
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>Priority: Major
>  Labels: pull-request-available
> Fix For: 2.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I would like to check our baseline modules with 
> https://lvc.github.io/abi-compliance-checker/ to ensure that version upgrades 
> are much smoother and that we don't break the ABI in patch releases. 
> As we're still pre-1.0, I accept that there will be breakage but I would like 
> to keep it to a minimum. Currently the biggest pain with Arrow is that you 
> always need to pin it in Python with {{==0.x.y}}, otherwise segfaults are 
> inevitable.





[jira] [Commented] (ARROW-4761) [C++] Support zstandard<1

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932339#comment-16932339
 ] 

Antoine Pitrou commented on ARROW-4761:
---

Due to our maintenance workload, I'm not sure we still care about this.

> [C++] Support zstandard<1
> -
>
> Key: ARROW-4761
> URL: https://issues.apache.org/jira/browse/ARROW-4761
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Priority: Major
>
> To support building with as many system packages as possible on Ubuntu, we 
> should support building with zstandard 0.5.1 which is the one available on 
> Ubuntu Xenial. Given the size of our current code for Zstandard, this seems 
> feasible.





[jira] [Updated] (ARROW-4761) [C++] Support zstandard<1

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4761:
--
Priority: Minor  (was: Major)

> [C++] Support zstandard<1
> -
>
> Key: ARROW-4761
> URL: https://issues.apache.org/jira/browse/ARROW-4761
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Packaging
>Reporter: Uwe L. Korn
>Priority: Minor
>
> To support building with as many system packages as possible on Ubuntu, we 
> should support building with zstandard 0.5.1 which is the one available on 
> Ubuntu Xenial. Given the size of our current code for Zstandard, this seems 
> feasible.





[jira] [Resolved] (ARROW-3737) [CI/Docker/Python] Support running integration tests on multiple python versions

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-3737.
---
Resolution: Won't Fix

Indeed, this doesn't seem useful.

> [CI/Docker/Python] Support running integration tests on multiple python 
> versions
> 
>
> Key: ARROW-3737
> URL: https://issues.apache.org/jira/browse/ARROW-3737
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: docker
>
> Currently the python-3.6 image is pinned in integration/hdfs/Dockerfile and 
> integration/pandas-master/Dockerfile. It's possible to pass a build-time 
> argument, similar to how the arrow:python-${PYTHON_VERSION} image works.





[jira] [Commented] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932340#comment-16932340
 ] 

Antoine Pitrou commented on ARROW-5607:
---

It passes here. Can you reproduce?

> [C++][Fuzzing] arrow-ipc-fuzzing-test crash 
> 607e9caa76863a97f2694a769a1ae2fb83c55e02
> 
>
> Key: ARROW-5607
> URL: https://issues.apache.org/jira/browse/ARROW-5607
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.13.0
>Reporter: Marco Neumann
>Priority: Major
>  Labels: fuzzer
> Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
>
>
> {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with
> {code}
> arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02
> {code}





[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-3710:
--
Fix Version/s: 1.0.0

> [CI/Python] Run nightly tests against pandas master
> ---
>
> Key: ARROW-3710
> URL: https://issues.apache.org/jira/browse/ARROW-3710
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration, Python
>Reporter: Krisztian Szucs
>Assignee: Krisztian Szucs
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.0.0
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Follow-up of [https://github.com/apache/arrow/pull/2758] and 
> https://github.com/apache/arrow/pull/2755



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-5216) [CI] Add Appveyor badge to README

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5216:
--
Fix Version/s: 1.0.0

> [CI] Add Appveyor badge to README
> -
>
> Key: ARROW-5216
> URL: https://issues.apache.org/jira/browse/ARROW-5216
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Neal Richardson
>Priority: Trivial
> Fix For: 1.0.0
>
>
> I was trying to see what was running in appveyor and couldn't find it. 
> Krisztián helped me to find 
> [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow], but it 
> would be nice to add the badge to the README next to the Travis-CI one for a 
> quick link to it (as well as showing off build status).
> I was just going to add it myself, but unlike Travis, you can't guess the 
> Appveyor badge URL from the project name because they have a hash in them; 
> only someone with sufficient privileges on the project in Appveyor can get to 
> the settings panel to find the URL.





[jira] [Commented] (ARROW-6448) [CI] Add crossbow notifications

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932343#comment-16932343
 ] 

Antoine Pitrou commented on ARROW-6448:
---

Is there anything left to do here?

> [CI] Add crossbow notifications
> ---
>
> Key: ARROW-6448
> URL: https://issues.apache.org/jira/browse/ARROW-6448
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Continuous Integration
>Reporter: Francois Saint-Jacques
>Assignee: Francois Saint-Jacques
>Priority: Critical
>  Labels: pull-request-available
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (ARROW-4785) [CI] Make Travis CI resilient against hash sum mismatch errors

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-4785.
---
Resolution: Not A Problem

> [CI] Make Travis CI resilient against hash sum mismatch errors
> --
>
> Key: ARROW-4785
> URL: https://issues.apache.org/jira/browse/ARROW-4785
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Hatem Helal
>Priority: Minor
>  Labels: ci-failure
>
> Travis jobs sometimes fail with a GPG error:
> {code:java}
> W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: 
> Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest 
> algorithm (SHA1)
> W: An error occurred during the signature verification. The repository is not 
> updated and the previous index files will be used. GPG error: 
> https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease: The following 
> signatures couldn't be verified because the public key is not available: 
> NO_PUBKEY 6B05F25D762E3157
> W: Failed to fetch 
> https://packagecloud.io/github/git-lfs/ubuntu/dists/trusty/InRelease  The 
> following signatures couldn't be verified because the public key is not 
> available: NO_PUBKEY 6B05F25D762E3157
> E: Failed to fetch 
> http://security.ubuntu.com/ubuntu/dists/trusty-security/main/binary-i386/Packages.gz
>   Hash Sum mismatch
> W: Some index files failed to download. They have been ignored, or old ones 
> used instead.
> The command "if [ $TRAVIS_OS_NAME == "linux" ]; then
> sudo bash -c "echo -e 'Acquire::Retries 10; Acquire::http::Timeout 
> \"20\";' > /etc/apt/apt.conf.d/99-travis-retry"
> sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test
> sudo apt-get update -qq
>   fi
>   " failed and exited with 100 during .
> Your build has been stopped.{code}
> It would be nice if the number of retries, timeout, or both could be 
> increased to make the travis jobs more resilient to this seemingly sporadic 
> issue.





[jira] [Updated] (ARROW-4949) [CI] Add C# docker image to the docker-compose setup

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4949:
--
Component/s: C#

> [CI] Add C# docker image to the docker-compose setup
> 
>
> Key: ARROW-4949
> URL: https://issues.apache.org/jira/browse/ARROW-4949
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C#, Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> https://github.com/apache/arrow/blob/master/csharp/build/docker/Dockerfile



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (ARROW-4949) [CI] Add C# docker image to the docker-compose setup

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4949:
--
Fix Version/s: 2.0.0

> [CI] Add C# docker image to the docker-compose setup
> 
>
> Key: ARROW-4949
> URL: https://issues.apache.org/jira/browse/ARROW-4949
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Continuous Integration
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> https://github.com/apache/arrow/blob/master/csharp/build/docker/Dockerfile





[jira] [Resolved] (ARROW-4720) [CI] Mark MinGW build failures allowed

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-4720.
---
Resolution: Won't Fix

MinGW builds seem to have been stable for a while, closing.

> [CI] Mark MinGW build failures allowed
> --
>
> Key: ARROW-4720
> URL: https://issues.apache.org/jira/browse/ARROW-4720
> Project: Apache Arrow
>  Issue Type: Wish
>  Components: C++, Continuous Integration
>Reporter: Antoine Pitrou
>Priority: Major
>  Labels: ci-failure
>
> Currently we are requiring MinGW tests to pass on AppVeyor. Almost nobody 
> will use MinGW builds if regular MSVC builds work fine. So the onus should be 
> on the few people caring about MinGW to ensure that the build chain 
> works on those platforms.
> Example here, apparently the uriparser library doesn't build on MinGW:
> https://ci.appveyor.com/project/pitrou/arrow/build/job/t64xwyj2axhl1jgr
> There is a tendency to inflate the number of different configurations in our 
> CI matrices. Not only does it make builds longer and add delays (see how long 
> you have to wait before you get a CI result on AppVeyor), but it's of dubious 
> utility. I'm not sure it serves the project's general interest.
> Rant off ;)





[jira] [Updated] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6513:
--
Priority: Trivial  (was: Minor)

> [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt 
> extension
> 
>
> Key: ARROW-6513
> URL: https://issues.apache.org/jira/browse/ARROW-6513
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: CI
>Reporter: Krisztian Szucs
>Priority: Trivial
>
> The `arrow/ci/conda_env_*.yml` files are not YAML files; we should 
> rename them to use the .txt extension.





[jira] [Updated] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-6513:
--
Fix Version/s: 2.0.0

> [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt 
> extension
> 
>
> Key: ARROW-6513
> URL: https://issues.apache.org/jira/browse/ARROW-6513
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: CI
>Reporter: Krisztian Szucs
>Priority: Trivial
> Fix For: 2.0.0
>
>
> The `arrow/ci/conda_env_*.yml` files are not YAML files; we should 
> rename them to use the .txt extension.





[jira] [Commented] (ARROW-1851) [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932349#comment-16932349
 ] 

Antoine Pitrou commented on ARROW-1851:
---

I don't think this is likely to happen any time soon. Should we close as Won't Fix?

> [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC
> ---
>
> Key: ARROW-1851
> URL: https://issues.apache.org/jira/browse/ARROW-1851
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C
>Reporter: Wes McKinney
>Priority: Major
> Attachments: text.html
>
>
> This is an umbrella tracking JIRA for creating a small self-contained C 
> implementation of Arrow. The purpose of this library would be compactness 
> and portability, for embedded settings or for FFI in languages that have a 
> harder time binding to C++. The C library could also grow wrapper support for 
> the C++ library to expose more complicated functionality where we don't 
> necessarily want multiple implementations





[jira] [Commented] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932348#comment-16932348
 ] 

Antoine Pitrou commented on ARROW-6513:
---

[~kszucs] do you want to tackle this?

> [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt 
> extension
> 
>
> Key: ARROW-6513
> URL: https://issues.apache.org/jira/browse/ARROW-6513
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: CI
>Reporter: Krisztian Szucs
>Priority: Minor
>
> The `arrow/ci/conda_env_*.yml` files are not YAML files; we should 
> rename them to use the .txt extension.





[jira] [Resolved] (ARROW-3266) [C] Minimalistic C library implementing Arrow data structures, IPC read/write

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-3266.
---
Resolution: Duplicate

> [C] Minimalistic C library implementing Arrow data structures, IPC read/write
> -
>
> Key: ARROW-3266
> URL: https://issues.apache.org/jira/browse/ARROW-3266
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: C
>Reporter: Wes McKinney
>Priority: Major
>
> I am interested in a small C89/C99 library for interacting with Arrow data 
> structures in standard C with minimal dependencies. Using 
> https://github.com/dvidelabs/flatcc it should be possible to deal with IPC 
> metadata in C as well without involving a C++ compiler





[jira] [Updated] (ARROW-4412) [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4412:
--
Fix Version/s: 1.0.0

> [DOCUMENTATION] Add explicit version numbers to the arrow specification 
> documents.
> --
>
> Key: ARROW-4412
> URL: https://issues.apache.org/jira/browse/ARROW-4412
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Micah Kornfield
>Priority: Minor
> Fix For: 1.0.0
>
>
> Based on conversation on the mailing list it might pay to include 
> version/revision numbers on the specification document.  One way is to 
> include the "release" version, another might be to only update versioning on 
> changes to the document.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (ARROW-4412) [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932352#comment-16932352
 ] 

Antoine Pitrou commented on ARROW-4412:
---

cc [~npr]

 

> [DOCUMENTATION] Add explicit version numbers to the arrow specification 
> documents.
> --
>
> Key: ARROW-4412
> URL: https://issues.apache.org/jira/browse/ARROW-4412
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Micah Kornfield
>Priority: Minor
> Fix For: 1.0.0
>
>
> Based on conversation on the mailing list it might pay to include 
> version/revision numbers on the specification document.  One way is to 
> include the "release" version, another might be to only update versioning on 
> changes to the document.





[jira] [Updated] (ARROW-5673) [Crossbow] Support GitLab runners

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5673:
--
Fix Version/s: 2.0.0

> [Crossbow] Support GitLab runners
> -
>
> Key: ARROW-5673
> URL: https://issues.apache.org/jira/browse/ARROW-5673
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Packaging
>Reporter: Krisztian Szucs
>Priority: Major
> Fix For: 2.0.0
>
>
> Description is by [~kou]:
> I want to use GitLab Runner instead of CircleCI.
> Because we can add custom GitLab Runners for us. For example, we can add GPU 
> enabled GitLab Runner to test CUDA enabled Apache Arrow build. We can also 
> increase timeout more than 5h for our GitLab Runners.
> We can use https://gitlab.com/ to run GitLab Runners: 
> https://about.gitlab.com/solutions/github/
> This feature isn't included in the Free tier on GitLab.com (it's available 
> with the Free tier as part of a campaign for now (*1)) but GitLab.com provides Gold 
> tier features to open source projects (*2). So we can use this feature by 
> choosing "CI/CD for external repo" in "New project page" 
> https://gitlab.com/projects/new .
> (*1)
> So, for the next year we are making the GitLab CI/CD for GitHub feature a 
> part of our GitLab.com Free tier.
> (*2)
> As part of our commitment to open source, we offer all public projects 
> our highest tier features (Gold) for free.





[jira] [Updated] (ARROW-5858) [Doc] Better document the Tensor classes in the prose documentation

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5858:
--
Fix Version/s: 1.0.0

> [Doc] Better document the Tensor classes in the prose documentation
> ---
>
> Key: ARROW-5858
> URL: https://issues.apache.org/jira/browse/ARROW-5858
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation, Python
>Reporter: Joris Van den Bossche
>Priority: Major
> Fix For: 1.0.0
>
>
> From a comment from [~wesmckinn] in ARROW-2714:
> {quote}The Tensor classes are independent from the columnar data structures, 
> though they reuse pieces of metadata, metadata serialization, memory 
> management, and IPC.
> The purpose of adding these to the library was to have in-memory data 
> structures for handling Tensor/ndarray data and metadata that "plug in" to 
> the rest of the Arrow C++ system (Plasma store, IO subsystem, memory pools, 
> buffers, etc.).
> Theoretically you could return a Tensor when creating a non-contiguous slice 
> of an Array; in light of the above, I don't think that would be intuitive.
> When we started the project, our focus was creating an open standard for 
> in-memory columnar data, a hitherto unsolved problem. The project's scope has 
> expanded into peripheral problems in the same domain in the meantime (with 
> the mantra of creating interoperable components, a use-what-you-need 
> development platform for system developers). I think this aspect of the 
> project could be better documented / advertised, since the project's initial 
> focus on the columnar standard has given some the mistaken impression that we 
> are not interested in any work outside of that.
> {quote}





[jira] [Updated] (ARROW-4999) [Doc][C++] Add examples on how to construct with ArrayData::Make instead of builder classes

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-4999:
--
Fix Version/s: 1.0.0

> [Doc][C++] Add examples on how to construct with ArrayData::Make instead of 
> builder classes
> ---
>
> Key: ARROW-4999
> URL: https://issues.apache.org/jira/browse/ARROW-4999
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: C++, Documentation
>Reporter: Francois Saint-Jacques
>Priority: Minor
> Fix For: 1.0.0
>
>






[jira] [Commented] (ARROW-4405) [Docs] Docker documentation builds fail since the source directory is mounted as readonly

2019-09-18 Thread Antoine Pitrou (Jira)


[ 
https://issues.apache.org/jira/browse/ARROW-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932354#comment-16932354
 ] 

Antoine Pitrou commented on ARROW-4405:
---

Is this still an issue?

> [Docs] Docker documentation builds fail since the source directory is mounted 
> as readonly
> -
>
> Key: ARROW-4405
> URL: https://issues.apache.org/jira/browse/ARROW-4405
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Krisztian Szucs
>Priority: Major
>  Labels: docker
>
> {code:java}
> writing list of installed files to '../../build/python/record.txt'
> /
> + pushd /arrow/cpp/apidoc
> /arrow/cpp/apidoc /
> + doxygen
> error: Failed to open temporary file /arrow/cpp/apidoc/doxygen_objdb_4898.tmp
> The command "docker-compose run docs" exited with 1.{code}
> https://travis-ci.org/kszucs/crossbow/builds/485348071





[jira] [Resolved] (ARROW-2793) [Documentation] Add contributor guide to Sphinx documentation

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou resolved ARROW-2793.
---
Resolution: Not A Problem

I agree with Neal: the desired info already seems to be there. Please reopen if we 
misunderstood :)

> [Documentation] Add contributor guide to Sphinx documentation
> -
>
> Key: ARROW-2793
> URL: https://issues.apache.org/jira/browse/ARROW-2793
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Wiki
>Reporter: Wes McKinney
>Priority: Major
>
> We should document the desired contributor workflow (e.g. git branches) 
> somewhere. We should put this in the Sphinx project.





[jira] [Updated] (ARROW-5405) [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5405:
--
Fix Version/s: 1.0.0

> [Documentation] Move integration testing documentation to Sphinx docs, add 
> instructions for JavaScript
> --
>
> Key: ARROW-5405
> URL: https://issues.apache.org/jira/browse/ARROW-5405
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> I noticed that JavaScript information is missing from integration/README.md. This 
> would be a good opportunity to migrate that documentation to the 
> docs/source/developers directory.





[jira] [Updated] (ARROW-5543) [Documentation] Migrate FAQ page to Sphinx / rst around release time

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5543:
--
Fix Version/s: 1.0.0

> [Documentation] Migrate FAQ page to Sphinx / rst around release time
> 
>
> Key: ARROW-5543
> URL: https://issues.apache.org/jira/browse/ARROW-5543
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Priority: Major
> Fix For: 1.0.0
>
>
> In ARROW-973, a Markdown page with the FAQ was added. When we are close to 
> publishing a new version of the Sphinx site, it would make sense to move the 
> FAQ to the main docs project and link to it from the project front page.





[jira] [Updated] (ARROW-5543) [Documentation] Migrate FAQ page to Sphinx / rst around release time

2019-09-18 Thread Antoine Pitrou (Jira)


 [ 
https://issues.apache.org/jira/browse/ARROW-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Antoine Pitrou updated ARROW-5543:
--
Component/s: Documentation

> [Documentation] Migrate FAQ page to Sphinx / rst around release time
> 
>
> Key: ARROW-5543
> URL: https://issues.apache.org/jira/browse/ARROW-5543
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Priority: Major
>
> In ARROW-973, a Markdown page with the FAQ was added. When we are close to 
> publishing a new version of the Sphinx site, it would make sense to move the 
> FAQ to the main docs project and link to it from the project front page.




