[jira] [Commented] (ARROW-6566) Implement VarChar in Scala
[ https://issues.apache.org/jira/browse/ARROW-6566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932144#comment-16932144 ] Micah Kornfield commented on ARROW-6566:
----------------------------------------

It would be useful if you clarified how this is failing. One thing that springs to mind is that you may want to use [setSafe|https://arrow.apache.org/docs/java/org/apache/arrow/vector/BaseVariableWidthVector.html#setSafe-int-byte:A-] instead of set.

> Implement VarChar in Scala
> --------------------------
>
>                 Key: ARROW-6566
>                 URL: https://issues.apache.org/jira/browse/ARROW-6566
>             Project: Apache Arrow
>          Issue Type: Test
>          Components: Java
>    Affects Versions: 0.14.1
>            Reporter: Boris V.Kuznetsov
>            Priority: Major
>
> Hello
> I'm trying to write and read a zio.Chunk of strings, which is essentially an array of strings.
> My implementation fails the test; how should I fix that?
> [Writer|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L48] code
> [Reader|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/main/scala/zio/serdes/arrow/ArrowUtils.scala#L108] code
> [Test|https://github.com/Neurodyne/zio-serdes/blob/9e2128ff64ffa0e7c64167a5ee46584c3fcab9e4/src/test/scala/arrow/Base.scala#L115] code
> Any help, links and advice are highly appreciated
> Thank you!

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
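The distinction Micah draws is between set, which assumes the vector's data buffer already has room for the write, and setSafe, which reallocates the buffer on demand before writing. A minimal conceptual sketch of that contract in Python (illustrative only; GrowableBuffer and its methods are invented for this sketch, not the Arrow Java API):

```python
class GrowableBuffer:
    """Toy model of a variable-width vector's data buffer."""

    def __init__(self, capacity=8):
        self.data = bytearray(capacity)
        self.used = 0

    def set(self, value: bytes):
        # Like set: assumes enough capacity was already reserved.
        end = self.used + len(value)
        if end > len(self.data):
            raise IndexError("write past allocated capacity")
        self.data[self.used:end] = value
        self.used = end

    def set_safe(self, value: bytes):
        # Like setSafe: double the capacity until the write fits, then write.
        while self.used + len(value) > len(self.data):
            self.data.extend(bytearray(len(self.data)))
        self.set(value)


buf = GrowableBuffer(capacity=4)
buf.set_safe(b"hello world")  # grows from 4 to 16 bytes, then writes
```

With plain set, the same write on a 4-byte buffer would fail, which is the kind of failure mode the comment suspects in the reporter's writer code.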
[jira] [Commented] (ARROW-6578) [Python] Casting int64 to string columns
[ https://issues.apache.org/jira/browse/ARROW-6578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932161#comment-16932161 ] Igor Yastrebov commented on ARROW-6578:
---------------------------------------

[~pitrou] When I benchmarked it on 16 csv files ~200 MB in size, using read+cast(safe=False) was >10% faster than read with ConvertOptions. This doesn't account for string->int64->string conversions since they aren't implemented :)

> [Python] Casting int64 to string columns
> ----------------------------------------
>
>                 Key: ARROW-6578
>                 URL: https://issues.apache.org/jira/browse/ARROW-6578
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, Python
>    Affects Versions: 0.14.1
>            Reporter: Igor Yastrebov
>            Priority: Major
>
> I wanted to cast a list of tables to the same schema so I could use concat_tables later. However, I encountered ArrowNotImplementedError:
> {code:java}
> ---------------------------------------------------------------------------
> ArrowNotImplementedError                  Traceback (most recent call last)
> in <module>
> ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
>
> in <listcomp>(.0)
> ----> 1 list_tb = [i.cast(mts_schema, safe = True) for i in list_tb]
>
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in itercolumns()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\table.pxi in pyarrow.lib.Column.cast()
> ~\AppData\Local\Continuum\miniconda3\envs\cyclone\lib\site-packages\pyarrow\error.pxi in pyarrow.lib.check_status()
> ArrowNotImplementedError: No cast implemented from int64 to string
> {code}
> Some context: I want to read and concatenate a bunch of csv files that come from partitioning of the same table. Using cast after reading csv is usually significantly faster than specifying column_types in ConvertOptions. There are string columns that are mostly populated with integer-like values, so a particular file can have an integer-only column. This situation is rather common, so having an option to cast an int64 column to a string column would be helpful.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932188#comment-16932188 ] Antoine Pitrou commented on ARROW-6003:
---------------------------------------

Can you post the first lines of the CSV file?

> [C++] Better input validation and error messaging in CSV reader
> ---------------------------------------------------------------
>
>                 Key: ARROW-6003
>                 URL: https://issues.apache.org/jira/browse/ARROW-6003
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: csv
>
> Followup to https://issues.apache.org/jira/browse/ARROW-5747. The error message(s) are not great when you give bad input. For example, if I give too many or too few {{column_names}}, the error I get is {{Invalid: Empty CSV file}}. In fact, that's about the only error message I've seen from the CSV reader, no matter what I've thrown at it.
> It would be better if error messages were more specific so that I as a user might know how to fix my bad input.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6003) [C++] Better input validation and error messaging in CSV reader
[ https://issues.apache.org/jira/browse/ARROW-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932190#comment-16932190 ] Antoine Pitrou commented on ARROW-6003:
---------------------------------------

Here is an example in Python:

{code:python}
>>> s = b"""a,b,c\n1,2,3\n"""
>>> csv.read_csv(io.BytesIO(s))
pyarrow.Table
a: int64
b: int64
c: int64
>>> options = csv.ReadOptions(column_names=['a', 'b'])
>>> csv.read_csv(io.BytesIO(s), read_options=options)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    csv.read_csv(io.BytesIO(s), read_options=options)
  File "pyarrow/_csv.pyx", line 541, in pyarrow._csv.read_csv
    check_status(reader.get().Read(&table))
  File "pyarrow/error.pxi", line 78, in pyarrow.lib.check_status
    raise ArrowInvalid(message)
ArrowInvalid: CSV parse error: Expected 2 columns, got 3
{code}

> [C++] Better input validation and error messaging in CSV reader
> ---------------------------------------------------------------
>
>                 Key: ARROW-6003
>                 URL: https://issues.apache.org/jira/browse/ARROW-6003
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Neal Richardson
>            Assignee: Neal Richardson
>            Priority: Major
>              Labels: csv

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6445) [CI][Crossbow] Nightly Gandiva jar trusty job fails
[ https://issues.apache.org/jira/browse/ARROW-6445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932198#comment-16932198 ] Prudhvi Porandla commented on ARROW-6445:
-----------------------------------------

Resolved. The build is no longer failing: https://travis-ci.org/ursa-labs/crossbow/builds/585944097

> [CI][Crossbow] Nightly Gandiva jar trusty job fails
> ---------------------------------------------------
>
>                 Key: ARROW-6445
>                 URL: https://issues.apache.org/jira/browse/ARROW-6445
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Continuous Integration, Packaging
>            Reporter: Neal Richardson
>            Assignee: Praveen Kumar Desabandu
>            Priority: Blocker
>             Fix For: 0.15.0
>
> https://travis-ci.org/ursa-labs/crossbow/builds/580192384. Error looks like something to do with double-conversion.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-4018) [C++] RLE decoder may not be big-endian compatible
[ https://issues.apache.org/jira/browse/ARROW-4018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932199#comment-16932199 ] Antoine Pitrou commented on ARROW-4018:
---------------------------------------

> Are we actually clean in terms of endianness in other places?

Presumably not, because we're reinterpreting array bytes as larger types such as int64_t etc. And we're also serializing those bytes directly to disk or wire.

> it sounds strange to be slicing a long like coverity describes - have you looked to see if this is intended?

I think it's just a dirty implementation shortcut. Instead of doing:

{code:cpp}
bool result = bit_reader_.GetAligned<T>(
    static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
    reinterpret_cast<T*>(&current_value_));
{code}

the code could presumably be written as:

{code:cpp}
T value;
bool result = bit_reader_.GetAligned<T>(
    static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)), &value);
current_value_ = static_cast<uint64_t>(value);
{code}

> [C++] RLE decoder may not be big-endian compatible
> --------------------------------------------------
>
>                 Key: ARROW-4018
>                 URL: https://issues.apache.org/jira/browse/ARROW-4018
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>    Affects Versions: 0.11.1
>            Reporter: Antoine Pitrou
>            Priority: Major
>             Fix For: 1.0.0
>
> This issue was found by Coverity. The {{RleDecoder::NextCounts}} method has the following code to fetch the repeated literal in repeated runs:
> {code:c++}
> bool result = bit_reader_.GetAligned<T>(
>     static_cast<int>(BitUtil::CeilDiv(bit_width_, 8)),
>     reinterpret_cast<T*>(&current_value_));
> {code}
> Coverity says this:
> bq. Pointer "&this->current_value_" points to an object whose effective type is "unsigned long long" (64 bits, unsigned) but is dereferenced as a narrower "unsigned int" (32 bits, unsigned). This may lead to unexpected results depending on machine endianness.
> In addition, it's not obvious whether {{current_value_}} also needs byte-swapping (presumably, at least in the Parquet file format, it's supposed to be stored in little-endian format in the RLE bitstream).

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
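The Coverity complaint can be reproduced in miniature: writing a narrower value through a pointer into a 64-bit field only lands in the low-order bytes on a little-endian machine. A small stdlib Python sketch of the byte layout (illustrative only; the helper name is invented):

```python
import struct

def narrow_store_then_read(byte_order: str, value: int) -> int:
    """Emulate writing a 32-bit value through a reinterpret-cast pointer
    into a 64-bit field, then reading the whole 64-bit field back.
    byte_order is "<" (little-endian) or ">" (big-endian)."""
    field = bytearray(8)  # the 64-bit current_value_ field, zero-initialized
    field[0:4] = struct.pack(byte_order + "I", value)  # *(uint32_t*)&field = value
    return struct.unpack(byte_order + "Q", bytes(field))[0]
```

On little-endian the first four bytes are the low-order bytes, so the round trip is exact; on big-endian they are the high-order bytes, so the value comes back shifted up by 32 bits, which is why the shortcut is not big-endian safe.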
[jira] [Created] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
Antoine Pitrou created ARROW-6597:
-------------------------------------

             Summary: [Python] Segfault in test_pandas with Python 2.7
                 Key: ARROW-6597
                 URL: https://issues.apache.org/jira/browse/ARROW-6597
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Antoine Pitrou
            Assignee: Antoine Pitrou

I get a segfault in test_pandas with Python 2.7.

gdb stack trace (excerpt):

{code}
Thread 27 "python" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffb7fff700 (LWP 17725)]
0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, unit=arrow::DateUnit::DAY, out=0x55e1b9b0)
    at ../src/arrow/python/datetime.cc:229
229         *out = PyDate_FromDate(static_cast(year), static_cast(month),
(gdb) bt
#0  0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, unit=arrow::DateUnit::DAY, out=0x55e1b9b0)
    at ../src/arrow/python/datetime.cc:229
#1  0x7fffcabaed34 in arrow::Status arrow::py::ConvertDates(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}::operator()(int, _object**) const
    (this=0x7fffb7ffde90, value=10957, out=0x55e1b9b0) at ../src/arrow/python/arrow_to_pandas.cc:657
#2  0x7fffcabaeb8c in arrow::Status arrow::py::ConvertAsPyObjects(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&>(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, arrow::Status arrow::py::ConvertDates(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&, _object**)::{lambda(int const&, _object**)#1}::operator()(int const, _object**) const
    (this=0x7fffb7ffdd88, value=@0x7fffb7ffdcbc: 10957, out_values=0x55e1b9b0) at ../src/arrow/python/arrow_to_pandas.cc:417
{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6597:
----------------------------------
    Fix Version/s: 0.15.0

> [Python] Segfault in test_pandas with Python 2.7
> ------------------------------------------------
>
>                 Key: ARROW-6597
>                 URL: https://issues.apache.org/jira/browse/ARROW-6597
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>             Fix For: 0.15.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated ARROW-6597:
----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Segfault in test_pandas with Python 2.7
> ------------------------------------------------
>
>                 Key: ARROW-6597
>                 URL: https://issues.apache.org/jira/browse/ARROW-6597
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Comment Edited] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932293#comment-16932293 ] Krisztian Szucs edited comment on ARROW-6597 at 9/18/19 10:47 AM:
------------------------------------------------------------------

[~pitrou] why wasn't it caught by the CI?

was (Author: kszucs):
[~pitrou] why wasn't it cached by the CI?

> [Python] Segfault in test_pandas with Python 2.7
> ------------------------------------------------
>
>                 Key: ARROW-6597
>                 URL: https://issues.apache.org/jira/browse/ARROW-6597
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932293#comment-16932293 ] Krisztian Szucs commented on ARROW-6597:
----------------------------------------

[~pitrou] why wasn't it caught by the CI?

> [Python] Segfault in test_pandas with Python 2.7
> ------------------------------------------------
>
>                 Key: ARROW-6597
>                 URL: https://issues.apache.org/jira/browse/ARROW-6597
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932294#comment-16932294 ] Antoine Pitrou commented on ARROW-6597:
---------------------------------------

I have no idea :-/ Perhaps you don't install Pandas on the 2.7 builder?

> [Python] Segfault in test_pandas with Python 2.7
> ------------------------------------------------
>
>                 Key: ARROW-6597
>                 URL: https://issues.apache.org/jira/browse/ARROW-6597
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 20m
>  Remaining Estimate: 0h

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Commented] (ARROW-1013) [C++] Add asynchronous RecordBatchStreamWriter
[ https://issues.apache.org/jira/browse/ARROW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932297#comment-16932297 ] Antoine Pitrou commented on ARROW-1013: --- Ah, right. I suppose so. This should be designed depending on the intended use cases. > [C++] Add asynchronous RecordBatchStreamWriter > -- > > Key: ARROW-1013 > URL: https://issues.apache.org/jira/browse/ARROW-1013 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > > We may want to provide an option to limit the queuing depth. The async writer > can be initialized from a synchronous writer -- This message was sent by Atlassian Jira (v8.3.4#803005)
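The "limit the queuing depth" option discussed for ARROW-1013 can be sketched with a bounded queue in front of a synchronous writer: producers block once the queue is full, which caps memory held by in-flight batches. A minimal Python sketch (the class and method names here are invented for illustration; ListWriter is a toy stand-in for a synchronous RecordBatchStreamWriter, not an Arrow API):

```python
import queue
import threading

class AsyncWriter:
    """Wrap a synchronous writer; write() pushes batches through a
    bounded queue drained by a background thread."""

    def __init__(self, sync_writer, max_queue_depth=4):
        self._writer = sync_writer
        self._queue = queue.Queue(maxsize=max_queue_depth)  # queuing-depth limit
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            batch = self._queue.get()
            if batch is None:          # sentinel: shut down
                return
            self._writer.write(batch)

    def write(self, batch):
        self._queue.put(batch)         # blocks once max_queue_depth is reached

    def close(self):
        self._queue.put(None)
        self._worker.join()            # all queued batches flushed before return


class ListWriter:
    """Toy synchronous writer that just records batches in order."""
    def __init__(self):
        self.batches = []
    def write(self, batch):
        self.batches.append(batch)


sink = ListWriter()
writer = AsyncWriter(sink, max_queue_depth=2)  # async writer built from a sync one
for i in range(5):
    writer.write(i)
writer.close()
```

Because the queue is FIFO and close() joins the worker after a sentinel, batch order is preserved and nothing is dropped, which is the property a real implementation would also need to guarantee.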
[jira] [Updated] (ARROW-1013) [C++] Add asynchronous RecordBatchStreamWriter
[ https://issues.apache.org/jira/browse/ARROW-1013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-1013: -- Fix Version/s: 2.0.0 > [C++] Add asynchronous RecordBatchStreamWriter > -- > > Key: ARROW-1013 > URL: https://issues.apache.org/jira/browse/ARROW-1013 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 2.0.0 > > > We may want to provide an option to limit the queuing depth. The async writer > can be initialized from a synchronous writer -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2229) [C++] Write CSV files from RecordBatch, Table
[ https://issues.apache.org/jira/browse/ARROW-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2229:
----------------------------------
    Fix Version/s: 2.0.0

> [C++] Write CSV files from RecordBatch, Table
> ---------------------------------------------
>
>                 Key: ARROW-2229
>                 URL: https://issues.apache.org/jira/browse/ARROW-2229
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Jun
>            Priority: Major
>             Fix For: 2.0.0
>
> I did a search through JIRA and didn't find this. Is there support for CSV file reading/writing available in Arrow?
> I can go through pandas.read_csv and then convert to an Arrow table of course, but I would also like to use a native Arrow API that's schema-driven for CSV reading/writing.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (ARROW-5273) [C++] Valgrind failures in JSON tests
[ https://issues.apache.org/jira/browse/ARROW-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5273. --- Resolution: Cannot Reproduce I can't reproduce anymore, closing. > [C++] Valgrind failures in JSON tests > - > > Key: ARROW-5273 > URL: https://issues.apache.org/jira/browse/ARROW-5273 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Antoine Pitrou >Priority: Major > > I get the following failures with Valgrind: > {code} > ==12630== Memcheck, a memory error detector > ==12630== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al. > ==12630== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info > ==12630== Command: > /home/antoine/arrow/dev/cpp/build-test/debug//arrow-json-chunker-test > ==12630== > Running main() from > /home/conda/feedstock_root/build_artifacts/gtest_1551008230529/work/googletest/src/gtest_main.cc > [==] Running 12 tests from 3 test cases. > [--] Global test environment set-up. 
> [--] 4 tests from ChunkerTest > [ RUN ] ChunkerTest.PrettyPrinted > ==12630== Conditional jump or move depends on uninitialised value(s) > ==12630==at 0x15757F: > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::ScanCopyUnescapedString(arrow::rapidjson::GenericStringStream > >&, arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::StackStream&) (reader.h:942) > ==12630==by 0x155FAA: void > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::ParseStringToStream<0u, > arrow::rapidjson::UTF8, arrow::rapidjson::UTF8, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::StackStream > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::StackStream&) (reader.h:856) > ==12630==by 0x1537E0: void > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::ParseString<0u, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator> > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>&, bool) (reader.h:827) > ==12630==by 0x152141: void > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::ParseValue<0u, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator> > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>&) (reader.h:1397) > ==12630==by 0x153CB4: void > arrow::rapidjson::GenericReader, > 
arrow::rapidjson::UTF8, > arrow::rapidjson::CrtAllocator>::ParseObject<0u, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator> > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>&) (reader.h:621) > ==12630==by 0x15215A: void > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::ParseValue<0u, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator> > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>&) (reader.h:1398) > ==12630==by 0x1503CC: arrow::rapidjson::ParseResult > arrow::rapidjson::GenericReader, > arrow::rapidjson::UTF8, arrow::rapidjson::CrtAllocator>::Parse<0u, > arrow::rapidjson::GenericStringStream >, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator> > >(arrow::rapidjson::GenericStringStream >&, > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>&) (reader.h:501) > ==12630==by 0x14E385: > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>& > arrow::rapidjson::GenericDocument, > arrow::rapidjson::MemoryPoolAllocator, > arrow::rapidjson::CrtAllocator>::ParseStream<0u, > arrow::rapidjson::UTF8, > arrow::rapidjson::GenericStringStre
[jira] [Updated] (ARROW-981) [C++] Write comparable columnar serialization benchmarks versus Protocol Buffers / gRPC
[ https://issues.apache.org/jira/browse/ARROW-981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-981: - Fix Version/s: 2.0.0 > [C++] Write comparable columnar serialization benchmarks versus Protocol > Buffers / gRPC > --- > > Key: ARROW-981 > URL: https://issues.apache.org/jira/browse/ARROW-981 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Wes McKinney >Priority: Major > Fix For: 2.0.0 > > > This will help with demonstrating quantifiable gains in data serialization > beyond the benefits of columnar layout (which can also be implemented in > traditional serialization tools like Protobuf, Thrift, etc.) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5121) [C++] arrow::internal::make_unique conflicts with std::make_unique on MSVC
[ https://issues.apache.org/jira/browse/ARROW-5121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5121:
----------------------------------
    Fix Version/s: 1.0.0

> [C++] arrow::internal::make_unique conflicts with std::make_unique on MSVC
> --------------------------------------------------------------------------
>
>                 Key: ARROW-5121
>                 URL: https://issues.apache.org/jira/browse/ARROW-5121
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++
>            Reporter: Benjamin Kietzman
>            Assignee: Benjamin Kietzman
>            Priority: Minor
>             Fix For: 1.0.0
>
> MSVC appears to implement C++20 ADL, which includes function templates with explicit template arguments (previously these were not looked up through ADL):
> https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow/builds/23604480/job/psvu16jasktacvy2#L2097

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Updated] (ARROW-5327) [C++] allow construction of ArrayBuilders from existing arrays
[ https://issues.apache.org/jira/browse/ARROW-5327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5327:
----------------------------------
    Fix Version/s: 2.0.0

> [C++] allow construction of ArrayBuilders from existing arrays
> --------------------------------------------------------------
>
>                 Key: ARROW-5327
>                 URL: https://issues.apache.org/jira/browse/ARROW-5327
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Benjamin Kietzman
>            Assignee: Benjamin Kietzman
>            Priority: Major
>             Fix For: 2.0.0
>
> After calling Finish it may become necessary to append further elements to an array, which we don't currently support. One way to support this would be consuming the array to produce a builder with the array's elements pre-inserted.
> {code}
> std::shared_ptr<Array> array = get_array();
> std::unique_ptr<ArrayBuilder> builder;
> RETURN_NOT_OK(MakeBuilder(std::move(*array), &builder));
> {code}
> This will be efficient if we cannibalize the array's buffers and child data when constructing the builder, which will require that the consumed array is uniquely owned.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
[jira] [Resolved] (ARROW-5618) [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers integer overflow in some cases
[ https://issues.apache.org/jira/browse/ARROW-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5618. --- Resolution: Duplicate > [C++] [Parquet] Using deprecated Int96 storage for timestamps triggers > integer overflow in some cases > - > > Key: ARROW-5618 > URL: https://issues.apache.org/jira/browse/ARROW-5618 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: TP Boudreau >Assignee: TP Boudreau >Priority: Minor > Labels: parquet, pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > When storing Arrow timestamps in Parquet files using the Int96 storage > format, certain combinations of array lengths and validity bitmasks cause an > integer overflow error on read. It's not immediately clear whether the > Arrow/Parquet writer is storing zeroes when it should be storing positive > values or the reader is attempting to calculate a nanoseconds value > inappropriately from zeroed inputs (perhaps missing the null bit flag). Also > not immediately clear why only certain length columns seem to be affected. > Probably the quickest way to reproduce this undefined behavior is to alter > the existing unit test UseDeprecatedInt96 (in file > .../arrow/cpp/src/parquet/arrow/arrow-reader-writer-test.cc) by quadrupling > its column lengths (repeating the same values), followed by 'make unittest' > using clang-7 with sanitizers enabled. (Here's a patch applicable to current > master that changes the test as described: [1]; I used the following cmake > command to build my environment: [2].) You should get a log something like > [3]. If requested, I'll see if I can put together a stand-alone minimal test > case that induces the behavior. > The quick-hack at [4] will prevent integer overflows, but this is only > included to confirm the proximate cause of the bug: the Julian days field of > the Int96 appears to be zero, when a strictly positive number is expected. 
> I've assigned the issue to myself and I'll start looking into the root cause > of this. > [1] https://gist.github.com/tpboudreau/b6610c13cbfede4d6b171da681d1f94e > [2] https://gist.github.com/tpboudreau/59178ca8cb50a935aab7477805aa32b9 > [3] https://gist.github.com/tpboudreau/0c2d0a18960c1aa04c838fa5c2ac7d2d > [4] https://gist.github.com/tpboudreau/0993beb5c8c1488028e76fb2ca179b7f -- This message was sent by Atlassian Jira (v8.3.4#803005)
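The overflow described above can be seen from the arithmetic alone. The sketch below is illustrative, not Parquet's reader code: Int96 timestamps store a Julian day plus nanoseconds within that day, and converting via `(julian_day - 2440588) * nanos_per_day` overflows `int64_t` whenever the Julian-day field is zeroed, as observed for the affected slots. The `__builtin_*_overflow` helpers are GCC/Clang builtins.

```cpp
// Illustrative Int96 -> epoch-nanoseconds conversion with overflow checking.
#include <cstdint>

constexpr int64_t kJulianEpochOffsetDays = 2440588;  // Julian day of 1970-01-01
constexpr int64_t kNanosPerDay = 86400LL * 1000 * 1000 * 1000;

// Returns false instead of invoking signed-overflow UB when the Julian-day
// field is implausible (e.g. zero, when a strictly positive value is expected).
bool Int96ToNanos(int64_t julian_day, int64_t nanos_of_day, int64_t* out) {
  const int64_t days = julian_day - kJulianEpochOffsetDays;
  if (__builtin_mul_overflow(days, kNanosPerDay, out)) return false;
  if (__builtin_add_overflow(*out, nanos_of_day, out)) return false;
  return true;
}
```

A zeroed Julian day gives `days = -2440588`, and `-2440588 * kNanosPerDay` is roughly -2.1e20, far outside the `int64_t` range, which is consistent with the sanitizer report.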
[jira] [Updated] (ARROW-5915) [C++] [Python] Set up testing for backwards compatibility of the parquet reader
[ https://issues.apache.org/jira/browse/ARROW-5915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5915: -- Fix Version/s: 1.0.0 > [C++] [Python] Set up testing for backwards compatibility of the parquet > reader > --- > > Key: ARROW-5915 > URL: https://issues.apache.org/jira/browse/ARROW-5915 > Project: Apache Arrow > Issue Type: Test > Components: C++, Python >Reporter: Joris Van den Bossche >Priority: Major > Labels: parquet > Fix For: 1.0.0 > > > Given the recent parquet compat problems, we should have better testing for > this. > For easy testing of backwards compatibility, we could add some files (with > different types) written with older versions, and ensure they are read > correctly with the current version. > Similarly as what Kartothek is doing: > https://github.com/JDASoftwareGroup/kartothek/tree/master/reference-data/arrow-compat > An easy way would be to do that in pyarrow and add them to > /pyarrow/tests/data/parquet (we already have some files from 0.7 there). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-5354) [C++] allow Array to have null buffers when all elements are null
[ https://issues.apache.org/jira/browse/ARROW-5354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-5354. --- Resolution: Won't Fix As discussed in the linked PR, this isn't currently desirable. > [C++] allow Array to have null buffers when all elements are null > - > > Key: ARROW-5354 > URL: https://issues.apache.org/jira/browse/ARROW-5354 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > > In the case of all elements of an array being null, no buffers whatsoever > *need* to be allocated (similar to NullArray). This is a more extreme case of > the optimization which allows the null bitmap buffer to be null if all > elements are valid. Currently {{arrow::Array}} requires at least a null > bitmap buffer to be allocated (and all bits set to 0). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6398) [C++] consolidate ScanOptions and ScanContext
[ https://issues.apache.org/jira/browse/ARROW-6398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6398: -- Fix Version/s: 1.0.0 > [C++] consolidate ScanOptions and ScanContext > - > > Key: ARROW-6398 > URL: https://issues.apache.org/jira/browse/ARROW-6398 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > Labels: dataset, pull-request-available > Fix For: 1.0.0 > > Time Spent: 2h 50m > Remaining Estimate: 0h > > Currently ScanOptions has two distinct responsibilities: it contains the data > selector (and eventually projection schema) for the current scan and it > serves as the base class for format specific scan options. > In addition, we have ScanContext which holds the memory pool for the current > scan. > I think these classes should be rearranged as follows: ScanOptions will be > removed and FileScanOptions will be the abstract base class for format > specific scan options. ScanContext will be a concrete struct and contain the > data selector, projection schema, a vector of FileScanOptions, and any other > shared scan state. -- This message was sent by Atlassian Jira (v8.3.4#803005)
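The proposed rearrangement can be sketched as follows. This is a hedged outline following the issue text only; the selector and projection member types are stand-ins, not the real dataset API.

```cpp
// Sketch of the proposed split: FileScanOptions as the abstract base for
// format-specific options, ScanContext as a concrete struct of shared state.
#include <memory>
#include <string>
#include <vector>

struct FileScanOptions {  // abstract base for format-specific scan options
  virtual ~FileScanOptions() = default;
  virtual std::string file_format() const = 0;
};

struct CsvScanOptions : FileScanOptions {  // illustrative format subclass
  std::string file_format() const override { return "csv"; }
  char delimiter = ',';
};

struct ScanContext {  // concrete, holds per-scan shared state
  std::string data_selector;            // stand-in for the real selector type
  std::vector<std::string> projection;  // stand-in for a projection schema
  std::vector<std::shared_ptr<FileScanOptions>> format_options;
};
```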
[jira] [Commented] (ARROW-5575) [C++] arrowConfig.cmake includes uninstalled targets
[ https://issues.apache.org/jira/browse/ARROW-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932310#comment-16932310 ] Antoine Pitrou commented on ARROW-5575: --- [~kou] > [C++] arrowConfig.cmake includes uninstalled targets > > > Key: ARROW-5575 > URL: https://issues.apache.org/jira/browse/ARROW-5575 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0, 0.14.0, 0.14.1 >Reporter: Matthijs Brobbel >Priority: Minor > > I'm building a CMake project against arrow and I'm using: > {code:java} > find_package(arrow 0.13 CONFIG REQUIRED) > {code} > to get the arrow_shared target in scope. This works for me on macOS. I > installed apache-arrow with: > {code:java} > brew install apache-arrow{code} > However, when I attempt to build the project in a ubuntu xenial container, I > get the following CMake error: > {code:java} > CMake Error at /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake:151 > (message): > The imported target "arrow_cuda_shared" references the file > "/usr/lib/x86_64-linux-gnu/libarrow_cuda.so.13.0.0" > but this file does not exist. Possible reasons include: > * The file was deleted, renamed, or moved to another location. > * An install or uninstall procedure did not complete successfully. > * The installation package was faulty and contained > "/usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake" > but not all the files it references. 
> Call Stack (most recent call first): > /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowConfig.cmake:61 (include) > CMakeLists.txt:15 (find_package) > {code} > I installed arrow with: > {code:java} > curl -sSL "https://dist.apache.org/repos/dist/dev/arrow/KEYS"; | apt-key add - > echo "deb [arch=amd64] https://dl.bintray.com/apache/arrow/ubuntu/ xenial > main" | tee -a /etc/apt/sources.list > apt-get update > apt-get install -y libarrow-dev=0.13.0-1 > {code} > I can also install libarrow-cuda-dev, but I don't want to because I don't > need it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5575) [C++] arrowConfig.cmake includes uninstalled targets
[ https://issues.apache.org/jira/browse/ARROW-5575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5575: -- Fix Version/s: 1.0.0 > [C++] arrowConfig.cmake includes uninstalled targets > > > Key: ARROW-5575 > URL: https://issues.apache.org/jira/browse/ARROW-5575 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0, 0.14.0, 0.14.1 >Reporter: Matthijs Brobbel >Priority: Minor > Fix For: 1.0.0 > > > I'm building a CMake project against arrow and I'm using: > {code:java} > find_package(arrow 0.13 CONFIG REQUIRED) > {code} > to get the arrow_shared target in scope. This works for me on macOS. I > installed apache-arrow with: > {code:java} > brew install apache-arrow{code} > However, when I attempt to build the project in a ubuntu xenial container, I > get the following CMake error: > {code:java} > CMake Error at /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake:151 > (message): > The imported target "arrow_cuda_shared" references the file > "/usr/lib/x86_64-linux-gnu/libarrow_cuda.so.13.0.0" > but this file does not exist. Possible reasons include: > * The file was deleted, renamed, or moved to another location. > * An install or uninstall procedure did not complete successfully. > * The installation package was faulty and contained > "/usr/lib/x86_64-linux-gnu/cmake/arrow/arrowTargets.cmake" > but not all the files it references. > Call Stack (most recent call first): > /usr/lib/x86_64-linux-gnu/cmake/arrow/arrowConfig.cmake:61 (include) > CMakeLists.txt:15 (find_package) > {code} > I installed arrow with: > {code:java} > curl -sSL "https://dist.apache.org/repos/dist/dev/arrow/KEYS"; | apt-key add - > echo "deb [arch=amd64] https://dl.bintray.com/apache/arrow/ubuntu/ xenial > main" | tee -a /etc/apt/sources.list > apt-get update > apt-get install -y libarrow-dev=0.13.0-1 > {code} > I can also install libarrow-cuda-dev, but I don't want to because I don't > need it. 
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6187) [C++] fallback to storage type when writing ExtensionType to Parquet
[ https://issues.apache.org/jira/browse/ARROW-6187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6187: -- Fix Version/s: 1.0.0 > [C++] fallback to storage type when writing ExtensionType to Parquet > > > Key: ARROW-6187 > URL: https://issues.apache.org/jira/browse/ARROW-6187 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Joris Van den Bossche >Priority: Major > Labels: parquet > Fix For: 1.0.0 > > > Writing a table that contains an ExtensionType array to a parquet file is not > yet implemented. It currently raises "ArrowNotImplementedError: Unhandled > type for Arrow to Parquet schema conversion: > extension" (for a PyExtensionType in this case). > I think minimal support can consist of writing the storage type / array. > We also might want to save the extension name and metadata in the parquet > FileMetadata. > Later on, this could potentially be used to restore the extension type > when reading. This is related to other issues that need to save the arrow > schema (categorical: ARROW-5480, time zones: ARROW-5888). Only in this case, > we probably want to store the serialised type in addition to the schema > (which only has the extension type's name). -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-6339) [Python][C++] Rowgroup statistics for pd.NaT array ill defined
[ https://issues.apache.org/jira/browse/ARROW-6339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Krisztian Szucs resolved ARROW-6339. Resolution: Fixed Issue resolved by pull request 5403 [https://github.com/apache/arrow/pull/5403] > [Python][C++] Rowgroup statistics for pd.NaT array ill defined > -- > > Key: ARROW-6339 > URL: https://issues.apache.org/jira/browse/ARROW-6339 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Affects Versions: 0.14.1 >Reporter: Florian Jetter >Assignee: Uwe L. Korn >Priority: Minor > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When initialising an array with NaT-only values, the row group statistics are > corrupt, either returning random values or raising integer-out-of-bounds > exceptions. > {code:python} > import io > import pandas as pd > import pyarrow as pa > import pyarrow.parquet as pq > df = pd.DataFrame({"t": pd.Series([pd.NaT], dtype="datetime64[ns]")}) > buf = pa.BufferOutputStream() > pq.write_table(pa.Table.from_pandas(df), buf, version="2.0") > buf = io.BytesIO(buf.getvalue().to_pybytes()) > parquet_file = pq.ParquetFile(buf) > # Asserting behaviour is difficult since it is random and the state is ill > defined. > # After a few iterations an exception is raised. > while True: > parquet_file.metadata.row_group(0).column(0).statistics.max > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-1318) [C++] hdfs access with auth
[ https://issues.apache.org/jira/browse/ARROW-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-1318: -- Fix Version/s: 2.0.0 > [C++] hdfs access with auth > --- > > Key: ARROW-1318 > URL: https://issues.apache.org/jira/browse/ARROW-1318 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Martin Durant >Priority: Major > Fix For: 2.0.0 > > > A wide variety of authentication schemes are available in hadoop. > This issue is to track whether libhdfs can successfully operate with them. > The list includes: > - user/password > - basic kerberos (via kinit and via keytabs) > - kerberos with active directory and single-sign-on > - "privacy" and "integrity" modes > - access with hdfs delegation token > - probably others... -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4864) [C++] gandiva-micro_benchmarks is broken in MSVC build
[ https://issues.apache.org/jira/browse/ARROW-4864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4864: -- Fix Version/s: 1.0.0 > [C++] gandiva-micro_benchmarks is broken in MSVC build > -- > > Key: ARROW-4864 > URL: https://issues.apache.org/jira/browse/ARROW-4864 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Assignee: Pindikura Ravindra >Priority: Major > Fix For: 1.0.0 > > > Not a blocking issue for 0.13. I encountered this when debugging the CMake > refactor branch with Visual Studio 2015 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5570) [C++] Update Avro C++ code to conform to Arrow style guide and get it compiling.
[ https://issues.apache.org/jira/browse/ARROW-5570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5570: -- Fix Version/s: 2.0.0 > [C++] Update Avro C++ code to conform to Arrow style guide and get it > compiling. > > > Key: ARROW-5570 > URL: https://issues.apache.org/jira/browse/ARROW-5570 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3201) [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes available
[ https://issues.apache.org/jira/browse/ARROW-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3201: -- Fix Version/s: 2.0.0 > [C++] Utilize zero-copy protobuf parsing from upstream whenever it becomes > available > > > Key: ARROW-3201 > URL: https://issues.apache.org/jira/browse/ARROW-3201 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, FlightRPC >Reporter: Wes McKinney >Priority: Major > Labels: flight > Fix For: 2.0.0 > > > This has been discussed for a couple of years now; perhaps with Abseil this > could happen at some point: > https://github.com/protocolbuffers/protobuf/issues/1896 > Using zero-copy proto parsing (which is standard practice inside Google, but > not available in open source protocol buffers) would obviate the need for the > zero-copy workaround that I'm going to implement for C++ Flight RPCs -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5569) [C++] import avro C++ code to code base.
[ https://issues.apache.org/jira/browse/ARROW-5569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5569: -- Fix Version/s: 2.0.0 > [C++] import avro C++ code to code base. > > > Key: ARROW-5569 > URL: https://issues.apache.org/jira/browse/ARROW-5569 > Project: Apache Arrow > Issue Type: Sub-task > Components: C++ >Reporter: Micah Kornfield >Assignee: Micah Kornfield >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 5h 50m > Remaining Estimate: 0h > > The goal here is to take the code as is, without compiling it, but to flatten it > to conform with Arrow's code base standards. This will give a basis for > future PRs. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5933) [C++] [Documentation] add discussion of Union.typeIds to Layout.rst
[ https://issues.apache.org/jira/browse/ARROW-5933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5933: -- Fix Version/s: 1.0.0 > [C++] [Documentation] add discussion of Union.typeIds to Layout.rst > > > Key: ARROW-5933 > URL: https://issues.apache.org/jira/browse/ARROW-5933 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Documentation >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Major > Fix For: 1.0.0 > > > Union.typeIds is poorly documented and the corresponding property in > UnionType is confusingly named type_codes. In particular, Layout.rst doesn't > include an explanation of Union.typeIds and implies that an element of a > union array's type_ids buffer is always the index of a child array. -- This message was sent by Atlassian Jira (v8.3.4#803005)
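The distinction above can be made concrete with a small lookup. This is an illustrative sketch, not Arrow's union implementation: an element of the type_ids buffer is a logical *type code* (e.g. 5 or 9), which need not equal the index of a child array; the union type's typeIds list supplies the mapping.

```cpp
// Map a logical type code from a union's type_ids buffer to a child index.
#include <cstddef>
#include <cstdint>
#include <vector>

// `type_codes` is the union type's declared code-per-child list (typeIds);
// returns -1 if the code is not declared by this union type.
int ChildIndexForCode(const std::vector<int8_t>& type_codes, int8_t code) {
  for (std::size_t i = 0; i < type_codes.size(); ++i) {
    if (type_codes[i] == code) return static_cast<int>(i);
  }
  return -1;
}
```

With typeIds = {5, 9}, a type_ids entry of 9 refers to child 1, not child 9, which is exactly the point the issue says Layout.rst glosses over.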
[jira] [Updated] (ARROW-4966) [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00
[ https://issues.apache.org/jira/browse/ARROW-4966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4966: -- Fix Version/s: 2.0.0 > [C++] orc::TimezoneError Can't open /usr/share/zoneinfo/GMT-00:00 > - > > Key: ARROW-4966 > URL: https://issues.apache.org/jira/browse/ARROW-4966 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.12.0 >Reporter: Peter Wicks >Priority: Major > Fix For: 2.0.0 > > > When reading some ORC files, pyarrow orc throws the following error on > `read()`: > {code:java} > o = pf.read(){code} > {{terminate called after throwing an instance of 'orc::TimezoneError'}} > {{what(): Can't open /usr/share/zoneinfo/GMT-00:00}} > While it's true this folder does not exist, I don't think it normally does. > Our server has folders for `GMT`, `GMT0`, `GMT-0`, and `GMT+0`. > ORC file was created using HIVE, compressed with Snappy. Other files from the > same table/partition do not throw this error. Files can be read with Hive. > We created a soft link from the existing `GMT` timezone to this one, and it > fixed the issue. Then shortly I got the same error, but for `GMT+00:00`... :D > Soft link fixed this one also. -- This message was sent by Atlassian Jira (v8.3.4#803005)
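The reporter's soft-link workaround can be sketched programmatically. This is a hedged sketch only (the real fix presumably belongs in the ORC timezone lookup): it creates the missing alias, such as "GMT-00:00", as a symlink to the zoneinfo entry that does exist. Requires C++17 `<filesystem>`; the paths and function name are illustrative.

```cpp
// Create a missing zoneinfo alias (e.g. "GMT-00:00") as a symlink to GMT.
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Returns true if the alias was created; false if GMT is absent or the
// alias already resolves.
bool LinkGmtAlias(const fs::path& zoneinfo_dir, const std::string& alias) {
  const fs::path target = zoneinfo_dir / "GMT";
  const fs::path link = zoneinfo_dir / alias;
  if (!fs::exists(target) || fs::exists(link)) return false;
  fs::create_symlink(target, link);
  return true;
}
```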
[jira] [Updated] (ARROW-6479) [C++] inline errors from external projects' build logs
[ https://issues.apache.org/jira/browse/ARROW-6479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6479: -- Fix Version/s: 1.0.0 > [C++] inline errors from external projects' build logs > -- > > Key: ARROW-6479 > URL: https://issues.apache.org/jira/browse/ARROW-6479 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Minor > Fix For: 1.0.0 > > > Currently when an external project build fails, we get a very uninformative > message: > {code} > [88/543] Performing build step for 'flatbuffers_ep' > FAILED: flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build > flatbuffers_ep-prefix/src/flatbuffers_ep-install/bin/flatc > flatbuffers_ep-prefix/src/flatbuffers_ep-install/lib/libflatbuffers.a > cd /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-build && > /usr/bin/cmake -P > /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake > && /usr/bin/cmake -E touch > /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build > CMake Error at > /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-DEBUG.cmake:16 > (message): > Command failed: 1 >'/usr/bin/cmake' '--build' '.' > See also > > /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log > {code} > It would be far more useful if the error were caught and the relevant section (or > even the entirety) of {{ > /build/cpp/flatbuffers_ep-prefix/src/flatbuffers_ep-stamp/flatbuffers_ep-build-*.log}} > were output instead. This is doubly the case on CI where accessing those > logs is non-trivial. -- This message was sent by Atlassian Jira (v8.3.4#803005)
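One way to "inline" the external project's log can be sketched as a small helper. This is an assumed approach, not part of Arrow's build tooling: on failure, read the tail of the `*-build-*.log` file and surface it in the error message instead of only printing its path.

```cpp
// Return the last `max_lines` lines of a build log (empty if unreadable),
// suitable for embedding in the failure message.
#include <cstddef>
#include <deque>
#include <fstream>
#include <sstream>
#include <string>

std::string TailOfLog(const std::string& path, std::size_t max_lines) {
  std::ifstream in(path);
  std::deque<std::string> tail;  // sliding window over the file's lines
  for (std::string line; std::getline(in, line);) {
    tail.push_back(line);
    if (tail.size() > max_lines) tail.pop_front();
  }
  std::ostringstream out;
  for (const auto& line : tail) out << line << '\n';
  return out.str();
}
```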
[jira] [Commented] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932314#comment-16932314 ] Antoine Pitrou commented on ARROW-4917: --- Do we actually care about this? > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Major > > Failure: > {code:java} > FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o > /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include > -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem > /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem > c++/libs/thirdparty/zlib_ep-install/include -isystem > c++/libs/thirdparty/lz4_ep-install/include -isystem > /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always > -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror > -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF > c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function > 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, > uint64_t, uint64_t, uint64_t)': > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' > was not declared in this scope > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: > suggested alternative: 'rint' > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > rint > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: > 'nameStart' was not declared in this scope > if (nameStart >= nameCount) { > ^ > 
/build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: > suggested alternative: 'nameCount' > if (nameStart >= nameCount) { > ^ > nameCount > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: > 'nameStart' was not declared in this scope > + nameOffset + nameStart); > ^ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: > suggested alternative: 'nameCount' > + nameOffset + nameStart); > ^ > nameCount{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5423) [C++] implement partial schema class to extend JSON conversion
[ https://issues.apache.org/jira/browse/ARROW-5423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5423: -- Fix Version/s: 2.0.0 > [C++] implement partial schema class to extend JSON conversion > -- > > Key: ARROW-5423 > URL: https://issues.apache.org/jira/browse/ARROW-5423 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Python >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Major > Fix For: 2.0.0 > > > Currently the JSON parser supports only basic conversion rules such as > parsing a number to {{int64}}. In general users will want more capable > conversions like parsing a base64 string into binary or parsing a column of > objects to {{map}} instead of {{struct}}. This > will require extension of {{arrow::json::ParseOptions::explicit_schema}} to > something analogous to a schema but which supports mapping to more than a > simple output type. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5745) [C++] properties of Map(Array|Type) are confusingly named
[ https://issues.apache.org/jira/browse/ARROW-5745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5745: -- Fix Version/s: 1.0.0 > [C++] properties of Map(Array|Type) are confusingly named > - > > Key: ARROW-5745 > URL: https://issues.apache.org/jira/browse/ARROW-5745 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Major > Fix For: 1.0.0 > > > In the context of ListArrays, "values" indicates the elements in a slot of > the ListArray. Since MapArray is a ListArray, "values" indicates the same > thing and the elements are key-item pairs. This naming scheme is not > idiomatic; these *should* be called key-value pairs but that would require > propagating the renaming down to ListArray. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6226) [C++] refactor Diff and PrettyPrint to share code
[ https://issues.apache.org/jira/browse/ARROW-6226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6226: -- Fix Version/s: 1.0.0 > [C++] refactor Diff and PrettyPrint to share code > - > > Key: ARROW-6226 > URL: https://issues.apache.org/jira/browse/ARROW-6226 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Minor > Fix For: 1.0.0 > > > Diff reimplements a lot of PrettyPrint which didn't quite fit the former's > required use case and slightly changes the output format. Extract the shared > code to a header {{pretty_print_internal.h}} which can be used by both and > update the pretty print tests to reflect the new format. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4706) [C++] shared conversion framework for JSON/CSV parsers
[ https://issues.apache.org/jira/browse/ARROW-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4706: -- Fix Version/s: 2.0.0 > [C++] shared conversion framework for JSON/CSV parsers > -- > > Key: ARROW-4706 > URL: https://issues.apache.org/jira/browse/ARROW-4706 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Major > Fix For: 2.0.0 > > > CSV and JSON both convert strings to values in an Array but there is little > code sharing beyond {{arrow::util::StringConverter}}. > It would be advantageous if a single interface could be shared between CSV > and JSON to do the heavy lifting of conversion consistently. This would > simplify the addition of new parsers as well as allow all parsers to > immediately take advantage of a new conversion strategy. -- This message was sent by Atlassian Jira (v8.3.4#803005)
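The shared-converter idea can be sketched as a single abstract interface that both the CSV and JSON readers would call into, so a new conversion strategy lands in one place. This is a hedged sketch with illustrative stand-in types, not the actual framework that was implemented.

```cpp
// One converter interface per output column type, shared by CSV and JSON.
#include <cstddef>
#include <string>
#include <vector>

struct Converter {
  virtual ~Converter() = default;
  // Convert one cell's raw text; return false on a conversion failure.
  virtual bool Append(const std::string& token) = 0;
};

struct Int64Converter : Converter {  // illustrative concrete converter
  std::vector<long long> values;
  bool Append(const std::string& token) override {
    try {
      std::size_t used = 0;
      const long long v = std::stoll(token, &used);
      if (used != token.size()) return false;  // trailing garbage, e.g. "4x"
      values.push_back(v);
      return true;
    } catch (...) {  // std::stoll throws on non-numeric or out-of-range input
      return false;
    }
  }
};
```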
[jira] [Updated] (ARROW-4548) [C++] run-cmake-format.py is not supported on Windows
[ https://issues.apache.org/jira/browse/ARROW-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4548: -- Priority: Minor (was: Major) > [C++] run-cmake-format.py is not supported on Windows > - > > Key: ARROW-4548 > URL: https://issues.apache.org/jira/browse/ARROW-4548 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Minor > > I tried to fix it but no matter what option I pass for {{--line-ending}} to > {{cmake-format}} it converts LF line endings to CRLF. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4548) [C++] run-cmake-format.py is not supported on Windows
[ https://issues.apache.org/jira/browse/ARROW-4548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4548: -- Fix Version/s: 1.0.0 > [C++] run-cmake-format.py is not supported on Windows > - > > Key: ARROW-4548 > URL: https://issues.apache.org/jira/browse/ARROW-4548 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Continuous Integration >Reporter: Wes McKinney >Priority: Minor > Fix For: 1.0.0 > > > I tried to fix it but no matter what option I pass for {{--line-ending}} to > {{cmake-format}} it converts LF line endings to CRLF. -- This message was sent by Atlassian Jira (v8.3.4#803005)
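One possible workaround, sketched here as an assumption rather than anything run-cmake-format.py actually does, is to post-process whatever cmake-format writes and rewrite CRLF sequences back to LF before comparing or saving.

```cpp
// Rewrite CRLF line endings to LF; lone '\r' characters are left untouched.
#include <cstddef>
#include <string>

std::string NormalizeToLf(const std::string& text) {
  std::string out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
      continue;  // drop the '\r' of a CRLF pair; the '\n' is kept next
    }
    out.push_back(text[i]);
  }
  return out;
}
```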
[jira] [Updated] (ARROW-6213) [C++] tests fail for AVX512
[ https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6213: -- Fix Version/s: 2.0.0 > [C++] tests fail for AVX512 > --- > > Key: ARROW-6213 > URL: https://issues.apache.org/jira/browse/ARROW-6213 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.14.1 > Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) > avx512 >Reporter: Charles Coulombe >Priority: Minor > Fix For: 2.0.0 > > Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, > arrow-0.14.1-c++-failed-tests.txt > > > When building libraries for avx512 with GCC 7.3.0, two C++ tests fail. > {noformat} > The following tests FAILED: > 28 - arrow-compute-compare-test (Failed) > 30 - arrow-compute-filter-test (Failed) > Errors while running CTest{noformat} > while for avx2 they pass. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932315#comment-16932315 ] Uwe L. Korn commented on ARROW-4917: [~mdeepak] [~owen.omalley] might care. I guess ORC is not testing on Alpine. > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Major > > Failure: > {code:java} > FAILED: c++/src/CMakeFiles/orc.dir/Timezone.cc.o > /usr/bin/g++ -Ic++/include -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/include > -I/build/cpp/orc_ep-prefix/src/orc_ep/c++/src -Ic++/src -isystem > /build/cpp/snappy_ep/src/snappy_ep-install/include -isystem > c++/libs/thirdparty/zlib_ep-install/include -isystem > c++/libs/thirdparty/lz4_ep-install/include -isystem > /arrow/cpp/thirdparty/protobuf_ep-install/include -fdiagnostics-color=always > -ggdb -O0 -g -fPIC -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror > -std=c++11 -Wall -Wno-unknown-pragmas -Wconversion -Werror -O0 -g -MD -MT > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -MF > c++/src/CMakeFiles/orc.dir/Timezone.cc.o.d -o > c++/src/CMakeFiles/orc.dir/Timezone.cc.o -c > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc: In member function > 'void orc::TimezoneImpl::parseTimeVariants(const unsigned char*, uint64_t, > uint64_t, uint64_t, uint64_t)': > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: error: 'uint' > was not declared in this scope > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:748:7: note: > suggested alternative: 'rint' > uint nameStart = ptr[variantOffset + 6 * variant + 5]; > ^~~~ > rint > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: error: > 'nameStart' was not declared in this scope > if 
(nameStart >= nameCount) { > ^ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:749:11: note: > suggested alternative: 'nameCount' > if (nameStart >= nameCount) { > ^ > nameCount > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: error: > 'nameStart' was not declared in this scope > + nameOffset + nameStart); > ^ > /build/cpp/orc_ep-prefix/src/orc_ep/c++/src/Timezone.cc:756:59: note: > suggested alternative: 'nameCount' > + nameOffset + nameStart); > ^ > nameCount{code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6213) [C++] tests fail for AVX512
[ https://issues.apache.org/jira/browse/ARROW-6213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932316#comment-16932316 ] Antoine Pitrou commented on ARROW-6213: --- [~coulombec] would you be willing to investigate this? I don't think any of us has access to an AVX512 machine. [~wesmckinn] > [C++] tests fail for AVX512 > --- > > Key: ARROW-6213 > URL: https://issues.apache.org/jira/browse/ARROW-6213 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.14.1 > Environment: CentOS 7.6.1810, Intel Xeon Processor (Skylake, IBRS) > avx512 >Reporter: Charles Coulombe >Priority: Minor > Fix For: 2.0.0 > > Attachments: arrow-0.14.1-c++-failed-tests-cmake-conf.txt, > arrow-0.14.1-c++-failed-tests.txt > > > When building the libraries for avx512 with GCC 7.3.0, two C++ tests fail. > {noformat} > The following tests FAILED: > 28 - arrow-compute-compare-test (Failed) > 30 - arrow-compute-filter-test (Failed) > Errors while running CTest{noformat} > while for avx2 they pass. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4917: -- Fix Version/s: 1.0.0 > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Minor > Fix For: 1.0.0 > > > Failure: (full build log quoted in the original report above) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4917: -- Priority: Minor (was: Major) > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Minor > > Failure: (full build log quoted in the original report above) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4917) [C++] orc_ep fails in cpp-alpine docker
[ https://issues.apache.org/jira/browse/ARROW-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4917: -- Fix Version/s: (was: 1.0.0) 2.0.0 > [C++] orc_ep fails in cpp-alpine docker > --- > > Key: ARROW-4917 > URL: https://issues.apache.org/jira/browse/ARROW-4917 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Uwe L. Korn >Priority: Minor > Fix For: 2.0.0 > > > Failure: (full build log quoted in the original report above) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5932) [C++] undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11'
[ https://issues.apache.org/jira/browse/ARROW-5932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5932: -- Priority: Major (was: Critical) > [C++] undefined reference to `__cxa_init_primary_exception@CXXABI_1.3.11' > - > > Key: ARROW-5932 > URL: https://issues.apache.org/jira/browse/ARROW-5932 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.14.0 > Environment: Linux Mint 19.1 Tessa > g++-6 >Reporter: Cong Ding >Priority: Major > > I was installing Apache Arrow on my Linux Mint 19.1 Tessa server. I followed > the instructions on the official Arrow website (using the Ubuntu 18.04 > method). However, when I tried to compile the examples, the g++ compiler > threw some errors. > I updated g++ to g++-6, updated my libstdc++ library, and added the -lstdc++ flag, > but it still didn't work. > > {code:java} > g++-6 -std=c++11 -larrow -lparquet main.cpp -lstdc++ > {code} > The error message: > /usr/lib/x86_64-linux-gnu/libarrow.so: undefined reference to > `__cxa_init_primary_exception@CXXABI_1.3.11' > /usr/lib/x86_64-linux-gnu/libarrow.so: undefined reference to > `std::__exception_ptr::exception_ptr::exception_ptr(void*)@CXXABI_1.3.11' > collect2: error: ld returned 1 exit status. > > I do not know what to do at this moment. Can anyone help me? -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library
[ https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932318#comment-16932318 ] Antoine Pitrou commented on ARROW-6436: --- See also ARROW-3802. Numpy has dedicated float16 routines that we could reuse. Given that it's Numpy it's probably well-maintained. There may be other projects around. > [C++] vendor a half precision floating point library > > > Key: ARROW-6436 > URL: https://issues.apache.org/jira/browse/ARROW-6436 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Major > > Clang and GCC provide _Float16 and there are numerous polyfills which can > emulate a 16 bit float for other platforms. This would fill a hole in the > kernels and other code which don't currently support HALF_FLOAT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6436) [C++] vendor a half precision floating point library
[ https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6436: -- Fix Version/s: 1.0.0 > [C++] vendor a half precision floating point library > > > Key: ARROW-6436 > URL: https://issues.apache.org/jira/browse/ARROW-6436 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Major > Fix For: 1.0.0 > > > Clang and GCC provide _Float16 and there are numerous polyfills which can > emulate a 16 bit float for other platforms. This would fill a hole in the > kernels and other code which don't currently support HALF_FLOAT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library
[ https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932321#comment-16932321 ] Antoine Pitrou commented on ARROW-6436: --- As for built-in float16 types it looks a bit more complicated, e.g. clang: {quote}Clang supports two half-precision (16-bit) floating point types: {{__fp16}} and {{_Float16}}. These types are supported in all language modes. {{__fp16}} is supported on every target, as it is purely a storage format; see below. {{_Float16}} is currently only supported on the following targets, with further targets pending ABI standardization: * 32-bit ARM * 64-bit ARM (AArch64) * SPIR {quote} > [C++] vendor a half precision floating point library > > > Key: ARROW-6436 > URL: https://issues.apache.org/jira/browse/ARROW-6436 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Major > Fix For: 1.0.0 > > > Clang and GCC provide _Float16 and there are numerous polyfills which can > emulate a 16 bit float for other platforms. This would fill a hole in the > kernels and other code which don't currently support HALF_FLOAT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6436) [C++] vendor a half precision floating point library
[ https://issues.apache.org/jira/browse/ARROW-6436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932329#comment-16932329 ] Antoine Pitrou commented on ARROW-6436: --- Also there are conversion intrinsic on recent x86 CPUs: [https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=half-precision] [https://en.wikipedia.org/wiki/F16C] > [C++] vendor a half precision floating point library > > > Key: ARROW-6436 > URL: https://issues.apache.org/jira/browse/ARROW-6436 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Benjamin Kietzman >Priority: Major > Fix For: 1.0.0 > > > Clang and GCC provide _Float16 and there are numerous polyfills which can > emulate a 16 bit float for other platforms. This would fill a hole in the > kernels and other code which don't currently support HALF_FLOAT. -- This message was sent by Atlassian Jira (v8.3.4#803005)
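The storage-format point made in the comments above can be seen without any vendored library: Python's struct module understands the IEEE 754 binary16 ("half") layout via the 'e' format code, which makes the precision loss of a 16-bit float easy to demonstrate. This is only an illustration of the format, not Arrow's HALF_FLOAT implementation:

```python
import struct

def roundtrip_f16(x: float) -> float:
    """Pack a float into IEEE 754 binary16 and unpack it back to a double."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Small powers of two survive exactly; most decimals do not, since
# binary16 carries only an 11-bit significand (~3 decimal digits).
print(roundtrip_f16(1.0))
print(roundtrip_f16(0.1))  # close to 0.1, but not equal to it
```

Any vendored half-precision library (or the F16C conversion intrinsics mentioned above) must agree with this packing on the wire.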
[jira] [Resolved] (ARROW-6597) [Python] Segfault in test_pandas with Python 2.7
[ https://issues.apache.org/jira/browse/ARROW-6597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-6597. --- Resolution: Fixed Issue resolved by pull request 5416 [https://github.com/apache/arrow/pull/5416] > [Python] Segfault in test_pandas with Python 2.7 > > > Key: ARROW-6597 > URL: https://issues.apache.org/jira/browse/ARROW-6597 > Project: Apache Arrow > Issue Type: Bug > Components: Python >Reporter: Antoine Pitrou >Assignee: Antoine Pitrou >Priority: Major > Labels: pull-request-available > Fix For: 0.15.0 > > Time Spent: 40m > Remaining Estimate: 0h > > I get a segfault in test_pandas with Python 2.7. > gdb stack trace (excerpt): > {code} > Thread 27 "python" received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7fffb7fff700 (LWP 17725)] > 0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, > unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at > ../src/arrow/python/datetime.cc:229 > 229 *out = PyDate_FromDate(static_cast(year), > static_cast(month), > (gdb) bt > #0 0x7fffcac1a9f9 in arrow::py::internal::PyDate_from_int (val=10957, > unit=arrow::DateUnit::DAY, out=0x55e1b9b0) at > ../src/arrow/python/datetime.cc:229 > #1 0x7fffcabaed34 in arrow::Status > arrow::py::ConvertDates(arrow::py::PandasOptions const&, > arrow::ChunkedArray const&, _object**)::{lambda(int, > _object**)#1}::operator()(int, _object**) const (this=0x7fffb7ffde90, > value=10957, out=0x55e1b9b0) at ../src/arrow/python/arrow_to_pandas.cc:657 > #2 0x7fffcabaeb8c in arrow::Status > arrow::py::ConvertAsPyObjects arrow::py::ConvertDates(arrow::py::PandasOptions const&, > arrow::ChunkedArray const&, _object**)::{lambda(int, > _object**)#1}&>(arrow::py::PandasOptions const&, arrow::ChunkedArray const&, > arrow::Status > arrow::py::ConvertDates(arrow::py::PandasOptions const&, > arrow::ChunkedArray const&, _object**)::{lambda(int, _object**)#1}&, > _object**)::{lambda(int const&, _object**)#1}::operator()(int const, > 
_object**) const (this=0x7fffb7ffdd88, value=@0x7fffb7ffdcbc: 10957, > out_values=0x55e1b9b0) > at ../src/arrow/python/arrow_to_pandas.cc:417 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
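For context on the trace above: the crashing helper converts an Arrow date32 value (val=10957, unit=DAY) — a count of days since the UNIX epoch — into a Python date object. The arithmetic itself is the easy part and can be sketched with the stdlib (the segfault was in the C-API date construction, not in this computation):

```python
from datetime import date, timedelta

def date_from_days(days: int) -> date:
    """Interpret an Arrow date32 value as days since 1970-01-01."""
    return date(1970, 1, 1) + timedelta(days=days)

print(date_from_days(10957))  # the value from the stack trace -> 2000-01-01
```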
[jira] [Updated] (ARROW-6393) [C++]Add EqualOptions support in SparseTensor::Equals
[ https://issues.apache.org/jira/browse/ARROW-6393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6393: -- Fix Version/s: 1.0.0 > [C++]Add EqualOptions support in SparseTensor::Equals > - > > Key: ARROW-6393 > URL: https://issues.apache.org/jira/browse/ARROW-6393 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ >Reporter: Kenta Murata >Assignee: Kenta Murata >Priority: Major > Fix For: 1.0.0 > > > SparseTensor::Equals should take EqualOptions argument as Tensor::Equals does. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4784) [C++][CI] Re-enable flaky mingw tests.
[ https://issues.apache.org/jira/browse/ARROW-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4784: -- Fix Version/s: 2.0.0 > [C++][CI] Re-enable flaky mingw tests. > -- > > Key: ARROW-4784 > URL: https://issues.apache.org/jira/browse/ARROW-4784 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration >Reporter: Micah Kornfield >Priority: Major > Labels: ci-failure > Fix For: 2.0.0 > > > There is no {{--exclude-regex}} option for {{ctest}} in > {{ci/appveyor-cpp-build-mingw.bat}} when we resolve this. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5578) [C++][Flight] Flight does not build out of the box on Alpine Linux
[ https://issues.apache.org/jira/browse/ARROW-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5578: -- Fix Version/s: 2.0.0 > [C++][Flight] Flight does not build out of the box on Alpine Linux > -- > > Key: ARROW-5578 > URL: https://issues.apache.org/jira/browse/ARROW-5578 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Minor > Fix For: 2.0.0 > > > Fails with SSL linking errors. > I am disabling in the Dockerfile for now in ARROW-5577 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5593) [C++][Fuzzing] Test fuzzers against arrow-testing corpus
[ https://issues.apache.org/jira/browse/ARROW-5593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5593: -- Fix Version/s: 1.0.0 > [C++][Fuzzing] Test fuzzers against arrow-testing corpus > > > Key: ARROW-5593 > URL: https://issues.apache.org/jira/browse/ARROW-5593 > Project: Apache Arrow > Issue Type: Test > Components: C++ >Reporter: Marco Neumann >Priority: Major > Labels: fuzzer > Fix For: 1.0.0 > > > All fuzzers should be run against the corpus in > [arrow-testing|https://github.com/apache/arrow-testing] to prevent > regressions. The Arrow CI should download the current corpus and run the > fuzzers exactly once against each applicable corpus file. The fuzzers > must be built with address sanitizer enabled. -- This message was sent by Atlassian Jira (v8.3.4#803005)
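The CI step described in the ticket — run each fuzz target once per corpus file and fail on any crash or sanitizer report — reduces to a small driver loop. A sketch with hypothetical names (the actual CI wiring is not specified in the issue):

```python
import pathlib
import subprocess

def run_corpus(fuzzer: str, corpus_dir: str) -> int:
    """Invoke a fuzz-target binary once per corpus file; return the file count."""
    count = 0
    for path in sorted(pathlib.Path(corpus_dir).iterdir()):
        # With check=True, a non-zero exit (crash, ASan report) raises
        # CalledProcessError and thereby fails the CI job.
        subprocess.run([fuzzer, str(path)], check=True)
        count += 1
    return count
```

Libfuzzer-style binaries accept input files as positional arguments, so the same loop works for `arrow-ipc-fuzzing-test` and a future parquet fuzzer alike.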
[jira] [Updated] (ARROW-6273) [C++][Fuzzing] Add fuzzer for parquet->arrow read path
[ https://issues.apache.org/jira/browse/ARROW-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6273: -- Fix Version/s: 1.0.0 > [C++][Fuzzing] Add fuzzer for parquet->arrow read path > -- > > Key: ARROW-6273 > URL: https://issues.apache.org/jira/browse/ARROW-6273 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Marco Neumann >Assignee: Marco Neumann >Priority: Major > Labels: fuzzer > Fix For: 1.0.0 > > > The parquet to arrow read path is likely the most commonly used one (esp. by > pyarrow) and is a closed step that should allow us to fuzz the reading of > untrusted parquet files into memory. This complements the existing arrow ipc > fuzzer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5578) [C++][Flight] Flight does not build out of the box on Alpine Linux
[ https://issues.apache.org/jira/browse/ARROW-5578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5578: -- Priority: Minor (was: Major) > [C++][Flight] Flight does not build out of the box on Alpine Linux > -- > > Key: ARROW-5578 > URL: https://issues.apache.org/jira/browse/ARROW-5578 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Wes McKinney >Priority: Minor > > Fails with SSL linking errors. > I am disabling in the Dockerfile for now in ARROW-5577 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6273) [C++][Fuzzing] Add fuzzer for parquet->arrow read path
[ https://issues.apache.org/jira/browse/ARROW-6273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6273: -- Component/s: Developer Tools Continuous Integration > [C++][Fuzzing] Add fuzzer for parquet->arrow read path > -- > > Key: ARROW-6273 > URL: https://issues.apache.org/jira/browse/ARROW-6273 > Project: Apache Arrow > Issue Type: Bug > Components: C++, Continuous Integration, Developer Tools >Reporter: Marco Neumann >Assignee: Marco Neumann >Priority: Major > Labels: fuzzer > Fix For: 1.0.0 > > > The parquet to arrow read path is likely the most commonly used one (esp. by > pyarrow) and is a closed step that should allow us to fuzz the reading of > untrusted parquet files into memory. This complements the existing arrow ipc > fuzzer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6386) [C++][Documentation] Explicit documentation of null slot interpretation
[ https://issues.apache.org/jira/browse/ARROW-6386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6386: -- Fix Version/s: 1.0.0 > [C++][Documentation] Explicit documentation of null slot interpretation > --- > > Key: ARROW-6386 > URL: https://issues.apache.org/jira/browse/ARROW-6386 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: Benjamin Kietzman >Assignee: Benjamin Kietzman >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > To my knowledge, there isn't explicit documentation on how null slots in an > array should be interpreted. SQL uses Kleene logic, wherein a null is > explicitly an unknown rather than a special value. This yields for example > `(null AND false) -> false`, since `(x AND false) -> false` for all possible > values of x. This is also the behavior of Gandiva's boolean expressions. > By contrast the boolean kernels implement something closer to the behavior of > NaN: `(null AND false) -> null`. I think this is simply an error in the > boolean kernels but in any case I think explicit documentation should be > added to prevent future confusion. -- This message was sent by Atlassian Jira (v8.3.4#803005)
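The distinction drawn in the issue can be made concrete with a tiny three-valued AND, where None stands in for a null slot. This is a sketch of Kleene semantics, not the actual kernel code:

```python
def kleene_and(a, b):
    """SQL-style three-valued AND: None means 'unknown', not a special value."""
    if a is False or b is False:
        return False   # (x AND false) is false for every possible x
    if a is None or b is None:
        return None    # otherwise the unknown propagates
    return True

print(kleene_and(None, False))  # False under Kleene logic
print(kleene_and(None, True))   # None: the result genuinely depends on a
```

The NaN-like behavior the boolean kernels implemented instead would return None in both cases above.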
[jira] [Updated] (ARROW-6148) [C++][Packaging] Improve aarch64 support
[ https://issues.apache.org/jira/browse/ARROW-6148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6148: -- Fix Version/s: 2.0.0 > [C++][Packaging] Improve aarch64 support > - > > Key: ARROW-6148 > URL: https://issues.apache.org/jira/browse/ARROW-6148 > Project: Apache Arrow > Issue Type: Bug > Components: Packaging >Reporter: Francois Saint-Jacques >Assignee: Marcin Juszkiewicz >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 8h > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4248) [C++][Plasma] Build on Windows / Visual Studio
[ https://issues.apache.org/jira/browse/ARROW-4248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4248: -- Fix Version/s: 2.0.0 > [C++][Plasma] Build on Windows / Visual Studio > -- > > Key: ARROW-4248 > URL: https://issues.apache.org/jira/browse/ARROW-4248 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Reporter: Wes McKinney >Priority: Major > Fix For: 2.0.0 > > > See https://github.com/apache/arrow/issues/3391 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3226) [C++][Plasma] Plasma Store will crash using small memory
[ https://issues.apache.org/jira/browse/ARROW-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3226: -- Fix Version/s: 2.0.0 > [C++][Plasma] Plasma Store will crash using small memory > > > Key: ARROW-3226 > URL: https://issues.apache.org/jira/browse/ARROW-3226 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Reporter: Yuhong Guo >Assignee: Yuhong Guo >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 2h 10m > Remaining Estimate: 0h > > Plasma Store will do the eviction when the memory allocation is not enough. > When specified a smaller store limit, Plasma Store will crash when limit > memory reached. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2045) [C++][Plasma] More primitive operations on plasma store
[ https://issues.apache.org/jira/browse/ARROW-2045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2045: -- Fix Version/s: 2.0.0 > [C++][Plasma] More primitive operations on plasma store > --- > > Key: ARROW-2045 > URL: https://issues.apache.org/jira/browse/ARROW-2045 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ - Plasma >Reporter: Yuxin Wu >Priority: Minor > Fix For: 2.0.0 > > > Hi Developers, > I found the plasma store very useful – it's fast and simple to use. However, I > think more operations could make it a more general IPC/messaging > tool, potentially helpful in more scenarios. > Conceptually, an object store can support the following "put" methods: > # Evict when full > # Wait for space when full, perhaps with a timeout (i.e. blocking) > # Return failure when full (i.e. non-blocking) > And the following "get" methods: > # Wait for the object to appear (i.e. blocking) > # Return failure when the object doesn't exist (i.e. non-blocking) > # Remove the object after get > Some of the above features can be implemented in terms of the others, but some > are primitives (e.g. returning failure when full) that need native support. > > My use case: I wanted to use plasma to send/recv large buffers between > processes, i.e. build a message-passing interface on top of shared memory. > Plasma made it quite easy (only have to send/recv the id) and efficient > (faster than a unix pipe). But "evict when full" is currently the only available > "put" method, which could create a lot of trouble if I want to ensure message > delivery. -- This message was sent by Atlassian Jira (v8.3.4#803005)
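The "put" policies listed in the request differ only in what happens when the store is at capacity. A toy in-memory sketch of two of them plus "remove after get" — the class and method names are illustrative, nothing here is Plasma's API:

```python
import collections

class BoundedStore:
    """Toy object store contrasting 'evict when full' with 'fail when full'."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = collections.OrderedDict()

    def put_evict(self, key, value):
        # Policy 1: make room by dropping the oldest entries (Plasma's current behavior).
        while len(self.data) >= self.capacity:
            self.data.popitem(last=False)
        self.data[key] = value

    def put_or_fail(self, key, value) -> bool:
        # Policy 3: non-blocking put that reports failure when full,
        # letting the caller guarantee delivery instead of losing messages.
        if len(self.data) >= self.capacity:
            return False
        self.data[key] = value
        return True

    def get_and_remove(self, key):
        # "Remove the object after get"; None signals a missing object.
        return self.data.pop(key, None)
```

The blocking variants (wait for space, wait for the object) would add a condition variable on top of the same state.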
[jira] [Updated] (ARROW-4829) [C++][Plasma] plasma-serialization_tests fails in release builds
[ https://issues.apache.org/jira/browse/ARROW-4829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4829: -- Fix Version/s: 2.0.0 > [C++][Plasma] plasma-serialization_tests fails in release builds > > > Key: ARROW-4829 > URL: https://issues.apache.org/jira/browse/ARROW-4829 > Project: Apache Arrow > Issue Type: Bug > Components: C++ - Plasma >Reporter: Wes McKinney >Priority: Major > Fix For: 2.0.0 > > > On Ubuntu 18.10 with conda-forge toolchain (gcc 7.3.x) > {code} > $ ./release/plasma-serialization_tests > Running main() from gmock_main.cc > [==] Running 14 tests from 1 test case. > [--] Global test environment set-up. > [--] 14 tests from PlasmaSerialization > [ RUN ] PlasmaSerialization.CreateRequest > [ OK ] PlasmaSerialization.CreateRequest (0 ms) > [ RUN ] PlasmaSerialization.CreateReply > [ OK ] PlasmaSerialization.CreateReply (0 ms) > [ RUN ] PlasmaSerialization.SealRequest > [ OK ] PlasmaSerialization.SealRequest (1 ms) > [ RUN ] PlasmaSerialization.SealReply > [ OK ] PlasmaSerialization.SealReply (0 ms) > [ RUN ] PlasmaSerialization.GetRequest > [ OK ] PlasmaSerialization.GetRequest (0 ms) > [ RUN ] PlasmaSerialization.GetReply > ../src/plasma/test/serialization_tests.cc:191: Failure > Expected equality of these values: > memcmp(&plasma_objects[object_ids[0]], &plasma_objects_return[0], > sizeof(PlasmaObject)) > Which is: 127 > 0 > [ FAILED ] PlasmaSerialization.GetReply (0 ms) > [ RUN ] PlasmaSerialization.ReleaseRequest > [ OK ] PlasmaSerialization.ReleaseRequest (0 ms) > [ RUN ] PlasmaSerialization.ReleaseReply > [ OK ] PlasmaSerialization.ReleaseReply (0 ms) > [ RUN ] PlasmaSerialization.DeleteRequest > [ OK ] PlasmaSerialization.DeleteRequest (0 ms) > [ RUN ] PlasmaSerialization.DeleteReply > [ OK ] PlasmaSerialization.DeleteReply (0 ms) > [ RUN ] PlasmaSerialization.EvictRequest > [ OK ] PlasmaSerialization.EvictRequest (0 ms) > [ RUN ] PlasmaSerialization.EvictReply > [ OK ] 
PlasmaSerialization.EvictReply (0 ms) > [ RUN ] PlasmaSerialization.DataRequest > [ OK ] PlasmaSerialization.DataRequest (0 ms) > [ RUN ] PlasmaSerialization.DataReply > [ OK ] PlasmaSerialization.DataReply (0 ms) > [--] 14 tests from PlasmaSerialization (1 ms total) > [--] Global test environment tear-down > [==] 14 tests from 1 test case ran. (2 ms total) > [ PASSED ] 13 tests. > [ FAILED ] 1 test, listed below: > [ FAILED ] PlasmaSerialization.GetReply > 1 FAILED TEST > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02
[ https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5607: -- Fix Version/s: 2.0.0 > [C++][Fuzzing] arrow-ipc-fuzzing-test crash > 607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > Key: ARROW-5607 > URL: https://issues.apache.org/jira/browse/ARROW-5607 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: Marco Neumann >Priority: Major > Labels: fuzzer > Fix For: 2.0.0 > > Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with > {code} > arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2260) [C++][Plasma] plasma_store should show usage
[ https://issues.apache.org/jira/browse/ARROW-2260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2260: -- Fix Version/s: 2.0.0 > [C++][Plasma] plasma_store should show usage > > > Key: ARROW-2260 > URL: https://issues.apache.org/jira/browse/ARROW-2260 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ - Plasma >Affects Versions: 0.8.0 >Reporter: Antoine Pitrou >Priority: Minor > Fix For: 2.0.0 > > > Currently the options exposed by the {{plasma_store}} executable aren't very > discoverable: > {code:bash} > $ plasma_store -h > please specify socket for incoming connections with -s switch > Abandon > (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting > *)$ plasma_store > please specify socket for incoming connections with -s switch > Abandon > (pyarrow) antoine@fsol:~/arrow/cpp (ARROW-2135-nan-conversion-when-casting > *)$ plasma_store --help > plasma_store: invalid option -- '-' > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
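For contrast with the opaque behavior shown in the report, a discoverable CLI answers both `--help` and a missing required flag with a usage summary. An argparse sketch — only `-s` mirrors the real binary; the other flag is hypothetical:

```python
import argparse

parser = argparse.ArgumentParser(
    prog="plasma_store",
    description="Shared-memory object store (illustrative usage sketch only).")
parser.add_argument("-s", "--socket", required=True,
                    help="socket path for incoming connections")
parser.add_argument("-m", "--memory", type=int, default=None,
                    help="memory limit in bytes (hypothetical flag)")

# Parsing a sample argv; with no args, argparse would print usage and exit
# instead of the bare "please specify socket ... Abandon" seen above.
args = parser.parse_args(["-s", "/tmp/plasma"])
print(args.socket)
```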
[jira] [Updated] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02
[ https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5607: -- Fix Version/s: (was: 2.0.0) > [C++][Fuzzing] arrow-ipc-fuzzing-test crash > 607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > Key: ARROW-5607 > URL: https://issues.apache.org/jira/browse/ARROW-5607 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: Marco Neumann >Priority: Major > Labels: fuzzer > Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with > {code} > arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2358) [C++][Python] API for Writing to Multiple Feather Files
[ https://issues.apache.org/jira/browse/ARROW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2358: -- Fix Version/s: 2.0.0 > [C++][Python] API for Writing to Multiple Feather Files > --- > > Key: ARROW-2358 > URL: https://issues.apache.org/jira/browse/ARROW-2358 > Project: Apache Arrow > Issue Type: New Feature > Components: C, C++, Python >Affects Versions: 0.9.0 >Reporter: Dhruv Madeka >Priority: Minor > Fix For: 2.0.0 > > > It would be really great to have an API which can write a Table to a > `FeatherDataset`. Essentially, given a base file name, it would split the > table into N equal parts (which could be determined by the user or the code) > and then write the data to N files with a suffix (which is `_part` by default > but could be user specified). -- This message was sent by Atlassian Jira (v8.3.4#803005)
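The partitioning step that ARROW-2358 asks for can be sketched in plain Python. The helper names below (`split_ranges`, `part_names`) are hypothetical illustrations of the requested behavior, not part of any actual Arrow API:

```python
def split_ranges(num_rows, num_parts):
    """Split the row range [0, num_rows) into num_parts near-equal
    (offset, length) slices, as the proposed writer would do."""
    base, extra = divmod(num_rows, num_parts)
    ranges, offset = [], 0
    for i in range(num_parts):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first parts
        ranges.append((offset, length))
        offset += length
    return ranges


def part_names(basename, num_parts, suffix="_part"):
    """Derive the N output file names, e.g. data_part0.feather, data_part1.feather."""
    return [f"{basename}{suffix}{i}.feather" for i in range(num_parts)]
```

Each (offset, length) slice would then be written to the matching file name, e.g. via `table.slice(offset, length)` on a pyarrow Table.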
[jira] [Updated] (ARROW-6463) [C++][Python] Rename arrow::fs::Selector to FileSelector
[ https://issues.apache.org/jira/browse/ARROW-6463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6463: -- Fix Version/s: 1.0.0 > [C++][Python] Rename arrow::fs::Selector to FileSelector > > > Key: ARROW-6463 > URL: https://issues.apache.org/jira/browse/ARROW-6463 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: filesystem > Fix For: 1.0.0 > > > In both the C++ implementation and the python binding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-2882) [C++][Python] Support AWS Firehose partition_scheme implementation for Parquet datasets
[ https://issues.apache.org/jira/browse/ARROW-2882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-2882: -- Fix Version/s: 2.0.0 > [C++][Python] Support AWS Firehose partition_scheme implementation for > Parquet datasets > --- > > Key: ARROW-2882 > URL: https://issues.apache.org/jira/browse/ARROW-2882 > Project: Apache Arrow > Issue Type: New Feature > Components: C++, Python >Reporter: Pablo Javier Takara >Priority: Major > Labels: dataset, parquet > Fix For: 2.0.0 > > > I'd like to be able to read a ParquetDataset generated by AWS Firehose. > The only implementation at the time of writing was the partition scheme > created by Hive (year=2018/month=01/day=11). > The AWS Firehose partition scheme is slightly different (2018/01/11). > > Thanks -- This message was sent by Atlassian Jira (v8.3.4#803005)
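The difference between the two directory layouts in ARROW-2882 can be sketched in plain Python. The parser names here are hypothetical helpers for illustration, not pyarrow APIs:

```python
import re
from datetime import date


def parse_hive_partition(path):
    """Parse a Hive-style partition path such as 'year=2018/month=01/day=11',
    where each segment carries an explicit key=value pair."""
    fields = dict(seg.split("=", 1) for seg in path.strip("/").split("/"))
    return date(int(fields["year"]), int(fields["month"]), int(fields["day"]))


def parse_firehose_partition(path):
    """Parse an AWS Firehose-style partition path such as '2018/01/11',
    where the keys (year, month, day) are implicit in the position."""
    m = re.fullmatch(r"(\d{4})/(\d{2})/(\d{2})", path.strip("/"))
    if m is None:
        raise ValueError(f"not a Firehose-style partition path: {path!r}")
    return date(*map(int, m.groups()))
```

Both example paths decode to the same date; the Firehose layout simply drops the key= prefixes, which is why a separate partition scheme implementation is needed to read such datasets.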
[jira] [Updated] (ARROW-1975) [C++] Add abi-compliance-checker to build process
[ https://issues.apache.org/jira/browse/ARROW-1975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-1975: -- Fix Version/s: 2.0.0 > [C++] Add abi-compliance-checker to build process > - > > Key: ARROW-1975 > URL: https://issues.apache.org/jira/browse/ARROW-1975 > Project: Apache Arrow > Issue Type: New Feature > Components: C++ >Reporter: Uwe L. Korn >Assignee: Uwe L. Korn >Priority: Major > Labels: pull-request-available > Fix For: 2.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > I would like to check our baseline modules with > https://lvc.github.io/abi-compliance-checker/ to ensure that version upgrades > are much smoother and that we don't break the ABI in patch releases. > As we're still pre-1.0, I accept that there will be breakage, but I would like > to keep it to a minimum. Currently the biggest pain with Arrow is that you > always need to pin it in Python with {{==0.x.y}}, otherwise segfaults are > inevitable. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4761) [C++] Support zstandard<1
[ https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932339#comment-16932339 ] Antoine Pitrou commented on ARROW-4761: --- Due to our maintenance workload, I'm not sure we still care about this. > [C++] Support zstandard<1 > - > > Key: ARROW-4761 > URL: https://issues.apache.org/jira/browse/ARROW-4761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Uwe L. Korn >Priority: Major > > To support building with as many system packages as possible on Ubuntu, we > should support building with zstandard 0.5.1 which is the one available on > Ubuntu Xenial. Given the size of our current code for Zstandard, this seems > feasible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4761) [C++] Support zstandard<1
[ https://issues.apache.org/jira/browse/ARROW-4761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4761: -- Priority: Minor (was: Major) > [C++] Support zstandard<1 > - > > Key: ARROW-4761 > URL: https://issues.apache.org/jira/browse/ARROW-4761 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Packaging >Reporter: Uwe L. Korn >Priority: Minor > > To support building with as many system packages as possible on Ubuntu, we > should support building with zstandard 0.5.1 which is the one available on > Ubuntu Xenial. Given the size of our current code for Zstandard, this seems > feasible. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-3737) [CI/Docker/Python] Support running integration tests on multiple python versions
[ https://issues.apache.org/jira/browse/ARROW-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-3737. --- Resolution: Won't Fix Indeed, this doesn't seem useful. > [CI/Docker/Python] Support running integration tests on multiple python > versions > > > Key: ARROW-3737 > URL: https://issues.apache.org/jira/browse/ARROW-3737 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: docker > > Currently the python-3.6 image is pinned in integration/hdfs/Dockerfile and > integration/pandas-master/Dockerfile. It's possible to pass a build-time > argument, similar to how the arrow:python-${PYTHON_VERSION} image works. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-5607) [C++][Fuzzing] arrow-ipc-fuzzing-test crash 607e9caa76863a97f2694a769a1ae2fb83c55e02
[ https://issues.apache.org/jira/browse/ARROW-5607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932340#comment-16932340 ] Antoine Pitrou commented on ARROW-5607: --- It passes here. Can you reproduce? > [C++][Fuzzing] arrow-ipc-fuzzing-test crash > 607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > Key: ARROW-5607 > URL: https://issues.apache.org/jira/browse/ARROW-5607 > Project: Apache Arrow > Issue Type: Bug > Components: C++ >Affects Versions: 0.13.0 >Reporter: Marco Neumann >Priority: Major > Labels: fuzzer > Attachments: crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > > > {{arrow-ipc-fuzzing-test}} found the attached crash. Reproduce with > {code} > arrow-ipc-fuzzing-test crash-607e9caa76863a97f2694a769a1ae2fb83c55e02 > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-3710) [CI/Python] Run nightly tests against pandas master
[ https://issues.apache.org/jira/browse/ARROW-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-3710: -- Fix Version/s: 1.0.0 > [CI/Python] Run nightly tests against pandas master > --- > > Key: ARROW-3710 > URL: https://issues.apache.org/jira/browse/ARROW-3710 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration, Python >Reporter: Krisztian Szucs >Assignee: Krisztian Szucs >Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Follow-up of [https://github.com/apache/arrow/pull/2758] and > https://github.com/apache/arrow/pull/2755 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5216) [CI] Add Appveyor badge to README
[ https://issues.apache.org/jira/browse/ARROW-5216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5216: -- Fix Version/s: 1.0.0 > [CI] Add Appveyor badge to README > - > > Key: ARROW-5216 > URL: https://issues.apache.org/jira/browse/ARROW-5216 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Neal Richardson >Priority: Trivial > Fix For: 1.0.0 > > > I was trying to see what was running in appveyor and couldn't find it. > Krisztián helped me to find > [https://ci.appveyor.com/project/ApacheSoftwareFoundation/arrow], but it > would be nice to add the badge to the README next to the Travis-CI one for a > quick link to it (as well as showing off build status). > I was just going to add it myself, but unlike Travis, you can't guess the > Appveyor badge URL from the project name because they have a hash in them; > only someone with sufficient privileges on the project in Appveyor can get to > the settings panel to find the URL. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6448) [CI] Add crossbow notifications
[ https://issues.apache.org/jira/browse/ARROW-6448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932343#comment-16932343 ] Antoine Pitrou commented on ARROW-6448: --- Does something remain to do here? > [CI] Add crossbow notifications > --- > > Key: ARROW-6448 > URL: https://issues.apache.org/jira/browse/ARROW-6448 > Project: Apache Arrow > Issue Type: New Feature > Components: Continuous Integration >Reporter: Francois Saint-Jacques >Assignee: Francois Saint-Jacques >Priority: Critical > Labels: pull-request-available > Time Spent: 1h 40m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-4785) [CI] Make Travis CI resilient against hash sum mismatch errors
[ https://issues.apache.org/jira/browse/ARROW-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-4785. --- Resolution: Not A Problem > [CI] Make Travis CI resilient against hash sum mismatch errors > -- > > Key: ARROW-4785 > URL: https://issues.apache.org/jira/browse/ARROW-4785 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Hatem Helal >Priority: Minor > Labels: ci-failure > > Travis jobs sometimes fail with a GPG error: > {code:java} > W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: > Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest > algorithm (SHA1) > W: An error occurred during the signature verification. The repository is not > updated and the previous index files will be used. GPG error: > https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease: The following > signatures couldn't be verified because the public key is not available: > NO_PUBKEY 6B05F25D762E3157 > W: Failed to fetch > https://packagecloud.io/github/git-lfs/ubuntu/dists/trusty/InRelease The > following signatures couldn't be verified because the public key is not > available: NO_PUBKEY 6B05F25D762E3157 > E: Failed to fetch > http://security.ubuntu.com/ubuntu/dists/trusty-security/main/binary-i386/Packages.gz > Hash Sum mismatch > W: Some index files failed to download. They have been ignored, or old ones > used instead. > The command "if [ $TRAVIS_OS_NAME == "linux" ]; then > sudo bash -c "echo -e 'Acquire::Retries 10; Acquire::http::Timeout > \"20\";' > /etc/apt/apt.conf.d/99-travis-retry" > sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test > sudo apt-get update -qq > fi > " failed and exited with 100 during . > Your build has been stopped.{code} > It would be nice if the number of retries, timeout, or both could be > increased to make the Travis jobs more resilient to this seemingly sporadic > issue.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4949) [CI] Add C# docker image to the docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4949: -- Component/s: C# > [CI] Add C# docker image to the docker-compose setup > > > Key: ARROW-4949 > URL: https://issues.apache.org/jira/browse/ARROW-4949 > Project: Apache Arrow > Issue Type: Improvement > Components: C#, Continuous Integration >Reporter: Krisztian Szucs >Priority: Major > Fix For: 2.0.0 > > > https://github.com/apache/arrow/blob/master/csharp/build/docker/Dockerfile -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4949) [CI] Add C# docker image to the docker-compose setup
[ https://issues.apache.org/jira/browse/ARROW-4949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4949: -- Fix Version/s: 2.0.0 > [CI] Add C# docker image to the docker-compose setup > > > Key: ARROW-4949 > URL: https://issues.apache.org/jira/browse/ARROW-4949 > Project: Apache Arrow > Issue Type: Improvement > Components: Continuous Integration >Reporter: Krisztian Szucs >Priority: Major > Fix For: 2.0.0 > > > https://github.com/apache/arrow/blob/master/csharp/build/docker/Dockerfile -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-4720) [CI] Mark MinGW build failures allowed
[ https://issues.apache.org/jira/browse/ARROW-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-4720. --- Resolution: Won't Fix MinGW builds seem to have been stable for a while, closing. > [CI] Mark MinGW build failures allowed > -- > > Key: ARROW-4720 > URL: https://issues.apache.org/jira/browse/ARROW-4720 > Project: Apache Arrow > Issue Type: Wish > Components: C++, Continuous Integration >Reporter: Antoine Pitrou >Priority: Major > Labels: ci-failure > > Currently we are requiring MinGW tests to pass on AppVeyor. Almost nobody > will use MinGW builds if regular MSVC builds work fine. So it should be on > the onus of the few people caring about MinGW to ensure that the build chain > works on those platforms. > Example here, apparently the uriparser library doesn't build on MinGW: > https://ci.appveyor.com/project/pitrou/arrow/build/job/t64xwyj2axhl1jgr > There is a tendency to inflate the number of different configurations in our > CI matrices. Not only does it make builds longer and add delays (see how long > you have to wait before you get a CI result on AppVeyor), but it's of dubious > utility. I'm not sure it serves the project's general interest. > Rant off ;) -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension
[ https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6513: -- Priority: Trivial (was: Minor) > [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt > extension > > > Key: ARROW-6513 > URL: https://issues.apache.org/jira/browse/ARROW-6513 > Project: Apache Arrow > Issue Type: Improvement > Components: CI >Reporter: Krisztian Szucs >Priority: Trivial > > The `arrow/ci/conda_env_*.yml` files are not YAML files; we should > rename them to use the .txt extension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension
[ https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-6513: -- Fix Version/s: 2.0.0 > [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt > extension > > > Key: ARROW-6513 > URL: https://issues.apache.org/jira/browse/ARROW-6513 > Project: Apache Arrow > Issue Type: Improvement > Components: CI >Reporter: Krisztian Szucs >Priority: Trivial > Fix For: 2.0.0 > > > The `arrow/ci/conda_env_*.yml` files are not YAML files; we should > rename them to use the .txt extension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-1851) [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC
[ https://issues.apache.org/jira/browse/ARROW-1851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932349#comment-16932349 ] Antoine Pitrou commented on ARROW-1851: --- I don't think this is likely to happen anytime soon. Should we close it as Won't Fix? > [C] Minimalist ANSI C / C99 implementation of Arrow data structures and IPC > --- > > Key: ARROW-1851 > URL: https://issues.apache.org/jira/browse/ARROW-1851 > Project: Apache Arrow > Issue Type: New Feature > Components: C >Reporter: Wes McKinney >Priority: Major > Attachments: text.html > > > This is an umbrella tracking JIRA for creating a small self-contained C > implementation of Arrow. The purpose of this library would be compactness > and portability, for embedded settings or for FFI in languages that have a > harder time binding to C++. The C library could also grow wrapper support for > the C++ library to expose more complicated functionality where we don't > necessarily want multiple implementations. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-6513) [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt extension
[ https://issues.apache.org/jira/browse/ARROW-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932348#comment-16932348 ] Antoine Pitrou commented on ARROW-6513: --- [~kszucs] do you want to tackle this? > [CI] The conda environment files arrow/ci/conda_env_*.yml should have .txt > extension > > > Key: ARROW-6513 > URL: https://issues.apache.org/jira/browse/ARROW-6513 > Project: Apache Arrow > Issue Type: Improvement > Components: CI >Reporter: Krisztian Szucs >Priority: Minor > > The `arrow/ci/conda_env_*.yml` files are not YAML files; we should > rename them to use the .txt extension. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-3266) [C] Minimalistic C library implementing Arrow data structures, IPC read/write
[ https://issues.apache.org/jira/browse/ARROW-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-3266. --- Resolution: Duplicate > [C] Minimalistic C library implementing Arrow data structures, IPC read/write > - > > Key: ARROW-3266 > URL: https://issues.apache.org/jira/browse/ARROW-3266 > Project: Apache Arrow > Issue Type: New Feature > Components: C >Reporter: Wes McKinney >Priority: Major > > I am interested in a small C89/C99 library for interacting with Arrow data > structures in standard C with minimal dependencies. Using > https://github.com/dvidelabs/flatcc it should be possible to deal with IPC > metadata in C as well without involving a C++ compiler -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4412) [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.
[ https://issues.apache.org/jira/browse/ARROW-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4412: -- Fix Version/s: 1.0.0 > [DOCUMENTATION] Add explicit version numbers to the arrow specification > documents. > -- > > Key: ARROW-4412 > URL: https://issues.apache.org/jira/browse/ARROW-4412 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Micah Kornfield >Priority: Minor > Fix For: 1.0.0 > > > Based on conversation on the mailing list it might pay to include > version/revision numbers on the specification document. One way is to > include the "release" version, another might be to only update versioning on > changes to the document. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4412) [DOCUMENTATION] Add explicit version numbers to the arrow specification documents.
[ https://issues.apache.org/jira/browse/ARROW-4412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932352#comment-16932352 ] Antoine Pitrou commented on ARROW-4412: --- cc [~npr] > [DOCUMENTATION] Add explicit version numbers to the arrow specification > documents. > -- > > Key: ARROW-4412 > URL: https://issues.apache.org/jira/browse/ARROW-4412 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Micah Kornfield >Priority: Minor > Fix For: 1.0.0 > > > Based on conversation on the mailing list it might pay to include > version/revision numbers on the specification document. One way is to > include the "release" version, another might be to only update versioning on > changes to the document. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5673) [Crossbow] Support GitLab runners
[ https://issues.apache.org/jira/browse/ARROW-5673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5673: -- Fix Version/s: 2.0.0 > [Crossbow] Support GitLab runners > - > > Key: ARROW-5673 > URL: https://issues.apache.org/jira/browse/ARROW-5673 > Project: Apache Arrow > Issue Type: Improvement > Components: Packaging >Reporter: Krisztian Szucs >Priority: Major > Fix For: 2.0.0 > > > Description is by [~kou]: > I want to use GitLab Runner instead of CircleCI, > because we can add custom GitLab Runners for us. For example, we can add a GPU > enabled GitLab Runner to test CUDA enabled Apache Arrow builds. We can also > increase the timeout to more than 5h for our GitLab Runners. > We can use https://gitlab.com/ to run GitLab Runners: > https://about.gitlab.com/solutions/github/ > This feature isn't included in the Free tier on GitLab.com (it's available > with the Free tier as a campaign for now (*1)), but GitLab.com provides Gold > tier features to open source projects (*2). So we can use this feature by > choosing "CI/CD for external repo" in the "New project" page > https://gitlab.com/projects/new . > (*1) > So, for the next year we are making the GitLab CI/CD for GitHub feature a > part of our GitLab.com Free tier. > (*2) > As part of our commitment to open source, we offer all public projects > our highest tier features (Gold) for free. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5858) [Doc] Better document the Tensor classes in the prose documentation
[ https://issues.apache.org/jira/browse/ARROW-5858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5858: -- Fix Version/s: 1.0.0 > [Doc] Better document the Tensor classes in the prose documentation > --- > > Key: ARROW-5858 > URL: https://issues.apache.org/jira/browse/ARROW-5858 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation, Python >Reporter: Joris Van den Bossche >Priority: Major > Fix For: 1.0.0 > > > From a comment from [~wesmckinn] in ARROW-2714: > {quote}The Tensor classes are independent from the columnar data structures, > though they reuse pieces of metadata, metadata serialization, memory > management, and IPC. > The purpose of adding these to the library was to have in-memory data > structures for handling Tensor/ndarray data and metadata that "plug in" to > the rest of the Arrow C++ system (Plasma store, IO subsystem, memory pools, > buffers, etc.). > Theoretically you could return a Tensor when creating a non-contiguous slice > of an Array; in light of the above, I don't think that would be intuitive. > When we started the project, our focus was creating an open standard for > in-memory columnar data, a hitherto unsolved problem. The project's scope has > expanded into peripheral problems in the same domain in the meantime (with > the mantra of creating interoperable components, a use-what-you-need > development platform for system developers). I think this aspect of the > project could be better documented / advertised, since the project's initial > focus on the columnar standard has given some the mistaken impression that we > are not interested in any work outside of that. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-4999) [Doc][C++] Add examples on how to construct with ArrayData::Make instead of builder classes
[ https://issues.apache.org/jira/browse/ARROW-4999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-4999: -- Fix Version/s: 1.0.0 > [Doc][C++] Add examples on how to construct with ArrayData::Make instead of > builder classes > --- > > Key: ARROW-4999 > URL: https://issues.apache.org/jira/browse/ARROW-4999 > Project: Apache Arrow > Issue Type: Improvement > Components: C++, Documentation >Reporter: Francois Saint-Jacques >Priority: Minor > Fix For: 1.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (ARROW-4405) [Docs] Docker documentation builds fail since the source directory is mounted as readonly
[ https://issues.apache.org/jira/browse/ARROW-4405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16932354#comment-16932354 ] Antoine Pitrou commented on ARROW-4405: --- Is this still an issue? > [Docs] Docker documentation builds fail since the source directory is mounted > as readonly > - > > Key: ARROW-4405 > URL: https://issues.apache.org/jira/browse/ARROW-4405 > Project: Apache Arrow > Issue Type: Bug > Components: Documentation >Reporter: Krisztian Szucs >Priority: Major > Labels: docker > > {code:java} > writing list of installed files to '../../build/python/record.txt' > / > + pushd /arrow/cpp/apidoc > /arrow/cpp/apidoc / > + doxygen > error: Failed to open temporary file /arrow/cpp/apidoc/doxygen_objdb_4898.tmp > The command "docker-compose run docs" exited with 1.{code} > https://travis-ci.org/kszucs/crossbow/builds/485348071 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Resolved] (ARROW-2793) [Documentation] Add contributor guide to Sphinx documentation
[ https://issues.apache.org/jira/browse/ARROW-2793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou resolved ARROW-2793. --- Resolution: Not A Problem Agree with Neal, the desired info already seems there. Please reopen if we misunderstood :) > [Documentation] Add contributor guide to Sphinx documentation > - > > Key: ARROW-2793 > URL: https://issues.apache.org/jira/browse/ARROW-2793 > Project: Apache Arrow > Issue Type: New Feature > Components: Wiki >Reporter: Wes McKinney >Priority: Major > > We should document the desired contributor workflow (e.g. git branches, etc.) > someplace. We should put this in the Sphinx project -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5405) [Documentation] Move integration testing documentation to Sphinx docs, add instructions for JavaScript
[ https://issues.apache.org/jira/browse/ARROW-5405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5405: -- Fix Version/s: 1.0.0 > [Documentation] Move integration testing documentation to Sphinx docs, add > instructions for JavaScript > -- > > Key: ARROW-5405 > URL: https://issues.apache.org/jira/browse/ARROW-5405 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > I noticed that JavaScript information is not in integration/README.md. It > would be a good opportunity to migrate this over to the > docs/source/developers directory -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5543) [Documentation] Migrate FAQ page to Sphinx / rst around release time
[ https://issues.apache.org/jira/browse/ARROW-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5543: -- Fix Version/s: 1.0.0 > [Documentation] Migrate FAQ page to Sphinx / rst around release time > > > Key: ARROW-5543 > URL: https://issues.apache.org/jira/browse/ARROW-5543 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Priority: Major > Fix For: 1.0.0 > > > In ARROW-973, a Markdown page with the FAQ was added. When we are close to > publishing a new version of the Sphinx site, it would make sense to move the > FAQ to the main docs project and link to it from the project front page -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (ARROW-5543) [Documentation] Migrate FAQ page to Sphinx / rst around release time
[ https://issues.apache.org/jira/browse/ARROW-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Antoine Pitrou updated ARROW-5543: -- Component/s: Documentation > [Documentation] Migrate FAQ page to Sphinx / rst around release time > > > Key: ARROW-5543 > URL: https://issues.apache.org/jira/browse/ARROW-5543 > Project: Apache Arrow > Issue Type: Improvement > Components: Documentation >Reporter: Wes McKinney >Priority: Major > > In ARROW-973, a Markdown page with the FAQ was added. When we are close to > publishing a new version of the Sphinx site, it would make sense to move the > FAQ to the main docs project and link to it from the project front page -- This message was sent by Atlassian Jira (v8.3.4#803005)