[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer
tianchen92 commented on pull request #7231: URL: https://github.com/apache/arrow/pull/7231#issuecomment-643892709 Thanks @rymurr for the review. @emkornfield Do you have any other comments? Otherwise, I'll merge this in a few days. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] liyafan82 commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package
liyafan82 commented on pull request #6729: URL: https://github.com/apache/arrow/pull/6729#issuecomment-643872298

> This breaks Spark: https://github.com/ursa-labs/crossbow/runs/769424833#step:6:13025
>
> ```
> [ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:20: cannot find symbol
> symbol: class ArrowBuf
> location: package io.netty.buffer
> [ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:461: cannot find symbol
> symbol: class ArrowBuf
> location: class org.apache.spark.sql.vectorized.ArrowColumnVector.ArrayAccessor
> ```
>
> Because Spark uses `io.netty.buffer.ArrowBuf`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L20
>
> Should we restore the `io.netty.buffer.ArrowBuf` name or update Spark?

Hi @kou, thanks a lot for reporting the problem. I'd prefer updating Spark, as this PR is one of the steps towards moving Netty-related code into a separate module. We tried keeping two implementations of ArrowBuf and marking one as deprecated. However, that caused other problems, so we chose to move ArrowBuf directly to another package.
[GitHub] [arrow] wesm closed pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels
wesm closed pull request #7417: URL: https://github.com/apache/arrow/pull/7417
[GitHub] [arrow] wesm commented on pull request #7421: ARROW-9030: [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib
wesm commented on pull request #7421: URL: https://github.com/apache/arrow/pull/7421#issuecomment-643865571 That's unfortunate. We can restore pyarrow.compat with wrappers for the functions that have deprecation warnings. I opened https://issues.apache.org/jira/browse/ARROW-9130
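A minimal sketch of what "wrappers ... that have deprecation warnings" could look like, assuming a plain-Python shim. The names `deprecated_alias`, `_moved_helper`, `old_module` and `new_module` are hypothetical stand-ins for illustration; this is not the code that was added to pyarrow.

```python
import functools
import warnings


def deprecated_alias(func, old_name, new_name):
    """Wrap func so that calling it through the old name emits a DeprecationWarning."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not the shim
        )
        return func(*args, **kwargs)
    return wrapper


def _moved_helper(value):
    # Hypothetical utility standing in for a function that moved to a new module.
    return str(value).upper()


# The old import location keeps working, but warns on every call.
old_helper = deprecated_alias(_moved_helper, "old_module.helper", "new_module.helper")
```

With a shim like this, downstream code that still imports from the old location continues to run while being told where the function now lives.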
[GitHub] [arrow] kou closed pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check
kou closed pull request #7433: URL: https://github.com/apache/arrow/pull/7433
[GitHub] [arrow] kou commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check
kou commented on pull request #7433: URL: https://github.com/apache/arrow/pull/7433#issuecomment-643824491

+1

The error is gone, but new errors have appeared: https://github.com/ursa-labs/crossbow/runs/770616977?check_suite_focus=true#step:6:10042

```text
> return om.readValue(jvm_spec, pojo_Field)
E pyarrow.tests.test_jvm.com.fasterxml.jackson.databind.exc.MismatchedInputException: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token
E at [Source: (String)"{"name": "field_name", "nullable": true, "type": {"name": "timestamp", "unit": "NANOSECOND", "timezone": "Europe/Paris"}, "children": [], "metadata": {"field meta": "field data"}}"; line: 1, column: 151] (through reference chain: org.apache.arrow.vector.types.pojo.Field["metadata"])
```

The new errors are out of scope for this pull request.
[GitHub] [arrow] kou commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package
kou commented on pull request #6729: URL: https://github.com/apache/arrow/pull/6729#issuecomment-643823860

This breaks Spark: https://github.com/ursa-labs/crossbow/runs/769424833#step:6:13025

```text
[ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:20: cannot find symbol
symbol: class ArrowBuf
location: package io.netty.buffer
[ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:461: cannot find symbol
symbol: class ArrowBuf
location: class org.apache.spark.sql.vectorized.ArrowColumnVector.ArrayAccessor
```

Because Spark uses `io.netty.buffer.ArrowBuf`: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L20

Should we restore the `io.netty.buffer.ArrowBuf` name or update Spark?
[GitHub] [arrow] kou commented on pull request #7335: ARROW-9018: [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior
kou commented on pull request #7335: URL: https://github.com/apache/arrow/pull/7335#issuecomment-643822639 This breaks Turbodbc: https://github.com/dask/dask/blob/master/dask/dataframe/io/parquet/arrow.py#L9 Turbodbc uses `Status AllocateResizableBuffer(MemoryPool* pool, ...)`: https://github.com/blue-yonder/turbodbc/blob/master/cpp/turbodbc_arrow/Test/tests/arrow_result_set_test.cpp#L113 @xhochy Could you change Turbodbc to use `Result<std::unique_ptr<ResizableBuffer>> AllocateResizableBuffer(const int64_t size, MemoryPool* pool = NULLPTR)`?
[GitHub] [arrow] kou commented on pull request #7421: ARROW-9030: [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib
kou commented on pull request #7421: URL: https://github.com/apache/arrow/pull/7421#issuecomment-643821762 This breaks Dask: https://github.com/ursa-labs/crossbow/runs/769427250#step:6:12381 Because Dask depends on `pyarrow.compat`: https://github.com/dask/dask/blob/master/dask/dataframe/io/parquet/arrow.py#L9 Should we restore `pyarrow.compat` or change Dask?
[GitHub] [arrow] github-actions[bot] commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check
github-actions[bot] commented on pull request #7433: URL: https://github.com/apache/arrow/pull/7433#issuecomment-643821823 https://issues.apache.org/jira/browse/ARROW-9129
[GitHub] [arrow] github-actions[bot] commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check
github-actions[bot] commented on pull request #7433: URL: https://github.com/apache/arrow/pull/7433#issuecomment-643821184

Revision: 4fd9f1696970c48eb9ceeca3fef975fcd9905be9

Submitted crossbow builds: [ursa-labs/crossbow @ actions-319](https://github.com/ursa-labs/crossbow/branches/all?query=actions-319)

|Task|Status|
|----|------|
|test-conda-python-3.8-jpype|[![Github Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-319-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-319-github-test-conda-python-3.8-jpype)|
[GitHub] [arrow] kou opened a new pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check
kou opened a new pull request #7433: URL: https://github.com/apache/arrow/pull/7433

Because we only run the test with the latest JPype.

Error details: https://github.com/ursa-labs/crossbow/runs/769433714#step:6:7995

```text
> if jpype.__version_info__ >= (0, 7):
E TypeError: '>=' not supported between instances of 'list' and 'tuple'
```
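The `TypeError` above is general Python 3 behaviour: ordering comparisons between a `list` and a `tuple` are not defined. The sketch below reproduces it with plain values and shows one alternative, normalising both sides to tuples. The helper `version_at_least` is hypothetical and is not what this pull request does; the pull request removes the check instead.

```python
def version_at_least(version_info, minimum):
    """Compare a version given as a list or tuple against a minimum version."""
    return tuple(version_info) >= tuple(minimum)


# Comparing a list with a tuple directly raises TypeError in Python 3.
try:
    [0, 7, 5] >= (0, 7)
    raised = False
except TypeError:
    raised = True
```

Normalising the types keeps a version gate working whichever sequence type the library reports, whereas removing the check is simpler when only the latest release is tested.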
[GitHub] [arrow] kiszk commented on a change in pull request #7402: ARROW-9099: [C++][Gandiva] Implement trim function for string
kiszk commented on a change in pull request #7402: URL: https://github.com/apache/arrow/pull/7402#discussion_r439854412

## File path: cpp/src/gandiva/precompiled/string_ops_test.cc ##
@@ -426,6 +426,33 @@ TEST(TestStringOps, TestReverse) {
   ctx.Reset();
 }
 
+TEST(TestStringOps, TestTrim) {
+  gandiva::ExecutionContext ctx;
+  uint64_t ctx_ptr = reinterpret_cast<gdv_int64>(&ctx);
+  gdv_int32 out_len = 0;
+  const char* out_str;
+
+  out_str = trim_utf8(ctx_ptr, "TestString", 10, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "TestString");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, " TestString ", 18, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "TestString");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, " Test çåå†bD ", 21, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "Test çåå†bD");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, "", 0, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, " ", 6, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "sadfsdgfh");

Review comment: Is this result correct?
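For reference, a minimal Python sketch of byte-level trimming, assuming that trimming means removing leading and trailing ASCII spaces (0x20) only. The function `trim_spaces` is hypothetical and is not Gandiva's `trim_utf8`; it only illustrates the expected shape of the results for inputs like those in the test.

```python
def trim_spaces(data: bytes) -> bytes:
    """Strip leading and trailing ASCII spaces from UTF-8 encoded bytes."""
    start = 0
    end = len(data)
    # Advance past leading spaces.
    while start < end and data[start] == 0x20:
        start += 1
    # Retreat past trailing spaces.
    while end > start and data[end - 1] == 0x20:
        end -= 1
    return data[start:end]
```

Working on bytes is safe for UTF-8 input because every byte of a multi-byte sequence is 0x80 or higher, so it can never be mistaken for a space. Under this assumption an input made only of spaces trims to an empty string.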
[GitHub] [arrow] github-actions[bot] commented on pull request #7432: ARROW-9127: [Rust] Update thrift dependency to 0.13 (latest)
github-actions[bot] commented on pull request #7432: URL: https://github.com/apache/arrow/pull/7432#issuecomment-643753938 https://issues.apache.org/jira/browse/ARROW-9127
[GitHub] [arrow] alamb opened a new pull request #7432: ARROW-9127: [Rust] Update thrift dependency to 0.13 (latest)
alamb opened a new pull request #7432: URL: https://github.com/apache/arrow/pull/7432

Update to the latest version of Apache Thrift (0.13).

Rationale: We were trying to update the version of `byteorder` that an internal project used, but arrow/parquet -> depends on parquet-format-rs -> depends on thrift.

@sunchao recently updated the thrift pin in parquet-format in https://github.com/apache/arrow/pull/6626 (thank you!), so now it is possible to update the thrift version here as well.

It seems the thrift update was postponed when the dependencies were last updated (https://github.com/apache/arrow/pull/6626 / https://issues.apache.org/jira/browse/ARROW-8124).