[GitHub] [arrow] tianchen92 commented on pull request #7231: ARROW-6839: [Java] Add APIs to read and write "custom_metadata" field of IPC file footer

2020-06-14 Thread GitBox


tianchen92 commented on pull request #7231:
URL: https://github.com/apache/arrow/pull/7231#issuecomment-643892709


   Thanks @rymurr for the review.
   @emkornfield Do you have any other comments? Otherwise I'll merge this in 
a few days. 



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [arrow] liyafan82 commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-06-14 Thread GitBox


liyafan82 commented on pull request #6729:
URL: https://github.com/apache/arrow/pull/6729#issuecomment-643872298


   > This breaks Spark: 
https://github.com/ursa-labs/crossbow/runs/769424833#step:6:13025
   > 
   > ```
   >  [ERROR] [Error] 
/spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:20:
 cannot find symbol
   >   symbol:   class ArrowBuf
   >   location: package io.netty.buffer
   > [ERROR] [Error] 
/spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:461:
 cannot find symbol
   >   symbol:   class ArrowBuf
   >   location: class 
org.apache.spark.sql.vectorized.ArrowColumnVector.ArrayAccessor
   > ```
   > 
   > Because Spark uses `io.netty.buffer.ArrowBuf`: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L20
   > 
   > Should we restore the `io.netty.buffer.ArrowBuf` name or update Spark?
   
   Hi @kou, thanks a lot for reporting the problem.
   I'd prefer updating Spark, as this PR is one of the steps towards 
moving Netty-related code into a separate module. 
   We tried keeping two implementations of ArrowBuf and marking one as 
deprecated. However, that caused some other problems, so we chose to 
move ArrowBuf directly to another package. 







[GitHub] [arrow] wesm closed pull request #7417: ARROW-9079: [C++] Write benchmark for arithmetic kernels

2020-06-14 Thread GitBox


wesm closed pull request #7417:
URL: https://github.com/apache/arrow/pull/7417


   







[GitHub] [arrow] wesm commented on pull request #7421: ARROW-9030: [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib

2020-06-14 Thread GitBox


wesm commented on pull request #7421:
URL: https://github.com/apache/arrow/pull/7421#issuecomment-643865571


   That's unfortunate. We can restore pyarrow.compat with wrappers for the 
functions that have deprecation warnings. I opened 
https://issues.apache.org/jira/browse/ARROW-9130
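
A minimal sketch of what such a wrapper could look like, assuming the restored module simply re-exports each moved function behind a warning. The helper `deprecated_alias`, the stand-in `tobytes` function, and the module names in the warning text are hypothetical and are not taken from the pyarrow source.

```python
import functools
import warnings


def deprecated_alias(func, old_name, new_name):
    """Wrap func so that calling it via its old name emits a FutureWarning."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_name} instead",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)
    return wrapper


# Hypothetical utility standing in for a function that moved to a new module.
def tobytes(value):
    return value.encode("utf8") if isinstance(value, str) else bytes(value)


# What a restored compatibility module could re-export (names are assumptions).
compat_tobytes = deprecated_alias(tobytes, "compat.tobytes", "lib.tobytes")
```

Downstream code that still imports the old name keeps working and sees a warning, which gives projects such as Dask time to migrate.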







[GitHub] [arrow] kou closed pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-14 Thread GitBox


kou closed pull request #7433:
URL: https://github.com/apache/arrow/pull/7433


   







[GitHub] [arrow] kou commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-14 Thread GitBox


kou commented on pull request #7433:
URL: https://github.com/apache/arrow/pull/7433#issuecomment-643824491


   +1
   
   The error is gone, but new errors have appeared:
   
   
https://github.com/ursa-labs/crossbow/runs/770616977?check_suite_focus=true#step:6:10042
   
   ```text
   >   return om.readValue(jvm_spec, pojo_Field)
   E   pyarrow.tests.test_jvm.com.fasterxml.jackson.databind.exc.MismatchedInputException: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.util.ArrayList` out of START_OBJECT token
   E    at [Source: (String)"{"name": "field_name", "nullable": true, "type": {"name": "timestamp", "unit": "NANOSECOND", "timezone": "Europe/Paris"}, "children": [], "metadata": {"field meta": "field data"}}"; line: 1, column: 151] (through reference chain: org.apache.arrow.vector.types.pojo.Field["metadata"])
   ```
   
   The new errors are out of scope of this pull request.
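
For reference, the message says the deserializer wanted a JSON array for `metadata` but found a JSON object. The sketch below parses the spec from the log and shows one array-shaped alternative. The list of `key`/`value` entries is an assumption about a plausible layout, not something confirmed in this thread, and `metadata_as_entries` is a hypothetical helper.

```python
import json

# Field spec copied from the error message above.
spec = json.loads(
    '{"name": "field_name", "nullable": true, '
    '"type": {"name": "timestamp", "unit": "NANOSECOND", '
    '"timezone": "Europe/Paris"}, '
    '"children": [], "metadata": {"field meta": "field data"}}'
)


def metadata_as_entries(metadata):
    """Turn a metadata mapping into a list of key/value entries (assumed layout)."""
    return [{"key": k, "value": v} for k, v in metadata.items()]


# The spec as sent carries metadata as a JSON object (a dict in Python),
# which is the START_OBJECT token the deserializer complained about.
is_object = isinstance(spec["metadata"], dict)

# An array-shaped alternative; whether the Java side accepts it is an assumption.
spec["metadata"] = metadata_as_entries(spec["metadata"])
```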







[GitHub] [arrow] kou commented on pull request #6729: ARROW-8229: [Java] Move ArrowBuf into the Arrow package

2020-06-14 Thread GitBox


kou commented on pull request #6729:
URL: https://github.com/apache/arrow/pull/6729#issuecomment-643823860


   This breaks Spark: 
https://github.com/ursa-labs/crossbow/runs/769424833#step:6:13025
   
   ```text
   [ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:20: cannot find symbol
     symbol:   class ArrowBuf
     location: package io.netty.buffer
   [ERROR] [Error] /spark/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java:461: cannot find symbol
     symbol:   class ArrowBuf
     location: class org.apache.spark.sql.vectorized.ArrowColumnVector.ArrayAccessor
   ```
   
   Because Spark uses `io.netty.buffer.ArrowBuf`: 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java#L20
   
   Should we restore the `io.netty.buffer.ArrowBuf` name or update Spark?







[GitHub] [arrow] kou commented on pull request #7335: ARROW-9018: [C++] Remove APIs that were marked as deprecated in 0.17.0 and prior

2020-06-14 Thread GitBox


kou commented on pull request #7335:
URL: https://github.com/apache/arrow/pull/7335#issuecomment-643822639


   This breaks Turbodbc: 
https://github.com/dask/dask/blob/master/dask/dataframe/io/parquet/arrow.py#L9
   
   Turbodbc uses `Status AllocateResizableBuffer(MemoryPool* pool, ...)`: 
https://github.com/blue-yonder/turbodbc/blob/master/cpp/turbodbc_arrow/Test/tests/arrow_result_set_test.cpp#L113
   
   @xhochy Could you change Turbodbc to use 
`Result<std::unique_ptr<ResizableBuffer>> AllocateResizableBuffer(const int64_t 
size, MemoryPool* pool = NULLPTR)`?







[GitHub] [arrow] kou commented on pull request #7421: ARROW-9030: [Python] Remove pyarrow/compat.py, move some oft-used utility functions to pyarrow.lib

2020-06-14 Thread GitBox


kou commented on pull request #7421:
URL: https://github.com/apache/arrow/pull/7421#issuecomment-643821762


   This breaks Dask: 
https://github.com/ursa-labs/crossbow/runs/769427250#step:6:12381
   
   Because Dask depends on `pyarrow.compat`: 
https://github.com/dask/dask/blob/master/dask/dataframe/io/parquet/arrow.py#L9
   
   Should we restore `pyarrow.compat` or change Dask?







[GitHub] [arrow] github-actions[bot] commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-14 Thread GitBox


github-actions[bot] commented on pull request #7433:
URL: https://github.com/apache/arrow/pull/7433#issuecomment-643821823


   https://issues.apache.org/jira/browse/ARROW-9129







[GitHub] [arrow] github-actions[bot] commented on pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-14 Thread GitBox


github-actions[bot] commented on pull request #7433:
URL: https://github.com/apache/arrow/pull/7433#issuecomment-643821184


   Revision: 4fd9f1696970c48eb9ceeca3fef975fcd9905be9
   
   Submitted crossbow builds: [ursa-labs/crossbow @ 
actions-319](https://github.com/ursa-labs/crossbow/branches/all?query=actions-319)
   
   |Task|Status|
   |------|--------|
   |test-conda-python-3.8-jpype|[![Github 
Actions](https://github.com/ursa-labs/crossbow/workflows/Crossbow/badge.svg?branch=actions-319-github-test-conda-python-3.8-jpype)](https://github.com/ursa-labs/crossbow/actions?query=branch:actions-319-github-test-conda-python-3.8-jpype)|







[GitHub] [arrow] kou opened a new pull request #7433: ARROW-9129: [Python][JPype] Remove JPype version check

2020-06-14 Thread GitBox


kou opened a new pull request #7433:
URL: https://github.com/apache/arrow/pull/7433


   Because we only run the test with the latest JPype.
   
   Error details:
   
   https://github.com/ursa-labs/crossbow/runs/769433714#step:6:7995
   
   ```text
   >   if jpype.__version_info__ >= (0, 7):
   E   TypeError: '>=' not supported between instances of 'list' and 'tuple'
   ```
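
The failure comes from comparing a list with a tuple, which Python 3 does not support. This PR removes the check outright; the sketch below shows a different option, normalising both sides to tuples, purely to illustrate the error. The helper `version_at_least` is hypothetical, and the sample version lists are made up.

```python
def version_at_least(version_info, minimum):
    """Compare two version sequences after normalising both to tuples."""
    return tuple(version_info) >= tuple(minimum)


# Reproduce the error from the log: list >= tuple raises TypeError in Python 3.
try:
    [0, 7, 5] >= (0, 7)
    raised = False
except TypeError:
    raised = True

# After normalisation the comparison is well defined.
new_enough = version_at_least([0, 7, 5], (0, 7))
too_old = version_at_least([0, 6, 1], (0, 7))
```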
   
   







[GitHub] [arrow] kiszk commented on a change in pull request #7402: ARROW-9099: [C++][Gandiva] Implement trim function for string

2020-06-14 Thread GitBox


kiszk commented on a change in pull request #7402:
URL: https://github.com/apache/arrow/pull/7402#discussion_r439854412



##
File path: cpp/src/gandiva/precompiled/string_ops_test.cc
##
@@ -426,6 +426,33 @@ TEST(TestStringOps, TestReverse) {
   ctx.Reset();
 }
 
+TEST(TestStringOps, TestTrim) {
+  gandiva::ExecutionContext ctx;
+  uint64_t ctx_ptr = reinterpret_cast<gdv_int64>(&ctx);
+  gdv_int32 out_len = 0;
+  const char* out_str;
+
+  out_str = trim_utf8(ctx_ptr, "TestString", 10, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "TestString");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, "  TestString  ", 18, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "TestString");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, " Test  çåå†bD   ", 21, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "Test  çåå†bD");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, "", 0, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "");
+  EXPECT_FALSE(ctx.has_error());
+
+  out_str = trim_utf8(ctx_ptr, "  ", 6, &out_len);
+  EXPECT_EQ(std::string(out_str, out_len), "sadfsdgfh");

Review comment:
   Is this result correct?
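
For comparison, a minimal Python sketch of the behaviour the earlier assertions suggest, assuming `trim_utf8` is meant to strip only leading and trailing spaces. The function `trim_spaces` is a hypothetical stand-in, not the Gandiva implementation. Under that assumption a whitespace-only input would trim to the empty string.

```python
def trim_spaces(text):
    """Strip leading and trailing ASCII spaces, leaving inner spaces alone."""
    return text.strip(" ")


unchanged = trim_spaces("TestString")
padded = trim_spaces("  TestString  ")
inner_kept = trim_spaces(" Test  çåå†bD   ")
only_spaces = trim_spaces("      ")
```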









[GitHub] [arrow] github-actions[bot] commented on pull request #7432: ARROW-9127: [Rust] Update thrift dependency to 0.13 (latest)

2020-06-14 Thread GitBox


github-actions[bot] commented on pull request #7432:
URL: https://github.com/apache/arrow/pull/7432#issuecomment-643753938


   https://issues.apache.org/jira/browse/ARROW-9127







[GitHub] [arrow] alamb opened a new pull request #7432: ARROW-9127: [Rust] Update thrift dependency to 0.13 (latest)

2020-06-14 Thread GitBox


alamb opened a new pull request #7432:
URL: https://github.com/apache/arrow/pull/7432


   Update to the latest version of Apache Thrift (0.13).
   
   Rationale: We were trying to update the version of `byteorder` that an 
internal project used, but arrow/parquet -> depends on parquet-format-rs -> 
depends on thrift.
   
   @sunchao recently updated the thrift pin in parquet-format in 
https://github.com/apache/arrow/pull/6626 (thank you!), so it is now possible 
to update the thrift version here as well.
   
   It seems the thrift update was postponed when the dependencies were 
last updated (https://github.com/apache/arrow/pull/6626 / 
https://issues.apache.org/jira/browse/ARROW-8124).


