[GitHub] [arrow] pitrou commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
pitrou commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652316126 Ok, then I was mistaken. Sorry :-) This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-652319908 > Thanks a lot @liyafan82 I have addressed your suggestions and rebased @rymurr Thanks for your work. Will merge when it turns green.

[GitHub] [arrow] rymurr commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-652323175 > Thanks for working on this @rymurr! Apologies for taking so long to review. It looks pretty good, but I saw what looked like inconsistencies in the `LargeListVector` APIs

[GitHub] [arrow] pitrou commented on a change in pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
pitrou commented on a change in pull request #7593: URL: https://github.com/apache/arrow/pull/7593#discussion_r448326108 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -297,6 +297,116 @@ void AddAsciiLength(FunctionRegistry* registry) {

[GitHub] [arrow] pitrou commented on pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
pitrou commented on pull request #7544: URL: https://github.com/apache/arrow/pull/7544#issuecomment-652383485 Thanks for the update. I will merge this PR once CI is green.

[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652395634 > @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to

[GitHub] [arrow] github-actions[bot] commented on pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7605: URL: https://github.com/apache/arrow/pull/7605#issuecomment-652332966 https://issues.apache.org/jira/browse/ARROW-9283

[GitHub] [arrow] lidavidm commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
lidavidm commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652379935 @liyafan82 The problem actually isn't with big-endian platforms! It's because Java's ByteBuffer [defaults to

[GitHub] [arrow] liyafan82 commented on pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7544: URL: https://github.com/apache/arrow/pull/7544#issuecomment-652395253 @pitrou Thanks a lot for your effort.

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-01 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r448255547 ## File path: java/vector/src/main/codegen/templates/UnionLargeListWriter.java ## @@ -0,0 +1,232 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] [arrow] pitrou opened a new pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
pitrou opened a new pull request #7605: URL: https://github.com/apache/arrow/pull/7605

[GitHub] [arrow] lidavidm commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
lidavidm commented on a change in pull request #7543: URL: https://github.com/apache/arrow/pull/7543#discussion_r448345297 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {

[GitHub] [arrow] pitrou commented on pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
pitrou commented on pull request #7605: URL: https://github.com/apache/arrow/pull/7605#issuecomment-652330730 There's a problem where we already generate `__version__` and it ends up different, for example: ```python >>> import pyarrow as pa

[GitHub] [arrow] liyafan82 commented on a change in pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7543: URL: https://github.com/apache/arrow/pull/7543#discussion_r448338460 ## File path: java/memory/src/main/java/org/apache/arrow/memory/ArrowBuf.java ## @@ -872,6 +874,7 @@ public void setBytes(long index, ByteBuffer src) {

[GitHub] [arrow] jorisvandenbossche commented on pull request #7546: ARROW-8733: [C++][Dataset][Python] Expose RowGroupInfo statistics values

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7546: URL: https://github.com/apache/arrow/pull/7546#issuecomment-652497706 @rjzamora `num_rows` is already available on the RowGroupInfo object

[GitHub] [arrow] kiszk commented on pull request #7507: ARROW-8797: [C++] Create test to receive RecordBatch for different endian

2020-07-01 Thread GitBox
kiszk commented on pull request #7507: URL: https://github.com/apache/arrow/pull/7507#issuecomment-652482871 @wesm Thank you for your suggestion. I will pursue the approach that you suggested. I will check the integration test command line tool and the integration test with the

[GitHub] [arrow] github-actions[bot] commented on pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7609: URL: https://github.com/apache/arrow/pull/7609#issuecomment-652507951 https://issues.apache.org/jira/browse/ARROW-9289

[GitHub] [arrow] nealrichardson closed pull request #7602: ARROW-9083: [R] collect int64, uint32, uint64 as R integer type if not out of bounds

2020-07-01 Thread GitBox
nealrichardson closed pull request #7602: URL: https://github.com/apache/arrow/pull/7602

[GitHub] [arrow] jorisvandenbossche commented on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-652504934 > We will have to resolve the sum([]) -> null/0 by introducing a "minimum valid values" option. Do we already have a JIRA to track this?
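For readers unfamiliar with the "minimum valid values" idea quoted above, here is a minimal Python sketch of one possible reading. It is not Arrow's API: the function name `sum_with_min_count` and the parameter `min_count` are hypothetical, and `None` stands in for a null result.

```python
from typing import Optional, Sequence


def sum_with_min_count(values: Sequence[Optional[int]],
                       min_count: int = 0) -> Optional[int]:
    """Sum the non-null values of a sequence.

    Hypothetical illustration only: returns None (standing in for null)
    when fewer than `min_count` values are valid, otherwise their sum.
    """
    valid = [v for v in values if v is not None]
    if len(valid) < min_count:
        return None
    return sum(valid)


# With min_count=0 an empty input sums to 0; with min_count=1 it is null.
empty_as_zero = sum_with_min_count([])
empty_as_null = sum_with_min_count([], min_count=1)
```

Under this reading the caller, not the kernel, decides whether `sum([])` is 0 or null, which is the ambiguity the comment raises.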

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #7478: ARROW-9055: [C++] Add sum/mean/minmax kernels for Boolean type

2020-07-01 Thread GitBox
jorisvandenbossche edited a comment on pull request #7478: URL: https://github.com/apache/arrow/pull/7478#issuecomment-652504934 > We will have to resolve the sum([]) -> null/0 by introducing a "minimum valid values" option. Do we already have a JIRA to track this? EDIT -> it

[GitHub] [arrow] kiszk commented on pull request #7596: ARROW-9163: [C++] Validate UTF8 contents of a StringArray

2020-07-01 Thread GitBox
kiszk commented on pull request #7596: URL: https://github.com/apache/arrow/pull/7596#issuecomment-652505301 Looks good

[GitHub] [arrow] emkornfield commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r448413419 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -262,6 +265,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) {

[GitHub] [arrow] jorisvandenbossche commented on pull request #7536: ARROW-8647: [C++][Python][Dataset] Allow partitioning fields to be inferred with dictionary type

2020-07-01 Thread GitBox
jorisvandenbossche commented on pull request #7536: URL: https://github.com/apache/arrow/pull/7536#issuecomment-652493993 @bkietz thanks for the update ensuring all uniques as dictionary values! Testing this out, I ran into an issue with HivePartitioning -> ARROW-9288 / #7608

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448483516 ## File path: r/src/array_from_vector.cpp ## @@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter { } }; +template +class

[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652431931 @emkornfield This is the new version of the sum aggregate without intrinsics; could you help review it? The dense part gets nearly the same scores as the intrinsic version for AVX2 on

[GitHub] [arrow] emkornfield commented on a change in pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r448445229 ## File path: cpp/src/arrow/python/arrow_to_pandas.cc ## @@ -951,8 +951,21 @@ struct ObjectWriterVisitor { template enable_if_timestamp

[GitHub] [arrow] nealrichardson opened a new pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
nealrichardson opened a new pull request #7609: URL: https://github.com/apache/arrow/pull/7609

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448490350 ## File path: r/src/array_to_vector.cpp ## @@ -693,6 +741,9 @@ std::shared_ptr Converter::Make(const std::shared_ptr& type case Type::BOOL:

[GitHub] [arrow] pitrou opened a new pull request #7606: ARROW-8434: [C++] Avoid multiple schema deserializations in RecordBatchFileReader

2020-07-01 Thread GitBox
pitrou opened a new pull request #7606: URL: https://github.com/apache/arrow/pull/7606 This doesn't seem to make a difference in the included benchmark.

[GitHub] [arrow] jianxind commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652425725 @ursabot benchmark --suite-filter=arrow-compute-aggregate-benchmark

[GitHub] [arrow] wesm commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
wesm commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-652450464 > Could you elaborate? Why is this not a problem with the lower/upper kernels? The data preallocation is only for fixed size outputs (eg boolean, integers, floating point,

[GitHub] [arrow] jianxind opened a new pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
jianxind opened a new pull request #7607: URL: https://github.com/apache/arrow/pull/7607 1. Add AVX2/AVX512 build version of aggregate sum/mean function. Use set_source_files_properties to append the SIMD build option. Register the SIMD path at runtime by CPU feature.

[GitHub] [arrow] github-actions[bot] commented on pull request #7607: ARROW-8996: [C++] Add AVX version for aggregate sum/mean

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7607: URL: https://github.com/apache/arrow/pull/7607#issuecomment-652433109 https://issues.apache.org/jira/browse/ARROW-8996

[GitHub] [arrow] github-actions[bot] commented on pull request #7606: ARROW-8434: [C++] Avoid multiple schema deserializations in RecordBatchFileReader

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7606: URL: https://github.com/apache/arrow/pull/7606#issuecomment-652433108 https://issues.apache.org/jira/browse/ARROW-8434

[GitHub] [arrow] github-actions[bot] commented on pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7608: URL: https://github.com/apache/arrow/pull/7608#issuecomment-652480774 https://issues.apache.org/jira/browse/ARROW-9288

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
jorisvandenbossche commented on a change in pull request #7608: URL: https://github.com/apache/arrow/pull/7608#discussion_r448438477 ## File path: cpp/src/arrow/dataset/partition.cc ## @@ -646,15 +657,26 @@ class HivePartitioningFactory : public PartitioningFactory { }

[GitHub] [arrow] rymurr commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-07-01 Thread GitBox
rymurr commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-652516529 This has been modified to incorporate the changes to Unions as proposed on the mailing list.

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7608: ARROW-9288: [C++][Dataset] Fix PartitioningFactory with dictionary encoding for HivePartioning

2020-07-01 Thread GitBox
jorisvandenbossche opened a new pull request #7608: URL: https://github.com/apache/arrow/pull/7608

[GitHub] [arrow] nealrichardson commented on a change in pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r448491760 ## File path: r/src/array_from_vector.cpp ## @@ -918,6 +923,97 @@ class Time64Converter : public TimeConverter { } }; +template +class

[GitHub] [arrow] nealrichardson closed pull request #7514: ARROW-6235: [R] Implement conversion from arrow::BinaryArray to R character vector

2020-07-01 Thread GitBox
nealrichardson closed pull request #7514: URL: https://github.com/apache/arrow/pull/7514

[GitHub] [arrow] nealrichardson closed pull request #7609: ARROW-9289: [R] Remove deprecated functions

2020-07-01 Thread GitBox
nealrichardson closed pull request #7609: URL: https://github.com/apache/arrow/pull/7609

[GitHub] [arrow] github-actions[bot] commented on pull request #7612: ARROW-7011: [C++] Implement casts from float/double to decimal

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7612: URL: https://github.com/apache/arrow/pull/7612#issuecomment-652589053 https://issues.apache.org/jira/browse/ARROW-7011

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448560164 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] nealrichardson edited a comment on pull request #7613: ARROW-8881: [Rust] Add large binary, string and list support

2020-07-01 Thread GitBox
nealrichardson edited a comment on pull request #7613: URL: https://github.com/apache/arrow/pull/7613#issuecomment-652613744 > I'll look at the relevant integration tests separately. Separately as in another commit to this branch, or is there a JIRA already for enabling these

[GitHub] [arrow] kou commented on pull request #7605: ARROW-9283: [Python] Expose build info

2020-07-01 Thread GitBox
kou commented on pull request #7605: URL: https://github.com/apache/arrow/pull/7605#issuecomment-652646475 How about using our SNAPSHOT version as the next version of pyarrow? ```diff diff --git a/python/setup.py b/python/setup.py index 4c264a2d7..bc3efee77 100755 ---

[GitHub] [arrow] kiszk commented on a change in pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-01 Thread GitBox
kiszk commented on a change in pull request #7593: URL: https://github.com/apache/arrow/pull/7593#discussion_r448506425 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -297,6 +297,116 @@ void AddAsciiLength(FunctionRegistry* registry) {

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448556316 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,745 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] github-actions[bot] commented on pull request #7610: ARROW-9290: [Rust] [Parquet] Add features to allow opting out of dependencies

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7610: URL: https://github.com/apache/arrow/pull/7610#issuecomment-652553010 https://issues.apache.org/jira/browse/ARROW-9290

[GitHub] [arrow] nealrichardson opened a new pull request #7611: ARROW-3308: [R] Convert R character vector with data exceeding 2GB to Large type

2020-07-01 Thread GitBox
nealrichardson opened a new pull request #7611: URL: https://github.com/apache/arrow/pull/7611

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448559330 ## File path: python/pyarrow/tests/test_convert_builtin.py ## @@ -968,25 +968,31 @@ def test_sequence_timestamp_from_int_with_unit(): arr_s =

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448589857 ## File path: python/pyarrow/scalar.pxi ## @@ -1217,21 +764,50 @@ cdef dict _scalar_classes = { _Type_INT16: Int16Scalar, _Type_INT32:

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448589973 ## File path: python/pyarrow/scalar.pxi ## @@ -1217,21 +764,50 @@ cdef dict _scalar_classes = { _Type_INT16: Int16Scalar, _Type_INT32:

[GitHub] [arrow] github-actions[bot] commented on pull request #7611: ARROW-3308: [R] Convert R character vector with data exceeding 2GB to Large type

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7611: URL: https://github.com/apache/arrow/pull/7611#issuecomment-652580102 https://issues.apache.org/jira/browse/ARROW-3308

[GitHub] [arrow] nealrichardson commented on pull request #7613: ARROW-8881: [Rust] Add large binary, string and list support

2020-07-01 Thread GitBox
nealrichardson commented on pull request #7613: URL: https://github.com/apache/arrow/pull/7613#issuecomment-652613744 > I'll look at the relevant integration tests separately. Separately as in another commit to this branch, or is there a JIRA already for enabling these integration

[GitHub] [arrow] github-actions[bot] commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652544533 Revision: a914eea4f3ab16e359adee2f37a4fb30a1eba86c Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] pitrou closed pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
pitrou closed pull request #7544: URL: https://github.com/apache/arrow/pull/7544

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448560952 ## File path: python/pyarrow/scalar.pxi ## @@ -1217,21 +767,95 @@ cdef dict _scalar_classes = { _Type_INT16: Int16Scalar, _Type_INT32:

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
jorisvandenbossche commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448619631 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,745 @@ # under the License. -_NULL = NA = None +import collections cdef

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652541712 @kszucs the nightly builds against Spark master have been passing. Do you think you could update this to just add the test against branch-3.0 and remove branch-2.4 for now? I'm not

[GitHub] [arrow] saethlin opened a new pull request #7610: ARROW-9290: [Rust] [Parquet] Add features to allow opting out of dependencies

2020-07-01 Thread GitBox
saethlin opened a new pull request #7610: URL: https://github.com/apache/arrow/pull/7610

[GitHub] [arrow] pitrou opened a new pull request #7612: ARROW-7011: [C++] Implement casts from float/double to decimal

2020-07-01 Thread GitBox
pitrou opened a new pull request #7612: URL: https://github.com/apache/arrow/pull/7612 Also naturally available in Python using the Array.cast() method.

[GitHub] [arrow] nealrichardson commented on pull request #7611: ARROW-3308: [R] Convert R character vector with data exceeding 2GB to Large type

2020-07-01 Thread GitBox
nealrichardson commented on pull request #7611: URL: https://github.com/apache/arrow/pull/7611#issuecomment-652596005 The failed build is an OOM. Any recommendations for testing this? I could disable the test on CI and maybe that's fine since this code shouldn't be changing much, but

[GitHub] [arrow] nevi-me opened a new pull request #7613: ARROW-8881: [Rust] Add large binary, string and list support

2020-07-01 Thread GitBox
nevi-me opened a new pull request #7613: URL: https://github.com/apache/arrow/pull/7613 Similar to other implementations, this creates binary, string and list arrays with `i64` offsets instead of `i32`. Behaviourally, everything's the same as the `i32` counterparts, except for the larger
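To illustrate why the offset width matters for the "large" variants described above, here is a small Python sketch. It is not the Rust implementation from the PR: `build_offsets` is a hypothetical helper that packs cumulative offsets for a variable-length array using either 32-bit or 64-bit signed integers, without allocating any value data.

```python
import struct

INT32_MAX = 2**31 - 1


def build_offsets(lengths, fmt):
    """Pack cumulative offsets for a variable-length array.

    Hypothetical helper: `fmt` is "<i" for 32-bit offsets or "<q" for
    64-bit offsets. struct.pack raises struct.error if an offset does
    not fit in the chosen width.
    """
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return b"".join(struct.pack(fmt, o) for o in offsets)


# Two values whose total length is one byte past the 32-bit limit.
lengths = [INT32_MAX, 1]
wide = build_offsets(lengths, "<q")  # fits in 64-bit offsets
try:
    build_offsets(lengths, "<i")     # last offset exceeds the i32 range
    overflowed = False
except struct.error:
    overflowed = True
```

The layout is otherwise identical in both cases; only the width of each offset entry changes, which matches the PR's description of the behaviour being the same apart from the larger offsets.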

[GitHub] [arrow] kszucs commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
kszucs commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652649719 @github-actions crossbow submit test-*spark*

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448620980 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,745 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
jorisvandenbossche commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448619823 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef

[GitHub] [arrow] github-actions[bot] commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652650521 Revision: 55d941160d6cee2da24f951cee31928beed7c76d Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] nealrichardson opened a new pull request #7614: ARROW-8977: [R] Table$create with schema crashes with some dictionary index types

2020-07-01 Thread GitBox
nealrichardson opened a new pull request #7614: URL: https://github.com/apache/arrow/pull/7614

[GitHub] [arrow] emkornfield commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
emkornfield commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-652738045 @liyafan82 if you aren't already, please make sure you use the merge script under dev to merge PRs

[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652737865 Seems a rebase is required.

[GitHub] [arrow] github-actions[bot] commented on pull request #7614: ARROW-8977: [R] Table$create with schema crashes with some dictionary index types

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7614: URL: https://github.com/apache/arrow/pull/7614#issuecomment-652699046 https://issues.apache.org/jira/browse/ARROW-8977

[GitHub] [arrow] liyafan82 commented on pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7347: URL: https://github.com/apache/arrow/pull/7347#issuecomment-652739238 > @liyafan82 if you aren't already, please make sure you use the merge script under dev to merge PRs @emkornfield Thanks a lot for your kind reminder. I will use the

[GitHub] [arrow] kou opened a new pull request #7615: ARROW-9294: [GLib] Add GArrowFunction and related objects

2020-07-01 Thread GitBox
kou opened a new pull request #7615: URL: https://github.com/apache/arrow/pull/7615

[GitHub] [arrow] liyafan82 merged pull request #7347: ARROW-8230: [Java] Remove netty dependency from arrow-memory

2020-07-01 Thread GitBox
liyafan82 merged pull request #7347: URL: https://github.com/apache/arrow/pull/7347

[GitHub] [arrow] cyb70289 commented on pull request #7603: ARROW-9206: [C++][Flight] Add latency benchmark

2020-07-01 Thread GitBox
cyb70289 commented on pull request #7603: URL: https://github.com/apache/arrow/pull/7603#issuecomment-652237692 CI failure reproduces [[Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build](https://issues.apache.org/jira/browse/ARROW-8999)

[GitHub] [arrow] liyafan82 commented on pull request #7543: ARROW-9221: [Java] account for big-endian buffers in ArrowBuf.setBytes

2020-07-01 Thread GitBox
liyafan82 commented on pull request #7543: URL: https://github.com/apache/arrow/pull/7543#issuecomment-652211147 @lidavidm Thanks for reporting the problem. I am curious how this problem has ever arisen. The default byte order is platform dependent. For a big-endian machine, the program

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448166294 ## File path: cpp/src/arrow/ipc/reader.cc ## @@ -684,7 +685,19 @@ Status ReadDictionary(const Buffer& metadata, DictionaryMemo* dictionary_memo,

[GitHub] [arrow] liyafan82 commented on a change in pull request #7544: ARROW-7285: [C++] ensure C++ implementation meets clarified dictionary spec

2020-07-01 Thread GitBox
liyafan82 commented on a change in pull request #7544: URL: https://github.com/apache/arrow/pull/7544#discussion_r448166650 ## File path: cpp/src/arrow/ipc/reader.cc ## @@ -684,7 +685,19 @@ Status ReadDictionary(const Buffer& metadata, DictionaryMemo* dictionary_memo,

[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-652765985 One other concern. For timezone-naive Timestamps, I'm not sure if we should be adjusting the datetime to reflect UTC instead of system time zone. Thoughts?

[GitHub] [arrow] zeevm commented on a change in pull request #7586: ARROW-9280: [Rust] [Parquet] Calculate page and column statistics

2020-07-01 Thread GitBox
zeevm commented on a change in pull request #7586: URL: https://github.com/apache/arrow/pull/7586#discussion_r448745612 ## File path: rust/parquet/src/column/writer.rs ## @@ -216,26 +278,26 @@ impl ColumnWriterImpl { def_levels_sink: vec![],

[GitHub] [arrow] zeevm commented on a change in pull request #7586: ARROW-9280: [Rust] [Parquet] Calculate page and column statistics

2020-07-01 Thread GitBox
zeevm commented on a change in pull request #7586: URL: https://github.com/apache/arrow/pull/7586#discussion_r448757699 ## File path: rust/parquet/src/column/writer.rs ## @@ -216,26 +278,26 @@ impl ColumnWriterImpl { def_levels_sink: vec![],

[GitHub] [arrow] github-actions[bot] commented on pull request #7615: ARROW-9294: [GLib] Add GArrowFunction and related objects

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7615: URL: https://github.com/apache/arrow/pull/7615#issuecomment-652728574 https://issues.apache.org/jira/browse/ARROW-9294

[GitHub] [arrow] emkornfield commented on pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield commented on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-652764305 If the approach is agreeable it would be nice to get in before the next release. This potentially breaks List[Timestamp] because that now returns datetimes as well but I

[GitHub] [arrow] vagarwal77 commented on issue #7616: Critical - PyArrow incompatibility with apache2/Django

2020-07-01 Thread GitBox
vagarwal77 commented on issue #7616: URL: https://github.com/apache/arrow/issues/7616#issuecomment-652768166 root@c3-dev-pe-dev1-b587b5574-pmnhb:/django/phai_web# /usr/sbin/apache2 -v Server version: Apache/2.4.38 (Debian) Server built: 2019-10-15T19:53:42

[GitHub] [arrow] houqp commented on pull request #7501: ARROW-9192: [Rust] run clippy to lint arrow crate in CI

2020-07-01 Thread GitBox
houqp commented on pull request #7501: URL: https://github.com/apache/arrow/pull/7501#issuecomment-652770694 @kszucs gentle ping, let me know what's the best way for me to help.

[GitHub] [arrow] zeevm commented on a change in pull request #7586: ARROW-9280: [Rust] [Parquet] Calculate page and column statistics

2020-07-01 Thread GitBox
zeevm commented on a change in pull request #7586: URL: https://github.com/apache/arrow/pull/7586#discussion_r448745197 ## File path: rust/parquet/src/column/writer.rs ## @@ -276,12 +372,60 @@ impl ColumnWriterImpl { [values_offset..],

[GitHub] [arrow] sunchao commented on a change in pull request #7586: ARROW-9280: [Rust] [Parquet] Calculate page and column statistics

2020-07-01 Thread GitBox
sunchao commented on a change in pull request #7586: URL: https://github.com/apache/arrow/pull/7586#discussion_r448755889 ## File path: rust/parquet/src/column/writer.rs ## @@ -216,26 +278,26 @@ impl ColumnWriterImpl { def_levels_sink: vec![],

[GitHub] [arrow] vagarwal77 opened a new issue #7616: Critical - PyArrow incompatibility with apache2/Django

2020-07-01 Thread GitBox
vagarwal77 opened a new issue #7616: URL: https://github.com/apache/arrow/issues/7616 I am using a Docker-based deployment on AWS EKS clusters, which works fine. The moment I added the pyarrow==0.17.1 library to the requirements file, my service stopped responding without showing any

[GitHub] [arrow] emkornfield edited a comment on pull request #7604: ARROW-9223: [Python] Propagate Timzone information in pandas conversion

2020-07-01 Thread GitBox
emkornfield edited a comment on pull request #7604: URL: https://github.com/apache/arrow/pull/7604#issuecomment-652765985 One other concern: for timezone-naive Timestamps, I'm not sure if we should be adjusting the datetime to reflect UTC converted to system time zone. Thoughts?

[GitHub] [arrow] sunchao commented on a change in pull request #7610: ARROW-9290: [Rust] [Parquet] Add features to allow opting out of dependencies

2020-07-01 Thread GitBox
sunchao commented on a change in pull request #7610: URL: https://github.com/apache/arrow/pull/7610#discussion_r448553159 ## File path: rust/parquet/Cargo.toml ## @@ -29,20 +29,29 @@ build = "build.rs" edition = "2018" [dependencies] -parquet-format = "~2.6"

[GitHub] [arrow] zeevm commented on a change in pull request #7586: ARROW-9280: [Rust] [Parquet] Calculate page and column statistics

2020-07-01 Thread GitBox
zeevm commented on a change in pull request #7586: URL: https://github.com/apache/arrow/pull/7586#discussion_r448764844 ## File path: rust/parquet/src/column/writer.rs ## @@ -216,26 +278,26 @@ impl ColumnWriterImpl { def_levels_sink: vec![],

[GitHub] [arrow] BryanCutler commented on pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on pull request #6316: URL: https://github.com/apache/arrow/pull/6316#issuecomment-652542479 @github-actions crossbow submit test-conda-python-3.7-spark-branch-3.0

[GitHub] [arrow] BryanCutler commented on a change in pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
BryanCutler commented on a change in pull request #6316: URL: https://github.com/apache/arrow/pull/6316#discussion_r448503228 ## File path: dev/tasks/tasks.yml ## @@ -1833,12 +1833,32 @@ tasks: HDFS: 2.9.2 run: conda-python-hdfs -

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448550619 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448550822 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,748 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448551033 ## File path: python/pyarrow/tests/test_scalars.py ## @@ -16,427 +16,443 @@ # under the License. import datetime +import decimal import pytest -import

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r448551482 ## File path: python/pyarrow/scalar.pxi ## @@ -16,1198 +16,745 @@ # under the License. -_NULL = NA = None +import collections cdef class Scalar:

[GitHub] [arrow] github-actions[bot] commented on pull request #7613: ARROW-8881: [Rust] Add large binary, string and list support

2020-07-01 Thread GitBox
github-actions[bot] commented on pull request #7613: URL: https://github.com/apache/arrow/pull/7613#issuecomment-652606493 https://issues.apache.org/jira/browse/ARROW-8881

[GitHub] [arrow] kszucs commented on a change in pull request #6316: ARROW-7717: [CI] Have nightly integration test for Spark's latest release

2020-07-01 Thread GitBox
kszucs commented on a change in pull request #6316: URL: https://github.com/apache/arrow/pull/6316#discussion_r448618127 ## File path: dev/tasks/tasks.yml ## @@ -1833,12 +1833,32 @@ tasks: HDFS: 2.9.2 run: conda-python-hdfs -
