[GitHub] [arrow] github-actions[bot] commented on pull request #7633: ARROW-9322: [R] Dataset documentation polishing

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7633: URL: https://github.com/apache/arrow/pull/7633#issuecomment-653683641 https://issues.apache.org/jira/browse/ARROW-9322 This is an automated message from the Apache Git

[GitHub] [arrow] kou commented on pull request #7633: ARROW-9322: [R] Dataset documentation polishing

2020-07-03 Thread GitBox
kou commented on pull request #7633: URL: https://github.com/apache/arrow/pull/7633#issuecomment-653683024 Oh, sorry...

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-03 Thread GitBox
fsaintjacques commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r449647801 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -760,6 +760,98 @@ def test_fragments_parquet_row_groups(tempdir): assert len(result) ==

[GitHub] [arrow] sunchao commented on pull request #7613: ARROW-8881: [Rust] Add large binary, string and list support

2020-07-03 Thread GitBox
sunchao commented on pull request #7613: URL: https://github.com/apache/arrow/pull/7613#issuecomment-653628617 @nevi-me Are we trying to get this in before the 1.0.0 release? If you are in a hurry, we can quickly review this and merge. However, IMO the code duplication can largely be

[GitHub] [arrow] kou closed pull request #7627: ARROW-9307: [Ruby] Add Arrow::RecordBatchIterator#to_a

2020-07-03 Thread GitBox
kou closed pull request #7627: URL: https://github.com/apache/arrow/pull/7627

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-03 Thread GitBox
fsaintjacques commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r449647463 ## File path: python/pyarrow/_dataset.pyx ## @@ -861,10 +861,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] kou commented on pull request #7627: ARROW-9307: [Ruby] Add Arrow::RecordBatchIterator#to_a

2020-07-03 Thread GitBox
kou commented on pull request #7627: URL: https://github.com/apache/arrow/pull/7627#issuecomment-653679207 +1

[GitHub] [arrow] kou closed pull request #7626: ARROW-9306: [Ruby] Add support for Arrow::RecordBatch.new(raw_table)

2020-07-03 Thread GitBox
kou closed pull request #7626: URL: https://github.com/apache/arrow/pull/7626

[GitHub] [arrow] kou commented on pull request #7626: ARROW-9306: [Ruby] Add support for Arrow::RecordBatch.new(raw_table)

2020-07-03 Thread GitBox
kou commented on pull request #7626: URL: https://github.com/apache/arrow/pull/7626#issuecomment-653679038 +1

[GitHub] [arrow] github-actions[bot] commented on pull request #7634: ARROW-9323: [Ruby] Add Red Arrow Dataset

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7634: URL: https://github.com/apache/arrow/pull/7634#issuecomment-653687487 https://issues.apache.org/jira/browse/ARROW-9323

[GitHub] [arrow] nealrichardson commented on pull request #7633: ARROW-9322: [R] Dataset documentation polishing

2020-07-03 Thread GitBox
nealrichardson commented on pull request #7633: URL: https://github.com/apache/arrow/pull/7633#issuecomment-653693747 No worries!

[GitHub] [arrow] wjones1 commented on pull request #6979: ARROW-7800 [Python] implement iter_batches() method for ParquetFile and ParquetReader

2020-07-03 Thread GitBox
wjones1 commented on pull request #6979: URL: https://github.com/apache/arrow/pull/6979#issuecomment-653687403 Looking at the code, I no longer think this `batch_size` parameter would actually affect those other read methods. There are a few different "batch_size" parameters floating

[GitHub] [arrow] emkornfield commented on a change in pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-03 Thread GitBox
emkornfield commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r449738860 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -262,6 +302,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) {
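ARROW-9223 concerns keeping timezone information when timezone-aware datetimes are converted to Arrow and back to pandas. As a minimal stdlib-only sketch of the idea (not the datetime.cc code under review), the instant can be stored as int64 microseconds since the UTC epoch while the timezone is kept alongside it as type metadata. The helper names are hypothetical and a fixed-offset timezone is assumed.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_utc_micros(value: datetime) -> int:
    """Encode a timezone-aware datetime as int64 microseconds since the UTC epoch."""
    if value.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Subtracting two aware datetimes yields the exact elapsed time.
    return (value - EPOCH) // timedelta(microseconds=1)


def from_utc_micros(micros: int, tz: timezone) -> datetime:
    """Decode microseconds back into a datetime expressed in the stored timezone."""
    return (EPOCH + timedelta(microseconds=micros)).astimezone(tz)


tz = timezone(timedelta(hours=2))  # the timezone kept as metadata
original = datetime(2020, 7, 3, 12, 30, tzinfo=tz)
encoded = to_utc_micros(original)      # storage value is UTC-based
restored = from_utc_micros(encoded, tz)
```

If only the int64 value were kept and the timezone dropped, the round trip could still recover the instant but not the original wall-clock offset.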

[GitHub] [arrow] emkornfield commented on pull request #6156: ARROW-7539: [Java] FieldVector getFieldBuffers API should not set reader/writer indices

2020-07-03 Thread GitBox
emkornfield commented on pull request #6156: URL: https://github.com/apache/arrow/pull/6156#issuecomment-653722056 @tianchen92 does @jacques-n's proposal make sense?

[GitHub] [arrow] kou opened a new pull request #7634: ARROW-9323: [Ruby] Add Red Arrow Dataset

2020-07-03 Thread GitBox
kou opened a new pull request #7634: URL: https://github.com/apache/arrow/pull/7634

[GitHub] [arrow] emkornfield commented on pull request #7630: ARROW-9317: [Java] add few testcases for arrow-memory

2020-07-03 Thread GitBox
emkornfield commented on pull request #7630: URL: https://github.com/apache/arrow/pull/7630#issuecomment-653721399 @dota17 thank you for the PR. It looks like this is out of date after some refactoring.

[GitHub] [arrow] xhochy commented on pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-03 Thread GitBox
xhochy commented on pull request #7593: URL: https://github.com/apache/arrow/pull/7593#issuecomment-653379639 Addressed all comments and CI is happy, too.

[GitHub] [arrow] kou opened a new pull request #7629: ARROW-9316: [C++] Use "Dataset" instead of "Datasets"

2020-07-03 Thread GitBox
kou opened a new pull request #7629: URL: https://github.com/apache/arrow/pull/7629 Because we use "dataset" as the ID of this module, as in libarrow_dataset.so and arrow/dataset/api.h.

[GitHub] [arrow] github-actions[bot] commented on pull request #7629: ARROW-9316: [C++] Use "Dataset" instead of "Datasets"

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7629: URL: https://github.com/apache/arrow/pull/7629#issuecomment-653382631 https://issues.apache.org/jira/browse/ARROW-9316

[GitHub] [arrow] github-actions[bot] commented on pull request #7630: ARROW-9317: [Java] add few testcases for arrow-memory

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7630: URL: https://github.com/apache/arrow/pull/7630#issuecomment-653431067 https://issues.apache.org/jira/browse/ARROW-9317

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r449471803 ## File path: java/vector/src/main/java/org/apache/arrow/vector/complex/impl/UnionLargeListReader.java ## @@ -0,0 +1,118 @@ +/* + * Licensed to the Apache

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r449474540 ## File path: java/vector/src/main/java/org/apache/arrow/vector/validate/ValidateVectorVisitor.java ## @@ -135,6 +136,45 @@ public Void visit(ListVector

[GitHub] [arrow] liyafan82 commented on pull request #7628: ARROW-9315: [Java] Fix the failure of testAllocationManagerType

2020-07-03 Thread GitBox
liyafan82 commented on pull request #7628: URL: https://github.com/apache/arrow/pull/7628#issuecomment-653446059 > Hey @liyafan82 I found the same last night and put something similar into #7619 > > I noticed this was failing a lot on Java 11 for me. I see. Thanks a lot for

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449492262 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449494433 ## File path: java/vector/pom.xml ## @@ -50,6 +50,12 @@ commons-codec 1.10 + + org.apache.arrow + arrow-memory-netty +

[GitHub] [arrow] rymurr commented on a change in pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7275: URL: https://github.com/apache/arrow/pull/7275#discussion_r449468817 ## File path: java/vector/src/main/java/org/apache/arrow/vector/ipc/JsonFileReader.java ## @@ -710,7 +712,10 @@ private void readFromJsonIntoVector(Field

[GitHub] [arrow] dota17 opened a new pull request #7630: ARROW-9317: [Java] add few testcases for arrow-memory

2020-07-03 Thread GitBox
dota17 opened a new pull request #7630: URL: https://github.com/apache/arrow/pull/7630

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-03 Thread GitBox
jorisvandenbossche commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r449498839 ## File path: python/pyarrow/tests/test_array.py ## @@ -583,12 +584,14 @@ def test_dictionary_from_numpy(): assert

[GitHub] [arrow] rymurr commented on pull request #7275: ARROW-6110: [Java][Integration] Support LargeList Type and add integration test with C++

2020-07-03 Thread GitBox
rymurr commented on pull request #7275: URL: https://github.com/apache/arrow/pull/7275#issuecomment-653444220 > Thanks for the quick update @rymurr , it looks pretty good! Only a couple minor things. I see quite a few instances of `offsetBuffer.getLong/setLong(i * OFFSET_WIDTH)` that I
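For context on the `offsetBuffer.getLong/setLong(i * OFFSET_WIDTH)` pattern: LargeList uses 64-bit offsets, so each offset occupies 8 bytes and the i-th one starts at byte position i * 8, while list slot i covers child elements [offsets[i], offsets[i + 1]). A minimal Python sketch of that layout, assuming little-endian buffers; the helper names are hypothetical and are not the Java vector API.

```python
import struct
from typing import List, Tuple

OFFSET_WIDTH = 8  # LargeList offsets are signed 64-bit integers


def build_offset_buffer(offsets: List[int]) -> bytes:
    """Pack offsets as little-endian int64 values, one every OFFSET_WIDTH bytes."""
    return b"".join(struct.pack("<q", o) for o in offsets)


def get_offset(buffer: bytes, i: int) -> int:
    """Read the i-th offset, which starts at byte position i * OFFSET_WIDTH."""
    (value,) = struct.unpack_from("<q", buffer, i * OFFSET_WIDTH)
    return value


def slot_range(buffer: bytes, i: int) -> Tuple[int, int]:
    """List slot i covers child elements [offsets[i], offsets[i + 1])."""
    return get_offset(buffer, i), get_offset(buffer, i + 1)
```

With offsets `[0, 3, 3, 5]` the buffer is 32 bytes long and describes three slots: `(0, 3)`, an empty `(3, 3)`, and `(3, 5)`.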

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7604: ARROW-9223: [Python] Propagate timezone information in pandas conversion

2020-07-03 Thread GitBox
jorisvandenbossche commented on a change in pull request #7604: URL: https://github.com/apache/arrow/pull/7604#discussion_r449479330 ## File path: cpp/src/arrow/python/datetime.cc ## @@ -262,6 +302,42 @@ int64_t PyDate_to_days(PyDateTime_Date* pydate) {

[GitHub] [arrow] kszucs commented on a change in pull request #7519: ARROW-9017: [C++][Python] Refactor scalar bindings

2020-07-03 Thread GitBox
kszucs commented on a change in pull request #7519: URL: https://github.com/apache/arrow/pull/7519#discussion_r449512190 ## File path: python/pyarrow/tests/test_array.py ## @@ -583,12 +584,14 @@ def test_dictionary_from_numpy(): assert d2.dictionary.to_pylist() ==

[GitHub] [arrow] wesm commented on pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-03 Thread GitBox
wesm commented on pull request #7632: URL: https://github.com/apache/arrow/pull/7632#issuecomment-653586307 cc @brills

[GitHub] [arrow] wesm opened a new pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-03 Thread GitBox
wesm opened a new pull request #7632: URL: https://github.com/apache/arrow/pull/7632 This adds two functions that operate on list types: * list_value_lengths: returns an int32 (for List) or int64 (for LargeList) array with the number of elements in each list value slot *
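The behavior described for list_value_lengths can be sketched in pure Python from a list array's offsets and validity bitmap, together with what list_parent_indices is assumed to compute (its description is cut off above; here it is taken to mean, for each child value, the index of the list slot containing it). These helpers are hypothetical stand-ins, not the Arrow kernels, and they assume offsets[0] == 0 (an unsliced array). Python ints are unbounded, so the int32/int64 output distinction is not modeled.

```python
from typing import List, Optional


def value_lengths(offsets: List[int], validity: List[bool]) -> List[Optional[int]]:
    """Number of elements in each list slot (offsets[i + 1] - offsets[i]); null slots stay null."""
    return [
        (offsets[i + 1] - offsets[i]) if validity[i] else None
        for i in range(len(offsets) - 1)
    ]


def parent_indices(offsets: List[int]) -> List[int]:
    """For each child value, the index of the list slot whose range contains it."""
    parents: List[int] = []
    for slot in range(len(offsets) - 1):
        parents.extend([slot] * (offsets[slot + 1] - offsets[slot]))
    return parents
```

For `[[1, 2], null, [], [3, 4, 5]]` the offsets are `[0, 2, 2, 2, 5]` and the validity is `[True, False, True, True]`, giving lengths `[2, None, 0, 3]` and parent indices `[0, 0, 3, 3, 3]`.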

[GitHub] [arrow] github-actions[bot] commented on pull request #7632: ARROW-6775: [C++][Python] Implement list_value_lengths and list_parent_indices functions

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7632: URL: https://github.com/apache/arrow/pull/7632#issuecomment-653587663 https://issues.apache.org/jira/browse/ARROW-6775

[GitHub] [arrow] liyafan82 closed pull request #7628: ARROW-9315: [Java] Fix the failure of testAllocationManagerType

2020-07-03 Thread GitBox
liyafan82 closed pull request #7628: URL: https://github.com/apache/arrow/pull/7628

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449540606 ## File path: java/vector/pom.xml ## @@ -50,6 +50,12 @@ commons-codec 1.10 + + org.apache.arrow + arrow-memory-netty +

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449546201 ## File path: java/vector/pom.xml ## @@ -50,6 +50,12 @@ commons-codec 1.10 + + org.apache.arrow + arrow-memory-netty +

[GitHub] [arrow] liyafan82 commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
liyafan82 commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-653520801 > Thanks for the comments @liyafan82 . I have updated and rebased to pull in your fix from #7628 Sorry for the trouble, and thank you for the effort.

[GitHub] [arrow] liyafan82 commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
liyafan82 commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449553368 ## File path: java/vector/pom.xml ## @@ -50,6 +50,12 @@ commons-codec 1.10 + + org.apache.arrow + arrow-memory-netty +

[GitHub] [arrow] wesm commented on a change in pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-03 Thread GitBox
wesm commented on a change in pull request #7593: URL: https://github.com/apache/arrow/pull/7593#discussion_r449553297 ## File path: cpp/src/arrow/compute/kernels/scalar_string_benchmark.cc ## @@ -56,6 +57,11 @@ static void AsciiUpper(benchmark::State& state) {

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449559785 ## File path: java/vector/pom.xml ## @@ -50,6 +50,12 @@ commons-codec 1.10 + + org.apache.arrow + arrow-memory-netty +

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
jorisvandenbossche commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r449567025 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -635,6 +635,37 @@ def test_make_fragment_from_buffer(): assert

[GitHub] [arrow] rymurr commented on a change in pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
rymurr commented on a change in pull request #7619: URL: https://github.com/apache/arrow/pull/7619#discussion_r449540176 ## File path: java/memory/memory-core/src/test/java/org/apache/arrow/memory/DefaultAllocationManagerFactory.java ## @@ -0,0 +1,60 @@ +/* + * Licensed to

[GitHub] [arrow] kszucs opened a new pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
kszucs opened a new pull request #7631: URL: https://github.com/apache/arrow/pull/7631
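A common way to make a wrapper around a native object picklable is to define `__reduce__`, returning a callable plus the arguments needed to rebuild the object, so the native handle itself never has to be serialized. Whether the dataset classes in this PR take exactly this route is not confirmed here; the class below is a hypothetical stand-in in which a `threading.Lock` (which cannot be pickled) plays the role of the native handle.

```python
import pickle
import threading


class FragmentSketch:
    """Hypothetical stand-in for a Python wrapper around a native (C++) object."""

    def __init__(self, path: str, format_name: str):
        self.path = path
        self.format_name = format_name
        # Locks cannot be pickled; this attribute mimics an unpicklable native handle.
        self._handle = threading.Lock()

    def __reduce__(self):
        # Serialize only the constructor arguments; unpickling calls
        # FragmentSketch(path, format_name), which creates a fresh handle.
        return (FragmentSketch, (self.path, self.format_name))


fragment = FragmentSketch("data/part-0.parquet", "parquet")
restored = pickle.loads(pickle.dumps(fragment))
```

Without `__reduce__`, `pickle.dumps(fragment)` would raise a `TypeError` because of the lock attribute.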

[GitHub] [arrow] github-actions[bot] commented on pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
github-actions[bot] commented on pull request #7631: URL: https://github.com/apache/arrow/pull/7631#issuecomment-653516244 https://issues.apache.org/jira/browse/ARROW-8651

[GitHub] [arrow] wesm closed pull request #7593: ARROW-9160: [C++] Implement contains for exact matches

2020-07-03 Thread GitBox
wesm closed pull request #7593: URL: https://github.com/apache/arrow/pull/7593

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
jorisvandenbossche commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r449556844 ## File path: python/pyarrow/_dataset.pyx ## @@ -773,6 +789,14 @@ cdef class FileFragment(Fragment): Fragment.init(self, sp)

[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
kszucs commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r449562966 ## File path: python/pyarrow/_dataset.pyx ## @@ -773,6 +789,14 @@ cdef class FileFragment(Fragment): Fragment.init(self, sp)

[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
kszucs commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r449562766 ## File path: python/pyarrow/_dataset.pyx ## @@ -887,6 +911,14 @@ cdef class ParquetFileFragment(FileFragment): FileFragment.init(self, sp)

[GitHub] [arrow] kszucs commented on a change in pull request #7631: ARROW-8651: [Python][Dataset] Support pickling of Dataset objects

2020-07-03 Thread GitBox
kszucs commented on a change in pull request #7631: URL: https://github.com/apache/arrow/pull/7631#discussion_r449564759 ## File path: python/pyarrow/_dataset.pyx ## @@ -773,6 +789,14 @@ cdef class FileFragment(Fragment): Fragment.init(self, sp)

[GitHub] [arrow] rymurr commented on pull request #7619: ARROW-9300: [Java] Separate Netty Memory to its own module

2020-07-03 Thread GitBox
rymurr commented on pull request #7619: URL: https://github.com/apache/arrow/pull/7619#issuecomment-653514629 Thanks for the comments @liyafan82 . I have updated and rebased to pull in your fix from #7628

[GitHub] [arrow] wesm closed pull request #7629: ARROW-9316: [C++] Use "Dataset" instead of "Datasets"

2020-07-03 Thread GitBox
wesm closed pull request #7629: URL: https://github.com/apache/arrow/pull/7629

[GitHub] [arrow] wesm closed pull request #7618: ARROW-7010: [C++] Implement decimal-to-float casts

2020-07-03 Thread GitBox
wesm closed pull request #7618: URL: https://github.com/apache/arrow/pull/7618
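ARROW-7010 covers casting decimals to floating point. A decimal value is stored as an unscaled integer together with a scale, so its value is unscaled × 10^(-scale). A minimal Python sketch of that conversion, using a hypothetical helper name and making no claim about how the C++ cast handles rounding or overflow:

```python
def decimal_to_float(unscaled: int, scale: int) -> float:
    """Convert a decimal stored as (unscaled integer, scale) to a float.

    The decimal's value is unscaled * 10**(-scale).
    """
    if scale >= 0:
        # int / int true division in Python is correctly rounded to the nearest double.
        return unscaled / 10**scale
    # A negative scale multiplies instead of dividing.
    return float(unscaled * 10**(-scale))
```

For example, the unscaled value 12345 with scale 2 represents 123.45.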

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7623: ARROW-9108: [C++][Dataset] Add supports for missing type in Statistics to Scalar conversion

2020-07-03 Thread GitBox
jorisvandenbossche commented on a change in pull request #7623: URL: https://github.com/apache/arrow/pull/7623#discussion_r449610259 ## File path: python/pyarrow/_dataset.pyx ## @@ -861,10 +861,15 @@ cdef class RowGroupInfo: name =

[GitHub] [arrow] mrkn commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-07-03 Thread GitBox
mrkn commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-653559925 @wesm I decided to split the performance optimization into a separate pull request because it may take me a few days to finish. I'll make a new ticket for optimization, and clean up

[GitHub] [arrow] wesm commented on pull request #7539: ARROW-9156: [C++] Reducing the code size of the tensor module

2020-07-03 Thread GitBox
wesm commented on pull request #7539: URL: https://github.com/apache/arrow/pull/7539#issuecomment-653560364 OK, sounds good, let me know when this is ready to be merged