[GitHub] [arrow] praveenbingo commented on a change in pull request #7495: ARROW-9185: [Java][Gandiva] Make llvm build optimisation configurable from java

2020-06-22 Thread GitBox
praveenbingo commented on a change in pull request #7495: URL: https://github.com/apache/arrow/pull/7495#discussion_r443353079 ## File path: cpp/src/gandiva/configuration.h ## @@ -53,7 +55,12 @@ class GANDIVA_EXPORT Configuration { class GANDIVA_EXPORT ConfigurationBuilder {

[GitHub] [arrow] xhochy commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647331754 @github-actions crossbow submit conda-osx-clang-py36 conda-win-vs2015-py36 This is an automated message from the Ap

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647332510 Revision: 80cc7570cefe64a0cb20cd530da6241a59e4052a Submitted crossbow builds: [ursa-labs/crossbow @ actions-351](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] jianxind commented on pull request #7314: ARROW-8996: [C++] AVX2/AVX512 runtime support for aggregate sum kernel

2020-06-22 Thread GitBox
jianxind commented on pull request #7314: URL: https://github.com/apache/arrow/pull/7314#issuecomment-647334741 > I think we also need a way of setting max runtime instruction set for runtime dispatch (apologies if there is one and I missed it) Thanks, currently no. But it can easily

[GitHub] [arrow] romainfrancois commented on pull request #7499: ARROW-9179: [R] Replace usage of iris dataset in tests

2020-06-22 Thread GitBox
romainfrancois commented on pull request #7499: URL: https://github.com/apache/arrow/pull/7499#issuecomment-647357048 Thanks :-)

[GitHub] [arrow] romainfrancois commented on a change in pull request #7435: ARROW-8779: [R] Implement conversion to List

2020-06-22 Thread GitBox
romainfrancois commented on a change in pull request #7435: URL: https://github.com/apache/arrow/pull/7435#discussion_r443386121 ## File path: r/src/array_from_vector.cpp ## @@ -201,6 +202,67 @@ struct VectorToArrayConverter { return Status::OK(); } + template + ar

[GitHub] [arrow] xhochy commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647388094 @github-actions crossbow submit conda-osx-clang-py36

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647391220 Revision: 7e7b9f9d497d6256aaad68436a7d72bed4842c34 Submitted crossbow builds: [ursa-labs/crossbow @ actions-352](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443428904 ## File path: cpp/src/arrow/util/value_parsing.h ## @@ -565,6 +565,39 @@ static inline bool ParseTimestampStrptime(const char* buf, size_t length, return

[GitHub] [arrow] pitrou commented on pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647397849 @projjal Where do the null-terminated strings come from? Doesn't Gandiva operate on Arrow data?

[GitHub] [arrow] projjal commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
projjal commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443432148 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443432735 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] tianchen92 commented on a change in pull request #7496: ARROW-7084: [C++] ArrayRangeEquals should check for full type equality?

2020-06-22 Thread GitBox
tianchen92 commented on a change in pull request #7496: URL: https://github.com/apache/arrow/pull/7496#discussion_r443433850 ## File path: cpp/src/arrow/compare.cc ## @@ -984,7 +984,8 @@ bool ArrayRangeEquals(const Array& left, const Array& right, int64_t left_start_ bool a

[GitHub] [arrow] tianchen92 commented on pull request #7496: ARROW-7084: [C++] ArrayRangeEquals should check for full type equality?

2020-06-22 Thread GitBox
tianchen92 commented on pull request #7496: URL: https://github.com/apache/arrow/pull/7496#issuecomment-647402771 With this change, generate_non_canonical_map_case would fail; I think it is because of the map field names. What is the right fix here? @pitrou

[GitHub] [arrow] projjal commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
projjal commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443434522 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] pitrou closed pull request #7512: ARROW-9204: [C++][Flight] Change records_per_stream to int64

2020-06-22 Thread GitBox
pitrou closed pull request #7512: URL: https://github.com/apache/arrow/pull/7512

[GitHub] [arrow] pitrou closed pull request #7511: ARROW-9205: [Documentation] Fix typos in Columnar.rst

2020-06-22 Thread GitBox
pitrou closed pull request #7511: URL: https://github.com/apache/arrow/pull/7511

[GitHub] [arrow] xhochy commented on pull request #7497: WIP: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647407639 @github-actions crossbow submit conda-*

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647408989 https://issues.apache.org/jira/browse/ARROW-8149

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647408675 Revision: 7e7b9f9d497d6256aaad68436a7d72bed4842c34 Submitted crossbow builds: [ursa-labs/crossbow @ actions-353](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] pitrou commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443441540 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] projjal commented on a change in pull request #7504: ARROW-9193: [C++] Add method to parse date from null-terminated string

2020-06-22 Thread GitBox
projjal commented on a change in pull request #7504: URL: https://github.com/apache/arrow/pull/7504#discussion_r443442994 ## File path: cpp/src/gandiva/to_date_holder.cc ## @@ -83,7 +83,7 @@ int64_t ToDateHolder::operator()(ExecutionContext* context, const std::string& d //

[GitHub] [arrow] kszucs commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
kszucs commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443444034 ## File path: dev/tasks/conda-recipes/arrow-cpp/build-pyarrow.sh ## @@ -16,10 +16,26 @@ export PYARROW_WITH_ORC=1 export PYARROW_WITH_PARQUET=1 export PYAR

[GitHub] [arrow] pitrou commented on a change in pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7503: URL: https://github.com/apache/arrow/pull/7503#discussion_r443443849 ## File path: docs/source/developers/contributing.rst ## @@ -127,3 +127,52 @@ To contribute a patch: * Add new unit tests for your code. Thank you in adv

[GitHub] [arrow] kszucs commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
kszucs commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443447582 ## File path: dev/tasks/conda-recipes/arrow-cpp/meta.yaml ## @@ -1,124 +1,237 @@ +{% set version = ARROW_VERSION %} +{% set number = "6" %} +{% set cuda_enab

[GitHub] [arrow] kszucs commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
kszucs commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647415488 @xhochy have you manually updated the variant files or backported/copied from the upstream repo?

[GitHub] [arrow] suvayu commented on a change in pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
suvayu commented on a change in pull request #7503: URL: https://github.com/apache/arrow/pull/7503#discussion_r443448786 ## File path: docs/source/developers/contributing.rst ## @@ -127,3 +127,52 @@ To contribute a patch: * Add new unit tests for your code. Thank you in adv

[GitHub] [arrow] xhochy commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647415713 > @xhochy have you manually updated the variant files or backported/copied from the upstream repo? Copied.

[GitHub] [arrow] xhochy commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443449477 ## File path: dev/tasks/conda-recipes/arrow-cpp/meta.yaml ## @@ -1,124 +1,237 @@ +{% set version = ARROW_VERSION %} +{% set number = "6" %} +{% set cuda_enab

[GitHub] [arrow] xhochy commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647416565 > > @xhochy have you manually updated the variant files or backported/copied from the upstream repo? > > Copied. I would also --

[GitHub] [arrow] xhochy closed pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy closed pull request #7497: URL: https://github.com/apache/arrow/pull/7497

[GitHub] [arrow] xhochy removed a comment on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy removed a comment on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647416565 > > @xhochy have you manually updated the variant files or backported/copied from the upstream repo? > > Copied. I would also --

[GitHub] [arrow] xhochy commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647418158 > @xhochy have you manually updated the variant files or backported/copied from the upstream repo? I would like to auto-generate them soon but therefore I would need to auto

[GitHub] [arrow] pitrou commented on pull request #7496: ARROW-7084: [C++] ArrayRangeEquals should check for full type equality?

2020-06-22 Thread GitBox
pitrou commented on pull request #7496: URL: https://github.com/apache/arrow/pull/7496#issuecomment-647422120 @tianchen92 I'll take a look.

[GitHub] [arrow] suvayu commented on a change in pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
suvayu commented on a change in pull request #7503: URL: https://github.com/apache/arrow/pull/7503#discussion_r443457092 ## File path: docs/source/developers/contributing.rst ## @@ -127,3 +127,52 @@ To contribute a patch: * Add new unit tests for your code. Thank you in adv

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647429536 > I'd like to treat in-kernel type promotions as an anti-pattern in general. There are upsides and downsides to it. The downside is that users of the Arrow library a

[GitHub] [arrow] xhochy commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647466585 @github-actions crossbow submit conda-linux-gcc-py36-cuda

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647467405 Revision: a71843c26aea5a31a4502e7ad9cd2a28f0be380d Submitted crossbow builds: [ursa-labs/crossbow @ actions-354](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-22 Thread GitBox
jorisvandenbossche opened a new pull request #7513: URL: https://github.com/apache/arrow/pull/7513

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #7513: URL: https://github.com/apache/arrow/pull/7513#discussion_r443506768 ## File path: python/pyarrow/tests/test_parquet.py ## @@ -579,6 +579,22 @@ def test_pandas_parquet_native_file_roundtrip(tempdir, use_legacy_dat

[GitHub] [arrow] github-actions[bot] commented on pull request #7513: ARROW-9207: [Python] Clean-up internal FileSource class

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7513: URL: https://github.com/apache/arrow/pull/7513#issuecomment-647474678 https://issues.apache.org/jira/browse/ARROW-9207

[GitHub] [arrow] xhochy commented on a change in pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on a change in pull request #7497: URL: https://github.com/apache/arrow/pull/7497#discussion_r443513409 ## File path: dev/tasks/conda-recipes/arrow-cpp/build-pyarrow.sh ## @@ -16,10 +16,26 @@ export PYARROW_WITH_ORC=1 export PYARROW_WITH_PARQUET=1 export PYAR

[GitHub] [arrow] xhochy commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647486024 @github-actions crossbow submit conda-linux-gcc-py36-cuda

[GitHub] [arrow] github-actions[bot] commented on pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7497: URL: https://github.com/apache/arrow/pull/7497#issuecomment-647487528 Revision: 932047d63f45ff65db933151561160462664c71c Submitted crossbow builds: [ursa-labs/crossbow @ actions-355](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] romainfrancois opened a new pull request #7514: ARROW-6235 : [R] Conversion from arrow::BinaryArray to R character vector not implemented

2020-06-22 Thread GitBox
romainfrancois opened a new pull request #7514: URL: https://github.com/apache/arrow/pull/7514 Going with list of raw vectors because strings can't hold nulls: ``` r library(arrow, warn.conflicts = FALSE) raws <- vctrs::list_of(as.raw(0:2), as.raw(0:255), .ptype = raw())

[GitHub] [arrow] github-actions[bot] commented on pull request #7514: ARROW-6235 : [R] Conversion from arrow::BinaryArray to R character vector not implemented

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7514: URL: https://github.com/apache/arrow/pull/7514#issuecomment-647496418 https://issues.apache.org/jira/browse/ARROW-6235

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #7395: URL: https://github.com/apache/arrow/pull/7395#discussion_r443535516 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1117,6 +1115,15 @@ def test_open_dataset_from_uri_s3(s3_connection, s3_server): w

[GitHub] [arrow] wesm commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
wesm commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647519089 > The downside is that users of the Arrow library are exposed to the implementation details of how each kernel can grow the resulting array. I'm not saying that. I'm proposing

[GitHub] [arrow] pitrou commented on a change in pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7395: URL: https://github.com/apache/arrow/pull/7395#discussion_r443562617 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1117,6 +1115,15 @@ def test_open_dataset_from_uri_s3(s3_connection, s3_server): with fs.open_

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647533705 @emkornfield I meant that we should treat the results of "ursabot benchmark" as informational only and certainly not authoritative

[GitHub] [arrow] jorisvandenbossche commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
jorisvandenbossche commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-647540218 I added a C++ version (didn't yet update R to use it)

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #7272: URL: https://github.com/apache/arrow/pull/7272#discussion_r443582903 ## File path: cpp/src/arrow/table.cc ## @@ -362,6 +362,23 @@ Result> Table::RenameColumns( return Table::Make(::arrow::schema(std::move(fields

[GitHub] [arrow] xhochy closed pull request #7497: ARROW-8149: [C++/Python] Enable CUDA Support in conda recipes

2020-06-22 Thread GitBox
xhochy closed pull request #7497: URL: https://github.com/apache/arrow/pull/7497

[GitHub] [arrow] pitrou closed pull request #7510: ARROW-7012: [C++] Add comments explaining high level detail about ChunkedArray class and questions about chunk sizes

2020-06-22 Thread GitBox
pitrou closed pull request #7510: URL: https://github.com/apache/arrow/pull/7510

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #7395: URL: https://github.com/apache/arrow/pull/7395#discussion_r443585915 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1117,6 +1115,15 @@ def test_open_dataset_from_uri_s3(s3_connection, s3_server): w

[GitHub] [arrow] pitrou commented on pull request #7504: ARROW-9193: [C++] Avoid spurious intermediate string copy in ToDateHolder

2020-06-22 Thread GitBox
pitrou commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647544705 @projjal Can you fix the Windows build failures (see AppVeyor build): ``` C:/projects/arrow/cpp/src/gandiva/to_date_holder_test.cc(55): error C2220: warning treated as error -

[GitHub] [arrow] pitrou commented on a change in pull request #7395: ARROW-9089: [Python] A PyFileSystem handler for fsspec-based filesystems

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7395: URL: https://github.com/apache/arrow/pull/7395#discussion_r443590078 ## File path: python/pyarrow/tests/test_dataset.py ## @@ -1117,6 +1115,15 @@ def test_open_dataset_from_uri_s3(s3_connection, s3_server): with fs.open_

[GitHub] [arrow] pitrou commented on pull request #7503: ARROW-4429: [Doc] Add Git conventions to contributing guidelines

2020-06-22 Thread GitBox
pitrou commented on pull request #7503: URL: https://github.com/apache/arrow/pull/7503#issuecomment-647550891 @nealrichardson Do you want to review this?

[GitHub] [arrow] pitrou commented on a change in pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7272: URL: https://github.com/apache/arrow/pull/7272#discussion_r443596563 ## File path: cpp/src/arrow/table.cc ## @@ -362,6 +362,23 @@ Result> Table::RenameColumns( return Table::Make(::arrow::schema(std::move(fields)), std::mov

[GitHub] [arrow] pitrou commented on a change in pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7272: URL: https://github.com/apache/arrow/pull/7272#discussion_r443596910 ## File path: cpp/src/arrow/table.cc ## @@ -362,6 +362,23 @@ Result> Table::RenameColumns( return Table::Make(::arrow::schema(std::move(fields)), std::mov

[GitHub] [arrow] pitrou commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
pitrou commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-647555153 It would also be nice to add a test on the C++ side, if that's not too time-consuming.

[GitHub] [arrow] xhochy commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
xhochy commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647555249 @github-actions crossbow submit -g linux conda

[GitHub] [arrow] github-actions[bot] commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647560430 Revision: 5ab75c48ef009fcee7ef602d39c2327d629d080a Submitted crossbow builds: [ursa-labs/crossbow @ actions-356](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] xhochy commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
xhochy commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647566604 @github-actions crossbow submit -g conda-*

[GitHub] [arrow] xhochy commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
xhochy commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647569437 @github-actions crossbow submit conda-*

[GitHub] [arrow] github-actions[bot] commented on pull request #7452: ARROW-8961: [C++] Add utf8proc library to toolchain

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7452: URL: https://github.com/apache/arrow/pull/7452#issuecomment-647571012 Revision: 5ab75c48ef009fcee7ef602d39c2327d629d080a Submitted crossbow builds: [ursa-labs/crossbow @ actions-357](https://github.com/ursa-labs/crossbow/branches/a

[GitHub] [arrow] maartenbreddels commented on pull request #7449: ARROW-9133: [C++] Add utf8_upper and utf8_lower

2020-06-22 Thread GitBox
maartenbreddels commented on pull request #7449: URL: https://github.com/apache/arrow/pull/7449#issuecomment-647572424 Validating the utf8 string made the results slightly slower, but still much better than the initial results. Invalid utf8 characters are now replaced by a '?', as co

[GitHub] [arrow] andygrove commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-22 Thread GitBox
andygrove commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-647573600 @nevi-me @nealrichardson I have some time available this week. How can I help with this?

[GitHub] [arrow] nevi-me edited a comment on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-22 Thread GitBox
nevi-me edited a comment on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-647578097 @andygrove thanks. I pushed some of what I worked on over the weekend. The main problem seems to be that we don't read all record batches from the Arrow files. As a result

[GitHub] [arrow] nevi-me commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-22 Thread GitBox
nevi-me commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-647578097 @andygrove thanks. I pushed some of what I worked on over the weekend. The main problem seems to be that we don't read all record batches from the Arrow files. As a result we end

[GitHub] [arrow] nevi-me commented on pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-22 Thread GitBox
nevi-me commented on pull request #7297: URL: https://github.com/apache/arrow/pull/7297#issuecomment-647579286 Oh, there's also a TODO on array data comparisons. I think we only compare the array lengths and types for now; but not their data.

[GitHub] [arrow] nealrichardson commented on pull request #7272: ARROW-8314: [Python] Add a Table.select method to select a subset of columns

2020-06-22 Thread GitBox
nealrichardson commented on pull request #7272: URL: https://github.com/apache/arrow/pull/7272#issuecomment-647587606 > I added a C++ version (didn't yet update R to use it) Are you intending to make that change in this PR too?

[GitHub] [arrow] nealrichardson commented on a change in pull request #7297: ARROW-6945: [Rust][Integration] Run rust integration tests

2020-06-22 Thread GitBox
nealrichardson commented on a change in pull request #7297: URL: https://github.com/apache/arrow/pull/7297#discussion_r443641664 ## File path: dev/archery/archery/integration/datagen.py ## @@ -1492,21 +1492,25 @@ def _temp_path(): generate_primitive_large_offsets_cas

[GitHub] [arrow] fsaintjacques commented on a change in pull request #7514: ARROW-6235 : [R] Implement conversion from arrow::BinaryArray to R character vector

2020-06-22 Thread GitBox
fsaintjacques commented on a change in pull request #7514: URL: https://github.com/apache/arrow/pull/7514#discussion_r443639526 ## File path: r/src/array_to_vector.cpp ## @@ -693,6 +741,9 @@ std::shared_ptr Converter::Make(const std::shared_ptr& type case Type::BOOL:

[GitHub] [arrow] nealrichardson commented on pull request #7290: ARROW-1692: [Java] UnionArray round trip not working

2020-06-22 Thread GitBox
nealrichardson commented on pull request #7290: URL: https://github.com/apache/arrow/pull/7290#issuecomment-647589883 Assuming CI is still passing, is this good to merge? What is left to do, or who needs to approve?

[GitHub] [arrow] fsaintjacques commented on pull request #7504: ARROW-9193: [C++] Avoid spurious intermediate string copy in ToDateHolder

2020-06-22 Thread GitBox
fsaintjacques commented on pull request #7504: URL: https://github.com/apache/arrow/pull/7504#issuecomment-647596137 Any reason not to use a `util::string_view`?

[GitHub] [arrow] pitrou closed pull request #7496: ARROW-7084: [C++] Check for full type equality in ArrayRangeEquals

2020-06-22 Thread GitBox
pitrou closed pull request #7496: URL: https://github.com/apache/arrow/pull/7496

[GitHub] [arrow] mrkn commented on a change in pull request #7477: ARROW-4221: Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443655002 ## File path: python/pyarrow/tensor.pxi ## @@ -199,7 +202,13 @@ shape: {0.shape}""".format(self) for x in dim_names: c_dim_names.

[GitHub] [arrow] mrkn commented on pull request #7477: ARROW-4221: Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-647607307 @wesm @pitrou Could you please review it? This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] mrkn commented on pull request #7477: ARROW-4221: Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
mrkn commented on pull request #7477: URL: https://github.com/apache/arrow/pull/7477#issuecomment-647607661 @rok Could you please review the pyarrow part? This is an automated message from the Apache Git Service. To respond t

[GitHub] [arrow] jorisvandenbossche opened a new pull request #7515: ARROW-2801: [Python] Add split_row_group keyword to ParquetDataset / document split_by_row_group

2020-06-22 Thread GitBox
jorisvandenbossche opened a new pull request #7515: URL: https://github.com/apache/arrow/pull/7515 ARROW-2801 Still WIP, didn't yet add tests This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #7515: ARROW-2801: [Python] Add split_row_group keyword to ParquetDataset / document split_by_row_group

2020-06-22 Thread GitBox
jorisvandenbossche commented on a change in pull request #7515: URL: https://github.com/apache/arrow/pull/7515#discussion_r443663674 ## File path: python/pyarrow/parquet.py ## @@ -1404,27 +1403,36 @@ def __init__(self, path_or_paths, filesystem=None, filters=None, sel

[GitHub] [arrow] github-actions[bot] commented on pull request #7515: ARROW-2801: [Python] Add split_row_group keyword to ParquetDataset / document split_by_row_group

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7515: URL: https://github.com/apache/arrow/pull/7515#issuecomment-647616782 https://issues.apache.org/jira/browse/ARROW-2801 This is an automated message from the Apache Git Serv

[GitHub] [arrow] alexbaden commented on pull request #7263: ARROW-8927: [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions

2020-06-22 Thread GitBox
alexbaden commented on pull request #7263: URL: https://github.com/apache/arrow/pull/7263#issuecomment-647623057 I'd be interested in trying to add it into the record batch stream reader -- the idea being you could send both CPU data and GPU data pointers in a single serialized object with

[GitHub] [arrow] pitrou commented on pull request #7263: ARROW-8927: [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions

2020-06-22 Thread GitBox
pitrou commented on pull request #7263: URL: https://github.com/apache/arrow/pull/7263#issuecomment-647627751 @alexbaden I'm not sure I understand what you have in mind. Could you elaborate? This is an automated message from

[GitHub] [arrow] alexbaden commented on pull request #7263: ARROW-8927: [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions

2020-06-22 Thread GitBox
alexbaden commented on pull request #7263: URL: https://github.com/apache/arrow/pull/7263#issuecomment-647628896 I'd like to have an object similar to `RecordBatchStreamReader` that could read the schema, IPC handles, and dictionary memo from a single buffer, and a corresponding writer of

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647633470 Here's the benchmark comparison with clang-8 ``` $ archery benchmark diff --cc=clang-8 --cxx=clang++-8 89844a100 653817301 --benchmark-filter=Cast

[GitHub] [arrow] pitrou commented on pull request #7263: ARROW-8927: [C++] Support dictionary memo in CUDA IPC ReadRecordBatch functions

2020-06-22 Thread GitBox
pitrou commented on pull request #7263: URL: https://github.com/apache/arrow/pull/7263#issuecomment-647633067 Well, can't you just use `BufferReader` and `BufferOutputStream`? Am I missing something? This is an automated mes

[GitHub] [arrow] wesm edited a comment on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm edited a comment on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647633470 Here's the benchmark comparison with clang-8 ``` $ archery benchmark diff --cc=clang-8 --cxx=clang++-8 2db48b4 653817301 --benchmark-filter=Cast

[GitHub] [arrow] pitrou commented on a change in pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7506: URL: https://github.com/apache/arrow/pull/7506#discussion_r443664270 ## File path: cpp/src/arrow/util/int_util.cc ## @@ -461,75 +472,434 @@ Status IndexBoundsCheckImpl(const ArrayData& indices, uint64_t upper_limit) { if (

[GitHub] [arrow] wesm commented on a change in pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on a change in pull request #7506: URL: https://github.com/apache/arrow/pull/7506#discussion_r443689349 ## File path: cpp/src/arrow/util/int_util.cc ## @@ -461,75 +472,434 @@ Status IndexBoundsCheckImpl(const ArrayData& indices, uint64_t upper_limit) { if (in

[GitHub] [arrow] wesm opened a new pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
wesm opened a new pull request #7516: URL: https://github.com/apache/arrow/pull/7516 This uses pandas to generate a sorted text table when using `archery benchmark diff`. Example: https://github.com/apache/arrow/pull/7506#issuecomment-647633470 There's some other incidental ch

[GitHub] [arrow] github-actions[bot] commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
github-actions[bot] commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647641059 https://issues.apache.org/jira/browse/ARROW-9201 This is an automated message from the Apache Git Serv

[GitHub] [arrow] wesm commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
wesm commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647641007 @kszucs can you assist me with adapting ursabot for these changes? I think we can use pandas's `DataFrame.to_html` to create a colorized table for GitHub, too https://pandas.pydata.o

[GitHub] [arrow] wesm commented on a change in pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on a change in pull request #7506: URL: https://github.com/apache/arrow/pull/7506#discussion_r443696884 ## File path: cpp/src/arrow/util/int_util.cc ## @@ -461,75 +472,434 @@ Status IndexBoundsCheckImpl(const ArrayData& indices, uint64_t upper_limit) { if (in

[GitHub] [arrow] wesm commented on pull request #7506: ARROW-9197: [C++] Overhaul integer/floating point casting: vectorize truncation checks, reduce binary size

2020-06-22 Thread GitBox
wesm commented on pull request #7506: URL: https://github.com/apache/arrow/pull/7506#issuecomment-647649670 For curiosity, here are the same benchmarks on my laptop with gcc-8 (using the new output formatting from ARROW-9201) ``` benchmark

[GitHub] [arrow] pitrou commented on a change in pull request #7477: ARROW-4221: Add canonical flag in COO sparse index

2020-06-22 Thread GitBox
pitrou commented on a change in pull request #7477: URL: https://github.com/apache/arrow/pull/7477#discussion_r443707259 ## File path: python/pyarrow/tensor.pxi ## @@ -339,6 +350,15 @@ shape: {0.shape}""".format(self) def non_zero_length(self): return self.stp.non

[GitHub] [arrow] pitrou commented on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
pitrou commented on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647673667 Just a small question: why are `m` and `b` used for millions and billions, respectively? (I would probably expect `M` and `G`)
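The suffix convention raised above can be illustrated with a minimal sketch. It assumes a hypothetical helper named `format_si`, which is not part of archery or of the pull request, and it shows how metric prefixes (`k`, `M`, `G`, `T`) would label thousands, millions, billions and trillions:

```python
def format_si(value, digits=3):
    """Format a number using metric (SI) prefixes.

    Hypothetical helper for illustration only; it is not archery's
    actual formatting code.
    """
    # Largest factor first, so the first match is the right prefix.
    prefixes = [(1e12, "T"), (1e9, "G"), (1e6, "M"), (1e3, "k")]
    for factor, suffix in prefixes:
        if abs(value) >= factor:
            return f"{value / factor:.{digits}f}{suffix}"
    # Values below one thousand get no suffix.
    return f"{value:.{digits}f}"


if __name__ == "__main__":
    for sample in (999, 1_500, 2_500_000, 3_200_000_000):
        print(sample, "->", format_si(sample))
```

With this convention, millions are written with `M` and billions with `G`, where the benchmark output discussed above uses `m` and `b`.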

[GitHub] [arrow] wesm commented on pull request #7143: ARROW-8504: [C++] Add BitRunReader and use it in parquet

2020-06-22 Thread GitBox
wesm commented on pull request #7143: URL: https://github.com/apache/arrow/pull/7143#issuecomment-647674585 @emkornfield I'm sorry that I've been neglecting this PR. I will try to rebase this and investigate the perf questions a little bit -

[GitHub] [arrow] kszucs edited a comment on pull request #7516: ARROW-9201: [Archery] More user-friendly console output for benchmark diffs, add repetitions argument, don't build unit tests

2020-06-22 Thread GitBox
kszucs edited a comment on pull request #7516: URL: https://github.com/apache/arrow/pull/7516#issuecomment-647677894 > @kszucs can you assist me with adapting ursabot for these changes? Sure. > I think we can use pandas's `DataFrame.to_html` to create a colorized table for Git
