[GitHub] [arrow] Dandandan commented on pull request #8714: ARROW-10654: [Rust] Specialize parsing of floats / bools

2020-11-19 Thread GitBox
Dandandan commented on pull request #8714: URL: https://github.com/apache/arrow/pull/8714#issuecomment-730492620 Did some benchmarking on this. Seems like a small win. Master: ``` Running benchmarks with the following options: Opt { debug: false, iterations: 10, concurrency:

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527067914 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527080099 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] alamb commented on pull request #8630: ARROW-10540 [Rust] Improve filtering

2020-11-19 Thread GitBox
alamb commented on pull request #8630: URL: https://github.com/apache/arrow/pull/8630#issuecomment-730598527 FWIW keeping "filter" as a special case that is very fast is not unreasonable -- it is likely to be one of the most performance critical pieces of code in analytics systems, so it

[GitHub] [arrow] eerhardt commented on pull request #8702: ARROW-10634: [C#][CI] Change the build version from 2.2 to 3.1 in CI

2020-11-19 Thread GitBox
eerhardt commented on pull request #8702: URL: https://github.com/apache/arrow/pull/8702#issuecomment-730629122 > should FluentBuilderExample.csproj also be updated to 3.1? No, it doesn't build as part of CI. Let's just leave it for now. Maybe in the future we could add it as a unit

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527020685 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527019693 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] jorgecarleitao commented on pull request #8698: ARROW-10636: [Rust][Parquet] Remove rust specialization

2020-11-19 Thread GitBox
jorgecarleitao commented on pull request #8698: URL: https://github.com/apache/arrow/pull/8698#issuecomment-730570547 I do not have time to review this, but having tried this myself once (and failed miserably), I am just leaving a big thank you note to @GregBowyer for this 

[GitHub] [arrow] Fonsan commented on pull request #8717: ARROW-10659 [Ruby] Refactor Table#initialize

2020-11-19 Thread GitBox
Fonsan commented on pull request #8717: URL: https://github.com/apache/arrow/pull/8717#issuecomment-730593024 Tests are failing, I will publish a revised version once I can run the tests locally This is an automated message

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527061979 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527061776 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] Dandandan commented on a change in pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
Dandandan commented on a change in pull request #8715: URL: https://github.com/apache/arrow/pull/8715#discussion_r527093860 ## File path: rust/arrow/src/datatypes.rs ## @@ -1142,6 +1142,44 @@ impl DataType { | Float64 ) } + +/// Compares this

[GitHub] [arrow] alamb closed pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
alamb closed pull request #8713: URL: https://github.com/apache/arrow/pull/8713 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527028885 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-19 Thread GitBox
jorisvandenbossche commented on a change in pull request #8672: URL: https://github.com/apache/arrow/pull/8672#discussion_r527031173 ## File path: python/pyarrow/tests/test_parquet.py ## @@ -2008,6 +2008,19 @@ def test_filters_inclusive_set(tempdir, use_legacy_dataset):

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527031579 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-19 Thread GitBox
jorisvandenbossche commented on a change in pull request #8672: URL: https://github.com/apache/arrow/pull/8672#discussion_r527019353 ## File path: python/pyarrow/parquet.py ## @@ -114,7 +115,24 @@ def _check_filters(filters, check_null_strings=True): Predicates may also

[GitHub] [arrow] jorgecarleitao edited a comment on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
jorgecarleitao edited a comment on pull request #8715: URL: https://github.com/apache/arrow/pull/8715#issuecomment-730555090 Hey @ch-sc , thanks for your PR! @nevi-me, could you help here? I am a bit worried about introducing another comparison of datatypes, but I was unable to find

[GitHub] [arrow] alamb commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
alamb commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730564010 FYI @andygrove and @jorgecarleitao

[GitHub] [arrow] alamb commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
alamb commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730563705 For the record: https://github.com/rust-lang/rust/pull/79131 is perhaps the change to rustlang that @vertexclique refers to

[GitHub] [arrow] jorgecarleitao commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
jorgecarleitao commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730565436 Thanks a lot. LGTM. And as a side note: super cool @vertexclique that you are contributing to rust core! 

[GitHub] [arrow] alamb commented on pull request #8708: ARROW-10647: [Rust] [Parquet] Port benchmarks from from parquet-rs to arrow repo

2020-11-19 Thread GitBox
alamb commented on pull request #8708: URL: https://github.com/apache/arrow/pull/8708#issuecomment-730565402 @wesm suggests that rather than checking in files, we write / use a data generator, which makes sense to me. I'll try and work on such a thing -- though I am not sure when I will

[GitHub] [arrow] Fonsan opened a new pull request #8717: ARROW-10659 [Ruby] Refactor Table#initialize

2020-11-19 Thread GitBox
Fonsan opened a new pull request #8717: URL: https://github.com/apache/arrow/pull/8717 DRYed it up and allowed for ruby to handle argument numbers

[GitHub] [arrow] alamb closed pull request #8701: ARROW-10639: [Rust] Added examples to is_null kernel and simplified signature.

2020-11-19 Thread GitBox
alamb closed pull request #8701: URL: https://github.com/apache/arrow/pull/8701

[GitHub] [arrow] alamb commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
alamb commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527144524 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] eerhardt closed pull request #8702: ARROW-10634: [C#][CI] Change the build version from 2.2 to 3.1 in CI

2020-11-19 Thread GitBox
eerhardt closed pull request #8702: URL: https://github.com/apache/arrow/pull/8702

[GitHub] [arrow] pitrou opened a new pull request #8716: ARROW-10655: [C++] Add cache and memoization facility

2020-11-19 Thread GitBox
pitrou opened a new pull request #8716: URL: https://github.com/apache/arrow/pull/8716 Two cache implementations are provided: - a two-level random replacement cache - a LRU cache The LRU cache ends up 4 to 5 times faster than the 2-level RR cache. Benchmarks (arg 1: key

[GitHub] [arrow] jorgecarleitao commented on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
jorgecarleitao commented on pull request #8715: URL: https://github.com/apache/arrow/pull/8715#issuecomment-730555090 Hey @ch-sc , thanks for your PR! @nevi-me, could you help here? I am a bit worried about introducing another comparison of datatypes, but I was unable to find

[GitHub] [arrow] alamb commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
alamb commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527142608 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] alamb commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
alamb commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527142874 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,507 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] Ulimo commented on pull request #8702: ARROW-10634: [C#][CI] Change the build version from 2.2 to 3.1 in CI

2020-11-19 Thread GitBox
Ulimo commented on pull request #8702: URL: https://github.com/apache/arrow/pull/8702#issuecomment-730616626 @eerhardt sorry for pinging, just notifying you at least that I added this PR. I understand that you are probably super busy, so mostly wondering when/if this can be reviewed? Or is there

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527045467 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527045467 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] github-actions[bot] commented on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8715: URL: https://github.com/apache/arrow/pull/8715#issuecomment-730507645 https://issues.apache.org/jira/browse/ARROW-10656

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527045467 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] ch-sc opened a new pull request #8715: ARROW-10656 Use DataType comparison without values

2020-11-19 Thread GitBox
ch-sc opened a new pull request #8715: URL: https://github.com/apache/arrow/pull/8715

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527045467 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow] wyzhao commented on a change in pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-19 Thread GitBox
wyzhao commented on a change in pull request #8672: URL: https://github.com/apache/arrow/pull/8672#discussion_r527096473 ## File path: python/pyarrow/parquet.py ## @@ -1237,6 +1264,8 @@ def validate_schemas(self): if self.common_metadata is not None:

[GitHub] [arrow] alamb commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
alamb commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730561357 This is cool -- though note I think there is a broader effort afoot to remove the use of rust nightly in Arrow (e.g. https://github.com/apache/arrow/pull/8698) So I suspect

[GitHub] [arrow] alamb commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
alamb commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527147571 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,507 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527061776 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527067445 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] github-actions[bot] commented on pull request #8717: ARROW-10659 [Ruby] Refactor Table#initialize

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8717: URL: https://github.com/apache/arrow/pull/8717#issuecomment-730572150 https://issues.apache.org/jira/browse/ARROW-10659

[GitHub] [arrow] KirillLykov commented on a change in pull request #8461: ARROW-10197: [python][Gandiva] Execute expression on filtered data

2020-11-19 Thread GitBox
KirillLykov commented on a change in pull request #8461: URL: https://github.com/apache/arrow/pull/8461#discussion_r527184657 ## File path: python/pyarrow/includes/libgandiva.pxd ## @@ -58,6 +67,31 @@ cdef extern from "gandiva/selection_vector.h" namespace "gandiva" nogil:

[GitHub] [arrow] nevi-me commented on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
nevi-me commented on pull request #8715: URL: https://github.com/apache/arrow/pull/8715#issuecomment-730640190 > My feeling is that if we need to introduce a different comparison, this often hints that there is useless information on the `DataType` that we should eliminate. If it is

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527080099 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] Dandandan commented on a change in pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
Dandandan commented on a change in pull request #8715: URL: https://github.com/apache/arrow/pull/8715#discussion_r527092867 ## File path: rust/arrow/src/datatypes.rs ## @@ -1142,6 +1142,44 @@ impl DataType { | Float64 ) } + +/// Compares this

[GitHub] [arrow] github-actions[bot] commented on pull request #8716: ARROW-10655: [C++] Add cache and memoization facility

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8716: URL: https://github.com/apache/arrow/pull/8716#issuecomment-730545225 https://issues.apache.org/jira/browse/ARROW-10655

[GitHub] [arrow] jorisvandenbossche commented on a change in pull request #8672: ARROW-10574: [Python][Parquet] Enhance hive partition filtering.

2020-11-19 Thread GitBox
jorisvandenbossche commented on a change in pull request #8672: URL: https://github.com/apache/arrow/pull/8672#discussion_r527139269 ## File path: python/pyarrow/parquet.py ## @@ -1237,6 +1264,8 @@ def validate_schemas(self): if self.common_metadata is not None:

[GitHub] [arrow] andygrove opened a new pull request #8720: ARROW-10585: [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan

2020-11-19 Thread GitBox
andygrove opened a new pull request #8720: URL: https://github.com/apache/arrow/pull/8720 We need to merge https://github.com/apache/arrow/pull/8709 before this can be completed.

[GitHub] [arrow] wesm commented on a change in pull request #8703: ARROW-10143: [C++] Rewrite Array(Range)Equals

2020-11-19 Thread GitBox
wesm commented on a change in pull request #8703: URL: https://github.com/apache/arrow/pull/8703#discussion_r527275663 ## File path: cpp/src/arrow/ipc/feather_test.cc ## @@ -286,10 +286,13 @@ TEST_P(TestFeather, PrimitiveNullRoundTrip) { std::vector> expected_fields;

[GitHub] [arrow] andygrove commented on pull request #8720: ARROW-10585: [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan

2020-11-19 Thread GitBox
andygrove commented on pull request #8720: URL: https://github.com/apache/arrow/pull/8720#issuecomment-730713148 @jorgecarleitao @alamb fyi

[GitHub] [arrow] github-actions[bot] commented on pull request #8721: ARROW-10662: [Java] Avoid integer overflow for Json file reader

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8721: URL: https://github.com/apache/arrow/pull/8721#issuecomment-730804419 https://issues.apache.org/jira/browse/ARROW-10662

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r527402867 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one

[GitHub] [arrow] wesm closed pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-19 Thread GitBox
wesm closed pull request #8671: URL: https://github.com/apache/arrow/pull/8671

[GitHub] [arrow] vertexclique commented on pull request #8715: ARROW-10656: [Rust] Use DataType comparison without values

2020-11-19 Thread GitBox
vertexclique commented on pull request #8715: URL: https://github.com/apache/arrow/pull/8715#issuecomment-730700392 > Update the PartialCmp implementation for DataType to ignore Field names for all DataType == DataType comparisons A very good approach to go forward. I like it.

[GitHub] [arrow] github-actions[bot] commented on pull request #8720: ARROW-10585: [Rust] [DataFusion] Add join support to DataFrame and LogicalPlan

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8720: URL: https://github.com/apache/arrow/pull/8720#issuecomment-730716907 https://issues.apache.org/jira/browse/ARROW-10585

[GitHub] [arrow] Ulimo opened a new pull request #8719: ARROW-10661: [C#] Fix benchmarking project

2020-11-19 Thread GitBox
Ulimo opened a new pull request #8719: URL: https://github.com/apache/arrow/pull/8719 After the upgrade to 3.1, the benchmarks stopped working since the version of BenchmarkDotNet used 2.1. Also, the writer benchmark reaches the 2 GB limit of the memory stream, which causes an exception.

[GitHub] [arrow] github-actions[bot] commented on pull request #8718: ARROW-10660: [Rust] Implement AVX-512 bit or operation

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8718: URL: https://github.com/apache/arrow/pull/8718#issuecomment-730698778 https://issues.apache.org/jira/browse/ARROW-10660

[GitHub] [arrow] github-actions[bot] commented on pull request #8719: ARROW-10661: [C#] Fix benchmarking project

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8719: URL: https://github.com/apache/arrow/pull/8719#issuecomment-730698779 https://issues.apache.org/jira/browse/ARROW-10661

[GitHub] [arrow] wesm commented on pull request #8708: ARROW-10647: [Rust] [Parquet] Port benchmarks from from parquet-rs to arrow repo

2020-11-19 Thread GitBox
wesm commented on pull request #8708: URL: https://github.com/apache/arrow/pull/8708#issuecomment-730710414 I'm fine with checking in these files (or putting them in an S3 bucket, or anything really), but just don't think that checking in binary files should be the project's benchmarking

[GitHub] [arrow] wesm commented on pull request #8671: ARROW-10598: [C++] Separate out bit-packing in internal::GenerateBitsUnrolled for better performance

2020-11-19 Thread GitBox
wesm commented on pull request #8671: URL: https://github.com/apache/arrow/pull/8671#issuecomment-730686505 +1, thanks all

[GitHub] [arrow] wesm closed issue #8684: Adding nullable columns to RecordBatch

2020-11-19 Thread GitBox
wesm closed issue #8684: URL: https://github.com/apache/arrow/issues/8684

[GitHub] [arrow] liyafan82 opened a new pull request #8721: ARROW-10662: [Java] Avoid integer overflow for Json file reader

2020-11-19 Thread GitBox
liyafan82 opened a new pull request #8721: URL: https://github.com/apache/arrow/pull/8721 The current implementation uses int to represent the buffer size. However, the buffer can be larger than Integer.MAX_VALUE, which leads to integer overflow and unexpected behavior.
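
The overflow described in this entry can be illustrated with a small sketch. It is not the code from the pull request: it is a Python emulation of Java's signed 32-bit `int` wraparound, and the helper names `to_int32` and `checked_int32`, as well as the 3 GiB buffer size, are hypothetical and chosen only for illustration.

```python
# Emulates how a size larger than Integer.MAX_VALUE behaves when it is
# stored in a signed 32-bit int. All names here are illustrative assumptions.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def to_int32(value: int) -> int:
    """Wrap an integer into the signed 32-bit range (two's complement)."""
    return (value - INT_MIN) % 2**32 + INT_MIN


def checked_int32(value: int) -> int:
    """Reject out-of-range values instead of silently wrapping them."""
    if not INT_MIN <= value <= INT_MAX:
        raise OverflowError(f"{value} does not fit in a signed 32-bit int")
    return value


buffer_size = 3 * 1024**3          # hypothetical 3 GiB buffer
wrapped = to_int32(buffer_size)    # becomes negative after wraparound
print(buffer_size, "->", wrapped)
```

A negative "size" like this is what makes the later behavior unpredictable. Keeping the size in a 64-bit type, or validating it with a check such as `checked_int32` before narrowing, avoids the silent wraparound.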

[GitHub] [arrow] emkornfield commented on a change in pull request #8632: ARROW-10426: [C++] Allow writing large strings to Parquet

2020-11-19 Thread GitBox
emkornfield commented on a change in pull request #8632: URL: https://github.com/apache/arrow/pull/8632#discussion_r527404377 ## File path: cpp/src/parquet/encoding.cc ## @@ -127,6 +130,25 @@ class PlainEncoder : public EncoderImpl, virtual public TypedEncoder { }

[GitHub] [arrow] emkornfield commented on pull request #8632: ARROW-10426: [C++] Allow writing large strings to Parquet

2020-11-19 Thread GitBox
emkornfield commented on pull request #8632: URL: https://github.com/apache/arrow/pull/8632#issuecomment-730858106 One concern with the length check, otherwise LGTM.

[GitHub] [arrow] vertexclique opened a new pull request #8718: ARROW-10660: [Rust] Implement AVX-512 bit or operation

2020-11-19 Thread GitBox
vertexclique opened a new pull request #8718: URL: https://github.com/apache/arrow/pull/8718 Implements AVX-512 bit or operation Before ``` buffer_bit_ops or time: [681.78 ns 682.42 ns 683.36 ns] Found 14 outliers among 100

[GitHub] [arrow] jorgecarleitao commented on pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on pull request #8709: URL: https://github.com/apache/arrow/pull/8709#issuecomment-730868602 Hi, thanks a lot for the feedback @andygrove and @alamb. I have changed this PR in the following ways: 1. removed all changes with respect to the `filter` and just kept the

[GitHub] [arrow] liyafan82 closed pull request #8605: ARROW-10508 [Java] Allow FixedSizeListVector to have empty children

2020-11-19 Thread GitBox
liyafan82 closed pull request #8605: URL: https://github.com/apache/arrow/pull/8605

[GitHub] [arrow] github-actions[bot] commented on pull request #8712: ARROW-10651: [C++] Fix alloc-dealloc-mismatch in S3-related factory

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8712: URL: https://github.com/apache/arrow/pull/8712#issuecomment-730237733 https://issues.apache.org/jira/browse/ARROW-10651

[GitHub] [arrow] jorisvandenbossche closed pull request #8469: ARROW-10122: [Python] Fix to_pandas conversion with subset of columns and MultiIndex

2020-11-19 Thread GitBox
jorisvandenbossche closed pull request #8469: URL: https://github.com/apache/arrow/pull/8469

[GitHub] [arrow] github-actions[bot] commented on pull request #8704: ARROW-10644: [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8704: URL: https://github.com/apache/arrow/pull/8704#issuecomment-730224215 Revision: 5805abb1c1ae4753cae6631928b43758685f628a Submitted crossbow builds: [ursa-labs/crossbow @

[GitHub] [arrow] liyafan82 commented on pull request #8605: ARROW-10508 [Java] Allow FixedSizeListVector to have empty children

2020-11-19 Thread GitBox
liyafan82 commented on pull request #8605: URL: https://github.com/apache/arrow/pull/8605#issuecomment-730210605 Merging. Thanks for your effort. @Kopilov

[GitHub] [arrow] jorisvandenbossche closed pull request #8661: ARROW-10581: [Doc] IPC dictionary reference to relevant section

2020-11-19 Thread GitBox
jorisvandenbossche closed pull request #8661: URL: https://github.com/apache/arrow/pull/8661

[GitHub] [arrow] jorisvandenbossche commented on pull request #8704: ARROW-10644: [Python] Consolidate path/filesystem handling in pyarrow.dataset and pyarrow.fs

2020-11-19 Thread GitBox
jorisvandenbossche commented on pull request #8704: URL: https://github.com/apache/arrow/pull/8704#issuecomment-730220775 @github-actions crossbow submit -g integration

[GitHub] [arrow] chiyang10000 opened a new pull request #8712: ARROW-10651: [C++] Fix alloc-dealloc-mismatch in S3-related factory

2020-11-19 Thread GitBox
chiyang10000 opened a new pull request #8712: URL: https://github.com/apache/arrow/pull/8712 aws-sdk-cpp requires matched use of Aws::New and Aws::Delete. Since the AwsWriteableStreamFactory provides an Aws::IOStreamFactory object that would be called inside aws-sdk-cpp, it is

[GitHub] [arrow] github-actions[bot] commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730270987 https://issues.apache.org/jira/browse/ARROW-10653

[GitHub] [arrow] vertexclique commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
vertexclique commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730304017 r? @alamb @nevi-me

[GitHub] [arrow] Dandandan commented on pull request #8714: ARROW-10654: [Rust] Specialize parsing of floats / bools

2020-11-19 Thread GitBox
Dandandan commented on pull request #8714: URL: https://github.com/apache/arrow/pull/8714#issuecomment-730307198 Some benchmark/context of string -> f64 is here https://github.com/Alexhuszagh/rust-lexical/

[GitHub] [arrow] Dandandan commented on pull request #8710: ARROW-10649: [Rust] Parse manually in infer_field_schema, remove lazy static dependency

2020-11-19 Thread GitBox
Dandandan commented on pull request #8710: URL: https://github.com/apache/arrow/pull/8710#issuecomment-730330984 Related: https://github.com/apache/arrow/pull/8714

[GitHub] [arrow] codecov-io commented on pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
codecov-io commented on pull request #8713: URL: https://github.com/apache/arrow/pull/8713#issuecomment-730264767 # [Codecov](https://codecov.io/gh/apache/arrow/pull/8713?src=pr=h1) Report > Merging [#8713](https://codecov.io/gh/apache/arrow/pull/8713?src=pr=desc) (e4c8e71) into

[GitHub] [arrow] github-actions[bot] commented on pull request #8714: ARROW-10654: [Rust] Specialize parsing of floats / bools

2020-11-19 Thread GitBox
github-actions[bot] commented on pull request #8714: URL: https://github.com/apache/arrow/pull/8714#issuecomment-730313603 https://issues.apache.org/jira/browse/ARROW-10654

[GitHub] [arrow] Dandandan commented on pull request #8710: ARROW-10649: [Rust] Parse manually in infer_field_schema, remove lazy static dependency

2020-11-19 Thread GitBox
Dandandan commented on pull request #8710: URL: https://github.com/apache/arrow/pull/8710#issuecomment-730331585 > Before we remove `lazy_static`, how would we also remove it in #8611? CC @Jibbow I think it can reuse the structure here and also use the all digit function.

[GitHub] [arrow] vertexclique opened a new pull request #8713: ARROW-10653: [Rust] Update toolchain nightly

2020-11-19 Thread GitBox
vertexclique opened a new pull request #8713: URL: https://github.com/apache/arrow/pull/8713 I have deployed new intrinsics to rust lang core, so I want to bring these in iterations.

[GitHub] [arrow] Dandandan edited a comment on pull request #8714: ARROW-10654: [Rust] Specialize parsing of floats / bools

2020-11-19 Thread GitBox
Dandandan edited a comment on pull request #8714: URL: https://github.com/apache/arrow/pull/8714#issuecomment-730307198 Some benchmark/context of string -> f64 is here (note: log scale) https://github.com/Alexhuszagh/rust-lexical/

[GitHub] [arrow] Dandandan opened a new pull request #8714: ARROW-10654: Specialize parsers

2020-11-19 Thread GitBox
Dandandan opened a new pull request #8714: URL: https://github.com/apache/arrow/pull/8714 Rust's built-in float parser is known to be slow. This change allows specialized implementations instead of relying on `FromStr::parse`. It also avoids calling `to_lowercase` for
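
The idea in the snippet above can be illustrated with a minimal sketch. This is not the code from the pull request: the trait name `Parser` and its single method are hypothetical, chosen only to show how per-type implementations replace a blanket `FromStr::parse` call. The float implementation here still defers to the standard library; it only marks the place where a faster parser could be substituted. The bool implementation uses `eq_ignore_ascii_case`, which compares without allocating a lowercased copy.

```rust
use std::str::FromStr;

/// Hypothetical trait: each target type decides how a string field is parsed.
trait Parser: Sized {
    fn parse(s: &str) -> Option<Self>;
}

// Integers simply defer to the standard library.
impl Parser for i64 {
    fn parse(s: &str) -> Option<Self> {
        i64::from_str(s).ok()
    }
}

// Floats get their own impl, so a faster parser could be swapped in here
// without touching callers. This sketch still uses the standard parser.
impl Parser for f64 {
    fn parse(s: &str) -> Option<Self> {
        f64::from_str(s).ok()
    }
}

// Booleans are compared case-insensitively without calling `to_lowercase`,
// which would allocate a new String for every value.
impl Parser for bool {
    fn parse(s: &str) -> Option<Self> {
        if s.eq_ignore_ascii_case("true") {
            Some(true)
        } else if s.eq_ignore_ascii_case("false") {
            Some(false)
        } else {
            None
        }
    }
}
```

A caller would write something like `<bool as Parser>::parse("TRUE")`, and the choice of parsing strategy stays local to each `impl`.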

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526834131 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,507 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526837738 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,507 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526849347 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,467 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] Jibbow commented on pull request #8710: ARROW-10649: [Rust] Parse manually in infer_field_schema, remove lazy static dependency

2020-11-19 Thread GitBox
Jibbow commented on pull request #8710: URL: https://github.com/apache/arrow/pull/8710#issuecomment-730353755 `regex` + `lazy_static` is somewhat a nice combination, but I agree that we could also recognize dates without those two libraries. But instead of using `all_digit()` and manually

[GitHub] [arrow] pitrou commented on a change in pull request #8703: ARROW-10143: [C++] Rewrite Array(Range)Equals

2020-11-19 Thread GitBox
pitrou commented on a change in pull request #8703: URL: https://github.com/apache/arrow/pull/8703#discussion_r526854313 ## File path: cpp/src/arrow/ipc/feather_test.cc ## @@ -286,10 +286,13 @@ TEST_P(TestFeather, PrimitiveNullRoundTrip) { std::vector> expected_fields;

[GitHub] [arrow] pitrou commented on pull request #8712: ARROW-10651: [C++] Fix alloc-dealloc-mismatch in S3-related factory

2020-11-19 Thread GitBox
pitrou commented on pull request #8712: URL: https://github.com/apache/arrow/pull/8712#issuecomment-730355408 Thank you. Does `Aws::New` use a different allocator?

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526856756 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -89,6 +89,8 @@ pub trait ExecutionPlan: Debug + Send + Sync { pub enum Partitioning {

[GitHub] [arrow] pitrou closed pull request #8712: ARROW-10651: [C++] Fix alloc-dealloc-mismatch in S3-related factory

2020-11-19 Thread GitBox
pitrou closed pull request #8712: URL: https://github.com/apache/arrow/pull/8712

[GitHub] [arrow] jorgecarleitao commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
jorgecarleitao commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526892568 ## File path: rust/datafusion/src/physical_plan/mod.rs ## @@ -89,6 +89,8 @@ pub trait ExecutionPlan: Debug + Send + Sync { pub enum Partitioning {

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526851354 ## File path: rust/datafusion/src/physical_plan/hash_utils.rs ## @@ -0,0 +1,144 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] andygrove commented on a change in pull request #8709: ARROW-9555: [Rust] [DataFusion] Implement physical node for inner join

2020-11-19 Thread GitBox
andygrove commented on a change in pull request #8709: URL: https://github.com/apache/arrow/pull/8709#discussion_r526855265 ## File path: rust/datafusion/src/physical_plan/hash_join.rs ## @@ -0,0 +1,507 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or

[GitHub] [arrow] chiyang10000 commented on pull request #8712: ARROW-10651: [C++] Fix alloc-dealloc-mismatch in S3-related factory

2020-11-19 Thread GitBox
chiyang10000 commented on pull request #8712: URL: https://github.com/apache/arrow/pull/8712#issuecomment-730362129 > Thank you. Does `Aws::New` use a different allocator? Referring to

[GitHub] [arrow] Dandandan commented on pull request #8710: ARROW-10649: [Rust] Parse manually in infer_field_schema, remove lazy static dependency

2020-11-19 Thread GitBox
Dandandan commented on pull request #8710: URL: https://github.com/apache/arrow/pull/8710#issuecomment-730390464 > `regex` + `lazy_static` is somewhat a nice combination, but I agree that we could also recognize dates without those two libraries. But instead of using `all_digit()` and
