[GitHub] [arrow] jonkeane opened a new pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
jonkeane opened a new pull request #10861: URL: https://github.com/apache/arrow/pull/10861 Also request the correct version of duckdb now that it's been released. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[GitHub] [arrow] github-actions[bot] commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-891999588 https://issues.apache.org/jira/browse/ARROW-13538

[GitHub] [arrow] jonkeane commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
jonkeane commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-891999525 @github-actions crossbow submit test-r-minimal

[GitHub] [arrow] kharoc commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

2021-08-03 Thread GitBox
kharoc commented on a change in pull request #10854: URL: https://github.com/apache/arrow/pull/10854#discussion_r681931753 ## File path: python/pyarrow/tests/test_table.py ## @@ -1685,3 +1685,60 @@ def test_table_select(): result = table.select(['f2']) expected = pa.t

[GitHub] [arrow] kharoc commented on a change in pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

2021-08-03 Thread GitBox
kharoc commented on a change in pull request #10854: URL: https://github.com/apache/arrow/pull/10854#discussion_r681932100 ## File path: python/pyarrow/table.pxi ## @@ -616,6 +616,53 @@ cdef class RecordBatch(_PandasConvertible): self.sp_batch = batch self.bat

[GitHub] [arrow] jonkeane commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
jonkeane commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-892001495 @github-actions crossbow submit test-r-minimal-build

[GitHub] [arrow] github-actions[bot] commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-892002107 Revision: 2f9ee26e36ca438ce1bee3204e7f2c85c6804dac Submitted crossbow builds: [ursacomputing/crossbow @ actions-706](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] jorisvandenbossche commented on pull request #10162: ARROW-12506: [Python] Improve modularity of pyarrow codebase: _ipc module

2021-08-03 Thread GitBox
jorisvandenbossche commented on pull request #10162: URL: https://github.com/apache/arrow/pull/10162#issuecomment-892015848 I think it would be better to first discuss more what we think the "public Cython API" story should be. Because there currently are packages that use the cython APIs,

[GitHub] [arrow] pitrou opened a new pull request #10862: ARROW-13132: [C++] Add Scalar validation

2021-08-03 Thread GitBox
pitrou opened a new pull request #10862: URL: https://github.com/apache/arrow/pull/10862

[GitHub] [arrow] github-actions[bot] commented on pull request #10862: ARROW-13132: [C++] Add Scalar validation

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10862: URL: https://github.com/apache/arrow/pull/10862#issuecomment-892025469 https://issues.apache.org/jira/browse/ARROW-13132

[GitHub] [arrow] jorisvandenbossche commented on pull request #10854: ARROW-13089:[Python]Allow creating RecordBatch from Python dict

2021-08-03 Thread GitBox
jorisvandenbossche commented on pull request #10854: URL: https://github.com/apache/arrow/pull/10854#issuecomment-892031609 @kharoc can you also take a look at my non-inline comment (https://github.com/apache/arrow/pull/10854#pullrequestreview-721106382) about avoiding duplication between

[GitHub] [arrow] westonpace commented on pull request #10729: ARROW-12513: [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls

2021-08-03 Thread GitBox
westonpace commented on pull request #10729: URL: https://github.com/apache/arrow/pull/10729#issuecomment-892031859 Also, it seems we were not writing page statistics at all for data page V2. I added it back in but wasn't sure if that was intentionally disabled for any reason.

[GitHub] [arrow] cpcloud commented on a change in pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
cpcloud commented on a change in pull request #10849: URL: https://github.com/apache/arrow/pull/10849#discussion_r681966121 ## File path: r/inst/build_arrow_static.sh ## @@ -75,6 +79,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \ -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON \

[GitHub] [arrow] thisisnic commented on a change in pull request #10765: ARROW-13399: [R] Update dataset.Rmd vignette

2021-08-03 Thread GitBox
thisisnic commented on a change in pull request #10765: URL: https://github.com/apache/arrow/pull/10765#discussion_r681966895 ## File path: r/vignettes/dataset.Rmd ## @@ -77,39 +79,44 @@ feel free to grab only a year or two of data. If you don't have the taxi data downloaded

[GitHub] [arrow] cpcloud commented on a change in pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
cpcloud commented on a change in pull request #10849: URL: https://github.com/apache/arrow/pull/10849#discussion_r681967419 ## File path: r/inst/build_arrow_static.sh ## @@ -45,6 +45,10 @@ else ARROW_DEFAULT_PARAM="OFF" fi +if echo "$ARROW_R_CXXFLAGS" | grep -q "flto"; th

[GitHub] [arrow] jonkeane commented on a change in pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
jonkeane commented on a change in pull request #10849: URL: https://github.com/apache/arrow/pull/10849#discussion_r681984940 ## File path: r/inst/build_arrow_static.sh ## @@ -75,6 +79,7 @@ ${CMAKE} -DARROW_BOOST_USE_SHARED=OFF \ -DCMAKE_EXPORT_NO_PACKAGE_REGISTRY=ON \

[GitHub] [arrow] jonkeane commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
jonkeane commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892050606 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow] github-actions[bot] commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892051274 Revision: 02bd56a82ff42565d9db8a788452b4d6eb4a1264 Submitted crossbow builds: [ursacomputing/crossbow @ actions-707](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r681987051 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -245,9 +249,54 @@ macro_rules! hash_array_float { }; } -/// Creates hash v

[GitHub] [arrow] edponce commented on a change in pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
edponce commented on a change in pull request #10855: URL: https://github.com/apache/arrow/pull/10855#discussion_r682002907 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -318,6 +373,26 @@ struct UTF8LowerTransform : public CaseMappingTransform { template

[GitHub] [arrow-cookbook] Nlte opened a new pull request #2: Explicit arr object creation

2021-08-03 Thread GitBox
Nlte opened a new pull request #2: URL: https://github.com/apache/arrow-cookbook/pull/2 Hi, I have started playing with the cookbook examples, and there is a small doc section that I did not find clear, hence this PR. ### Issue The **testsetup::** blocks in the rst files are no

[GitHub] [arrow] lidavidm commented on pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
lidavidm commented on pull request #10855: URL: https://github.com/apache/arrow/pull/10855#issuecomment-892094979 For CI, it appears that Ubuntu 20.04 provides utf8proc 2.5, but the isupper/islower functions are not provided until 2.6: https://juliastrings.github.io/utf8proc/releases/

[GitHub] [arrow] lidavidm edited a comment on pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
lidavidm edited a comment on pull request #10855: URL: https://github.com/apache/arrow/pull/10855#issuecomment-892094979 For CI, it appears that Ubuntu 20.04 provides utf8proc 2.5, but the isupper/islower functions are not provided until 2.6: https://juliastrings.github.io/utf8proc/release

[GitHub] [arrow] lidavidm commented on pull request #10842: ARROW-13469: [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h

2021-08-03 Thread GitBox
lidavidm commented on pull request #10842: URL: https://github.com/apache/arrow/pull/10842#issuecomment-892096317 The failure here looks like a flake. Merging.

[GitHub] [arrow] lidavidm closed pull request #10842: ARROW-13469: [C++] Suppress -Wmissing-field-initializers in DayMilliseconds arrow/type.h

2021-08-03 Thread GitBox
lidavidm closed pull request #10842: URL: https://github.com/apache/arrow/pull/10842

[GitHub] [arrow] nealrichardson commented on a change in pull request #10851: ARROW-13519: [R] Make doc examples less noisy

2021-08-03 Thread GitBox
nealrichardson commented on a change in pull request #10851: URL: https://github.com/apache/arrow/pull/10851#discussion_r682033544 ## File path: r/R/ipc_stream.R ## @@ -68,8 +68,8 @@ write_ipc_stream <- function(x, sink, ...) { #' @return A `raw` vector containing the bytes of

[GitHub] [arrow] nealrichardson commented on a change in pull request #10851: ARROW-13519: [R] Make doc examples less noisy

2021-08-03 Thread GitBox
nealrichardson commented on a change in pull request #10851: URL: https://github.com/apache/arrow/pull/10851#discussion_r682033849 ## File path: r/R/compute.R ## @@ -284,7 +283,7 @@ is_in <- function(x, table, ...) { #' `Int64`. #' @examplesIf arrow_available() #' cyl_vals <

[GitHub] [arrow-datafusion] NGA-TRAN opened a new issue #821: Internal error : Unsupported data type in hasher

2021-08-03 Thread GitBox
NGA-TRAN opened a new issue #821: URL: https://github.com/apache/arrow-datafusion/issues/821 **Describe the bug** While running the SQL below, I get this internal error `Error running remote query: status: Internal, message: "Internal error reading points from database 84491

[GitHub] [arrow] nealrichardson commented on pull request #10710: ARROW-11460: [R] Use system libraries if present on Linux

2021-08-03 Thread GitBox
nealrichardson commented on pull request #10710: URL: https://github.com/apache/arrow/pull/10710#issuecomment-892102594 @github-actions crossbow submit -g r

[GitHub] [arrow] edponce commented on pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
edponce commented on pull request #10855: URL: https://github.com/apache/arrow/pull/10855#issuecomment-892102908 An alternative solution is to use helper functions already available, refer to https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/kernels/scalar_string.cc#L1391-L

[GitHub] [arrow] jonkeane commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
jonkeane commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892103043 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow] github-actions[bot] commented on pull request #10710: ARROW-11460: [R] Use system libraries if present on Linux

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10710: URL: https://github.com/apache/arrow/pull/10710#issuecomment-892103270 Revision: f11123784bde348b69665dfd0c9f7b302a910209 Submitted crossbow builds: [ursacomputing/crossbow @ actions-708](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892103644 Revision: 6f2f0f77d4ce62a966b537dd85d6e32e28d1648f Submitted crossbow builds: [ursacomputing/crossbow @ actions-709](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow-datafusion] Dandandan commented on issue #821: Internal error : Unsupported data type in hasher

2021-08-03 Thread GitBox
Dandandan commented on issue #821: URL: https://github.com/apache/arrow-datafusion/issues/821#issuecomment-892104898 The error message could be improved. I think it's trying to hash the dictionaries `bucket_id` and `partition_id`, which is missing an implementation. There is a PR over
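The idea behind hashing a dictionary-encoded column, as discussed in this issue and in pull request #812, can be sketched roughly as follows. This is a plain-Python illustration and not DataFusion's Rust implementation: the function name `hash_dictionary_column`, its parameters, and the use of Python's built-in `hash` are all assumptions made for the example.

```python
from typing import Hashable, List, Optional, Sequence


def hash_dictionary_column(
    values: Sequence[Hashable],
    keys: Sequence[Optional[int]],
    null_hash: int = 0,
) -> List[int]:
    """Hash a dictionary-encoded column row by row (hypothetical helper).

    `values` holds the distinct dictionary entries and `keys` holds one
    index into `values` per row; a key of None marks a null row.
    """
    # Hash every distinct dictionary value exactly once.
    value_hashes = [hash(v) for v in values]
    # Each row then reuses the precomputed hash of the value it points at.
    return [null_hash if k is None else value_hashes[k] for k in keys]
```

Rows that share a key end up with the same hash without the value being hashed again, which is the main saving when a column has few distinct values and many rows.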

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r682042466 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -245,9 +249,54 @@ macro_rules! hash_array_float { }; } -/// Creates hash v

[GitHub] [arrow] nealrichardson commented on pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
nealrichardson commented on pull request #10855: URL: https://github.com/apache/arrow/pull/10855#issuecomment-892109522 > Meanwhile, RTools35 is using utf8proc 2.4; it builds utf8proc from source due to some other issue, but I believe it's using the system headers anyways looking at the co

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #810: Qualified field resolution too strict

2021-08-03 Thread GitBox
alamb commented on a change in pull request #810: URL: https://github.com/apache/arrow-datafusion/pull/810#discussion_r682046038 ## File path: datafusion/src/sql/planner.rs ## @@ -925,11 +925,40 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> { /// Generate a relational

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #813: Speed up inlist for strings and primitives

2021-08-03 Thread GitBox
alamb commented on a change in pull request #813: URL: https://github.com/apache/arrow-datafusion/pull/813#discussion_r682053016 ## File path: datafusion/src/physical_plan/expressions/in_list.rs ## @@ -99,6 +124,104 @@ macro_rules! make_contains { }}; } +macro_rules! ma

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #813: Speed up inlist for strings and primitives

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #813: URL: https://github.com/apache/arrow-datafusion/pull/813#discussion_r682060135 ## File path: datafusion/src/physical_plan/expressions/in_list.rs ## @@ -99,6 +124,104 @@ macro_rules! make_contains { }}; } +macro_rules

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #813: Speed up inlist for strings and primitives

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #813: URL: https://github.com/apache/arrow-datafusion/pull/813#discussion_r682060480 ## File path: datafusion/src/physical_plan/expressions/in_list.rs ## @@ -234,16 +363,40 @@ impl PhysicalExpr for InListExpr { match va

[GitHub] [arrow-datafusion] alamb commented on pull request #806: Add optimizer rule to replace inlist with `or` chain for small expression list

2021-08-03 Thread GitBox
alamb commented on pull request #806: URL: https://github.com/apache/arrow-datafusion/pull/806#issuecomment-892125010 Possibly also related https://github.com/apache/arrow-datafusion/pull/813 as a different performance approach

[GitHub] [arrow-datafusion] Dandandan commented on pull request #806: Add optimizer rule to replace inlist with `or` chain for small expression list

2021-08-03 Thread GitBox
Dandandan commented on pull request #806: URL: https://github.com/apache/arrow-datafusion/pull/806#issuecomment-892128127 I might tune this at a later moment to apply to empty/single-item lists instead, for which it really *should* be an improvement, and do some more profiling. It could a
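The rewrite named in this pull request's title, replacing a small `IN` list with a chain of `OR` comparisons, can be sketched as below. This is a toy illustration in Python, not the DataFusion optimizer rule: the tuple-based expression shape, the function name `rewrite_in_list`, and the `max_items` threshold are all hypothetical. Empty lists are deliberately left untouched so the sketch makes no claim about their semantics.

```python
from functools import reduce


def rewrite_in_list(column, items, max_items=3):
    """Rewrite `column IN (items...)` as an OR chain when the list is small.

    Expressions are modelled as nested tuples (an assumption for this sketch):
    ("in_list", column, items), ("eq", column, item), ("or", left, right).
    """
    # Leave empty or large lists as a regular IN-list expression.
    if not items or len(items) > max_items:
        return ("in_list", column, tuple(items))
    # One equality comparison per list item.
    comparisons = [("eq", column, item) for item in items]
    # Fold the comparisons into a left-nested OR chain; a single item
    # collapses to a plain equality.
    return reduce(lambda left, right: ("or", left, right), comparisons)
```

For example, `rewrite_in_list("c", [1, 2])` yields `("or", ("eq", "c", 1), ("eq", "c", 2))`, while a list longer than the threshold is returned as an unchanged `in_list` node.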

[GitHub] [arrow-datafusion] jgoday edited a comment on pull request #819: Draft: python bindings for window functions

2021-08-03 Thread GitBox
jgoday edited a comment on pull request #819: URL: https://github.com/apache/arrow-datafusion/pull/819#issuecomment-891341609 There are some pending issues: 1. Currently, only a generic way of calling window functions by name is exported. Should we export all window functions individu

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
alamb commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682087771 ## File path: datafusion/src/datasource/protocol_registry.rs ## @@ -0,0 +1,83 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// o

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
alamb commented on a change in pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#discussion_r682092227 ## File path: datafusion/src/logical_plan/builder.rs ## @@ -287,16 +287,125 @@ impl LogicalPlanBuilder { .into_iter()

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
alamb commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r682098857 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -245,9 +249,54 @@ macro_rules! hash_array_float { }; } -/// Creates hash value

[GitHub] [arrow-datafusion] seddonm1 commented on pull request #810: Qualified field resolution too strict

2021-08-03 Thread GitBox
seddonm1 commented on pull request #810: URL: https://github.com/apache/arrow-datafusion/pull/810#issuecomment-892165605 @alamb I have taken your feedback and managed to remove the dfschema.rs changes entirely. I think this is ready now.

[GitHub] [arrow-datafusion] seddonm1 commented on a change in pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
seddonm1 commented on a change in pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#discussion_r682102395 ## File path: datafusion/src/logical_plan/builder.rs ## @@ -287,16 +287,125 @@ impl LogicalPlanBuilder { .into_iter()

[GitHub] [arrow-datafusion] seddonm1 commented on a change in pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
seddonm1 commented on a change in pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#discussion_r682102739 ## File path: datafusion/tests/sql.rs ## @@ -1730,6 +1730,17 @@ async fn equijoin() -> Result<()> { let actual = execute(&mut ctx, sql).a

[GitHub] [arrow-datafusion] seddonm1 commented on a change in pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
seddonm1 commented on a change in pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#discussion_r682103000 ## File path: datafusion/src/logical_plan/builder.rs ## @@ -287,16 +287,125 @@ impl LogicalPlanBuilder { .into_iter()

[GitHub] [arrow-datafusion] seddonm1 commented on a change in pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
seddonm1 commented on a change in pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#discussion_r682104499 ## File path: datafusion/src/logical_plan/builder.rs ## @@ -287,16 +287,125 @@ impl LogicalPlanBuilder { .into_iter()

[GitHub] [arrow] westonpace commented on pull request #10729: ARROW-12513: [C++][Parquet] Parquet Writer always puts null_count=0 in Parquet statistics for dictionary-encoded array with nulls

2021-08-03 Thread GitBox
westonpace commented on pull request #10729: URL: https://github.com/apache/arrow/pull/10729#issuecomment-892173390 CI failures appear unrelated. I'll merge this tomorrow assuming no concerns about https://github.com/apache/arrow/pull/10729#discussion_r681470482

[GitHub] [arrow] jonkeane commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
jonkeane commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-892174722 @github-actions crossbow submit -g r

[GitHub] [arrow] github-actions[bot] commented on pull request #10861: ARROW-13538: [R] [CI] Don't test DuckDB in the minimal build

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10861: URL: https://github.com/apache/arrow/pull/10861#issuecomment-892175280 Revision: 93ca61045bca627a90acf9e70d203128d9209f50 Submitted crossbow builds: [ursacomputing/crossbow @ actions-710](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow] github-actions[bot] commented on pull request #10863: ARROW-13540: [C++] Add order by sink node

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10863: URL: https://github.com/apache/arrow/pull/10863#issuecomment-892184818 https://issues.apache.org/jira/browse/ARROW-13540

[GitHub] [arrow] lidavidm opened a new pull request #10863: ARROW-13540: [C++] Add order by sink node

2021-08-03 Thread GitBox
lidavidm opened a new pull request #10863: URL: https://github.com/apache/arrow/pull/10863 Adds a subclass of sink node that accumulates data, then sorts it and pushes it. I chose to make it a sink node as otherwise there's no ordering of batches between nodes, so we would have to c
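The behaviour described here (accumulate data, then sort it and push it) can be sketched in a few lines. This is a conceptual Python illustration only, not the C++ node added by the pull request: the class name `OrderBySink`, its methods, and the row-of-dicts batch shape are assumptions made for the example.

```python
class OrderBySink:
    """Toy sink that buffers all incoming batches and emits them sorted."""

    def __init__(self, sort_key):
        # sort_key: function mapping a row to the value it is ordered by.
        self._sort_key = sort_key
        self._rows = []

    def consume(self, batch):
        # Batches may arrive in any order, so nothing is emitted yet;
        # rows are only accumulated.
        self._rows.extend(batch)

    def finish(self):
        # Once all input has been seen, sort everything and hand it over.
        return sorted(self._rows, key=self._sort_key)
```

Because the sort only happens in `finish`, the result does not depend on the order in which batches reached the sink, which matches the reason given above for making this a sink node.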

[GitHub] [arrow] lidavidm commented on pull request #10863: ARROW-13540: [C++] Add order by sink node

2021-08-03 Thread GitBox
lidavidm commented on pull request #10863: URL: https://github.com/apache/arrow/pull/10863#issuecomment-892184859 Draft for now since it'll require rebasing on top of #10793/ARROW-13482

[GitHub] [arrow] jonkeane commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
jonkeane commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892193399 @github-actions crossbow submit test-r-rhub-debian-gcc-devel-lto-latest

[GitHub] [arrow] github-actions[bot] commented on pull request #10849: ARROW-13507: [R] LTO job on CRAN fails

2021-08-03 Thread GitBox
github-actions[bot] commented on pull request #10849: URL: https://github.com/apache/arrow/pull/10849#issuecomment-892193858 Revision: 6f2f0f77d4ce62a966b537dd85d6e32e28d1648f Submitted crossbow builds: [ursacomputing/crossbow @ actions-711](https://github.com/ursacomputing/crossbow/

[GitHub] [arrow-datafusion] seddonm1 commented on pull request #797: Better join order resolution logic

2021-08-03 Thread GitBox
seddonm1 commented on pull request #797: URL: https://github.com/apache/arrow-datafusion/pull/797#issuecomment-892218323 Thank you for the review @alamb. I have refactored to reduce repetition (by reusing some existing methods) but still have the ambiguous column problem outstanding.

[GitHub] [arrow] augustoasilva commented on a change in pull request #10718: ARROW-13331: [C++][Gandiva] Add format_number hive function to gandiva

2021-08-03 Thread GitBox
augustoasilva commented on a change in pull request #10718: URL: https://github.com/apache/arrow/pull/10718#discussion_r682165568 ## File path: cpp/src/gandiva/precompiled/string_ops_test.cc ## @@ -1683,4 +1683,27 @@ TEST(TestStringOps, TestConvertToBigEndian) { } #endif }

[GitHub] [arrow] westonpace commented on pull request #10431: ARROW-12921: [C++][Dataset] Add RadosParquetFileFormat to Dataset API

2021-08-03 Thread GitBox
westonpace commented on pull request #10431: URL: https://github.com/apache/arrow/pull/10431#issuecomment-892275122 > Could you please explain this part a little more? Sure, let me prioritize and group things a bit too. # Move flatbuffers out of `format` The file `ScanRe

[GitHub] [arrow] Christian8491 commented on pull request #10855: ARROW-12946: [C++] String swap case kernel

2021-08-03 Thread GitBox
Christian8491 commented on pull request #10855: URL: https://github.com/apache/arrow/pull/10855#issuecomment-892298100 As @edponce suggested, I replaced some `utf8proc` functions with helper functions. Once all the CI passes, this PR is ready for review.

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682237507 ## File path: r/R/dplyr-functions.R ## @@ -777,3 +777,34 @@ nse_funcs$case_when <- function(...) { ) ) } + +# Aggregation functions +# These all

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682241123 ## File path: r/R/dplyr-functions.R ## @@ -777,3 +777,34 @@ nse_funcs$case_when <- function(...) { ) ) } + +# Aggregation functions +# These all

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682247394 ## File path: datafusion/src/datasource/datasource2.rs ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682247695 ## File path: datafusion/src/datasource/datasource2.rs ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mo

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682248376 ## File path: r/R/dplyr-summarize.R ## @@ -28,14 +28,107 @@ summarise.arrow_dplyr_query <- function(.data, ..., .engine = c("arrow", "duckdb dplyr:

[GitHub] [arrow] kou commented on pull request #10828: ARROW-13485: [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh

2021-08-03 Thread GitBox
kou commented on pull request #10828: URL: https://github.com/apache/arrow/pull/10828#issuecomment-892320790 I'll merge this.

[GitHub] [arrow] kou closed pull request #10828: ARROW-13485: [Release] Replace ${PREVIOUS_RELEASE}.9000 in r/NEWS.md by post-12-bump-versions.sh

2021-08-03 Thread GitBox
kou closed pull request #10828: URL: https://github.com/apache/arrow/pull/10828

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682249207 ## File path: datafusion/src/execution/context.rs ## @@ -840,6 +859,8 @@ pub struct ExecutionContextState { pub config: ExecutionConfig, /

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682249980 ## File path: r/R/dplyr-summarize.R ## @@ -28,14 +28,107 @@ summarise.arrow_dplyr_query <- function(.data, ..., .engine = c("arrow", "duckdb dplyr:

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682251881 ## File path: r/R/query-engine.R ## @@ -0,0 +1,75 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreeme

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682255126 ## File path: r/R/dplyr-summarize.R ## @@ -28,14 +28,107 @@ summarise.arrow_dplyr_query <- function(.data, ..., .engine = c("arrow", "duckdb dplyr:

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682256093 ## File path: datafusion/src/execution/context.rs ## @@ -125,12 +127,26 @@ pub struct ExecutionContext { pub state: Arc>, } +lazy_static! {

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682256920 ## File path: r/R/dplyr-summarize.R ## @@ -28,14 +28,107 @@ summarise.arrow_dplyr_query <- function(.data, ..., .engine = c("arrow", "duckdb dplyr:

[GitHub] [arrow] ianmcook commented on a change in pull request #10722: ARROW-13344: [R] Initial bindings for ExecPlan/ExecNode

2021-08-03 Thread GitBox
ianmcook commented on a change in pull request #10722: URL: https://github.com/apache/arrow/pull/10722#discussion_r682270571 ## File path: r/tests/testthat/test-dplyr-aggregate.R ## @@ -0,0 +1,165 @@ +# Licensed to the Apache Software Foundation (ASF) under one +# or more contr

[GitHub] [arrow] cyb70289 commented on a change in pull request #10860: ARROW-13520: [C++] Implement hash_aggregate tdigest kernel

2021-08-03 Thread GitBox
cyb70289 commented on a change in pull request #10860: URL: https://github.com/apache/arrow/pull/10860#discussion_r682274703 ## File path: cpp/src/arrow/compute/kernels/hash_aggregate.cc ## @@ -1311,6 +1312,126 @@ struct GroupedVarStdFactory { InputType argument_type; };

[GitHub] [arrow-datafusion] jgoday edited a comment on pull request #819: Draft: python bindings for window functions

2021-08-03 Thread GitBox
jgoday edited a comment on pull request #819: URL: https://github.com/apache/arrow-datafusion/pull/819#issuecomment-891341609 There are some pending issues: 1. Currently, only a generic way of calling window functions by name is exported. Should we export all window functions individu

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682305233 ## File path: datafusion/src/datasource/protocol_registry.rs ## @@ -0,0 +1,83 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #811: Source ext for remote files read

2021-08-03 Thread GitBox
yjshen commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682306021 ## File path: datafusion/src/datasource/protocol_registry.rs ## @@ -0,0 +1,83 @@ +// Licensed to the Apache Software Foundation (ASF) under one +//

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r682319823 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -245,9 +249,60 @@ macro_rules! hash_array_float { }; } -/// Creates hash v

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r682319958 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -438,11 +493,84 @@ pub fn create_hashes<'a>( multi_col

[GitHub] [arrow-datafusion] Dandandan commented on a change in pull request #812: Implement vectorized hashing for DictionaryArray types

2021-08-03 Thread GitBox
Dandandan commented on a change in pull request #812: URL: https://github.com/apache/arrow-datafusion/pull/812#discussion_r682320952 ## File path: datafusion/src/physical_plan/hash_utils.rs ## @@ -245,9 +249,54 @@ macro_rules! hash_array_float { }; } -/// Creates hash v

[GitHub] [arrow] liyafan82 opened a new pull request #10864: ARROW-13544 [Java]: Remove APIs that have been deprecated for long

2021-08-04 Thread GitBox
liyafan82 opened a new pull request #10864: URL: https://github.com/apache/arrow/pull/10864 Some APIs were annotated as deprecated a long time ago, and a number of releases have been published since then, so it is time to get rid of them. Please also note tha

[GitHub] [arrow] github-actions[bot] commented on pull request #10864: ARROW-13544 [Java]: Remove APIs that have been deprecated for long

2021-08-04 Thread GitBox
github-actions[bot] commented on pull request #10864: URL: https://github.com/apache/arrow/pull/10864#issuecomment-892424106 https://issues.apache.org/jira/browse/ARROW-13544 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub an

[GitHub] [arrow] praveenbingo closed pull request #10775: ARROW-13429: [C++][Gandiva] Fix Gandiva codegen for if-else expression with binary type

2021-08-04 Thread GitBox
praveenbingo closed pull request #10775: URL: https://github.com/apache/arrow/pull/10775 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] praveenbingo closed pull request #10033: ARROW-12388: [C++][Gandiva] Implement cast numbers from varbinary functions in gandiva

2021-08-04 Thread GitBox
praveenbingo closed pull request #10033: URL: https://github.com/apache/arrow/pull/10033 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] praveenbingo closed pull request #10112: ARROW-12479: [C++][Gandiva] Implement castBigInt, castInt, castIntervalDay and castIntervalYear extra functions

2021-08-04 Thread GitBox
praveenbingo closed pull request #10112: URL: https://github.com/apache/arrow/pull/10112 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] praveenbingo closed pull request #10059: ARROW-12410: [C++][Gandiva] Implement regexp_replace function on Gandiva

2021-08-04 Thread GitBox
praveenbingo closed pull request #10059: URL: https://github.com/apache/arrow/pull/10059 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] praveenbingo commented on pull request #10068: ARROW-12422: [C++][Gandiva] Add castVARCHAR from date millis function

2021-08-04 Thread GitBox
praveenbingo commented on pull request #10068: URL: https://github.com/apache/arrow/pull/10068#issuecomment-892549798 @projjal Needs a rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] praveenbingo commented on pull request #10385: ARROW-12858: [C++][Gandiva] Add isNull, isTrue, isFalse, isNotTrue, IsNotFalse and NVL functions on Gandiva

2021-08-04 Thread GitBox
praveenbingo commented on pull request #10385: URL: https://github.com/apache/arrow/pull/10385#issuecomment-892550094 @projjal Needs a rebase. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] praveenbingo closed pull request #10396: ARROW-12866: [C++][Gandiva] Implement STRPOS function on Gandiva

2021-08-04 Thread GitBox
praveenbingo closed pull request #10396: URL: https://github.com/apache/arrow/pull/10396 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] praveenbingo commented on pull request #10464: ARROW-12943: [Gandiva][C++]Implement MD5 Hive function

2021-08-04 Thread GitBox
praveenbingo commented on pull request #10464: URL: https://github.com/apache/arrow/pull/10464#issuecomment-892552298 @projjal Needs a rebase -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the

[GitHub] [arrow] praveenbingo closed pull request #10595: ARROW-13163: [C++][Gandiva] Implement REPEAT function on Gandiva

2021-08-04 Thread GitBox
praveenbingo closed pull request #10595: URL: https://github.com/apache/arrow/pull/10595 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-un

[GitHub] [arrow] kszucs opened a new pull request #10865: ARROW-3699: [C++] Dockerfile for testing 32-bit C++ build

2021-08-04 Thread GitBox
kszucs opened a new pull request #10865: URL: https://github.com/apache/arrow/pull/10865 I'm not sure whether this is going to work on an amd64 Linux host, but it actually runs on Docker for Mac. @pitrou could you please try `ARCH=i386 archery docker run debian-cpp` locally? -- T

[GitHub] [arrow] github-actions[bot] commented on pull request #10865: ARROW-3699: [C++] Dockerfile for testing 32-bit C++ build

2021-08-04 Thread GitBox
github-actions[bot] commented on pull request #10865: URL: https://github.com/apache/arrow/pull/10865#issuecomment-892562096 https://issues.apache.org/jira/browse/ARROW-3699 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and

[GitHub] [arrow] kszucs commented on a change in pull request #10865: ARROW-3699: [C++] Dockerfile for testing 32-bit C++ build

2021-08-04 Thread GitBox
kszucs commented on a change in pull request #10865: URL: https://github.com/apache/arrow/pull/10865#discussion_r682509076 ## File path: ci/docker/debian-10-cpp.dockerfile ## @@ -73,7 +73,9 @@ RUN apt-get update -y -q && \ COPY ci/scripts/install_minio.sh \ /arrow/ci/s

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #811: Source ext for remote files read

2021-08-04 Thread GitBox
alamb commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682513345 ## File path: datafusion/src/datasource/datasource2.rs ## @@ -0,0 +1,163 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or mor

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #811: Source ext for remote files read

2021-08-04 Thread GitBox
alamb commented on a change in pull request #811: URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r682514446 ## File path: datafusion/src/execution/context.rs ## @@ -125,12 +127,26 @@ pub struct ExecutionContext { pub state: Arc<Mutex<ExecutionContextState>>, } +lazy_static! {
