[GitHub] [arrow] pitrou closed pull request #10490: ARROW-13018: [C++][Docs] Use consistent terminology for nulls (min_count) in scalar aggregate kernels

2021-06-10 Thread GitBox
pitrou closed pull request #10490: URL: https://github.com/apache/arrow/pull/10490 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [arrow] github-actions[bot] commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858386755 Revision: c306cd70ea240d7d90b86c1688308dcc22301b2a Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow-datafusion] edrevo opened a new issue #534: Make BallistaContext::collect streaming

2021-06-10 Thread GitBox
edrevo opened a new issue #534: URL: https://github.com/apache/arrow-datafusion/issues/534 https://github.com/apache/arrow-datafusion/blob/bdae93b9365ef5892e686915250d42e927d00620/ballista/rust/client/src/context.rs#L225

[GitHub] [arrow] pitrou commented on a change in pull request #10445: ARROW-9140: [R] Zero-copy Arrow to R where possible

2021-06-10 Thread GitBox
pitrou commented on a change in pull request #10445: URL: https://github.com/apache/arrow/pull/10445#discussion_r648947602 ## File path: r/src/altrep.cpp ## @@ -0,0 +1,180 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow-datafusion] edrevo commented on a change in pull request #535: Make BallistaContext::collect streaming

2021-06-10 Thread GitBox
edrevo commented on a change in pull request #535: URL: https://github.com/apache/arrow-datafusion/pull/535#discussion_r648949167 ## File path: ballista/rust/client/src/context.rs ## @@ -68,6 +74,32 @@ impl BallistaContextState { } } +struct WrappedStream { +

[GitHub] [arrow] pitrou commented on a change in pull request #10482: ARROW-12597: [C++] Enable per-row-group parallelism in async Parquet reader

2021-06-10 Thread GitBox
pitrou commented on a change in pull request #10482: URL: https://github.com/apache/arrow/pull/10482#discussion_r648968980 ## File path: cpp/src/arrow/util/parallel.h ## @@ -44,6 +45,25 @@ Status ParallelFor(int num_tasks, FUNCTION&& func, return st; } +template

[GitHub] [arrow] xhochy commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
xhochy commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858439690 @github-actions crossbow submit conda-osx-clang-py36-r40

[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858395074 I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to parse

[GitHub] [arrow-datafusion] edrevo opened a new pull request #535: Make BallistaContext::collect streaming

2021-06-10 Thread GitBox
edrevo opened a new pull request #535: URL: https://github.com/apache/arrow-datafusion/pull/535 # Which issue does this PR close? Closes #534. # Rationale for this change The collect implementation in BallistaContext is bringing all of the contents in memory even though

[GitHub] [arrow] pitrou commented on a change in pull request #10494: ARROW-12948: [C++][Python] Add slice_replace kernel

2021-06-10 Thread GitBox
pitrou commented on a change in pull request #10494: URL: https://github.com/apache/arrow/pull/10494#discussion_r648955618 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -2288,6 +2288,164 @@ const FunctionDoc replace_substring_regex_doc( {"strings"},

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #535: Make BallistaContext::collect streaming

2021-06-10 Thread GitBox
codecov-commenter commented on pull request #535: URL: https://github.com/apache/arrow-datafusion/pull/535#issuecomment-858448053 #

[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858486248 > Localize kernel would be great. I suppose we'd need a scalar and a vector one depending if timezone is shared between rows or not? I think a scalar kernel is

[GitHub] [arrow] pitrou commented on pull request #10471: ARROW-12952: [C++] Add count_substring_regex

2021-06-10 Thread GitBox
pitrou commented on pull request #10471: URL: https://github.com/apache/arrow/pull/10471#issuecomment-858388150 Can you rebase this @lidavidm ?

[GitHub] [arrow] jorisvandenbossche commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858491335 Another question: is the `locate_zone` configurable in some way to give some hints where to find the tz database? The database is not always available on the

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #536: use nightly nightly-2021-05-10

2021-06-10 Thread GitBox
Jimexist opened a new pull request #536: URL: https://github.com/apache/arrow-datafusion/pull/536 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes?

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #532: reuse datafusion physical planner in ballista building from protobuf

2021-06-10 Thread GitBox
codecov-commenter commented on pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#issuecomment-858336030 #

[GitHub] [arrow-datafusion] codecov-commenter edited a comment on pull request #532: reuse datafusion physical planner in ballista building from protobuf

2021-06-10 Thread GitBox
codecov-commenter edited a comment on pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#issuecomment-858336030 #

[GitHub] [arrow-datafusion] jorgecarleitao opened a new issue #533: Add extension plugin to parse SQL into logical plan

2021-06-10 Thread GitBox
jorgecarleitao opened a new issue #533: URL: https://github.com/apache/arrow-datafusion/issues/533 As a user of DataFusion, I would like to be able to install custom parsing rules of SQL to DataFusion, so that I can plan custom nodes from SQL. This would allow me to extend

[GitHub] [arrow] pitrou commented on pull request #10486: ARROW-13016: [C++][Compute] Support Null type in Sum/Mean/MinMax aggregation

2021-06-10 Thread GitBox
pitrou commented on pull request #10486: URL: https://github.com/apache/arrow/pull/10486#issuecomment-858387939 @jorisvandenbossche has a good point. The only reason to define these kernels is for consistency. But always returning null doesn't seem consistent with current behaviour of the

[GitHub] [arrow] jorisvandenbossche commented on pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #10176: URL: https://github.com/apache/arrow/pull/10176#issuecomment-858392898 Cool, this is a nice start for datetime kernels! @thisisnic opened a JIRA for exposing those in R as well (https://issues.apache.org/jira/browse/ARROW-13022)

[GitHub] [arrow] rok commented on pull request #10176: ARROW-11759: [C++] Kernel to extract datetime components (year, month, day, etc) from timestamp type

2021-06-10 Thread GitBox
rok commented on pull request #10176: URL: https://github.com/apache/arrow/pull/10176#issuecomment-858457355 Thanks @jorisvandenbossche @pitrou for the help and feedback! I'm very glad this is merged! :)

[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
rok commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858470970 > I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to parse strings localized

[GitHub] [arrow] jorisvandenbossche commented on pull request #9948: ARROW-12150: [Python] Correctly infer type of mixed-precision Decimals

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #9948: URL: https://github.com/apache/arrow/pull/9948#issuecomment-858472915 And thanks for the simplification! That's better ;)

[GitHub] [arrow] xhochy commented on pull request #10499: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
xhochy commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858496056 @github-actions crossbow submit -g conda

[GitHub] [arrow] xhochy closed pull request #10499: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
xhochy closed pull request #10499: URL: https://github.com/apache/arrow/pull/10499

[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
rok commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858500316 > I opened https://issues.apache.org/jira/browse/ARROW-13033 for this. Nice, I was also writing it just now :) The strptime timezone ignoring issue is

[GitHub] [arrow] xhochy commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
xhochy commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858378151 @github-actions crossbow submit conda-osx-clang-py36-r40

[GitHub] [arrow-datafusion] edrevo commented on pull request #535: Make BallistaContext::collect streaming

2021-06-10 Thread GitBox
edrevo commented on pull request #535: URL: https://github.com/apache/arrow-datafusion/pull/535#issuecomment-858407024 cc @andygrove

[GitHub] [arrow] Crystrix commented on pull request #10486: ARROW-13016: [C++][Compute] Support Null type in Sum/Mean/MinMax aggregation

2021-06-10 Thread GitBox
Crystrix commented on pull request #10486: URL: https://github.com/apache/arrow/pull/10486#issuecomment-858454595 Maybe we can keep a consistent behavior with numeric types. For example, - Count: keep the current implementation (return 0 for `skip_nulls` option, otherwise return the

[GitHub] [arrow] pitrou commented on issue #10488: Passing back and forth from Python and C++ with Pyarrow C++ extension and pybind11.

2021-06-10 Thread GitBox
pitrou commented on issue #10488: URL: https://github.com/apache/arrow/issues/10488#issuecomment-858370128 Perhaps @maartenbreddels can help here as he wrote the original example.

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
jorisvandenbossche edited a comment on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858395074 I think it would be good to write some tests in python as well, as currently the C++ tests are very hard to verify since we don't yet have the ability to

[GitHub] [arrow] bhargav-inthezone opened a new issue #10502: AttributeError: module 'pyarrow.lib' has no attribute '_Weakrefable'

2021-06-10 Thread GitBox
bhargav-inthezone opened a new issue #10502: URL: https://github.com/apache/arrow/issues/10502 Hey! I'm working in Kaggle notebooks for a competition !pip install vaex ## successfully installed import vaex print(vaex.__version__)

[GitHub] [arrow-rs] garyanaplan commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-10 Thread GitBox
garyanaplan commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858440550 Yep. If I update my test to remove BOOLEAN from the schema, the problem goes away. I've done some digging around today and noticed that it looks like the problem might lie in the

[GitHub] [arrow] projjal opened a new pull request #10501: ARROW-13032: Update guava version

2021-06-10 Thread GitBox
projjal opened a new pull request #10501: URL: https://github.com/apache/arrow/pull/10501 Vulnerabilities in current version: [CVE-2018-10237](https://github.com/advisories/GHSA-mvr2-9pj6-7w5j) [CVE-2020-8908](https://github.com/advisories/GHSA-5mg8-w23w-74h3)

[GitHub] [arrow] github-actions[bot] commented on pull request #10499: WIP: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858459815 Revision: b44f49618af8512142ce2678a62a8608379c47bd Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] n3world opened a new pull request #10505: ARROW-12995: [C++] Add validation to CSV options

2021-06-10 Thread GitBox
n3world opened a new pull request #10505: URL: https://github.com/apache/arrow/pull/10505

[GitHub] [arrow-datafusion] alamb closed issue #529: With window frame present and frame type = RANGE, order by must be present with 1 column

2021-06-10 Thread GitBox
alamb closed issue #529: URL: https://github.com/apache/arrow-datafusion/issues/529

[GitHub] [arrow-datafusion] alamb merged pull request #530: add error handling and boundary checking for window frames

2021-06-10 Thread GitBox
alamb merged pull request #530: URL: https://github.com/apache/arrow-datafusion/pull/530

[GitHub] [arrow] rok commented on pull request #10457: ARROW-12980: [C++] Kernels to extract datetime components should be timezone aware

2021-06-10 Thread GitBox
rok commented on pull request #10457: URL: https://github.com/apache/arrow/pull/10457#issuecomment-858694911 @jorisvandenbossche added the [python tests](https://github.com/apache/arrow/pull/10457/commits/83b1e21a187c88d13e2b7a7589eac80f0a77a5b0). Will add some more for timezone naive and

[GitHub] [arrow-datafusion] andygrove commented on pull request #532: reuse datafusion physical planner in ballista building from protobuf

2021-06-10 Thread GitBox
andygrove commented on pull request #532: URL: https://github.com/apache/arrow-datafusion/pull/532#issuecomment-858751549 I will have time tomorrow to review this and other pending ballista PRs On Thu, Jun 10, 2021, 9:47 AM QP Hou ***@***.***> wrote: > ***@***. approved

[GitHub] [arrow-datafusion] jgoday commented on issue #420: Support for `!=` predicate in pruning predicates

2021-06-10 Thread GitBox
jgoday commented on issue #420: URL: https://github.com/apache/arrow-datafusion/issues/420#issuecomment-858776910 Can I try to solve this issue? If I understand it correctly, for the non-equal predicate the expression should be pruned if the literal value does not fall between the

[GitHub] [arrow] projjal commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-06-10 Thread GitBox
projjal commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-858352950 @kszucs Thanks for fixing this. Is there anything left to be done in this patch?

[GitHub] [arrow] maartenbreddels commented on issue #10488: Passing back and forth from Python and C++ with Pyarrow C++ extension and pybind11.

2021-06-10 Thread GitBox
maartenbreddels commented on issue #10488: URL: https://github.com/apache/arrow/issues/10488#issuecomment-858385035 It's been a while since I wrote the code in that repo, but it seems I only added Double support:

[GitHub] [arrow] pitrou closed pull request #9948: ARROW-12150: [Python] Correctly infer type of mixed-precision Decimals

2021-06-10 Thread GitBox
pitrou closed pull request #9948: URL: https://github.com/apache/arrow/pull/9948

[GitHub] [arrow] github-actions[bot] commented on pull request #10501: ARROW-13032: [Java] Update guava version

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10501: URL: https://github.com/apache/arrow/pull/10501#issuecomment-858459973 https://issues.apache.org/jira/browse/ARROW-13032

[GitHub] [arrow-datafusion] Jimexist opened a new pull request #539: refactor hash aggregates

2021-06-10 Thread GitBox
Jimexist opened a new pull request #539: URL: https://github.com/apache/arrow-datafusion/pull/539 # Which issue does this PR close? Closes #. # Rationale for this change a small refactor to hash aggregate planning phase # What changes are included in this PR?

[GitHub] [arrow] lidavidm commented on a change in pull request #10482: ARROW-12597: [C++] Enable per-row-group parallelism in async Parquet reader

2021-06-10 Thread GitBox
lidavidm commented on a change in pull request #10482: URL: https://github.com/apache/arrow/pull/10482#discussion_r649318943 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -1134,6 +1138,42 @@ Status FileReaderImpl::DecodeRowGroups(const std::vector& row_groups, return

[GitHub] [arrow] pitrou closed pull request #10504: ARROW-12937: [C++][Python] Allow setting default metadata for new S3 files

2021-06-10 Thread GitBox
pitrou closed pull request #10504: URL: https://github.com/apache/arrow/pull/10504

[GitHub] [arrow] lidavidm commented on a change in pull request #10482: ARROW-12597: [C++] Enable per-row-group parallelism in async Parquet reader

2021-06-10 Thread GitBox
lidavidm commented on a change in pull request #10482: URL: https://github.com/apache/arrow/pull/10482#discussion_r649348454 ## File path: cpp/src/arrow/util/parallel.h ## @@ -44,6 +45,25 @@ Status ParallelFor(int num_tasks, FUNCTION&& func, return st; } +template

[GitHub] [arrow] pitrou commented on pull request #10471: ARROW-12952: [C++] Add count_substring_regex

2021-06-10 Thread GitBox
pitrou commented on pull request #10471: URL: https://github.com/apache/arrow/pull/10471#issuecomment-858780343 Hmm, there are conflicts now...

[GitHub] [arrow] lidavidm commented on pull request #10471: ARROW-12952: [C++] Add count_substring_regex

2021-06-10 Thread GitBox
lidavidm commented on pull request #10471: URL: https://github.com/apache/arrow/pull/10471#issuecomment-858784119 It's because the tests overlap/I forgot to rename & move the test case for ascii_replace_slice (now done).

[GitHub] [arrow-datafusion] alamb opened a new pull request #538: Cleanup Repartition Exec code

2021-06-10 Thread GitBox
alamb opened a new pull request #538: URL: https://github.com/apache/arrow-datafusion/pull/538 # Rationale for this change The body of RepartitionExec::execute is long and highly indented, and has a bunch of metrics related code that obscures how it works, in my opinion. # What

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #521: Return errors properly from RepartitionExec

2021-06-10 Thread GitBox
alamb commented on a change in pull request #521: URL: https://github.com/apache/arrow-datafusion/pull/521#discussion_r649260106 ## File path: datafusion/src/physical_plan/repartition.rs ## @@ -308,6 +310,45 @@ impl RepartitionExec { send_time_nanos:

[GitHub] [arrow-datafusion] alamb commented on a change in pull request #538: Cleanup Repartition Exec code

2021-06-10 Thread GitBox
alamb commented on a change in pull request #538: URL: https://github.com/apache/arrow-datafusion/pull/538#discussion_r649259579 ## File path: datafusion/src/physical_plan/repartition.rs ## @@ -132,132 +160,33 @@ impl ExecutionPlan for RepartitionExec { //

[GitHub] [arrow-rs] codecov-commenter commented on pull request #444: Add changelog and bump version for proposed 4.3.0 release

2021-06-10 Thread GitBox
codecov-commenter commented on pull request #444: URL: https://github.com/apache/arrow-rs/pull/444#issuecomment-858736646 #

[GitHub] [arrow] pitrou closed pull request #10494: ARROW-12948: [C++][Python] Add slice_replace kernel

2021-06-10 Thread GitBox
pitrou closed pull request #10494: URL: https://github.com/apache/arrow/pull/10494

[GitHub] [arrow-datafusion] jgoday commented on issue #420: Support for `!=` predicate in pruning predicates

2021-06-10 Thread GitBox
jgoday commented on issue #420: URL: https://github.com/apache/arrow-datafusion/issues/420#issuecomment-85878 @alamb Can I try to solve this issue? If I understand it correctly, for the non-equal predicate the expression should be pruned if the literal value does not fall

[GitHub] [arrow-datafusion] alamb closed issue #528: With window frame present and frame type = RANGE, the current implementation cannot handle numeric bounds

2021-06-10 Thread GitBox
alamb closed issue #528: URL: https://github.com/apache/arrow-datafusion/issues/528

[GitHub] [arrow-rs] alamb opened a new pull request #444: Add changelog and bump version for proposed 4.3.0 release

2021-06-10 Thread GitBox
alamb opened a new pull request #444: URL: https://github.com/apache/arrow-rs/pull/444 re #292 Note: merging into `active_release` (not master) I would like to propose a 4.3.0 Arrow release. Here is the changelog: * a7656a8a3cd1f02e4543e1b971842ca92404f82a refactor

[GitHub] [arrow] github-actions[bot] commented on pull request #10504: ARROW-12937: [C++][Python] Allow setting default metadata for new S3 files

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10504: URL: https://github.com/apache/arrow/pull/10504#issuecomment-858667423 https://issues.apache.org/jira/browse/ARROW-12937

[GitHub] [arrow-datafusion] alamb merged pull request #527: remove redundant `into_iter()` calls

2021-06-10 Thread GitBox
alamb merged pull request #527: URL: https://github.com/apache/arrow-datafusion/pull/527

[GitHub] [arrow-datafusion] alamb commented on pull request #527: remove redundant `into_iter()` calls

2021-06-10 Thread GitBox
alamb commented on pull request #527: URL: https://github.com/apache/arrow-datafusion/pull/527#issuecomment-858696317 I am surprised Clippy didn't complain about this ;)

[GitHub] [arrow-rs] alamb merged pull request #441: Cherry pick add lexicographically partition points and ranges to active_release

2021-06-10 Thread GitBox
alamb merged pull request #441: URL: https://github.com/apache/arrow-rs/pull/441

[GitHub] [arrow] pitrou commented on a change in pull request #10471: ARROW-12952: [C++] Add count_substring_regex

2021-06-10 Thread GitBox
pitrou commented on a change in pull request #10471: URL: https://github.com/apache/arrow/pull/10471#discussion_r649293252 ## File path: cpp/src/arrow/compute/kernels/scalar_string.cc ## @@ -1001,21 +1058,51 @@ const FunctionDoc count_substring_doc( "Null inputs emit

[GitHub] [arrow] kszucs commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-06-10 Thread GitBox
kszucs commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-858748288 We could and probably should list the expected artifacts for the crossbow task so we see if anything is missing during the release process.

[GitHub] [arrow] kszucs commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-06-10 Thread GitBox
kszucs commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-858758216 @github-actions crossbow submit java-jars

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #520: WIP Impl window order by

2021-06-10 Thread GitBox
codecov-commenter commented on pull request #520: URL: https://github.com/apache/arrow-datafusion/pull/520#issuecomment-858784623 #

[GitHub] [arrow-rs] garyanaplan commented on pull request #443: improve BOOLEAN writing logic and report error on encoding fail

2021-06-10 Thread GitBox
garyanaplan commented on pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#issuecomment-858802185 Ok. I'm finished poking this now. I've isolated the changes required to 2 files and eliminated the original runtime impact from the PlainEncoder.

[GitHub] [arrow-datafusion] Jimexist commented on pull request #530: add error handling and boundary checking for window frames

2021-06-10 Thread GitBox
Jimexist commented on pull request #530: URL: https://github.com/apache/arrow-datafusion/pull/530#issuecomment-858712328 > Looks great @Jimexist -- thanks yeah window functions are _hard_

[GitHub] [arrow] github-actions[bot] commented on pull request #10505: ARROW-12995: [C++] Add validation to CSV options

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10505: URL: https://github.com/apache/arrow/pull/10505#issuecomment-858713053 https://issues.apache.org/jira/browse/ARROW-12995

[GitHub] [arrow-rs] alamb commented on a change in pull request #444: Add changelog and bump version for proposed 4.3.0 release

2021-06-10 Thread GitBox
alamb commented on a change in pull request #444: URL: https://github.com/apache/arrow-rs/pull/444#discussion_r649304786 ## File path: CHANGELOG.md ## @@ -18,3 +43,72 @@ For older versions, see [apache/arrow/CHANGELOG.md](https://github.com/apache/ar *

[GitHub] [arrow] lidavidm commented on a change in pull request #10482: ARROW-12597: [C++] Enable per-row-group parallelism in async Parquet reader

2021-06-10 Thread GitBox
lidavidm commented on a change in pull request #10482: URL: https://github.com/apache/arrow/pull/10482#discussion_r649320187 ## File path: cpp/src/parquet/arrow/reader.cc ## @@ -1024,31 +1027,32 @@ class RowGroupGenerator { ::arrow::internal::Executor* cpu_executor,

[GitHub] [arrow-datafusion] jgoday removed a comment on issue #420: Support for `!=` predicate in pruning predicates

2021-06-10 Thread GitBox
jgoday removed a comment on issue #420: URL: https://github.com/apache/arrow-datafusion/issues/420#issuecomment-858776910 Can I try to solve this issue? If I understand it correctly, for the non-equal predicate the expression should be pruned if the literal value does not fall

[GitHub] [arrow] github-actions[bot] commented on pull request #10411: ARROW-12801: [CI][Packaging][Java] Include all modules in script that generate Arrow jars

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10411: URL: https://github.com/apache/arrow/pull/10411#issuecomment-858780758 Revision: ccf7e36fd2c1f9eb30f5af634b1cdd979f569e70 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] cyb70289 commented on a change in pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-06-10 Thread GitBox
cyb70289 commented on a change in pull request #10364: URL: https://github.com/apache/arrow/pull/10364#discussion_r649056718 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -1148,5 +1148,326 @@ TYPED_TEST(TestUnaryArithmeticFloating, AbsoluteValue)

[GitHub] [arrow] github-actions[bot] commented on pull request #10499: ARROW-12738: [C++/Python/R] Update conda variant files

2021-06-10 Thread GitBox
github-actions[bot] commented on pull request #10499: URL: https://github.com/apache/arrow/pull/10499#issuecomment-858511027 Revision: b44f49618af8512142ce2678a62a8608379c47bd Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow-datafusion] codecov-commenter commented on pull request #536: use nightly nightly-2021-05-10

2021-06-10 Thread GitBox
codecov-commenter commented on pull request #536: URL: https://github.com/apache/arrow-datafusion/pull/536#issuecomment-858523152 #

[GitHub] [arrow-rs] garyanaplan opened a new pull request #443: improve BOOLEAN writing logic and report error on encoding fail

2021-06-10 Thread GitBox
garyanaplan opened a new pull request #443: URL: https://github.com/apache/arrow-rs/pull/443 # Which issue does this PR close? Closes #349. # Rationale for this change When writing BOOLEAN data, writing more than 2048 rows of data will overflow the hard-coded 256

[GitHub] [arrow-rs] garyanaplan commented on a change in pull request #443: improve BOOLEAN writing logic and report error on encoding fail

2021-06-10 Thread GitBox
garyanaplan commented on a change in pull request #443: URL: https://github.com/apache/arrow-rs/pull/443#discussion_r649180581 ## File path: parquet/src/encodings/encoding.rs ## @@ -153,7 +155,11 @@ impl Encoder for PlainEncoder { #[inline] fn put( self, values:

[GitHub] [arrow-rs] garyanaplan commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-10 Thread GitBox
garyanaplan commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858513251 More poking reveals that PlainEncoder has a bit_writer with a hard-coded size of 256 (big enough to hold 2048 bits...). `src/encodings/encoding.rs: line bit_writer:

[GitHub] [arrow-datafusion] Dandandan opened a new issue #537: Support min/max statistics in ParquetTable and ParquetExec

2021-06-10 Thread GitBox
Dandandan opened a new issue #537: URL: https://github.com/apache/arrow-datafusion/issues/537 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** Min/max column statistics support doing statistics based optimizations, for example

[GitHub] [arrow] E-HO commented on issue #10492: Doc update ? For Reading and Writing the Apache Parquet Format

2021-06-10 Thread GitBox
E-HO commented on issue #10492: URL: https://github.com/apache/arrow/issues/10492#issuecomment-858591234 Nice. Sorry about the legacy doc; it is a bit hard to discover that it is legacy. One suggestion: add something clearer, such as a "deprecated, see page XXX" flag? In the

[GitHub] [arrow] pitrou opened a new pull request #10503: ARROW-10115: [C++] Add CSV option to treat quoted strings as always non-null

2021-06-10 Thread GitBox
pitrou opened a new pull request #10503: URL: https://github.com/apache/arrow/pull/10503 The option is only applicable to string and binary columns. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go

[GitHub] [arrow] cyb70289 commented on a change in pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-06-10 Thread GitBox
cyb70289 commented on a change in pull request #10364: URL: https://github.com/apache/arrow/pull/10364#discussion_r649057551 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic_test.cc ## @@ -1148,5 +1148,326 @@ TYPED_TEST(TestUnaryArithmeticFloating, AbsoluteValue)

[GitHub] [arrow-rs] alamb commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-10 Thread GitBox
alamb commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858589145 @garyanaplan -- I think the best way to get feedback on the approach would be to open a pull request -- This is an automated message from the Apache Git Service. To respond to the

[GitHub] [arrow-rs] garyanaplan commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-10 Thread GitBox
garyanaplan commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858592224 Yeah. I'm not really happy with it, because I don't love the special handling for Booleans via the BitWriter. Just growing the buffer indefinitely seems "wrong", but I think any

[GitHub] [arrow] pitrou commented on pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-06-10 Thread GitBox
pitrou commented on pull request #9024: URL: https://github.com/apache/arrow/pull/9024#issuecomment-858609702 Closing as superseded by #10410. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to

[GitHub] [arrow] pitrou closed pull request #9024: ARROW-11044: [C++] Add "replace" kernel

2021-06-10 Thread GitBox
pitrou closed pull request #9024: URL: https://github.com/apache/arrow/pull/9024 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [arrow] cyb70289 commented on a change in pull request #10364: ARROW-12074: [C++][Compute] Add scalar arithmetic kernels for decimal

2021-06-10 Thread GitBox
cyb70289 commented on a change in pull request #10364: URL: https://github.com/apache/arrow/pull/10364#discussion_r649053658 ## File path: cpp/src/arrow/compute/kernels/scalar_arithmetic.cc ## @@ -428,12 +524,69 @@ ArrayKernelExec ArithmeticExecFromOp(detail::GetTypeId

[GitHub] [arrow] jorisvandenbossche edited a comment on pull request #10486: ARROW-13016: [C++][Compute] Support Null type in Sum/Mean/MinMax aggregation

2021-06-10 Thread GitBox
jorisvandenbossche edited a comment on pull request #10486: URL: https://github.com/apache/arrow/pull/10486#issuecomment-858533111 `Count` is indeed certainly not a problem for "null" type (since it is only counting the values, not calculating anything with them). And also for `MinMax` it

[GitHub] [arrow] jorisvandenbossche commented on pull request #10486: ARROW-13016: [C++][Compute] Support Null type in Sum/Mean/MinMax aggregation

2021-06-10 Thread GitBox
jorisvandenbossche commented on pull request #10486: URL: https://github.com/apache/arrow/pull/10486#issuecomment-858533111 `Count` is indeed certainly not a problem for "null" type (since it is only counting the values, not calculating anything with them). And also for `MinMax` it seems

[GitHub] [arrow] jorisvandenbossche commented on issue #10492: Doc update ? For Reading and Writing the Apache Parquet Format

2021-06-10 Thread GitBox
jorisvandenbossche commented on issue #10492: URL: https://github.com/apache/arrow/issues/10492#issuecomment-858572027 > The new API will accept a URL as a path although it currently only has first-class support for S3 and HDFS Small correction: it will accept a URI, but not a URL

[GitHub] [arrow-rs] alamb merged pull request #442: Cherry pick refactor lexico sort for future code reuse to active_release

2021-06-10 Thread GitBox
alamb merged pull request #442: URL: https://github.com/apache/arrow-rs/pull/442 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please

[GitHub] [arrow] pitrou opened a new pull request #10504: ARROW-12937: [C++][Python] Allow setting default metadata for new S3 files

2021-06-10 Thread GitBox
pitrou opened a new pull request #10504: URL: https://github.com/apache/arrow/pull/10504 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[GitHub] [arrow] pitrou commented on a change in pull request #10494: ARROW-12948: [C++][Python] Add slice_replace kernel

2021-06-10 Thread GitBox
pitrou commented on a change in pull request #10494: URL: https://github.com/apache/arrow/pull/10494#discussion_r649217125 ## File path: docs/source/cpp/compute.rst ## @@ -451,29 +451,33 @@ The third set of functions examines string elements on a byte-per-byte basis: String

[GitHub] [arrow-rs] garyanaplan commented on issue #349: parquet reading hangs when row_group contains more than 2048 rows of data

2021-06-10 Thread GitBox
garyanaplan commented on issue #349: URL: https://github.com/apache/arrow-rs/issues/349#issuecomment-858568206 Looks like that hard-coded value (256) in the bit-writer is the root cause. When writing, if we try to put > 2048 boolean values, then the writer just "ignores" the writes. This
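
The failure mode described in this comment can be illustrated with a small model. The sketch below is not the arrow-rs code: it is a minimal Python stand-in, assuming a bit writer backed by a fixed 256-byte buffer (2048 bits) whose put operation reports failure through a return value. The names `FixedBitWriter`, `put_bool`, `write_ignoring_result` and `write_checked` are hypothetical and chosen only for illustration, as is the least-significant-bit-first packing.

```python
class FixedBitWriter:
    """Toy bit writer with a fixed byte capacity (hypothetical, not the arrow-rs type)."""

    def __init__(self, capacity_bytes: int = 256) -> None:
        # 256 bytes hold 2048 bits, matching the figures quoted in the issue.
        self.buffer = bytearray(capacity_bytes)
        self.bit_len = 0  # number of bits written so far

    def put_bool(self, value: bool) -> bool:
        """Append one bit; return False and write nothing once the buffer is full."""
        if self.bit_len >= len(self.buffer) * 8:
            return False
        if value:
            # Assumed packing order: least significant bit first within each byte.
            self.buffer[self.bit_len // 8] |= 1 << (self.bit_len % 8)
        self.bit_len += 1
        return True


def write_ignoring_result(values):
    """Mimic a caller that drops the return value, so overflow goes unnoticed."""
    writer = FixedBitWriter()
    for v in values:
        writer.put_bool(v)  # result discarded: extra values are silently lost
    return writer.bit_len


def write_checked(values):
    """Same loop, but surface the failure instead of silently truncating."""
    writer = FixedBitWriter()
    for i, v in enumerate(values):
        if not writer.put_bool(v):
            raise ValueError(f"bit writer full after {i} values")
    return writer.bit_len
```

In this model, `write_ignoring_result` stores only the first 2048 of any longer input without complaint, while `write_checked` raises as soon as the buffer is exhausted. Whether the real fix should grow the buffer or report an error is the design question discussed in the thread.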

[GitHub] [arrow-datafusion] alamb closed issue #437: RepartitionExec produces no output if the input stream errors

2021-06-10 Thread GitBox
alamb closed issue #437: URL: https://github.com/apache/arrow-datafusion/issues/437 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,

[GitHub] [arrow-datafusion] alamb merged pull request #521: Return errors properly from RepartitionExec

2021-06-10 Thread GitBox
alamb merged pull request #521: URL: https://github.com/apache/arrow-datafusion/pull/521 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service,
