[GitHub] [arrow] mapleFU commented on pull request #33776: GH-15164: [C++][Parquet] BloomFilter fixing standard broken

2023-01-18 Thread GitBox
mapleFU commented on PR #33776: URL: https://github.com/apache/arrow/pull/33776#issuecomment-1396555264 I generate a test file, and update it here: https://github.com/apache/parquet-testing/pull/34 You can merge that. And after that, we can run testing here. -- This is an

[GitHub] [arrow-rs] Sach1nAgarwal commented on issue #53: [Parquet] Reading parquet file into an ndarray

2023-01-18 Thread GitBox
Sach1nAgarwal commented on issue #53: URL: https://github.com/apache/arrow-rs/issues/53#issuecomment-1396513788 Reading columns in parallel improves performance. I checked by creating a `ParquetRecordBatchStream` for each column, with all the `ParquetRecordBatchStream`s reading in parallel,

[GitHub] [arrow] ursabot commented on pull request #33728: GH-33726: [CI][Go] Set host name in Go benchmarks

2023-01-18 Thread GitBox
ursabot commented on PR #33728: URL: https://github.com/apache/arrow/pull/33728#issuecomment-1396552408 Benchmark runs are scheduled for baseline = 7319250597b0f4e3b5f859eb073264ce3c72a1bd and contender = 705e04bb15f481e476c9e7a8e2ac92460890ad0c. 705e04bb15f481e476c9e7a8e2ac92460890ad0c

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4767: Move subquery alias assignment onto rules

2023-01-18 Thread GitBox
mingmwang commented on PR #4767: URL: https://github.com/apache/arrow-datafusion/pull/4767#issuecomment-1396574575 One question regarding the Subquery Alias generation logic: Why does the `InSubquery` generate a Subquery Alias, but the `Exists` Subquery does not?

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4969: refactor: display input partitions for `RepartitionExec`

2023-01-18 Thread GitBox
mingmwang commented on PR #4969: URL: https://github.com/apache/arrow-datafusion/pull/4969#issuecomment-1396543284 LGTM -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To

[GitHub] [arrow] Treize44 commented on issue #33759: [Python][C++] How to limit the memory consumption of to_batches()

2023-01-18 Thread GitBox
Treize44 commented on issue #33759: URL: https://github.com/apache/arrow/issues/33759#issuecomment-1396539880 I use pyarrow 10.0.1

[GitHub] [arrow-datafusion] mustafasrepo commented on a diff in pull request #4924: Unify Row hash and hash implementation

2023-01-18 Thread GitBox
mustafasrepo commented on code in PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924#discussion_r1073201772 ## datafusion/physical-expr/src/aggregate/count.rs: ## @@ -202,7 +202,9 @@ impl RowAccumulator for CountRowAccumulator { } fn evaluate(,

[GitHub] [arrow-datafusion] ursabot commented on pull request #4953: Minor: Make messages consistent for LogicalPlan::Dml

2023-01-18 Thread GitBox
ursabot commented on PR #4953: URL: https://github.com/apache/arrow-datafusion/pull/4953#issuecomment-1386635748 Benchmark runs are scheduled for baseline = 4623166c50bfb6cdc53cb9d8af5ae68efb1d36a4 and contender = 906896b7c59ff14d71b3056ec4349274cf6662af.

[GitHub] [arrow] raulcd merged pull request #33615: GH-14997: [Release] Ensure archery release tasks works with both new style GitHub issues and old style JIRA issues

2023-01-18 Thread GitBox
raulcd merged PR #33615: URL: https://github.com/apache/arrow/pull/33615

[GitHub] [arrow-datafusion] viirya opened a new pull request, #4963: Replace count with is_empty in Sum

2023-01-18 Thread GitBox
viirya opened a new pull request, #4963: URL: https://github.com/apache/arrow-datafusion/pull/4963 # Which issue does this PR close? Closes #4962. # Rationale for this change # What changes are included in this PR? # Are these changes

[GitHub] [arrow-datafusion] viirya opened a new issue, #4962: Replace count with is_empty in Sum

2023-01-18 Thread GitBox
viirya opened a new issue, #4962: URL: https://github.com/apache/arrow-datafusion/issues/4962 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** In `Sum` aggregation expression, there are two state fields `sum` (int64) and

[GitHub] [arrow-rs] Sach1nAgarwal opened a new pull request, #3550: parquet:: avoid reading extra 8 bytes

2023-01-18 Thread GitBox
Sach1nAgarwal opened a new pull request, #3550: URL: https://github.com/apache/arrow-rs/pull/3550 Effect in performance increase

[GitHub] [arrow-datafusion] dependabot[bot] opened a new pull request, #4961: Update substrait requirement from 0.3 to 0.4

2023-01-18 Thread GitBox
dependabot[bot] opened a new pull request, #4961: URL: https://github.com/apache/arrow-datafusion/pull/4961 Updates the requirements on [substrait](https://github.com/substrait-io/substrait-rs) to permit the latest version. Release notes Sourced from

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4923: Support filter pushdown for semi/anti join

2023-01-18 Thread GitBox
mingmwang commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1386657477 Except for the method name `on_lr_is_preserved`, this PR LGTM.

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4944: Only add outer filter once when transforming exists/in subquery to join

2023-01-18 Thread GitBox
mingmwang commented on PR #4944: URL: https://github.com/apache/arrow-datafusion/pull/4944#issuecomment-1386709025 > cc @mingmwang, this may be related to your work #4366. This PR didn't address #4366. I know where the issue is. In this PR, you can just focus on fixing the

[GitHub] [arrow-datafusion] crepererum opened a new issue, #4964: `RepartitionExec`: print number of input partitions in text representation

2023-01-18 Thread GitBox
crepererum opened a new issue, #4964: URL: https://github.com/apache/arrow-datafusion/issues/4964 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** With the recent changes in query planning, `RepartitionExec` nodes are inserted a

[GitHub] [arrow] wgtmac opened a new pull request, #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
wgtmac opened a new pull request, #33739: URL: https://github.com/apache/arrow/pull/33739 ### Rationale for this change This [commit](https://github.com/apache/arrow/commit/c8d6110a26c41966e539e9fa2f5cb8c31dc2f0fe) implements parallel column writing in the parquet writer. However,

[GitHub] [arrow] github-actions[bot] commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386699824 * Closes: #33655

[GitHub] [arrow] AlenkaF commented on issue #33727: pandas string[pyarrow] -> category -> to_parquet fails

2023-01-18 Thread GitBox
AlenkaF commented on issue #33727: URL: https://github.com/apache/arrow/issues/33727#issuecomment-1386699394 Thank you for reporting @crusaderky! It seems `array()` method can't handle categorical pandas columns if the dictionary is `string` type. The error is triggered in

[GitHub] [arrow] wgtmac commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
wgtmac commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386700978 @cyb70289 @westonpace @lidavidm Could you please take a look?

[GitHub] [arrow-rs] Frankonly opened a new issue, #3549: Support customer metadata reading & writing in IPC file's footter

2023-01-18 Thread GitBox
Frankonly opened a new issue, #3549: URL: https://github.com/apache/arrow-rs/issues/3549 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** The IPC file footer supports custom metadata, but arrow-rs's IPC file writer and reader

[GitHub] [arrow-datafusion] dependabot[bot] opened a new pull request, #4960: Update pyo3 requirement from 0.17.1 to 0.18.0

2023-01-18 Thread GitBox
dependabot[bot] opened a new pull request, #4960: URL: https://github.com/apache/arrow-datafusion/pull/4960 Updates the requirements on [pyo3](https://github.com/pyo3/pyo3) to permit the latest version. Release notes Sourced from https://github.com/pyo3/pyo3/releases pyo3's

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4944: Only add outer filter once when transforming exists/in subquery to join

2023-01-18 Thread GitBox
mingmwang commented on PR #4944: URL: https://github.com/apache/arrow-datafusion/pull/4944#issuecomment-1386667967 Looks like the method's comment is out of date. Could you please also fix them in this PR ? ```rust /// # Arguments /// /// * subqry - The subquery portion of

[GitHub] [arrow-datafusion] oersted opened a new issue, #4965: Read DataFrame from URL

2023-01-18 Thread GitBox
oersted opened a new issue, #4965: URL: https://github.com/apache/arrow-datafusion/issues/4965 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** It is a common use-case to dynamically download datasets (usually CSV or JSON)

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4923: Support filter pushdown for semi/anti join

2023-01-18 Thread GitBox
mingmwang commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1386652044 @alamb @ygf11 Just sharing SparkSQL's result with you: ```sql explain extended SELECT t1_id, t1_name FROM t1 LEFT SEMI JOIN t2 ON (t1_id = t2_id and t2_id >=

[GitHub] [arrow] wgtmac commented on a diff in pull request #33736: PARQUET-2232: [C++] Add an api to ColumnChunkMetaData to indicate if the column chunk uses a bloom filter

2023-01-18 Thread GitBox
wgtmac commented on code in PR #33736: URL: https://github.com/apache/arrow/pull/33736#discussion_r1073268411 ## cpp/src/parquet/metadata.h: ## @@ -171,6 +171,7 @@ class PARQUET_EXPORT ColumnChunkMetaData { const std::vector& encodings() const; const std::vector&

[GitHub] [arrow] cyb70289 commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
cyb70289 commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386748428 Do you have error logs?

[GitHub] [arrow-datafusion] ygf11 commented on pull request #4923: Support filter pushdown for semi/anti join

2023-01-18 Thread GitBox
ygf11 commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1386797496 @alamb I bet you were misled by the title of this PR. This PR is to support pushdown for filters in join, not filter before join, and pushdown filter before join has already

[GitHub] [arrow] pitrou commented on issue #33740: [C++] Flight build error with conda packages (requiring static linking)

2023-01-18 Thread GitBox
pitrou commented on issue #33740: URL: https://github.com/apache/arrow/issues/33740#issuecomment-1386804425 Here is my overall command line: ```bash cmake ${SRC_DIR} -G Ninja \ -DCMAKE_BUILD_TYPE=Debug \ -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \

[GitHub] [arrow] pitrou commented on issue #33740: [C++] Flight build error with conda packages (requiring static linking)

2023-01-18 Thread GitBox
pitrou commented on issue #33740: URL: https://github.com/apache/arrow/issues/33740#issuecomment-1386803793 cc @kou @lidavidm

[GitHub] [arrow] zhztheplayer opened a new pull request, #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
zhztheplayer opened a new pull request, #33744: URL: https://github.com/apache/arrow/pull/33744 ### Rationale for this change See https://github.com/apache/arrow/issues/33743 ### What changes are included in this PR? 1. Close outstanding buffer(ledger)s during

[GitHub] [arrow] github-actions[bot] commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1386829877 * Closes: #33743

[GitHub] [arrow] github-actions[bot] commented on pull request #33744: GH-33743: [Java] Release outstanding buffers when BaseAllocator is being closed

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33744: URL: https://github.com/apache/arrow/pull/33744#issuecomment-1386829954 :warning: GitHub issue #33743 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] github-actions[bot] commented on pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33748: URL: https://github.com/apache/arrow/pull/33748#issuecomment-1386905123 * Closes: #33746

[GitHub] [arrow] github-actions[bot] commented on pull request #33748: GH-33746: [R] Update NEWS.md for 11.0.0

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33748: URL: https://github.com/apache/arrow/pull/33748#issuecomment-1386905170 :warning: GitHub issue #33746 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow-rs] tustvold merged pull request #3544: Update pyarrow method call to avoid warning

2023-01-18 Thread GitBox
tustvold merged PR #3544: URL: https://github.com/apache/arrow-rs/pull/3544

[GitHub] [arrow-rs] tustvold closed issue #3543: Meet warning when use pyarrow

2023-01-18 Thread GitBox
tustvold closed issue #3543: Meet warning when use pyarrow URL: https://github.com/apache/arrow-rs/issues/3543

[GitHub] [arrow-rs] tustvold merged pull request #3546: Improve concat kernel capacity estimation

2023-01-18 Thread GitBox
tustvold merged PR #3546: URL: https://github.com/apache/arrow-rs/pull/3546

[GitHub] [arrow] wgtmac commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
wgtmac commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1387013313 > Revision: [5d902c4](https://github.com/apache/arrow/commit/5d902c4a8ccc8381522f91a8f1a1bae41f316977) > > Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] pitrou commented on issue #33740: [C++] Flight build error with conda packages (requiring static linking)

2023-01-18 Thread GitBox
pitrou commented on issue #33740: URL: https://github.com/apache/arrow/issues/33740#issuecomment-1386805920 Sorry, was an error on my part!

[GitHub] [arrow-datafusion] mingmwang commented on pull request #4923: Support join-filter pushdown for semi/anti join

2023-01-18 Thread GitBox
mingmwang commented on PR #4923: URL: https://github.com/apache/arrow-datafusion/pull/4923#issuecomment-1386818233 @ygf11 BTW, just let you know that SparkSQL also had some issues within the InferFilters and Filter push down logic, the LeftSemi/LeftAnti/IN_EXISTS Subquery push down

[GitHub] [arrow] jorisvandenbossche commented on issue #33729: [Python] Support Python enums in pyarrow

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #33729: URL: https://github.com/apache/arrow/issues/33729#issuecomment-1386840722 In general we don't support converting "random" python objects that don't have a direct mapping to one of the arrow types (so some exceptions from the stdlib are Decimals

[GitHub] [arrow-rs] tustvold merged pull request #3550: parquet:: avoid reading extra 8 bytes

2023-01-18 Thread GitBox
tustvold merged PR #3550: URL: https://github.com/apache/arrow-rs/pull/3550

[GitHub] [arrow] jorisvandenbossche commented on issue #27052: [C++][Parquet] Inconsistent batch_size usage in parquet GetRecordBatchReader

2023-01-18 Thread GitBox
jorisvandenbossche commented on issue #27052: URL: https://github.com/apache/arrow/issues/27052#issuecomment-1387011875 I don't think anyone is actively planning to work on this, but contributions to fix this would be very welcome.

[GitHub] [arrow] cyb70289 commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
cyb70289 commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386757027 @github-actions crossbow submit -g cpp

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3550: parquet:: avoid reading extra 8 bytes

2023-01-18 Thread GitBox
tustvold commented on code in PR #3550: URL: https://github.com/apache/arrow-rs/pull/3550#discussion_r1073406262 ## parquet/src/arrow/async_reader/mod.rs: ## @@ -199,10 +199,11 @@ impl AsyncFileReader for T { self.seek(SeekFrom::End(-FOOTER_SIZE_I64 - metadata_len
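
For readers following the seek arithmetic in the diff above: a Parquet file ends with an 8-byte footer, made of a 4-byte little-endian metadata length followed by the 4-byte magic `PAR1`. The sketch below shows how that footer could be decoded once and then reused to locate the metadata, so the last 8 bytes need not be read a second time. It is an illustration only, not the arrow-rs implementation; `decode_footer` and `metadata_range` are hypothetical names, and encrypted files (which use a different magic) are not handled.

```python
import struct

FOOTER_SIZE = 8   # 4-byte metadata length + 4-byte magic
MAGIC = b"PAR1"   # magic for unencrypted Parquet files


def decode_footer(footer: bytes) -> int:
    """Return the metadata length stored in the 8-byte footer.

    Hypothetical helper for illustration; not part of any library.
    """
    if len(footer) != FOOTER_SIZE:
        raise ValueError("footer must be exactly 8 bytes")
    if footer[4:] != MAGIC:
        raise ValueError("missing PAR1 magic; not a Parquet footer")
    (metadata_len,) = struct.unpack("<I", footer[:4])
    return metadata_len


def metadata_range(file_size: int, footer: bytes):
    """Return (offset, length) of the metadata block.

    The metadata sits immediately before the footer, so a reader can
    fetch just this range instead of re-reading the trailing 8 bytes.
    """
    metadata_len = decode_footer(footer)
    start = file_size - FOOTER_SIZE - metadata_len
    if start < 0:
        raise ValueError("metadata length exceeds file size")
    return start, metadata_len
```

A reader that has already fetched the footer can call `metadata_range` and issue one ranged read of exactly `length` bytes starting at `offset`.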

[GitHub] [arrow-rs] amartins23 opened a new issue, #3551: [FlightSQL] Allow access to underlying FlightClient

2023-01-18 Thread GitBox
amartins23 opened a new issue, #3551: URL: https://github.com/apache/arrow-rs/issues/3551 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** A FlightSQL server may implement custom actions beyond what is defined by the FlightSQL

[GitHub] [arrow] raulcd commented on a diff in pull request #33660: GH-33659: [Developer Tools] Add definition of Breaking Change and Critical Fix

2023-01-18 Thread GitBox
raulcd commented on code in PR #33660: URL: https://github.com/apache/arrow/pull/33660#discussion_r1073389579 ## .github/pull_request_template.md: ## @@ -57,5 +57,11 @@ If there are user-facing changes then we may require documentation to be updated --> \ No newline at

[GitHub] [arrow-datafusion] alamb opened a new issue, #4967: Sometimes Filters are not repartitioned when they could be

2023-01-18 Thread GitBox
alamb opened a new issue, #4967: URL: https://github.com/apache/arrow-datafusion/issues/4967 **Describe the bug** We previously had a plan like this (where the RepartitionExec was added prior to a filter in order to increase parallelism). However, after upgrading

[GitHub] [arrow] abandy commented on pull request #14561: ARROW-18234: [Swift] Initial Arrow implementation

2023-01-18 Thread GitBox
abandy commented on PR #14561: URL: https://github.com/apache/arrow/pull/14561#issuecomment-1387048498 Awesome, will do, thank you!

[GitHub] [arrow] AlenkaF commented on a diff in pull request #33698: GH-31507: [Python] Address docstrings in Streams and File Access (Stream Classes)

2023-01-18 Thread GitBox
AlenkaF commented on code in PR #33698: URL: https://github.com/apache/arrow/pull/33698#discussion_r1073308531 ## python/pyarrow/io.pxi: ## @@ -1459,23 +1484,21 @@ cdef class BufferReader(NativeFile): Examples +Create an Arrow input stream and inspect

[GitHub] [arrow-rs] tustvold commented on issue #3485: Prevent Building Release On X86 without SSE3

2023-01-18 Thread GitBox
tustvold commented on issue #3485: URL: https://github.com/apache/arrow-rs/issues/3485#issuecomment-1386832388 Sadly there does not appear to be a mechanism to unconditionally emit a warning - https://github.com/rust-lang/cargo/issues/3777

[GitHub] [arrow-datafusion] alamb opened a new issue, #4968: Don't repartition ProjectionExec when it does not compute anything

2023-01-18 Thread GitBox
alamb opened a new issue, #4968: URL: https://github.com/apache/arrow-datafusion/issues/4968 **Is your feature request related to a problem or challenge? Please describe what you are trying to do.** `ProjectionExec` can either have computations like (`col1` + `col2`) or it can be used

[GitHub] [arrow-rs] ursabot commented on pull request #3550: Parquet Avoid Reading 8 Byte Footer Twice from AsyncRead

2023-01-18 Thread GitBox
ursabot commented on PR #3550: URL: https://github.com/apache/arrow-rs/pull/3550#issuecomment-1387018399 Benchmark runs are scheduled for baseline = 14545a42ec09782ec0371c05c01d112e0ca37604 and contender = 96831de828bcca5e6240c4d5dd5ddb1a1ea778e9. 96831de828bcca5e6240c4d5dd5ddb1a1ea778e9

[GitHub] [arrow-rs] ursabot commented on pull request #3546: Improve concat kernel capacity estimation

2023-01-18 Thread GitBox
ursabot commented on PR #3546: URL: https://github.com/apache/arrow-rs/pull/3546#issuecomment-1387018432 Benchmark runs are scheduled for baseline = 96831de828bcca5e6240c4d5dd5ddb1a1ea778e9 and contender = 56dfad0b2a03bc14f398a2998a68da2bc02fb7d2. 56dfad0b2a03bc14f398a2998a68da2bc02fb7d2

[GitHub] [arrow-rs] ursabot commented on pull request #3544: Update pyarrow method call to avoid warning

2023-01-18 Thread GitBox
ursabot commented on PR #3544: URL: https://github.com/apache/arrow-rs/pull/3544#issuecomment-1387018451 Benchmark runs are scheduled for baseline = 56dfad0b2a03bc14f398a2998a68da2bc02fb7d2 and contender = 40837a87c6a7ae177298fe3fcc0e83aaf678640e. 40837a87c6a7ae177298fe3fcc0e83aaf678640e

[GitHub] [arrow] kou opened a new pull request, #33751: WIP: [Release] Verify release-11.0.0-rc0

2023-01-18 Thread GitBox
kou opened a new pull request, #33751: URL: https://github.com/apache/arrow/pull/33751 PR to verify Release Candidate

[GitHub] [arrow] kou commented on pull request #33751: WIP: [Release] Verify release-11.0.0-rc0

2023-01-18 Thread GitBox
kou commented on PR #33751: URL: https://github.com/apache/arrow/pull/33751#issuecomment-1387049710 @github-actions crossbow submit --group verify-rc-source --param release=11.0.0 --param rc=0

[GitHub] [arrow] wgtmac commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
wgtmac commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386752574 > Do you have error logs? I ran TSAN and found the following data race, though it didn't report the root cause. ``` Data race (pid=96483) Read of size 2 at

[GitHub] [arrow-datafusion] mbrobbel opened a new pull request, #4966: Upgrade to Substrait 0.4.0

2023-01-18 Thread GitBox
mbrobbel opened a new pull request, #4966: URL: https://github.com/apache/arrow-datafusion/pull/4966 # Which issue does this PR close? Closes #4950. # Rationale for this change Bump to latest `substrait` crate release. # What changes are included in this PR?

[GitHub] [arrow] lukester1975 commented on issue #15139: [C++] arrow.pc is missing dependencies with Windows static builds

2023-01-18 Thread GitBox
lukester1975 commented on issue #15139: URL: https://github.com/apache/arrow/issues/15139#issuecomment-1386831952 OK. FWIW, I added this patch to vcpkg (along with https://github.com/microsoft/vcpkg/pull/23898/) and tried an install of arrow. It generates: `Libs: "-L${libdir}"

[GitHub] [arrow-datafusion] alamb commented on issue #4943: `EnforceSorting` resorts the inout of UnionExec unnecessarily

2023-01-18 Thread GitBox
alamb commented on issue #4943: URL: https://github.com/apache/arrow-datafusion/issues/4943#issuecomment-1386852191 Yes I am quite pleased with how sophisticated the sorting based optimizations are becoming

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3546: Improve concat kernel capacity estimation

2023-01-18 Thread GitBox
tustvold commented on code in PR #3546: URL: https://github.com/apache/arrow-rs/pull/3546#discussion_r1073407198 ## arrow-select/src/concat.rs: ## @@ -30,24 +30,27 @@ //! assert_eq!(arr.len(), 3); //! ``` +use arrow_array::types::*; use arrow_array::*; +use

[GitHub] [arrow-datafusion] crepererum opened a new pull request, #4969: refactor: display input partitions for `RepartitionExec`

2023-01-18 Thread GitBox
crepererum opened a new pull request, #4969: URL: https://github.com/apache/arrow-datafusion/pull/4969 # Which issue does this PR close? Closes #4964. # Rationale for this change With the recent changes in query planning, `RepartitionExec` nodes are inserted a bit more "black

[GitHub] [arrow-datafusion] mingmwang commented on issue #4943: `EnforceSorting` resorts the inout of UnionExec unnecessarily

2023-01-18 Thread GitBox
mingmwang commented on issue #4943: URL: https://github.com/apache/arrow-datafusion/issues/4943#issuecomment-1386770419 Nice !!

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox
jorisvandenbossche commented on code in PR #19706: URL: https://github.com/apache/arrow/pull/19706#discussion_r1073352175 ## r/R/expression.R: ## @@ -89,6 +92,56 @@ Expression$create <- function(function_name, expr } + +#' @export +`[[.Expression` <- function(x, i, ...) {

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox
jorisvandenbossche commented on code in PR #19706: URL: https://github.com/apache/arrow/pull/19706#discussion_r1073377526 ## r/R/expression.R: ## @@ -89,6 +92,56 @@ Expression$create <- function(function_name, expr } + +#' @export +`[[.Expression` <- function(x, i, ...) {

[GitHub] [arrow-rs] tustvold commented on a diff in pull request #3544: Update pyarrow method call to avoid warning

2023-01-18 Thread GitBox
tustvold commented on code in PR #3544: URL: https://github.com/apache/arrow-rs/pull/3544#discussion_r1073422580 ## arrow/src/pyarrow.rs: ## @@ -196,7 +196,8 @@ impl PyArrowConvert for RecordBatch { let module = py.import("pyarrow")?; let class =

[GitHub] [arrow] github-actions[bot] commented on pull request #15083: GH-33566: [C++] Add support for nullary and n-ary aggregate functions

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #15083: URL: https://github.com/apache/arrow/pull/15083#issuecomment-1387016068 * Closes: #33566

[GitHub] [arrow] github-actions[bot] commented on pull request #15083: GH-33566: [C++] Add support for nullary and n-ary aggregate functions

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #15083: URL: https://github.com/apache/arrow/pull/15083#issuecomment-1387016107 :warning: GitHub issue #33566 **has been automatically assigned in GitHub** to PR creator.

[GitHub] [arrow] ursabot commented on pull request #15267: GH-15265: [Java] Publish SBOM artifacts

2023-01-18 Thread GitBox
ursabot commented on PR #15267: URL: https://github.com/apache/arrow/pull/15267#issuecomment-1386736968 Benchmark runs are scheduled for baseline = 641d1da6be74c7a498067920d22ef0ff7ece74c5 and contender = 5580f27f946f06fc54eb7728b673eb9b40958034. 5580f27f946f06fc54eb7728b673eb9b40958034

[GitHub] [arrow] github-actions[bot] commented on pull request #33739: GH-33655: [C++][Parquet] Fix occasional failure in TestArrowReadWrite.MultithreadedWrite

2023-01-18 Thread GitBox
github-actions[bot] commented on PR #33739: URL: https://github.com/apache/arrow/pull/33739#issuecomment-1386762036 Revision: 5d902c4a8ccc8381522f91a8f1a1bae41f316977 Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] pitrou commented on a diff in pull request #33641: GH-32104: [C++] Add support for Run-End encoded data to Arrow

2023-01-18 Thread GitBox
pitrou commented on code in PR #33641: URL: https://github.com/apache/arrow/pull/33641#discussion_r1073318297 ## cpp/src/arrow/CMakeLists.txt: ## @@ -140,13 +140,16 @@ set(ARROW_SRCS array/array_binary.cc array/array_decimal.cc array/array_dict.cc +

[GitHub] [arrow] jorisvandenbossche commented on a diff in pull request #19706: GH-18818: [R] Create a field ref to a field in a struct

2023-01-18 Thread GitBox
jorisvandenbossche commented on code in PR #19706: URL: https://github.com/apache/arrow/pull/19706#discussion_r1073391100 ## r/src/expression.cpp: ## @@ -46,13 +46,26 @@ std::shared_ptr compute___expr__call(std::string func_name, compute::call(std::move(func_name),

[GitHub] [arrow] ursabot commented on pull request #33609: GH-31506: [Python] Address docstrings in Streams and File Access (Factory Functions)

2023-01-18 Thread GitBox
ursabot commented on PR #33609: URL: https://github.com/apache/arrow/pull/33609#issuecomment-1386944490 Benchmark runs are scheduled for baseline = 5580f27f946f06fc54eb7728b673eb9b40958034 and contender = ef98a9774dbac7705df5369f86948c3af367e73f. ef98a9774dbac7705df5369f86948c3af367e73f

[GitHub] [arrow-datafusion] gruuya commented on a diff in pull request #4958: Expose `sql_to_statement` and `statement_to_plan` on `SessionState`

2023-01-18 Thread GitBox
gruuya commented on code in PR #4958: URL: https://github.com/apache/arrow-datafusion/pull/4958#discussion_r1073510915 ## datafusion/core/src/execution/context.rs: ## @@ -1729,6 +1741,15 @@ impl SessionState { query.statement_to_plan(statement) } +///

[GitHub] [arrow-julia] okartal commented on issue #186: Support Arrow Flight RPC

2023-01-18 Thread GitBox
okartal commented on issue #186: URL: https://github.com/apache/arrow-julia/issues/186#issuecomment-1387064826 Thanks @quinnj, are you also aware of these efforts: https://discourse.julialang.org/t/ann-protobuf-jl-1-0-0/85885 > **`Service`s and `RPC`s are not yet implemented**. We

[GitHub] [arrow] raulcd commented on pull request #33751: WIP: [Release] Verify release-11.0.0-rc0

2023-01-18 Thread GitBox
raulcd commented on PR #33751: URL: https://github.com/apache/arrow/pull/33751#issuecomment-1387082063 Revision: f10f5cfd1376fb0e602334588b3f3624d41dee7d Submitted crossbow builds: [ursacomputing/crossbow @

[GitHub] [arrow] vibhatha opened a new pull request, #33753: GH-30891: [C++] The C++ API for writing datasets could be improved

2023-01-18 Thread GitBox
vibhatha opened a new pull request, #33753: URL: https://github.com/apache/arrow/pull/33753 ### Rationale for this change This PR addresses the issues mentioned [here](https://github.com/apache/arrow/issues/30891#issue-1528196430) ### What changes are included in this PR?

[GitHub] [arrow-datafusion] andygrove closed issue #4950: Upgrade to substrait 0.4

2023-01-18 Thread GitBox
andygrove closed issue #4950: Upgrade to substrait 0.4 URL: https://github.com/apache/arrow-datafusion/issues/4950

[GitHub] [arrow-datafusion] andygrove merged pull request #4966: Upgrade to Substrait 0.4.0

2023-01-18 Thread GitBox
andygrove merged PR #4966: URL: https://github.com/apache/arrow-datafusion/pull/4966

[GitHub] [arrow] westonpace commented on a diff in pull request #13669: ARROW-16389: [C++][Acero] Add spilling for hash join

2023-01-18 Thread GitBox
westonpace commented on code in PR #13669: URL: https://github.com/apache/arrow/pull/13669#discussion_r1073590080 ## cpp/src/arrow/compute/exec/spilling_join.h: ## @@ -0,0 +1,129 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license

[GitHub] [arrow-datafusion] alamb commented on pull request #4924: Unify Row hash and hash implementation

2023-01-18 Thread GitBox
alamb commented on PR #4924: URL: https://github.com/apache/arrow-datafusion/pull/4924#issuecomment-1387223619 I changed the PR description to say "closes https://github.com/apache/arrow-datafusion/issues/2723; and filed https://github.com/apache/arrow-datafusion/issues/4973 to track the

[GitHub] [arrow-datafusion] alamb commented on issue #2723: Consolidate GroupByHash implementations `row_hash.rs` and `hash.rs` (remove duplication)

2023-01-18 Thread GitBox
alamb commented on issue #2723: URL: https://github.com/apache/arrow-datafusion/issues/2723#issuecomment-1387224800 I filed https://github.com/apache/arrow-datafusion/issues/4973 to track consolidating the aggregators

[GitHub] [arrow] zeroshade commented on a diff in pull request #14114: ARROW-17709: [Go] Array Builder for REE arrays

2023-01-18 Thread GitBox
zeroshade commented on code in PR #14114: URL: https://github.com/apache/arrow/pull/14114#discussion_r1073723887 ## go/arrow/array/encoded.go: ## @@ -156,3 +158,160 @@ func arrayRunEndEncodedApproxEqual(l, r *RunEndEncoded, opt equalOption) bool { } return true

[GitHub] [arrow] joosthooz commented on pull request #33738: GH-33737: [C++] simplify exec plan tracing

2023-01-18 Thread GitBox
joosthooz commented on PR #33738: URL: https://github.com/apache/arrow/pull/33738#issuecomment-1387311099 In the figure, why does `WaitForFinish(SinkNode:)` end earlier than the `ScalarAggregate`? Can we add a name (and maybe even an id number in case there are multiple) to the names so

[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4916: Improve documentation for ExprVisitor, port simple uses to new walking function

2023-01-18 Thread GitBox
jackwener commented on code in PR #4916: URL: https://github.com/apache/arrow-datafusion/pull/4916#discussion_r1073730679 ## datafusion/expr/src/utils.rs: ## @@ -83,20 +85,16 @@ pub fn grouping_set_to_exprlist(group_expr: &[Expr]) -> Result> { } } -/// Recursively walk

[GitHub] [arrow] raulcd commented on pull request #33755: GH-33754: [CI] Install brewfile dependencies for verification task jobs on M1

2023-01-18 Thread GitBox
raulcd commented on PR #33755: URL: https://github.com/apache/arrow/pull/33755#issuecomment-1387365352 @kou this seems to fix the issue. As this is only because of our M1's setup I don't think it requires a new RC for 11.0.0 to be created. We should probably back-port it to the maintenance

[GitHub] [arrow] icexelloss commented on a diff in pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-18 Thread GitBox
icexelloss commented on code in PR #33676: URL: https://github.com/apache/arrow/pull/33676#discussion_r1073794299 ## cpp/src/arrow/compute/exec/asof_join_node_test.cc: ## @@ -662,19 +662,19 @@ TRACED_TEST_P(AsofJoinBasicTest, TestBasic1, { runner(basic_test); })

[GitHub] [arrow] raulcd commented on issue #33754: [CI][C++] macOS arm64 verification tasks fail due to missing grpc++ headers

2023-01-18 Thread GitBox
raulcd commented on issue #33754: URL: https://github.com/apache/arrow/issues/33754#issuecomment-1387116718 I'll try a PR

[GitHub] [arrow-datafusion] jdye64 commented on a diff in pull request #4892: refactor and add simple function to deserialize and serialize proto b…

2023-01-18 Thread GitBox
jdye64 commented on code in PR #4892: URL: https://github.com/apache/arrow-datafusion/pull/4892#discussion_r1073568920 ## datafusion/substrait/src/serializer.rs: ## @@ -27,15 +27,20 @@ use std::fs::OpenOptions; use std::io::{Read, Write}; pub async fn serialize(sql: , ctx:

[GitHub] [arrow] kou merged pull request #33712: GH-15139: [C++] Improve bzip2 static library path detection for arrow.pc

2023-01-18 Thread GitBox
kou merged PR #33712: URL: https://github.com/apache/arrow/pull/33712

[GitHub] [arrow] kou commented on pull request #33712: GH-15139: [C++] Improve bzip2 static library path detection for arrow.pc

2023-01-18 Thread GitBox
kou commented on PR #33712: URL: https://github.com/apache/arrow/pull/33712#issuecomment-1387153508 +1

[GitHub] [arrow] lidavidm commented on a diff in pull request #14114: ARROW-17709: [Go] Array Builder for REE arrays

2023-01-18 Thread GitBox
lidavidm commented on code in PR #14114: URL: https://github.com/apache/arrow/pull/14114#discussion_r1073589805 ## go/arrow/array/encoded.go: ## @@ -156,3 +158,160 @@ func arrayRunEndEncodedApproxEqual(l, r *RunEndEncoded, opt equalOption) bool { } return true

[GitHub] [arrow-datafusion] jdye64 opened a new pull request, #4971: Add DataFusionError::Substrait variant to DataFusionError enum

2023-01-18 Thread GitBox
jdye64 opened a new pull request, #4971: URL: https://github.com/apache/arrow-datafusion/pull/4971 # Which issue does this PR close? Closes #4970 # Rationale for this change More clear error handling by providing a more specific DataFusionError enum for Substrait

[GitHub] [arrow-datafusion] jdye64 commented on pull request #4892: refactor and add simple function to deserialize and serialize proto b…

2023-01-18 Thread GitBox
jdye64 commented on PR #4892: URL: https://github.com/apache/arrow-datafusion/pull/4892#issuecomment-1387169323 [PR](https://github.com/apache/arrow-datafusion/pull/4971) for DataFusionError::Substrait variant for reference.

[GitHub] [arrow-datafusion] mustafasrepo commented on pull request #4945: Minor: Reduce even more redundancy creating window_agg in sort_enforcement tests

2023-01-18 Thread GitBox
mustafasrepo commented on PR #4945: URL: https://github.com/apache/arrow-datafusion/pull/4945#issuecomment-1387194572 LGTM!.

[GitHub] [arrow] rtpsw commented on pull request #33676: GH-33673: [C++] Standardize as-of-join convention for past and future tolerance

2023-01-18 Thread GitBox
rtpsw commented on PR #33676: URL: https://github.com/apache/arrow/pull/33676#issuecomment-1387193893 cc @icexelloss
