[GitHub] [arrow] kou commented on issue #15280: [C++] Move Acero out of libarrow
kou commented on issue #15280: URL: https://github.com/apache/arrow/issues/15280#issuecomment-1376855728

I prefer `libacero`, because Gandiva uses `libgandiva`, not `libarrow_gandiva`. ADBC also uses `libadbc_*`, not `libarrow_adbc_*`.
[GitHub] [arrow] kou commented on issue #15279: 10.0.1 Ruby gems are not released to RubyGems
kou commented on issue #15279: URL: https://github.com/apache/arrow/issues/15279#issuecomment-1376852622

Oh, I forgot to push them! I've pushed them. BTW, what is your platform? I want to see the full log of the problem situation. I think that red-arrow 10.0.0 requires Apache Arrow C GLib 10.0.0 or later. So red-arrow 10.0.0 should not try to install Apache Arrow C GLib 10.0.1 when Apache Arrow C GLib 10.0.1 is already installed.
[GitHub] [arrow-datafusion] ursabot commented on pull request #4863: Support for SQL Natural Join
ursabot commented on PR #4863: URL: https://github.com/apache/arrow-datafusion/pull/4863#issuecomment-1376808500

Benchmark runs are scheduled for baseline = 1eb46df49fcd2479235500810c0562b10da77c90 and contender = 556282a8b6da6cb7d41d8c311211ae49b7ed82a7. 556282a8b6da6cb7d41d8c311211ae49b7ed82a7 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4699ea4135f74164bb6e16b6a9875991...93a4e515b1044fa39afc952dbcac6259/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/cb2a0250846d4dedabcc14e082e9b60f...a56bac486f544116ae63385428b85d01/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e46e16de48b1454d8a23337944dbfc30...d3917c7d9226415fb513f6db83e315ca/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/baed42e3723c48c4afd60fcff464b6fe...33d9c9cf6ea24b03be7872af81148197/)

Buildkite builds:

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-datafusion] ursabot commented on pull request #4852: Covariance single row input & null skipping
ursabot commented on PR #4852: URL: https://github.com/apache/arrow-datafusion/pull/4852#issuecomment-1376808478

Benchmark runs are scheduled for baseline = 292eb954fc0bad3a1febc597233ba26cb60bda3e and contender = 1eb46df49fcd2479235500810c0562b10da77c90. 1eb46df49fcd2479235500810c0562b10da77c90 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/7fa5429ba5fc4323b33a306c3ea156f3...4699ea4135f74164bb6e16b6a9875991/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/337f5b7a8b8944bda0f0425ea232e627...cb2a0250846d4dedabcc14e082e9b60f/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2e4be53365ae4c42a27f8f84b96d17bf...e46e16de48b1454d8a23337944dbfc30/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/bad9c40adbb34927a3668ca46355549e...baed42e3723c48c4afd60fcff464b6fe/)

Buildkite builds:

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-datafusion] ygf11 commented on issue #4844: Incorrect results for join condition against current master branch
ygf11 commented on issue #4844: URL: https://github.com/apache/arrow-datafusion/issues/4844#issuecomment-1376804032

> I agree -- I have mentioned it on the mailing list vote thread. @ygf11 can you work on a PR for this issue soon?

Of course @alamb, I will submit a PR today or tomorrow.
[GitHub] [arrow-datafusion] alamb merged pull request #4863: Support for SQL Natural Join
alamb merged PR #4863: URL: https://github.com/apache/arrow-datafusion/pull/4863
[GitHub] [arrow-datafusion] alamb closed issue #117: Add SQL support for NATURAL JOIN
alamb closed issue #117: Add SQL support for NATURAL JOIN URL: https://github.com/apache/arrow-datafusion/issues/117
[GitHub] [arrow-datafusion] alamb merged pull request #4852: Covariance single row input & null skipping
alamb merged PR #4852: URL: https://github.com/apache/arrow-datafusion/pull/4852
[GitHub] [arrow-datafusion] alamb commented on pull request #4852: Covariance single row input & null skipping
alamb commented on PR #4852: URL: https://github.com/apache/arrow-datafusion/pull/4852#issuecomment-1376802487 Thanks @korowa and @ozankabak
[GitHub] [arrow-datafusion] ursabot commented on pull request #4848: Support using var/var_pop/stddev/stddev_pop in window expressions with custom frames
ursabot commented on PR #4848: URL: https://github.com/apache/arrow-datafusion/pull/4848#issuecomment-1376801429

Benchmark runs are scheduled for baseline = 13fb42efec4b5ab7f9aa251f1705fdcf89057d23 and contender = 292eb954fc0bad3a1febc597233ba26cb60bda3e. 292eb954fc0bad3a1febc597233ba26cb60bda3e is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/597674ce0c554b6c87821768e1a80ede...7fa5429ba5fc4323b33a306c3ea156f3/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/5d9e1bc44e204f3a982e1786fd6bcc0b...337f5b7a8b8944bda0f0425ea232e627/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/89f47298cf844a66bbb6ea7f935e19d3...2e4be53365ae4c42a27f8f84b96d17bf/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/89d3d1c9c5af48c39a55df97f307a4c4...bad9c40adbb34927a3668ca46355549e/)

Buildkite builds:

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-datafusion] alamb commented on pull request #4852: Covariance single row input & null skipping
alamb commented on PR #4852: URL: https://github.com/apache/arrow-datafusion/pull/4852#issuecomment-1376800585

> Had we migrated to Rust 1.65, I would suggest the much simpler:
> @alamb, any information as to when we will make the switch?

I am not sure what you are asking here, @ozankabak. The CI checks use the latest stable Rust version. I just verified that CI is currently using Rust `1.66`: https://github.com/apache/arrow-datafusion/actions/runs/3878200568/jobs/6614116891#step:4:97
[GitHub] [arrow-datafusion] alamb merged pull request #4848: Support using var/var_pop/stddev/stddev_pop in window expressions with custom frames
alamb merged PR #4848: URL: https://github.com/apache/arrow-datafusion/pull/4848
[GitHub] [arrow-datafusion] alamb commented on pull request #4848: Support using var/var_pop/stddev/stddev_pop in window expressions with custom frames
alamb commented on PR #4848: URL: https://github.com/apache/arrow-datafusion/pull/4848#issuecomment-1376798281 Nice -- thank you @jonmmease and @ozankabak
[GitHub] [arrow-datafusion] ursabot commented on pull request #4846: Implement retract_batch for AvgAccumulator
ursabot commented on PR #4846: URL: https://github.com/apache/arrow-datafusion/pull/4846#issuecomment-1376794889

Benchmark runs are scheduled for baseline = 42f7dd50913c4f6b6d830b88d55dfd4a8d16d44f and contender = 13fb42efec4b5ab7f9aa251f1705fdcf89057d23. 13fb42efec4b5ab7f9aa251f1705fdcf89057d23 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/faa29f51a2314df2957bc566903cb458...597674ce0c554b6c87821768e1a80ede/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/a3b2097585e04c51bc17dfaba323e635...5d9e1bc44e204f3a982e1786fd6bcc0b/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/76c4557b831b456184c5e76d92fb0cd7...89f47298cf844a66bbb6ea7f935e19d3/)
[Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/7b369e15214b45baa689fef51fe6e0d3...89d3d1c9c5af48c39a55df97f307a4c4/)

Buildkite builds:

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-datafusion] alamb commented on issue #4844: Incorrect results for join condition against current master branch
alamb commented on issue #4844: URL: https://github.com/apache/arrow-datafusion/issues/4844#issuecomment-1376793437

> IMO it would be a good idea to fix this before releasing,

I agree -- I have mentioned it on the mailing list vote thread. @ygf11 can you work on a PR for this issue soon?
[GitHub] [arrow-datafusion] alamb commented on pull request #4846: Implement retract_batch for AvgAccumulator
alamb commented on PR #4846: URL: https://github.com/apache/arrow-datafusion/pull/4846#issuecomment-1376789846 Sorry for the delay @jonmmease -- thanks again for the contributions!
[GitHub] [arrow-datafusion] alamb closed issue #4845: Support custom window frame with AVG aggregate function
alamb closed issue #4845: Support custom window frame with AVG aggregate function URL: https://github.com/apache/arrow-datafusion/issues/4845
[GitHub] [arrow-datafusion] alamb merged pull request #4846: Implement retract_batch for AvgAccumulator
alamb merged PR #4846: URL: https://github.com/apache/arrow-datafusion/pull/4846
[GitHub] [arrow-datafusion] alamb commented on pull request #4855: Minor: Move test data into `datafusion/core/tests/data`
alamb commented on PR #4855: URL: https://github.com/apache/arrow-datafusion/pull/4855#issuecomment-1376789486 Thanks @jackwener
[GitHub] [arrow] ursabot commented on pull request #15268: GH-14976: [Python] Avoid dependency on exec plan in Table.sort_by to fix minimal tests
ursabot commented on PR #15268: URL: https://github.com/apache/arrow/pull/15268#issuecomment-1376781511

Benchmark runs are scheduled for baseline = 5f3df1255e6e0b7d2d4e197afec447769335f087 and contender = 154de48f0b6c16ae5acc384ed8383704253afd96. 154de48f0b6c16ae5acc384ed8383704253afd96 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/6913a328cb6941b2b4ea823560eec9b3...69352a735ab14742bb5463c31c4e7d1d/)
[Finished :arrow_down:0.58% :arrow_up:0.03%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1240994dbd664119a06041753ed50fb7...a093b2196b8e4f7784a166efb4c3f781/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/bd6091cf591947b6adc113fadc2ef40e...4cbb0be6cd444cc9bd920d85bb5b6c42/)
[Finished :arrow_down:0.1% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/9c9c618cfdc04f7bb7a7704a0d8c2cc5...b3cd26fdcfe247faa32546f8e3ba484a/)

Buildkite builds:
[Finished] [`154de48f` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2139)
[Finished] [`154de48f` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2162)
[Finished] [`154de48f` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2134)
[Finished] [`154de48f` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2154)
[Finished] [`5f3df125` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2138)
[Finished] [`5f3df125` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2161)
[Finished] [`5f3df125` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2133)
[Finished] [`5f3df125` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2153)

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow-ballista] Ted-Jiang closed pull request #285: [arrow-ballista] enable delete log periodically (#280)
Ted-Jiang closed pull request #285: [arrow-ballista] enable delete log periodically (#280) URL: https://github.com/apache/arrow-ballista/pull/285
[GitHub] [arrow] lukehsiao commented on issue #15233: [Python] pyarrow.fs.copy_files hangs indefinitely
lukehsiao commented on issue #15233: URL: https://github.com/apache/arrow/issues/15233#issuecomment-1376741426

> I'm taking a look at this. I have one debugging question for you in the meantime however. When I look at _upload_to_uri_with_exclude it appears that it is a recursive copy. Is there a possibility that you've included the current directory "." or parent directory ".." and that's why the operation never finishes?

In our case, `exclude` is None, so that code path to `_upload_to_uri_with_exclude` is actually never hit (and you'll notice it doesn't show in the flamegraph). So, it's essentially just the call to `pyarrow.fs.copy_files`.
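[Editor's note] For readers following along, a minimal sketch of the direct call under discussion. The filesystems, region, and paths below are hypothetical placeholders, not taken from the issue; `pyarrow.fs.copy_files` recurses over the source tree.

```
import pyarrow.fs

local = pyarrow.fs.LocalFileSystem()
s3 = pyarrow.fs.S3FileSystem(region="us-east-1")  # region is an assumption

# This is the kind of call reported to hang: a recursive directory copy
# from a local filesystem to S3.
pyarrow.fs.copy_files(
    "/tmp/checkpoint",        # source directory (placeholder)
    "my-bucket/checkpoint",   # destination path (placeholder)
    source_filesystem=local,
    destination_filesystem=s3,
)
```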
[GitHub] [arrow] mapleFU commented on pull request #15241: GH-14923: [C++][Parquet] Fix DELTA_BINARY_PACKED problem on reading the last block with malformed bit-width
mapleFU commented on PR #15241: URL: https://github.com/apache/arrow/pull/15241#issuecomment-1376739203

@pitrou @wjones127 Mind taking a look? I made a lot of naming changes (described in https://github.com/apache/arrow/pull/15241#issue-1523751523) and updated the hex testing format. I don't know whether that's OK.
[GitHub] [arrow] cyb70289 merged pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 merged PR #15201: URL: https://github.com/apache/arrow/pull/15201
[GitHub] [arrow] cyb70289 commented on pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 commented on PR #15201: URL: https://github.com/apache/arrow/pull/15201#issuecomment-1376733663

Not related to this PR, but the benchmark result is not obvious to me. Why is `double` faster than `int64`? I suppose integer rounding does nothing.
```
Ceil/size:1048576/inverse_null_proportion:0   323253 ns   323251 ns   2161   bytes_per_second=3.02107G/s items_per_second=405.481M/s null_percent=0 size=1048.58k
Ceil/size:1048576/inverse_null_proportion:0    89834 ns    89832 ns   7795   bytes_per_second=10.871G/s items_per_second=1.45908G/s null_percent=0 size=1048.58k
```
[GitHub] [arrow] cyb70289 commented on pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 commented on PR #15201: URL: https://github.com/apache/arrow/pull/15201#issuecomment-1376730524 Travis CI error is not related.
[GitHub] [arrow] github-actions[bot] commented on pull request #15288: GH-15284: [C++] Use DeclarationToExecBatches in Acero plan tests
github-actions[bot] commented on PR #15288: URL: https://github.com/apache/arrow/pull/15288#issuecomment-1376724815 * Closes: #15284
[GitHub] [arrow] vibhatha opened a new pull request, #15288: GH-15284: [C++] Use DeclarationToExecBatches in Acero plan tests
vibhatha opened a new pull request, #15288: URL: https://github.com/apache/arrow/pull/15288

This PR changes the existing test cases in `plan_test.cc` for Acero-based plan evaluation. At the moment, the tests use the `StartAndCollect` method, manually pass the `FutureMatcher` and the expected results, and validate the results obtained from `StartAndCollect`. Most of the test cases updated in this PR use a common format; replacing that format with `DeclarationToExecBatches` makes the code more readable and easier to debug.
[GitHub] [arrow-datafusion] jackwener commented on a diff in pull request #4863: Support for SQL Natural Join
jackwener commented on code in PR #4863: URL: https://github.com/apache/arrow-datafusion/pull/4863#discussion_r1065311448

## datafusion/sql/src/relation/join.rs:

```
@@ -142,10 +142,29 @@ impl<'a, S: ContextProvider> SqlToRel<'a, S> {
                 .build()
             }
             JoinConstraint::Natural => {
-                // https://issues.apache.org/jira/browse/ARROW-10727
-                Err(DataFusionError::NotImplemented(
-                    "NATURAL JOIN is not supported (https://issues.apache.org/jira/browse/ARROW-10727)".to_string(),
-                ))
+                // TODO: what if no common join cols? -> becomes cross join
+                // TODO: limit join_type to specific set?
+                let left_cols: HashSet<&String> = left
+                    .schema()
+                    .fields()
+                    .iter()
+                    .map(|f| f.field().name())
+                    .collect();
+                let keys: Vec<Column> = right
+                    .schema()
+                    .fields()
+                    .iter()
+                    .map(|f| f.field().name())
+                    .filter(|f| left_cols.contains(f))
+                    .map(Column::from_name)
+                    .collect();
```

Review Comment:
> I'm not sure I follow, are you suggesting to consider the qualifier when checking for common columns? My understanding is that when joining two different tables they should have different qualifiers anyway.

Looks like the current implementation is OK: `field.name()` doesn't use the qualifier, so this problem doesn't exist.
[GitHub] [arrow] EpsilonPrime commented on issue #15233: [Python] pyarrow.fs.copy_files hangs indefinitely
EpsilonPrime commented on issue #15233: URL: https://github.com/apache/arrow/issues/15233#issuecomment-1376686021

I'm taking a look at this. I have one debugging question for you in the meantime, however. When I look at `_upload_to_uri_with_exclude`, it appears to be a recursive copy. Is there a possibility that you've included the current directory "." or the parent directory ".." and that's why the operation never finishes? One way of determining this would be to insert a log/print at https://github.com/ray-project/ray/blob/d7b2b49a962bf33dae7a50376f159ab15d80800f/python/ray/air/_internal/remote_storage.py#L238 to output the paths that are going to be copied from.
[GitHub] [arrow] ursabot commented on pull request #15269: ARROW-16728: [Python] ParquetDataset to still take legacy code path when old filesystem is passed
ursabot commented on PR #15269: URL: https://github.com/apache/arrow/pull/15269#issuecomment-1376682079

Benchmark runs are scheduled for baseline = 0121ae73f4852ee3c5a6870ed5583916662a35be and contender = 5f3df1255e6e0b7d2d4e197afec447769335f087. 5f3df1255e6e0b7d2d4e197afec447769335f087 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/e2790950792344dea8837e8cccbd1c74...6913a328cb6941b2b4ea823560eec9b3/)
[Finished :arrow_down:0.91% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/1a2c8751870940f0ae99d4a02268d742...1240994dbd664119a06041753ed50fb7/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/7fd5af8e17ec442fa199339edcbabd07...bd6091cf591947b6adc113fadc2ef40e/)
[Finished :arrow_down:0.07% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ebf1d499d58a4016b08dff1a603c6d0d...9c9c618cfdc04f7bb7a7704a0d8c2cc5/)

Buildkite builds:
[Finished] [`5f3df125` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2138)
[Finished] [`5f3df125` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2161)
[Finished] [`5f3df125` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2133)
[Finished] [`5f3df125` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2153)
[Finished] [`0121ae73` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2137)
[Finished] [`0121ae73` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2160)
[Finished] [`0121ae73` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2132)
[Finished] [`0121ae73` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2152)

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow] cyb70289 commented on a diff in pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 commented on code in PR #15201: URL: https://github.com/apache/arrow/pull/15201#discussion_r1065303435

## cpp/src/arrow/compute/kernels/scalar_round_benchmark.cc:

```
@@ -0,0 +1,125 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include <vector>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/util/benchmark_util.h"
+
+namespace arrow {
+namespace compute {
+
+// Use a fixed hash to ensure consistent results from run to run.
+constexpr auto kSeed = 0x94378165;
+
+template
+static void RoundArrayBenchmark(benchmark::State& state, const std::string& func_name) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto rand = random::RandomArrayGenerator(kSeed);
+
+  // Choose values so as to avoid overflow on all ops and types.
+  auto min = static_cast<CType>(6);
+  auto max = static_cast<CType>(min + 15);
+  auto val = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+  RoundOptions options;
+  options.round_mode = static_cast<RoundMode>(Mode);
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction(func_name, {val}, &options));
+  }
+  state.SetItemsProcessed(state.iterations() * array_size);
+}
+
+void SetRoundArgs(benchmark::internal::Benchmark* bench) {
+  bench->ArgNames({"size", "inverse_null_proportion"});
+
+  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
+    bench->Args({static_cast<int64_t>(kL2Size), inverse_null_proportion});
+  }
+}
+
+template
+static void Ceil(benchmark::State& state) {
+  RoundArrayBenchmark(state, "ceil");
+}
+
+template
+static void Floor(benchmark::State& state) {
+  RoundArrayBenchmark(state, "floor");
```

Review Comment:
We can leave decimals as a follow-up task. Can you open an issue to track it? Thanks.
[GitHub] [arrow] EpsilonPrime commented on pull request #15283: GH-15282: [C++,CI] Temporarily fix continuous integration issue with clang-format v14.
EpsilonPrime commented on PR #15283: URL: https://github.com/apache/arrow/pull/15283#issuecomment-1376681342

This attempt appears to change the outer configuration of the Docker test build and not the interior. As such, it doesn't address the underlying problem.
[GitHub] [arrow] EpsilonPrime commented on a diff in pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
EpsilonPrime commented on code in PR #15201: URL: https://github.com/apache/arrow/pull/15201#discussion_r1065302588

## cpp/src/arrow/compute/kernels/scalar_round_benchmark.cc:

```
@@ -0,0 +1,125 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include <vector>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/util/benchmark_util.h"
+
+namespace arrow {
+namespace compute {
+
+// Use a fixed hash to ensure consistent results from run to run.
+constexpr auto kSeed = 0x94378165;
+
+template
+static void RoundArrayBenchmark(benchmark::State& state, const std::string& func_name) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto rand = random::RandomArrayGenerator(kSeed);
+
+  // Choose values so as to avoid overflow on all ops and types.
+  auto min = static_cast<CType>(6);
+  auto max = static_cast<CType>(min + 15);
+  auto val = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+  RoundOptions options;
+  options.round_mode = static_cast<RoundMode>(Mode);
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction(func_name, {val}, &options));
+  }
+  state.SetItemsProcessed(state.iterations() * array_size);
+}
+
+void SetRoundArgs(benchmark::internal::Benchmark* bench) {
+  bench->ArgNames({"size", "inverse_null_proportion"});
+
+  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
+    bench->Args({static_cast<int64_t>(kL2Size), inverse_null_proportion});
+  }
+}
+
+template
+static void Ceil(benchmark::State& state) {
+  RoundArrayBenchmark(state, "ceil");
+}
+
+template
+static void Floor(benchmark::State& state) {
+  RoundArrayBenchmark(state, "floor");
```

Review Comment:
Good catch, fixed (although I had to remove the Decimal128 benchmarks for now).
[GitHub] [arrow] cyb70289 commented on a diff in pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 commented on code in PR #15201: URL: https://github.com/apache/arrow/pull/15201#discussion_r1065293955

## cpp/src/arrow/compute/kernels/scalar_round_benchmark.cc:

```
@@ -0,0 +1,125 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include <vector>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/util/benchmark_util.h"
+
+namespace arrow {
+namespace compute {
+
+// Use a fixed hash to ensure consistent results from run to run.
+constexpr auto kSeed = 0x94378165;
+
+template
+static void RoundArrayBenchmark(benchmark::State& state, const std::string& func_name) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto rand = random::RandomArrayGenerator(kSeed);
+
+  // Choose values so as to avoid overflow on all ops and types.
+  auto min = static_cast<CType>(6);
+  auto max = static_cast<CType>(min + 15);
+  auto val = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+  RoundOptions options;
+  options.round_mode = static_cast<RoundMode>(Mode);
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction(func_name, {val}, &options));
+  }
+  state.SetItemsProcessed(state.iterations() * array_size);
+}
+
+void SetRoundArgs(benchmark::internal::Benchmark* bench) {
+  bench->ArgNames({"size", "inverse_null_proportion"});
+
+  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
+    bench->Args({static_cast<int64_t>(kL2Size), inverse_null_proportion});
+  }
+}
+
+template
+static void Ceil(benchmark::State& state) {
+  RoundArrayBenchmark(state, "ceil");
+}
+
+template
+static void Floor(benchmark::State& state) {
+  RoundArrayBenchmark(state, "floor");
```

Review Comment:
`FloatType` looks wrong. Should it be `ArrowType`? Same for Round and Trunc.
[GitHub] [arrow-rs] JayjeetAtGithub commented on issue #3483: Use `array_value_to_string` in `arrow-csv`
JayjeetAtGithub commented on issue #3483: URL: https://github.com/apache/arrow-rs/issues/3483#issuecomment-1376649197

@alamb @tustvold Continuing the discussion from apache/arrow-rs#3417: currently the `arrow-csv` writer allows users to pass custom datetime formats besides the defaults. By using `array_value_to_string`, we would enforce the format and disable that customization. Otherwise, if we still want customization, we can keep both, where we basically fall back to `array_value_to_string` when no custom format is supplied by the user. Looking forward to hearing your thoughts on this. Thanks.
[GitHub] [arrow-ballista] yahoNanJing commented on pull request #576: Use lz4 for shuffle (stream) compression
yahoNanJing commented on PR #576: URL: https://github.com/apache/arrow-ballista/pull/576#issuecomment-1376640134

> LZ4 is much more light on the CPU than ZSTD, so would be great if we could configure this too!

I think by default the compression is not enabled, right? So at the least, it would be better for us to add a config option to indicate whether to enable the compression feature.
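[Editor's note] For readers unfamiliar with the knob under discussion: shuffle data travels as Arrow IPC stream batches, and compression is selected at the IPC layer. Ballista's own configuration is Rust-side and still being decided in this thread; purely as a cross-language illustration, the equivalent setting in pyarrow looks like this:

```
import pyarrow as pa

schema = pa.schema([("x", pa.int64())])
sink = pa.BufferOutputStream()

# "lz4" selects the LZ4 frame codec; "zstd" is the heavier alternative
# compared in the thread.
opts = pa.ipc.IpcWriteOptions(compression="lz4")
with pa.ipc.new_stream(sink, schema, options=opts) as writer:
    batch = pa.RecordBatch.from_arrays([pa.array([1, 2, 3])], schema=schema)
    writer.write_batch(batch)
```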
[GitHub] [arrow] cyb70289 merged pull request #15250: GH-15249: [Documentation] Add PR template
cyb70289 merged PR #15250: URL: https://github.com/apache/arrow/pull/15250
[GitHub] [arrow] westonpace commented on issue #14917: [C++] Issues with GTest when building Arrow C++ on MacOS
westonpace commented on issue #14917: URL: https://github.com/apache/arrow/issues/14917#issuecomment-1376623935

> @AlenkaF does this happen after using the Conda instructions in the Python development docs?

@mhconradt Yes. It appears that conda's recipe has the same issue. I have verified this manually using the reproducer in https://github.com/Homebrew/homebrew-core/issues/111810. I think conda just uses the CMake file as-is, and clang defaults to C++14 if not specified.
[GitHub] [arrow] cyb70289 commented on a diff in pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
cyb70289 commented on code in PR #15201: URL: https://github.com/apache/arrow/pull/15201#discussion_r1065267123

## cpp/src/arrow/compute/kernels/scalar_round_benchmark.cc:

```
@@ -0,0 +1,111 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include <vector>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/util/benchmark_util.h"
+
+namespace arrow {
+namespace compute {
+
+// Use a fixed hash to ensure consistent results from run to run.
+constexpr auto kSeed = 0x94378165;
+
+template
+static void RoundArrayBenchmark(benchmark::State& state, const std::string& func_name) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto rand = random::RandomArrayGenerator(kSeed);
+
+  // Choose values so as to avoid overflow on all ops and types.
+  auto min = static_cast<CType>(6);
+  auto max = static_cast<CType>(min + 15);
+  auto val = std::static_pointer_cast<NumericArray<ArrowType>>(
+      rand.Numeric<ArrowType>(array_size, min, max, args.null_proportion));
+  RoundOptions options;
+  options.round_mode = static_cast<RoundMode>(Mode);
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction(func_name, {val}, &options));
+  }
+  state.SetItemsProcessed(state.iterations() * array_size);
+}
+
+void SetRoundArgs(benchmark::internal::Benchmark* bench) {
+  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
```

Review Comment:
Ah, that's okay.
[GitHub] [arrow] boshek commented on issue #15056: [R][R Shiny] Deployed/published Rshiny app using arrow to query an AWS-hosted dataset causes intermittent "stack imbalance", "segfault", "memory not
boshek commented on issue #15056: URL: https://github.com/apache/arrow/issues/15056#issuecomment-1376600272

The feather files are rather large, so for something like this, where so much is going across the wire, csv or parquet can actually be quicker. That said, it _should_ still work regardless of file format.
```
microbenchmark::microbenchmark(
  arrow = open_dataset(arrow_bucket, format = 'arrow', unify_schemas = FALSE) %>%
    filter(parameter == 'Ozone', sample_duration == '1 HOUR', poc == 1) %>%
    collect(),
  csv = open_dataset(csv_bucket, format = 'csv', unify_schemas = FALSE) %>%
    filter(parameter == 'Ozone', sample_duration == '1 HOUR', poc == 1) %>%
    collect(),
  parquet = open_dataset(parquet_bucket, format = 'parquet', unify_schemas = FALSE) %>%
    filter(parameter == 'Ozone', sample_duration == '1 HOUR', poc == 1) %>%
    collect(),
  times = 3L
)
Unit: seconds
    expr       min        lq      mean    median        uq       max neval
   arrow 16.984894 17.353667 17.822625 17.722439 18.241490 18.760542     3
     csv 10.173464 10.474577 10.876690 10.775691 11.228303 11.680915     3
 parquet  8.599491  8.630459  8.705268  8.661427  8.758156  8.854885     3
```
The reason I suggested Linux is that RSC and shinyapps run on Linux machines, so maybe this isn't related to RStudio Connect at all. That would be good to test. I will see what I can do.
[GitHub] [arrow] westonpace commented on issue #15220: Speed up Parquet Writing?
westonpace commented on issue #15220: URL: https://github.com/apache/arrow/issues/15220#issuecomment-1376594167

Reading json and writing parquet, purely in pyarrow, should look something like:
```
import pyarrow.parquet
import pyarrow.json

table = pyarrow.json.read_json('/tmp/foo.jsonl')
pyarrow.parquet.write_table(table, '/tmp/foo.parquet')
```
Separating disk write from encoding would be trickier (there is an in-memory filesystem in C++ but I don't think we expose it in python).
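[Editor's note] One way to approximate separating encoding time from disk I/O without the unexposed C++ in-memory filesystem is to encode into a `BufferOutputStream` first. The timing scaffolding below is an illustration, not from the thread:

```
import time
import pyarrow as pa
import pyarrow.json
import pyarrow.parquet as pq

table = pyarrow.json.read_json("/tmp/foo.jsonl")

t0 = time.perf_counter()
sink = pa.BufferOutputStream()   # in-memory sink: Parquet encoding only
pq.write_table(table, sink)
encoded = sink.getvalue()        # pyarrow.Buffer holding the file bytes
print("encode:", time.perf_counter() - t0)

t0 = time.perf_counter()
with open("/tmp/foo.parquet", "wb") as f:
    f.write(encoded)             # raw disk write, no encoding involved
print("write:", time.perf_counter() - t0)
```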
[GitHub] [arrow] wjones127 commented on issue #15231: [Benchmarking][C++] Track memory usage in C++ microbenchmarks
wjones127 commented on issue #15231: URL: https://github.com/apache/arrow/issues/15231#issuecomment-1376592183

> Though I do wonder how many memory issues would be revealed at the micro-benchmark level

It's a good question. One thing I have noticed when I do measure memory usage is that it tends to be very deterministic. So it's a lot easier to detect regressions and improvements for memory usage than it is for time-based metrics (which can be very noisy). But it's also the case that memory issues can happen due to the interaction between systems, rather than just within isolated components.

If we are using Google Benchmark for any macro-benchmarks, I'm happy to apply it there as well.
[GitHub] [arrow] westonpace commented on issue #15232: [Documentation] Provide guidance to contributors on getting reviews
westonpace commented on issue #15232: URL: https://github.com/apache/arrow/issues/15232#issuecomment-1376587209

I'm in favor of more users reading the contributing guidelines / docs (they're great). I don't know how effective assigning reviewers (or something like code owners) will be. On the one hand, I suppose I could stop looking at all notifications, which is what I do today. On the other hand, I fear I would become blind to anything that wasn't a modification of files I was assigned to (I assume we would just self-assign?) Have other projects had success with this?

Another suggestion might be to make sure PRs are assigned to at least one reviewer, ping reviewers when they do not respond in a reasonable amount of time, and, if too much time passes, assign a different reviewer or close. Not sure if that can be automated or would require some kind of human intervention.
[GitHub] [arrow] westonpace commented on issue #15226: [Python] dictionary_encode support duration types
westonpace commented on issue #15226: URL: https://github.com/apache/arrow/issues/15226#issuecomment-1376578323

Casting from duration to `int64` should be a zero-copy operation:
```
>>> import pyarrow as pa
>>> import datetime
>>> arr = pa.array([datetime.timedelta(days=1), datetime.timedelta(hours=1)])
>>> arr
[
  86400000000,
  3600000000
]
>>> arr.cast(pa.int64())
[
  86400000000,
  3600000000
]
```
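[Editor's note] A sketch of the workaround this zero-copy cast enables for the issue's request: dictionary-encode the `int64` view, then restore the duration type on the dictionary values if needed. The round-trip is illustrative, assuming `int64` casts back to the duration type (the issue itself only confirms the forward cast):

```
import datetime
import pyarrow as pa

arr = pa.array([datetime.timedelta(days=1), datetime.timedelta(hours=1)])

# Encode via the zero-copy int64 view, since dictionary_encode does not
# yet accept duration types directly.
encoded = arr.cast(pa.int64()).dictionary_encode()

# Put the duration type back on the dictionary values.
restored = pa.DictionaryArray.from_arrays(
    encoded.indices, encoded.dictionary.cast(arr.type)
)
```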
[GitHub] [arrow] westonpace commented on issue #15161: [C++] Inplace renaming of columns in c++ record batch.
westonpace commented on issue #15161: URL: https://github.com/apache/arrow/issues/15161#issuecomment-1376574396

> I don't see this function in the record batch

Correct, one does not exist. I was suggesting that someone create one. If you create one yourself, I'd encourage you to submit a PR.

> in what instance, is there a copy of the buffer.

Normally you have to go out of your way to copy a buffer. For example, create an array builder, add all the values from the old array, and then build an array from that. However, many of the compute functions (take, filter, sort, cast, etc.) will do this (some casts are zero-copy, but not all).
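[Editor's note] To illustrate the zero-copy rename pattern being suggested (shown through pyarrow; a C++ helper would do the same thing with `Field::WithName` and `RecordBatch::Make`): wrap the existing column arrays in a new batch that carries a new schema, so only schema metadata changes and no buffers are copied.

```
import pyarrow as pa

batch = pa.RecordBatch.from_pydict({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Same arrays, new names: only the schema is rebuilt.
renamed = pa.RecordBatch.from_arrays(batch.columns, names=["col_a", "col_b"])

# The underlying data buffers are shared, not copied.
assert (renamed.column(0).buffers()[1].address
        == batch.column(0).buffers()[1].address)
```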
[GitHub] [arrow] westonpace commented on issue #15231: [Benchmarking][C++] Track memory usage in C++ microbenchmarks
westonpace commented on issue #15231: URL: https://github.com/apache/arrow/issues/15231#issuecomment-1376572429

This would be great. Though I do wonder how many memory issues would be revealed at the micro-benchmark level vs. at the macro-benchmark level (that being said, features like this make Google Benchmark convenient for macro-benchmarks as well).
[GitHub] [arrow] ursabot commented on pull request #15222: MINOR: [Java] Doc & javadoc improvements
ursabot commented on PR #15222: URL: https://github.com/apache/arrow/pull/15222#issuecomment-1376565366

Benchmark runs are scheduled for baseline = 11d286eafb72b630baf897e619c84ecdfc6b723f and contender = 0121ae73f4852ee3c5a6870ed5583916662a35be. 0121ae73f4852ee3c5a6870ed5583916662a35be is a master commit associated with this PR. Results will be available as each benchmark for each run completes.

Conbench compare runs links:
[Finished :arrow_down:25.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/452345e6de07436e8b72b4bbb2eab4e2...e2790950792344dea8837e8cccbd1c74/)
[Failed :arrow_down:0.55% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/eddda79cf6f14eff87ad495619c1ced5...1a2c8751870940f0ae99d4a02268d742/)
[Finished :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/eddc366f198743749dd78cfbc7d7d1ae...7fd5af8e17ec442fa199339edcbabd07/)
[Finished :arrow_down:0.07% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/cf52e7a7a2744c8d8201e5b095e7d558...ebf1d499d58a4016b08dff1a603c6d0d/)

Buildkite builds:
[Finished] [`0121ae73` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2137)
[Finished] [`0121ae73` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2160)
[Finished] [`0121ae73` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2132)
[Finished] [`0121ae73` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2152)
[Finished] [`11d286ea` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2136)
[Failed] [`11d286ea` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2159)
[Finished] [`11d286ea` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2131)
[Finished] [`11d286ea` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2151)

Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
[GitHub] [arrow] vibhatha commented on issue #15284: [C++] Use DeclarationToExecBatches in Acero plan tests
vibhatha commented on issue #15284: URL: https://github.com/apache/arrow/issues/15284#issuecomment-1376559887 take -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] cgostic commented on issue #15056: [R][R Shiny] Deployed/published Rshiny app using arrow to query an AWS-hosted dataset causes intermittent "stack imbalance", "segfault", "memory not
cgostic commented on issue #15056: URL: https://github.com/apache/arrow/issues/15056#issuecomment-1376547224 Thanks so much for checking in! I do see that error intermittently as well, though I more frequently get the one listed in the logs. From my observations, the SIGPIPE issue occurs on the first use of the app after publishing, but it usually shifts to the "stack imbalance" error with continued use.

(1) I can definitely check writing the partitioned data as .csvs and get back to you! However, we're really hoping to take advantage of the query efficiency offered by Feather files.

(2) I have not tried this on a Linux machine. Our RStudio Connect is hosted on an EC2 instance, and shinyapps is also on AWS. These are the only two platforms I have access to, unfortunately.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] dongjoon-hyun commented on pull request #15267: GH-15265: [Java] Publish SBOM artifacts
dongjoon-hyun commented on PR #15267: URL: https://github.com/apache/arrow/pull/15267#issuecomment-1376546296 Thank you, @assignUser. The `.json` and `.xml` patterns are added. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] comphead commented on issue #4644: Comparing a `Timestamp` to a `Date32` fails
comphead commented on issue #4644: URL: https://github.com/apache/arrow-datafusion/issues/4644#issuecomment-1376537326 I found something weird. First, the logical plan fails silently on the cast. Second, the arrow cast does not behave consistently on the same inputs. If I run the first test:

```
[datafusion/expr/src/expr_schema.rs:268] _type = Date32
[datafusion/expr/src/expr_schema.rs:269] _to_type = Timestamp(
    Nanosecond,
    None,
)
[datafusion/expr/src/expr_schema.rs:270] can_cast_types(_type, cast_to_type) = true
```

but if I run the second test:

```
[datafusion/expr/src/expr_schema.rs:268] _type = Date32
[datafusion/expr/src/expr_schema.rs:269] _to_type = Timestamp(
    Nanosecond,
    None,
)
[datafusion/expr/src/expr_schema.rs:270] can_cast_types(_type, cast_to_type) = false
```

I'm looking into what in arrow-rs can cause such non-deterministic behavior. @liukun4515 @crepererum @alamb

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
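For readers following along, the disagreement above can be probed directly against the cast support table. A minimal, self-contained sketch, assuming the arrow-cast crate from the arrow-rs crate split (in the monolithic arrow crate the same function is exposed as `arrow::compute::can_cast_types`); the repeated call only emphasizes that the answer should not vary between invocations:

```rust
use arrow_cast::cast::can_cast_types;
use arrow_schema::{DataType, TimeUnit};

fn main() {
    let from = DataType::Date32;
    let to = DataType::Timestamp(TimeUnit::Nanosecond, None);

    // A consistent cast-support table must return the same answer on every
    // call; differing results between two tests point at test-order or
    // feature-flag effects rather than at the inputs themselves.
    println!("Date32 -> Timestamp(ns): {}", can_cast_types(&from, &to));
    println!("Date32 -> Timestamp(ns): {}", can_cast_types(&from, &to));
}
```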
[GitHub] [arrow] boshek commented on issue #15056: [R][R Shiny] Deployed/published Rshiny app using arrow to query an AWS-hosted dataset causes intermittent "stack imbalance", "segfault", "memory not
boshek commented on issue #15056: URL: https://github.com/apache/arrow/issues/15056#issuecomment-1376532373 I can reproduce this issue in RStudio Connect. This is the error that I see (which is different from what you are reporting):

log
```
2023/01/09 23:47:44.920714517 [1] 3785
2023/01/09 23:47:45.194163992 [1] 3785
2023/01/09 23:47:45.197808737 [1] "pinged"
2023/01/09 23:48:15.466341119 [1] 3785
2023/01/09 23:48:45.750669396 [1] 3785
2023/01/09 23:49:16.046789291 [1] 3785
2023/01/09 23:49:46.366873116 [1] 3785
2023/01/09 23:50:16.639291193 [1] 3785
2023/01/09 23:50:46.902676880 [1] 3785
2023/01/09 23:51:17.211774091 [1] 3785
2023/01/09 23:51:20.593104281 Warning: Error in collect: ignoring SIGPIPE signal
2023/01/09 23:51:20.606683408 212:
2023/01/09 23:51:20.606694399 211: signalCondition
2023/01/09 23:51:20.606724639 210: signal_abort
2023/01/09 23:51:20.606743769 209: abort
2023/01/09 23:51:20.606782270 208: augment_io_error_msg
2023/01/09 23:51:20.606783790 207: value[[3L]]
2023/01/09 23:51:20.606797460 206: tryCatchOne
2023/01/09 23:51:20.606798670 205: tryCatchList
2023/01/09 23:51:20.606891601 204: tryCatch
2023/01/09 23:51:20.606893811 203: collect.arrow_dplyr_query
2023/01/09 23:51:20.606907001 202: collect
2023/01/09 23:51:20.606907791 201: %>%
2023/01/09 23:51:20.606914991 200: [/opt/rstudio-connect/mnt/app/app.R#50]
2023/01/09 23:51:20.606927181 198: .func
2023/01/09 23:51:20.606945612 195: contextFunc
2023/01/09 23:51:20.606947502 194: env$runWith
2023/01/09 23:51:20.606972372 187: ctx$run
2023/01/09 23:51:20.606973622 186: self$.updateValue
2023/01/09 23:51:20.606983322 184: plot_data
2023/01/09 23:51:20.606984412 182: renderPlot [/opt/rstudio-connect/mnt/app/app.R#62]
2023/01/09 23:51:20.606993812 180: func
2023/01/09 23:51:20.606995122 140: drawPlot
2023/01/09 23:51:20.607005132 126:
2023/01/09 23:51:20.607006152 110: drawReactive
2023/01/09 23:51:20.607019292 97: renderFunc
2023/01/09 23:51:20.607022683 96: output$TS
2023/01/09 23:51:20.607032143 15:
2023/01/09 23:51:20.607033293 13: fn
2023/01/09 23:51:20.607046283 8: retry
2023/01/09 23:51:20.607047413 7: connect$retryingStartServer
2023/01/09 23:51:20.607056303 6: eval
2023/01/09 23:51:20.607058203 5: eval
2023/01/09 23:51:20.607072323 4: eval
2023/01/09 23:51:20.607073053 3: eval
2023/01/09 23:51:20.607080343 2: eval.parent
2023/01/09 23:51:20.607081113 1: local
```

I tried a few troubleshooting steps, including:
- Moving the `collect` call around to see where exactly the issue was. The only thing that "worked" was to pull all the data in right away, which obviously isn't helpful.
- Trying both parquet and CSV.
- Using CSV seems to resolve this issue, though that does not solve the actual thing you wanted.

Two additional questions:
- Does writing the partitioned data as CSV and using them as CSVs work?
- Have you tried this on a Linux machine? I don't have access to a Linux machine where I can deploy this. It is very strange that this works locally but not in RStudio Connect, so trying this on Linux would be a useful point of information.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] askoa commented on a diff in pull request #3501: Add a function to get memory size of array slice
askoa commented on code in PR #3501: URL: https://github.com/apache/arrow-rs/pull/3501#discussion_r1065214401

## arrow-data/src/data.rs: ##

```
@@ -463,6 +463,59 @@ impl ArrayData {
         size
     }

+    pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError> {
+        let mut result: usize = 0;
+        let layout = layout(&self.data_type);
+
+        for spec in layout.buffers.iter() {
+            match spec {
+                BufferSpec::FixedWidth { byte_width } => {
+                    let buffer_size = self
+                        .len
+                        .checked_mul(*byte_width)
+                        .expect("integer overflow computing buffer size");
```

Review Comment: Thanks. Done!

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] EpsilonPrime commented on pull request #15283: GH-15282: [C++,CI] Fix continuous integration issue with clang-format v14.
EpsilonPrime commented on PR #15283: URL: https://github.com/apache/arrow/pull/15283#issuecomment-1376509214 An ideal solution would add the missing package to the dependencies we install, but I haven't found that yet. This change seems like it would work, but for some reason changing the Travis CI configuration isn't enough to get the CI to run and verify the change. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #15283: GH-15282: [C++,CI] Fix continuous integration issue with clang-format v14.
github-actions[bot] commented on PR #15283: URL: https://github.com/apache/arrow/pull/15283#issuecomment-1376507396 * Closes: #15282 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #15283: GH-15282: [C++,CI] Fix continuous integration issue with clang-format v14.
github-actions[bot] commented on PR #15283: URL: https://github.com/apache/arrow/pull/15283#issuecomment-1376507420 :warning: GitHub issue #15282 **has been automatically assigned in GitHub** to PR creator. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] EpsilonPrime opened a new pull request, #15283: GH-15282: [C++,CI] Fix continuous integration issue with clang-format v14.
EpsilonPrime opened a new pull request, #15283: URL: https://github.com/apache/arrow/pull/15283 …ready on Docker for version 14. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] felipecrv commented on a diff in pull request #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions
felipecrv commented on code in PR #15083: URL: https://github.com/apache/arrow/pull/15083#discussion_r1064818385

## cpp/src/arrow/compute/exec/aggregate_node.cc: ##

```
@@ -46,14 +46,26 @@ namespace {

 void AggregatesToString(std::stringstream* ss, const Schema& input_schema,
                         const std::vector<Aggregate>& aggs,
-                        const std::vector<int>& target_field_ids, int indent = 0) {
+                        const std::vector<std::vector<int>>& target_fieldsets,
+                        int indent = 0) {
   *ss << "aggregates=[" << std::endl;
   for (size_t i = 0; i < aggs.size(); i++) {
     for (int j = 0; j < indent; ++j) *ss << " ";
-    *ss << '\t' << aggs[i].function << '('
-        << input_schema.field(target_field_ids[i])->name();
+    *ss << '\t' << aggs[i].function << '(';
+    const auto& target = target_fieldsets[i];
+    if (target.size() == 0) {
+      *ss << "*";
+    } else {
+      *ss << input_schema.field(target[0])->name();
+      for (size_t k = 1; k < target.size(); k++) {
+        *ss << ", " << input_schema.field(target[k])->name();
+      }
+    }
     if (aggs[i].options) {
-      *ss << ", " << aggs[i].options->ToString();
+      auto* options_type = aggs[i].options->options_type();
+      if (options_type->num_properties() > 0) {
```

Review Comment: This function is sometimes (maybe always) called after a NULL `Aggregate::options` gets replaced by a pointer to an instance of the default options for that aggregate function. That happens here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/compute/exec/aggregate_node.cc#L113 I ended up adding the first function that needs no options at all, but since providing default options is required, I created an options type with zero properties. That works as a placeholder for the future, when this function might gain properties. I could drop all this and make it render the string `count_all(*, {})` instead.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] github-actions[bot] commented on pull request #15267: GH-15265: [Java] Publish SBOM artifacts
github-actions[bot] commented on PR #15267: URL: https://github.com/apache/arrow/pull/15267#issuecomment-1376494798

Revision: de44a621698683f1891e0e1886a23569d2998eb3

Submitted crossbow builds: [ursacomputing/crossbow @ actions-588eb7d929](https://github.com/ursacomputing/crossbow/branches/all?query=actions-588eb7d929)

|Task|Status|
|----|------|
|java-jars|[![Github Actions](https://github.com/ursacomputing/crossbow/workflows/Crossbow/badge.svg?branch=actions-588eb7d929-github-java-jars)](https://github.com/ursacomputing/crossbow/actions/runs/3878713201/jobs/6615146612)|

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] assignUser commented on pull request #15267: GH-15265: [Java] Publish SBOM artifacts
assignUser commented on PR #15267: URL: https://github.com/apache/arrow/pull/15267#issuecomment-1376494116 To have them uploaded to nightlies.apache.org the nightly java job needs to be updated to: https://github.com/apache/arrow/blob/master/.github/workflows/java_nightly.yml#L110 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-adbc] zeroshade commented on pull request #322: feat(go/adbc/driver/flightsql): Native Golang ADBC Flight SQL driver
zeroshade commented on PR #322: URL: https://github.com/apache/arrow-adbc/pull/322#issuecomment-1376493589 Updated with further implementations; all that is left is to write tests. I should have tests written out and the PR updated tomorrow. Then I'll mark this as ready for review. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] assignUser commented on pull request #15267: GH-15265: [Java] Publish SBOM artifacts
assignUser commented on PR #15267: URL: https://github.com/apache/arrow/pull/15267#issuecomment-1376492670 @github-actions crossbow submit java-jars -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] EpsilonPrime commented on pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
EpsilonPrime commented on PR #15201: URL: https://github.com/apache/arrow/pull/15201#issuecomment-1376486116 I've updated the arithmetic benchmarks to also have labels. Thanks for the suggestion! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] viirya commented on a diff in pull request #3501: Add a function to get memory size of array slice
viirya commented on code in PR #3501: URL: https://github.com/apache/arrow-rs/pull/3501#discussion_r1065198385

## arrow-data/src/data.rs: ##

```
@@ -463,6 +463,59 @@ impl ArrayData {
         size
     }

+    pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError> {
```

Review Comment: For a public API, it is better to add a doc comment.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] viirya commented on a diff in pull request #3501: Add a function to get memory size of array slice
viirya commented on code in PR #3501: URL: https://github.com/apache/arrow-rs/pull/3501#discussion_r1065197952

## arrow-data/src/data.rs: ##

```
@@ -463,6 +463,59 @@ impl ArrayData {
         size
     }

+    pub fn get_slice_memory_size(&self) -> Result<usize, ArrowError> {
+        let mut result: usize = 0;
+        let layout = layout(&self.data_type);
+
+        for spec in layout.buffers.iter() {
+            match spec {
+                BufferSpec::FixedWidth { byte_width } => {
+                    let buffer_size = self
+                        .len
+                        .checked_mul(*byte_width)
+                        .expect("integer overflow computing buffer size");
```

Review Comment: Instead of panicking, it would be better to return `Err`.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
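The shape of the fix being requested above can be sketched in isolation. This is a minimal, self-contained sketch, not the PR's actual code: the helper name `slice_buffer_size` and the error message are illustrative. The idea is to replace the panicking `expect` with `checked_mul(...).ok_or_else(...)` so overflow surfaces as an `ArrowError`:

```rust
use arrow_schema::ArrowError;

/// Illustrative stand-in for the reviewed code path: computes
/// `len * byte_width`, surfacing overflow as an `Err` instead of panicking.
fn slice_buffer_size(len: usize, byte_width: usize) -> Result<usize, ArrowError> {
    len.checked_mul(byte_width).ok_or_else(|| {
        ArrowError::ComputeError(format!(
            "integer overflow computing buffer size: {len} * {byte_width}"
        ))
    })
}

fn main() {
    // Normal case succeeds; an overflowing case now returns an error.
    assert_eq!(slice_buffer_size(8, 4).unwrap(), 32);
    assert!(slice_buffer_size(usize::MAX, 2).is_err());
}
```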
[GitHub] [arrow] felipecrv commented on a diff in pull request #15083: ARROW-18403: [C++] Add support for nullary and n-ary aggregate functions
felipecrv commented on code in PR #15083: URL: https://github.com/apache/arrow/pull/15083#discussion_r1065195728

## cpp/src/arrow/compute/exec.cc: ##

```
@@ -163,6 +163,34 @@ Result<ExecBatch> ExecBatch::Make(std::vector<Datum> values) {
       continue;
     }

+    if (length != value.length()) {
+      // all the arrays should have the same length
+      return -1;
+    }
```

Review Comment: I added a commit that adds a reusable function for this.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] askoa opened a new pull request, #3501: Add a function to get memory size of array slice
askoa opened a new pull request, #3501: URL: https://github.com/apache/arrow-rs/pull/3501

# Which issue does this PR close?

Closes #3407

# Rationale for this change

See the issue description.

# What changes are included in this PR?

A function to calculate the buffer size of an array slice.

# Are there any user-facing changes?

Users get a new API that calculates the buffer size of an array slice.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
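To make the intent of the new API concrete, here is a hedged usage sketch. It assumes the method lands on `ArrayData` with the signature discussed in the review thread above (`get_slice_memory_size(&self) -> Result<usize, ArrowError>`); the exact byte counts depend on the array's buffer layout:

```rust
use arrow::array::{Array, Int32Array};
use arrow::error::ArrowError;

fn main() -> Result<(), ArrowError> {
    let array = Int32Array::from((0..1024).collect::<Vec<i32>>());
    let slice = array.slice(100, 32);

    // Existing API: counts the full parent buffers backing the slice.
    let full = slice.data().get_buffer_memory_size();
    // API proposed in this PR: counts only the bytes the slice covers.
    let sliced = slice.data().get_slice_memory_size()?;

    assert!(sliced < full);
    println!("parent buffers: {full} B, slice only: {sliced} B");
    Ok(())
}
```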
[GitHub] [arrow] westonpace commented on a diff in pull request #13669: ARROW-16389: [C++][Acero] Add spilling for hash join
westonpace commented on code in PR #13669: URL: https://github.com/apache/arrow/pull/13669#discussion_r1064813860

## cpp/src/arrow/compute/exec/accumulation_queue.cc: ##

```
@@ -39,20 +37,180 @@ void AccumulationQueue::Concatenate(AccumulationQueue&& that) {
   this->batches_.reserve(this->batches_.size() + that.batches_.size());
   std::move(that.batches_.begin(), that.batches_.end(),
             std::back_inserter(this->batches_));
-  this->row_count_ += that.row_count_;
   that.Clear();
 }

 void AccumulationQueue::InsertBatch(ExecBatch batch) {
-  row_count_ += batch.length;
   batches_.emplace_back(std::move(batch));
 }

+void AccumulationQueue::SetBatch(size_t idx, ExecBatch batch) {
+  ARROW_DCHECK(idx < batches_.size());
+  batches_[idx] = std::move(batch);
+}
+
+size_t AccumulationQueue::CalculateRowCount() const {
+  size_t count = 0;
+  for (const ExecBatch& b : batches_) count += static_cast<size_t>(b.length);
+  return count;
+}
+
 void AccumulationQueue::Clear() {
-  row_count_ = 0;
   batches_.clear();
 }
-ExecBatch& AccumulationQueue::operator[](size_t i) { return batches_[i]; }
-}  // namespace util
+
+Status SpillingAccumulationQueue::Init(QueryContext* ctx) {
+  ctx_ = ctx;
+  partition_locks_.Init(ctx_->max_concurrency(), kNumPartitions);
+  for (size_t ipart = 0; ipart < kNumPartitions; ipart++) {
+    task_group_read_[ipart] = ctx_->RegisterTaskGroup(
+        [this, ipart](size_t thread_index, int64_t batch_index) {
+          return read_back_fn_[ipart](thread_index, static_cast<size_t>(batch_index),
+                                      std::move(queues_[ipart][batch_index]));
+        },
+        [this, ipart](size_t thread_index) { return on_finished_[ipart](thread_index); });
+  }
+  return Status::OK();
+}
+
+Status SpillingAccumulationQueue::InsertBatch(size_t thread_index, ExecBatch batch) {
+  Datum& hash_datum = batch.values.back();
+  const uint64_t* hashes =
+      reinterpret_cast<const uint64_t*>(hash_datum.array()->buffers[1]->data());
+  // `permutation` stores the indices of rows in the input batch sorted by partition.
+  std::vector<uint16_t> permutation(batch.length);
+  uint16_t part_starts[kNumPartitions + 1];
+  PartitionSort::Eval(
+      batch.length, kNumPartitions, part_starts,
+      /*partition_id=*/[&](int64_t i) { return partition_id(hashes[i]); },
+      /*output_fn=*/
+      [&](int64_t input_pos, int64_t output_pos) {
+        permutation[output_pos] = static_cast<uint16_t>(input_pos);
+      });
+
+  int unprocessed_partition_ids[kNumPartitions];
+  RETURN_NOT_OK(partition_locks_.ForEachPartition(
+      thread_index, unprocessed_partition_ids,
+      /*is_prtn_empty=*/
+      [&](int part_id) { return part_starts[part_id + 1] == part_starts[part_id]; },
+      /*partition=*/
```

Review Comment:
```suggestion
      /*process_prtn_fn=*/
```

## cpp/src/arrow/compute/exec/accumulation_queue.h: ##

```
@@ -42,16 +45,88 @@ class AccumulationQueue {
   void Concatenate(AccumulationQueue&& that);
   void InsertBatch(ExecBatch batch);
-  int64_t row_count() { return row_count_; }
-  size_t batch_count() { return batches_.size(); }
+  void SetBatch(size_t idx, ExecBatch batch);
+  size_t batch_count() const { return batches_.size(); }
   bool empty() const { return batches_.empty(); }
+  size_t CalculateRowCount() const;
+
+  // Resizes the accumulation queue to contain size batches. The
+  // new batches will be empty and have length 0, but they will be
+  // usable (useful for concurrent modification of the AccumulationQueue
+  // of separate elements).
+  void Resize(size_t size) { batches_.resize(size); }
   void Clear();
-  ExecBatch& operator[](size_t i);
+  ExecBatch& operator[](size_t i) { return batches_[i]; };
+  const ExecBatch& operator[](size_t i) const { return batches_[i]; };

  private:
-  int64_t row_count_;
   std::vector<ExecBatch> batches_;
 };
-}  // namespace util
+
+class SpillingAccumulationQueue {
```

Review Comment:
```suggestion
/// Accumulates batches in a queue that can be spilled to disk if needed
///
/// Each batch is partitioned by the lower bits of the hash column (which must be present) and
/// rows are initially accumulated in batch builders (one per partition). As a batch builder fills
/// up the completed batch is put into an in-memory accumulation queue (per partition).
///
/// When memory pressure is encountered the spilling queue's "spill cursor" can be advanced. This
/// will cause a partition to be spilled to disk. Any future data arriving for that partition will go immediately
/// to disk (after accumulating a full batch in the batch builder).
///
/// Later, data is retrieved one partition at a time. Partitions that are in-memory will be delivered immediately
/// in new thread tasks. Partitions that are on disk will be read from disk and delivered as they arrive.
class SpillingAccumulationQueue {
```
Other things that might be nice to mention: * This is a "write
[GitHub] [arrow] wjones127 commented on a diff in pull request #15201: GH-15200: [C++] Created benchmarks for round kernels.
wjones127 commented on code in PR #15201: URL: https://github.com/apache/arrow/pull/15201#discussion_r1065174034

## cpp/src/arrow/compute/kernels/scalar_round_benchmark.cc: ##

```
@@ -0,0 +1,123 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "benchmark/benchmark.h"
+
+#include <vector>
+
+#include "arrow/compute/api_scalar.h"
+#include "arrow/compute/kernels/test_util.h"
+#include "arrow/testing/gtest_util.h"
+#include "arrow/testing/random.h"
+#include "arrow/util/benchmark_util.h"
+
+namespace arrow {
+namespace compute {
+
+// Use a fixed hash to ensure consistent results from run to run.
+constexpr auto kSeed = 0x94378165;
+
+template <typename CType, RoundMode Mode>
+static void RoundArrayBenchmark(benchmark::State& state, const std::string& func_name) {
+  RegressionArgs args(state);
+
+  const int64_t array_size = args.size / sizeof(CType);
+  auto rand = random::RandomArrayGenerator(kSeed);
+
+  // Choose values so as to avoid overflow on all ops and types.
+  auto min = static_cast<CType>(6);
+  auto max = static_cast<CType>(min + 15);
+  auto val = std::static_pointer_cast<NumericArray<typename CTypeTraits<CType>::ArrowType>>(
+      rand.Numeric<typename CTypeTraits<CType>::ArrowType>(array_size, min, max,
+                                                           args.null_proportion));
+  RoundOptions options;
+  options.round_mode = static_cast<RoundMode>(Mode);
+
+  for (auto _ : state) {
+    ABORT_NOT_OK(CallFunction(func_name, {val}, &options));
+  }
+  state.SetItemsProcessed(state.iterations() * array_size);
+}
+
+void SetRoundArgs(benchmark::internal::Benchmark* bench) {
+  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
```

Review Comment: Would be nice to label these args:
```suggestion
  bench->ArgNames({"size", "inverse_null_proportion"});
  for (const auto inverse_null_proportion : std::vector<int64_t>({100, 0})) {
```

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wjones127 commented on issue #15280: [C++] Move Acero out of libarrow
wjones127 commented on issue #15280: URL: https://github.com/apache/arrow/issues/15280#issuecomment-1376425281 `libarrow_dataset` depends on Acero and is tightly integrated with it. Should it be moved into `libacero`, or renamed `libacero_dataset`? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wjones127 commented on a diff in pull request #15244: GH-15239: [C++][Parquet] Parquet writer writes decimal as int32/64
wjones127 commented on code in PR #15244: URL: https://github.com/apache/arrow/pull/15244#discussion_r1065167634

## cpp/src/parquet/properties.h: ##

```
@@ -199,7 +199,8 @@ class PARQUET_EXPORT WriterProperties {
         pagesize_(kDefaultDataPageSize),
         version_(ParquetVersion::PARQUET_2_4),
         data_page_version_(ParquetDataPageVersion::V1),
-        created_by_(DEFAULT_CREATED_BY) {}
+        created_by_(DEFAULT_CREATED_BY),
+        integer_annotate_decimal_(false) {}
```

Review Comment: I understand the desire for stability, but I would like to have a pathway for releasing improved defaults. It's unfortunate that we always do a major release while still trying to respect stability, as there's never a clearly good time to make breaking changes like this.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] wjones127 commented on a diff in pull request #15244: GH-15239: [C++][Parquet] Parquet writer writes decimal as int32/64
wjones127 commented on code in PR #15244: URL: https://github.com/apache/arrow/pull/15244#discussion_r1065082767

## cpp/src/parquet/arrow/arrow_reader_writer_test.cc: ##

```
@@ -4727,5 +4726,74 @@ std::vector GenerateMapFilteredTestCases() {
 INSTANTIATE_TEST_SUITE_P(MapFilteredReads, TestNestedSchemaFilteredReader,
                          ::testing::ValuesIn(GenerateMapFilteredTestCases()));

+template <typename TestType>
+class TestIntegerAnnotateDecimalTypeParquetIO : public TestParquetIO<TestType> {
+ public:
+  template <typename ArrayType>
+  void WriteColumn(const std::shared_ptr<ArrayType>& values) {
+    auto arrow_schema = ::arrow::schema({::arrow::field("a", values->type())});
+
+    parquet::WriterProperties::Builder builder;
+    // Enforce integer type to annotate decimal type
+    auto writer_properties = builder.enable_integer_annotate_decimal()->build();
+    std::shared_ptr<SchemaDescriptor> parquet_schema;
+    ASSERT_OK_NO_THROW(ToParquetSchema(arrow_schema.get(), *writer_properties,
+                                       *default_arrow_writer_properties(),
+                                       &parquet_schema));
+
+    this->sink_ = CreateOutputStream();
+    auto schema_node = std::static_pointer_cast<GroupNode>(parquet_schema->schema_root());
+
+    std::unique_ptr<FileWriter> writer;
+    ASSERT_OK_NO_THROW(FileWriter::Make(
+        ::arrow::default_memory_pool(),
+        ParquetFileWriter::Open(this->sink_, schema_node, writer_properties),
+        arrow_schema, default_arrow_writer_properties(), &writer));
+    ASSERT_OK_NO_THROW(writer->NewRowGroup(values->length()));
+    ASSERT_OK_NO_THROW(writer->WriteColumnChunk(*values));
+    ASSERT_OK_NO_THROW(writer->Close());
+  }
+
+  void ReadAndCheckSingleDecimalColumnFile(const Array& values) {
+    std::shared_ptr<Array> out;
+    std::unique_ptr<FileReader> reader;
+    this->ReaderFromSink(&reader);
+    this->ReadSingleColumnFile(std::move(reader), &out);
+
+    // Reader always reads values as DECIMAL128 type
+    ASSERT_EQ(out->type()->id(), ::arrow::Type::DECIMAL128);
+
+    if (values.type()->id() == ::arrow::Type::DECIMAL128) {
+      AssertArraysEqual(values, *out);
+    } else {
+      auto& expected_values = dynamic_cast<const ::arrow::Decimal256Array&>(values);
+      auto& read_values = dynamic_cast<const ::arrow::Decimal128Array&>(*out);
+      ASSERT_EQ(expected_values.null_count(), read_values.null_count());
+      ASSERT_EQ(0, read_values.null_count());
+      ASSERT_EQ(expected_values.length(), read_values.length());
+      for (int64_t i = 0; i < expected_values.length(); ++i) {
+        ASSERT_EQ(::arrow::Decimal256(expected_values.Value(i)).ToString(0),
+                  ::arrow::Decimal128(read_values.Value(i)).ToString(0));
+      }
+    }
+  }
+};
+
+typedef ::testing::Types<
+    DecimalWithPrecisionAndScale<1>, DecimalWithPrecisionAndScale<5>,
+    DecimalWithPrecisionAndScale<10>, DecimalWithPrecisionAndScale<18>,
+    Decimal256WithPrecisionAndScale<1>, Decimal256WithPrecisionAndScale<5>,
+    Decimal256WithPrecisionAndScale<10>, Decimal256WithPrecisionAndScale<18>>
+    DecimalTestTypes;
+
+TYPED_TEST_SUITE(TestIntegerAnnotateDecimalTypeParquetIO, DecimalTestTypes);
+
+TYPED_TEST(TestIntegerAnnotateDecimalTypeParquetIO, SingleDecimalColumn) {
```

Review Comment: Could we also have a test with null values?

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #4836: Remove tests from sql_integration that were ported to sqllogictest
ursabot commented on PR #4836: URL: https://github.com/apache/arrow-datafusion/pull/4836#issuecomment-1376413190 Benchmark runs are scheduled for baseline = c4f4dffec79d3ecc7e177a3d07fcf60312d300db and contender = 42f7dd50913c4f6b6d830b88d55dfd4a8d16d44f. 42f7dd50913c4f6b6d830b88d55dfd4a8d16d44f is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/9009bae4a21a47829264f427c6230153...faa29f51a2314df2957bc566903cb458/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/b947a659471042a78e8c8d92cee5024b...a3b2097585e04c51bc17dfaba323e635/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/c4f2e2bbe9654e26a93359c9ad85e6e4...76c4557b831b456184c5e76d92fb0cd7/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/6ee2e6779d214ca0bd019da00a596d47...7b369e15214b45baa689fef51fe6e0d3/) Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #4849: Fix push_down_projection through a distinct
ursabot commented on PR #4849: URL: https://github.com/apache/arrow-datafusion/pull/4849#issuecomment-1376413168 Benchmark runs are scheduled for baseline = ceff6cb44ad621f89c9c4e9c0bd34cb204246910 and contender = c4f4dffec79d3ecc7e177a3d07fcf60312d300db. c4f4dffec79d3ecc7e177a3d07fcf60312d300db is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/c249b4ad51a6465abd46020854cd2e34...9009bae4a21a47829264f427c6230153/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/9ba12a8737fb488698d66ee4f4d171e2...b947a659471042a78e8c8d92cee5024b/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/e8e8d1f4c4604eae8057adb1be8f20fa...c4f2e2bbe9654e26a93359c9ad85e6e4/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/5d38382ce34649a9bf35c8dd1f16033d...6ee2e6779d214ca0bd019da00a596d47/) Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #15255: GH-15254: [GLib] garrow_execute_plain_wait() checks the finished status
ursabot commented on PR #15255: URL: https://github.com/apache/arrow/pull/15255#issuecomment-1376403051 ['Python', 'R'] benchmarks have high level of regressions. [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/b54f1f820ca14747952ca20d32cf645d...eddc366f198743749dd78cfbc7d7d1ae/) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] ursabot commented on pull request #4839: Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting`
ursabot commented on PR #4839: URL: https://github.com/apache/arrow-datafusion/pull/4839#issuecomment-1376402673 Benchmark runs are scheduled for baseline = c5e2594e99b01c12d4f6903cb998a62a5479455c and contender = ceff6cb44ad621f89c9c4e9c0bd34cb204246910. ceff6cb44ad621f89c9c4e9c0bd34cb204246910 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/ff07244654914621b553179cf4dc6e08...c249b4ad51a6465abd46020854cd2e34/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/5008572ef5df4997b4eac505ba058bc5...9ba12a8737fb488698d66ee4f4d171e2/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/404617774739495385493fef750351b6...e8e8d1f4c4604eae8057adb1be8f20fa/) [Skipped :warning: Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/2a988505448f44f0bcd851583feafa20...5d38382ce34649a9bf35c8dd1f16033d/) Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] ursabot commented on pull request #15255: GH-15254: [GLib] garrow_execute_plain_wait() checks the finished status
ursabot commented on PR #15255: URL: https://github.com/apache/arrow/pull/15255#issuecomment-1376402654 Benchmark runs are scheduled for baseline = 4d31b1ef70be330356ed9119f63931f8fd90e6d1 and contender = 11d286eafb72b630baf897e619c84ecdfc6b723f. 11d286eafb72b630baf897e619c84ecdfc6b723f is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/4f31d7340d4c47ab9fdb35a75dad1063...452345e6de07436e8b72b4bbb2eab4e2/) [Failed :arrow_down:0.23% :arrow_up:0.1%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/00ad91f5d080474c88331d3823083a33...eddda79cf6f14eff87ad495619c1ced5/) [Finished :arrow_down:0.26% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/b54f1f820ca14747952ca20d32cf645d...eddc366f198743749dd78cfbc7d7d1ae/) [Finished :arrow_down:0.24% :arrow_up:0.03%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/ca66e67dd0a844578a789d0cebbb0e44...cf52e7a7a2744c8d8201e5b095e7d558/) Buildkite builds: [Finished] [`11d286ea` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2136) [Failed] [`11d286ea` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2159) [Finished] [`11d286ea` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2131) [Finished] [`11d286ea` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2151) [Finished] [`4d31b1ef` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/2135) [Failed] [`4d31b1ef` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/2158) [Finished] [`4d31b1ef` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/2130) [Finished] [`4d31b1ef` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/2150) Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb merged pull request #4836: Remove tests from sql_integration that were ported to sqllogictest
alamb merged PR #4836: URL: https://github.com/apache/arrow-datafusion/pull/4836 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #4498: Remove tests from `sql_integration` that were ported to `sqllogictest`
alamb closed issue #4498: Remove tests from `sql_integration` that were ported to `sqllogictest` URL: https://github.com/apache/arrow-datafusion/issues/4498 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb closed issue #4667: `SELECT ... FROM (tbl1 UNION tbl2)` wrongly works like `SELECT DISTINCT ... FROM (tbl1 UNION tbl2)`
alamb closed issue #4667: `SELECT ... FROM (tbl1 UNION tbl2)` wrongly works like `SELECT DISTINCT ... FROM (tbl1 UNION tbl2)` URL: https://github.com/apache/arrow-datafusion/issues/4667 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb merged pull request #4849: Fix push_down_projection through a distinct
alamb merged PR #4849: URL: https://github.com/apache/arrow-datafusion/pull/4849 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4862: Rewrite coerce_plan_expr_for_schema to fix union type coercion
alamb commented on PR #4862: URL: https://github.com/apache/arrow-datafusion/pull/4862#issuecomment-1376400480 Thanks @ygf11 -- I hope to review this PR more carefully tomorrow -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4839: Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting`
alamb commented on PR #4839: URL: https://github.com/apache/arrow-datafusion/pull/4839#issuecomment-1376398444 Thanks everyone -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb merged pull request #4839: Orthogonalize distribution and sort enforcement rules into `EnforceDistribution` and `EnforceSorting`
alamb merged PR #4839: URL: https://github.com/apache/arrow-datafusion/pull/4839 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4856: Remove unused ExecutionPlan::relies_input_order (has been replaced with `required_input_ordering`)
alamb commented on PR #4856: URL: https://github.com/apache/arrow-datafusion/pull/4856#issuecomment-1376397449

> LGTM. I remember originally there was logic inside the Repartition rule to check `relies_on_input_order()`; it seems this logic was removed recently.

Yes, I believe it was removed in https://github.com/apache/arrow-datafusion/pull/4714/files#diff-4f2e98e6466bc2e0f66c0fdb844f46ef25c6b5b559f9c51610dd9273da2b2ec1L859 from @yahoNanJing

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
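For context, the design point behind replacing `relies_on_input_order` with `required_input_ordering` can be paraphrased in a hypothetical, simplified sketch. The trait and the `SortExpr` type below are stand-ins, not DataFusion's actual definitions; the point is that a per-child `Option` of ordering requirements carries more information than a single boolean:

```rust
/// Stand-in for DataFusion's physical sort expression type (hypothetical).
#[derive(Clone)]
pub struct SortExpr {
    pub column: String,
    pub descending: bool,
}

/// Paraphrased shape of `required_input_ordering`: one entry per child
/// input, where `None` means "any order is acceptable". Enforcement rules
/// can then insert sorts only where a requirement is not already met,
/// instead of reacting to a coarse `relies_on_input_order()` flag.
pub trait RequiresOrdering {
    fn required_input_ordering(&self) -> Vec<Option<Vec<SortExpr>>>;
}

struct SortMergeJoin;

impl RequiresOrdering for SortMergeJoin {
    fn required_input_ordering(&self) -> Vec<Option<Vec<SortExpr>>> {
        // Both children must arrive sorted on the join key.
        let key = vec![SortExpr { column: "a".into(), descending: false }];
        vec![Some(key.clone()), Some(key)]
    }
}

fn main() {
    // One ordering requirement per child input.
    assert_eq!(SortMergeJoin.required_input_ordering().len(), 2);
}
```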
[GitHub] [arrow] lafiona commented on a diff in pull request #14533: ARROW-17777: [Dev] Update the pull request merge script to work with master or main
lafiona commented on code in PR #14533: URL: https://github.com/apache/arrow/pull/14533#discussion_r1065152006

## dev/merge_arrow_pr.py: ##

```
@@ -110,10 +111,32 @@ def strip_ci_directives(commit_message):
     return _REGEX_CI_DIRECTIVE.sub('', commit_message)

+def git_default_branch_name():
+    default_branch_name = os.getenv("MERGE_SCRIPT_DEFAULT_BRANCH_NAME")
+
+    if default_branch_name is None:
+        try:
+            default_reference = run_cmd(
+                "git rev-parse --abbrev-ref origin/HEAD")
+            default_branch_name = default_reference.lstrip("origin/")
```

Review Comment: Thank you for your suggestion @kou. There is a newline character, so I've added an `rstrip()` call to remove it.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] matthewwillian commented on issue #4850: Support `select .. from 'data.parquet'` files in SQL from any `SessionContext` (optionally)
matthewwillian commented on issue #4850: URL: https://github.com/apache/arrow-datafusion/issues/4850#issuecomment-1376393999 I'm happy to take on this issue if no one else is already working on it! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-datafusion] alamb commented on pull request #4842: Simplify the break rule of logical optimizer passes
alamb commented on PR #4842: URL: https://github.com/apache/arrow-datafusion/pull/4842#issuecomment-1376393249

> Why don't we return None if the rule can't optimize the plan in this scenario?

I don't think there is a theoretical reason we don't do this (and it would probably be more efficient), but I don't think it is how the code works today. I would definitely be supportive of updating the passes so they had this property, however.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
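The `None`-on-no-change convention being discussed would let the optimizer driver skip plans a rule could not improve. A minimal sketch with hypothetical types (`Plan` and `Rule` are stand-ins, not DataFusion's real traits):

```rust
struct Plan;

trait Rule {
    /// Returns `Some(new_plan)` if the rule made progress, or `None` if it
    /// could not optimize the plan at all.
    fn try_optimize(&self, plan: &Plan) -> Option<Plan>;
}

fn run_pass(rules: &[Box<dyn Rule>], mut plan: Plan) -> Plan {
    for rule in rules {
        match rule.try_optimize(&plan) {
            Some(new_plan) => plan = new_plan, // progress: keep the rewrite
            None => {} // no change: skip re-validating this plan entirely
        }
    }
    plan
}

fn main() {
    let rules: Vec<Box<dyn Rule>> = vec![];
    let _plan = run_pass(&rules, Plan);
}
```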
[GitHub] [arrow] save-buffer commented on a diff in pull request #15253: ARROW-17381: [C++] Cleanup error handling
save-buffer commented on code in PR #15253: URL: https://github.com/apache/arrow/pull/15253#discussion_r1065140241

## cpp/src/arrow/compute/exec/map_node.cc: ##

```
@@ -36,22 +36,16 @@ namespace compute {

 MapNode::MapNode(ExecPlan* plan, std::vector<ExecNode*> inputs,
```

Review Comment: Would it make sense, while we're at it, to remove MapNode?

## cpp/src/arrow/compute/exec/query_context.cc: ##

```
@@ -60,26 +60,24 @@ Result<Future<>> QueryContext::BeginExternalTask() {
   return Future<>{};
 }

-Status QueryContext::ScheduleTask(std::function<Status()> fn) {
+void QueryContext::ScheduleTask(std::function<Status()> fn) {
   ::arrow::internal::Executor* exec = executor();
   // Adds a task which submits fn to the executor and tracks its progress. If we're
   // already stopping then the task is ignored and fn is not executed.
   async_scheduler_->AddSimpleTask([exec, fn]() { return exec->Submit(std::move(fn)); });
-  return Status::OK();
 }

-Status QueryContext::ScheduleTask(std::function<Status(size_t)> fn) {
+void QueryContext::ScheduleTask(std::function<Status(size_t)> fn) {
   std::function<Status()> indexed_fn = [this, fn]() {
     size_t thread_index = GetThreadIndex();
     return fn(thread_index);
   };
   return ScheduleTask(std::move(indexed_fn));
```

Review Comment: Can probably remove the `return` from here.

## cpp/src/arrow/compute/exec/exec_plan.h: ##

```
@@ -121,18 +119,20 @@ class ARROW_EXPORT ExecNode {
   virtual const char* kind_name() const = 0;

-  // The number of inputs/outputs expected by this node
+  // The number of inputs expected by this node
   int num_inputs() const { return static_cast<int>(inputs_.size()); }
-  int num_outputs() const { return num_outputs_; }

   /// This node's predecessors in the exec plan
   const NodeVector& inputs() const { return inputs_; }

+  /// True if the plan has no output schema (is a sink)
+  bool is_sink() const { return !output_schema_; }
```

Review Comment: Why does a sink not have an output schema? Sinks still output batches; I think it would be more correct to define a node to be a sink if its `output_` is null.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] ursabot commented on pull request #3494: Fix: Added support to cast string without time
ursabot commented on PR #3494: URL: https://github.com/apache/arrow-rs/pull/3494#issuecomment-1376388781 Benchmark runs are scheduled for baseline = 592d7a3601b1b7876ab5753abde66113f1a9dc23 and contender = fb36dd980b398deabe5547af114982326b97e078. fb36dd980b398deabe5547af114982326b97e078 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. Conbench compare runs links: [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/b0233fb6237e4d088115a00737ea4e25...98fb0125f2fa4a90ac771613696efdd1/) [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on test-mac-arm] [test-mac-arm](https://conbench.ursa.dev/compare/runs/fc661025284e43cf8709646fd0dc4968...a5081adbdf8c49ce8b208de98f7890c2/) [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2dea8f6d176b44b5aa4c47d9e33c4147...5c2a7e0ad00d4a909c0763a85c65fc69/) [Skipped :warning: Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/f4255d24d18c4d409af58ec9f97663cc...9a9d04e866824bcd84d91688ccbb8a50/) Buildkite builds: Supported benchmarks: ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True test-mac-arm: Supported benchmark langs: C++, Python, R ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] alamb commented on a diff in pull request #3494: Fix: Added support to cast string without time
alamb commented on code in PR #3494: URL: https://github.com/apache/arrow-rs/pull/3494#discussion_r1065144942

## arrow-cast/src/parse.rs: ##

```
@@ -121,6 +122,14 @@ pub fn string_to_timestamp_nanos(s: &str) -> Result<i64, ArrowError> {
         return Ok(ts.timestamp_nanos());
     }

+    // without a timezone specifier as a local time, only date
+    // Example: 2020-09-08
+    if let Ok(dt) = NaiveDate::parse_from_str(s, "%Y-%m-%d") {
+        if let Some(ts) = dt.and_hms_opt(0, 0, 0) {
```

Review Comment:

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
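A usage sketch of the behavior this diff adds, assuming the function is called via arrow-cast's public parse API and that a bare date takes the same timezone-less local-time path as an explicit "T00:00:00" string (as the new branch suggests):

```rust
use arrow_cast::parse::string_to_timestamp_nanos;
use arrow_schema::ArrowError;

fn main() -> Result<(), ArrowError> {
    // With this change, a date-only string parses as local midnight...
    let date_only = string_to_timestamp_nanos("2020-09-08")?;
    // ...matching an explicit timezone-less midnight timestamp.
    let midnight = string_to_timestamp_nanos("2020-09-08T00:00:00")?;
    assert_eq!(date_only, midnight);
    Ok(())
}
```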
[GitHub] [arrow-rs] alamb closed issue #3492: Support casting strings like `'2001-01-01'` to timestamp
alamb closed issue #3492: Support casting strings like `'2001-01-01'` to timestamp URL: https://github.com/apache/arrow-rs/issues/3492 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] alamb merged pull request #3494: Fix: Added support to cast string without time
alamb merged PR #3494: URL: https://github.com/apache/arrow-rs/pull/3494 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] alamb commented on issue #3407: ArrayData`get_slice_memory_size` or similar
alamb commented on issue #3407: URL: https://github.com/apache/arrow-rs/issues/3407#issuecomment-1376383261 Thank you @askoa -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow-rs] alamb commented on pull request #3452: Remove multiversion dependency
alamb commented on PR #3452: URL: https://github.com/apache/arrow-rs/pull/3452#issuecomment-1376374773

> It's OS specific (dlopen etc) and precludes users from making statically linked executables

I guess what I had in mind was to literally create builds of the influxdb_iox binary for each architecture (well, really probably "AVX512" and "SSE2", and maybe a third one in the middle). As you note, Rust doesn't have a stable ABI (and will therefore recompile almost everything from source each time anyway). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1065132773

## cpp/src/parquet/statistics.h: ##

@@ -216,6 +216,14 @@ class PARQUET_EXPORT Statistics {
       bool has_distinct_count, ::arrow::MemoryPool* pool = ::arrow::default_memory_pool());
 
+  // Helper function to convert EncodedStatistics to Statistics.
+  // EncodedStatistics does not contain the number of non-null values; it can
+  // be passed using the num_values parameter.
+  static std::shared_ptr<Statistics> Make(
+      const ColumnDescriptor* descr, const EncodedStatistics* encoded_statistics,
+      int64_t num_values = -1,

Review Comment: We decided to add a convenient way to convert an EncodedStatistics to Statistics. However, EncodedStatistics is missing the number of non-null values, which is required for making Statistics, so I added an optional argument for it. Please see here: https://github.com/apache/arrow/pull/14603#discussion_r1048484566 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
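A minimal sketch of how this new overload might be called, assuming it lands as shown in the diff above; the wrapper function and its name are hypothetical, only the `Statistics::Make` signature comes from the change, and any trailing defaulted parameters are omitted because the quoted diff is truncated:

```cpp
#include <memory>

#include "parquet/schema.h"      // parquet::ColumnDescriptor
#include "parquet/statistics.h"  // parquet::Statistics, parquet::EncodedStatistics

// Hypothetical helper: decode the raw statistics carried in a page header into
// a typed Statistics object. EncodedStatistics does not record the non-null
// value count itself, so the caller supplies it; -1 (the default) means unknown.
std::shared_ptr<parquet::Statistics> DecodePageStats(
    const parquet::ColumnDescriptor* descr,
    const parquet::EncodedStatistics* encoded, int64_t num_non_null) {
  return parquet::Statistics::Make(descr, encoded, /*num_values=*/num_non_null);
}
```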
[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1065131471

## cpp/src/parquet/file_deserialize_test.cc: ##

@@ -177,6 +201,217 @@ TEST_F(TestPageSerde, DataPageV1) {
   ASSERT_NO_FATAL_FAILURE(CheckDataPageHeader(data_page_header_, current_page.get()));
 }
 
+// Templated test class to test page filtering for both format::DataPageHeader
+// and format::DataPageHeaderV2.
+template <typename T>
+class PageFilterTest : public TestPageSerde {
+ public:
+  const int kNumPages = 10;
+  void WriteStream();
+
+ protected:
+  std::vector<T> data_page_headers_;
+  int total_rows_ = 0;
+};
+
+template <>
+void PageFilterTest<format::DataPageHeader>::WriteStream() {
+  for (int i = 0; i < kNumPages; ++i) {
+    // Vary the number of rows to produce different headers.
+    int32_t num_rows = i + 100;
+    total_rows_ += num_rows;
+    int data_size = i + 1024;
+    this->data_page_header_.__set_num_values(num_rows);
+    this->data_page_header_.statistics.__set_min_value("A");
+    this->data_page_header_.statistics.__set_max_value("Z");
+    this->data_page_header_.statistics.__set_null_count(0);
+    this->data_page_header_.__isset.statistics = true;
+    ASSERT_NO_FATAL_FAILURE(
+        this->WriteDataPageHeader(/*max_serialized_len=*/1024, data_size, data_size));
+    data_page_headers_.push_back(this->data_page_header_);
+    // Also write data, to make sure we skip the data correctly.
+    std::vector<uint8_t> faux_data(data_size);
+    ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  }
+  this->EndStream();
+}
+
+template <>
+void PageFilterTest<format::DataPageHeaderV2>::WriteStream() {
+  for (int i = 0; i < kNumPages; ++i) {
+    // Vary the number of rows to produce different headers.
+    int32_t num_rows = i + 100;
+    total_rows_ += num_rows;
+    int data_size = i + 1024;
+    this->data_page_header_v2_.__set_num_values(num_rows);
+    this->data_page_header_v2_.__set_num_rows(num_rows);
+    this->data_page_header_v2_.statistics.__set_min_value("A");
+    this->data_page_header_v2_.statistics.__set_max_value("Z");
+    this->data_page_header_v2_.statistics.__set_null_count(0);
+    this->data_page_header_v2_.__isset.statistics = true;
+    ASSERT_NO_FATAL_FAILURE(
+        this->WriteDataPageHeaderV2(/*max_serialized_len=*/1024, data_size, data_size));
+    data_page_headers_.push_back(this->data_page_header_v2_);
+    // Also write data, to make sure we skip the data correctly.
+    std::vector<uint8_t> faux_data(data_size);
+    ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  }
+  this->EndStream();
+}
+
+using DataPageHeaderTypes =
+    ::testing::Types<format::DataPageHeader, format::DataPageHeaderV2>;
+TYPED_TEST_SUITE(PageFilterTest, DataPageHeaderTypes);
+
+// Creates a number of pages and skips some of them with the page filter callback.
+TYPED_TEST(PageFilterTest, TestPageFilterCallback) {
+  this->WriteStream();
+
+  {  // Read all pages.
+    auto stream = std::make_shared<::arrow::io::BufferReader>(this->out_buffer_);
+    this->page_reader_ =
+        PageReader::Open(stream, this->total_rows_, Compression::UNCOMPRESSED);
+
+    // This callback will always return false.
+    auto read_all_pages = [](const DataPageStats& stats) -> bool { return false; };
+
+    this->page_reader_->set_data_page_filter(read_all_pages);
+    for (int i = 0; i < this->kNumPages; ++i) {
+      std::shared_ptr<Page> current_page = this->page_reader_->NextPage();
+      ASSERT_NE(current_page, nullptr);
+      ASSERT_NO_FATAL_FAILURE(
+          CheckDataPageHeader(this->data_page_headers_[i], current_page.get()));
+    }
+    ASSERT_EQ(this->page_reader_->NextPage(), nullptr);
+  }
+  {  // Skip all pages.
+    auto stream = std::make_shared<::arrow::io::BufferReader>(this->out_buffer_);
+    this->page_reader_ =
+        PageReader::Open(stream, this->total_rows_, Compression::UNCOMPRESSED);
+
+    auto skip_all_pages = [](const DataPageStats& stats) -> bool { return true; };
+
+    this->page_reader_->set_data_page_filter(skip_all_pages);
+    std::shared_ptr<Page> current_page = this->page_reader_->NextPage();
+    ASSERT_EQ(this->page_reader_->NextPage(), nullptr);
+  }
+
+  {  // Skip every other page.
+    auto stream = std::make_shared<::arrow::io::BufferReader>(this->out_buffer_);
+    this->page_reader_ =
+        PageReader::Open(stream, this->total_rows_, Compression::UNCOMPRESSED);
+
+    // Skip pages with even number of values.
+    auto skip_even_pages = [](const DataPageStats& stats) -> bool {
+      if (stats.num_values % 2 == 0) return true;
+      return false;
+    };
+
+    this->page_reader_->set_data_page_filter(skip_even_pages);
+
+    for (int i = 0; i < this->kNumPages; ++i) {
+      // Only pages with odd number of values are read.
+      if (i % 2 != 0) {
+        std::shared_ptr<Page> current_page = this->page_reader_->NextPage();
+        ASSERT_NE(current_page, nullptr);
+        ASSERT_NO_FATAL_FAILURE(
+            CheckDataPageHeader(this->data_page_headers_[i], current_page.get()));
+      }
+    }
+
[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1065130195

## cpp/src/parquet/file_deserialize_test.cc: ##

@@ -177,6 +201,305 @@ TEST_F(TestPageSerde, DataPageV1) {
   ASSERT_NO_FATAL_FAILURE(CheckDataPageHeader(data_page_header_, current_page.get()));
 }
 
+// Templated test class to test page filtering for both format::DataPageHeader
+// and format::DataPageHeaderV2.
+template <typename T>
+class PageFilterTest : public TestPageSerde {
+ public:
+  const int kNumPages = 10;
+  void WriteStream();
+  void WritePageWithoutStats();
+  void CheckNumRows(std::optional<int32_t> num_rows, const T& header);
+
+ protected:
+  std::vector<T> data_page_headers_;
+  int total_rows_ = 0;
+};
+
+template <>
+void PageFilterTest<format::DataPageHeader>::WriteStream() {
+  for (int i = 0; i < kNumPages; ++i) {
+    // Vary the number of rows to produce different headers.
+    int32_t num_rows = i + 100;
+    total_rows_ += num_rows;
+    int data_size = i + 1024;
+    this->data_page_header_.__set_num_values(num_rows);
+    this->data_page_header_.statistics.__set_min_value("A" + std::to_string(i));
+    this->data_page_header_.statistics.__set_max_value("Z" + std::to_string(i));
+    this->data_page_header_.statistics.__set_null_count(0);
+    this->data_page_header_.statistics.__set_distinct_count(num_rows);
+    this->data_page_header_.__isset.statistics = true;
+    ASSERT_NO_FATAL_FAILURE(
+        this->WriteDataPageHeader(/*max_serialized_len=*/1024, data_size, data_size));
+    data_page_headers_.push_back(this->data_page_header_);
+    // Also write data, to make sure we skip the data correctly.
+    std::vector<uint8_t> faux_data(data_size);
+    ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  }
+  this->EndStream();
+}
+
+template <>
+void PageFilterTest<format::DataPageHeaderV2>::WriteStream() {
+  for (int i = 0; i < kNumPages; ++i) {
+    // Vary the number of rows to produce different headers.
+    int32_t num_rows = i + 100;
+    total_rows_ += num_rows;
+    int data_size = i + 1024;
+    this->data_page_header_v2_.__set_num_values(num_rows);
+    this->data_page_header_v2_.__set_num_rows(num_rows);
+    this->data_page_header_v2_.statistics.__set_min_value("A" + std::to_string(i));
+    this->data_page_header_v2_.statistics.__set_max_value("Z" + std::to_string(i));
+    this->data_page_header_v2_.statistics.__set_null_count(0);
+    this->data_page_header_v2_.statistics.__set_distinct_count(num_rows);
+    this->data_page_header_v2_.__isset.statistics = true;
+    ASSERT_NO_FATAL_FAILURE(
+        this->WriteDataPageHeaderV2(/*max_serialized_len=*/1024, data_size, data_size));
+    data_page_headers_.push_back(this->data_page_header_v2_);
+    // Also write data, to make sure we skip the data correctly.
+    std::vector<uint8_t> faux_data(data_size);
+    ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  }
+  this->EndStream();
+}
+
+template <>
+void PageFilterTest<format::DataPageHeader>::WritePageWithoutStats() {
+  int32_t num_rows = 100;
+  total_rows_ += num_rows;
+  int data_size = 1024;
+  this->data_page_header_.__set_num_values(num_rows);
+  ASSERT_NO_FATAL_FAILURE(
+      this->WriteDataPageHeader(/*max_serialized_len=*/1024, data_size, data_size));
+  data_page_headers_.push_back(this->data_page_header_);
+  std::vector<uint8_t> faux_data(data_size);
+  ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  this->EndStream();
+}
+
+template <>
+void PageFilterTest<format::DataPageHeaderV2>::WritePageWithoutStats() {
+  int32_t num_rows = 100;
+  total_rows_ += num_rows;
+  int data_size = 1024;
+  this->data_page_header_v2_.__set_num_values(num_rows);
+  this->data_page_header_v2_.__set_num_rows(num_rows);
+  ASSERT_NO_FATAL_FAILURE(
+      this->WriteDataPageHeaderV2(/*max_serialized_len=*/1024, data_size, data_size));
+  data_page_headers_.push_back(this->data_page_header_v2_);
+  std::vector<uint8_t> faux_data(data_size);
+  ASSERT_OK(this->out_stream_->Write(faux_data.data(), data_size));
+  this->EndStream();
+}
+
+template <>
+void PageFilterTest<format::DataPageHeader>::CheckNumRows(
+    std::optional<int32_t> num_rows, const format::DataPageHeader& header) {
+  ASSERT_EQ(num_rows, std::nullopt);
+}
+
+template <>
+void PageFilterTest<format::DataPageHeaderV2>::CheckNumRows(
+    std::optional<int32_t> num_rows, const format::DataPageHeaderV2& header) {
+  ASSERT_EQ(*num_rows, header.num_rows);
+}
+
+using DataPageHeaderTypes =
+    ::testing::Types<format::DataPageHeader, format::DataPageHeaderV2>;
+TYPED_TEST_SUITE(PageFilterTest, DataPageHeaderTypes);
+
+// Test that the returned encoded_statistics is nullptr when there are no
+// statistics in the page header.
+TYPED_TEST(PageFilterTest, TestPageWithoutStatistics) {
+  this->WritePageWithoutStats();
+
+  auto stream = std::make_shared<::arrow::io::BufferReader>(this->out_buffer_);
+  this->page_reader_ =
+      PageReader::Open(stream, this->total_rows_, Compression::UNCOMPRESSED);
+
+  int num_pages = 0;
+  bool is_stats_null = false;
+  auto read_all_pages = [&](const DataPageStats& stats) -> bool {
+    is_stats_null = stats.encoded_statistics == nullptr;
+
[GitHub] [arrow] fatemehp commented on a diff in pull request #14603: PARQUET-2210: [C++][Parquet] Skip pages based on header metadata using a callback
fatemehp commented on code in PR #14603: URL: https://github.com/apache/arrow/pull/14603#discussion_r1065129943

## cpp/src/parquet/column_reader.h: ##

@@ -55,6 +56,27 @@
 static constexpr uint32_t kDefaultMaxPageHeaderSize = 16 * 1024 * 1024;
 
 // 16 KB is the default expected page header size
 static constexpr uint32_t kDefaultPageHeaderSize = 16 * 1024;
 
+// \brief DataPageStats stores encoded statistics and number of values/rows for
+// a page.
+struct PARQUET_EXPORT DataPageStats {
+  DataPageStats(const EncodedStatistics* encoded_statistics, int32_t num_values,
+                std::optional<int32_t> num_rows)
+      : encoded_statistics(encoded_statistics),
+        num_values(num_values),
+        num_rows(num_rows) {}
+
+  // Encoded statistics extracted from the page header.
+  // Nullptr if there are no statistics in the page header.
+  const EncodedStatistics* encoded_statistics;
+  // Number of values stored in the page. Filled for both V1 and V2 data pages.
+  // For repeated fields, this can be greater than number of rows. For
+  // non-repeated fields, this will be the same as the number of rows.
+  int32_t num_values;
+  // Number of rows stored in the page. std::nullopt for V1 data pages since

Review Comment: Done. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
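To make the intended use of this struct concrete, here is a minimal sketch of a filter built on it; `set_data_page_filter` and the `DataPageStats` fields come from the diffs quoted above, while the wrapper function, its name, and the all-null heuristic are illustrative assumptions rather than part of the PR:

```cpp
#include "parquet/column_reader.h"  // parquet::DataPageStats, parquet::PageReader

// Install a callback that skips any data page whose header statistics prove it
// is entirely null. Returning true from the callback means "skip this page".
void InstallAllNullPageFilter(parquet::PageReader* page_reader) {
  auto skip_all_null_pages = [](const parquet::DataPageStats& stats) -> bool {
    // No header statistics: keep the page, since nothing can be proven about it.
    if (stats.encoded_statistics == nullptr) return false;
    if (!stats.encoded_statistics->has_null_count) return false;
    // A page in which every value is null cannot contribute to a
    // `col IS NOT NULL` style predicate, so it is safe to skip outright.
    return stats.encoded_statistics->null_count == stats.num_values;
  };
  page_reader->set_data_page_filter(skip_all_null_pages);
}
```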