Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-18 Thread via GitHub
viirya commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2237668944 > > ## TPCDS Micro Benchmarks: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative > > add_many_decimals

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove merged PR #671: URL: https://github.com/apache/datafusion-comet/pull/671 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@da

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681869984 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
kazuyukitanimura commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681836876 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681816431 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foundation

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
kazuyukitanimura commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681790305 ## spark/src/test/scala/org/apache/spark/sql/benchmark/CometTPCDSMicroBenchmark.scala: ## @@ -0,0 +1,141 @@ +/* + * Licensed to the Apache Software Foun

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234226194 > However, this path would be hit only for precision > 18 or if `spark.comet.use.decimal128` was set to `true` (it is `false` by default). The fields have precision 7 and I

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234221966 > Just looking at this one case, with decimal fields and only scan enabled, we are much slower. This is consistent with something I saw when working on the parallel reader. >

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
parthchandra commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234180553 > TPCDS Micro Benchmarks: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative > ---

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
parthchandra commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234142009 > Can we also commit the benchmark results (as we do for the other microbenchmarks)? That way we can keep an eye open for any performance regressions. I was under the i

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681640222 ## spark/src/test/resources/tpcds-micro-benchmarks/add_many_decimals.sql: ## @@ -0,0 +1,34 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2234106564 For reference, here are the results from running with sf=100gb data on my Linux workstation. ``` OpenJDK 64-Bit Server VM 11.0.23+9-post-Ubuntu-1ubuntu122.04.1 on Linux

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
viirya commented on code in PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#discussion_r1681564129 ## spark/src/test/resources/tpcds-micro-benchmarks/add_many_decimals.sql: ## @@ -0,0 +1,34 @@ +-- Licensed to the Apache Software Foundation (ASF) under one +-- o

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
parthchandra commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2233705069 > I tried integrating into the existing suite but the queries are not running correctly and I am not sure why. > > Example: > > ``` > TPCDS Micro Benchmarks:

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2233704546 @parthchandra @kazuyukitanimura This is ready for another review. I can now run these new microbenchmarks from the existing test framework.

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-17 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2233227228 I tried integrating into the existing suite but the queries are not running correctly and I am not sure why. Example: ``` TPCDS Micro Benchmarks:

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232452105 > Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark > > Wondering if we should follow

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
parthchandra commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232116879 > Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark Wondering if we should follow the ex

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
kazuyukitanimura commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2232034311 Hmmm we have microbenchmarks at https://github.com/apache/datafusion-comet/tree/main/spark/src/test/scala/org/apache/spark/sql/benchmark Wondering if we should follow th

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-16 Thread via GitHub
andygrove commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2230835502 @parthchandra @kazuyukitanimura @huaxingao @viirya This is ready for review now.

Re: [PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-15 Thread via GitHub
codecov-commenter commented on PR #671: URL: https://github.com/apache/datafusion-comet/pull/671#issuecomment-2229625195 ## [Codecov](https://app.codecov.io/gh/apache/datafusion-comet/pull/671?dropdown=coverage&src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campai

[PR] chore: Add microbenchmarks [datafusion-comet]

2024-07-15 Thread via GitHub
andygrove opened a new pull request, #671: URL: https://github.com/apache/datafusion-comet/pull/671 ## Which issue does this PR close? N/A ## Rationale for this change When running the complex TPC-* queries, it is challenging to debug why Comet is slower