[PR] Add support for TABLESAMPLE [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
yoavcloud opened a new pull request, #1580: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1580 This PR adds support for the `TABLESAMPLE` option in the following dialects: - Standard SQL: https://jakewheat.github.io/sql-overview/sql-2016-foundation-grammar.html#sample-claus

Re: [PR] chore: Remove unused StringView struct [datafusion-comet]

2024-12-06 Thread via GitHub
viirya commented on code in PR #1143: URL: https://github.com/apache/datafusion-comet/pull/1143#discussion_r1874334180 ## native/core/src/data_type.rs: ## @@ -1,241 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreements.

Re: [PR] chore: Remove unused StringView struct [datafusion-comet]

2024-12-06 Thread via GitHub
sunchao commented on code in PR #1143: URL: https://github.com/apache/datafusion-comet/pull/1143#discussion_r1874248673 ## native/core/src/data_type.rs: ## @@ -1,241 +0,0 @@ -// Licensed to the Apache Software Foundation (ASF) under one -// or more contributor license agreements

Re: [I] Optimize SortPreservingMergeStream for single-column merge [datafusion]

2024-12-06 Thread via GitHub
jayzhan211 commented on issue #13642: URL: https://github.com/apache/datafusion/issues/13642#issuecomment-2524851266 It seems the single-column case is already optimized here https://github.com/apache/datafusion/blob/8404cd05891f1884226ce512eee8b9a89ea99c5b/datafusion/physical-plan/src/sorts/stre

Re: [I] Diagram display problem [datafusion-site]

2024-12-06 Thread via GitHub
lewiszlw closed issue #16: Diagram display problem URL: https://github.com/apache/datafusion-site/issues/16 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-ma

Re: [I] Diagram display problem [datafusion-site]

2024-12-06 Thread via GitHub
lewiszlw commented on issue #16: URL: https://github.com/apache/datafusion-site/issues/16#issuecomment-2524848055 No. I checked in my mobile browser and it looked good.

[PR] Enable scenarios accidentally commented out in CometExecBenchmark [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich opened a new pull request, #1151: URL: https://github.com/apache/datafusion-comet/pull/1151 ## Which issue does this PR close? Closes #. ## Rationale for this change #987 accidentally commented out some scenarios in CometExecBenchmark. ##

Re: [PR] [comet-parquet-exec] Add Native Scan to CometReadBenchmark [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich commented on code in PR #1150: URL: https://github.com/apache/datafusion-comet/pull/1150#discussion_r1874227078 ## native/core/src/execution/datafusion/schema_adapter.rs: ## @@ -114,7 +114,7 @@ impl SchemaAdapter for CometSchemaAdapter { required_sche

[PR] [comet-parquet-exec] Add Native Scan to CometReadBenchmark [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich opened a new pull request, #1150: URL: https://github.com/apache/datafusion-comet/pull/1150 I still need to convince myself that CometNativeScan is being generated for the scenarios when it's enabled. [CometReadBenchmark-jdk11-results.txt](https://github.com/user-attachmen

Re: [PR] Compare schema as logically equivalent to workaround disappearing metadata [datafusion]

2024-12-06 Thread via GitHub
github-actions[bot] closed pull request #12631: Compare schema as logically equivalent to workaround disappearing metadata URL: https://github.com/apache/datafusion/pull/12631

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-06 Thread via GitHub
Omega359 commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2524788724 It's soon to be the holiday season so I'm all for cooking the release a bit more.

Re: [PR] [comet-parquet-exec] Change path handling to fix URL decoding [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich commented on PR #1149: URL: https://github.com/apache/datafusion-comet/pull/1149#issuecomment-2524716793 An errant .DS_Store made it in. I figured that would be in the gitignore. I’ll fix it tomorrow.

Re: [I] Optimize SortPreservingMergeStream for single-column merge [datafusion]

2024-12-06 Thread via GitHub
comphead commented on issue #13642: URL: https://github.com/apache/datafusion/issues/13642#issuecomment-2524709601 Thanks for the flamegraph, indeed it looks like it happens when working with `PartitionedStream`, namely in ``` let cursor = self.convert_batch(&batch)?; ``

[I] Retry logic in ParquetSink [datafusion]

2024-12-06 Thread via GitHub
wiedld opened a new issue, #13679: URL: https://github.com/apache/datafusion/issues/13679 ### Is your feature request related to a problem or challenge? ParquetSink (used for `COPY TO`) encodes bytes to parquet and writes to the sink (e.g. object store). It currently does not include

Re: [PR] [comet-parquet-exec] Change path handling to fix URL decoding [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich commented on code in PR #1149: URL: https://github.com/apache/datafusion-comet/pull/1149#discussion_r1874173889 ## native/core/src/execution/datafusion/schema_adapter.rs: ## @@ -114,7 +114,7 @@ impl SchemaAdapter for CometSchemaAdapter { required_sche

[PR] [comet-parquet-exec] Change path handling to fix URL decoding [datafusion-comet]

2024-12-06 Thread via GitHub
mbutrovich opened a new pull request, #1149: URL: https://github.com/apache/datafusion-comet/pull/1149 We had a bug with decoding paths, in particular in CometExecSuite.partition col the path has a : in it, which either wouldn't stay as `%3A` or the spaces would end up as `%20` or worse `%2
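The class of bug described in the entry above (a path whose `:` and spaces do not survive an encode/decode round trip) can be illustrated with a minimal sketch. This is not the fix from the PR and does not use any Comet code; the path below is hypothetical, and Python's standard `urllib.parse` is used only because it makes the effect easy to see. It shows that encoding an already encoded path also escapes the `%` characters, so a single decode no longer recovers the original.

```python
from urllib.parse import quote, unquote

# Hypothetical partition path containing a colon and spaces (an assumption
# for illustration, not a path taken from CometExecSuite).
raw_path = "/tmp/partition col/ts=10:30/data.parquet"

# Encoding once escapes ':' as %3A and ' ' as %20.
encoded_once = quote(raw_path)

# Encoding an already encoded path escapes the '%' signs as well.
encoded_twice = quote(encoded_once)

# A single decode undoes exactly one level of encoding.
decoded_once = unquote(encoded_once)
decoded_from_twice = unquote(encoded_twice)

print(encoded_once)
print(encoded_twice)
print(decoded_once == raw_path)
print(decoded_from_twice == raw_path)
```

The general takeaway, under these assumptions, is to decide at one boundary whether a path is held in encoded or decoded form and to convert exactly once.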

Re: [I] Diagram display problem [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer commented on issue #16: URL: https://github.com/apache/datafusion-site/issues/16#issuecomment-2524683305 This isn't reproduced in my browser. Do you still have the problem? I checked two different browsers, one mobile and one desktop.

Re: [PR] Bump google-protobuf from 4.26.1 to 4.27.5 [datafusion-site]

2024-12-06 Thread via GitHub
dependabot[bot] commented on PR #27: URL: https://github.com/apache/datafusion-site/pull/27#issuecomment-2524661388 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] Bump google-protobuf from 4.26.1 to 4.27.5 [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer commented on PR #27: URL: https://github.com/apache/datafusion-site/pull/27#issuecomment-2524661365 Closing because we no longer have a Gemfile.lock now that we have transitioned to Pelican builds.

Re: [PR] Bump google-protobuf from 4.26.1 to 4.27.5 [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer closed pull request #27: Bump google-protobuf from 4.26.1 to 4.27.5 URL: https://github.com/apache/datafusion-site/pull/27

Re: [PR] Bump rexml from 3.2.8 to 3.3.9 [datafusion-site]

2024-12-06 Thread via GitHub
dependabot[bot] commented on PR #31: URL: https://github.com/apache/datafusion-site/pull/31#issuecomment-2524661146 OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version,

Re: [PR] Bump rexml from 3.2.8 to 3.3.9 [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer closed pull request #31: Bump rexml from 3.2.8 to 3.3.9 URL: https://github.com/apache/datafusion-site/pull/31

Re: [PR] Bump rexml from 3.2.8 to 3.3.9 [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer commented on PR #31: URL: https://github.com/apache/datafusion-site/pull/31#issuecomment-2524661117 Closing because we no longer have a Gemfile.lock now that we have transitioned to Pelican builds.

Re: [PR] Testing stage site. Do not merge [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer closed pull request #41: Testing stage site. Do not merge URL: https://github.com/apache/datafusion-site/pull/41

Re: [I] Optimize SortPreservingMergeStream for single-column merge [datafusion]

2024-12-06 Thread via GitHub
comphead commented on issue #13642: URL: https://github.com/apache/datafusion/issues/13642#issuecomment-2524610225 Hi @Dandandan does the `SortPreservingMergeStream` work with row format? I checked quickly https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/sorts/mer

Re: [PR] Testing stage site. Do not merge [datafusion-site]

2024-12-06 Thread via GitHub
alamb commented on PR #41: URL: https://github.com/apache/datafusion-site/pull/41#issuecomment-2524584504 It seems to work: https://datafusion.staged.apache.org/ (though the URL link in the docs needs to be updated)

Re: [PR] Consolidate `MapAccess`, and `Subscript` into `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
alamb commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1874012439 ## tests/sqlparser_common.rs: ## @@ -10299,20 +10324,38 @@ fn parse_map_access_expr() { Box::new(ClickHouseDialect {}), ]); let expr =

Re: [PR] Add example for using a separate threadpool for CPU bound work [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13424: URL: https://github.com/apache/datafusion/pull/13424#issuecomment-2524499072 Update here is I hacked up the alternate approach (annotating all locations in DataFusion that use a different threadpool) on the plane. It didn't go great but I will make a PR tomo

[PR] Adjust site URL to be relative to /blog [datafusion-site]

2024-12-06 Thread via GitHub
timsaucer opened a new pull request, #44: URL: https://github.com/apache/datafusion-site/pull/44 This PR is designed to set the built site to be relative to the /blog directory since ASF infra puts it there for the urls.

Re: [PR] fix: Avoid to call import and export Arrow array for native execution [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove commented on PR #1055: URL: https://github.com/apache/datafusion-comet/pull/1055#issuecomment-2524370340 I understand what this PR is doing now. By using CometNativeVector when importing from native, and then reexporting CometNativeVector we are just passing memory addresses and

Re: [PR] chore: Refactor cast to use SparkCastOptions param [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove merged PR #1146: URL: https://github.com/apache/datafusion-comet/pull/1146

Re: [PR] WIP Upgrade to arrow-rs/parquet 54.0.0 [datafusion]

2024-12-06 Thread via GitHub
alamb commented on code in PR #13663: URL: https://github.com/apache/datafusion/pull/13663#discussion_r1874025611 ## datafusion/core/src/datasource/file_format/parquet.rs: ## @@ -934,7 +934,7 @@ fn spawn_column_parallel_row_group_writer( max_buffer_size: usize, pool: &

Re: [PR] Fix Duplicated filters within (filter(TableScan)) plan for unparser [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13422: URL: https://github.com/apache/datafusion/pull/13422#issuecomment-2524185524 > Thanks @jayzhan211 @alamb Thank you for all the bug fixes @Sevenannn -- it turns out that @wiedld just found you had fixed a bug we ran into in InfluxDB as well (see https:/

Re: [PR] Minor: Rephrase MSRV policy to be more explanatory [datafusion]

2024-12-06 Thread via GitHub
comphead merged PR #13668: URL: https://github.com/apache/datafusion/pull/13668

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-06 Thread via GitHub
alamb commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2524178402 Does anyone have any opinion about holding the DataFusion 44 release for the next major arrow release? - https://github.com/apache/arrow-rs/issues/6342 That would fix a b

Re: [PR] feat: Support faster multi-column grouping ( `GroupColumn`) for `Date/Time/Timestamp` types [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13457: URL: https://github.com/apache/datafusion/pull/13457#issuecomment-2524172875 Epic -- thanks again for finishing this work up @jonathanc-n

Re: [I] Release DataFusion `44.0.0` [datafusion]

2024-12-06 Thread via GitHub
alamb commented on issue #13334: URL: https://github.com/apache/datafusion/issues/13334#issuecomment-2524175288 I started gathering a list of items we think we should fix prior to the release in the description

Re: [PR] ScalarUDFImpl invoke improvements [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13507: URL: https://github.com/apache/datafusion/pull/13507#issuecomment-2524167542 > > * Closes [Perf: Allow User defined functions to potentially reuse their argument arrays (to avoid new allocations)  #13516](https://github.com/apache/datafusion/issues/13516) >

Re: [I] Optimizing `LogicalPlan` with placeholders fails [datafusion]

2024-12-06 Thread via GitHub
alamb closed issue #8819: Optimizing `LogicalPlan` with placeholders fails URL: https://github.com/apache/datafusion/issues/8819

Re: [I] Optimizing `LogicalPlan` with placeholders fails [datafusion]

2024-12-06 Thread via GitHub
alamb commented on issue #8819: URL: https://github.com/apache/datafusion/issues/8819#issuecomment-2524165976 Since https://github.com/apache/datafusion/pull/13632 is merged, closing this ticket. Thank you @davisp and @simonvandel -- getting better every day!

Re: [I] Update ClickBench benchmarks with DataFusion `43.0.0` [datafusion]

2024-12-06 Thread via GitHub
alamb commented on issue #13099: URL: https://github.com/apache/datafusion/issues/13099#issuecomment-2524162828 Thank you for the update @waruto210

Re: [PR] test: enable more Spark 4.0 tests [datafusion-comet]

2024-12-06 Thread via GitHub
kazuyukitanimura commented on PR #1145: URL: https://github.com/apache/datafusion-comet/pull/1145#issuecomment-2524159564 Merged, Thanks @andygrove

Re: [PR] test: enable more Spark 4.0 tests [datafusion-comet]

2024-12-06 Thread via GitHub
kazuyukitanimura merged PR #1145: URL: https://github.com/apache/datafusion-comet/pull/1145

Re: [PR] WIP: example solution for part of Epic 13525; invariant checking for implicit LP changes [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13651: URL: https://github.com/apache/datafusion/pull/13651#issuecomment-2524156980 I think it would also be great if we could consider moving the invariant check directly into `LogicalPlan` -- for example `LogicalPlan::check_invariants` This would make this

Re: [PR] WIP: example solution for part of Epic 13525; invariant checking for implicit LP changes [datafusion]

2024-12-06 Thread via GitHub
alamb commented on code in PR #13651: URL: https://github.com/apache/datafusion/pull/13651#discussion_r1873975821 ## datafusion/optimizer/src/optimizer.rs: ## @@ -451,6 +468,33 @@ impl Optimizer { } } +/// These are invariants to hold true for each logical plan. +/// Do

Re: [PR] add partitioning scheme for unresolved shuffle and shuffle reader exec [datafusion-ballista]

2024-12-06 Thread via GitHub
milenkovicm commented on code in PR #1144: URL: https://github.com/apache/datafusion-ballista/pull/1144#discussion_r1873973802 ## ballista/core/proto/ballista.proto: ## @@ -50,14 +50,15 @@ message ShuffleWriterExecNode { message UnresolvedShuffleExecNode { uint32 stage_id =

Re: [PR] WIP: example solution for part of Epic 13525; invariant checking for implicit LP changes [datafusion]

2024-12-06 Thread via GitHub
alamb commented on code in PR #13651: URL: https://github.com/apache/datafusion/pull/13651#discussion_r1873973604 ## datafusion/optimizer/src/optimizer.rs: ## @@ -451,6 +468,33 @@ impl Optimizer { } } +/// These are invariants to hold true for each logical plan. +/// Do

[PR] feat: default instance for executor configuration [datafusion-ballista]

2024-12-06 Thread via GitHub
milenkovicm opened a new pull request, #1147: URL: https://github.com/apache/datafusion-ballista/pull/1147 ... and clean up scheduler configuration. # Which issue does this PR close? Closes None. # Rationale for this change Two main reasons for this change

Re: [PR] feat!: change catalog provider and schema provider methods to be asynchronous [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13582: URL: https://github.com/apache/datafusion/pull/13582#issuecomment-2524113332 I plan to review this PR soon

Re: [PR] Minor: Rephrase MSRV policy to be more explanatory [datafusion]

2024-12-06 Thread via GitHub
alamb commented on code in PR #13668: URL: https://github.com/apache/datafusion/pull/13668#discussion_r1873948424 ## README.md: ## @@ -126,14 +126,18 @@ Optional features: ## Rust Version Compatibility Policy -DataFusion's Minimum Required Stable Rust Version (MSRV) policy

Re: [PR] Minor: Comment temporary function for documentation migration [datafusion]

2024-12-06 Thread via GitHub
alamb merged PR #13669: URL: https://github.com/apache/datafusion/pull/13669

Re: [PR] Add `#[recursive]` [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
blaginin commented on PR #1522: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1522#issuecomment-2524074922 resolved conflicts, should be good to merge now 🤗

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13675: URL: https://github.com/apache/datafusion/pull/13675#issuecomment-2524068931 Thanks @Weijun-H -- I am running some other benchmarks now too. I'll report back

Re: [PR] docs: Add some documentation explaining how shuffle works [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove merged PR #1148: URL: https://github.com/apache/datafusion-comet/pull/1148

Re: [I] Create memory table with target partitions [datafusion]

2024-12-06 Thread via GitHub
demetribu commented on issue #12905: URL: https://github.com/apache/datafusion/issues/12905#issuecomment-2524060879 I tried to implement this in a naive way by just adding the following code: ``` let target_partitions = self.state().config_options().execution.target_partitions;

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-06 Thread via GitHub
cisaacson commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2524027013 I like @pinarbayata #4 very much

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-06 Thread via GitHub
tbar4 commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2524020801 If we could get the color of 4 on option 3 I think that is the best

Re: [I] Upgrade to hashbrown 0.15.1: migrate from `hashbrown::raw::RawTable` to `hashbrown::hash_table::HashTable` [datafusion]

2024-12-06 Thread via GitHub
Dandandan commented on issue #13433: URL: https://github.com/apache/datafusion/issues/13433#issuecomment-2524019148 Thanks @crepererum for the steady progress on this

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-12-06 Thread via GitHub
Dandandan commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2524016127 I found no real changes: ``` Comparing old_version and new Benchmark tpch_sf1.json ┏━━┳━┳━

Re: [I] Update ballista logo [datafusion-ballista]

2024-12-06 Thread via GitHub
timsaucer commented on issue #1133: URL: https://github.com/apache/datafusion-ballista/issues/1133#issuecomment-2524008586 I like 3 the most, with 4 next. I do like the color on the logo of #4 more though.

Re: [PR] Redshift: Fix parsing for quoted numbered columns [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
7phs commented on code in PR #1576: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1576#discussion_r1873845800 ## tests/sqlparser_redshift.rs: ## @@ -353,3 +380,23 @@ fn test_parse_json_path_from() { _ => panic!(), } } + +#[test] +fn test_parse_selec

[I] LogicalPlan::get_parameter_types fails to return all placeholders [datafusion]

2024-12-06 Thread via GitHub
davisp opened a new issue, #13678: URL: https://github.com/apache/datafusion/issues/13678 ### Describe the bug Calling `LogicalPlan::get_parameter_types()` does not return the place holder from `SELECT $1;`. ### To Reproduce The assertion triggers because no para

[PR] fix: repartitioned reads of CSV with custom line terminator [datafusion]

2024-12-06 Thread via GitHub
korowa opened a new pull request, #13677: URL: https://github.com/apache/datafusion/pull/13677 ## Which issue does this PR close? Closes #12328. ## Rationale for this change At this moment DF is unable to properly identify ranges after file repartitioning

[PR] feat: scalar regex match physical expr [datafusion]

2024-12-06 Thread via GitHub
zhuliquan opened a new pull request, #12270: URL: https://github.com/apache/datafusion/pull/12270 ## Which issue does this PR close? Closes #11146. ## Rationale for this change This PR is the successor of PR #11455 `BinaryExpr` will compile literal regex pattern when i

Re: [I] Should ScanExec use Spark-compatible cast instead of DataFusion cast? [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove commented on issue #803: URL: https://github.com/apache/datafusion-comet/issues/803#issuecomment-2523868208 For complex type support, we likely need to fix this. We could maybe use the same SchemaAdapter approach that we are using in the Parquet exec proof-of-concept work. --

Re: [PR] chore: Stop scanning batches during query planning in ScanExec [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove closed pull request #1134: chore: Stop scanning batches during query planning in ScanExec URL: https://github.com/apache/datafusion-comet/pull/1134

Re: [PR] chore: Stop scanning batches during query planning in ScanExec [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove commented on PR #1134: URL: https://github.com/apache/datafusion-comet/pull/1134#issuecomment-2523830883 Thanks for the feedback so far @kazuyukitanimura and @viirya. I am closing this for now, but I may be back with another attempt to remove the initial scan in the ScanEx

Re: [PR] feat: scalar regex match physical expr [datafusion]

2024-12-06 Thread via GitHub
zhuliquan closed pull request #12270: feat: scalar regex match physical expr URL: https://github.com/apache/datafusion/pull/12270

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-06 Thread via GitHub
comphead commented on code in PR #13675: URL: https://github.com/apache/datafusion/pull/13675#discussion_r1873717759 ## datafusion/physical-expr/src/equivalence/class.rs: ## @@ -546,20 +546,15 @@ impl EquivalenceGroup { .collect::>(); (new_class.len

Re: [PR] docs: Add some documentation explaining how shuffle works [datafusion-comet]

2024-12-06 Thread via GitHub
parthchandra commented on PR #1148: URL: https://github.com/apache/datafusion-comet/pull/1148#issuecomment-2523748763 Thank you for shedding light on this @andygrove !

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 3) [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13658: URL: https://github.com/apache/datafusion/pull/13658#issuecomment-2523740092 Thank you @crepererum and @findepi

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 3) [datafusion]

2024-12-06 Thread via GitHub
alamb merged PR #13658: URL: https://github.com/apache/datafusion/pull/13658

Re: [PR] Redshift: Fix parsing for quoted numbered columns [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
iffyio commented on code in PR #1576: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1576#discussion_r1873688579 ## tests/sqlparser_redshift.rs: ## @@ -353,3 +380,23 @@ fn test_parse_json_path_from() { _ => panic!(), } } + +#[test] +fn test_parse_sel

Re: [PR] Introduce `CompoundExpr` to handle the complex field access chain [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
iffyio commented on code in PR #1551: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1551#discussion_r1871791291 ## src/dialect/snowflake.rs: ## @@ -234,6 +234,10 @@ impl Dialect for SnowflakeDialect { RESERVED_FOR_IDENTIFIER.contains(&kw) }

[PR] docs: Add some documentation explaining how shuffle works [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove opened a new pull request, #1148: URL: https://github.com/apache/datafusion-comet/pull/1148 ## Which issue does this PR close? Closes https://github.com/apache/datafusion-comet/issues/977 ## Rationale for this change I didn't fully understand how

Re: [I] Add related source code locations to errors [datafusion]

2024-12-06 Thread via GitHub
comphead commented on issue #13662: URL: https://github.com/apache/datafusion/issues/13662#issuecomment-2523675208 > ``` > (line 1, column 8) error: 'users.name' in projection does not appear in GROUP BY clause > (line 1, column 33) note: GROUP BY clause is here > (line 1, colum

Re: [PR] Minor: Rephrase MSRV policy to be more explanatory [datafusion]

2024-12-06 Thread via GitHub
comphead commented on PR #13668: URL: https://github.com/apache/datafusion/pull/13668#issuecomment-2523678849 @alamb @findepi can I get the review on this?

Re: [PR] Redshift: Fix parsing for quoted numbered columns [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
7phs commented on code in PR #1576: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1576#discussion_r1873653672 ## src/dialect/mod.rs: ## @@ -128,14 +128,38 @@ pub trait Dialect: Debug + Any { ch == '"' || ch == '`' } -/// Return the character us

Re: [PR] Redshift: Fix parsing for quoted numbered columns [datafusion-sqlparser-rs]

2024-12-06 Thread via GitHub
bombsimon commented on code in PR #1576: URL: https://github.com/apache/datafusion-sqlparser-rs/pull/1576#discussion_r1873553041 ## src/dialect/mod.rs: ## @@ -128,14 +128,38 @@ pub trait Dialect: Debug + Any { ch == '"' || ch == '`' } -/// Return the charact

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-12-06 Thread via GitHub
Dandandan commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2523547379 It's as if it pingpongs between two machine "versions". I'm running it myself as well locally

Re: [PR] Performance: enable array allocation reuse (`ScalarFunctionArgs` gets owned `ColumnReference`) [datafusion]

2024-12-06 Thread via GitHub
alamb commented on code in PR #13637: URL: https://github.com/apache/datafusion/pull/13637#discussion_r1873541980 ## datafusion-examples/examples/advanced_udf.rs: ## @@ -215,9 +265,29 @@ async fn main() -> Result<()> { // print the results df.show().await?; -// Y

Re: [PR] feat: [comet-parquet-exec] Schema adapter fixes [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove merged PR #1139: URL: https://github.com/apache/datafusion-comet/pull/1139

Re: [PR] Performance: enable array allocation reuse (`ScalarFunctionArgs` gets owned `ColumnReference`) [datafusion]

2024-12-06 Thread via GitHub
Omega359 commented on code in PR #13637: URL: https://github.com/apache/datafusion/pull/13637#discussion_r1873537537 ## datafusion-examples/examples/advanced_udf.rs: ## @@ -215,9 +265,29 @@ async fn main() -> Result<()> { // print the results df.show().await?; -/

Re: [I] Ballista Python Issue(s) [datafusion-ballista]

2024-12-06 Thread via GitHub
milenkovicm commented on issue #1142: URL: https://github.com/apache/datafusion-ballista/issues/1142#issuecomment-2523530514 > One more option to throw in. Could we reduce the scope for (py)Ballista for now to just support SQL and not the DataFrame API? > > We would just need the abi

Re: [PR] Performance: enable array allocation reuse (`ScalarFunctionArgs` gets owned `ColumnReference`) [datafusion]

2024-12-06 Thread via GitHub
Omega359 commented on code in PR #13637: URL: https://github.com/apache/datafusion/pull/13637#discussion_r1873535480 ## datafusion-examples/examples/advanced_udf.rs: ## @@ -215,9 +265,29 @@ async fn main() -> Result<()> { // print the results df.show().await?; -/

Re: [PR] chore: Refactor cast to use SparkCastOptions param [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove commented on code in PR #1146: URL: https://github.com/apache/datafusion-comet/pull/1146#discussion_r1873533892 ## native/spark-expr/src/cast.rs: ## @@ -547,30 +540,41 @@ impl Cast { pub fn new( child: Arc, data_type: DataType, -eval_mode

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-12-06 Thread via GitHub
alamb commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2523487372 Here is my next run (this time it seems to be faster) ``` Benchmark tpch_sf1.json ┏━━┳━━━┳━━━

Re: [I] [DISCUSSION] Making it easier to use DataFusion (lessons from GlareDB) [datafusion]

2024-12-06 Thread via GitHub
alamb commented on issue #13525: URL: https://github.com/apache/datafusion/issues/13525#issuecomment-2523483021 FWIW check out this page from @XiangpengHao for a cool example of using DataFusion in WASM: https://parquet-viewer.xiangpeng.systems/

Re: [PR] refactor: change some `hashbrown` `RawTable` uses to `HashTable` (round 2) [datafusion]

2024-12-06 Thread via GitHub
Dandandan commented on PR #13524: URL: https://github.com/apache/datafusion/pull/13524#issuecomment-2523478262 > 1.40x Hm, that's interesting! The results are not super reliable for me but mostly the noise is in the 5-7% range (running locally).

[PR] chore: Refactor cast to use SparkCastOptions param [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove opened a new pull request, #1146: URL: https://github.com/apache/datafusion-comet/pull/1146 ## Which issue does this PR close? N/A ## Rationale for this change This is in preparation for adding additional cast options to support the `comet-parqu

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-06 Thread via GitHub
Weijun-H commented on PR #13675: URL: https://github.com/apache/datafusion/pull/13675#issuecomment-2523473201 ``` Comparing main and 8027-refactor-hashmap Benchmark clickbench_partitioned.json ┏━━┳━━━┳━━━

[I] Report multiple errors, not just the first one [datafusion]

2024-12-06 Thread via GitHub
eliaperantoni opened a new issue, #13676: URL: https://github.com/apache/datafusion/issues/13676 ### Is your feature request related to a problem or challenge? In the following query there are 4 distinct errors: ```sql WITH users AS ( SELECT 1 AS id, 'John' AS name

Re: [PR] Feat/parameterized sql queries [datafusion-python]

2024-12-06 Thread via GitHub
MrPowers commented on PR #964: URL: https://github.com/apache/datafusion-python/pull/964#issuecomment-2523408178 This user interface looks nice 😎

Re: [I] Add related source code locations to errors [datafusion]

2024-12-06 Thread via GitHub
eliaperantoni commented on issue #13662: URL: https://github.com/apache/datafusion/issues/13662#issuecomment-2523392024 > Datafusion supports backtraces to find where the problem happens in the first place [datafusion.apache.org/user-guide/crate-configuration.html#enable-backtraces](https:/

Re: [PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-06 Thread via GitHub
Weijun-H commented on PR #13675: URL: https://github.com/apache/datafusion/pull/13675#issuecomment-2523405600 Comparing main and 8027-refactor-hashmap Benchmark clickbench_partitioned.json ┏━━┳━━━┳━━

Re: [I] Investigate native query planning overhead [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove commented on issue #1098: URL: https://github.com/apache/datafusion-comet/issues/1098#issuecomment-2523391629 I will close this now that we understand why planning is so expensive (because it includes partial execution of ScanExec)

Re: [I] Investigate native query planning overhead [datafusion-comet]

2024-12-06 Thread via GitHub
andygrove closed issue #1098: Investigate native query planning overhead URL: https://github.com/apache/datafusion-comet/issues/1098

[PR] refactor: replace `Vec` with `IndexMap` for expression mappings in `ProjectionMapping` and `EquivalenceGroup` [datafusion]

2024-12-06 Thread via GitHub
Weijun-H opened a new pull request, #13675: URL: https://github.com/apache/datafusion/pull/13675 ## Which issue does this PR close? Closes #8027 ## Rationale for this change ## What changes are included in this PR? ## Are these changes

Re: [PR] Enable CI to publish main [datafusion-site]

2024-12-06 Thread via GitHub
andygrove merged PR #42: URL: https://github.com/apache/datafusion-site/pull/42

[PR] Prep 43.1.0 [datafusion-python]

2024-12-06 Thread via GitHub
andygrove opened a new pull request, #965: URL: https://github.com/apache/datafusion-python/pull/965 # Which issue does this PR close? Closes #. # Rationale for this change # What changes are included in this PR? # Are there any user-facing changes
