[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1487331617 Please see if this fix can be pulled. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1483606678 Right. This is a simple one-file fix with the addition of a test case, versus the other one, which may involve a number of files.
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1478932824

Perhaps the following would be a better solution: instead of looking for a star, any UnresolvedFunction should get an UnresolvedAlias. Any comments?

    private[this] def alias(expr: Expression): NamedExpression = expr match {
      case expr: NamedExpression => expr
      case a: AggregateExpression if a.aggregateFunction.isInstanceOf[TypedAggregateExpression] =>
        UnresolvedAlias(a, Some(Column.generateAlias))
      case expr: Expression =>
        if (expr.isInstanceOf[UnresolvedFunction]) {
          UnresolvedAlias(expr, None)
        } else {
          Alias(expr, toPrettySQL(expr))()
        }
    }
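The dispatch proposed above can be illustrated with a self-contained toy model. This is only a sketch: `Expr`, `Named`, `Attr`, `Lit`, `Func`, `EagerAlias`, `DeferredAlias` and `pretty` are hypothetical stand-ins invented here for Catalyst's `Expression`, `NamedExpression`, `UnresolvedFunction`, `Alias`, `UnresolvedAlias` and `toPrettySQL`; they are not Spark classes, and the `TypedAggregateExpression` branch is left out for brevity.

```scala
object AliasSketch {
  // Hypothetical, simplified expression tree (not the Catalyst classes).
  sealed trait Expr
  sealed trait Named extends Expr

  final case class Attr(name: String) extends Named                    // already has a name
  final case class Lit(value: Int) extends Expr                        // plain, resolved expression
  final case class Func(fn: String, args: Seq[Expr]) extends Expr      // stand-in for UnresolvedFunction
  final case class EagerAlias(child: Expr, name: String) extends Named // stand-in for Alias
  final case class DeferredAlias(child: Expr) extends Named            // stand-in for UnresolvedAlias

  // Stand-in for toPrettySQL: renders an expression as a display name.
  def pretty(e: Expr): String = e match {
    case Attr(n)          => n
    case Lit(v)           => v.toString
    case Func(fn, args)   => args.map(pretty).mkString(s"$fn(", ", ", ")")
    case EagerAlias(c, n) => s"${pretty(c)} AS $n"
    case DeferredAlias(c) => pretty(c)
  }

  // Same shape as the proposal: named expressions pass through, unresolved
  // functions get a deferred alias (the name is chosen later, after resolution),
  // and everything else is named immediately from its pretty-printed form.
  def alias(e: Expr): Named = e match {
    case n: Named => n
    case f: Func  => DeferredAlias(f)
    case other    => EagerAlias(other, pretty(other))
  }
}
```

In this model, deferring the alias for `Func` means no name is derived from the still-unresolved tree, which is the intent of the proposal; whether that holds for every case in Spark itself is exactly what the discussion in this thread is about.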
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1470374550 Can anyone tell me why I am getting this single quote in the count expression? Attaching the picture. This can potentially cause problems down the line, where tree nodes are compared in transformDownWithPruning and the two nodes are not the same because of this single quote. https://user-images.githubusercontent.com/13139216/225378952-74ba895b-2c36-407a-ab1c-7ad46b469ae7.png
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1468635397

> The auto-generated alias name is fragile and we are trying to improve it at #40126
>
> Can you give some examples of how the new update changes the alias name? If it's not reasonable, we should keep the previous code.

I am attaching a file showing some failures when all the aggregate expressions were made UnresolvedAlias. My latest check-in, which makes only the aggregate expressions containing "*" an UnresolvedAlias, works. The build went through. So it is essentially the unresolvedstar() produced by toPrettySQL for an aggregate expression with "*" that the Analyzer is not able to resolve.

[sqlOtherTests.txt](https://github.com/apache/spark/files/10972570/sqlOtherTests.txt)
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1464842147

> I think the test is easy to fix. It wants to test the aggregate function result, but not the generated alias, so we just change the testing query to add alias explicitly.
>
> ```
> val avgDF = intervalData.select(
>   avg($"year-month").as("a1"),
>   avg($"year").as("a2"),
>   ...
> ```

A couple of questions:

1. Is it required and documented that we should add an alias to aggregate functions? If that is not a requirement, then fixing the test case is potentially covering up an issue.
2. The thread leaks reported in the sql-other tests are not just from DataFrameAggregateSuite, but from multiple other suites:

    023-03-03T04:05:16.9822203Z 04:05:16.978 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 393.0 failed 1 times; aborting job
    2023-03-03T04:05:16.9866693Z [info] - SPARK-30668: use legacy timestamp parser in to_timestamp (154 milliseconds)
    2023-03-03T04:05:17.0464670Z [info] - SPARK-30752: convert time zones on a daylight saving day (62 milliseconds)
    2023-03-03T04:05:17.1930942Z [info] - SPARK-30766: date_trunc of old timestamps to hours and days (142 milliseconds)
    2023-03-03T04:05:17.3358608Z [info] - SPARK-30793: truncate timestamps before the epoch to seconds and minutes (146 milliseconds)
    2023-03-03T04:05:17.3824844Z 04:05:17.382 WARN org.apache.spark.sql.DateFunctionsSuite:
    2023-03-03T04:05:17.3845065Z
    2023-03-03T04:05:17.3846873Z = POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.DateFunctionsSuite,
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1460579037 [7_Run Build modules sql - other tests.txt](https://github.com/apache/spark/files/10923393/7_Run.Build.modules.sql.-.other.tests.txt)
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1453976515 Any comments? Apparently making every expression an UnresolvedAlias is not working.
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1453013580 @cloud-fan Always using UnresolvedAlias seems to be causing the sql-other module to fail. I will be reverting to the original fix of creating an UnresolvedAlias only for "*" or distinct.
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1451285915

Not sure why the suggested changes made the build fail in the catalyst, hive-thriftserver module and the sql-other test module.

    2023-03-01T22:23:36.6700903Z Error instrumenting class:org.apache.spark.sql.execution.streaming.state.SchemaHelper$SchemaV2Reader
    2023-03-01T22:23:36.8662344Z Error instrumenting class:org.apache.spark.sql.v2.avro.AvroScan
    2023-03-01T22:23:36.8712474Z Error instrumenting class:org.apache.spark.api.python.DoubleArrayWritable
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1449133178 Is there anything else that I need to do for the fix to be accepted?
[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect
ritikam2 commented on PR #40116: URL: https://github.com/apache/spark/pull/40116#issuecomment-1446869584 Not sure how my check-ins are causing a javadoc generation error.