[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-28 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1487331617

   Please see if this fix can be pulled.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-24 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1483606678

   Right. This is a simple one-file fix with an added test case, whereas the other
one may involve a number of files.





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-21 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1478932824

   Perhaps the following would be a better solution: instead of looking for a star,
wrap any UnresolvedFunction in an UnresolvedAlias. Any comments?
   
 `private[this] def alias(expr: Expression): NamedExpression = expr match {
    case expr: NamedExpression => expr
    case a: AggregateExpression if a.aggregateFunction.isInstanceOf[TypedAggregateExpression] =>
      UnresolvedAlias(a, Some(Column.generateAlias))
    case expr: Expression =>
      if (expr.isInstanceOf[UnresolvedFunction]) {
        UnresolvedAlias(expr, None)
      } else {
        Alias(expr, toPrettySQL(expr))()
      }
  }`
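
To make the proposed dispatch easier to discuss, here is a minimal, self-contained sketch of the same idea. It is a toy model, not Spark code: the classes in `AliasSketch` are hypothetical stand-ins that only borrow the Catalyst names, the `sql` method stands in for `toPrettySQL`, and the `TypedAggregateExpression` case is left out for brevity.

```scala
// Toy stand-ins for Catalyst expression classes; NOT Spark's real API.
object AliasSketch {
  sealed trait Expression { def sql: String }
  sealed trait NamedExpression extends Expression { def name: String }

  // A column reference that already carries a name.
  final case class Attribute(name: String) extends NamedExpression {
    def sql: String = name
  }

  // A function call that has not been looked up yet.
  final case class UnresolvedFunction(fn: String, args: Seq[Expression]) extends Expression {
    def sql: String = s"$fn(${args.map(_.sql).mkString(", ")})"
  }

  // An ordinary, already-resolved expression such as a + b.
  final case class Add(left: Expression, right: Expression) extends Expression {
    def sql: String = s"(${left.sql} + ${right.sql})"
  }

  // Eager alias: the output name is fixed when the alias is created.
  final case class Alias(child: Expression, name: String) extends NamedExpression {
    def sql: String = s"${child.sql} AS $name"
  }

  // Deferred alias: the output name is chosen later, once the child is resolved.
  final case class UnresolvedAlias(child: Expression) extends NamedExpression {
    def name: String =
      throw new UnsupportedOperationException("name is not known until analysis")
    def sql: String = child.sql
  }

  // Mirrors the proposed rule: named expressions pass through, unresolved
  // functions get a deferred alias, everything else is named eagerly.
  def alias(expr: Expression): NamedExpression = expr match {
    case named: NamedExpression => named
    case fn: UnresolvedFunction => UnresolvedAlias(fn)
    case other                  => Alias(other, other.sql)
  }
}
```

In this toy model, `AliasSketch.alias` returns an `UnresolvedAlias` for any `UnresolvedFunction`, while an `Add` of two attributes `a` and `b` is eagerly named `(a + b)`.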





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-15 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1470374550

   Can anyone tell me why I am getting this single quote in the count expression?
I am attaching a picture. This can potentially cause problems down the line, where
tree nodes are compared in transformDownWithPruning and the two nodes are not the
same because of this single quote.
   https://user-images.githubusercontent.com/13139216/225378952-74ba895b-2c36-407a-ab1c-7ad46b469ae7.png
   





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-14 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1468635397

   > The auto-generated alias name is fragile and we are trying to improve it 
at #40126
   > 
   > Can you give some examples of how the new update changes the alias name? 
If it's not reasonable, we should keep the previous code.
   
   I am attaching a file showing some of the failures that occurred when all the
aggregate expressions were made UnresolvedAlias. My latest check-in, where I only
make those aggregate expressions that have "*" an UnresolvedAlias, works, and the
build went through. So it is essentially the unresolvedstar() that toPrettySQL
produces for the aggregate expression with "*" that the Analyzer is not able to
resolve.
   
[sqlOtherTests.txt](https://github.com/apache/spark/files/10972570/sqlOtherTests.txt)
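
The difference between naming an aggregate before and after its "*" is expanded can be illustrated with a small, self-contained sketch. It is only a toy model under stated assumptions: `StarNamingSketch`, `Col`, `Star`, `Count`, `expandStar`, `eagerName` and `deferredName` are hypothetical names that do not exist in Spark, and the names the toy produces are not claimed to match what Spark generates.

```scala
// Toy model of eager vs. deferred naming; NOT Spark's real classes.
object StarNamingSketch {
  sealed trait Expr { def sql: String }

  final case class Col(name: String) extends Expr {
    def sql: String = name
  }

  // Placeholder for "*": it has no meaningful SQL text until it is expanded.
  case object Star extends Expr {
    def sql: String = "unresolvedstar()"
  }

  final case class Count(args: Seq[Expr]) extends Expr {
    def sql: String = s"count(${args.map(_.sql).mkString(", ")})"
  }

  // Replace every Star argument with the concrete input columns.
  def expandStar(e: Expr, columns: Seq[String]): Expr = e match {
    case Count(args) =>
      val expanded: Seq[Expr] = args.flatMap {
        case Star  => columns.map(name => Col(name))
        case other => Seq(expandStar(other, columns))
      }
      Count(expanded)
    case other => other
  }

  // Eager naming: the name is taken while the star is still a placeholder.
  def eagerName(e: Expr): String = e.sql

  // Deferred naming: expand the star first, then derive the name.
  def deferredName(e: Expr, columns: Seq[String]): String =
    expandStar(e, columns).sql
}
```

In this toy, `eagerName(Count(Seq(Star)))` bakes the placeholder text into the name, whereas `deferredName(Count(Seq(Star)), Seq("a", "b"))` only sees concrete columns. That is the motivation for deferring the alias for "*" expressions in the sketch.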
   





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-10 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1464842147

   > I think the test is easy to fix. It wants to test the aggregate function 
result, but not the generated alias, so we just change the testing query to add 
alias explicitly.
   > 
   > ```
   > val avgDF = intervalData.select(
   >   avg($"year-month").as("a1"),
   >   avg($"year").as("a2"),
   >   ...
   > ```
   
   A couple of questions:
   
   1. Is it required and documented that we should add an alias to aggregate
functions? If that is not a requirement, then fixing the test case is
potentially covering up an issue.
   2. The thread leaks reported in the sql-other tests are not just from
DataFrameAggregateSuite, but from multiple other suites.
   
   2023-03-03T04:05:16.9822203Z 04:05:16.978 ERROR org.apache.spark.scheduler.TaskSetManager: Task 0 in stage 393.0 failed 1 times; aborting job
   2023-03-03T04:05:16.9866693Z [info] - SPARK-30668: use legacy timestamp parser in to_timestamp (154 milliseconds)
   2023-03-03T04:05:17.0464670Z [info] - SPARK-30752: convert time zones on a daylight saving day (62 milliseconds)
   2023-03-03T04:05:17.1930942Z [info] - SPARK-30766: date_trunc of old timestamps to hours and days (142 milliseconds)
   2023-03-03T04:05:17.3358608Z [info] - SPARK-30793: truncate timestamps before the epoch to seconds and minutes (146 milliseconds)
   2023-03-03T04:05:17.3824844Z 04:05:17.382 WARN org.apache.spark.sql.DateFunctionsSuite: 
   2023-03-03T04:05:17.3845065Z 
   2023-03-03T04:05:17.3846873Z = POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.DateFunctionsSuite,





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-08 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1460579037

   [7_Run  Build modules sql - other 
tests.txt](https://github.com/apache/spark/files/10923393/7_Run.Build.modules.sql.-.other.tests.txt)
   





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-03 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1453976515

   Any comments? Apparently, making every expression an UnresolvedAlias does not work.





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-02 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1453013580

   @cloud-fan Always using UnresolvedAlias seems to cause the sql-other
module to fail. I will revert to the original fix of creating an
UnresolvedAlias only for "*" or distinct.





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-03-01 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1451285915

   I am not sure why the suggested changes made the build fail in the
   catalyst, hive-thriftserver module and the sql-other test module.
   2023-03-01T22:23:36.6700903Z Error instrumenting class:org.apache.spark.sql.execution.streaming.state.SchemaHelper$SchemaV2Reader
   2023-03-01T22:23:36.8662344Z Error instrumenting class:org.apache.spark.sql.v2.avro.AvroScan
   2023-03-01T22:23:36.8712474Z Error instrumenting class:org.apache.spark.api.python.DoubleArrayWritable





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-02-28 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1449133178

   Is there anything else that I need to do for the fix to be accepted?





[GitHub] [spark] ritikam2 commented on pull request #40116: [SPARK-41391][SQL] The output column name of groupBy.agg(count_distinct) is incorrect

2023-02-27 Thread via GitHub


ritikam2 commented on PR #40116:
URL: https://github.com/apache/spark/pull/40116#issuecomment-1446869584

   I am not sure how my check-ins are causing a Javadoc generation error.

