Hi,

you are right, that is an interesting question.

Looks like GROUP BY is doing something funny / magic here (spark-shell 3.3.1 and 3.5.0-SNAPSHOT):

With an alias, it behaves as you have pointed out:

spark.range(3).createTempView("ids_without_dots")
spark.sql("SELECT * FROM ids_without_dots").show()

// works
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY an.id").show()
// fails
spark.sql("SELECT id AS `an.id` FROM ids_without_dots GROUP BY `an.id`").show()


Without an alias, it behaves as expected, which is the opposite of above (a column with a dot exists, no alias used in SELECT):

spark.range(3).select($"id".as("an.id")).createTempView("ids_with_dots")
spark.sql("SELECT `an.id` FROM ids_with_dots").show()

// works
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY `an.id`").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_dots GROUP BY an.id").show()


With a struct column, it also behaves as expected:

spark.range(3).select(struct($"id").as("an")).createTempView("ids_with_struct")
spark.sql("SELECT an.id FROM ids_with_struct").show()

// works
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY an.id").show()
// fails
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY an.id").show()
spark.sql("SELECT an.id FROM ids_with_struct GROUP BY `an.id`").show()
spark.sql("SELECT `an.id` FROM ids_with_struct GROUP BY `an.id`").show()


This does not feel very consistent.

Enrico



Am 28.01.23 um 00:34 schrieb Kohki Nishio:
this SQL works

    select 1 as *`data.group`* from tbl group by *data.group*


Since there's no such field as *data,* I thought the SQL has to look like this

    select 1 as *`data.group`* from tbl group by `*data.group`*


 But that gives and error (cannot resolve '`data.group`') ... I'm no expert in SQL, but feel like it's a strange behavior... does anybody have a good explanation for it ?

Thanks

--
Kohki Nishio

Reply via email to