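[GitHub] [spark] huaxingao commented on a change in pull request #33639: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet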

GitBox Thu, 05 Aug 2021 23:30:30 -0700


huaxingao commented on a change in pull request #33639:
URL: https://github.com/apache/spark/pull/33639#discussion_r683980939




##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala
##########
@@ -585,8 +585,8 @@ private[sql] object ParquetSchemaConverter {
     Types.buildMessage().named(ParquetSchemaConverter.SPARK_PARQUET_SCHEMA_NAME)
 
   def checkFieldName(name: String): Unit = {
-    // ,;{}()\n\t= and space are special characters in Parquet schema
-    if (name.matches(".*[ ,;{}()\n\t=].*")) {
+    // ,;{}\n\t= and space are special characters in Parquet schema
+    if (name.matches(".*[ ,;{}\n\t=].*")) {

Review comment:
       Because the `ParquetScan` schema can now contain column names such as `Max(col)` and `Min(col)`, I need to remove the restriction on `()`.
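
       A minimal sketch of what this change means for names like `Max(col)`, using only the two patterns from the diff above. The object name `FieldNameCheckSketch` and the helpers `rejectedBefore` and `rejectedAfter` are hypothetical and exist only for illustration; the real `checkFieldName` returns `Unit` rather than a Boolean.

```scala
object FieldNameCheckSketch {
  // Pattern before the change: parentheses count as special characters.
  private val oldPattern = ".*[ ,;{}()\n\t=].*"
  // Pattern after the change: parentheses are no longer rejected.
  private val newPattern = ".*[ ,;{}\n\t=].*"

  // Hypothetical helpers: true means the name would be treated as invalid.
  def rejectedBefore(name: String): Boolean = name.matches(oldPattern)
  def rejectedAfter(name: String): Boolean = name.matches(newPattern)

  def main(args: Array[String]): Unit = {
    // Compare both patterns on pushed-down aggregate names and on ordinary names.
    Seq("Max(col)", "Min(col)", "a b", "plain").foreach { n =>
      println(s"$n: before=${rejectedBefore(n)}, after=${rejectedAfter(n)}")
    }
  }
}
```

       Under these assumptions, `Max(col)` and `Min(col)` are rejected only by the old pattern, while a name containing a space is still rejected by both.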

##########
File path: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveParquetSourceSuite.scala
##########
@@ -206,7 +206,9 @@ class HiveParquetSourceSuite extends ParquetPartitioningTest {
     }
   }
 
-  test("Aggregation attribute names can't contain special chars \" ,;{}()\\n\\t=\"") {
+  // After pushing down aggregate to parquet, we can have something like MAX(C) in column name
+  // ignore this test for now

Review comment:
       I realized that my comment was confusing, so I revised it. I was trying to say that we no longer have the restriction on `()` because of the changes for aggregate push down, so this test is ignored for now. Sorry for the confusion.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [spark] huaxingao commented on a change in pull request #33639: [SPARK-34952][SQL] Aggregate (Min/Max/Count) push down for Parquet
