wangyum commented on PR #40462: URL: https://github.com/apache/spark/pull/40462#issuecomment-1486144770
If such queries cannot be optimized, their performance will be very poor. We use a partition to fetch data from MySQL, then increase the parallelism for downstream computation after the data has been fetched:

```sql
CREATE VIEW full_query_log AS
SELECT h.* FROM query_log_hdfs h
UNION ALL
SELECT /*+ REBALANCE */ q.*, DATE(start) FROM query_log_mysql q;

SELECT * FROM full_query_log LIMIT 5;
```
