DaveDeCaprio commented on issue #23169: [SPARK-26103][SQL] Limit the length of debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#issuecomment-456209477

@hvanhovell We are generating data sets for machine learning. The end result is a data frame containing ~1-2k columns, which we build by joining together many data frames with fewer columns (1-100 each). Using caching and partitioning, we have gotten the actual execution of these plans to run very quickly, but we have hit issues like this one. I have another PR (SPARK-26617) related to blocking in the CacheManager, which is caused by the optimizer taking a long time on these plans. I would definitely like to look at whether there is another primitive that would solve our needs; that's something I'd probably have time to dig into in a couple of months.
