DaveDeCaprio commented on issue #23169: [SPARK-26103][SQL] Limit the length of 
debug strings for query plans
URL: https://github.com/apache/spark/pull/23169#issuecomment-456209477
 
 
   @hvanhovell We are generating data sets for machine learning.  The end 
result is a data frame containing ~1-2k columns.  We get this by joining 
together many data frames with fewer columns (1-100 each).  We have gotten 
the actual execution of these plans to run very quickly using caching and 
partitioning, but we have hit issues like this one.  I have another PR 
(SPARK-26617) related to blocking in the CacheManager, which is a result of 
the optimizer taking a long time on these plans.
   
   I would definitely like to look at whether there is another primitive that 
would solve our needs.  That's something I'd probably have time to dig into in 
a couple of months.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

