Re: [PR] [SPARK-45871][CONNECT] Change `.toBuffer.toSeq` to `.toSeq` [spark]

via GitHub Fri, 10 Nov 2023 10:19:10 -0800


juliuszsompolski commented on PR #43745:
URL: https://github.com/apache/spark/pull/43745#issuecomment-1806201111


   > hmm... perhaps I misunderstood your point. But if they can achieve the 
same goal, iter.foreach (_=>()) should have better performance than 
iter.toBuffer. It only iterates through all the data and does not create any 
collection ..
   
   Agreed that `iter.foreach(_ => ())` should have better performance, but that 
also immediately discards all responses, so you then couldn't do
   ```
       val response = responseSeq
         .find(_.hasSqlCommandResult)
   ```
   But one could instead find the response first and consume the rest of the 
iterator afterwards:
   ```
      // find the response we care about
      val response = iter.find(_.hasSqlCommandResult)
      // consume the rest of the iterator
      iter.foreach(_ => ())
   ```
   
   Maybe that would have been a better change: it does not rely on internal 
Scala details of whether `.toSeq` or `.toBuffer` produces a lazy structure, 
and it does not materialize the responses needlessly.
   
   You're right, maybe this would indeed be a better change (and then the test 
would not be needed, as `.foreach` should always reliably consume the iterator).
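   
   For illustration, the find-then-drain pattern discussed above can be 
sketched as a small self-contained program. `Response`, `findThenDrain`, and 
the `pulled` counter are hypothetical stand-ins, not the Spark Connect types. 
The sketch also assumes that `Iterator.find` leaves the iterator positioned 
just after the first match, which is how the standard library implementation 
behaves, although the Scala documentation generally advises caution when 
reusing an iterator after calling methods other than `hasNext` and `next`.
   ```scala
   // Hypothetical stand-in for the response type; not the Spark Connect class.
   final case class Response(id: Int, hasSqlCommandResult: Boolean)

   object DrainIteratorSketch {
     /** Returns the first matching response and how many responses were pulled. */
     def findThenDrain(responses: Seq[Response]): (Option[Response], Int) = {
       var pulled = 0
       // Wrap the source lazily so every element actually consumed is counted.
       val iter: Iterator[Response] = responses.iterator.map { r => pulled += 1; r }

       // find the response we care about; this stops at the first match
       val response = iter.find(_.hasSqlCommandResult)
       // consume the rest of the iterator without building a collection
       iter.foreach(_ => ())

       (response, pulled)
     }

     def main(args: Array[String]): Unit = {
       val responses = Seq(
         Response(1, hasSqlCommandResult = false),
         Response(2, hasSqlCommandResult = true),
         Response(3, hasSqlCommandResult = false))
       val (found, pulled) = findThenDrain(responses)
       println(s"found=$found pulled=$pulled")
     }
   }
   ```
   With the three sample responses, the second one is returned and all three 
are pulled from the source, so the iterator is fully consumed without an 
intermediate buffer.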


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


