srowen commented on a change in pull request #35622:
URL: https://github.com/apache/spark/pull/35622#discussion_r813006627



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/package.scala
##########
@@ -100,6 +70,23 @@ package object util extends Logging {
     file
   }
 
+  private def toByteArray(inStream: InputStream): Array[Byte] = {
+    val outStream = new ByteArrayOutputStream
+    try {
+      var reading = true
+      while (reading) {
+        inStream.read() match {

Review comment:
       OK, hah, now that I look at it - this is a pretty inefficient way to
copy the byte stream. Each read() call fetches a single byte; it should be
read and written in chunks, not a byte at a time.
   
   That's not hard to implement, but Guava already does this:
   
https://github.com/google/guava/blob/a0e2577de61a0d7e8a3dd075be66a31c93ea0446/android/guava/src/com/google/common/io/ByteStreams.java#L173
   
   In fact, we already use ByteStreams.toByteArray in several places. Just use
that? It'll be simpler and more efficient.
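
   For illustration, the chunked approach described above can be sketched in
plain Java (roughly what Guava's ByteStreams.toByteArray does internally; the
class name and the 8 KiB buffer size here are illustrative, not from the PR):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedCopy {

    // Copy the stream in 8 KiB chunks instead of one byte per read() call,
    // cutting the number of I/O round trips by orders of magnitude.
    static byte[] toByteArray(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        // read() returns the number of bytes placed in buf, or -1 at EOF.
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Round-trip a 20000-byte payload to check nothing is lost.
        byte[] data = new byte[20000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        byte[] copy = toByteArray(new ByteArrayInputStream(data));
        System.out.println(copy.length);                          // 20000
        System.out.println(java.util.Arrays.equals(data, copy));  // true
    }
}
```

   Guava's version adds buffer-resizing heuristics on top of this loop, which
is one more reason to reuse it rather than hand-roll the copy.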




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


