[GitHub] spark pull request #22954: [SPARK-25981][R] Enables Arrow optimization from ...

felixcheung Sat, 10 Nov 2018 23:40:47 -0800

Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22954#discussion_r232477131
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/api/r/SQLUtils.scala ---
    @@ -225,4 +226,25 @@ private[sql] object SQLUtils extends Logging {
         }
         sparkSession.sessionState.catalog.listTables(db).map(_.table).toArray
       }
    +
    +  /**
    +   * R callable function to read a file in Arrow stream format and create a `RDD`
    +   * using each serialized ArrowRecordBatch as a partition.
    +   */
    +  def readArrowStreamFromFile(
    +      sparkSession: SparkSession,
    +      filename: String): JavaRDD[Array[Byte]] = {
    +    ArrowConverters.readArrowStreamFromFile(sparkSession.sqlContext, filename)
    --- End diff --
    
    hmm, I see your point... but I'm guessing there could be hundreds of these wrappers we'd need to add if we set this as a practice.
    
    a few problems I see with these wrappers:
    1. they are extra work to add or maintain
    2. many are very simple, not much value add
    3. many get abandoned over the years - they are not called and not removed
    
    but I kinda see your way, let's keep this one and review any new ones in the future.



