HyukjinKwon opened a new pull request #23977: [WIP][SPARK-26923][SQL][R] Refactor ArrowRRunner and RRunner to share one BaseRRunner URL: https://github.com/apache/spark/pull/23977 ## What changes were proposed in this pull request? This PR proposes to have one base R runner. In the high level, Previously, it had `ArrowRRunner` and it inherited `RRunner`: ``` └── RRunner └── ArrowRRunner ``` After this PR, now it has a `BaseRRunner`, and `ArrowRRunner` and `RRunner` inherit `BaseRRunner`: ``` └── BaseRRunner ├── ArrowRRunner └── RRunner ``` This way is consistent with Python's. In more details, see below: ```scala class BaseRRunner[IN, OUT] { def compute: Iterator[OUT] = { ... newWriterThread(...) ... newReaderIterator(...) ... } // Make a thread that writes data from JVM to R process abstract protected def newWriterThread(..., iter: Iterator[IN], ...): WriterThread // Make an iterator that reads data from the R process to JVM abstract protected def newReaderIterator(...): ReaderIterator abstract class WriterThread(..., iter: Iterator[IN], ...) extends Thread { override def run(): Unit { ... writeIteratorToStream(...) ... } // Actually writing logic to the socket stream. abstract protected def writeIteratorToStream(dataOut: DataOutputStream): Unit } abstract class ReaderIterator extends Iterator[OUT] { override def hasNext: Boolean = { } override def next: OUT = { ... hasNext ... } // Actually reading logic from the socket stream. abstract protected def read: OUT } } ``` ```scala case RRunner extends BaseRRunner { override def newWriterThread(...) { new WriterThread(...) { override def writeIteratorToStream { ... } } } override def newReaderIterator(...) { new ReaderIterator(...) { override def read { ... } } } } ``` ## How was this patch tested? Manually tested and existing tests should cover.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
