Github user tdas commented on a diff in the pull request:

    https://github.com/apache/spark/pull/5645#discussion_r29029162
  
    --- Diff: streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala ---
    @@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
             logDebug(s"Read partition data of $this from block manager, block $blockId")
             iterator
           case None => // Data not found in Block Manager, grab it from write ahead log file
    -        val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
    -        val dataRead = reader.read(partition.segment)
    -        reader.close()
    +        var dataRead: ByteBuffer = null
    +        var writeAheadLog: WriteAheadLog = null
    +        try {
    +          val dummyDirectory = FileUtils.getTempDirectoryPath()
    --- End diff --
    
    The default WAL is file-based, so it needs a log directory to work. 
However, the log directory is not actually needed for reading a particular 
record. But to read a single record you have to create a FileBasedWriteAheadLog 
object, which requires a log directory. Hence I am providing a dummy directory 
for this. 
    
    I know that this is a little awkward. This is the cost of defining a single 
interface for both writing and reading single records. Earlier there were two 
independent classes (WALWriter and WALRandomReader) that were used for these two 
purposes, and they have different requirements. But since I am trying to make a 
single interface that can be used for all reading and writing, the log directory 
must be provided in the constructor of the default file-based WAL. This results 
in the awkwardness. 
    
    I don't quite like it myself, but it may be okay in practice as long as we 
ensure that the FileBasedWAL does not create unnecessary directories/files when 
only reading a single record. I can add a test to ensure that.
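
    To make the trade-off concrete, here is a minimal sketch of the idea. All 
names in it (WalSketch, Segment, SimpleWriteAheadLog, SimpleFileBasedWAL, 
readLeavesNoTrace) are hypothetical, simplified stand-ins and not the actual 
Spark classes. It assumes a file-based log whose constructor takes a log 
directory but which only touches that directory lazily, on the first write, so 
a reader built on a dummy directory leaves nothing behind.

```scala
import java.io.{File, RandomAccessFile}
import java.nio.ByteBuffer

object WalSketch {
  // Hypothetical record handle: where a record lives inside a log file.
  case class Segment(path: String, offset: Long, length: Int)

  // One interface for both writing and reading single records.
  trait SimpleWriteAheadLog {
    def write(record: ByteBuffer): Segment
    def read(segment: Segment): ByteBuffer
    def close(): Unit
  }

  // File-based implementation: the constructor requires a log directory,
  // even though read() only needs the path stored in the segment.
  class SimpleFileBasedWAL(logDirectory: String) extends SimpleWriteAheadLog {
    // Created lazily, so a read-only instance never touches logDirectory.
    private lazy val logFile: File = {
      val dir = new File(logDirectory)
      dir.mkdirs()
      new File(dir, "log-0")
    }

    override def write(record: ByteBuffer): Segment = {
      val bytes = new Array[Byte](record.remaining())
      record.duplicate().get(bytes)
      val raf = new RandomAccessFile(logFile, "rw")
      try {
        val offset = raf.length()
        raf.seek(offset) // append at the end of the log file
        raf.write(bytes)
        Segment(logFile.getPath, offset, bytes.length)
      } finally raf.close()
    }

    override def read(segment: Segment): ByteBuffer = {
      val raf = new RandomAccessFile(segment.path, "r")
      try {
        val bytes = new Array[Byte](segment.length)
        raf.seek(segment.offset)
        raf.readFully(bytes)
        ByteBuffer.wrap(bytes)
      } finally raf.close()
    }

    override def close(): Unit = ()
  }

  // Sketch of the proposed check: reading through a WAL built on a dummy
  // directory must not create that directory.
  def readLeavesNoTrace(segment: Segment, dummyDir: File): Boolean = {
    val wal = new SimpleFileBasedWAL(dummyDir.getPath)
    try {
      wal.read(segment)
      !dummyDir.exists()
    } finally wal.close()
  }
}
```

    A test along these lines would write one record through an instance with a 
real log directory, then call readLeavesNoTrace with the returned segment and 
a directory path that does not exist yet. Whether the real FileBasedWAL can 
defer its directory creation this way is exactly what the test would need to 
confirm.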


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

