Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/5645#discussion_r29029162
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/rdd/WriteAheadLogBackedBlockRDD.scala
---
@@ -96,9 +99,27 @@ class WriteAheadLogBackedBlockRDD[T: ClassTag](
logDebug(s"Read partition data of $this from block manager, block $blockId")
iterator
case None => // Data not found in Block Manager, grab it from write ahead log file
- val reader = new WriteAheadLogRandomReader(partition.segment.path, hadoopConf)
- val dataRead = reader.read(partition.segment)
- reader.close()
+ var dataRead: ByteBuffer = null
+ var writeAheadLog: WriteAheadLog = null
+ try {
+ val dummyDirectory = FileUtils.getTempDirectoryPath()
--- End diff --
The default WAL is file-based, so it needs a log directory to work. The log
directory is not actually needed for reading a particular record, but to read
a single record you have to create a FileBasedWriteAheadLog object, and its
constructor requires a log directory. Hence I am providing a dummy directory
here.
I know that this is a little awkward. This is the cost of defining a single
interface for both writing and reading single records. Earlier there were two
independent classes (WALWriter and WALRandomReader) that were used for these
two purposes, which have different requirements. But since I am trying to make
a single interface that can be used for all reading and writing, the log
directory must be provided in the constructor of the default file-based WAL.
This results in the awkwardness.
I don't quite like it myself, but it may be okay in practice as long as we
ensure that the FileBasedWAL does not create unnecessary directories/files when
only reading a single record. I can add a test to ensure that.
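
To make the trade-off concrete, here is a minimal sketch of the idea, not the
code in this PR. Every name in it (SketchSegment, SketchWriteAheadLog,
SketchFileBasedWAL, SketchWALCheck) is hypothetical, and it uses local files
via java.nio rather than Hadoop. It assumes the file-based log can defer all
use of its log directory to the write path, so a reader constructed with a
dummy directory never creates anything on disk. Unlike the diff above, the
dummy directory here is a path that does not exist yet, so the check is
meaningful.

```scala
import java.io.RandomAccessFile
import java.nio.ByteBuffer
import java.nio.file.{Files, Path}

// Hypothetical stand-in for a WAL segment: where one record lives in a file.
final case class SketchSegment(path: Path, offset: Long, length: Int)

// Hypothetical single interface covering both writing and reading.
trait SketchWriteAheadLog {
  def write(record: ByteBuffer): SketchSegment
  def read(segment: SketchSegment): ByteBuffer
  def close(): Unit
}

// File-based sketch: the constructor requires a log directory, but the
// directory is only touched (and created) on the write path.
class SketchFileBasedWAL(logDirectory: Path) extends SketchWriteAheadLog {
  private lazy val logFile: Path = {
    Files.createDirectories(logDirectory)
    logDirectory.resolve("log-0")
  }

  // Appends the record to the log file and returns where it was stored.
  override def write(record: ByteBuffer): SketchSegment = {
    val file = new RandomAccessFile(logFile.toFile, "rw")
    try {
      val offset = file.length()
      file.seek(offset)
      val bytes = new Array[Byte](record.remaining())
      record.duplicate().get(bytes)
      file.write(bytes)
      SketchSegment(logFile, offset, bytes.length)
    } finally file.close()
  }

  // Reading uses only the path stored in the segment, never logDirectory.
  override def read(segment: SketchSegment): ByteBuffer = {
    val file = new RandomAccessFile(segment.path.toFile, "r")
    try {
      file.seek(segment.offset)
      val bytes = new Array[Byte](segment.length)
      file.readFully(bytes)
      ByteBuffer.wrap(bytes)
    } finally file.close()
  }

  override def close(): Unit = ()
}

object SketchWALCheck {
  // Returns (record read back, whether the dummy directory exists afterwards).
  def readWithDummyDirectory(): (String, Boolean) = {
    val root = Files.createTempDirectory("wal-sketch")
    val writer = new SketchFileBasedWAL(root.resolve("real-logs"))
    val segment = writer.write(ByteBuffer.wrap("hello".getBytes("UTF-8")))
    writer.close()

    // The reader gets a dummy directory that does not exist yet.
    val dummyDirectory = root.resolve("dummy")
    val reader = new SketchFileBasedWAL(dummyDirectory)
    val data = try reader.read(segment) finally reader.close()
    val bytes = new Array[Byte](data.remaining())
    data.get(bytes)
    (new String(bytes, "UTF-8"), Files.exists(dummyDirectory))
  }
}
```

A test along these lines would call SketchWALCheck.readWithDummyDirectory()
and check that the record round-trips and that the dummy directory was never
created. Whether the real FileBasedWAL can be made this lazy is exactly what
the proposed test would need to confirm.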