Github user jerryshao commented on a diff in the pull request:
https://github.com/apache/spark/pull/6347#discussion_r30872549
--- Diff:
streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
---
@@ -253,13 +254,19 @@ class FileInputDStream[K, V, F <: NewInputFormat[K,V]](
private def filesToRDD(files: Seq[String]): RDD[(K, V)] = {
val fileRDDs = files.map(file =>{
val rdd = serializableConfOpt.map(_.value) match {
- case Some(config) => context.sparkContext.newAPIHadoopFile(
+ case Some(config) => {
+ if(flag){ println("PROCESSING FILE :"+file) }
--- End diff --
I filed the wish JIRA to expose the file name because users want to get the
file name at run time. Your implementation only prints the file name to the
console, so how would an end user actually use it? I don't think the current
implementation addresses the problem well.
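
To illustrate the difference, here is a minimal sketch in plain Scala, not Spark code and not a proposed patch. It models each file as a name plus its lines; `FileNameSketch`, `printOnly`, and `withFileName` are hypothetical names made up for this example. The point is only that the file name has to travel with the data for user code to see it.

```scala
// Hypothetical sketch, independent of Spark: a "file" is (name, lines).
object FileNameSketch {

  // Print-only approach, similar in spirit to the diff: the name goes to
  // the console and is not part of the value returned to the caller.
  def printOnly(files: Seq[(String, Seq[String])]): Seq[String] =
    files.flatMap { case (name, lines) =>
      println("PROCESSING FILE :" + name)
      lines
    }

  // Alternative: tag every record with the name of the file it came from,
  // so downstream code can filter, group, or route on it at run time.
  def withFileName(files: Seq[(String, Seq[String])]): Seq[(String, String)] =
    files.flatMap { case (name, lines) => lines.map(line => (name, line)) }

  def main(args: Array[String]): Unit = {
    val files = Seq("a.txt" -> Seq("x", "y"), "b.txt" -> Seq("z"))
    // The caller can now use the file name programmatically.
    withFileName(files).foreach { case (name, line) => println(s"$name: $line") }
  }
}
```

With the second form, a user could, for example, group records by file name, which is not possible when the name is only written to the console.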