Yes, that looks like a solution, but it is quite tricky: you have to parse
the debug string to get the file name, and it also relies on HadoopRDD to
get the file name :)
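
For what it's worth, a minimal sketch of that parsing step might look like
the following. It assumes the relevant lines of the debug string have the
shape "<path> NewHadoopRDD[<id>] at ...", which is an observation about the
printed lineage, not a stable API; the object name, the helper and the
sample string are all made up for illustration.

```scala
object DebugStringPaths {
  // Assumed line shape: "<path> NewHadoopRDD[<id>] at ...".
  // The token must contain a '/' so lineage markers such as "|" are skipped.
  private val NamedHadoopRdd = """(\S*/\S*)\s+NewHadoopRDD\[\d+\]""".r

  /** Returns every path-like token that directly precedes a NewHadoopRDD entry. */
  def extractPaths(debugString: String): List[String] =
    NamedHadoopRdd.findAllMatchIn(debugString).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    // Hypothetical sample written by hand; real output may differ by version.
    val sample =
      """(2) MapPartitionsRDD[3] at values at Example.scala:20 []
        | |  UnionRDD[2] at foreachRDD at Example.scala:18 []
        | |  file:/home/akhld/input/part-00000 NewHadoopRDD[0] at newAPIHadoopFile at FileInputDStream.scala:1 []
        | |  file:/home/akhld/input/part-00001 NewHadoopRDD[1] at newAPIHadoopFile at FileInputDStream.scala:1 []""".stripMargin
    extractPaths(sample).foreach(println)
  }
}
```

In a streaming job this would be called on rdd.toDebugString inside
foreachRDD. Paths containing spaces would not be matched by this pattern,
which is one more reason to treat it as a workaround only.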

2015-04-29 14:52 GMT+08:00 Akhil Das <ak...@sigmoidanalytics.com>:

> It is possible to access the filename, though it's a bit tricky.
>
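>     import org.apache.hadoop.io.{IntWritable, LongWritable}
>     import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat
>
>     // ssc is an existing StreamingContext.
>     val fstream = ssc.fileStream[LongWritable, IntWritable,
>       SequenceFileInputFormat[LongWritable, IntWritable]]("/home/akhld/input/")
>
>     fstream.foreachRDD { rdd =>
>       // The debug string of each batch RDD contains the input file names.
>       println(rdd.values.toDebugString)
>     }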
>
> Thanks
> Best Regards
>
> On Wed, Apr 29, 2015 at 8:33 AM, bit1...@163.com <bit1...@163.com> wrote:
>
>> For SparkContext#textFile, if a directory is given as the path
>> parameter, it will pick up the files in that directory, so the same
>> thing will occur.
>>
>> ------------------------------
>> bit1...@163.com
>>
>>
>> *From:* Saisai Shao <sai.sai.s...@gmail.com>
>> *Date:* 2015-04-29 10:54
>> *To:* Vadim Bichutskiy <vadim.bichuts...@gmail.com>
>> *CC:* bit1...@163.com; lokeshkumar <lok...@dataken.net>; user
>> <user@spark.apache.org>
>> *Subject:* Re: Re: Spark streaming - textFileStream/fileStream - Get
>> file name
>> I think it might be useful in Spark Streaming's file input stream, but
>> I'm not sure whether it is useful in SparkContext#textFile: since we
>> specify the file ourselves, why would we still need to know the file name?
>>
>> I will open a JIRA to request this feature.
>>
>> Thanks
>> Jerry
>>
>>
>> 2015-04-29 10:49 GMT+08:00 Vadim Bichutskiy <vadim.bichuts...@gmail.com>:
>>
>>> I was wondering about the same thing.
>>>
>>> Vadim
>>>
>>> On Tue, Apr 28, 2015 at 10:19 PM, bit1...@163.com <bit1...@163.com>
>>> wrote:
>>>
>>>> It looks to me that the same thing also applies to
>>>> SparkContext.textFile or SparkContext.wholeTextFiles: there is no way in an
>>>> RDD to figure out which file the data in the RDD came from.
>>>>
>>>> ------------------------------
>>>> bit1...@163.com
>>>>
>>>>
>>>> *From:* Saisai Shao <sai.sai.s...@gmail.com>
>>>> *Date:* 2015-04-29 10:10
>>>> *To:* lokeshkumar <lok...@dataken.net>
>>>> *CC:* spark users <user@spark.apache.org>
>>>> *Subject:* Re: Spark streaming - textFileStream/fileStream - Get file
>>>> name
>>>> I think there is currently no API in Spark Streaming that you can use to
>>>> get the file names for file input streams. It is actually not trivial to
>>>> support this. Maybe you could file a JIRA describing what you want the
>>>> community to support, so anyone who is interested can take a crack at it.
>>>>
>>>> Thanks
>>>> Jerry
>>>>
>>>>
>>>> 2015-04-29 0:13 GMT+08:00 lokeshkumar <lok...@dataken.net>:
>>>>
>>>>> Hi Forum,
>>>>>
>>>>> I am using Spark Streaming to listen for files in HDFS with the
>>>>> textFileStream/fileStream methods. How do we get the names of the files
>>>>> that are read by these methods?
>>>>>
>>>>> I used textFileStream, which gives me the file contents in a
>>>>> JavaDStream, and I had no success with fileStream, as it throws a
>>>>> compilation error with Spark version 1.3.1.
>>>>>
>>>>> Can someone please tell me if there is an API function or any other way
>>>>> to get the names of the files that these streaming methods read?
>>>>>
>>>>> Thanks
>>>>> Lokesh
>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
