Tathagata, thank you for the response.

I have two receivers in my Spark Streaming job: one reads an endless stream
of data from Flume and the other reads data from an HDFS directory. However,
files are not moved into HDFS frequently (let's say one arrives every
10 minutes). This is why I need to check whether there are any events from
HDFS before performing any action on them.

RDD.isEmpty() is available on JavaRDD and JavaPairRDD but not on
JavaDStream or JavaPairDStream. I could use foreachRDD and check each RDD
inside it, but that is long-winded.
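
For reference, the per-batch check mentioned above can be sketched roughly as
follows. This is only a sketch under assumptions: it targets Spark 1.6 or
later, where foreachRDD accepts a void lambda (on earlier releases the
function passed to foreachRDD also has to end with `return null;`). The class
name NonEmptyBatchGuard and the helpers hasData and register are hypothetical
names for illustration, and the println stands in for the real action.

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class NonEmptyBatchGuard {

    // Hypothetical helper: true when the batch RDD holds at least one record.
    // Note that isEmpty() itself runs a small job on the RDD.
    public static boolean hasData(JavaPairRDD<?, ?> rdd) {
        return !rdd.isEmpty();
    }

    // Hypothetical helper: registers the guarded action on the file stream.
    public static void register(JavaStreamingContext ssc, String iFolder) {
        JavaPairInputDStream<LongWritable, Text> input = ssc.fileStream(
                iFolder, LongWritable.class, Text.class, TextInputFormat.class);

        // Called once per batch interval; batches with no new files are skipped.
        input.foreachRDD(rdd -> {
            if (hasData(rdd)) {
                // placeholder for the real action on a non-empty batch
                System.out.println("records in batch: " + rdd.count());
            }
        });
    }
}
```

The emptiness test has to live inside foreachRDD because the DStream object
itself is never null or "empty"; only the RDD produced for each batch is.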

On 21 October 2015 at 20:00, Tathagata Das <t...@databricks.com> wrote:

> What do you mean by checking when a "DStream is empty"? A DStream represents
> an endless stream of data, and checking whether it is empty at any given
> point in time does not make sense.
>
> FYI, there is RDD.isEmpty()
>
>
>
> On Wed, Oct 21, 2015 at 10:03 AM, diplomatic Guru <
> diplomaticg...@gmail.com> wrote:
>
>> I tried the code below, but it still carries out the action even though
>> there is no new data.
>>
>> JavaPairInputDStream<LongWritable, Text> input = ssc.fileStream(iFolder, 
>> LongWritable.class,Text.class, TextInputFormat.class);
>>
>>  if(input != null){
>> //do some action if it is not empty
>> }
>>
>>
>> On 21 October 2015 at 18:00, diplomatic Guru <diplomaticg...@gmail.com>
>> wrote:
>>
>>>
>>> Hello All,
>>>
>>> I have a Spark Streaming job that should do some action only if the RDD
>>> is not empty. This can be done easily with a Spark batch RDD, as I could
>>> call .take(1) and check whether it is empty or not. But this cannot be done
>>> with a Spark Streaming DStream.
>>>
>>>
>>> JavaPairInputDStream<LongWritable, Text> input = ssc.fileStream(iFolder, 
>>> LongWritable.class,Text.class, TextInputFormat.class);
>>>
>>>  if(input != null){
>>> //do some action if it is not empty
>>> }
>>>
>>> Any ideas please?
>>>
>>>
>>>
>>>
>>
>
