Thanks a lot, but I have already tried the second way. The problem with that
is how to identify a particular RDD from source to sink (as we can do by
passing a message ID in Storm). For that, I just updated the RDD and added a
msgID (as a static variable), but while dumping them to a file, some of the
RDD's tuples fail or go missing (approx. 3,000, at a data rate of approx.
1,500 tuples/sec).
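A static field exists once per JVM. On a cluster, the driver and every executor each hold their own copy, and updates made in one are not seen by the others, so an ID kept that way is not reliable end to end. A common alternative is to attach the ID to each record itself when it enters the system (for example, once per custom receiver instance, which is an assumption here) and compare IDs at the sink to find lost tuples. The sketch below is plain Python that only models that bookkeeping and does not call Spark. `make_tagger`, `find_missing`, `TaggedRecord`, and the `<source_id>-<seq>` ID format are hypothetical names chosen for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedRecord:
    msg_id: str   # hypothetical format: "<source_id>-<sequence number>"
    payload: str


def make_tagger(source_id):
    """Return a function that tags records coming from one source.

    The counter lives in this closure (i.e. in whichever process runs the
    source), so uniqueness comes from pairing it with source_id rather
    than from a shared static variable.
    """
    seq = itertools.count()

    def tag(payload):
        return TaggedRecord(f"{source_id}-{next(seq)}", payload)

    return tag


def find_missing(emitted, dumped):
    """IDs tagged at the source that never reached the sink (sorted as strings)."""
    return sorted({r.msg_id for r in emitted} - {r.msg_id for r in dumped})


if __name__ == "__main__":
    tag = make_tagger("receiver0")
    emitted = [tag(p) for p in ["a", "b", "c", "d"]]
    dumped = [emitted[0], emitted[1], emitted[3]]  # simulate one lost tuple
    print(find_missing(emitted, dumped))  # ['receiver0-2']
```

Because the ID travels inside the record, any transformation that keeps the record (or carries its ID forward) lets the sink report exactly which tuples were dropped.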

On Fri, Jun 19, 2015 at 2:50 AM, Tathagata Das <t...@databricks.com> wrote:

> A couple of ways.
>
> 1. Easy but approximate way: Find the scheduling delay and processing time
> using the StreamingListener interface, and then calculate
> "end-to-end delay = 0.5 * batch interval + scheduling delay + processing time".
> The 0.5 * batch interval term is the approximate average batching delay
> across all the records in the batch.
>
> 2. Hard but precise way: You could build a custom receiver that embeds the
> current timestamp in the records, and then compare it with the timestamp at
> the final step of processing those records. Assuming the executor and driver
> clocks are reasonably in sync, this measures the latency between the time a
> record is received by the system and the time the result from that record
> is available.
>
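A minimal sketch of the bookkeeping in the second approach, in plain Python rather than a real Spark receiver: the receiving side stamps each record with the wall-clock time it was received, and the final step subtracts that stamp from its own clock. `Stamped`, `stamp`, `now_ms`, and `latencies_at_sink` are hypothetical names. In a real job the stamping would presumably happen inside a custom receiver before it calls `store`, which is an assumption about how one would structure it. The `clock` parameter exists only so the arithmetic can be checked with fixed times.

```python
import statistics
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamped:
    received_ms: int   # wall-clock time when the record entered the system
    payload: str


def now_ms():
    return int(time.time() * 1000)


def stamp(payload, clock=now_ms):
    """What the receiving side would do to each record (assumed)."""
    return Stamped(clock(), payload)


def latencies_at_sink(records, clock=now_ms):
    """Per-record latency in milliseconds, measured at the final step.

    Only meaningful if the receiver's and the sink's clocks are roughly
    in sync, as noted above.
    """
    done = clock()
    return [done - r.received_ms for r in records]


if __name__ == "__main__":
    recs = [stamp(p) for p in ["a", "b", "c"]]
    time.sleep(0.05)  # stand-in for the streaming pipeline
    lat = latencies_at_sink(recs)
    print(f"mean={statistics.mean(lat):.0f} ms, max={max(lat)} ms")
```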
> On Thu, Jun 18, 2015 at 2:12 PM, anshu shukla <anshushuk...@gmail.com>
> wrote:
>
>> Sorry, I missed the word LATENCY. For a large streaming query, how do I
>> find the time taken by a particular RDD to travel from the initial
>> DStream to the final/last DStream?
>> Help please!
>>
>> On Fri, Jun 19, 2015 at 12:40 AM, Tathagata Das <t...@databricks.com>
>> wrote:
>>
>>> It's not clear what you are asking. Find "what" among RDDs?
>>>
>>> On Thu, Jun 18, 2015 at 11:24 AM, anshu shukla <anshushuk...@gmail.com>
>>> wrote:
>>>
>>>> Is there any fixed way to find among RDDs in stream processing systems,
>>>> in a distributed setup?
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Anshu Shukla
>>>>
>>>
>>>
>>
>>
>> --
>> Thanks & Regards,
>> Anshu Shukla
>>
>
>


-- 
Thanks & Regards,
Anshu Shukla
