Hi Patrick,

ok, I think the first processor should be straight forward to implement. I will 
create an issue for that and then start implementing.

Regarding your concerns about the ordering of events within a topic. I think we 
also will need component which is able to deal with out of order events in the 
future, but I would suggest to build such a component with a framework that 
already has a build in solution for that (e.g. flink).
For this component I thought it is sufficient to assume the events are in 
order, since we will use dimension properties for partitioning events in Kafka. 
This means the events should be in order. 
Do you think we need an additional mechanism in the component to ensure the 
ordering of events or is the Kafka guarantee sufficient for the moment?

Philipp



> On 19. Jan 2020, at 22:16, Patrick Wiener <wie...@apache.org> wrote:
> 
> Hi Philipp,
> 
> this sounds like a reasonable topic that should further be investigated. 
> 
> I see especially see the benefit of having a lightweight solution in Java 
> that can potentially run right where the data origins. I come across a lot of 
> cases where control messages in addition to sensor events should be merged in 
> one unified data stream to enable downstream analytics tasks (in robotics 
> including ROS messages).
> 
> Your initial thoughts for designing such a processor seems valid. However, I 
> was wondering about your third point. I guess it depends on where data is 
> coming from. So for instance Kafka - AFAIK Kafka only supports ordering per 
> partition per topic. Any initial ideas on how to deal with that?
> 
> Patrick
> 
>> Am 16.01.2020 um 23:14 schrieb Philipp Zehnder <zehn...@apache.org>:
>> 
>> Hi all,
>> 
>> I’d like to discuss how we could design a processor to merge two data 
>> streams. 
>> We already had several versions of this component in the past, but none of 
>> them is completely satisfactory.
>> 
>> I would suggest two different processors for two common use cases:
>> The first one is to append a label (e.g. for machine learning) to the data 
>> stream. The processor has two inputs, one with the sensor events 
>> (potentially a high frequency) and one with the label information (usually a 
>> much lower event Frequency compared to sensor events). The processor 
>> enriches the sensor stream with the selected properties of the label stream.
>> 
>> The second processor merges two streams by their timestamp. This could be 
>> implemented with flink, but since it is a common use case I think we also 
>> need a lightweight solution in Java. What do you think? 
>> Here are a couple of things we need to keep in mind designing the component:
>> * How to deal with late arriving events?
>> * How big must the buffer (state) for the data streams be to synchronize the 
>> events? (E.g. there is a large delay in one of the streams)
>> * Can we assume that events of one stream are in order?
>> 
>> Do you have any other ideas about what we need to consider?
>> 
>> Cheers,
>> Philipp
>> 
> 

Reply via email to