Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Ali Akhtar
> later when message A arrives it will put that message back into
> the right temporal context and publish an amended result for the proper
> time/session window as if message B were consumed in the timestamp order
> before message A.

Does this apply to the aggregation Kafka stream methods then, and not to
e.g foreach?

On Sun, Mar 19, 2017 at 2:40 AM, Hans Jespersen  wrote:

> Yes stream processing and CEP are subtlety different things.
>
> Kafka Streams helps you write stateful apps and allows that state to be
> preserved on disk (a local State store) as well as distributed for HA or
> for parallel partitioned processing (via Kafka topic partitions and
> consumer groups) as well as in memory (as a performance enhancement).
>
> However a classical CEP engine with a pre-modeled state machine and
> pattern matching rules is something different from stream processing.
>
> It is on course possible to build a CEP system on top on Kafka Streams and
> get the best of both worlds.
>
> -hans
>
> > On Mar 18, 2017, at 11:36 AM, Sabarish Sasidharan <
> sabarish@gmail.com> wrote:
> >
> > Hans
> >
> > What you state would work for aggregations, but not for state machines
> and
> > CEP.
> >
> > Regards
> > Sab
> >
> >> On 19 Mar 2017 12:01 a.m., "Hans Jespersen"  wrote:
> >>
> >> The only way to make sure A is consumed first would be to delay the
> >> consumption of message B for at least 15 minutes which would fly in the
> >> face of the principals of a true streaming platform so the short answer
> to
> >> your question is "no" because that would be batch processing not stream
> >> processing.
> >>
> >> However, Kafka Streams does handle late arriving data. So if you had
> some
> >> analytics that computes results on a time window or a session window
> then
> >> Kafka streams will compute on the stream in real time (processing
> message
> >> B) and then later when message A arrives it will put that message back
> into
> >> the right temporal context and publish an amended result for the proper
> >> time/session window as if message B were consumed in the timestamp order
> >> before message A. The end result of this flow is that you eventually get
> >> the same results you would get in a batch processing system but with the
> >> added benefit of getting intermediary result at much lower latency.
> >>
> >> -hans
> >>
> >> /**
> >> * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
> >> * h...@confluent.io (650)924-2670
> >> */
> >>
> >>> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar 
> wrote:
> >>>
> >>> Is it possible to have Kafka Streams order messages correctly by their
> >>> timestamps, even if they arrived out of order?
> >>>
> >>> E.g, say Message A with a timestamp of 5:00 PM and Message B with a
> >>> timestamp of 5:15 PM, are sent.
> >>>
> >>> Message B arrives sooner than Message A, due to network issues.
> >>>
> >>> Is it possible to make sure that, across all consumers of Kafka Streams
> >>> (even if they are across different servers, but have the same consumer
> >>> group), Message A is consumed first, before Message B?
> >>>
> >>> Thanks.
> >>>
> >>
>


Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Hans Jespersen
Yes stream processing and CEP are subtlety different things. 

Kafka Streams helps you write stateful apps and allows that state to be 
preserved on disk (a local State store) as well as distributed for HA or for 
parallel partitioned processing (via Kafka topic partitions and consumer 
groups) as well as in memory (as a performance enhancement).

However a classical CEP engine with a pre-modeled state machine and pattern 
matching rules is something different from stream processing.

It is on course possible to build a CEP system on top on Kafka Streams and get 
the best of both worlds.

-hans

> On Mar 18, 2017, at 11:36 AM, Sabarish Sasidharan  
> wrote:
> 
> Hans
> 
> What you state would work for aggregations, but not for state machines and
> CEP.
> 
> Regards
> Sab
> 
>> On 19 Mar 2017 12:01 a.m., "Hans Jespersen"  wrote:
>> 
>> The only way to make sure A is consumed first would be to delay the
>> consumption of message B for at least 15 minutes which would fly in the
>> face of the principals of a true streaming platform so the short answer to
>> your question is "no" because that would be batch processing not stream
>> processing.
>> 
>> However, Kafka Streams does handle late arriving data. So if you had some
>> analytics that computes results on a time window or a session window then
>> Kafka streams will compute on the stream in real time (processing message
>> B) and then later when message A arrives it will put that message back into
>> the right temporal context and publish an amended result for the proper
>> time/session window as if message B were consumed in the timestamp order
>> before message A. The end result of this flow is that you eventually get
>> the same results you would get in a batch processing system but with the
>> added benefit of getting intermediary result at much lower latency.
>> 
>> -hans
>> 
>> /**
>> * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>> * h...@confluent.io (650)924-2670
>> */
>> 
>>> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar  wrote:
>>> 
>>> Is it possible to have Kafka Streams order messages correctly by their
>>> timestamps, even if they arrived out of order?
>>> 
>>> E.g, say Message A with a timestamp of 5:00 PM and Message B with a
>>> timestamp of 5:15 PM, are sent.
>>> 
>>> Message B arrives sooner than Message A, due to network issues.
>>> 
>>> Is it possible to make sure that, across all consumers of Kafka Streams
>>> (even if they are across different servers, but have the same consumer
>>> group), Message A is consumed first, before Message B?
>>> 
>>> Thanks.
>>> 
>> 


Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Sabarish Sasidharan
Hans

What you state would work for aggregations, but not for state machines and
CEP.

Regards
Sab

On 19 Mar 2017 12:01 a.m., "Hans Jespersen"  wrote:

> The only way to make sure A is consumed first would be to delay the
> consumption of message B for at least 15 minutes which would fly in the
> face of the principals of a true streaming platform so the short answer to
> your question is "no" because that would be batch processing not stream
> processing.
>
> However, Kafka Streams does handle late arriving data. So if you had some
> analytics that computes results on a time window or a session window then
> Kafka streams will compute on the stream in real time (processing message
> B) and then later when message A arrives it will put that message back into
> the right temporal context and publish an amended result for the proper
> time/session window as if message B were consumed in the timestamp order
> before message A. The end result of this flow is that you eventually get
> the same results you would get in a batch processing system but with the
> added benefit of getting intermediary result at much lower latency.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar  wrote:
>
> > Is it possible to have Kafka Streams order messages correctly by their
> > timestamps, even if they arrived out of order?
> >
> > E.g, say Message A with a timestamp of 5:00 PM and Message B with a
> > timestamp of 5:15 PM, are sent.
> >
> > Message B arrives sooner than Message A, due to network issues.
> >
> > Is it possible to make sure that, across all consumers of Kafka Streams
> > (even if they are across different servers, but have the same consumer
> > group), Message A is consumed first, before Message B?
> >
> > Thanks.
> >
>


Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Hans Jespersen
The only way to make sure A is consumed first would be to delay the
consumption of message B for at least 15 minutes which would fly in the
face of the principals of a true streaming platform so the short answer to
your question is "no" because that would be batch processing not stream
processing.

However, Kafka Streams does handle late arriving data. So if you had some
analytics that computes results on a time window or a session window then
Kafka streams will compute on the stream in real time (processing message
B) and then later when message A arrives it will put that message back into
the right temporal context and publish an amended result for the proper
time/session window as if message B were consumed in the timestamp order
before message A. The end result of this flow is that you eventually get
the same results you would get in a batch processing system but with the
added benefit of getting intermediary result at much lower latency.

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar  wrote:

> Is it possible to have Kafka Streams order messages correctly by their
> timestamps, even if they arrived out of order?
>
> E.g, say Message A with a timestamp of 5:00 PM and Message B with a
> timestamp of 5:15 PM, are sent.
>
> Message B arrives sooner than Message A, due to network issues.
>
> Is it possible to make sure that, across all consumers of Kafka Streams
> (even if they are across different servers, but have the same consumer
> group), Message A is consumed first, before Message B?
>
> Thanks.
>


Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Ali Akhtar
> However, Kafka Streams does handle late arriving data. So if you had some
> analytics that computes results on a time window or a session window then
> Kafka streams will compute on the stream in real time (processing message
> B) and then later when message A arrives it will put that message back
into
> the right temporal context and publish an amended result for the proper
> time/session window as if message B were consumed in the timestamp order
> before message A.

Can you link me to the javadocs or documentation for where I can read more
into how this is done / how to use this? Thanks.



On Sat, Mar 18, 2017 at 11:11 PM, Hans Jespersen  wrote:

> sorry I mixed up Message A and B wrt the to question but the answer is
> still valid.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670
>  */
>
> On Sat, Mar 18, 2017 at 11:07 AM, Hans Jespersen 
> wrote:
>
> > The only way to make sure A is consumed first would be to delay the
> > consumption of message B for at least 15 minutes which would fly in the
> > face of the principals of a true streaming platform so the short answer
> to
> > your question is "no" because that would be batch processing not stream
> > processing.
> >
> > However, Kafka Streams does handle late arriving data. So if you had some
> > analytics that computes results on a time window or a session window then
> > Kafka streams will compute on the stream in real time (processing message
> > B) and then later when message A arrives it will put that message back
> into
> > the right temporal context and publish an amended result for the proper
> > time/session window as if message B were consumed in the timestamp order
> > before message A. The end result of this flow is that you eventually get
> > the same results you would get in a batch processing system but with the
> > added benefit of getting intermediary result at much lower latency.
> >
> > -hans
> >
> > /**
> >  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
> >  * h...@confluent.io (650)924-2670 <(650)%20924-2670>
> >  */
> >
> > On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar 
> wrote:
> >
> >> Is it possible to have Kafka Streams order messages correctly by their
> >> timestamps, even if they arrived out of order?
> >>
> >> E.g, say Message A with a timestamp of 5:00 PM and Message B with a
> >> timestamp of 5:15 PM, are sent.
> >>
> >> Message B arrives sooner than Message A, due to network issues.
> >>
> >> Is it possible to make sure that, across all consumers of Kafka Streams
> >> (even if they are across different servers, but have the same consumer
> >> group), Message A is consumed first, before Message B?
> >>
> >> Thanks.
> >>
> >
> >
>


Re: Out of order message processing with Kafka Streams

2017-03-18 Thread Hans Jespersen
sorry I mixed up Message A and B wrt the to question but the answer is
still valid.

-hans

/**
 * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
 * h...@confluent.io (650)924-2670
 */

On Sat, Mar 18, 2017 at 11:07 AM, Hans Jespersen  wrote:

> The only way to make sure A is consumed first would be to delay the
> consumption of message B for at least 15 minutes which would fly in the
> face of the principals of a true streaming platform so the short answer to
> your question is "no" because that would be batch processing not stream
> processing.
>
> However, Kafka Streams does handle late arriving data. So if you had some
> analytics that computes results on a time window or a session window then
> Kafka streams will compute on the stream in real time (processing message
> B) and then later when message A arrives it will put that message back into
> the right temporal context and publish an amended result for the proper
> time/session window as if message B were consumed in the timestamp order
> before message A. The end result of this flow is that you eventually get
> the same results you would get in a batch processing system but with the
> added benefit of getting intermediary result at much lower latency.
>
> -hans
>
> /**
>  * Hans Jespersen, Principal Systems Engineer, Confluent Inc.
>  * h...@confluent.io (650)924-2670 <(650)%20924-2670>
>  */
>
> On Sat, Mar 18, 2017 at 10:29 AM, Ali Akhtar  wrote:
>
>> Is it possible to have Kafka Streams order messages correctly by their
>> timestamps, even if they arrived out of order?
>>
>> E.g, say Message A with a timestamp of 5:00 PM and Message B with a
>> timestamp of 5:15 PM, are sent.
>>
>> Message B arrives sooner than Message A, due to network issues.
>>
>> Is it possible to make sure that, across all consumers of Kafka Streams
>> (even if they are across different servers, but have the same consumer
>> group), Message A is consumed first, before Message B?
>>
>> Thanks.
>>
>
>


Out of order message processing with Kafka Streams

2017-03-18 Thread Ali Akhtar
Is it possible to have Kafka Streams order messages correctly by their
timestamps, even if they arrived out of order?

E.g, say Message A with a timestamp of 5:00 PM and Message B with a
timestamp of 5:15 PM, are sent.

Message B arrives sooner than Message A, due to network issues.

Is it possible to make sure that, across all consumers of Kafka Streams
(even if they are across different servers, but have the same consumer
group), Message A is consumed first, before Message B?

Thanks.


Re: about java.io.EOFException / java.lang.ClassNotFoundException: kafka.common.OffsetOutOfRangeException

2017-03-18 Thread Martin Gainty




From: Selina Tech 
Sent: Friday, March 17, 2017 9:02 PM
To: users@kafka.apache.org
Subject: about java.io.EOFException / java.lang.ClassNotFoundException: 
kafka.common.OffsetOutOfRangeException

Hi:
I am processing on a new Kafka topic with Spark and then I got error
below. I google this questions, looks like I lot of people having similar
problems before. But I have not got clue yet.

   Is any one know how to fix this issue?

Sincerely.
Selina


00:39:58,004 WARN  - 2017-03-18
00:39:57,921:7726(0x7f22e69b8700):ZOO_WARN@zookeeper_interest@1557:
Exceeded deadline by 28ms
00:41:01,298 WARN  - 17/03/18 00:41:01 WARN Selector: Error in I/O with /
10.128.64.152
00:41:01,298 WARN  - java.io.EOFException
00:41:01,298 WARN  - at
org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:62)
00:41:01,298 WARN  - at
org.apache.kafka.common.network.Selector.poll(Selector.java:248)
00:41:01,298 WARN  - at
org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:192)
00:41:01,298 WARN  - at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:191)
00:41:01,298 WARN  - at
org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:122)
00:41:01,298 WARN  - at java.lang.Thread.run(Thread.java:745)
00:43:31,514 WARN  - 2017-03-18
00:43:31,514:7726(0x7f22e69b8700):ZOO_WARN@zookeeper_interest@1557:
Exceeded deadline by 11ms
00:44:17,996 WARN  - 17/03/18 00:44:17 WARN ThrowableSerializationWrapper:
Task exception could not be deserialized
00:44:17,996 WARN  - java.lang.ClassNotFoundException:
kafka.common.OffsetOutOfRangeException

MG>cloudera suggests OffsetOutOfRangeException is caused when Spark Storage is 
mis-configured
MG>http://blog.cloudera.com/blog/2015/03/exactly-once-spark-streaming-from-apache-kafka/
[http://www.cloudera.com/content/dam/www/static/images/logos/cloudera-card.jpg]

Exactly-once Spark Streaming from Apache Kafka - Cloudera 
...
blog.cloudera.com
Cloudera Engineering Blog. Best practices, how-tos, use cases, and internals 
from Cloudera Engineering and the community



00:44:17,996 WARN  - at
java.net.URLClassLoader$1.run(URLClassLoader.java:366)
00:44:17,996 WARN  - at
java.net.URLClassLoader$1.run(URLClassLoader.java:355)
00:44:17,996 WARN  - at
java.security.AccessController.doPrivileged(Native Method)

MG>AcssController.doPrivileged is possible you are attempting to run a 
protected class (OutOfRangeException) not covered by "current policy"..here is 
doc:

MG>A "protection domain" encompasses a CodeSource and the permissions granted 
to code from that CodeSource, as determined by the security MG>policy currently 
in effect. Thus,
MG> classes *signed by the same keys* and
MG>*from the same URL*  are placed in the 
same  domain,
MG>Thus a class  belongs to one and only one 
*protection domain*.
MG>Classes that have the same permissions but are from different code sources 

MG>or 
MG>belong to "different" domains.

MG>http://docs.oracle.com/javase/7/docs/technotes/guides/security/doprivileged.html
Privileged Block API - 
Oracle
docs.oracle.com
Overview This document provides background information about what "privileged" 
code is and what it is used for, followed by illustrations of the use of the 
API, with ...


MG>please verify your policy *URL and keys* allows you to run jar containing 
OutOfRangeException

00:44:17,996 WARN  - at
java.net.URLClassLoader.findClass(URLClassLoader.java:354)
00:44:17,996 WARN  - at
java.lang.ClassLoader.loadClass(ClassLoader.java:425)
00:44:17,996 WARN  - at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
00:44:17,996 WARN  - at
java.lang.ClassLoader.loadClass(ClassLoader.java:358)
00:44:17,996 WARN  - at java.lang.Class.forName0(Native Method)
00:44:17,996 WARN  - at java.lang.Class.forName(Class.java:278)
00:44:17,996 WARN  - at
org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
00:44:17,996 WARN  - at
java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
00:44:17,996 WARN  - at
java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
00:44:17,996 WARN  - at
java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
00:44:17,996 WARN  - at
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
00:44:17,996 WARN  - at
java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
00:44:17,996 WARN  - at
org.apache.spark.ThrowableSerializationWrapper.readObject(TaskEndReason.scala:167)
00:44:17,996 WARN  - at
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
00:44:17,996 WARN  - at