Re: Kafka Monitoring

2016-11-08 Thread vinay patil
Hi Daniel,

Yes, now I am able to see it; this was just dummy code I was running on a
local VM.

However, on the cluster I had enabled checkpointing and still was not able to
see the consumers. I guess I have to put the brokerPath as you have
provided in the properties.

From where did you get the brokerPath? I mean, for which property have you
set it?
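
Just to check my understanding, this is roughly the setup I take away from your
snippets (a Java sketch, untested; the broker list and topic name are
placeholders, and /kafka09 is your example chroot):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// as you suggest, checkpointing has to be enabled for the committed offsets to show up
env.enableCheckpointing(5000);

Properties props = new Properties();
// append the chroot ("brokerPath") only if the brokers were started with one
props.setProperty("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181/kafka09");
props.setProperty("bootstrap.servers", "kafka1:9092,kafka2:9092");
props.setProperty("group.id", "prod");
props.setProperty("auto.offset.reset", "earliest");

DataStream<String> source = env.addSource(
        new FlinkKafkaConsumer09<String>("my-topic", new SimpleStringSchema(), props));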

Regards,
Vinay Patil

On Wed, Nov 9, 2016 at 12:32 AM, Daniel Santos [via Apache Flink User
Mailing List archive.]  wrote:

> Hello,
>
> On flink do you have the checkpoint enabled ?
>
> env.enableCheckpointing(interval = CHKPOINT_INTERVAL)
>
> Regards,
>
> Daniel Santos
>
> On 11/08/2016 12:30 PM, vinay patil wrote:
>
> Yes Kafka and Flink connect to that zookeeper only.
>
> Not sure why it is not listing the consumer
>
> Regards,
> Vinay Patil
>
> On Tue, Nov 8, 2016 at 5:36 PM, Daniel Santos [via Apache Flink User
> Mailing List archive.] <[hidden email]
> > wrote:
>
>> Hi,
>>
>> brokerPath is just optional.
>>
>> Used if you want to have multiple Kafka clusters.
>>
>> Each kafka cluster would connect to the same brokerPath.
>>
>> Since I have multiple clusters I use the brokerPath.
>>
>> From the looks of it you don't, so never mind; it doesn't matter.
>>
>> You only have one ZooKeeper, correct?
>>
>> And Kafka and Flink connect to that one ZooKeeper only?
>>
>> Best Regards,
>>
>> Daniel Santos
>>
>> On 11/08/2016 11:18 AM, vinay patil wrote:
>>
>> Hi Daniel,
>>
>> Yes I have specified the zookeeper host in server.properties file , so
>> the broker is connected to zookeeper.
>>
>> https://kafka.apache.org/documentation#brokerconfigs  -> according to
>> this link, I guess all these configs are done in server.prop , so from
>> where did you get kafka09 as brokerPath ?
>>
>> this is my entry in server.prop file -> zookeeper.connect=localhost:2181
>> Have you set this as zkhost:2181/kafka09 ?
>>
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Nov 8, 2016 at 4:27 PM, Daniel Santos [via Apache Flink User
>> Mailing List archive.] <[hidden email]
>> > wrote:
>>
>>> Hi,
>>>
>>> Your kafka broker is connected to zookeeper I believe.
>>>
>>> I am using kafka 0.9.0.1 my self too.
>>>
>>> On kafka broker 0.9.0.1 I have configured the zookeeper connect to a
>>> path, for instances :
>>>
>>> zk1:2181,zk2:2181,zk3:2181/kafka09
>>>
>>> https://kafka.apache.org/documentation#brokerconfigs
>>>
>>> Now on the Flink side I would configure "props.setProperty("zookeeper.connect",
>>> zkHosts)" the same way, resulting in:
>>>
>>> props.setProperty("zookeeper.connect", "zk1:2181,zk2:2181,zk3:2181/kafka09")
>>>
>>>
>>> That is what I mean by broker's path.
>>>
>>> Best Regards,
>>>
>>> Daniel Santos
>>>
>>> On 11/08/2016 10:49 AM, vinay patil wrote:
>>>
>>> Hi Daniel,
>>>
>>> I have the same properties set for the consumer and the same code
>>>
>>> *brokerspath only needed if you have set it on kafka config* -> I did
>>> not get this; do you mean to check the brokerspath in the
>>> conf/server.properties file? I have even tried setting the offset.storage
>>> property to zookeeper, but I still don't get the consumers listed.
>>>
>>> I am using Kafka 0.9.0.1
>>>
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink User
>>> Mailing List archive.] <[hidden email]
>>> > wrote:
>>>
 Hello,

 This is my config.

 On kafka props :

 val props = new Properties()

 props.setProperty("zookeeper.connect", zkHosts)
 props.setProperty("bootstrap.servers", kafHosts)
 props.setProperty("group.id", "prod")
 props.setProperty("auto.offset.reset", "earliest")

 Now for zkHosts, beware that your whole quorum has to be included.

 For instance, say you have zk1, zk2 and zk3 forming a quorum.

 Then zkHosts will be ->
 zk1:2181,zk2:2181,zk3:2181/[brokerspath] .

 The brokerspath is only needed if you have set it in the Kafka config. Ignore it
 otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .

 After that -> val source = env.addSource(new
 FlinkKafkaConsumer09[String](KAFKA_TOPIC, new SimpleStringSchema(),
 props))

 Then on kafkamanager -> consumers I have the groupID prod.

 Hope it helps.

 Best Regards,

 Daniel Santos
 On 11/08/2016 08:45 AM, vinay patil wrote:

 Hi Limbo,

 I am using 0.9, I am not able to see updated results even after
 refreshing.
 There is some property that we have to set in order to make this work

 Regards,
 Vinay Patil

 On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing
 List archive.] <[hidden email]
 > wrote:

> I am using kafka 0.8, just refresh the 

An idea for a parallel AllWindowedStream

2016-11-08 Thread Juan Rodríguez Hortalá
Hi,

As a self-training exercise I've defined a class extending WindowedStream,
implementing a proof of concept for a parallel version of
AllWindowedStream:

/**
 * Tries to create a parallel version of an AllWindowedStream for a DataStream
 * by creating a KeyedStream, using as key the hash of the elements modulo
 * a parallelism level.
 *
 * This only makes sense for window assigners that ensure the subwindows will be
 * in sync, like time based window assigners, and it is more stable with ingestion
 * and event time because the window alignment is more reliable.
 * This doesn't work for counting or session window assigners.
 *
 * Also note elements from different partitions might get out of order due
 * to parallelism.
 * */
public static class ParAllWindowedStream<T, W extends Window> extends
        WindowedStream<T, Integer, W> {
    private final transient WindowAssigner<Object, W> windowAssigner;

    public ParAllWindowedStream(DataStream<T> stream, final int parallelism,
            WindowAssigner<Object, W> windowAssigner) {
        super(stream.keyBy(new KeySelector<T, Integer>() {
                  @Override
                  public Integer getKey(T value) throws Exception {
                      return value.hashCode() % parallelism;
                  }
              }),
              windowAssigner);
        this.windowAssigner = windowAssigner;
    }

    @Override
    public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> reduceFun) {
        return super.reduce(reduceFun)  // reduce each subwindow
            .windowAll(windowAssigner)  // synchronize
            .reduce(reduceFun);         // sequential aggregate of the partial results
    }

    // Cannot override apply() because we need an additional reduce function of type R
    // to recombine the result for each window
    // @Override
    public <R> SingleOutputStreamOperator<R> applyPar(ReduceFunction<T> reduceFunction,
            WindowFunction<T, R, Integer, W> function,
            ReduceFunction<R> reduceWindowsFunction) {
        return super.apply(reduceFunction, function)
            .windowAll(windowAssigner)
            .reduce(reduceWindowsFunction);
    }
}
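
A minimal usage sketch (made-up input values; the parallelism of 4 and the 10
second window are just examples, and any time based assigner should work):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.IngestionTime);

DataStream<Long> values = env.fromElements(1L, 2L, 3L, 4L, 5L);

// pre-aggregate on 4 hash-based partitions, then combine the partial results
DataStream<Long> sums = new ParAllWindowedStream<Long, TimeWindow>(
        values, 4, TumblingEventTimeWindows.of(Time.seconds(10)))
    .reduce(new ReduceFunction<Long>() {
        @Override
        public Long reduce(Long a, Long b) {
            return a + b;
        }
    });

sums.print();
env.execute("ParAllWindowedStream PoC"); // inside a main() that throws Exception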

Maybe someone might find this interesting. I have a toy example program in
https://github.com/juanrh/flink-state-eviction/blob/05676ca0eebf83e936b5cc04ecf85e8110ccacf4/src/main/java/com/github/juanrh/streaming/windowAllPoCs/WindowAllTimeKeyedPoC.java
for the curious.

Greetings,

Juan


Re: Why did the Flink Cluster JM crash?

2016-11-08 Thread amir bahmanyari
Ok. There is an OOM exception... but this used to work fine with the same
configuration. There are four nodes: beam1 through beam4. The number of Kafka
partitions, 4096, is greater than the degree of parallelism, 3584.

jobmanager.rpc.address: beam1
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 102400
taskmanager.numberOfTaskSlots: 896
taskmanager.memory.preallocate: false
parallelism.default: 3584

Thanks for your valuable time Till.
AnonymousParDo -> AnonymousParDo (3584/3584) (ebe8da5bda017ee31ad774c5bc5e5e88) switched from DEPLOYING to RUNNING
2016-11-08 22:51:44,471 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read(UnboundedKafkaSource) -> AnonymousParDo -> AnonymousParDo (3573/3584) (ddf5a8939c1fc4ad1e6d71f17fe5ab0b) switched from RUNNING to FAILED
2016-11-08 22:51:44,474 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Read(UnboundedKafkaSource) -> AnonymousParDo -> AnonymousParDo (1/3584) (865c54432153a0230e62bf7610118ff8) switched from RUNNING to CANCELING
2016-11-08 22:51:44,474 INFO  org.apache.flink.runtime.jobmanager.JobManager                - Status of job e61cada683c0f7a709101c26c2c9a17c (benchbeamrunners-abahman-1108225128) changed to FAILING.
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:950)
    at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1587)
    at java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(ScheduledThreadPoolExecutor.java:334)
    at java.util.concurrent.ScheduledThreadPoolExecutor.schedule(ScheduledThreadPoolExecutor.java:533)
    at java.util.concurrent.Executors$DelegatedScheduledExecutorService.schedule(Executors.java:729)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.registerTimer(StreamTask.java:652)
    at org.apache.flink.streaming.api.operators.AbstractStreamOperator.registerTimer(AbstractStreamOperator.java:250)
    at org.apache.flink.streaming.api.operators.StreamingRuntimeContext.registerTimer(StreamingRuntimeContext.java:92)
    at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.setNextWatermarkTimer(UnboundedSourceWrapper.java:381)
    at org.apache.beam.runners.flink.translation.wrappers.streaming.io.UnboundedSourceWrapper.run(UnboundedSourceWrapper.java:233)
    at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:78)
    at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:224)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
    at java.lang.Thread.run(Thread.java:745)


  From: Till Rohrmann 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Tuesday, November 8, 2016 2:11 PM
 Subject: Re: Why did the Flink Cluster JM crash?
   
Hi Amir,
What do the JM logs say?
Cheers,
Till

Re: Why did the Flink Cluster JM crash?

2016-11-08 Thread amir bahmanyari
Oops! Sorry Till. I replicated it and I see exceptions in the JM logs. How can I
send the logs to you? Or which "interesting" part of it do you need, so I can
copy/paste it here... Thanks


  From: Till Rohrmann 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Tuesday, November 8, 2016 2:11 PM
 Subject: Re: Why did the Flink Cluster JM crash?
   
Hi Amir,
What do the JM logs say?
Cheers,
Till

Re: Kinesis Connector Dependency Problems

2016-11-08 Thread Fabian Hueske
Thanks for checking Steffen and Craig!
If the master does not build with 3.0.3, we should update the docs.

2016-11-08 23:38 GMT+01:00 Foster, Craig :

> Yes, with Maven 3.0.5-based jar I’m seeing the same error. I cannot seem
> to get the runtime to build with 3.0.3.
>
> On 11/8/16, 7:18 AM, "Steffen Hausmann" 
> wrote:
>
> Hi Fabian,
>
> I can confirm that the behaviour is reproducible with both Maven
> 3.3.9 and Maven 3.0.5.
>
> Cheers,
> Steffen
>
> Am 8. November 2016 11:11:19 MEZ, schrieb Fabian Hueske <
> fhue...@gmail.com>:
> >Hi,
> >
> >I encountered this issue before as well.
> >
> >Which Maven version are you using?
> >Maven 3.3.x does not properly shade dependencies.
> >You have to use Maven 3.0.3 (see [1]).
> >
> >Best, Fabian
> >
> >[1]
> >https://ci.apache.org/projects/flink/flink-docs-
> release-1.1/setup/building.html
> >
> >2016-11-08 11:05 GMT+01:00 Till Rohrmann :
> >
> >> Yes this definitely looks like a similar issue. Once we shade the
> aws
> >> dependencies in the Kinesis connector, the problem should be
> >(hopefully)
> >> resolved. I've added your problem description to the JIRA. Thanks
> for
> >> reporting it.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Mon, Nov 7, 2016 at 8:01 PM, Foster, Craig 
> >wrote:
> >>
> >>> I think this is a similar issue but it was brought to my attention
> >that
> >>> we’re also seeing this on EMR 5.1.0 with the FlinkKinesisConsumer.
> >What I
> >>> did to duplicate this issue:
> >>>
> >>> 1)   I have used the Wikiedit quickstart but used Kinesis
> >instead of
> >>> Kafka to publish results with a FlinkKinesisProducer. This works
> >fine. I
> >>> can use a separate script to read what was published to my stream.
> >>>
> >>> 2)   When using a FlinkKinesisConsumer, however, I get an
> error:
> >>>
> >>>
> >>>
> >>> java.lang.NoSuchMethodError: org.apache.http.params.HttpCon
> >>> nectionParams.setSoKeepalive(Lorg/apache/http/params/
> HttpParams;Z)V
> >>>
> >>> at com.amazonaws.http.HttpClientF
> >>> actory.createHttpClient(HttpClientFactory.java:96)
> >>>
> >>> at com.amazonaws.http.AmazonHttpC
> >>> lient.(AmazonHttpClient.java:187)
> >>>
> >>> at com.amazonaws.AmazonWebService
> >>> Client.(AmazonWebServiceClient.java:136)
> >>>
> >>> at com.amazonaws.services.kinesis
> >>> .AmazonKinesisClient.(AmazonKinesisClient.java:221)
> >>>
> >>> at com.amazonaws.services.kinesis
> >>> .AmazonKinesisClient.(AmazonKinesisClient.java:197)
> >>>
> >>> at org.apache.flink.streaming.con
> >>> nectors.kinesis.util.AWSUtil.createKinesisClient(AWSUtil.java:56)
> >>>
> >>> at org.apache.flink.streaming.con
> >>> nectors.kinesis.proxy.KinesisProxy.(KinesisProxy.java:118)
> >>>
> >>> at org.apache.flink.streaming.con
> >>> nectors.kinesis.proxy.KinesisProxy.create(KinesisProxy.java:176)
> >>>
> >>> at org.apache.flink.streaming.con
> >>> nectors.kinesis.internals.KinesisDataFetcher.(KinesisD
> >>> ataFetcher.java:188)
> >>>
> >>> at org.apache.flink.streaming.con
> >>>
> >nectors.kinesis.FlinkKinesisConsumer.run(
> FlinkKinesisConsumer.java:198)
> >>>
> >>> at org.apache.flink.streaming.api
> >>> .operators.StreamSource.run(StreamSource.java:80)
> >>>
> >>> at org.apache.flink.streaming.api
> >>> .operators.StreamSource.run(StreamSource.java:53)
> >>>
> >>> at org.apache.flink.streaming.run
> >>> time.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
> >>>
> >>> at org.apache.flink.streaming.run
> >>> time.tasks.StreamTask.invoke(StreamTask.java:266)
> >>>
> >>> at org.apache.flink.runtime.taskm
> >>> anager.Task.run(Task.java:585)
> >>>
> >>> at java.lang.Thread.run(Thread.java:745)
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> *From: *Robert Metzger 
> >>> *Reply-To: *"user@flink.apache.org" 
> >>> *Date: *Friday, November 4, 2016 at 2:57 AM
> >>> *To: *"user@flink.apache.org" 
> >>> *Subject: *Re: Kinesis Connector Dependency Problems
> >>>
> >>>
> >>>
> >>> Thank you for helping to investigate the issue. I've filed an issue
> >in
> >>> our bugtracker: https://issues.apache.org/jira/browse/FLINK-5013
> >>>
> >>>
> >>>
> >>> On Wed, Nov 2, 2016 at 10:09 

Re: Why did the Flink Cluster JM crash?

2016-11-08 Thread amir bahmanyari
Clean. No errors... no exceptions :-( Thanks Till.

  From: Till Rohrmann 
 To: user@flink.apache.org; amir bahmanyari  
 Sent: Tuesday, November 8, 2016 2:11 PM
 Subject: Re: Why did the Flink Cluster JM crash?
   
Hi Amir,
What do the JM logs say?
Cheers,
Till

Re: Why did the Flink Cluster JM crash?

2016-11-08 Thread Till Rohrmann
Hi Amir,

What do the JM logs say?

Cheers,
Till

On Tue, Nov 8, 2016 at 9:33 PM, amir bahmanyari  wrote:

> Hi colleagues,
> I started the cluster all fine. Started the Beam app running in the Flink
> Cluster fine.
> Dashboard showed all tasks being consumed and open for business.
> I started sending data to the Beam app, and all of the sudden the Flink JM
> crashed.
> Exceptions below.
> Thanks+regards
> Amir
>
> java.lang.RuntimeException: Pipeline execution failed
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.
> java:113)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.
> java:48)
> at org.apache.beam.sdk.Pipeline.run(Pipeline.java:183)
> at 
> benchmark.flinkspark.flink.BenchBeamRunners.main(BenchBeamRunners.java:622)
>  //p.run();
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.flink.client.program.PackagedProgram.callMainMethod(
> PackagedProgram.java:505)
> at org.apache.flink.client.program.PackagedProgram.
> invokeInteractiveModeForExecution(PackagedProgram.java:403)
> at org.apache.flink.client.program.Client.runBlocking(
> Client.java:248)
> at org.apache.flink.client.CliFrontend.executeProgramBlocking(
> CliFrontend.java:866)
> at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
> at org.apache.flink.client.CliFrontend.parseParameters(
> CliFrontend.java:1189)
> at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239)
> Caused by: org.apache.flink.client.program.ProgramInvocationException:
> The program execution failed: Communication with JobManager failed: Lost
> connection to the JobManager.
> at org.apache.flink.client.program.Client.runBlocking(
> Client.java:381)
> at org.apache.flink.client.program.Client.runBlocking(
> Client.java:355)
> at org.apache.flink.streaming.api.environment.
> StreamContextEnvironment.execute(StreamContextEnvironment.java:65)
> at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironm
> ent.executePipeline(FlinkPipelineExecutionEnvironment.java:118)
> at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.
> java:110)
> ... 14 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException:
> Communication with JobManager failed: Lost connection to the JobManager.
> at org.apache.flink.runtime.client.JobClient.
> submitJobAndWait(JobClient.java:140)
> at org.apache.flink.client.program.Client.runBlocking(
> Client.java:379)
> ... 18 more
> Caused by: 
> org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException:
> Lost connection to the JobManager.
> at org.apache.flink.runtime.client.JobClientActor.
> handleMessage(JobClientActor.java:244)
> at org.apache.flink.runtime.akka.FlinkUntypedActor.
> handleLeaderSessionID(FlinkUntypedActor.java:88)
> at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(
> FlinkUntypedActor.java:68)
> at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(
> UntypedActor.scala:167)
> at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
> at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(
> ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> pollAndExecAll(ForkJoinPool.java:1253)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> runTask(ForkJoinPool.java:1346)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
> ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
> ForkJoinWorkerThread.java:107)
>


Too few memory segments provided. Hash Table needs at least 33 memory segments.

2016-11-08 Thread Miguel Coimbra
Dear community,

I have a problem which I hope you'll be able to help with.
I apologize in advance for the verbosity of the post.
I am running the Flink standalone cluster (not even storing to the
filesystem) with 2 Docker containers.

I set the Dockerfile's image to Flink 1.1.2, which is the same version the
main class in the .jar was built against.
The Docker image was configured to use Java 8, which is what the project's
pom.xml requires as well.
I have also edited the TaskManager's conf/flink-conf.yaml to have the
following values:


taskmanager.heap.mb: 7512

taskmanager.network.numberOfBuffers: 16048



Properties of this host/docker setup:
- host machine has *256 GB *of RAM
- job manager container is running with default flink config
- task manager has *7.5 GB *of memory available
- task manager number of buffers is *16048 *which is very generous compared
to the default value

I am testing on the SNAP DBLP dataset:
https://snap.stanford.edu/data/com-DBLP.html
It has:

 317080 nodes
1049866 edges

These are the relevant parts of the pom.xml of the project:
*(note: the project executes without error for local executions without the
cluster)*



<properties>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  <maven.compiler.source>1.8</maven.compiler.source>
  <maven.compiler.target>1.8</maven.compiler.target>
  <flink.version>1.1.2</flink.version>
</properties>
.
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-core</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-gelly_2.10</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
  </dependency>
</dependencies>

  

I am running (what I believe to be) a simple Gelly application, performing
the ConnectedComponents algorithm with 30 iterations:

public static void main(String[] args) {
    final ExecutionEnvironment env =
            ExecutionEnvironment.getExecutionEnvironment();

    final String dataPath = args[0];

    final DataSet<Tuple2<Long, Long>> edgeTuples =
            env.readCsvFile(dataPath)
            .fieldDelimiter("\t") // node IDs are separated by tabs
            .ignoreComments("#")  // comments start with "#"
            .types(Long.class, Long.class);

    try {
        System.out.println("Tuple size: " + edgeTuples.count());
    } catch (Exception e1) {
        e1.printStackTrace();
    }

    /*
     * @param <K> the key type for edge and vertex identifiers
     * @param <VV> the value type for vertices
     * @param <EV> the value type for edges
     * public class Graph<K, VV, EV>
     */

    final Graph<Long, Long, NullValue> graph = Graph.fromTuple2DataSet(
            edgeTuples,
            new MapFunction<Long, Long>() {
                private static final long serialVersionUID =
                        8713516577419451509L;
                public Long map(Long value) {
                    return value;
                }
            },
            env
    );

    try {
        /**
         * @param <K> key type
         * @param <VV> vertex value type
         * @param <EV> edge value type
         * @param <T> the return type
         *
         * class ConnectedComponents
         * implements GraphAlgorithm<K, VV, EV, DataSet<Vertex<K, VV>>>
         */

        DataSet<Vertex<Long, Long>> verticesWithComponents =
                graph.run(new ConnectedComponents<>(30));
        System.out.println("Component count: " +
                verticesWithComponents.count());
    } catch (Exception e) {
        e.printStackTrace();
    }
}


However, the following is output on the host machine on execution:

docker exec -it $(docker ps --filter name=jobmanager --format={{.ID}})
flink run -m 3de7625b8e28:6123 -c flink.graph.example.App
/home/myuser/flink-graph-example-0.0.1-SNAPSHOT.jar
/home/myuser/com-dblp.ungraph.txt

Cluster configuration: Standalone cluster with JobManager at /
172.19.0.2:6123
Using address 172.19.0.2:6123 to connect to JobManager.
JobManager web interface address http://172.19.0.2:8081
Starting execution of program
Submitting job with JobID: fd6a12896b749e9ed439bbb196c6aaae. Waiting for
job completion.
Connected to JobManager at Actor[akka.tcp://
flink@172.19.0.2:6123/user/jobmanager#-658812967]

11/08/2016 21:22:44 DataSource (at main(App.java:25)
(org.apache.flink.api.java.io.TupleCsvInputFormat))(1/1) switched to
SCHEDULED
11/08/2016 21:22:44 DataSource (at main(App.java:25)
(org.apache.flink.api.java.io.TupleCsvInputFormat))(1/1) switched to
DEPLOYING
11/08/2016 21:22:44 DataSource (at main(App.java:25)
(org.apache.flink.api.java.io.TupleCsvInputFormat))(1/1) switched to RUNNING
11/08/2016 21:22:44 DataSink (count())(1/1) switched to SCHEDULED
11/08/2016 21:22:44 DataSink (count())(1/1) switched to DEPLOYING
11/08/2016 21:22:44 DataSink (count())(1/1) switched to RUNNING
11/08/2016 21:22:44 DataSink (count())(1/1) 

Why did the Flink Cluster JM crash?

2016-11-08 Thread amir bahmanyari
Hi colleagues,
I started the cluster all fine. Started the Beam app running in the Flink
Cluster fine. Dashboard showed all tasks being consumed and open for business.
I started sending data to the Beam app, and all of a sudden the Flink JM crashed.
Exceptions below.
Thanks+regards
Amir

java.lang.RuntimeException: Pipeline execution failed
    at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:113)
    at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:48)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:183)
    at benchmark.flinkspark.flink.BenchBeamRunners.main(BenchBeamRunners.java:622)  //p.run();
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:505)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:403)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:248)
    at org.apache.flink.client.CliFrontend.executeProgramBlocking(CliFrontend.java:866)
    at org.apache.flink.client.CliFrontend.run(CliFrontend.java:333)
    at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1189)
    at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1239)
Caused by: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Communication with JobManager failed: Lost connection to the JobManager.
    at org.apache.flink.client.program.Client.runBlocking(Client.java:381)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:355)
    at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:65)
    at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:118)
    at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:110)
    ... 14 more
Caused by: org.apache.flink.runtime.client.JobExecutionException: Communication with JobManager failed: Lost connection to the JobManager.
    at org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:140)
    at org.apache.flink.client.program.Client.runBlocking(Client.java:379)
    ... 18 more
Caused by: org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: Lost connection to the JobManager.
    at org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:244)
    at org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:88)
    at org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
    at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Re: Kafka Monitoring

2016-11-08 Thread Daniel Santos

Hello,

On flink do you have the checkpoint enabled ?

env.enableCheckpointing(interval = CHKPOINT_INTERVAL)

Regards,

Daniel Santos


On 11/08/2016 12:30 PM, vinay patil wrote:

Yes Kafka and Flink connect to that zookeeper only.

Not sure why it is not listing the consumer

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 5:36 PM, Daniel Santos [via Apache Flink User 
Mailing List archive.] <[hidden email] 
> wrote:


Hi,

brokerPath is just optional.

Used if you want to have multiple Kafka clusters.

Each kafka cluster would connect to the same brokerPath.

Since I have multiple clusters I use the brokerPath.

From the looks of it you don't, so never mind; it doesn't matter.

You only have one ZooKeeper, correct?

And Kafka and Flink connect to that one ZooKeeper only?

Best Regards,

Daniel Santos


On 11/08/2016 11:18 AM, vinay patil wrote:

Hi Daniel,

Yes I have specified the zookeeper host in server.properties file
, so the broker is connected to zookeeper.

https://kafka.apache.org/documentation#brokerconfigs
 ->
according to this link, I guess all these configs are done in
server.prop , so from where did you get kafka09 as brokerPath ?

this is my entry in server.prop file ->
zookeeper.connect=localhost:2181
Have you set this as zkhost:2181/kafka09 ?


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 4:27 PM, Daniel Santos [via Apache Flink
User Mailing List archive.] <[hidden email]
> wrote:

Hi,

Your kafka broker is connected to zookeeper I believe.

I am using kafka 0.9.0.1 my self too.

On kafka broker 0.9.0.1 I have configured the zookeeper
connect to a path, for instances :

zk1:2181,zk2:2181,zk3:2181/kafka09

https://kafka.apache.org/documentation#brokerconfigs


Now on the flink side I would configure
"props.setProperty("zookeeper.connect", zkHosts)" the same
resulting in :


props.setProperty("zookeeperconnect",
"zk1:2181,zk2:2181,zk3:2181/kafka09")


That is what I mean by broker's path.

Best Regards,

Daniel Santos


On 11/08/2016 10:49 AM, vinay patil wrote:

Hi Daniel,

I have the same properties set for the consumer and the same
code

/brokerspath only needed if you have set it on kafka config/
-> I did not get this, do you mean to check the brokerspath
in conf/server.properties file ? I have even tried by
setting the offset.storage property to zookeeper, but still not
getting the consumers listed

I am using Kafka 0.9.0.1


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache
Flink User Mailing List archive.] <[hidden email]
> wrote:

Hello,

This is my config.

On kafka props :

val props = new Properties()

props.setProperty("zookeeper.connect", zkHosts)
props.setProperty("bootstrap.servers", kafHosts)
props.setProperty("group.id ", "prod")
props.setProperty("auto.offset.reset", "earliest")

Now for zkHosts, beware that your whole quorum has to be
included.

For instance, say you have zk1, zk2 and zk3 forming a quorum.

Then zkHosts will be ->
zk1:2181,zk2:2181,zk3:2181/[brokerspath] .

The brokerspath is only needed if you have set it in the Kafka
config. Ignore it otherwise, resulting in
"zk1:2181,zk2:2181,zk3:2181" .

After that -> val source = env.addSource(new
FlinkKafkaConsumer09[String](KAFKA_TOPIC, new
SimpleStringSchema(), props))

Then on kafkamanager -> consumers I have the groupID prod.

Hope it helps.

Best Regards,

Daniel Santos

On 11/08/2016 08:45 AM, vinay patil wrote:

Hi Limbo,

I am using 0.9, I am not able to see updated results
even after refreshing.
There is some property that we have to set in order to
make this work

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache
Flink User Mailing List archive.] <[hidden email]
>
wrote:

I am using kafka 0.8, just refresh the page and you
will see the updated results, it’s not auto update.

There is our logstash consumer:



and this is our flink consumer:


I 

Re: Last event of each window belongs to the next window - Wrong

2016-11-08 Thread Samir Abdou
Hi Aljoscha,

Thanks for the question.

I key by source ID because I want to isolate users per source. If I keyed
by user ID, I would need logic to create sessions based on time. But I would
like to create my sessions based on user ID changes in the event stream for
each source.

Cheers,
Samir

2016-11-07 18:04 GMT+01:00 Aljoscha Krettek :

> Hi,
> why are you keying by the source ID and not by the user ID?
>
> Cheers,
> Aljoscha
>
> On Mon, 7 Nov 2016 at 15:42 Till Rohrmann  wrote:
>
>> Hi Samir,
>>
>> the windowing API in Flink works the following way: First an incoming
>> element is assigned to a window. This is defined in the window clause where
>> you create a GlobalWindow. Thus, all elements for the same sourceId will be
>> assigned to the same window. Next, the element is given to a Trigger which
>> decides whether the window shall be evaluated or not. But at this point the
>> element is already part of the window. That's why the last element of your
>> window has a different ID.
>>
>> What you could try to use is the MergingWindowAssigner to create windows
>> whose elements all have the same ID. There you assign all elements with the
>> same ID to the same session window. The session windows are then triggered
>> by event time, for example. That's the recommended way to create session
>> windows with Flink. Here is some documentation for session windows [1].
>>
>> [1] https://ci.apache.org/projects/flink/flink-docs-
>> master/dev/windows.html#session-windows
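>>
>> A rough sketch of that route (the gap length and the key field are placeholders,
>> and the window function would then see TimeWindows instead of GlobalWindows):
>>
>> stream.flatMap(new LogsParser())
>>     .assignTimestampsAndWatermarks(new MessageTimestampExtractor())
>>     .keyBy("user")
>>     .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
>>     .apply(new SessionWindowFunction()) // now parameterized on TimeWindow
>>     .print();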
>>
>> Cheers,
>> Till
>>
>> On Sun, Nov 6, 2016 at 12:11 PM, Samir Abdou <
>> abdou.samir.mail...@gmail.com> wrote:
>>
>> I am using Flink 1.2-Snapshot. My data looks like the following:
>>
>>- id=25398102, sourceId=1, ts=2016-10-15 00:00:56, user=14, value=919
>>- id=25398185, sourceId=1, ts=2016-10-15 00:01:06, user=14, value=920
>>- id=25398210, sourceId=1, ts=2016-10-15 00:01:16, user=14, value=944
>>- id=25398235, sourceId=1, ts=2016-10-15 00:01:24, user=3149,
>>value=944
>>- id=25398236, sourceId=1, ts=2016-10-15 00:01:25, user=71, value=955
>>- id=25398239, sourceId=1, ts=2016-10-15 00:01:26, user=71, value=955
>>- id=25398265, sourceId=1, ts=2016-10-15 00:01:36, user=71, value=955
>>- id=25398310, sourceId=1, ts=2016-10-15 00:02:16, user=14, value=960
>>- id=25398320, sourceId=1, ts=2016-10-15 00:02:26, user=14, value=1000
>>
>> I am running the following code to create windows based user IDs:
>>
>> stream.flatMap(new LogsParser())
>> .assignTimestampsAndWatermarks(new MessageTimestampExtractor())
>> .keyBy("sourceId")
>> .window(GlobalWindows.create())
>> .trigger(PurgingTrigger.of(new MySessionTrigger()))
>> .apply(new SessionWindowFunction())
>> .print();
>>
>> MySession trigger looks into the received event and check the user ID to
>> trigger the window on user ID changes. The SessionWindowFunction just
>> create a session out of the window.
>>
>> Here are the sessions created:
>>
>>1.
>>
>>Session:
>>- id=25398102, sourceId=1, ts=2016-10-15 00:00:56, user=14, value=919
>>   - id=25398185, sourceId=1, ts=2016-10-15 00:01:06, user=14,
>>   value=920
>>   - id=25398210, sourceId=1, ts=2016-10-15 00:01:16, user=14,
>>   value=944
>>   - id=25398235, sourceId=1, ts=2016-10-15 00:01:24, user=3149,
>>   value=944
>>2.
>>
>>Session:
>>- id=25398236, sourceId=1, ts=2016-10-15 00:01:25, user=71, value=955
>>   - id=25398239, sourceId=1, ts=2016-10-15 00:01:26, user=71,
>>   value=955
>>   - id=25398265, sourceId=1, ts=2016-10-15 00:01:36, user=71,
>>   value=955
>>   - id=25398310, sourceId=1, ts=2016-10-15 00:02:16, user=14,
>>   value=960
>>3.
>>
>>Session:
>>- id=25398320, sourceId=1, ts=2016-10-15 00:02:26, user=14, value=1000
>>
>> The problem as you can see is that in every session the last event
>> belongs actually to the next window. The decision to trigger the window is
>> somehow late as the last event is already in the window.
>>
>> How can I trigger the window without considering the last event in that
>> window?
>>
>> Thanks for your help.
>>
>>
>>


Re: Last event of each window belongs to the next window - Wrong

2016-11-08 Thread Samir Abdou
Hi Till,

Thanks for your answer and the hint.

However, the trigger must be based on user ID changes, not on time. I tried
this approach too, but I ended up having some events with the same user ID
that belong to the next window. I finally solved the problem by
implementing a custom WindowFunction that pushes the last event of the
window to the beginning of the next window. I think a proper solution
would be to implement a custom WindowAssigner and a Trigger that just emits
the windows.

Cheers,
Samir

2016-11-07 15:42 GMT+01:00 Till Rohrmann :

> Hi Samir,
>
> the windowing API in Flink works the following way: First an incoming
> element is assigned to a window. This is defined in the window clause where
> you create a GlobalWindow. Thus, all elements for the same sourceId will be
> assigned to the same window. Next, the element is given to a Trigger which
> decides whether the window shall be evaluated or not. But at this point the
> element is already part of the window. That's why the last element of your
> window has a different ID.
>
> What you could try to use is the MergingWindowAssigner to create windows
> whose elements all have the same ID. There you assign all elements with the
> same ID to the same session window. The session windows are then triggered
> by event time, for example. That's the recommended way to create session
> windows with Flink. Here is some documentation for session windows [1].
>
> [1] https://ci.apache.org/projects/flink/flink-docs-
> master/dev/windows.html#session-windows
>
> Cheers,
> Till
>
> On Sun, Nov 6, 2016 at 12:11 PM, Samir Abdou <
> abdou.samir.mail...@gmail.com> wrote:
>
>> I am using Flink 1.2-Snapshot. My data looks like the following:
>>
>>- id=25398102, sourceId=1, ts=2016-10-15 00:00:56, user=14, value=919
>>- id=25398185, sourceId=1, ts=2016-10-15 00:01:06, user=14, value=920
>>- id=25398210, sourceId=1, ts=2016-10-15 00:01:16, user=14, value=944
>>- id=25398235, sourceId=1, ts=2016-10-15 00:01:24, user=3149,
>>value=944
>>- id=25398236, sourceId=1, ts=2016-10-15 00:01:25, user=71, value=955
>>- id=25398239, sourceId=1, ts=2016-10-15 00:01:26, user=71, value=955
>>- id=25398265, sourceId=1, ts=2016-10-15 00:01:36, user=71, value=955
>>- id=25398310, sourceId=1, ts=2016-10-15 00:02:16, user=14, value=960
>>- id=25398320, sourceId=1, ts=2016-10-15 00:02:26, user=14, value=1000
>>
>> I am running the following code to create windows based user IDs:
>>
>> stream.flatMap(new LogsParser())
>> .assignTimestampsAndWatermarks(new MessageTimestampExtractor())
>> .keyBy("sourceId")
>> .window(GlobalWindows.create())
>> .trigger(PurgingTrigger.of(new MySessionTrigger()))
>> .apply(new SessionWindowFunction())
>> .print();
>>
>> MySession trigger looks into the received event and check the user ID to
>> trigger the window on user ID changes. The SessionWindowFunction just
>> create a session out of the window.
>>
>> Here are the sessions created:
>>
>>1.
>>
>>Session:
>>- id=25398102, sourceId=1, ts=2016-10-15 00:00:56, user=14, value=919
>>   - id=25398185, sourceId=1, ts=2016-10-15 00:01:06, user=14,
>>   value=920
>>   - id=25398210, sourceId=1, ts=2016-10-15 00:01:16, user=14,
>>   value=944
>>   - id=25398235, sourceId=1, ts=2016-10-15 00:01:24, user=3149,
>>   value=944
>>2.
>>
>>Session:
>>- id=25398236, sourceId=1, ts=2016-10-15 00:01:25, user=71, value=955
>>   - id=25398239, sourceId=1, ts=2016-10-15 00:01:26, user=71,
>>   value=955
>>   - id=25398265, sourceId=1, ts=2016-10-15 00:01:36, user=71,
>>   value=955
>>   - id=25398310, sourceId=1, ts=2016-10-15 00:02:16, user=14,
>>   value=960
>>3.
>>
>>Session:
>>- id=25398320, sourceId=1, ts=2016-10-15 00:02:26, user=14, value=1000
>>
>> The problem as you can see is that in every session the last event
>> belongs actually to the next window. The decision to trigger the window is
>> somehow late as the last event is already in the window.
>>
>> How can I trigger the window without considering the last event in that
>> window?
>>
>> Thanks for your help.
>>
>
>


Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Till Rohrmann
Flink does not support per-key watermarks or type-sensitive watermarks. The
underlying assumption is that you have a global watermark which defines the
progress with respect to event time in your topology.

The easiest way would be to have an input which has a monotonically
increasing timestamp. Alternatively you can define the maximum lag between
the watermark and the timestamp and then generate watermarks with w =
timestamp - maxLag. That way you allow elements to be out of order for a
certain amount of event time.
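
A rough sketch of the second option (the event type MyEvent and its timestamp
accessor are placeholders for your JSON events):

public class MaxLagWatermarkAssigner implements AssignerWithPeriodicWatermarks<MyEvent> {

    private static final long MAX_LAG_MS = 10_000L; // maximum allowed out-of-orderness
    private long maxSeenTimestamp = Long.MIN_VALUE;

    @Override
    public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
        long ts = element.getTimestamp();
        maxSeenTimestamp = Math.max(maxSeenTimestamp, ts);
        return ts;
    }

    @Override
    public Watermark getCurrentWatermark() {
        // trail the highest timestamp seen so far by the maximum allowed lag
        return new Watermark(maxSeenTimestamp == Long.MIN_VALUE
                ? Long.MIN_VALUE : maxSeenTimestamp - MAX_LAG_MS);
    }
}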

Cheers,
Till

On Tue, Nov 8, 2016 at 5:02 PM, Sendoh  wrote:

> Thank you for confirming.
>
> What do you think would be an efficient way of not having a global watermark?
> The following logic fails to build a watermark per KeyedStream:
> jsonStreams.keyBy(new JsonKeySelector()).assignTimestampsAndWatermarks(new
> JsonWatermark()).keyBy(JsonKeySelector()).window(
>
> So would the solution be to use split(), or to implement an event-type-aware
> AssignerWithPeriodicWatermarks along with a custom EventTimeTrigger?
>
> Best,
>
> Sendoh
>
>
>
>


Re: Window PURGE Behaviour 1.0.2 vs 1.1.3

2016-11-08 Thread Konstantin Knauf
Hi Aljoscha,

interesting, this explains it. Well, in our case the PURGE in the
onProcessingTimeTimer is only used to clear KeyValueStates*, and at this
point there are usually no records in the window state.

Any Ideas?

I do have a workaround with an evictor, but it seemed to be
unnecessarily complicated.

*We can not use clear()-callback for that, since this state should
survive the FIRE_AND_PURGEs in the onElement()-calls.
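
For illustration, a stripped-down sketch of the kind of trigger I mean (the real
firing condition and the KeyValueState handling are omitted, shouldEmit() is a
placeholder, and clear() is left as the default no-op):

public class FireAndPurgeTrigger<T, W extends Window> extends Trigger<T, W> {

    private final long cleanupDelayMs;

    public FireAndPurgeTrigger(long cleanupDelayMs) {
        this.cleanupDelayMs = cleanupDelayMs;
    }

    @Override
    public TriggerResult onElement(T element, long timestamp, W window, TriggerContext ctx) throws Exception {
        // purge the window contents some time after the last element was seen
        ctx.registerProcessingTimeTimer(System.currentTimeMillis() + cleanupDelayMs);
        return shouldEmit(element) ? TriggerResult.FIRE_AND_PURGE : TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, W window, TriggerContext ctx) throws Exception {
        // since 1.1 this timer is silently ignored if the window state was already purged
        return TriggerResult.PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, W window, TriggerContext ctx) throws Exception {
        return TriggerResult.CONTINUE;
    }

    private boolean shouldEmit(T element) {
        return true; // placeholder for the real per-element condition
    }
}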

Cheers,

Konstantin


On 08.11.2016 18:31, Aljoscha Krettek wrote:
> Hi,
> the timers are not actually deleted but the WindowOperator will check
> whether there is any window state associated with the window for which
> the timer fires. If there is no window state the timer will silently be
> ignored.
> 
> Is this a problem for you or did you just want to clarify? If yes, then
> we should work on finding a solution.
> 
> Cheers,
> Aljoscha
> 
> On Tue, 8 Nov 2016 at 18:18 Konstantin Knauf
> > wrote:
> 
> Hi everyone,
> 
> I just migrated a streaming Job from 1.0.2 to 1.1.3 and stumbled across
> a problem concerning one of our custom triggers.
> 
> The trigger basically FIRE_AND_PURGEs multiple times in onElement() and
> the window is PURGEd on the processing time timer, but it seems that all
> registered processing time timers are deleted every time the window is
> PURGEd.
> 
> clear() is the default implementation, i.e. no-op.
> 
> Just wanted to check if this is the expected behavior (processing time timers
> being deleted on PURGE or FIRE_AND_PURGE) from Flink 1.1 on?
> 
> Cheers,
> 
> Konstantin
> 
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com
>  * +49-174-3413182
> 
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
> 
> 

-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082





Window PURGE Behaviour 1.0.2 vs 1.1.3

2016-11-08 Thread Konstantin Knauf
Hi everyone,

I just migrated a streaming Job from 1.0.2 to 1.1.3 and stumbled across
a problem concerning one of our custom triggers.

The trigger basically FIRE_AND_PURGEs multiple times in onElement() and
the window is PURGEd on the processing time timer, but it seems that all
registered processing time timers are deleted every time the window is
PURGEd.

clear() is the default implementation, i.e. no-op.

Just wanted to check if this is the expected behavior (processing time timers
being deleted on PURGE or FIRE_AND_PURGE) from Flink 1.1 on?

Cheers,

Konstantin

-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082






Custom Window Assigner With Lateness

2016-11-08 Thread Seth Wiesman
Is it possible in a custom window assigner to determine if an object has 
appeared after the watermark has passed? I want to have a standard event time 
tumbling window but custom logic for late data. From what I can tell there is 
no way from within the WindowAssigner interface to determine if an element 
arrived after the watermark. Is this currently possible to do in flink?

Thank you,

Seth Wiesman


Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Thank you for confirming.

What do you think would be an efficient way of not having a global watermark?
The following logic fails to build a watermark per KeyedStream:
jsonStreams.keyBy(new JsonKeySelector()).assignTimestampsAndWatermarks(new
JsonWatermark()).keyBy(JsonKeySelector()).window(

So would the solution be to use split(), or to implement an event-type-aware
AssignerWithPeriodicWatermarks along with a custom EventTimeTrigger?
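
For the split() option, I have something like this sketch in mind (JsonEvent,
getType() and the per-type watermark assigner are placeholders):

SplitStream<JsonEvent> byType = jsonStreams.split(new OutputSelector<JsonEvent>() {
    @Override
    public Iterable<String> select(JsonEvent value) {
        // route each event to an output named after its type
        return Collections.singletonList(value.getType());
    }
});

DataStream<JsonEvent> orders = byType.select("order")
        .assignTimestampsAndWatermarks(new OrderWatermarkAssigner());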

Best,

Sendoh





Re: Kinesis Connector Dependency Problems

2016-11-08 Thread Steffen Hausmann
Hi Fabian,

I can confirm that the behaviour is reproducible with both Maven 3.3.9 and 
Maven 3.0.5.

Cheers,
Steffen

Am 8. November 2016 11:11:19 MEZ, schrieb Fabian Hueske :
>Hi,
>
>I encountered this issue before as well.
>
>Which Maven version are you using?
>Maven 3.3.x does not properly shade dependencies.
>You have to use Maven 3.0.3 (see [1]).
>
>Best, Fabian
>
>[1]
>https://ci.apache.org/projects/flink/flink-docs-release-1.1/setup/building.html
>
>2016-11-08 11:05 GMT+01:00 Till Rohrmann :
>
>> Yes this definitely looks like a similar issue. Once we shade the aws
>> dependencies in the Kinesis connector, the problem should be
>(hopefully)
>> resolved. I've added your problem description to the JIRA. Thanks for
>> reporting it.
>>
>> Cheers,
>> Till
>>
>> On Mon, Nov 7, 2016 at 8:01 PM, Foster, Craig 
>wrote:
>>
>>> I think this is a similar issue but it was brought to my attention
>that
>>> we’re also seeing this on EMR 5.1.0 with the FlinkKinesisConsumer.
>What I
>>> did to duplicate this issue:
>>>
>>> 1)   I have used the Wikiedit quickstart but used Kinesis
>instead of
>>> Kafka to publish results with a FlinkKinesisProducer. This works
>fine. I
>>> can use a separate script to read what was published to my stream.
>>>
>>> 2)   When using a FlinkKinesisConsumer, however, I get an error:
>>>
>>>
>>>
>>> java.lang.NoSuchMethodError: org.apache.http.params.HttpCon
>>> nectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
>>>
>>> at com.amazonaws.http.HttpClientF
>>> actory.createHttpClient(HttpClientFactory.java:96)
>>>
>>> at com.amazonaws.http.AmazonHttpC
>>> lient.(AmazonHttpClient.java:187)
>>>
>>> at com.amazonaws.AmazonWebService
>>> Client.(AmazonWebServiceClient.java:136)
>>>
>>> at com.amazonaws.services.kinesis
>>> .AmazonKinesisClient.(AmazonKinesisClient.java:221)
>>>
>>> at com.amazonaws.services.kinesis
>>> .AmazonKinesisClient.(AmazonKinesisClient.java:197)
>>>
>>> at org.apache.flink.streaming.con
>>> nectors.kinesis.util.AWSUtil.createKinesisClient(AWSUtil.java:56)
>>>
>>> at org.apache.flink.streaming.con
>>> nectors.kinesis.proxy.KinesisProxy.(KinesisProxy.java:118)
>>>
>>> at org.apache.flink.streaming.con
>>> nectors.kinesis.proxy.KinesisProxy.create(KinesisProxy.java:176)
>>>
>>> at org.apache.flink.streaming.con
>>> nectors.kinesis.internals.KinesisDataFetcher.(KinesisD
>>> ataFetcher.java:188)
>>>
>>> at org.apache.flink.streaming.con
>>>
>nectors.kinesis.FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:198)
>>>
>>> at org.apache.flink.streaming.api
>>> .operators.StreamSource.run(StreamSource.java:80)
>>>
>>> at org.apache.flink.streaming.api
>>> .operators.StreamSource.run(StreamSource.java:53)
>>>
>>> at org.apache.flink.streaming.run
>>> time.tasks.SourceStreamTask.run(SourceStreamTask.java:56)
>>>
>>> at org.apache.flink.streaming.run
>>> time.tasks.StreamTask.invoke(StreamTask.java:266)
>>>
>>> at org.apache.flink.runtime.taskm
>>> anager.Task.run(Task.java:585)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From: *Robert Metzger 
>>> *Reply-To: *"user@flink.apache.org" 
>>> *Date: *Friday, November 4, 2016 at 2:57 AM
>>> *To: *"user@flink.apache.org" 
>>> *Subject: *Re: Kinesis Connector Dependency Problems
>>>
>>>
>>>
>>> Thank you for helping to investigate the issue. I've filed an issue
>in
>>> our bugtracker: https://issues.apache.org/jira/browse/FLINK-5013
>>>
>>>
>>>
>>> On Wed, Nov 2, 2016 at 10:09 PM, Justin Yan 
>>> wrote:
>>>
>>> Sorry it took me a little while, but I'm happy to report back that
>it
>>> seems to be working properly with EMR 4.8.  It seems so obvious in
>>> retrospect... thanks again for the assistance!
>>>
>>>
>>>
>>> Cheers,
>>>
>>>
>>>
>>> Justin
>>>
>>>
>>>
>>> On Tue, Nov 1, 2016 at 11:44 AM, Robert Metzger
>
>>> wrote:
>>>
>>> Hi Justin,
>>>
>>>
>>>
>>> thank you for sharing the classpath of the Flink container with us.
>It
>>> contains what Till was already expecting: An older version of the
>AWS SDK.
>>>
>>>
>>>
>>> If you have some spare time, could you quickly try to run your
>program
>>> with a newer EMR version, just to validate our suspicion?
>>>
>>> If the error doesn't occur on a more recent EMR version, then we
>know why
>>> its happening.
>>>
>>>
>>>
>>> We'll then probably need to shade (relocate) the Kinesis code to
>make it
>>> work with older EMR libraries.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Robert
>>>
>>>
>>>
>>>
>>>
>>> On Tue, Nov 1, 2016 at 6:27 PM, Justin Yan 
>>> wrote:
>>>
>>> Hi there,
>>>
>>>
>>>

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Till Rohrmann
Hi Sendoh,

Flink should actually never lose data unless it is so late that it arrives
after the allowed lateness. This should be independent of the total data
size.

The watermarks are indeed global and not bound to a specific input element
or a group. So for example if you create the watermarks from the timestamp
information of your events and you have the following input event sequence:
(eventA, 01-01), (eventB, 02-01), (eventC, 01-02). Then you would generate
the watermark W(02-01) after the second event. The third event would then
be a late element and if it exceeds the allowed lateness, then it will be
discarded.

What you have to make sure is that the events in your queue have a
monotonically increasing timestamp if you generate the watermarks from a
timestamp field of the events.
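
For illustration, a bounded out-of-orderness assigner is one way to relax that
requirement (a sketch only; the Event type with a long tmst field, the events
variable and the 60s lag are assumptions):

DataStream<Event> withWatermarks = events
    .assignTimestampsAndWatermarks(new AssignerWithPeriodicWatermarks<Event>() {

        private final long maxLagMillis = 60000L; // tolerated out-of-orderness
        private long maxSeen = Long.MIN_VALUE;

        @Override
        public long extractTimestamp(Event element, long previousElementTimestamp) {
            maxSeen = Math.max(maxSeen, element.tmst);
            return element.tmst;
        }

        @Override
        public Watermark getCurrentWatermark() {
            // the watermark trails the highest timestamp seen by maxLagMillis
            return new Watermark(maxSeen == Long.MIN_VALUE ? Long.MIN_VALUE : maxSeen - maxLagMillis);
        }
    });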

Cheers,
Till

On Tue, Nov 8, 2016 at 3:37 PM, Sendoh  wrote:

> Hi,
>
> Would the issue be events are too out of ordered and the watermark is
> global?
>
> We want to count event per event type per day, and the data looks like:
>
> eventA, 10-29-XX
> eventB,, 11-02-XX
> eventB,, 11-02-XX
> eventB,, 11-03-XX
> eventB,, 11-04-XX
> 
> 
> eventA, 10-29-XX
> eventA, 10-30-XX
> eventA, 10-30-XX
> .
> .
> .
> eventA, 11-04-XX
>
>
> eventA is much much larger than eventB,
> and it looks like we lost the count of eventA at 10-29 and 10-30 while we
> have count of eventA at 11-04-XX.
> Could it be the problem that watermark is gloabal rather than per event?
>
> Best,
>
> Sendoh
>
>
>
> --
> View this message in context: http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/Cannot-see-all-
> events-in-window-apply-for-big-input-tp9945p9985.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>


Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Hi,

Could the issue be that the events are too out of order and the watermark is
global?

We want to count event per event type per day, and the data looks like:

eventA, 10-29-XX
eventB,, 11-02-XX
eventB,, 11-02-XX
eventB,, 11-03-XX
eventB,, 11-04-XX


eventA, 10-29-XX
eventA, 10-30-XX
eventA, 10-30-XX
.
.
.
eventA, 11-04-XX


eventA is much, much larger than eventB,
and it looks like we lost the count of eventA at 10-29 and 10-30 while we
have the count of eventA at 11-04-XX.
Could the problem be that the watermark is global rather than per event?

Best,

Sendoh





Re: Listening to timed-out patterns in Flink CEP

2016-11-08 Thread Till Rohrmann
Hi David,

sorry for my late reply. I just found time to look into the problem. You
were right with your observation that the CEP operator did not behave as
I've described it. The problem was that the time of the underlying NFA was
not advanced if there were no events buffered in the CEP operator when a
new watermark arrived. This was not intended and I opened a PR [1] to fix
this problem. I've tested the fix with your example program and it seems to
solve the problem that you don't see timeouts after the timeout interval
has passed. Thanks for reporting this problem and please excuse my long
response time.

Btw, I'll merge the PR this evening. So it should be included in the
current snapshot version by the end of tomorrow.

[1] https://github.com/apache/flink/pull/2771

Cheers,
Till

On Fri, Oct 14, 2016 at 11:40 AM, Till Rohrmann 
wrote:

> Hi guys,
>
> I'll try to come up with an example illustrating the behaviour over the
> weekend.
>
> Cheers,
> Till
>
> On Fri, Oct 14, 2016 at 11:16 AM, David Koch 
> wrote:
>
>> Hello,
>>
>> Thanks for the code Sameer. Unfortunately, it didn't solve the issue.
>> Compared to what I did the principle is the same - make sure that the
>> watermark advances even without events present to trigger timeouts in CEP
>> patterns.
>>
>> If Till or anyone else could provide a minimal example illustrating the
>> supposed behaviour of:
>>
>> [CEP] timeout will be detected when the first watermark exceeding the
>>> timeout value is received
>>
>>
>> I'd very much appreciate it.
>>
>> Regards,
>>
>> David
>>
>>
>> On Wed, Oct 12, 2016 at 1:54 AM, Sameer W  wrote:
>>
>>> Try this. Your WM's need to move forward. Also don't use System
>>> Timestamp. Use the timestamp of the element seen as the reference as the
>>> elements are most likely lagging the system timestamp.
>>>
>>> DataStream<Event> withTimestampsAndWatermarks = tuples
>>> .assignTimestampsAndWatermarks(new
>>> AssignerWithPeriodicWatermarks<Event>() {
>>>
>>> long waterMarkTmst;
>>> long lastEmittedWM = 0;
>>>
>>> @Override
>>> public long extractTimestamp(Event element, long
>>> previousElementTimestamp) {
>>> if (element.tmst > lastEmittedWM) {
>>>waterMarkTmst = element.tmst - 1; // Assumes increasing
>>> // timestamps. Need to subtract 1 as more elements with same TS might arrive
>>> }
>>> return element.tmst;
>>> }
>>>
>>> @Override
>>> public Watermark getCurrentWatermark() {
>>> if (lastEmittedWM == waterMarkTmst) { // No new event seen,
>>> // move the WM forward by the auto watermark interval
>>> waterMarkTmst = waterMarkTmst + 1000L; // Increase by
>>> // auto watermark interval (watermarks only move forward in time)
>>> }
>>> lastEmittedWM = waterMarkTmst;
>>>
>>> System.out.println(String.format("Watermark at %s", new
>>> Date(waterMarkTmst)));
>>> return new Watermark(waterMarkTmst); // Until an event is
>>> // seen, the WM starts at 0 and advances by 1000ms per interval
>>> }
>>> }).keyBy("key");
>>>
>>> On Tue, Oct 11, 2016 at 7:29 PM, David Koch 
>>> wrote:
>>>
 Hello,

 I tried setting the watermark to System.currentTimeMillis() - 5000L,
 event timestamps are System.currentTimeMillis(). I do not observe the
 expected behaviour of the PatternTimeoutFunction firing once the watermark
 moves past the timeout "anchored" by a pattern match.

 Here is the complete test class source ,
 in case someone is interested. The timestamp/watermark assigner looks like
 this:

 DataStream<Event> withTimestampsAndWatermarks = tuples
 .assignTimestampsAndWatermarks(new
 AssignerWithPeriodicWatermarks<Event>() {

 long waterMarkTmst;

 @Override
 public long extractTimestamp(Event element, long
 previousElementTimestamp) {
 return element.tmst;
 }

 @Override
 public Watermark getCurrentWatermark() {
 waterMarkTmst = System.currentTimeMillis() - 5000L;
 System.out.println(String.format("Watermark at %s",
 new Date(waterMarkTmst)));
 return new Watermark(waterMarkTmst);
 }
 }).keyBy("key");

 withTimestampsAndWatermarks.getExecutionConfig().setAutoWatermarkInterval(1000L);

 // Apply pattern filtering on stream.
 PatternStream<Event> patternStream =
 CEP.pattern(withTimestampsAndWatermarks, pattern);

 Any idea what's wrong?

 David


 On Tue, Oct 11, 2016 at 10:20 PM, Sameer W  wrote:

> Assuming an element with timestamp which is 

Re: Kafka Monitoring

2016-11-08 Thread vinay patil
Yes Kafka and Flink connect to that zookeeper only.

Not sure why it is not listing the consumer

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 5:36 PM, Daniel Santos [via Apache Flink User
Mailing List archive.]  wrote:

> Hi,
>
> brokerPath is just optional.
>
> Used if you want to have multile kafka clusters.
>
> Each kafka cluster would connect to the same brokerPath.
>
> Since I have multiple clusters I use the brokerPath.
>
> From the looks of it you dont. So never mind it doesn't matter.
>
> You only have one zookeeper correct ?
>
> And kafka and flink connects to that only zookeeper ?
>
> Best Regards,
>
> Daniel Santos
>
> On 11/08/2016 11:18 AM, vinay patil wrote:
>
> Hi Daniel,
>
> Yes I have specified the zookeeper host in server.properties file , so the
> broker is connected to zookeeper.
>
> https://kafka.apache.org/documentation#brokerconfigs  -> according to
> this link, I guess all these configs are done in server.prop , so from
> where did you get kafka09 as brokerPath ?
>
> this is my entry in server.prop file -> zookeeper.connect=localhost:2181
> Have you set this as zkhost:2181/kafka09 ?
>
>
> Regards,
> Vinay Patil
>
> On Tue, Nov 8, 2016 at 4:27 PM, Daniel Santos [via Apache Flink User
> Mailing List archive.] <[hidden email]
> > wrote:
>
>> Hi,
>>
>> Your kafka broker is connected to zookeeper I believe.
>>
>> I am using kafka 0.9.0.1 my self too.
>>
>> On kafka broker 0.9.0.1 I have configured the zookeeper connect to a
>> path, for instances :
>>
>> zk1:2181,zk2:2181,zk3:2181/kafka09
>>
>> https://kafka.apache.org/documentation#brokerconfigs
>>
>> Now on the flink side I would configure 
>> "props.setProperty("zookeeper.connect",
>> zkHosts)" the same resulting in :
>>
>>
>> props.setProperty("zookeeperconnect", "zk1:2181,zk2:2181,zk3:2181/ka
>> fka09")
>>
>>
>> That is what I mean by broker's path.
>>
>> Best Regards,
>>
>> Daniel Santos
>>
>> On 11/08/2016 10:49 AM, vinay patil wrote:
>>
>> Hi Daniel,
>>
>> I have the same properties set for the consumer and the same code
>>
>> *brokerspath only needed if you have set it on kafka config* -> I did
>> not get this, do you mean to check the brokerspath in
>> conf/server.properties file ? I have even tried by setting offset.storage
>> property to zookeeeper, but still not getting the consumers listed
>>
>> I am using Kafka 0.9.0.1
>>
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink User
>> Mailing List archive.] <[hidden email]
>> > wrote:
>>
>>> Hello,
>>>
>>> This is my config.
>>>
>>> On kafka props :
>>>
>>> val props = new Properties()
>>>
>>> props.setProperty("zookeeper.connect", zkHosts)
>>> props.setProperty("bootstrap.servers", kafHosts)
>>> props.setProperty("group.id", "prod")
>>> props.setProperty("auto.offset.reset", "earliest")
>>>
>>> Now for zkHosts beware that all your hosts quorum has to be included.
>>>
>>> For instances you have zk1 and zk2 and zk3 to form a quorum.
>>>
>>> Then it will result in zkHosts being -> 
>>> zk1:2181,zk2:2181,zk3:2181/[brokerspath]
>>> .
>>>
>>> brokerspath only needed if you have set it on kafka config. Ignore it
>>> otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .
>>>
>>> After that -> val source = env.addSource(new
>>> FlinkKafkaConsumer09[String](KAFKA_TOPIC, new SimpleStringSchema(),
>>> props))
>>>
>>> Then on kafkamanager -> consumers I have the groupID prod.
>>>
>>> Hope it helps.
>>>
>>> Best Regards,
>>>
>>> Daniel Santos
>>> On 11/08/2016 08:45 AM, vinay patil wrote:
>>>
>>> Hi Limbo,
>>>
>>> I am using 0.9, I am not able to see updated results even after
>>> refreshing.
>>> There is some property that we have to set in order to make this work
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing
>>> List archive.] <[hidden email]
>>> > wrote:
>>>
 I am using kafka 0.8, just refresh the page and you will see the
 updated results, it’s not auto update.

 There is our logstash consumer:



 and this is our flink consumer:


 I find the flink consumer just write the offset of the kafka partition
 to zookeeper without owner and ids,
 so we can’t find the consumer in the manager page, we can only find the
 offset info.

 在 2016年11月8日,下午12:02,vinay patil <[hidden email]
 > 写道:

 Hi Limbo,

 I can see the lag by using that URL, but the Lag there is not showing
 updated results, it does not change, also if you try to to change the
 consumer group value it will still show you the same value instead of
 saying consumer group does not exist or similar kind of error :)

 According to documentation of 0.9.x the offsets are stored in 

Re: Kafka Monitoring

2016-11-08 Thread Daniel Santos

Hi,

brokerPath is just optional.

Used if you want to have multiple Kafka clusters.

Each kafka cluster would connect to the same brokerPath.

Since I have multiple clusters I use the brokerPath.

From the looks of it you don't, so never mind, it doesn't matter.

You only have one zookeeper, correct?

And Kafka and Flink connect to that one zookeeper?

Best Regards,

Daniel Santos


On 11/08/2016 11:18 AM, vinay patil wrote:

Hi Daniel,

Yes I have specified the zookeeper host in server.properties file , so 
the broker is connected to zookeeper.


https://kafka.apache.org/documentation#brokerconfigs 
 -> according to 
this link, I guess all these configs are done in server.prop , so from 
where did you get kafka09 as brokerPath ?


this is my entry in server.prop file -> zookeeper.connect=localhost:2181
Have you set this as zkhost:2181/kafka09 ?


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 4:27 PM, Daniel Santos [via Apache Flink User 
Mailing List archive.] <[hidden email] 
> wrote:


Hi,

Your kafka broker is connected to zookeeper I believe.

I am using kafka 0.9.0.1 my self too.

On kafka broker 0.9.0.1 I have configured the zookeeper connect to
a path, for instances :

zk1:2181,zk2:2181,zk3:2181/kafka09

https://kafka.apache.org/documentation#brokerconfigs


Now on the flink side I would configure
"props.setProperty("zookeeper.connect", zkHosts)" the same
resulting in :


props.setProperty("zookeeperconnect",
"zk1:2181,zk2:2181,zk3:2181/kafka09")


That is what I mean by broker's path.

Best Regards,

Daniel Santos


On 11/08/2016 10:49 AM, vinay patil wrote:

Hi Daniel,

I have the same properties set for the consumer and the same code

/brokerspath only needed if you have set it on kafka config/ -> I
did not get this, do you mean to check the brokerspath in
conf/server.properties file ? I have even tried by setting
offset.storage property to zookeeeper, but still not getting the
consumers listed

I am using Kafka 0.9.0.1


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink
User Mailing List archive.] <[hidden email]
> wrote:

Hello,

This is my config.

On kafka props :

val props = new Properties()

props.setProperty("zookeeper.connect", zkHosts)
props.setProperty("bootstrap.servers", kafHosts)
props.setProperty("group.id ", "prod")
props.setProperty("auto.offset.reset", "earliest")

Now for zkHosts beware that all your hosts quorum has to be
included.

For instances you have zk1 and zk2 and zk3 to form a quorum.

Then it will result in zkHosts being ->
zk1:2181,zk2:2181,zk3:2181/[brokerspath] .

brokerspath only needed if you have set it on kafka config.
Ignore it otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .

After that -> val source = env.addSource(new
FlinkKafkaConsumer09[String](KAFKA_TOPIC, new
SimpleStringSchema(), props))

Then on kafkamanager -> consumers I have the groupID prod.

Hope it helps.

Best Regards,

Daniel Santos

On 11/08/2016 08:45 AM, vinay patil wrote:

Hi Limbo,

I am using 0.9, I am not able to see updated results even
after refreshing.
There is some property that we have to set in order to make
this work

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink
User Mailing List archive.] <[hidden email]
> wrote:

I am using kafka 0.8, just refresh the page and you will
see the updated results, it’s not auto update.

There is our logstash consumer:



and this is our flink consumer:


I find the flink consumer just write the offset of the
kafka partition to zookeeper without owner and ids,
so we can’t find the consumer in the manager page, we
can only find the offset info.


在 2016年11月8日,下午12:02,vinay patil <[hidden email]
>
写道:

Hi Limbo,

I can see the lag by using that URL, but the Lag there
is not showing updated results, it does not change,
also if you try to to change the consumer group value
it will still show you the same value instead of saying
consumer group does not exist or similar kind of error :)

According to documentation of 0.9.x the offsets are
stored in Kafka, but we can set offset.storage property
  

Re: Last-Event-Only Timer (Custom Trigger)

2016-11-08 Thread Julian Bauß
Hi Till,

thank you for your reply.
This is exactly what I was looking for!

Flink continues to surprise me with its versatility. :)

Best Regards,

Julian

2016-11-07 16:47 GMT+01:00 Till Rohrmann :

> Hi Julian,
>
> you can use the TriggerContext to register and unregister event time
> timers which fire when the given event time has been passed. That’s one way
> to implement what you’ve described. If you don’t want to use time windows
> you could also use session windows. Take a look at the
> EventTimeSessionWindows class. In order to only obtain the last element,
> you should use an Evictor which evicts all elements except for the last.
>
> Concerning the purging: Time windows are automatically cleaned up after
> the end of the window + an allowed lateness. That’s why the trigger no
> longer has to take core of that.
>
> Cheers,
> Till
> ​
>
> On Mon, Nov 7, 2016 at 11:34 AM, Julian Bauß 
> wrote:
>
>> Hello everybody,
>>
>> I'm currently trying to implement a Function that allows me to detect
>> that a certain amount of time has passed after receiving the last element
>> of a stream (in a given time window). For example if nothing happened for 6
>> hours within a given Session I want to do something (set a flag, clear some
>> state).
>>
>> I thought I could solve this with a custom trigger on EventTime
>> TimeWindows. I'm currently confused about how I should implement that
>> Trigger. The implementation should not be much different from a
>> EventTimeTrigger except that it discards of any windows with more than one
>> element. This would lead to a windowing mechanism that effectively only
>> fires a window after a certain time for the last element.
>>
>> What I don't understand is when the regular EventTimeTrigger purges
>> windows because it only ever returns FIRE and CONTINUE events.
>>
>> I assumed that after firing a window onEventTime the window would get
>> purged eventually. I then would've added a PURGE once the number of
>> elements was greater than 1.
>>
>> Would that be a suitable implementation?
>>
>> Best Regards,
>>
>> Julian
>>
>
>
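
A minimal sketch of the session window variant described above, assuming a keyed
stream of Event objects and a 6 hour inactivity gap (the sessionKey field and the
Event type are made up; the last element is picked in the window function rather
than with an Evictor):

DataStream<Event> lastPerSession = events
    .keyBy("sessionKey")
    .window(EventTimeSessionWindows.withGap(Time.hours(6)))
    .apply(new WindowFunction<Event, Event, Tuple, TimeWindow>() {
        @Override
        public void apply(Tuple key, TimeWindow window, Iterable<Event> values,
                Collector<Event> out) throws Exception {
            Event last = null;
            for (Event e : values) {
                last = e; // keep only the final element of the session
            }
            out.collect(last);
        }
    });

The window fires once no further element arrives for 6 hours (in event time), which
is the point at which the flag can be set or the state cleared.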


Re: Kafka Monitoring

2016-11-08 Thread vinay patil
Hi Daniel,

Yes I have specified the zookeeper host in server.properties file , so the
broker is connected to zookeeper.

https://kafka.apache.org/documentation#brokerconfigs  -> according to this
link, I guess all these configs are done in server.prop , so from where did
you get kafka09 as brokerPath ?

this is my entry in server.prop file -> zookeeper.connect=localhost:2181
Have you set this as zkhost:2181/kafka09 ?


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 4:27 PM, Daniel Santos [via Apache Flink User
Mailing List archive.]  wrote:

> Hi,
>
> Your kafka broker is connected to zookeeper I believe.
>
> I am using kafka 0.9.0.1 my self too.
>
> On kafka broker 0.9.0.1 I have configured the zookeeper connect to a path,
> for instances :
>
> zk1:2181,zk2:2181,zk3:2181/kafka09
>
> https://kafka.apache.org/documentation#brokerconfigs
>
> Now on the flink side I would configure 
> "props.setProperty("zookeeper.connect",
> zkHosts)" the same resulting in :
>
>
> props.setProperty("zookeeperconnect", "zk1:2181,zk2:2181,zk3:2181/
> kafka09")
>
>
> That is what I mean by broker's path.
>
> Best Regards,
>
> Daniel Santos
>
> On 11/08/2016 10:49 AM, vinay patil wrote:
>
> Hi Daniel,
>
> I have the same properties set for the consumer and the same code
>
> *brokerspath only needed if you have set it on kafka config* -> I did not
> get this, do you mean to check the brokerspath in conf/server.properties
> file ? I have even tried by setting offset.storage property to zookeeeper,
> but still not getting the consumers listed
>
> I am using Kafka 0.9.0.1
>
>
> Regards,
> Vinay Patil
>
> On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink User
> Mailing List archive.] <[hidden email]
> > wrote:
>
>> Hello,
>>
>> This is my config.
>>
>> On kafka props :
>>
>> val props = new Properties()
>>
>> props.setProperty("zookeeper.connect", zkHosts)
>> props.setProperty("bootstrap.servers", kafHosts)
>> props.setProperty("group.id", "prod")
>> props.setProperty("auto.offset.reset", "earliest")
>>
>> Now for zkHosts beware that all your hosts quorum has to be included.
>>
>> For instances you have zk1 and zk2 and zk3 to form a quorum.
>>
>> Then it will result in zkHosts being -> 
>> zk1:2181,zk2:2181,zk3:2181/[brokerspath]
>> .
>>
>> brokerspath only needed if you have set it on kafka config. Ignore it
>> otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .
>>
>> After that -> val source = env.addSource(new
>> FlinkKafkaConsumer09[String](KAFKA_TOPIC, new SimpleStringSchema(),
>> props))
>>
>> Then on kafkamanager -> consumers I have the groupID prod.
>>
>> Hope it helps.
>>
>> Best Regards,
>>
>> Daniel Santos
>> On 11/08/2016 08:45 AM, vinay patil wrote:
>>
>> Hi Limbo,
>>
>> I am using 0.9, I am not able to see updated results even after
>> refreshing.
>> There is some property that we have to set in order to make this work
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing
>> List archive.] <[hidden email]
>> > wrote:
>>
>>> I am using kafka 0.8, just refresh the page and you will see the updated
>>> results, it’s not auto update.
>>>
>>> There is our logstash consumer:
>>>
>>>
>>>
>>> and this is our flink consumer:
>>>
>>>
>>> I find the flink consumer just write the offset of the kafka partition
>>> to zookeeper without owner and ids,
>>> so we can’t find the consumer in the manager page, we can only find the
>>> offset info.
>>>
>>> 在 2016年11月8日,下午12:02,vinay patil <[hidden email]
>>> > 写道:
>>>
>>> Hi Limbo,
>>>
>>> I can see the lag by using that URL, but the Lag there is not showing
>>> updated results, it does not change, also if you try to to change the
>>> consumer group value it will still show you the same value instead of
>>> saying consumer group does not exist or similar kind of error :)
>>>
>>> According to documentation of 0.9.x the offsets are stored in Kafka, but
>>> we can set offset.storage property to zookeeper.
>>> Even by setting this I am not getting the consumer listed.
>>>
>>> Kafka cli command also does not show this consumer
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>> On Tue, Nov 8, 2016 at 8:56 AM, limbo [via Apache Flink User Mailing
>>> List archive.] <[hidden email]> wrote:
>>>
 Hi,

 I have the same problem, I think the reason is that the consumer of
 flink use the low level API,
 and when I type the group name in manager url I can get the lag of the
 flink consumer, like this:

 http://your_manager_url/clusters//consumers/>>> sumer_name>/topic//type/ZK

 在 2016年11月8日,上午5:12,Daniel Santos <[hidden email]
 > 写道:

 Hello,

 I have been using that setup.
 From my understanding, if one desires to see the 

Re: Kafka Monitoring

2016-11-08 Thread Daniel Santos

Hi,

Your kafka broker is connected to zookeeper I believe.

I am using kafka 0.9.0.1 my self too.

On kafka broker 0.9.0.1 I have configured the zookeeper connect to a 
path, for instances :


zk1:2181,zk2:2181,zk3:2181/kafka09

https://kafka.apache.org/documentation#brokerconfigs

Now on the flink side I would configure 
"props.setProperty("zookeeper.connect", zkHosts)" the same resulting in :



props.setProperty("zookeeperconnect", "zk1:2181,zk2:2181,zk3:2181/kafka09")


That is what I mean by broker's path.

Best Regards,

Daniel Santos


On 11/08/2016 10:49 AM, vinay patil wrote:

Hi Daniel,

I have the same properties set for the consumer and the same code

/brokerspath only needed if you have set it on kafka config/ -> I did 
not get this, do you mean to check the brokerspath in 
conf/server.properties file ? I have even tried by setting 
offset.storage property to zookeeeper, but still not getting the 
consumers listed


I am using Kafka 0.9.0.1


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink User 
Mailing List archive.] <[hidden email] 
> wrote:


Hello,

This is my config.

On kafka props :

val props = new Properties()

props.setProperty("zookeeper.connect", zkHosts)
props.setProperty("bootstrap.servers", kafHosts)
props.setProperty("group.id ", "prod")
props.setProperty("auto.offset.reset", "earliest")

Now for zkHosts beware that all your hosts quorum has to be included.

For instances you have zk1 and zk2 and zk3 to form a quorum.

Then it will result in zkHosts being ->
zk1:2181,zk2:2181,zk3:2181/[brokerspath] .

brokerspath only needed if you have set it on kafka config. Ignore
it otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .

After that -> val source = env.addSource(new
FlinkKafkaConsumer09[String](KAFKA_TOPIC, new
SimpleStringSchema(), props))

Then on kafkamanager -> consumers I have the groupID prod.

Hope it helps.

Best Regards,

Daniel Santos

On 11/08/2016 08:45 AM, vinay patil wrote:

Hi Limbo,

I am using 0.9, I am not able to see updated results even after
refreshing.
There is some property that we have to set in order to make this work

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User
Mailing List archive.] <[hidden email]
> wrote:

I am using kafka 0.8, just refresh the page and you will see
the updated results, it’s not auto update.

There is our logstash consumer:



and this is our flink consumer:


I find the flink consumer just write the offset of the kafka
partition to zookeeper without owner and ids,
so we can’t find the consumer in the manager page, we can
only find the offset info.


在 2016年11月8日,下午12:02,vinay patil <[hidden email]
> 写道:

Hi Limbo,

I can see the lag by using that URL, but the Lag there is
not showing updated results, it does not change, also if you
try to to change the consumer group value it will still show
you the same value instead of saying consumer group does not
exist or similar kind of error :)

According to documentation of 0.9.x the offsets are stored
in Kafka, but we can set offset.storage property to zookeeper.
Even by setting this I am not getting the consumer listed.

Kafka cli command also does not show this consumer

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 8:56 AM, limbo [via Apache Flink User
Mailing List archive.]<[hidden email]>wrote:

Hi,

I have the same problem, I think the reason is that the
consumer of flink use the low level API,
and when I type the group name in manager url I can get
the lag of the flink consumer, like this:


http://your_manager_url/clusters//consumers//topic//type/ZK




在 2016年11月8日,上午5:12,Daniel Santos <[hidden email]
> 写道:

Hello,

I have been using that setup.
From my understanding, if one desires to see the offset
being consumed by Flink on KafkaManger, one has to set
it up with zookeeper. On 0.9 it will only serve as a
view of progress.

Basically what's mandatory on 0.8 is optional on 0.9,
and for viewing purposes only.

Best Regards,
Daniel Santos

On November 7, 2016 7:13:54 PM GMT+00:00, Vinay Patil
<[hidden email]
>
  

Re: Kafka Monitoring

2016-11-08 Thread vinay patil
Hi Daniel,

I have the same properties set for the consumer and the same code

*brokerspath only needed if you have set it on kafka config* -> I did not
get this, do you mean to check the brokerspath in conf/server.properties
file? I have even tried setting the offset.storage property to zookeeper,
but still not getting the consumers listed

I am using Kafka 0.9.0.1


Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 3:59 PM, Daniel Santos [via Apache Flink User
Mailing List archive.]  wrote:

> Hello,
>
> This is my config.
>
> On kafka props :
>
> val props = new Properties()
>
> props.setProperty("zookeeper.connect", zkHosts)
> props.setProperty("bootstrap.servers", kafHosts)
> props.setProperty("group.id", "prod")
> props.setProperty("auto.offset.reset", "earliest")
>
> Now for zkHosts beware that all your hosts quorum has to be included.
>
> For instances you have zk1 and zk2 and zk3 to form a quorum.
>
> Then it will result in zkHosts being -> 
> zk1:2181,zk2:2181,zk3:2181/[brokerspath]
> .
>
> brokerspath only needed if you have set it on kafka config. Ignore it
> otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .
>
> After that -> val source = env.addSource(new 
> FlinkKafkaConsumer09[String](KAFKA_TOPIC,
> new SimpleStringSchema(), props))
>
> Then on kafkamanager -> consumers I have the groupID prod.
>
> Hope it helps.
>
> Best Regards,
>
> Daniel Santos
> On 11/08/2016 08:45 AM, vinay patil wrote:
>
> Hi Limbo,
>
> I am using 0.9, I am not able to see updated results even after refreshing.
> There is some property that we have to set in order to make this work
>
> Regards,
> Vinay Patil
>
> On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing List
> archive.] <[hidden email]
> > wrote:
>
>> I am using kafka 0.8, just refresh the page and you will see the updated
>> results, it’s not auto update.
>>
>> There is our logstash consumer:
>>
>>
>>
>> and this is our flink consumer:
>>
>>
>> I find the flink consumer just write the offset of the kafka partition to
>> zookeeper without owner and ids,
>> so we can’t find the consumer in the manager page, we can only find the
>> offset info.
>>
>> 在 2016年11月8日,下午12:02,vinay patil <[hidden email]
>> > 写道:
>>
>> Hi Limbo,
>>
>> I can see the lag by using that URL, but the Lag there is not showing
>> updated results, it does not change, also if you try to to change the
>> consumer group value it will still show you the same value instead of
>> saying consumer group does not exist or similar kind of error :)
>>
>> According to documentation of 0.9.x the offsets are stored in Kafka, but
>> we can set offset.storage property to zookeeper.
>> Even by setting this I am not getting the consumer listed.
>>
>> Kafka cli command also does not show this consumer
>>
>> Regards,
>> Vinay Patil
>>
>> On Tue, Nov 8, 2016 at 8:56 AM, limbo [via Apache Flink User Mailing List
>> archive.] <[hidden email]>
>> wrote:
>>
>>> Hi,
>>>
>>> I have the same problem, I think the reason is that the consumer of
>>> flink use the low level API,
>>> and when I type the group name in manager url I can get the lag of the
>>> flink consumer, like this:
>>>
>>> http://your_manager_url/clusters//consumers/>> sumer_name>/topic//type/ZK
>>>
>>> 在 2016年11月8日,上午5:12,Daniel Santos <[hidden email]
>>> > 写道:
>>>
>>> Hello,
>>>
>>> I have been using that setup.
>>> From my understanding, if one desires to see the offset being consumed
>>> by Flink on KafkaManger, one has to set it up with zookeeper. On 0.9 it
>>> will only serve as a view of progress.
>>>
>>> Basically what's mandatory on 0.8 is optional on 0.9, and for viewing
>>> purposes only.
>>>
>>> Best Regards,
>>> Daniel Santos
>>>
>>> On November 7, 2016 7:13:54 PM GMT+00:00, Vinay Patil <[hidden email]
>>> > wrote:

 Hi,

 I am monitoring Kafka using KafkaManager for checking offset lag and
 other Kafka metrics, however I am not able to see the  consumers when I use
 FlinkKafkaConsumer , for console-consumer it shows them in the Consumers
 list.

 I have set the required parameters for the kafka consumer while running
 the application.

 Has anyone faced this issue ?
 I am using Kafka 0.9.0.1

 Regards,
 Vinay Patil

>>>
>>>
>>>

Re: Cannot see all events in window apply() for big input

2016-11-08 Thread Sendoh
Yes, the other job performs an event time window, and we tried 1.2-SNAPSHOT and
1.1.3. With the old version 1.0.3 we lost much, much less data. We have already
tried both windowAll() and keyBy().window(), and tried a very tiny lag and
window(1 millisecond).

My doubt comes from the fact that a smaller input works while a bigger input has
issues (events disappear).

For example, eventA disappears with timestamps after Oct. 24 and appears again
after around 5 minutes with timestamps at Nov. 08, and all events in
between (10-25 to 11-07) are missing. The output of the window gets stuck for
around 5 minutes. However, if this Flink job only reads eventA, we can see
all of them.

It looks like data is stuck in that operator and the watermark that should
trigger the window comes too late when there is a lot of data, or?

Best,

Sendoh







Re: Kafka Monitoring

2016-11-08 Thread Daniel Santos

Hello,

This is my config.

On kafka props :

val props = new Properties()

props.setProperty("zookeeper.connect", zkHosts)
props.setProperty("bootstrap.servers", kafHosts)
props.setProperty("group.id", "prod")
props.setProperty("auto.offset.reset", "earliest")

Now for zkHosts, beware that all the hosts of your quorum have to be included.

For instance, say you have zk1, zk2 and zk3 forming a quorum.

Then it will result in zkHosts being -> 
zk1:2181,zk2:2181,zk3:2181/[brokerspath] .


brokerspath only needed if you have set it on kafka config. Ignore it 
otherwise, resulting in "zk1:2181,zk2:2181,zk3:2181" .


After that -> val source = env.addSource(new 
FlinkKafkaConsumer09[String](KAFKA_TOPIC, new SimpleStringSchema(), props))


Then on kafkamanager -> consumers I have the groupID prod.

Hope it helps.

Best Regards,

Daniel Santos

On 11/08/2016 08:45 AM, vinay patil wrote:

Hi Limbo,

I am using 0.9, I am not able to see updated results even after 
refreshing.

There is some property that we have to set in order to make this work

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing 
List archive.] <[hidden email] 
> wrote:


I am using kafka 0.8, just refresh the page and you will see the
updated results, it’s not auto update.

There is our logstash consumer:



and this is our flink consumer:


I find the flink consumer just write the offset of the kafka
partition to zookeeper without owner and ids,
so we can’t find the consumer in the manager page, we can only
find the offset info.


在 2016年11月8日,下午12:02,vinay patil <[hidden email]
> 写道:

Hi Limbo,

I can see the lag by using that URL, but the Lag there is not
showing updated results, it does not change, also if you try to
to change the consumer group value it will still show you the
same value instead of saying consumer group does not exist or
similar kind of error :)

According to documentation of 0.9.x the offsets are stored in
Kafka, but we can set offset.storage property to zookeeper.
Even by setting this I am not getting the consumer listed.

Kafka cli command also does not show this consumer

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 8:56 AM, limbo [via Apache Flink User
Mailing List archive.]<[hidden
email]>wrote:

Hi,

I have the same problem, I think the reason is that the
consumer of flink use the low level API,
and when I type the group name in manager url I can get the
lag of the flink consumer, like this:


http://your_manager_url/clusters//consumers//topic//type/ZK




在 2016年11月8日,上午5:12,Daniel Santos <[hidden email]
> 写道:

Hello,

I have been using that setup.
From my understanding, if one desires to see the offset
being consumed by Flink on KafkaManger, one has to set it up
with zookeeper. On 0.9 it will only serve as a view of progress.

Basically what's mandatory on 0.8 is optional on 0.9, and
for viewing purposes only.

Best Regards,
Daniel Santos

On November 7, 2016 7:13:54 PM GMT+00:00, Vinay Patil
<[hidden email]
> wrote:

Hi,

I am monitoring Kafka using KafkaManager for checking
offset lag and other Kafka metrics, however I am not
able to see the  consumers when I use FlinkKafkaConsumer
, for console-consumer it shows them in the Consumers list.

I have set the required parameters for the kafka
consumer while running the application.

Has anyone faced this issue ?
I am using Kafka 0.9.0.1

Regards,
Vinay Patil








Re: Flink on Yarn delegation token renewal

2016-11-08 Thread Theofilos Kakantousis

Thank you for the prompt reply Stefan!

Cheers,
Theo

On 2016-11-08 11:29, Stefan Richter wrote:

Hi,

I think this problem tracked in this issue: 
https://issues.apache.org/jira/browse/FLINK-3670 . This means that the 
current master and upcoming release 1.2 should work correctly.


Best,
Stefan

Am 08.11.2016 um 10:25 schrieb Theofilos Kakantousis >:


Hi everyone,

I'm using Flink 1.1.3 with Hadoop 2.7.3 and was wondering about 
delegation token renewal when running Flink on Yarn. Yarn demands 
services to renew delegation tokens on their own and if not, Yarn 
jobs will fail after one week.

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html

For example, Spark has implemented a solution for that
https://github.com/apache/spark/blob/v2.0.1/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala

and people had reported issues running Flink in the past,
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-job-on-secure-Yarn-fails-after-many-hours-td3856.html

What is the status of this issue? How can Flink on Yarn handle this 
issue so that jobs do not fail due to token expiration?


Cheers,
Theo









Re: Flink on Yarn delegation token renewal

2016-11-08 Thread Stefan Richter
Hi,

I think this problem is tracked in this issue:
https://issues.apache.org/jira/browse/FLINK-3670. This means that the
current master and the upcoming release 1.2 should work correctly.

Best,
Stefan

> Am 08.11.2016 um 10:25 schrieb Theofilos Kakantousis :
> 
> Hi everyone,
> 
> I'm using Flink 1.1.3 with Hadoop 2.7.3 and was wondering about delegation 
> token renewal when running Flink on Yarn. Yarn demands services to renew 
> delegation tokens on their own and if not, Yarn jobs will fail after one week.
> https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html
> 
> For example, Spark has implemented a solution for that
> https://github.com/apache/spark/blob/v2.0.1/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala
> 
> and people had reported issues running Flink in the past,
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-job-on-secure-Yarn-fails-after-many-hours-td3856.html
> 
> What is the status of this issue? How can Flink on Yarn handle this issue so 
> that jobs do not fail due to token expiration?
> 
> Cheers,
> Theo
> 
> 
> 



Re: Kinesis Connector Dependency Problems

2016-11-08 Thread Till Rohrmann
Yes this definitely looks like a similar issue. Once we shade the aws
dependencies in the Kinesis connector, the problem should be (hopefully)
resolved. I've added your problem description to the JIRA. Thanks for
reporting it.

Cheers,
Till

On Mon, Nov 7, 2016 at 8:01 PM, Foster, Craig  wrote:

> I think this is a similar issue but it was brought to my attention that
> we’re also seeing this on EMR 5.1.0 with the FlinkKinesisConsumer. What I
> did to duplicate this issue:
>
> 1)   I have used the Wikiedit quickstart but used Kinesis instead of
> Kafka to publish results with a FlinkKinesisProducer. This works fine. I
> can use a separate script to read what was published to my stream.
>
> 2)   When using a FlinkKinesisConsumer, however, I get an error:
>
>
>
> java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.
> setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
>
> at com.amazonaws.http.HttpClientFactory.createHttpClient(
> HttpClientFactory.java:96)
>
> at com.amazonaws.http.AmazonHttpClient.(
> AmazonHttpClient.java:187)
>
> at com.amazonaws.AmazonWebServiceClient.(
> AmazonWebServiceClient.java:136)
>
> at com.amazonaws.services.kinesis.AmazonKinesisClient.<
> init>(AmazonKinesisClient.java:221)
>
> at com.amazonaws.services.kinesis.AmazonKinesisClient.<
> init>(AmazonKinesisClient.java:197)
>
> at org.apache.flink.streaming.connectors.kinesis.util.
> AWSUtil.createKinesisClient(AWSUtil.java:56)
>
> at org.apache.flink.streaming.connectors.kinesis.proxy.
> KinesisProxy.(KinesisProxy.java:118)
>
> at org.apache.flink.streaming.connectors.kinesis.proxy.
> KinesisProxy.create(KinesisProxy.java:176)
>
> at org.apache.flink.streaming.
> connectors.kinesis.internals.KinesisDataFetcher.(
> KinesisDataFetcher.java:188)
>
> at org.apache.flink.streaming.connectors.kinesis.
> FlinkKinesisConsumer.run(FlinkKinesisConsumer.java:198)
>
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:80)
>
> at org.apache.flink.streaming.api.operators.StreamSource.
> run(StreamSource.java:53)
>
> at org.apache.flink.streaming.runtime.tasks.
> SourceStreamTask.run(SourceStreamTask.java:56)
>
> at org.apache.flink.streaming.runtime.tasks.StreamTask.
> invoke(StreamTask.java:266)
>
> at org.apache.flink.runtime.taskmanager.Task.run(Task.
> java:585)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
>
>
>
>
>
>
>
> *From: *Robert Metzger 
> *Reply-To: *"user@flink.apache.org" 
> *Date: *Friday, November 4, 2016 at 2:57 AM
> *To: *"user@flink.apache.org" 
> *Subject: *Re: Kinesis Connector Dependency Problems
>
>
>
> Thank you for helping to investigate the issue. I've filed an issue in our
> bugtracker: https://issues.apache.org/jira/browse/FLINK-5013
>
>
>
> On Wed, Nov 2, 2016 at 10:09 PM, Justin Yan 
> wrote:
>
> Sorry it took me a little while, but I'm happy to report back that it
> seems to be working properly with EMR 4.8.  It seems so obvious in
> retrospect... thanks again for the assistance!
>
>
>
> Cheers,
>
>
>
> Justin
>
>
>
> On Tue, Nov 1, 2016 at 11:44 AM, Robert Metzger 
> wrote:
>
> Hi Justin,
>
>
>
> thank you for sharing the classpath of the Flink container with us. It
> contains what Till was already expecting: An older version of the AWS SDK.
>
>
>
> If you have some spare time, could you quickly try to run your program
> with a newer EMR version, just to validate our suspicion?
>
> If the error doesn't occur on a more recent EMR version, then we know why
> its happening.
>
>
>
> We'll then probably need to shade (relocate) the Kinesis code to make it
> work with older EMR libraries.
>
>
>
> Regards,
>
> Robert
>
>
>
>
>
> On Tue, Nov 1, 2016 at 6:27 PM, Justin Yan  wrote:
>
> Hi there,
>
>
>
> We're using EMR 4.4.0 -> I suppose this is a bit old, and I can migrate
> forward if you think that would be best.
>
>
>
> I've appended the classpath that the Flink cluster was started with at the
> end of this email (with a slight improvement to the formatting to make it
> readable).
>
>
>
> Willing to poke around or fiddle with this as necessary - thanks very much
> for the help!
>
>
>
> Justin
>
>
>
> Task Manager's classpath from logs:
>
>
>
> lib/flink-dist_2.11-1.1.3.jar
>
> lib/flink-python_2.11-1.1.3.jar
>
> lib/log4j-1.2.17.jar
>
> lib/slf4j-log4j12-1.7.7.jar
>
> logback.xml
>
> log4j.properties
>
> flink.jar
>
> flink-conf.yaml
>
> /etc/hadoop/conf
>
> /usr/lib/hadoop/hadoop-annotations-2.7.1-amzn-1.jar
>
> /usr/lib/hadoop/hadoop-extras.jar
>
> /usr/lib/hadoop/hadoop-archives-2.7.1-amzn-1.jar
>
> /usr/lib/hadoop/hadoop-aws-2.7.1-amzn-1.jar

Flink on Yarn delegation token renewal

2016-11-08 Thread Theofilos Kakantousis

Hi everyone,

I'm using Flink 1.1.3 with Hadoop 2.7.3 and was wondering about 
delegation token renewal when running Flink on Yarn. Yarn requires 
services to renew delegation tokens on their own; if they don't, Yarn jobs 
will fail after one week.

https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YarnApplicationSecurity.html

For example, Spark has implemented a solution for that
https://github.com/apache/spark/blob/v2.0.1/yarn/src/main/scala/org/apache/spark/deploy/yarn/AMDelegationTokenRenewer.scala

and people had reported issues running Flink in the past,
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Flink-job-on-secure-Yarn-fails-after-many-hours-td3856.html

What is the status of this issue? How can Flink on Yarn handle this 
issue so that jobs do not fail due to token expiration?


Cheers,
Theo





Re: Csv to windows?

2016-11-08 Thread Felix Neutatz
Hi Yassine,

thanks that explains it :)

Best regards,
Felix

On Nov 7, 2016 21:28, "Yassine MARZOUGUI"  wrote:

> Hi Felix,
>
> As I see in kddcup.newtestdata_small_unlabeled_index
> ,
> the first field of connectionRecords (splits[0]) is unique for each
> record, therefore when you apply keyBy(0) it will logically partition your
> stream by that field and each partition will contain only one element. So
> the countWindow(2) actually never fires because it never reaches 2
> elements. That's why your files stay empty.
>
> Could you please go into more detail about what the expected output is? Then
> we might be able to figure out the proper way to achieve it.
>
> Best,
> Yassine
>
> 2016-11-07 19:18 GMT+01:00 Felix Neutatz :
>
>> Hi Till,
>>
>> the mapper solution makes sense :)
>>
>> Unfortunately, in my case it was not a typo in the path. I checked and
>> saw that the records are read.
>>
>> You can find the whole program here: https://github.com/Felix
>> Neutatz/CluStream/blob/master/flink-java-project/src/main/
>> java/org/apache/flink/clustream/StreamingJobIndex.java
>>
>> I am happy for any ideas.
>>
>> Best regards,
>> Felix
>>
>> 2016-11-07 16:15 GMT+01:00 Till Rohrmann :
>>
>>> Hi Felix,
>>>
>>> I'm not sure whether grouping/keyBy by processing time makes
>>> semantically any sense. This can be anything depending on the execution
>>> order. Therefore, there is not build in mechanism to group by processing
>>> time. But you can always write a mapper which assigns the current
>>> processing time to the stream record and use this field for grouping.
>>>
>>> Concerning your second problem, could you check the path of the file? At
>>> the moment Flink fails silently if the path is not valid. It might be that
>>> you have a simple typo in the path. I've opened an issue to fix this issue
>>> [1].
>>>
>>> [1] https://issues.apache.org/jira/browse/FLINK-5027
>>>
>>> Cheers,
>>> Till
>>>
>>>
>>>
>>>
>>>
>>> On Sun, Nov 6, 2016 at 12:36 PM, Felix Neutatz 
>>> wrote:
>>>
 Hi everybody,

 I finally reached streaming territory. For a student project I want to
 implement CluStream for Flink. I guess this is especially interesting to
 try queryable state :)

 But I have problems at the first steps. My input data is a csv file of
 records. For the start I just want to window this csv. I don't want to use 
 AllWindows
 because it's not parallelizable.

 So my first question is: Can I window by processing time, like this:

 connectionRecordsT.keyBy(processing_time).timeWindow(Time.milliseconds(1000L))

 I didn't find a way, so I added in the csv an index column and tried to 
 use a countWindow:

 DataStreamSource<String> source = env.readTextFile(file.getPath());

 // Tuple2 type parameters below are reconstructed; the exact value type
 // produced by MapToVector is assumed to be Vector.
 DataStream<Tuple2<Integer, Vector>> connectionRecords = source.map(new
 MapToVector()).setParallelism(4);

 connectionRecords.keyBy(0).countWindow(10).apply (
new WindowFunction<Tuple2<Integer, Vector>, Tuple1<Integer>, Tuple,
 GlobalWindow>() {
   public void apply (Tuple tuple,
  GlobalWindow window,
  Iterable<Tuple2<Integer, Vector>> values,
  Collector<Tuple1<Integer>> out) throws Exception {
  int sum = 0;
  Iterator<Tuple2<Integer, Vector>> iterator = values.iterator();
  while (iterator.hasNext()) {
 Tuple2<Integer, Vector> t = iterator.next();
 sum += 1;
  }
  out.collect(new Tuple1<Integer>(new Integer(sum)));
   }
 }).writeAsCsv("text");

 To check whether everything works I just count the elements per window and 
 write them into a csv file.

 Flink generates the files but all are empty. Can you tell me, what I did 
 wrong?

 Best regards,

 Felix


>>>
>>
>
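
Regarding the keyBy(0)/countWindow observation above: since the first field is unique
per record, each key only ever holds one element. A sketch of one workaround, under the
assumption that the element type is Tuple2<Integer, Vector> as in the code above, is to
key by a coarser bucket of the index so each key actually accumulates enough elements
for the count window to fire:

KeyedStream<Tuple2<Integer, Vector>, Integer> bucketed = connectionRecords
    .keyBy(new KeySelector<Tuple2<Integer, Vector>, Integer>() {
        @Override
        public Integer getKey(Tuple2<Integer, Vector> value) {
            return value.f0 % 4; // 4 buckets instead of one key per record
        }
    });

// bucketed.countWindow(10).apply(...) then fires every 10 elements per bucket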


Re: Kafka Monitoring

2016-11-08 Thread vinay patil
Hi Limbo,

I am using 0.9, I am not able to see updated results even after refreshing.
There is some property that we have to set in order to make this work

Regards,
Vinay Patil

On Tue, Nov 8, 2016 at 12:32 PM, limbo [via Apache Flink User Mailing List
archive.]  wrote:

> I am using kafka 0.8, just refresh the page and you will see the updated
> results, it’s not auto update.
>
> There is our logstash consumer:
>
>
>
> and this is our flink consumer:
>
>
> I find the flink consumer just write the offset of the kafka partition to
> zookeeper without owner and ids,
> so we can’t find the consumer in the manager page, we can only find the
> offset info.
>
> 在 2016年11月8日,下午12:02,vinay patil <[hidden email]
> > 写道:
>
> Hi Limbo,
>
> I can see the lag by using that URL, but the Lag there is not showing
> updated results, it does not change, also if you try to to change the
> consumer group value it will still show you the same value instead of
> saying consumer group does not exist or similar kind of error :)
>
> According to documentation of 0.9.x the offsets are stored in Kafka, but
> we can set offset.storage property to zookeeper.
> Even by setting this I am not getting the consumer listed.
>
> Kafka cli command also does not show this consumer
>
> Regards,
> Vinay Patil
>
> On Tue, Nov 8, 2016 at 8:56 AM, limbo [via Apache Flink User Mailing List
> archive.] < href="x-msg://4/user/SendEmail.jtp?type=nodenode=9964i=0"
> target="_top" rel="nofollow" link="external" class="">[hidden email]>
> wrote:
>
>> Hi,
>>
>> I have the same problem, I think the reason is that the consumer of flink
>> use the low level API,
>> and when I type the group name in manager url I can get the lag of the
>> flink consumer, like this:
>>
>> http://your_manager_url/clusters//consumers/<
>> consumer_name>/topic//type/ZK
>>
>> 在 2016年11月8日,上午5:12,Daniel Santos <[hidden email]
>> > 写道:
>>
>> Hello,
>>
>> I have been using that setup.
>> From my understanding, if one desires to see the offset being consumed by
>> Flink on KafkaManger, one has to set it up with zookeeper. On 0.9 it will
>> only serve as a view of progress.
>>
>> Basically what's mandatory on 0.8 is optional on 0.9, and for viewing
>> purposes only.
>>
>> Best Regards,
>> Daniel Santos
>>
>> On November 7, 2016 7:13:54 PM GMT+00:00, Vinay Patil <[hidden email]
>> > wrote:
>>>
>>> Hi,
>>>
>>> I am monitoring Kafka using KafkaManager for checking offset lag and
>>> other Kafka metrics, however I am not able to see the  consumers when I use
>>> FlinkKafkaConsumer , for console-consumer it shows them in the Consumers
>>> list.
>>>
>>> I have set the required parameters for the kafka consumer while running
>>> the application.
>>>
>>> Has anyone faced this issue ?
>>> I am using Kafka 0.9.0.1
>>>
>>> Regards,
>>> Vinay Patil
>>>
>>
>>
>>
>>
>
>
>
>
>
>

Re: Memory on Aggr

2016-11-08 Thread Alberto Ramón
Thanks!
Now it's clear to me.


2016-11-08 9:23 GMT+01:00 Fabian Hueske :

> Given the semantics described in the document the query can be computed in
> principle.
> However, if the query is not bounded by time, the required state might
> grow very large if the number of distinct xx values grows over time.
> That's why we will probably enforce a time predicate or meta data that the
> value domain of xx is of constant size.
>
>
>
> 2016-11-08 9:04 GMT+01:00 Alberto Ramón :
>
>> Yes thanks
>>
>> Perhaps my example is too simple
>>
>> *select xx, count(), sum() from ttt group by xx*
>> Why the querie value can't be calculated each 2 secs / waterMark arrive ?
>>
>> I'm try to find the video of: http://flink-forward.org/kb_se
>> ssions/scaling-stream-processing-with-apache-flink-to-very-large-state/
>>
>> 2016-11-07 22:02 GMT+01:00 Fabian Hueske :
>>
>>> First of all, the document only proposes semantics for Flink's support
>>> of relational queries on streams.
>>> It does not describe the implementation and in fact most of it is not
>>> implemented.
>>>
>>> How the queries will be executed would depend on the definition of the
>>> table, i.e., whether the tables are derived in append or replace mode.
>>> For the second query we do not necessarily need to "store all events as
>>> is" but could do some pre-aggregation depending on the configured update
>>> rate.
>>> Watermarks will be used to track time in a query, i.e., to evaluate a
>>> predicate like "*BETWEEN now() - INTERVAL '1' HOUR AND now()"* where
>>> now() would be the current watermark time.
>>>
>>> There are a couple of tricks one can play to reduce the memory
>>> requirements and the implementation should try to optimize for that.
>>> However, it is true that for some queries we will need to keep the
>>> complete input relation (within its time bounds) as state.
>>> The good news is that Flink is very good at managing large state and can
>>> easily scale to hundreds of nodes.
>>>
>>> Did that answer your questions?
>>>
>>> 2016-11-07 21:33 GMT+01:00 Alberto Ramón :
>>>
 From "Relational Queries on Data Stream in Apache Flink" > Bounday
 Memory Requirements
 (https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4kon
 QPW4tnl8THw6rzGUdaqU/edit#)


 SELECT user, page, COUNT(page) AS pCnt
 FROM pageviews
 GROUP BY user, page

 -Versus-

 SELECT user, page, COUNT(page) AS pCnt
 FROM pageviews
 WHERE rowtime BETWEEN now() - INTERVAL '1' HOUR AND now()  // only last hour
 GROUP BY user, page

 I understand:

   - Watermarks are not used to pre-calculate the aggregates and save memory
   - All events are stored "as is" until the end of the window

 Are my assumptions true?


>>>
>>
>


Re: Memory on Aggr

2016-11-08 Thread Fabian Hueske
Given the semantics described in the document the query can be computed in
principle.
However, if the query is not bounded by time, the required state might grow
very large if the number of distinct xx values grows over time.
That's why we will probably enforce a time predicate, or metadata indicating
that the value domain of xx is of constant size.
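
To make the state argument above concrete, here is an editor's toy sketch (plain
Scala, not Flink's implementation): a continuously running GROUP BY keeps one
aggregate per distinct key, so its state grows with the key domain, while a
time-bounded variant can evict aggregates once their time range has fallen
behind the watermark.

import scala.collection.mutable

object GroupByStateSketch {
  // Unbounded: one (count, sum) entry per distinct key, kept forever.
  val unbounded = mutable.Map.empty[String, (Long, Long)]

  def update(key: String, value: Long): Unit = {
    val (cnt, sum) = unbounded.getOrElse(key, (0L, 0L))
    unbounded(key) = (cnt + 1, sum + value)
  }

  // Bounded: one entry per (key, hour bucket); buckets more than one hour
  // behind the watermark are evicted, so state stays proportional to the
  // keys seen within the time bound.
  val bounded = mutable.Map.empty[(String, Long), (Long, Long)]
  val HourMs = 60L * 60L * 1000L

  def updateBounded(key: String, value: Long, ts: Long, watermark: Long): Unit = {
    val bucket = ts / HourMs
    val (cnt, sum) = bounded.getOrElse((key, bucket), (0L, 0L))
    bounded((key, bucket)) = (cnt + 1, sum + value)
    val minBucket = (watermark - HourMs) / HourMs
    bounded.retain { case ((_, b), _) => b >= minBucket }
  }
}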



2016-11-08 9:04 GMT+01:00 Alberto Ramón :

> Yes, thanks.
>
> Perhaps my example is too simple:
>
> *select xx, count(), sum() from ttt group by xx*
> Why can't the query value be calculated every 2 seconds / each time a
> watermark arrives?
>
> I'm trying to find the video of:
> http://flink-forward.org/kb_sessions/scaling-stream-processing-with-apache-flink-to-very-large-state/
>
> 2016-11-07 22:02 GMT+01:00 Fabian Hueske :
>
>> First of all, the document only proposes semantics for Flink's support of
>> relational queries on streams.
>> It does not describe the implementation and in fact most of it is not
>> implemented.
>>
>> How the queries will be executed would depend on the definition of the
>> table, i.e., whether the tables are derived in append or replace mode.
>> For the second query we do not necessarily need to "store all events as
>> is" but could do some pre-aggregation depending on the configured update
>> rate.
>> Watermarks will be used to track time in a query, i.e., to evaluate a
>> predicate like "*BETWEEN now() - INTERVAL '1' HOUR AND now()"* where
>> now() would be the current watermark time.
>>
>> There are a couple of tricks one can play to reduce the memory
>> requirements and the implementation should try to optimize for that.
>> However, it is true that for some queries we will need to keep the
>> complete input relation (within its time bounds) as state.
>> The good news is that Flink is very good at managing large state and can
>> easily scale to hundreds of nodes.
>>
>> Did that answer your questions?
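
The pre-aggregation idea in the reply above can already be expressed with the
DataStream API (the SQL layer being largely unimplemented at this point). The
sketch below is the editor's illustration, with made-up PageView fields and a
one-hour tumbling window standing in for the rowtime predicate: an incremental
reduce keeps a single partial aggregate per key and window instead of buffering
every event, and ascending timestamps supply the watermarks that drive when
windows are evaluated.

import org.apache.flink.streaming.api.TimeCharacteristic
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class PageView(user: String, page: String, rowtime: Long, pCnt: Long = 1L)

object PreAggregationSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)

    val views: DataStream[PageView] = env.fromElements(
      PageView("alice", "/home", 1000L),
      PageView("alice", "/home", 2000L),
      PageView("bob", "/docs", 1500L))

    views
      .assignAscendingTimestamps(_.rowtime)             // watermarks track event time
      .keyBy(v => (v.user, v.page))                     // GROUP BY user, page
      .timeWindow(Time.hours(1))                        // rough analogue of the 1-hour bound
      .reduce((a, b) => a.copy(pCnt = a.pCnt + b.pCnt)) // incremental COUNT(page)
      .print()

    env.execute("pre-aggregation-sketch")
  }
}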
>>
>> 2016-11-07 21:33 GMT+01:00 Alberto Ramón :
>>
>>> From "Relational Queries on Data Stream in Apache Flink" > Bounday
>>> Memory Requirements
>>> (https://docs.google.com/document/d/1qVVt_16kdaZQ8RTfA_f4kon
>>> QPW4tnl8THw6rzGUdaqU/edit#)
>>>
>>>
>>> SELECT user, page, COUNT(page) AS pCnt
>>> FROM pageviews
>>> GROUP BY user, page
>>>
>>> -Versus-
>>>
>>> SELECT user, page, COUNT(page) AS pCnt
>>> FROM pageviews
>>> WHERE rowtime BETWEEN now() - INTERVAL '1' HOUR AND now()  // only last hour
>>> GROUP BY user, page
>>>
>>> I understand:
>>>
>>>    - Watermarks are not used to pre-calculate the aggregates and save memory
>>>    - All events are stored "as is" until the end of the window
>>>
>>> Are my assumptions true?
>>>
>>>
>>
>