How to make my execution graph prettier?

2017-10-09 Thread Hao Sun
Hi, my execution graph looks like the following, with everything stuffed into one tile. [image: image.png] How can I get something like this?

Re: Consult about flink on mesos cluster

2017-10-09 Thread Bo Yu
Thanks, Till. I tried to set hard host attribute constraints in "flink-conf.yaml" as mesos.constraints.hard.hostattribute: rack:ak03-07,rack:ak16-10,rack:ak03-04, where "rack:akXX-XX" is the MESOS_attributes of each slave. I then end up in a situation where the Mesos app master doesn't accept the

Re: serialization error when using multiple metrics counters

2017-10-09 Thread Colin Williams
Thanks everyone, and thank you very much, Seth! Adding @transient to the lazy vals is what I needed. On Mon, Oct 9, 2017 at 1:34 PM, Seth Wiesman wrote: > When a Scala class contains a single lazy val, it is implemented using a boolean > flag to track whether the field has been

Re: Questions about checkpoints/savepoints

2017-10-09 Thread vipul singh
Thanks Stefan for the answers above. These are really helpful. I have a few followup questions: 1. I see my savepoints are created in a folder, which has a _metadata file and another file. Looking at the code

Re: Unusual log message - Emitter thread got interrupted

2017-10-09 Thread Ken Krugler
Hi Aljoscha, Thanks for responding. > On Oct 9, 2017, at 7:36 AM, Aljoscha Krettek > wrote: > > Hi, > > In my understanding this is the expected behaviour of the code. The only way > to shut down the Emitter is via an interrupt because it is

DataStream joining without window

2017-10-09 Thread Yan Zhou [FDS Science]
It seems that Flink only supports joining DataStreams within the same time window. Why is it restricted in this way? I think I can implement a TwoInputStreamOperator to join two DataStreams without considering the window, and inside the operator create two states to cache the records of the two streams and
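
The idea in the message above (cache the records of both inputs and join them without a window) can be sketched in plain Java. This is only an illustration under stated assumptions: `TwoInputJoin`, `processLeft`, and `processRight` are hypothetical names, the two `HashMap` caches stand in for operator state, and nothing here uses the Flink API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UnwindowedJoinSketch {

    /**
     * Hypothetical two-input join: buffers every record from both inputs per key
     * and emits a joined pair whenever a record finds matches on the other side.
     */
    static final class TwoInputJoin<K, L, R> {
        // Stand-ins for the "two states" that cache the records of each stream.
        private final Map<K, List<L>> leftCache = new HashMap<>();
        private final Map<K, List<R>> rightCache = new HashMap<>();
        private final List<String> output = new ArrayList<>();

        /** Handles a record from the first input. */
        void processLeft(K key, L value) {
            leftCache.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            for (R match : rightCache.getOrDefault(key, Collections.emptyList())) {
                output.add(value + "|" + match);
            }
        }

        /** Handles a record from the second input. */
        void processRight(K key, R value) {
            rightCache.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
            for (L match : leftCache.getOrDefault(key, Collections.emptyList())) {
                output.add(match + "|" + value);
            }
        }

        List<String> output() {
            return output;
        }
    }

    public static void main(String[] args) {
        TwoInputJoin<Integer, String, String> join = new TwoInputJoin<>();
        join.processLeft(1, "order-1");
        join.processRight(1, "payment-1");  // matches the cached order-1
        join.processRight(2, "payment-2");  // no left record for key 2 yet
        join.processLeft(1, "order-1b");    // matches the cached payment-1
        System.out.println(join.output()); // prints [order-1|payment-1, order-1b|payment-1]
    }
}
```

Note that both caches in this sketch grow without bound, so a real operator would presumably need some policy for expiring cached records.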

Re: Lost data when resuming from savepoint

2017-10-09 Thread Fabian Hueske
Hi Jose, I had a look at your program but did not spot anything. The query is a simple "SELECT FROM WHERE" query that does not have any state. So the only state is the state of the Kafka source, i.e., the offset. How much time passed between taking the savepoint and resuming? Did you see any

Re: serialization error when using multiple metrics counters

2017-10-09 Thread Seth Wiesman
When a Scala class contains a single lazy val, it is implemented using a boolean flag to track whether the field has been evaluated. When a class contains multiple lazy vals, they are implemented as a bit mask shared amongst the variables. This can lead to inconsistencies as to whether serialization forces

Re: PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Ufuk Celebi
Hey Niels, thanks for the detailed report. I don't think that it is related to the Hadoop or Scala version. I think the following happens: - Occasionally, one of your tasks seems to be extremely slow in registering its produced intermediate result (the data shuffled between TaskManagers) -

Re: serialization error when using multiple metrics counters

2017-10-09 Thread Stephan Ewen
Interesting. Is there a quirk in Scala whereby using multiple lazy variables may result in eager initialization of some of them? On Mon, Oct 9, 2017 at 4:37 PM, Kostas Kloudas wrote: > Hi Colin, > > Are you initializing your counters from within the open() method of you

Re: Consult about flink on mesos cluster

2017-10-09 Thread Till Rohrmann
Hi Bo, you can still use Flink with Marathon, because Marathon will only schedule the cluster entrypoint which is the MesosApplicationMasterRunner. Everything else will be scheduled via Fenzo. Moreover, by using Marathon you gain high availability because Marathon makes sure that the

Could not connect to netcat on SocketStreamWordCount example

2017-10-09 Thread Tay Zhen Shen
Hi, Following the guide on the official Flink page, I'm trying to run the SocketStreamWordCount example: $ nc -l 9000 However, after I start netcat with that command and try to run bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000, the netcat console returns a

Re: serialization error when using multiple metrics counters

2017-10-09 Thread Kostas Kloudas
Hi Colin, Are you initializing your counters from within the open() method of your rich function? In other words, are you calling counter = getRuntimeContext.getMetricGroup.counter("my counter") from within open()? The counter interface is not serializable. So if you instantiate the
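
The point above, that a non-serializable counter should be created in open() rather than when the function object is constructed, can be illustrated with a plain-Java sketch. It assumes only standard Java serialization: `SimpleCounter` and `CountingMapper` are hypothetical stand-ins, not Flink classes, and `roundTrip` merely imitates shipping the function to another JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientCounterSketch {

    /** Hypothetical stand-in for a metrics counter; deliberately NOT Serializable. */
    static final class SimpleCounter {
        private long count;

        void inc() {
            count++;
        }

        long getCount() {
            return count;
        }
    }

    /** Hypothetical stand-in for a rich function that is shipped via Java serialization. */
    static final class CountingMapper implements Serializable {
        private static final long serialVersionUID = 1L;

        // transient: skipped by Java serialization, so the mapper itself stays serializable.
        private transient SimpleCounter counter;

        /** Plays the role of open(): called on the worker after deserialization. */
        void open() {
            counter = new SimpleCounter();
        }

        String map(String value) {
            counter.inc();
            return value.toUpperCase();
        }

        long seen() {
            return counter.getCount();
        }
    }

    /** Serializes and deserializes an object, imitating shipping it to another JVM. */
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        // Serialization succeeds because the counter does not exist yet and is transient.
        CountingMapper shipped = roundTrip(new CountingMapper());
        shipped.open(); // the counter is created only after deserialization
        shipped.map("a");
        shipped.map("b");
        System.out.println(shipped.seen()); // prints 2
    }
}
```

If the `transient` modifier were dropped and the counter were created in the constructor, `roundTrip` would fail with a `NotSerializableException`.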

Re: Unusual log message - Emitter thread got interrupted

2017-10-09 Thread Aljoscha Krettek
Hi, In my understanding this is the expected behaviour of the code. The only way to shut down the Emitter is via an interrupt because it is otherwise blocking on the queue. If the Emitter had been interrupted while the operator is still running it would have gone down a different code path:

Re: Failing to recover once checkpoint fails

2017-10-09 Thread Aljoscha Krettek
Hi Vishal, Some relevant Jira issues for you are: - https://issues.apache.org/jira/browse/FLINK-4808: Allow skipping failed checkpoints - https://issues.apache.org/jira/browse/FLINK-4815:

Re: Bucketing/Rolling Sink: How to overwrite method "openNewPartFile" - to append a new timestamp to part file path every time a new part file is being created

2017-10-09 Thread Kostas Kloudas
Hi Raja, To know about the method, I suppose you have looked at the source code of the sink. I think that including the timestamp of the element in the file path is not as easy as overriding openNewPartFile. The reason is that the filenames serve as identities for the associated state of

Re: Windowing isn't applied per key

2017-10-09 Thread mclendenin
I am using Processing Time, so it is using the default timestamps and watermarks. I am running it with a parallelism of 3, and I can see each operator running at a parallelism of 3 on the Web UI. I am pulling data from a Kafka topic with 12 partitions.

Re: TM get killed/disconnected after a while

2017-10-09 Thread Patrick Lucas
Hi, Can you provide a bit more info about your setup, such as what Kubernetes resources you are using? (Deployments, Service) Is the pod running the taskmanager killed by Kubernetes or does it fail? Can you provide the output of kubectl describe pod and kubectl logs of the taskmanager pod that

Re: Consult about flink on mesos cluster

2017-10-09 Thread yubo
Thanks for your reply, Till. We will use Flink without Marathon, and hope the PR is merged into the latest version soon. Best regards, Bo > On Oct 9, 2017, at 6:36 PM, Till Rohrmann wrote: > > Hi Bo, > > Flink uses Fenzo internally to match tasks and offers. Fenzo does not

Re: RocksDB segfault inside timer when accessing/clearing state

2017-10-09 Thread Kien Truong
Hi Stephan, I guess this is the case. Our cluster is a bit overloaded network-wise, so sometimes a Task Manager gets disconnected, which causes a restart of the entire job, leading to multiple segfaults in other task managers and prolonging recovery. We're upgrading the network, hopefully the

Re: Checkpoint was declined (tasks not ready)

2017-10-09 Thread Karthik Deivasigamani
Hi Stephan, Once the job restarts due to an async io operator timeout we notice that its checkpoints never succeed again. But the job is running fine and is processing data. ~ Karthik On Mon, Oct 9, 2017 at 3:19 PM, Stephan Ewen wrote: > As long as this does not appear

Re: Checkpoint was declined (tasks not ready)

2017-10-09 Thread Stephan Ewen
As long as this does not appear all the time, but only once in a while, it should not be a problem. It simply means that this particular checkpoint could not be triggered, because some sources were not ready yet. It should try another checkpoint and then be okay. On Fri, Oct 6, 2017 at 4:53 PM,

Re: Fwd: Consult about flink on mesos cluster

2017-10-09 Thread Till Rohrmann
Hi Bo, Flink uses Fenzo internally to match tasks and offers. Fenzo does not support the Marathon constraints syntax you are referring to. At the moment, Flink only allows defining hard host attribute constraints, which means that you define a host attribute which has to match exactly. Fenzo also

PartitionNotFoundException when running in yarn-session.

2017-10-09 Thread Niels Basjes
Hi, I'm having some trouble running a Java-based Flink job in a yarn-session. The job itself consists of reading a set of files resulting in a DataStream (I use DataStream because in the future I intend to replace the files with a Kafka feed), then does some parsing and eventually writes the data

Re: Failing to recover once checkpoint fails

2017-10-09 Thread Fabian Hueske
Hi Vishal, it would be great if you could create a JIRA ticket with Blocker priority. Please add all relevant information of your detailed analysis, add a link to this email thread (see [1] for the web archive of the mailing list), and post the id of the JIRA issue here. Thanks for looking into