Re: Debugging Python-Api fails with NoClassDefFoundError

2017-01-04 Thread Mathias Peters
Yes, it is. Also, the project import in Idea has worked so far.

Cheers


On 04.01.2017 21:52, Ted Yu wrote:
> This class is in flink-core jar.
>
> Have you verified that the jar is on classpath ?
>
> Cheers
>
> On Wed, Jan 4, 2017 at 12:16 PM, Mathias Peters wrote:
>
> Hi,
>
> I just wanted to debug a custom python script using your python dataset
> api. Running the PythonPlanBinder in Intellij IDEA gives me the
> subjected error. I took a fresh clone, built it with mvn clean install
> -DskipTest, and imported everything in idea. Using an older version this
> worked fine, so I assume no(t the usual noob) errors on my side.
>
> Submitting a python script via console works.
>
> The full stack trace looks like this:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/flink/api/java/typeutils/TupleTypeInfoBase
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at 
> com.intellij.rt.execution.application.AppMain.main(AppMain.java:122)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.flink.api.java.typeutils.TupleTypeInfoBase
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 3 more
>
>
> Thanks for the help.
>
> best
>
> Mathias
>


Re: 1.1.4 IntelliJ Problem

2017-01-04 Thread Stephan Epping
Hey Yuri,

thanks a lot. It was flink-spector that was requiring flink-test-utils 1.1.0

best,
Stephan

> On 04 Jan 2017, at 13:17, Yury Ruchin  wrote:
> 
> Hi Stephan,  
> 
> It looks like you have libraries from different versions of Flink 
> distribution on the same classpath.
> 
> ForkableFlinkMiniCluster resides in flink-test-utils. As of distribution 
> version 1.1.3 it invokes JobManager.startJobManagerActors() with 6 arguments. 
> The signature changed by 1.1.4, and ForkableFlinkMiniCluster now invokes the 
> method with 8 arguments of different types. This might mean that 
> flink-test-utils library on your classpath has version 1.1.3 whereas 
> flink-runtime has 1.1.4.
> 
> You might want to thoroughly inspect your classpath to ensure that every 
> Flink-related dependency has version 1.1.4.
> 
> Regards,
> Yury
> 
> 2017-01-04 11:20 GMT+03:00 Stephan Epping:
> I also changed the scala version of the packages/artifacts to 2.11, with no 
> success.
> Further, I am not deeply familiar with maven or java dependency management at 
> all.
> 
> best Stephan
> 
>> On 03 Jan 2017, at 22:06, Stephan Ewen wrote:
>> 
>> Hi!
>> 
>> It is probably some inconsistent configuration in the IDE.
>> 
>> It often helps to do "Maven->Reimport" or use "restart and clear caches".
>> 
>> 
>> On Tue, Jan 3, 2017 at 9:48 AM, Stephan Epping wrote:
>> Hey,
>> 
>> thanks for the reply. I didn’t change the scala version, as it worked 
>> before. I just changed the flink version in my pom.xml; that's it, a one line
>> change.
>> Maybe you could elaborate a bit more, what I can do to change the scala 
>> version?
>> 
>> best Stephan
>> 
>> 
>>> On 03 Jan 2017, at 03:11, Kurt Young wrote:
>>> 
>>> Seems like you didn't setup the correct scala SDK
>>> 
>>> best,
>>> Kurt
>>> 
>>> On Mon, Jan 2, 2017 at 10:41 PM, Stephan Epping wrote:
>>> Hi,
>>> 
>>> I am getting this error running my tests with 1.1.4 inside intellij ide.
>>> 
>>> java.lang.NoSuchMethodError: 
>>> org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(Lorg/apache/flink/configuration/Configuration;Lakka/actor/ActorSystem;Lscala/Option;Lscala/Option;Ljava/lang/Class;Ljava/lang/Class;)Lscala/Tuple2;
>>> 
>>> at 
>>> org.apache.flink.test.util.ForkableFlinkMiniCluster.startJobManager(ForkableFlinkMiniCluster.scala:103)
>>> at 
>>> org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:292)
>>> at 
>>> org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:286)
>>> at 
>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> at 
>>> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>> at scala.collection.immutable.Range.foreach(Range.scala:141)
>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>> at 
>>> org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:286)
>>> at 
>>> org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:277)
>>> at 
>>> org.apache.flink.test.util.ForkableFlinkMiniCluster.start(ForkableFlinkMiniCluster.scala:255)
>>> at 
>>> org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:152)
>>> at 
>>> org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:126)
>>> at 
>>> org.flinkspector.datastream.DataStreamTestEnvironment.createTestEnvironment(DataStreamTestEnvironment.java:72)
>>> 
>>> Any ideas?
>>> 
>>> best,
>>> Stephan
>>> 
>>> 
>> 
>> 
> 
> 



Re: Are heterogeneous DataStreams possible?

2017-01-04 Thread ljwagerfield
I should add: the operators determine how to handle each message by
inspecting the message's SCHEMA_ID field (every message has a SCHEMA_ID as
its first field).
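
For illustration, a minimal sketch of dispatching on such a schema id inside a single
operator (the record class, field names and schema ids below are hypothetical, not part
of the original setup):

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.util.Collector;

/** Hypothetical generic record: a schema id plus an untyped field map. */
public class DynamicRecord implements Serializable {
    public int schemaId;
    public Map<String, Object> fields = new HashMap<>();
}

/** Routes every record through the same operator and branches on the schema id. */
class SchemaDispatchingFlatMap implements FlatMapFunction<DynamicRecord, String> {
    @Override
    public void flatMap(DynamicRecord record, Collector<String> out) {
        switch (record.schemaId) {
            case 1: // e.g. a click event schema
                out.collect("click from " + record.fields.get("userId"));
                break;
            case 2: // e.g. a purchase event schema
                out.collect("purchase of " + record.fields.get("amount"));
                break;
            default:
                // unknown schema: drop it or route it to a dead-letter sink
        }
    }
}

A DataStream of DynamicRecord read from the Kafka topic can then be processed with
stream.flatMap(new SchemaDispatchingFlatMap()) regardless of which schemas show up at runtime.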





Are heterogeneous DataStreams possible?

2017-01-04 Thread ljwagerfield
Our data's schema is defined by our users and is not known at compile time.

All data arrives via a single Kafka topic and is serialized using the
same serialization tech (to be defined). 

We want to use King.com's RBEA technique to process this data in different
ways at runtime (depending on its schema), using a single topology/DAG.

Therefore, each message passing through the DAG will have a different
schema.

---

My question is, what's the best way to implement a system like this, where
each message may have a different schema, and none of the schemas are known
at compile time, but must use the same DAG?

I've tried using an 'array of heterogeneous tuples' which appears to work
fine when playing around in the IDE, but before I continue too far down that
route, I just wanted to verify if there were any known methods for doing
this?

Thanks!
Lawrence





Re: How do I ensure binary comparisons are being used?

2017-01-04 Thread ljwagerfield
Thank you Fabian :)





Some questions about Flink stream SQL

2017-01-04 Thread Hongyuhong
Hi,
We are currently exploring Flink streaming SQL.
I see that the group-window has been implemented in the Table API, and the row-window is
also planned in FLIP-11. It seems that the row-window grammar is closer to
Calcite's OVER clause.
I'm curious about the detailed plan and roadmap for streaming SQL, because FLIP-11 only
mentions the Table API.
Will streaming SQL implement the row-window first? Or, if the group-window is
considered, what is the status of the Calcite integration?
The issue in the Calcite JIRA was raised in August; what's the status right now?
Thanks in advance!

Regards
Yuhong





Rapidly failing job eventually causes "Not enough free slots"

2017-01-04 Thread Shannon Carey
In Flink 1.1.3 on emr-5.2.0, I've experienced a particular problem twice and 
I'm wondering if anyone has some insight about it.

In both cases, we deployed a job that fails very frequently (within 15s-1m of 
launch). Eventually, the Flink cluster dies.

The sequence of events looks something like this:

  *   bad job is launched
  *   bad job fails & is restarted many times (I didn't have the "failure-rate" 
restart strategy configuration right)
  *   Task manager logs: org.apache.flink.yarn.YarnTaskManagerRunner (SIGTERM 
handler): RECEIVED SIGNAL 15: SIGTERM. Shutting down as requested.
  *   At this point, the YARN resource manager also logs the container failure
  *   More logs: Container 
ResourceID{resourceId='container_1481658997383_0003_01_13'} failed. Exit 
status: Pmem limit exceeded (-104)
  *
Diagnostics for container 
ResourceID{resourceId='container_1481658997383_0003_01_13'} in state 
COMPLETE : exitStatus=Pmem limit exceeded (-104) diagnostics=Container 
[pid=21246,containerID=container_1481658997383_0003_01_13] is running 
beyond physical memory limits. Current usage: 5.6 GB of 5.6 GB physical memory 
used; 9.6 GB of 28.1 GB virtual memory used. Killing container.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Total number of failed containers so far: 12
Stopping YARN session because the number of failed containers (12) exceeded the 
maximum failed containers (11). This number is controlled by the 
'yarn.maximum-failed-containers' configuration setting. By default its the 
number of requested containers.
  *   From here onward, the logs repeatedly show that jobs fail to restart due 
to "org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: 
Not enough free slots available to run the job. You can decrease the operator 
parallelism or increase the number of slots per TaskManager in the 
configuration. Task to schedule: < Attempt #68 (Source: …) @ (unassigned) - 
[SCHEDULED] > with groupID < 73191c171abfff61fb5102c161274145 > in sharing 
group < SlotSharingGroup [73191c171abfff61fb5102c161274145, 
19596f7834805c8409c419f0edab1f1b] >. Resources available to scheduler: Number 
of instances=0, total number of slots=0, available slots=0"
  *   Eventually, Flink stops for some reason (with another SIGTERM message), 
presumably because of YARN

Does anyone have an idea why a bad job repeatedly failing would eventually 
result in the Flink cluster dying?

Any idea why I'd get "Pmem limit exceeded" or "Not enough free slots available 
to run the job"? The JVM heap usage and the free memory on the machines both 
look reasonable in my monitoring dashboards. Could it possibly be a memory leak 
due to classloading or something?

Thanks for any help or suggestions you can provide! I am hoping that the 
"failure-rate" restart strategy will help avoid this issue in the future, but 
I'd also like to understand what's making the cluster die so that I can prevent 
it.
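
For reference, a minimal sketch of setting the failure-rate restart strategy
programmatically (this assumes the RestartStrategies API of the 1.1.x line; the concrete
thresholds below are purely illustrative):

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FailureRateRestartSetup {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Allow at most 3 failures within a 5 minute window, waiting 10 seconds between
        // restart attempts. Once the rate is exceeded the job fails permanently instead
        // of being restarted indefinitely and eating up the cluster's slots/containers.
        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                3,                             // max failures per measurement interval
                Time.of(5, TimeUnit.MINUTES),  // measurement interval
                Time.of(10, TimeUnit.SECONDS)  // delay between restart attempts
        ));

        // ... sources, transformations and sinks go here ...
        // env.execute("my-job");
    }
}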

-Shannon


Re: Debugging Python-Api fails with NoClassDefFoundError

2017-01-04 Thread Ted Yu
This class is in flink-core jar.

Have you verified that the jar is on classpath ?
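
One quick way to check is to print, from the same run configuration that fails, where the
JVM actually resolves the class from (a small diagnostic sketch; it throws if the class is
not on the classpath at all):

public class ClasspathCheck {
    public static void main(String[] args) throws Exception {
        // Resolves the class and prints the jar (or classes directory) it was loaded from.
        Class<?> clazz = Class.forName("org.apache.flink.api.java.typeutils.TupleTypeInfoBase");
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
    }
}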

Cheers

On Wed, Jan 4, 2017 at 12:16 PM, Mathias Peters wrote:

> Hi,
>
> I just wanted to debug a custom python script using your python dataset
> api. Running the PythonPlanBinder in Intellij IDEA gives me the
> subjected error. I took a fresh clone, built it with mvn clean install
> -DskipTest, and imported everything in idea. Using an older version this
> worked fine, so I assume no(t the usual noob) errors on my side.
>
> Submitting a python script via console works.
>
> The full stack trace looks like this:
>
> Exception in thread "main" java.lang.NoClassDefFoundError:
> org/apache/flink/api/java/typeutils/TupleTypeInfoBase
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at com.intellij.rt.execution.application.AppMain.main(AppMain.java:122)
> Caused by: java.lang.ClassNotFoundException:
> org.apache.flink.api.java.typeutils.TupleTypeInfoBase
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 3 more
>
>
> Thanks for the help.
>
> best
>
> Mathias
>
>
>


Debugging Python-Api fails with NoClassDefFoundError

2017-01-04 Thread Mathias Peters
Hi,

I just wanted to debug a custom python script using your python dataset
api. Running the PythonPlanBinder in Intellij IDEA gives me the
subjected error. I took a fresh clone, built it with mvn clean install
-DskipTest, and imported everything in idea. Using an older version this
worked fine, so I assume no(t the usual noob) errors on my side.

Submitting a python script via console works.

The full stack trace looks like this:

Exception in thread "main" java.lang.NoClassDefFoundError:
org/apache/flink/api/java/typeutils/TupleTypeInfoBase
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:122)
Caused by: java.lang.ClassNotFoundException:
org.apache.flink.api.java.typeutils.TupleTypeInfoBase
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more


Thanks for the help.

best

Mathias



RE: Serializing NULLs

2017-01-04 Thread Newport, Billy
A map whose values are declared nullable (a union of null and the value type) in your Avro
schema is what you want here if the map values can be null.
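
As a sketch, this is what such a schema looks like when built programmatically with the
Avro Java API (the double value type is just an example picked to match the error quoted
further down in this thread):

import java.util.Arrays;

import org.apache.avro.Schema;

public class NullableMapSchema {
    public static void main(String[] args) {
        // The value type is a union of null and double, so map entries may be null.
        Schema nullableDouble = Schema.createUnion(Arrays.asList(
                Schema.create(Schema.Type.NULL),
                Schema.create(Schema.Type.DOUBLE)));

        // A map whose values use the nullable union above.
        Schema mapSchema = Schema.createMap(nullableDouble);

        System.out.println(mapSchema); // {"type":"map","values":["null","double"]}
    }
}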


From: Anirudh Mallem [mailto:anirudh.mal...@247-inc.com]
Sent: Tuesday, December 20, 2016 2:26 PM
To: user@flink.apache.org
Subject: Re: Serializing NULLs

If you are using Avro generated classes then you cannot have your values null.
https://cwiki.apache.org/confluence/display/AVRO/FAQ#FAQ-Whyisn'teveryvalueinAvronullable?

From: Stephan Ewen
Reply-To: "user@flink.apache.org"
Date: Tuesday, December 20, 2016 at 8:17 AM
To: "user@flink.apache.org"
Subject: Re: Serializing NULLs

Thanks for sharing the stack trace.

This seems not really Flink related, it is part of the specific Avro encoding 
logic.
The Avro Generic Record Type apparently does not allow the map value to be null.



On Tue, Dec 20, 2016 at 4:55 PM, Matt wrote:
Here is the back trace: 
https://gist.github.com/56af4818bcf5dee6b97c248fd9233c67

In the meantime I've solved the issue by creating a POJO class where null is represented
as Long.MIN_VALUE; that, together with a custom equals(), did the trick. I guess it's
not as fast as de/serializing Double though.

If you need any other information let me know.

Regards,
Matt

On Tue, Dec 20, 2016 at 6:46 AM, Stephan Ewen wrote:
The "null" support in some types is not fully developed. However in that case I 
am wondering why it does not work. Can you share the stack trace, so we can 
take a look at the serializer?



On Mon, Dec 19, 2016 at 9:56 PM, Matt wrote:
Hello list,

I'm getting this error:

java.lang.RuntimeException: Could not forward element to next operator
...
Caused by: java.lang.NullPointerException: in com.entities.Sector in map in 
double null of double of map in field properties of com.entities.Sector
...
Caused by: java.lang.NullPointerException

The field mentioned is a HashMap, and some keys are mapped to 
null values.

Why isn't it possible to forward/serialize those elements with null values?
What do you do when your elements may contain nulls?

Regards,
Matt





Re: Flink Checkpoint runs slow for low load stream

2017-01-04 Thread Fabian Hueske
Hi CVP,

we recently released Flink 1.1.4, i.e., the next bugfix release of the 1.1.x
series with major robustness improvements [1].
You might want to give 1.1.4 a try as well.

Best, Fabian

[1] http://flink.apache.org/news/2016/12/21/release-1.1.4.html

2017-01-04 16:51 GMT+01:00 Chakravarthy varaga :

> Hi Stephan, All,
>
>  I just got a chance to try whether 1.1.3 fixes the slow checkpointing on the FS
> backend. It seems to have been fixed. Thanks for the fix.
>
>  While testing this with varying checkpoint intervals, there seem to
> be spikes of slow checkpoints every 30/40 seconds for an interval of 15
> secs. The checkpoint time lasts for about 300 ms as opposed to 10/20 ms.
>  Basically, 15 secs seems to be the nominal value so far; anything below
> this interval triggers the spikes too often. For us, living with a 15 sec
> recovery is doable, and we eventually catch up on recovery.
>
> Best Regards
> CVP
>
> On Tue, Oct 4, 2016 at 6:20 PM, Chakravarthy varaga <
> chakravarth...@gmail.com> wrote:
>
>> Thanks for your prompt response Stephan.
>>
>> I'd wait for Flink 1.1.3 !!!
>>
>> Best Regards
>> Varaga
>>
>> On Tue, Oct 4, 2016 at 5:36 PM, Stephan Ewen  wrote:
>>
>>> The plan to release 1.1.3 is asap ;-)
>>>
>>> Waiting for last backported patched to get in, then release testing and
>>> release.
>>>
>>> If you want to test it today, you would need to manually build the
>>> release-1.1 branch.
>>>
>>> Best,
>>> Stephan
>>>
>>>
>>> On Tue, Oct 4, 2016 at 5:46 PM, Chakravarthy varaga <
>>> chakravarth...@gmail.com> wrote:
>>>
 Hi Gordon,

  Do I need to clone and build release-1.1 branch to test this?
  I currently use the Flink 1.1.2 runtime. When is the plan to release
 it in 1.1.3?

 Best Regards
 Varaga

 On Tue, Oct 4, 2016 at 9:25 AM, Tzu-Li (Gordon) Tai <
 tzuli...@apache.org> wrote:

> Hi,
>
> Helping out here: this is the PR for async Kafka offset committing -
> https://github.com/apache/flink/pull/2574.
> It has already been merged into the master and release-1.1 branches,
> so you can try out the changes now if you’d like.
> The change should also be included in the 1.1.3 release, which the
> Flink community is discussing to release soon.
>
> Will definitely be helpful if you can provide feedback afterwards!
>
> Best Regards,
> Gordon
>
>
> On October 3, 2016 at 9:40:14 PM, Chakravarthy varaga (
> chakravarth...@gmail.com) wrote:
>
> Hi Stephan,
>
> Is the Async kafka offset commit released in 1.3.1?
>
> Varaga
>
> On Wed, Sep 28, 2016 at 9:49 AM, Chakravarthy varaga <
> chakravarth...@gmail.com> wrote:
>
>> Hi Stephan,
>>
>>  That should be great. Let me know once the fix is done and the
>> snapshot version to use, I'll check and revert then.
>>  Can you also share the JIRA that tracks the issue?
>>
>>  With regards to offset commit issue, I'm not sure as to how to
>> proceed here. Probably I'll use your fix first and see if the problem
>> reoccurs.
>>
>> Thanks much
>> Varaga
>>
>> On Tue, Sep 27, 2016 at 7:46 PM, Stephan Ewen 
>> wrote:
>>
>>> @CVP
>>>
>>> Flink stores in checkpoints in your case only the Kafka offsets (few
>>> bytes) and the custom state (e).
>>>
>>> Here is an illustration of the checkpoint and what is stored (from
>>> the Flink docs).
>>> https://ci.apache.org/projects/flink/flink-docs-master/inter
>>> nals/stream_checkpointing.html
>>>
>>>
>>> I am quite puzzled why the offset committing problem occurs only for
>>> one input, and not for the other.
>>> I am preparing a fix for 1.2, possibly going into 1.1.3 as well.
>>> Could you try out a snapshot version to see if that fixes your
>>> problem?
>>>
>>> Greetings,
>>> Stephan
>>>
>>>
>>>
>>> On Tue, Sep 27, 2016 at 2:24 PM, Chakravarthy varaga <
>>> chakravarth...@gmail.com> wrote:
>>>
 Hi Stefan,

  Thanks a million for your detailed explanation. I appreciate
 it.

  -  The *zookeeper bundled with kafka 0.9.0.1* was used to
 start zookeeper. There is only 1 instance (standalone) of zookeeper 
 running
 on my localhost (ubuntu 14.04)
  -  There is only 1 Kafka broker (*version: 0.9.0.1* )

  With regards to Flink cluster there's only 1 JM & 2 TMs
 started with no HA. I presume this does not use zookeeper anyways as it
 runs as standalone cluster.


  BTW., The kafka connector version that I use is as suggested
 in the flink connectors page




 *.
 org.apache.flink
 flink-connector-kafka-0.9_2.10

Re: Flink Checkpoint runs slow for low load stream

2017-01-04 Thread Chakravarthy varaga
Hi Stephan, All,

 I just got a chance to try whether 1.1.3 fixes the slow checkpointing on the FS
backend. It seems to have been fixed. Thanks for the fix.

 While testing this with varying checkpoint intervals, there seem to
be spikes of slow checkpoints every 30/40 seconds for an interval of 15
secs. The checkpoint time lasts for about 300 ms as opposed to 10/20 ms.
 Basically, 15 secs seems to be the nominal value so far; anything below
this interval triggers the spikes too often. For us, living with a 15 sec
recovery is doable, and we eventually catch up on recovery.

Best Regards
CVP

On Tue, Oct 4, 2016 at 6:20 PM, Chakravarthy varaga <
chakravarth...@gmail.com> wrote:

> Thanks for your prompt response Stephan.
>
> I'd wait for Flink 1.1.3 !!!
>
> Best Regards
> Varaga
>
> On Tue, Oct 4, 2016 at 5:36 PM, Stephan Ewen  wrote:
>
>> The plan to release 1.1.3 is asap ;-)
>>
>> Waiting for last backported patched to get in, then release testing and
>> release.
>>
>> If you want to test it today, you would need to manually build the
>> release-1.1 branch.
>>
>> Best,
>> Stephan
>>
>>
>> On Tue, Oct 4, 2016 at 5:46 PM, Chakravarthy varaga <
>> chakravarth...@gmail.com> wrote:
>>
>>> Hi Gordon,
>>>
>>>  Do I need to clone and build release-1.1 branch to test this?
>>>  I currently use the Flink 1.1.2 runtime. When is the plan to release
>>> it in 1.1.3?
>>>
>>> Best Regards
>>> Varaga
>>>
>>> On Tue, Oct 4, 2016 at 9:25 AM, Tzu-Li (Gordon) Tai wrote:
>>>
 Hi,

 Helping out here: this is the PR for async Kafka offset committing -
 https://github.com/apache/flink/pull/2574.
 It has already been merged into the master and release-1.1 branches, so
 you can try out the changes now if you’d like.
 The change should also be included in the 1.1.3 release, which the
 Flink community is discussing to release soon.

 Will definitely be helpful if you can provide feedback afterwards!

 Best Regards,
 Gordon


 On October 3, 2016 at 9:40:14 PM, Chakravarthy varaga (
 chakravarth...@gmail.com) wrote:

 Hi Stephan,

 Is the Async kafka offset commit released in 1.3.1?

 Varaga

 On Wed, Sep 28, 2016 at 9:49 AM, Chakravarthy varaga <
 chakravarth...@gmail.com> wrote:

> Hi Stephan,
>
>  That should be great. Let me know once the fix is done and the
> snapshot version to use, I'll check and revert then.
>  Can you also share the JIRA that tracks the issue?
>
>  With regards to offset commit issue, I'm not sure as to how to
> proceed here. Probably I'll use your fix first and see if the problem
> reoccurs.
>
> Thanks much
> Varaga
>
> On Tue, Sep 27, 2016 at 7:46 PM, Stephan Ewen 
> wrote:
>
>> @CVP
>>
>> Flink stores in checkpoints in your case only the Kafka offsets (few
>> bytes) and the custom state (e).
>>
>> Here is an illustration of the checkpoint and what is stored (from
>> the Flink docs).
>> https://ci.apache.org/projects/flink/flink-docs-master/inter
>> nals/stream_checkpointing.html
>>
>>
>> I am quite puzzled why the offset committing problem occurs only for
>> one input, and not for the other.
>> I am preparing a fix for 1.2, possibly going into 1.1.3 as well.
>> Could you try out a snapshot version to see if that fixes your
>> problem?
>>
>> Greetings,
>> Stephan
>>
>>
>>
>> On Tue, Sep 27, 2016 at 2:24 PM, Chakravarthy varaga <
>> chakravarth...@gmail.com> wrote:
>>
>>> Hi Stefan,
>>>
>>>  Thanks a million for your detailed explanation. I appreciate it.
>>>
>>>  -  The *zookeeper bundled with kafka 0.9.0.1* was used to
>>> start zookeeper. There is only 1 instance (standalone) of zookeeper 
>>> running
>>> on my localhost (ubuntu 14.04)
>>>  -  There is only 1 Kafka broker (*version: 0.9.0.1* )
>>>
>>>  With regards to Flink cluster there's only 1 JM & 2 TMs started
>>> with no HA. I presume this does not use zookeeper anyways as it runs as
>>> standalone cluster.
>>>
>>>
>>>  BTW., The kafka connector version that I use is as suggested in
>>> the flink connectors page
>>>
>>>
>>>
>>>
>>> <dependency>
>>>   <groupId>org.apache.flink</groupId>
>>>   <artifactId>flink-connector-kafka-0.9_2.10</artifactId>
>>>   <version>1.1.1</version>
>>> </dependency>
>>>
>>>  Do you see any issues with versions?
>>>
>>>  1) Do you have benchmarks wrt., to checkpointing in flink?
>>>
>>>  2) There isn't detailed explanation on what states are stored
>>> as part of the checkpointing process. For ex.,  If I have pipeline like
>>> *source -> map -> keyBy -> map -> sink, my assumption on what's
>>> stored is:*
>>>
>>> * a) The source stream's custom watermarked records*

Re: Reading worker-local input files

2017-01-04 Thread Robert Schmidtke
Hi Fabian,

thanks for your directions! They worked flawlessly. I am aware of the
reduced robustness, but then again my input is only available on each
worker and not replicated. In case anyone is wondering, here is how I did
it:
https://github.com/robert-schmidtke/hdfs-statistics-adapter/tree/2a456a832c9d84b324a966c431171f761f3444f5

Thanks again!
Robert
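
For anyone else landing on this thread, a rough sketch of the split-generation part of
Fabian's recipe quoted below (class names, the host list and the slots-per-TM parameter
are made up for illustration; see the linked repository for the actual implementation):

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Sketch of steps 1-3: create one split per (taskmanager, slot) pair and only ever hand a
 * split to the host it was created for. In a real job this logic would live in a custom
 * InputFormat's createInputSplits() and in a custom input split assigner.
 */
public class LocalSplitSketch {

    /** A split that carries the host it belongs to and a host-local index. */
    static class LocalSplit {
        final String host;
        final int localIndex;

        LocalSplit(String host, int localIndex) {
            this.host = host;
            this.localIndex = localIndex;
        }
    }

    /** One split per parallel task, i.e. hosts x slotsPerHost splits in total. */
    static List<LocalSplit> createSplits(List<String> hosts, int slotsPerHost) {
        List<LocalSplit> splits = new ArrayList<>();
        for (String host : hosts) {
            for (int slot = 0; slot < slotsPerHost; slot++) {
                splits.add(new LocalSplit(host, slot));
            }
        }
        return splits;
    }

    /** Assign a split only to the host it was generated for; never to another worker. */
    static LocalSplit nextSplitFor(String requestingHost, List<LocalSplit> remaining) {
        Iterator<LocalSplit> it = remaining.iterator();
        while (it.hasNext()) {
            LocalSplit split = it.next();
            if (split.host.equals(requestingHost)) {
                it.remove();
                return split;
            }
        }
        return null; // nothing left to read on this host
    }
}

Inside open(), each task then uses its split's localIndex (together with slotsPerHost) to
pick a deterministic, non-overlapping subset of the files in its worker-local folder.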

On Tue, Dec 27, 2016 at 4:36 PM, Fabian Hueske  wrote:

>
> Hi Robert,
>
> this is indeed a bit tricky to do. The problem is mostly with the
> generation of the input splits, setup of Flink, and the scheduling of tasks.
>
> 1) you have to ensure that on each worker at least one DataSource task is
> scheduled. The easiest way to do this is to have a bare metal setup (no
> YARN) and a single TaskManager per worker. Each TM should have the same
> number of slots and the DataSource should have a parallelism of #TMs *
> slots. This will ensure that the same number of DataSource tasks is started
> on each worker.
>
> 2) you need to tweak the input split generation. Flink's FileInputFormat
> assume that it can access all files to be read via a distributed file
> system. Your InputFormat should have a parameter for the list of
> taskmanager (hostnames, IP addresses) and the number of slots per TM. The
> InputFormat.createInputSplits() should create one input split for each
> parallel task. Each split should have (hostname, local index)
>
> 3) you need to tweak the input split assignment. You need to provide a
> custom input split provider that ensures that splits are only assigned to
> the correct task manager. Otherwise it might happen that a TaskManager
> processes the split of another TM and some data is read twice while other
> data is not read at all.
>
> 4) once a split is assigned to a task the InputFormat.open() method is
> called. Based on the local index, the task should decide which files (or
> parts of files) it needs to read. This decision must be deterministic (only
> depend on local index) and ensure that all data (files / parts of files)
> are read exactly once (you'll need the number of slots per host for that).
>
> You see, this is not trivial. Moreover, such a setup is not flexible,
> quite fragile, and not fault tolerant.
> Since you need to read local files are not available anywhere else, your
> job will fail if a TM goes down.
>
> If possible, I would recommend to move the data into a distributed file
> system.
>
> Best,
> Fabian
>
> 2016-12-27 13:04 GMT+01:00 Robert Schmidtke :
>
>> Hi everyone,
>>
>> I'm using Flink and/or Hadoop on my cluster, and I'm having them generate
>> log data in each worker node's /local folder (regular mount point). Now I
>> would like to process these files using Flink, but I'm not quite sure how I
>> could tell Flink to use each worker node's /local folder as input path,
>> because I'd expect Flink to look in the /local folder of the submitting
>> node only. Do I have to put these files into HDFS or is there a way to tell
>> Flink the file:///local file URI refers to worker-local data? Thanks in
>> advance for any hints and best
>>
>> Robert
>>
>> --
>> My GPG Key ID: 336E2680
>>
>
>


-- 
My GPG Key ID: 336E2680


Re: Question about Scheduling of Batch Jobs

2017-01-04 Thread Konstantin Knauf
Hi Fabian,

I see, thank's for the quick explanation.

Cheers,

Konstantin


On 04.01.2017 14:15, Fabian Hueske wrote:
> Hi Konstantin,
> 
> the DataSet API tries to execute all operators as soon as possible.
> 
> I assume that in your case, Flink does not do this because it tries to
> avoid a deadlock.
> A dataflow which replicates data from the same source and joins it again
> might get deadlocked because all pipelines need to make progress in
> order to finish the source.
> 
> Think of a simple example like this:
> 
>/-- Map1 --\
> Src --<  >-Join
>\-- Map2 --/
> 
> If the join is executed as a hash join, one input (Map1) is used to
> build a hash table. Only once the hash table is built, the other input
> (Map2) can be consumed.
> If both Map operators would run at the same time, Map2 would stall at
> some point because it cannot emit anymore data due to the backpressure
> of the not-yet-opened probe input of the hash join.
> Once Map2 stalls, the Source would stall and Map1 could not continue to
> finish the build side. At this point we have a deadlock.
> 
> Flink detects these situations and adds an artificial pipeline breaker
> in the dataflow to prevent deadlocks. Due to the pipeline breaker, the
> build side is completed before the probe side input is processed.
> 
> This also answers the question, which operator is executed first: the
> operator on the build side of the first join. Hence the join strategy of
> the optimizer (BUILD_FIRST, BUILD_SECOND) decides.
> You can also give a manual JoinHint to control that. If you give a
> SORT_MERGE hint, all three operators should run concurrently because
> both join inputs will be concurrently consumed for sorting.
> 
> Best, Fabian
> 
> 
> 2017-01-04 13:30 GMT+01:00 Konstantin Knauf:
> 
> Hi everyone,
> 
> I have a basic question regarding scheduling of batch programs. Let's
> take the following graph:
> 
>   -> Group Combine -> ...
> /
> Source > Group Combine -> ...
> \
>   -> Map -> ...
> 
> So, a source and followed by three operators with ship strategy
> "Forward" and exchange mode "pipelined".
> 
> The three flows are later joined again, so that this results in a single
> job.
> 
> When the job is started, first, only one of the operators immediately
> receive the input read by the source and can therefore run concurrently
> with the source. Once the source is finished, the other two operators
> are scheduled.
> 
> Two questions about this:
> 
> 1) Why doesn't the source forward the records to all three operators
> while still running?
> 2) How does the jobmanager decide, which of the three operators
> receivese the pipelined data first?
> 
> Cheers and Thanks,
> 
> Konstantin
> 
> 
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com
>  * +49-174-3413182
> 
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
> 
> 

-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082





Re: Question about Scheduling of Batch Jobs

2017-01-04 Thread Fabian Hueske
Hi Konstantin,

the DataSet API tries to execute all operators as soon as possible.

I assume that in your case, Flink does not do this because it tries to
avoid a deadlock.
A dataflow which replicates data from the same source and joins it again
might get deadlocked because all pipelines need to make progress in order
to finish the source.

Think of a simple example like this:

   /-- Map1 --\
Src --<  >-Join
   \-- Map2 --/

If the join is executed as a hash join, one input (Map1) is used to build a
hash table. Only once the hash table is built, the other input (Map2) can
be consumed.
If both Map operators would run at the same time, Map2 would stall at some
point because it cannot emit anymore data due to the backpressure of the
not-yet-opened probe input of the hash join.
Once Map2 stalls, the Source would stall and Map1 could not continue to
finish the build side. At this point we have a deadlock.

Flink detects these situations and adds an artificial pipeline breaker in
the dataflow to prevent deadlocks. Due to the pipeline breaker, the build
side is completed before the probe side input is processed.

This also answers the question, which operator is executed first: the
operator on the build side of the first join. Hence the join strategy of
the optimizer (BUILD_FIRST, BUILD_SECOND) decides.
You can also give a manual JoinHint to control that. If you give a
SORT_MERGE hint, all three operators should run concurrently because both
join inputs will be concurrently consumed for sorting.
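
A minimal sketch of passing such a hint on the DataSet API (the data and key fields are
made up; the hint enum is JoinOperatorBase.JoinHint):

import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class JoinHintExample {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<Integer, String>> left =
                env.fromElements(Tuple2.of(1, "a"), Tuple2.of(2, "b"));
        DataSet<Tuple2<Integer, Long>> right =
                env.fromElements(Tuple2.of(1, 10L), Tuple2.of(2, 20L));

        // With a sort-merge hint both join inputs are consumed concurrently for sorting,
        // so neither side has to be fully materialized before the other makes progress.
        left.join(right, JoinHint.REPARTITION_SORT_MERGE)
            .where(0)
            .equalTo(0)
            .print();
    }
}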

Best, Fabian


2017-01-04 13:30 GMT+01:00 Konstantin Knauf :

> Hi everyone,
>
> I have a basic question regarding scheduling of batch programs. Let's
> take the following graph:
>
>   -> Group Combine -> ...
> /
> Source > Group Combine -> ...
> \
>   -> Map -> ...
>
> So, a source and followed by three operators with ship strategy
> "Forward" and exchange mode "pipelined".
>
> The three flows are later joined again, so that this results in a single
> job.
>
> When the job is started, first, only one of the operators immediately
> receive the input read by the source and can therefore run concurrently
> with the source. Once the source is finished, the other two operators
> are scheduled.
>
> Two questions about this:
>
> 1) Why doesn't the source forward the records to all three operators
> while still running?
> 2) How does the jobmanager decide, which of the three operators
> receivese the pipelined data first?
>
> Cheers and Thanks,
>
> Konstantin
>
>
> --
> Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
> Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
> Sitz: Unterföhring * Amtsgericht München * HRB 135082
>
>


Re: Flink streaming questions

2017-01-04 Thread Fabian Hueske
Hi Henri,

can you express the logic of your FoldFunction (or WindowFunction) as a
combination of ReduceFunction and WindowFunction [1]?
ReduceFunction should be supported by a merging WindowAssigner and has the
same resource consumption as a FoldFunction, i.e., a single record per
window.

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/windows.html#windowfunction-with-incremental-aggregation
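
A minimal sketch of that combination on an event-time session window (the event type, gap
and aggregation below are made up; this assumes the WindowedStream.apply(ReduceFunction,
WindowFunction) overload described in the linked 1.2 docs):

import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

public class SessionReduceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // (userId, count) events; a real job would read these from Kafka with timestamps
        // and watermarks assigned.
        DataStream<Tuple2<String, Long>> events =
                env.fromElements(Tuple2.of("user-1", 1L), Tuple2.of("user-1", 1L));

        events.keyBy(0)
              .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
              // The ReduceFunction keeps a single pre-aggregated record per window; the
              // WindowFunction only sees that one record when the session window fires.
              .apply(
                  new ReduceFunction<Tuple2<String, Long>>() {
                      @Override
                      public Tuple2<String, Long> reduce(Tuple2<String, Long> a,
                                                         Tuple2<String, Long> b) {
                          return Tuple2.of(a.f0, a.f1 + b.f1);
                      }
                  },
                  new WindowFunction<Tuple2<String, Long>, String, Tuple, TimeWindow>() {
                      @Override
                      public void apply(Tuple key, TimeWindow window,
                                        Iterable<Tuple2<String, Long>> preAggregated,
                                        Collector<String> out) {
                          Tuple2<String, Long> session = preAggregated.iterator().next();
                          out.collect("session of " + session.f0 + " with " + session.f1 + " events");
                      }
                  })
              .print();

        env.execute("session reduce sketch");
    }
}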

2017-01-03 12:32 GMT+01:00 Henri Heiskanen :

> Hi,
>
> Actually, it seems that "Fold cannot be used with a merging WindowAssigner", and the
> workaround I found was to use a generic window function. It seems that I
> would need to use the window apply anyway. The functionality is then all there,
> but I am really concerned about the resource utilisation. We have quite many
> concurrent users, they generate a lot of events, and sessions may be long.
>
> The workaround you gave for initialisation was exactly what I was doing
> already, and yes, it is so dynamic that you cannot use the constructor. However,
> I would also need to close the resources I open gracefully, and as the
> initialisation is quite heavy it felt weird to put it in the fold function to
> be done on the first event processed.
>
> Br,
> Henri H
>
> On Mon, Jan 2, 2017 at 10:20 PM, Jamie Grier wrote:
>
>> Hi Henri,
>>
>> #1 - This is by design.  Event time advances with the slowest input
>> source.  If there are input sources that generate no data this is
>> indistinguishable from a slow source.  Kafka topics where some partitions
>> receive no data are a problem in this regard -- but there isn't a simple
>> solution.  If possible I would address it at the source.
>>
>> #2 - If it's possible to run these init functions just once when you
>> submit the job you can run them in the constructor of your FoldFunction.
>> This init will then happen exactly once (on the client) and the constructed
>> FoldFunction is then serialized and distributed around the cluster.  If
>> this doesn't work because you need something truly dynamic you could also
>> accomplish this with a simple local variable in your function.
>>
>> class MyFoldFunction extends FoldFunction {
>>>   private var initialized = false
>>>   def fold(accumulator: T, value: O): T = {
>>> if(!initialized){
>>>   doInitStuff()
>>>   initialized = true
>>> }
>>>
>>> doNormalStuff()
>>>   }
>>> }
>>
>>
>> #3 - One way to do this is as you've said which is to attach the profile
>> information to the event, using a mapper, before it enters the window
>> operations.
>>
>>
>> On Mon, Jan 2, 2017 at 1:25 AM, Henri Heiskanen <
>> henri.heiska...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I have a few questions related to Flink streaming. I am on 1.2-SNAPSHOT
>>> and what I would like to accomplish is to have a stream that reads data
>>> from multiple kafka topics, identifies user sessions, uses an external
>>> user profile to enrich the data, evaluates a script to produce session
>>> aggregates and then create updated profiles from session aggregates. I am
>>> working with high volume data and user sessions may be long, so using
>>> generic window apply might not work. Below is the simplification of the
>>> stream.
>>>
>>> stream = createKafkaStreams(...);
>>> env.setParallelism(4);
>>> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
>>> stream
>>> .keyBy(2)
>>> .window(EventTimeSessionWindow
>>> s.withGap(Time.minutes(10)))
>>> .fold(new SessionData(), new SessionFold(), new
>>> ProfilerApply())
>>> .print();
>>>
>>> The questions:
>>>
>>> 1. Initially when I used event time windowing I could not get any of my
>>> windows to close. The reason seemed to be that I had 6 partitions in my
>>> test kafka setup and only 4 of them generated traffic. If I used
>>> parallelism above 4, then no windows were closed. Is this by design or a
>>> defect? We use flink-connector-kafka-0.10 because earlier versions did not
>>> commit the offsets correctly.
>>>
>>> 2. Rich fold functions are not supported. However I would like execute a
>>> piece of custom script in the fold function that requires initialisation
>>> part. I would have used the open and close lifecycle methods of rich
>>> functions but they are not available now in fold. What would be the
>>> preferred way to run some initialisation routines (and closing the
>>> gracefully) when using fold?
>>>
>>> 3. Kind of related to above. I would also like to fetch a user profile
>>> from external source in the beginning of the session. What would be a best
>>> practice for that kind of operation? If I would be using the generic window
>>> apply I could fetch in in the beginning of the apply method. I was thinking
>>> of introducing a mapper that fetches this profiler periodically and caches
>>> it to flink state. However, with this setup I would not be able to tie this
>>> to user sessions identified for windows.
>>>
>>> 

Re: How do I ensure binary comparisons are being used?

2017-01-04 Thread Fabian Hueske
Hi Lawrence,

comparison of binary data are mainly used by the DataSet API when sorting
large data sets or building and probing hash tables.

The DataStream API mainly benefits from Flink's custom and efficient
serialization when sending data over the wire or taking checkpoints.
There are also plans to implement a state backend based on the
serialization stack which leverages Flink's managed memory instead of
holding object on the heap (the RocksDB state backend is the current
solution to avoid this).

From what I know, the DataStream API does not perform comparisons on serialized
data.

Best, Fabian



2017-01-03 7:53 GMT+01:00 ljwagerfield :

> Any insights on this?
>
> Thanks,
> Lawrence
>
>
>
> --
> View this message in context: http://apache-flink-user-
> mailing-list-archive.2336050.n4.nabble.com/How-do-I-ensure-
> binary-comparisons-are-being-used-tp10806p10819.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>


Question about Scheduling of Batch Jobs

2017-01-04 Thread Konstantin Knauf
Hi everyone,

I have a basic question regarding scheduling of batch programs. Let's
take the following graph:

  -> Group Combine -> ...
/
Source > Group Combine -> ...
\
  -> Map -> ...

So, a source and followed by three operators with ship strategy
"Forward" and exchange mode "pipelined".

The three flows are later joined again, so that this results in a single
job.

When the job is started, first, only one of the operators immediately
receive the input read by the source and can therefore run concurrently
with the source. Once the source is finished, the other two operators
are scheduled.

Two questions about this:

1) Why doesn't the source forward the records to all three operators
while still running?
2) How does the jobmanager decide, which of the three operators
receivese the pipelined data first?

Cheers and Thanks,

Konstantin


-- 
Konstantin Knauf * konstantin.kn...@tngtech.com * +49-174-3413182
TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring
Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke
Sitz: Unterföhring * Amtsgericht München * HRB 135082





Re: 1.1.4 IntelliJ Problem

2017-01-04 Thread Yury Ruchin
Hi Stephan,

It looks like you have libraries from different versions of Flink
distribution on the same classpath.

ForkableFlinkMiniCluster resides in flink-test-utils. As of distribution
version 1.1.3 it invokes JobManager.startJobManagerActors() with 6
arguments. The signature changed by 1.1.4, and ForkableFlinkMiniCluster now
invokes the method with 8 arguments of different types. This might mean
that flink-test-utils library on your classpath has version 1.1.3 whereas
flink-runtime has 1.1.4.

You might want to thoroughly inspect your classpath to ensure that every
Flink-related dependency has version 1.1.4.

Regards,
Yury

2017-01-04 11:20 GMT+03:00 Stephan Epping :

> I also changed the scala version of the packages/artifacts to 2.11, with
> no success.
> Further, I am not deeply familiar with maven or java dependency management
> at all.
>
> best Stephan
>
> On 03 Jan 2017, at 22:06, Stephan Ewen  wrote:
>
> Hi!
>
> It is probably some inconsistent configuration in the IDE.
>
> It often helps to do "Maven->Reimport" or use "restart and clear caches".
>
>
> On Tue, Jan 3, 2017 at 9:48 AM, Stephan Epping wrote:
>
>> Hey,
>>
>>
>> thanks for the reply. I didn’t change the scala version, as it worked 
>> before. I just changed the flink version in my pom.xml; that's it, a one line
>> change.
>>
>> Maybe you could elaborate a bit more, what I can do to change the scala 
>> version?
>>
>>
>> best Stephan
>>
>>
>>
>> On 03 Jan 2017, at 03:11, Kurt Young  wrote:
>>
>> Seems like you didn't setup the correct scala SDK
>>
>> best,
>> Kurt
>>
>> On Mon, Jan 2, 2017 at 10:41 PM, Stephan Epping <
>> stephan.epp...@zweitag.de> wrote:
>>
>>> Hi,
>>>
>>> I am getting this error running my tests with 1.1.4 inside intellij ide.
>>>
>>> java.lang.NoSuchMethodError: org.apache.flink.runtime.jobma
>>> nager.JobManager$.startJobManagerActors(Lorg/apache/flink/co
>>> nfiguration/Configuration;Lakka/actor/ActorSystem;Lscala/
>>> Option;Lscala/Option;Ljava/lang/Class;Ljava/lang/Class;)Lscala/Tuple2;
>>>
>>> at org.apache.flink.test.util.ForkableFlinkMiniCluster.startJob
>>> Manager(ForkableFlinkMiniCluster.scala:103)
>>> at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonf
>>> un$1.apply(FlinkMiniCluster.scala:292)
>>> at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonf
>>> un$1.apply(FlinkMiniCluster.scala:286)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
>>> sableLike.scala:244)
>>> at scala.collection.TraversableLike$$anonfun$map$1.apply(Traver
>>> sableLike.scala:244)
>>> at scala.collection.immutable.Range.foreach(Range.scala:141)
>>> at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>> at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>> at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(
>>> FlinkMiniCluster.scala:286)
>>> at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(
>>> FlinkMiniCluster.scala:277)
>>> at org.apache.flink.test.util.ForkableFlinkMiniCluster.start(Fo
>>> rkableFlinkMiniCluster.scala:255)
>>> at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBa
>>> seUtils.java:152)
>>> at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBa
>>> seUtils.java:126)
>>> at org.flinkspector.datastream.DataStreamTestEnvironment.create
>>> TestEnvironment(DataStreamTestEnvironment.java:72)
>>>
>>> Any ideas?
>>>
>>> best,
>>> Stephan
>>>
>>>
>>
>>
>
>


Re: Triggering a savepoint failed the Job

2017-01-04 Thread Stephan Ewen
Hi!

Thanks for reporting this.

I created a JIRA issue for it:
https://issues.apache.org/jira/browse/FLINK-5407

We'll look into it as part of the 1.2 release testing. If you have any more
details that may help diagnose/fix that, would be great if you could share
them with us.

Thanks,
Stephan


On Wed, Jan 4, 2017 at 10:52 AM, Yassine MARZOUGUI <
y.marzou...@mindlytix.com> wrote:

> Hi all,
>
> I tried to trigger a savepoint for a streaming job; both the savepoint and
> the job failed.
>
> The job failed with the following exception:
>
> java.lang.RuntimeException: Error while triggering checkpoint for 
> IterationSource-7 (1/1)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
>   at java.util.concurrent.FutureTask.run(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>   at java.lang.Thread.run(Unknown Source)
> Caused by: java.lang.NullPointerException
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
>   at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
>   ... 5 more
>
>
> And the savepoint failed with the following exception:
>
> Using address /127.0.0.1:6123 to connect to JobManager.
> Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
> Waiting for response...
>
> 
>  The program finished with the following exception:
>
> java.lang.Exception: Failed to complete savepoint
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
> at org.apache.flink.runtime.jobmanager.JobManager$$
> anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
> at org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.
> recover(FlinkFuture.java:263)
> at akka.dispatch.Recover.internal(Future.scala:267)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
> at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
> at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
> at scala.util.Try$.apply(Try.scala:161)
> at scala.util.Failure.recover(Try.scala:185)
> at scala.concurrent.Future$$anonfun$recover$1.apply(
> Future.scala:324)
> at scala.concurrent.Future$$anonfun$recover$1.apply(
> Future.scala:324)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at akka.dispatch.BatchingExecutor$Batch$$
> anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
> at akka.dispatch.BatchingExecutor$Batch$$
> anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
> at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(
> BatchingExecutor.scala:59)
> at akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(
> BatchingExecutor.scala:59)
> at scala.concurrent.BlockContext$.withBlockContext(
> BlockContext.scala:72)
> at akka.dispatch.BatchingExecutor$Batch.run(
> BatchingExecutor.scala:58)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at akka.dispatch.ForkJoinExecutorConfigurator$
> AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(
> ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.
> runTask(ForkJoinPool.java:1339)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(
> ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(
> ForkJoinWorkerThread.java:107)
> Caused by: java.lang.Exception: Checkpoint failed: Checkpoint Coordinator
> is shutting down
> at org.apache.flink.runtime.checkpoint.PendingCheckpoint.
> abortError(PendingCheckpoint.java:338)
> at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.
> shutdown(CheckpointCoordinator.java:245)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.
> postRunCleanup(ExecutionGraph.java:1065)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.
> 

Triggering a savepoint failed the Job

2017-01-04 Thread Yassine MARZOUGUI
Hi all,

I tried to trigger a savepoint for a streaming job; both the savepoint and
the job failed.

The job failed with the following exception:

java.lang.RuntimeException: Error while triggering checkpoint for
IterationSource-7 (1/1)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1026)
at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
at java.util.concurrent.FutureTask.run(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.createOperatorIdentifier(StreamTask.java:767)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.access$500(StreamTask.java:115)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.createStreamFactory(StreamTask.java:986)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask$CheckpointingOperation.executeCheckpointing(StreamTask.java:956)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.checkpointState(StreamTask.java:583)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.performCheckpoint(StreamTask.java:551)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.triggerCheckpoint(StreamTask.java:511)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1019)
... 5 more


And the savepoint failed with the following exception:

Using address /127.0.0.1:6123 to connect to JobManager.
Triggering savepoint for job 153310c4a836a92ce69151757c6b73f1.
Waiting for response...


 The program finished with the following exception:

java.lang.Exception: Failed to complete savepoint
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:793)
at
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$handleMessage$1$$anon$7.apply(JobManager.scala:782)
at
org.apache.flink.runtime.concurrent.impl.FlinkFuture$6.recover(FlinkFuture.java:263)
at akka.dispatch.Recover.internal(Future.scala:267)
at akka.dispatch.japi$RecoverBridge.apply(Future.scala:183)
at akka.dispatch.japi$RecoverBridge.apply(Future.scala:181)
at scala.util.Failure$$anonfun$recover$1.apply(Try.scala:185)
at scala.util.Try$.apply(Try.scala:161)
at scala.util.Failure.recover(Try.scala:185)
at
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at
scala.concurrent.Future$$anonfun$recover$1.apply(Future.scala:324)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at
akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.Exception: Checkpoint failed: Checkpoint Coordinator
is shutting down
at
org.apache.flink.runtime.checkpoint.PendingCheckpoint.abortError(PendingCheckpoint.java:338)
at
org.apache.flink.runtime.checkpoint.CheckpointCoordinator.shutdown(CheckpointCoordinator.java:245)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.postRunCleanup(ExecutionGraph.java:1065)
at
org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:1034)
at
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:435)
at
org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:407)
at
org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:593)
at
org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:729)
at

Re: Does Flink cluster security work in Flink 1.1.4 release?

2017-01-04 Thread Stephan Ewen
Hi!

Flink 1.1.x supports Kerberos for Hadoop (HDFS, YARN, HBase) via Hadoop's
ticket system. It should work via kinit, in the same way as when submitting a
secure MapReduce job.

Kerberos for ZooKeeper, Kafka, etc, is only part of the 1.2 release.

Greetings,
Stephan


On Wed, Jan 4, 2017 at 7:25 AM, Zhangrucong  wrote:

> Hi:
>
>  Now I use the Flink 1.1.4 release in standalone cluster mode. I want to
> do the Kerberos authentication between Flink CLI and the Jobmanager. But in
> the flink-conf.yaml, there is no Flink cluster security configuration.
>
> Does the Kerberos authentication works in Flink 1.1.4 release?
>
> Thanks in advance!
>
>
>


Re: Cannot run using a savepoint with the same jar

2017-01-04 Thread Stephan Ewen
Hi!

Did you change the parallelism in your program, or do the names of some
functions change each time you call the program?

Can you try what happens when you give explicit IDs to operators via the
'.uid(...)' method?
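
A minimal sketch of what assigning stable operator IDs looks like (the pipeline and names
are illustrative; this assumes the uid(String) method described in the savepoint docs):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> source = env
                .fromElements("a", "b", "c")   // stand-in for the real source
                .uid("my-source");             // stable ID that survives program changes

        source.map(new MapFunction<String, String>() {
                  @Override
                  public String map(String value) {
                      return value.toUpperCase();
                  }
              })
              .uid("uppercase-map")            // give every (stateful) operator its own ID
              .print();

        env.execute("uid example");
    }
}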

Stephan


On Tue, Jan 3, 2017 at 11:44 PM, Al-Isawi Rami wrote:

> Hi,
>
> I have a Flink job that I can trigger a savepoint for with no problem.
> However, if I cancel the job and then try to run it with the savepoint, I get
> the following exception. Any ideas how I can debug or fix it? I am using the
> exact same jar, so I did not modify the program in any manner. Using Flink
> version 1.1.4.
>
>
> Caused by: java.lang.IllegalStateException: Failed to rollback to
> savepoint jobmanager://savepoints/1. Cannot map savepoint state for
> operator 1692abfa98b4a67c1b7dfc17f79d35d7 to the new program, because the
> operator is not available in the new program. If you want to allow this,
> you can set the --allowNonRestoredState option on the CLI.
> at org.apache.flink.runtime.checkpoint.savepoint.SavepointCoordinator.restoreSavepoint(SavepointCoordinator.java:257)
> at org.apache.flink.runtime.executiongraph.ExecutionGraph.restoreSavepoint(ExecutionGraph.java:1020)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply$mcV$sp(JobManager.scala:1336)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1326)
> at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$org$apache$flink$runtime$jobmanager$JobManager$$submitJob$1.apply(JobManager.scala:1326)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
> at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
> at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
> at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
> at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
> at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>


Re: 1.1.4 IntelliJ Problem

2017-01-04 Thread Stephan Epping
I also changed the scala version of the packages/artifacts to 2.11, with no 
success.
Further, I am not deeply familiar with maven or java dependency management at 
all.

best Stephan

> On 03 Jan 2017, at 22:06, Stephan Ewen  wrote:
> 
> Hi!
> 
> It is probably some inconsistent configuration in the IDE.
> 
> It often helps to do "Maven->Reimport" or use "restart and clear caches".
> 
> 
> On Tue, Jan 3, 2017 at 9:48 AM, Stephan Epping  > wrote:
> Hey,
> 
> thanks for the reply. I didn’t change the Scala version, as it worked before.
> I just changed the Flink version in my pom.xml; that's it, a one-line change.
> Maybe you could elaborate a bit more, what I can do to change the scala 
> version?
> 
> best Stephan
> 
> 
>> On 03 Jan 2017, at 03:11, Kurt Young > > wrote:
>> 
>> Seems like you didn't setup the correct scala SDK
>> 
>> best,
>> Kurt
>> 
>> On Mon, Jan 2, 2017 at 10:41 PM, Stephan Epping > > wrote:
>> Hi,
>> 
>> I am getting this error running my tests with 1.1.4 inside intellij ide.
>> 
>> java.lang.NoSuchMethodError: org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(Lorg/apache/flink/configuration/Configuration;Lakka/actor/ActorSystem;Lscala/Option;Lscala/Option;Ljava/lang/Class;Ljava/lang/Class;)Lscala/Tuple2;
>> 
>>  at org.apache.flink.test.util.ForkableFlinkMiniCluster.startJobManager(ForkableFlinkMiniCluster.scala:103)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:292)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:286)
>>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>  at scala.collection.immutable.Range.foreach(Range.scala:141)
>>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:286)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:277)
>>  at org.apache.flink.test.util.ForkableFlinkMiniCluster.start(ForkableFlinkMiniCluster.scala:255)
>>  at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:152)
>>  at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:126)
>>  at org.flinkspector.datastream.DataStreamTestEnvironment.createTestEnvironment(DataStreamTestEnvironment.java:72)
>> 
>> Any ideas?
>> 
>> best,
>> Stephan
>> 
>> 
> 
> 



Re: 1.1.4 IntelliJ Problem

2017-01-04 Thread Stephan Epping
Thanks Stephan, 

but that didn’t help. The IDE is configured to use Default Scala Compiler and 
JDK 1.8.0_92.
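
As a side note, one quick way to see where each of the classes involved in the
NoSuchMethodError quoted below is actually loaded from, in case the test
classpath mixes different Flink versions, is a tiny check like this (a sketch
only, not part of the original thread; the class choice is just illustrative):

public class ClasspathCheck {
    public static void main(String[] args) {
        // Print the jar each class comes from; differing Flink versions in
        // these paths would explain the changed method signature.
        print(org.apache.flink.test.util.ForkableFlinkMiniCluster.class);
        print(org.apache.flink.runtime.jobmanager.JobManager.class);
    }

    private static void print(Class<?> clazz) {
        System.out.println(clazz.getName() + " -> "
                + clazz.getProtectionDomain().getCodeSource().getLocation());
    }
}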

best Stephan


> On 03 Jan 2017, at 22:06, Stephan Ewen  wrote:
> 
> Hi!
> 
> It is probably some inconsistent configuration in the IDE.
> 
> It often helps to do "Maven->Reimport" or use "restart and clear caches".
> 
> 
> On Tue, Jan 3, 2017 at 9:48 AM, Stephan Epping  > wrote:
> Hey,
> 
> thanks for the reply. I didn’t change the Scala version, as it worked before.
> I just changed the Flink version in my pom.xml; that's it, a one-line change.
> Maybe you could elaborate a bit more, what I can do to change the scala 
> version?
> 
> best Stephan
> 
> 
>> On 03 Jan 2017, at 03:11, Kurt Young > > wrote:
>> 
>> Seems like you didn't setup the correct scala SDK
>> 
>> best,
>> Kurt
>> 
>> On Mon, Jan 2, 2017 at 10:41 PM, Stephan Epping > > wrote:
>> Hi,
>> 
>> I am getting this error running my tests with 1.1.4 inside intellij ide.
>> 
>> java.lang.NoSuchMethodError: org.apache.flink.runtime.jobmanager.JobManager$.startJobManagerActors(Lorg/apache/flink/configuration/Configuration;Lakka/actor/ActorSystem;Lscala/Option;Lscala/Option;Ljava/lang/Class;Ljava/lang/Class;)Lscala/Tuple2;
>> 
>>  at org.apache.flink.test.util.ForkableFlinkMiniCluster.startJobManager(ForkableFlinkMiniCluster.scala:103)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:292)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster$$anonfun$1.apply(FlinkMiniCluster.scala:286)
>>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>  at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>>  at scala.collection.immutable.Range.foreach(Range.scala:141)
>>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>>  at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:286)
>>  at org.apache.flink.runtime.minicluster.FlinkMiniCluster.start(FlinkMiniCluster.scala:277)
>>  at org.apache.flink.test.util.ForkableFlinkMiniCluster.start(ForkableFlinkMiniCluster.scala:255)
>>  at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:152)
>>  at org.apache.flink.test.util.TestBaseUtils.startCluster(TestBaseUtils.java:126)
>>  at org.flinkspector.datastream.DataStreamTestEnvironment.createTestEnvironment(DataStreamTestEnvironment.java:72)
>> 
>> Any ideas?
>> 
>> best,
>> Stephan
>> 
>> 
> 
>