Re: Rich and incrementally aggregating window functions

2019-05-08 Thread Hequn Cheng
Hi,

There was a discussion about this before; you can take a look at it [1].
Best, Hequn

[1]
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Regarding-implementation-of-aggregate-function-using-a-ProcessFunction-td23473.html#a23531

On Thu, May 9, 2019 at 5:14 AM an0  wrote:

> I want to use ProcessWindowFunction.Context#globalState in my window
> function. But I don't want to apply ProcessWindowFunction directly to my
> WindowedStream because I don't want to buffer all the elements of each
> window. Currently I'm using WindowedStream#aggregate(AggregateFunction,
> ProcessWindowFunction).
>
> My understanding is that RichFunction.runtimeContext also gives access to
> those global states. That thought naturally pointed me to
> RichAggregateFunction, RichReduceFunction and RichFoldFunction. However,
> they all cause runtime error like this:
> "AggregateFunction can not be a RichFunction. Please use
> fold(AggregateFunction, WindowFunction) instead."
>
> So how can I use an incrementally aggregating window function and have
> access to global states at the same time?
>
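
For reference, a minimal sketch of the aggregate(AggregateFunction,
ProcessWindowFunction) combination mentioned above (purely illustrative: the
Event type, the CountAggregate implementation and the state descriptor are
assumptions). The AggregateFunction does the incremental pre-aggregation, so
only its result is buffered per window, while the ProcessWindowFunction still
has access to Context#globalState:

stream
    .keyBy(Event::getKey)
    .timeWindow(Time.minutes(1))
    .aggregate(
        new CountAggregate(), // assumed AggregateFunction<Event, Long, Long>
        new ProcessWindowFunction<Long, String, String, TimeWindow>() {
            @Override
            public void process(String key, Context ctx, Iterable<Long> aggregated,
                                Collector<String> out) {
                // global (cross-window) keyed state is reachable here
                ValueState<Long> total =
                    ctx.globalState().getState(totalDescriptor); // hypothetical descriptor
                out.collect(key + ": " + aggregated.iterator().next());
            }
        });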


Re: Approach to Auto Scaling Flink Job

2019-05-08 Thread Rong Rong
Hi Anil,

We have a presentation [1] from Flink Forward 2018 that briefly discusses the
high-level approach (via a watchdog).

We are also restructuring the approach of our open-source AthenaX:
our internal implementation has diverged from the open-source version for too
long, and it has become a problem for us to merge back to the open-source
upstream. So we are likely to create a new, modularized version of AthenaX
in the future.

Thanks for the interest, and please stay tuned for our next release.

Best,
Rong

[1]
https://www.ververica.com/flink-forward/resources/building-flink-as-a-service-platform-at-uber

On Wed, May 8, 2019 at 11:32 AM Anil  wrote:

> Thanks for the reply Rong. Can you please let me know the design for the
> auto-scaling part, if possible.
> Or guide me in the direction so that I could create this feature myself.
>
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: Read data from HDFS on Hadoop3

2019-05-08 Thread Soheil Pourbafrani
UPDATE

I noticed that it runs fine from IntelliJ IDEA, but packaging the fat jar
and deploying it on the cluster causes the "hdfs" scheme error.
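
One common cause of this symptom (working in the IDE, failing only from the fat
jar) is that the Maven Shade plugin overwrites the META-INF/services files
through which Flink discovers file system implementations. This is only a
guess, not something confirmed in this thread, but the corresponding shade
configuration sketch would be:

<transformers>
    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
</transformers>

Another option is to keep the Hadoop/Flink filesystem dependencies out of the
fat jar and provide them on the cluster classpath instead (e.g. a
flink-shaded-hadoop uber jar in the lib/ folder).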

On Thu, May 9, 2019 at 2:43 AM Soheil Pourbafrani 
wrote:

> Hi,
>
> I used to read data from HDFS on Hadoop2 by adding the following
> dependencies:
>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-java</artifactId>
>     <version>1.4.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-java_2.11</artifactId>
>     <version>1.4.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-clients_2.11</artifactId>
>     <version>1.4.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-filesystem_2.11</artifactId>
>     <version>1.4.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client</artifactId>
>     <version>2.7.5</version>
> </dependency>
>
>
> But using Hadoop 3 and the following dependencies, I got the error:
> could not find a filesystem implementation for scheme 'hdfs'
>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-streaming-java_2.11</artifactId>
>     <version>1.8.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-clients_2.11</artifactId>
>     <version>1.8.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-connector-filesystem_2.11</artifactId>
>     <version>1.8.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.flink</groupId>
>     <artifactId>flink-hadoop-fs</artifactId>
>     <version>1.8.0</version>
> </dependency>
> <dependency>
>     <groupId>org.apache.hadoop</groupId>
>     <artifactId>hadoop-client</artifactId>
>     <version>3.1.2</version>
> </dependency>
>
> How can I resolve that?
>


Re: Reconstruct object through partial select query

2019-05-08 Thread shkob1
Just to be clearer about my goal:
I'm trying to enrich the incoming stream with some meaningful tags based on
conditions on the event itself.
So the input stream could be an event that looks like:
class Car {
  int year;
  String modelName;
} 

I will have a config that defines tags as:
"NiceCar" -> "year > 2015 AND position("Luxury" in modelName) > 0"

So ideally my output will be in the structure of

class TaggedEvent {
   Car origin;
   String[] tags;
}
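
For illustration only (none of this is from the thread, and the field names,
the registered table name "Cars", the renamed "yearBuilt" column used to avoid
the reserved word YEAR, and the Car/TaggedEvent constructors are all
assumptions): one way to approximate that structure is to keep the original
fields in the SELECT, evaluate each configured tag condition as a CASE
expression, and re-assemble the envelope in a map():

Table tagged = tableEnv.sqlQuery(
    "SELECT modelName, yearBuilt, " +
    "  CASE WHEN yearBuilt > 2015 AND POSITION('Luxury' IN modelName) > 0 " +
    "       THEN 'NiceCar' END AS tag " +
    "FROM Cars");

DataStream<TaggedEvent> result = tableEnv
    .toAppendStream(tagged, Row.class)
    .map(row -> {
        Car car = new Car((Integer) row.getField(1), (String) row.getField(0));
        String tag = (String) row.getField(2);
        return new TaggedEvent(car, tag == null ? new String[0] : new String[] {tag});
    })
    .returns(TaggedEvent.class);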






--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Reconstruct object through partial select query

2019-05-08 Thread shkob1
Hey,

I'm trying to create a SQL query which, given input from a stream with a
generic class type T, will create a new stream with the structure
of
{
  origin : T
  resultOfSomeSQLCalc : Array[String]
}

It seems that just by doing "SELECT *" I can convert the resulting table
back to a stream of the origin object.
I'm trying to understand how I create that envelope with the origin
object. Something along the lines of "SELECT box(*), array(..)". I'm sensing
I need a function for that, but I'm not sure what that looks like.

Thanks for the help!
Shahar




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Read data from HDFS on Hadoop3

2019-05-08 Thread Soheil Pourbafrani
Hi,

I used to read data from HDFS on Hadoop2 by adding the following
dependencies:


<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-java</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.4.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.5</version>
</dependency>



But using Hadoop 3 and the following dependencies, I got the error:
could not find a filesystem implementation for scheme 'hdfs'


<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-filesystem_2.11</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-hadoop-fs</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.1.2</version>
</dependency>


How can I resolve that?


Rich and incrementally aggregating window functions

2019-05-08 Thread an0
I want to use ProcessWindowFunction.Context#globalState in my window function. 
But I don't want to apply ProcessWindowFunction directly to my WindowedStream 
because I don't want to buffer all the elements of each window. Currently I'm 
using WindowedStream#aggregate(AggregateFunction, ProcessWindowFunction).

My understanding is that RichFunction.runtimeContext also gives access to those 
global states. That thought naturally pointed me to RichAggregateFunction, 
RichReduceFunction and RichFoldFunction. However, they all cause runtime error 
like this:
"AggregateFunction can not be a RichFunction. Please use 
fold(AggregateFunction, WindowFunction) instead."

So how can I use an incrementally aggregating window function and have access 
to global states at the same time?


Re: Migration from flink 1.7.2 to 1.8.0

2019-05-08 Thread Farouk
Hi Till

Thanks. I'll check it out.

Farouk



On Wed, May 8, 2019 at 4:59 PM Till Rohrmann wrote:

> Hi Farouk,
>
> from the stack trace alone I cannot say much. Would it be possible to
> share a minimal example which reproduces the problem?
>
> My suspicion is that OperatorChain.java:294 produces a null value.
> Said differently, somehow there is no StreamConfig registered for the
> given outputId. One thing you could check is whether you have compiled the
> Flink job against Flink 1.8.0 and not 1.7.2. Maybe something changed w.r.t.
> the output enumeration.
>
> Cheers,
> Till
>
> On Tue, May 7, 2019 at 5:34 PM Farouk  wrote:
>
>> Hi
>>
>> We are migrating our app to Flink 1.8.0.
>>
>> We built a docker image like this as Hadoop is not anymore bundled :
>>
>> FROM myrepo:5/flink:1.8.0-scala_2.11-alpine
>>
>> ADD --chown=flink:flink
>> https://my-artifactory-repo/artifactory/my-repo/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.0/flink-shaded-hadoop2-uber-2.8.3-1.8.0.jar
>> /opt/flink/lib
>>
>> When running Flink, we are facing the stack trace below  :
>>
>> java.lang.NullPointerException
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:284)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:360)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:296)
>> at
>> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:133)
>> at
>> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
>> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>> at java.lang.Thread.run(Thread.java:748)
>>
>> Any idea on what's happening ?
>>
>> I think it's a problem with classloading.
>>
>> With Flink 1.7.2, every thing works fine
>>
>> Thanks
>> Farouk
>>
>


Re: Flink on YARN: TaskManager heap auto-sizing?

2019-05-08 Thread Dylan Adams
Till,

Thanks for the pointer to the code.

Regards,
Dylan

On Wed, May 8, 2019 at 11:18 AM Till Rohrmann  wrote:

> Hi Dylan,
>
> the container's memory will be calculated here [1]. In the case of YARN,
> the user specifies the container memory size, and based on this Flink
> calculates how much heap memory the JVM is started with (container memory
> size - off-heap memory - cutoff memory).
>
> [1]
> https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java#L160
>
> Cheers,
> Till
>
> On Tue, May 7, 2019 at 3:29 PM Dylan Adams  wrote:
>
>> In the Configuration section of the docs
>> ,
>> the description for "taskmanager.heap.size" contains: "On YARN setups,
>> this value is automatically configured to the size of the TaskManager's
>> YARN container, minus a certain tolerance value."
>>
>> Does that functionality exist?
>>
>> I don't see any documented method to specify the YARN container size for
>> the TaskManagers, nor could I find any logic in the Flink YARN integration
>> code that seemed to implement that behavior.
>>
>> My understanding is that you need to manually calculate and specify 
>> taskmanager.heap.size
>> (and jobmanager.heap.size) based on your YARN setup.
>>
>> Thanks,
>> Dylan
>>
>


Re: I want to use MapState on an unkeyed stream

2019-05-08 Thread an0
I switched to using operator list state. It is clearer. It is also supported
by the RocksDBKeyedStateBackend, isn't it?
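
For reference, a minimal sketch of operator list state via CheckpointedFunction
(purely illustrative and not from this thread; the element type, state name and
sink base class are arbitrary):

import java.util.ArrayList;
import java.util.List;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

public class BufferingSink extends RichSinkFunction<String> implements CheckpointedFunction {

    private transient ListState<String> checkpointedState;
    private final List<String> buffer = new ArrayList<>();

    @Override
    public void invoke(String value, Context context) {
        buffer.add(value); // non-keyed, per-parallel-instance buffer
    }

    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        checkpointedState.clear();
        for (String element : buffer) {
            checkpointedState.add(element);
        }
    }

    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        checkpointedState = ctx.getOperatorStateStore()
            .getListState(new ListStateDescriptor<>("buffered-elements", String.class));
        if (ctx.isRestored()) {
            for (String element : checkpointedState.get()) {
                buffer.add(element);
            }
        }
    }
}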

On 2019/05/08 14:42:36, Till Rohrmann  wrote: 
> Hi,
> 
> if you want to increase the parallelism you could also pick a key randomly
> from a set of keys. The price you would pay is a shuffle operation (network
> I/O) which would not be needed if you were using the unkeyed stream and
> used the operator list state.
> 
> However, with keyed state you could also use Flink's
> RocksDBKeyedStateBackend which allows to go out of core if your state size
> should grow very large.
> 
> Cheers,
> Till
> 
> On Tue, May 7, 2019 at 5:57 PM an0  wrote:
> 
> > But I only have one stream, nothing to connect it to.
> >
> > On 2019/05/07 00:15:59, Averell  wrote:
> > > From my understanding, having a fake keyBy (stream.keyBy(r =>
> > "dummyString"))
> > > means there would be only one slot handling the data.
> > > Would a broadcast function [1] work for your case?
> > >
> > > Regards,
> > > Averell
> > >
> > > [1]
> > >
> > https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
> > >
> > >
> > >
> > > --
> > > Sent from:
> > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> > >
> >
> 


Re: Approach to Auto Scaling Flink Job

2019-05-08 Thread Anil
Thanks for the reply, Rong. Can you please let me know the design of the
auto-scaling part, if possible?
Or point me in the right direction so that I could create this feature myself.

 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Testing with ProcessingTime and FromElementsFunction with TestEnvironment

2019-05-08 Thread Steven Nelson

That’s what I figured was happening :( Your explanation is a lot better than 
what I gave to my team, so that will help a lot, thank you!

Is there a testing source already created that does this sort of thing? The 
Flink-testing library seems a bit sparse.

-Steve

Sent from my iPhone

> On May 8, 2019, at 9:33 AM, Till Rohrmann  wrote:
> 
> Hi Steve,
> 
> I think the reason for the different behaviour is due to the way event time 
> and processing time are implemented. 
> 
> When you are using event time, watermarks need to travel through the topology 
> denoting the current event time. When your source terminates, the system will 
> send a watermark with Long.MAX_VALUE through the topology. This will 
> effectively trigger the completion of all pending event time operations.
> 
> In the case of processing time, Flink does not do this. Instead it simply 
> relies on the processing time clocks on each machine. Hence, there is no way 
> for Flink to tell the different machines that their respective processing 
> time clocks should proceed to a certain time in case of a shutdown. Instead 
> you should make sure that you don't terminate the job before a certain time 
> (processing time) has passed. You could do this by adding a sleep to your 
> source function after you've output all records and just before leaving the 
> source loop.
> 
> Cheers,
> Till
> 
>> On Tue, May 7, 2019 at 11:49 PM Steven Nelson  
>> wrote:
>> Hello!
>> 
>> I am trying to write a test that runs in the TestEnviroment. I create a 
>> process that uses ProcessingTime, has a source constructed from a 
>> FromElementsFunction and runs data through a Keyed Stream into a 
>> ProcessingTimeSessionWindows.withGap().
>> 
>> The problem is that it appears that the env.execute method returns 
>> immediately after the session closes, not allowing the events to be released 
>> from the window before shutdown occurs. This used to work when I used 
>> EventTime. 
>> 
>> Thoughts?
>> -Steve


Re: Inconsistent documentation of Table Conditional functions

2019-05-08 Thread Rong Rong
Hi Flavio,

I believe the documentation meant "X" as a placeholder, where you can
replace "X" with the numeric values (1, 2, ...) depending on how many "CASE
WHEN" conditions you have.
"resultZ" is the default result in the "ELSE" branch, and thus it is a
literal.
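
For illustration, a small example of the pattern the documentation describes,
wrapped in a Table API call (the table name "orders" and the "status" field are
assumptions, not from this thread): two WHEN branches produce the first two
results, and the ELSE branch is the "resultZ" literal.

Table t = tableEnv.sqlQuery(
    "SELECT CASE WHEN status = 1 THEN 'pending' " +
    "            WHEN status = 2 THEN 'shipped' " +
    "            ELSE 'unknown' " +
    "       END AS statusName " +
    "FROM orders");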

Thanks,
Rong

On Wed, May 8, 2019 at 9:08 AM Flavio Pompermaier 
wrote:

> Hi to all,
> in the documentation of the Table Conditional functions [1] the example is
> inconsistent with the related description (there's no resultX for example).
> Or am I wrong?
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html#conditional-functions
>
>


Re: Inconsistent documentation of Table Conditional functions

2019-05-08 Thread Xingcan Cui
Hi Flavio,

In the description, resultX is just an identifier for the result of the first
condition that is met.

Best,
Xingcan

> On May 8, 2019, at 12:02 PM, Flavio Pompermaier  wrote:
> 
> Hi to all,
> in the documentation of the Table Conditional functions [1] the example is 
> inconsistent with the related description (there's no resultX for example). 
> Or am I wrong?
> 
> [1] 
> https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html#conditional-functions
>  
> 
> 



Inconsistent documentation of Table Conditional functions

2019-05-08 Thread Flavio Pompermaier
Hi to all,
in the documentation of the Table Conditional functions [1] the example is
inconsistent with the related description (there's no resultX for example).
Or am I wrong?

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.8/dev/table/functions.html#conditional-functions


Re: Approach to Auto Scaling Flink Job

2019-05-08 Thread Rong Rong
Hi Anil,

Thanks for reporting the issue. I went through the code and I believe the
auto-scaling functionality is still in our internal branch and has not been
merged to the open-source branch yet.
I will change the documentation accordingly.

Thanks,
Rong

On Mon, May 6, 2019 at 9:54 PM Anil  wrote:

> I'm using Uber's open-source project AthenaX. As mentioned in its docs [1],
> it supports `Auto scaling for AthenaX jobs`. I went through the source code
> on GitHub but didn't find the auto-scaling part. Can someone aware of this
> project please point me in the right direction here?
>
> I'm using Flink's Table API (Flink 1.4.2) and submit my jobs
> programmatically to the YARN cluster. All the JM and TM metrics are saved in
> Prometheus. I am thinking of using these metrics to develop an algorithm to
> re-scale jobs. I would also appreciate it if someone could share how they
> developed their auto-scaling part.
>
> [1]  https://athenax.readthedocs.io/en/latest/
> 
>
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Re: taskmanager.tmp.dirs vs io.tmp.dirs

2019-05-08 Thread Flavio Pompermaier
Great, thanks Till!

On Wed, May 8, 2019 at 4:20 PM Till Rohrmann  wrote:

> Hi Flavio,
>
> taskmanager.tmp.dirs is the deprecated configuration key which has been
> superseded by the io.tmp.dirs configuration option. In the future, you
> should use io.tmp.dirs.
>
> Cheers,
> Till
>
> On Wed, May 8, 2019 at 3:32 PM Flavio Pompermaier 
> wrote:
>
>> Hi to all,
>> looking at
>> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html it's
>> not very clear to me the difference between these two settings (actually I
>> always used the same value for the two).
>>
>> My understanding is that taskmanager.tmp.dirs is used to spill memory
>> when there's no more RAM available, while io.tmp.dirs for all other
>> situations (but which are they exactly?).
>>
>> Thanks in advance,
>> Flavio
>>
>


Re: Flink on YARN: TaskManager heap auto-sizing?

2019-05-08 Thread Till Rohrmann
Hi Dylan,

the container's memory will be calculated here [1]. In the case of YARN,
the user specifies the container memory size, and based on this Flink
calculates how much heap memory the JVM is started with (container memory
size - off-heap memory - cutoff memory).
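
As a rough back-of-the-envelope sketch of that calculation (assuming the
default containerized.heap-cutoff-ratio of 0.25 and
containerized.heap-cutoff-min of 600 MB; the exact logic is in the class linked
below, and the off-heap part depends on your configuration):

// illustrative only
long containerMB = 4096;
long cutoffMB = Math.max(600, (long) (containerMB * 0.25)); // = 1024
long heapMB = containerMB - cutoffMB;                        // ~ 3072, before
                                                             // any off-heap or
                                                             // network buffers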

[1]
https://github.com/apache/flink/blob/master/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java#L160

Cheers,
Till

On Tue, May 7, 2019 at 3:29 PM Dylan Adams  wrote:

> In the Configuration section of the docs
> ,
> the description for "taskmanager.heap.size" contains: "On YARN setups,
> this value is automatically configured to the size of the TaskManager's
> YARN container, minus a certain tolerance value."
>
> Does that functionality exist?
>
> I don't see any documented method to specify the YARN container size for
> the TaskManagers, nor could I find any logic in the Flink YARN integration
> code that seemed to implement that behavior.
>
> My understanding is that you need to manually calculate and specify 
> taskmanager.heap.size
> (and jobmanager.heap.size) based on your YARN setup.
>
> Thanks,
> Dylan
>


Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Till Rohrmann
Hi Manju,

I guess this exception

Caused by: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain
block: BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
file=/flink/checkpoints/submittedJobGraph480ddf9572ed
at
org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1052)
at
org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
at
org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
at
org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
at
org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
at java.io.DataInputStream.read(DataInputStream.java:149)
at
org.apache.flink.runtime.fs.hdfs.HadoopDataInputStream.read(HadoopDataInputStream.java:94)
at
java.io.ObjectInputStream$PeekInputStream.read(ObjectInputStream.java:2620)
at
java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2636)
at
java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3113)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:853)
at java.io.ObjectInputStream.(ObjectInputStream.java:349)
at
org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.(InstantiationUtil.java:68)
at
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:520)
at
org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:503)
at
org.apache.flink.runtime.state.RetrievableStreamStateHandle.retrieveState(RetrievableStreamStateHandle.java:58)
at
org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:202)

and the following log statements

2019-05-07 08:28:54,136 WARN  org.apache.hadoop.hdfs.DFSClient
- No live nodes contain block
BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 after
checking nodes = [], ignoredNodes = null
2019-05-07 08:28:54,137 INFO  org.apache.hadoop.hdfs.DFSClient
- No node available for
BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494
file=/flink/checkpoints/submittedJobGraph480ddf9572ed
2019-05-07 08:28:54,137 INFO  org.apache.hadoop.hdfs.DFSClient
- Could not obtain
BP-1651346363-10.20.1.81-1525354906737:blk_1083182315_9441494 from any
node:  No live nodes contain current block Block locations: Dead nodes: .
Will get new block locations from namenode and retry...
2019-05-07 08:28:54,137 WARN  org.apache.hadoop.hdfs.DFSClient
- DFS chooseDataNode: got # 1 IOException, will wait for
1498.8531884268646 msec.

pretty much explain what's happening: Flink cannot read all the blocks
belonging to the submitted job graph file and fails because of this. This looks
like an HDFS problem to me.
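
One way to double-check that from the HDFS side (an illustrative command using
the path from the log above) would be:

  hdfs fsck /flink/checkpoints/submittedJobGraph480ddf9572ed -files -blocks -locations

If the block really is missing or corrupt, that would be something to repair on
the HDFS cluster rather than in Flink.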

Cheers,
Till

On Wed, May 8, 2019 at 4:59 PM Manjusha Vuyyuru 
wrote:

> Hi Till,
> Thanks for the response.
> please see the attached log file.
>
> HA config is:
> high-availability: zookeeper
> high-availability.storageDir: hdfs://flink-hdfs:9000/flink/checkpoints
> From the logs I can see block missing exceptions from HDFS, but I can see
> that the job graph is still present in HDFS.
>
>
>
> On Wed, May 8, 2019 at 7:56 PM Till Rohrmann  wrote:
>
>> Hi Manju,
>>
>> could you share the full logs or at least the full stack trace of the
>> exception with us?
>>
>> I suspect that after a failover Flink tries to restore the JobGraph from
>> persistent storage (the directory which you have configured via
>> `high-availability.storageDir`) but is not able to do so. One reason could
>> be that the JobGraph file has been removed by a third party, for example. I
>> think the cause of the FlinkException could shed light on it. Could you
>> verify that the JobGraph file is still accessible?
>>
>> Cheers,
>> Till
>>
>> On Wed, May 8, 2019 at 11:22 AM Manjusha Vuyyuru 
>> wrote:
>>
>>> Any update on this from community side?
>>>
>>> On Tue, May 7, 2019 at 6:43 PM Manjusha Vuyyuru 
>>> wrote:
>>>
 im using 1.7.2.


 On Tue, May 7, 2019 at 5:50 PM miki haiat  wrote:

> Which flink version are you using?
> I had similar  issues with 1.5.x
>
> On Tue, May 7, 2019 at 2:49 PM Manjusha Vuyyuru <
> vmanjusha@gmail.com> wrote:
>
>> Hello,
>>
>> I have a flink setup with two job managers coordinated by zookeeper.
>>
>> I see the below exception and both jobmanagers are going down:
>>
>> 2019-05-07 08:29:13,346 INFO
>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>> Released locks of job graph f8eb1b482d8ec8c1d3e94c4d0f79df77 from 
>> ZooKeeper.
>> 2019-05-07 08:29:13,346 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> Fatal error occurred in the cluster entrypoint.
>> java.lang.RuntimeException: org.apache.flink.util.FlinkException:
>> Could 

Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Manjusha Vuyyuru
Hi Till,
Thanks for the response.
please see the attached log file.

HA config is:
high-availability: zookeeper
high-availability.storageDir: hdfs://flink-hdfs:9000/flink/checkpoints
From the logs I can see block missing exceptions from HDFS, but I can see
that the job graph is still present in HDFS.



On Wed, May 8, 2019 at 7:56 PM Till Rohrmann  wrote:

> Hi Manju,
>
> could you share the full logs or at least the full stack trace of the
> exception with us?
>
> I suspect that after a failover Flink tries to restore the JobGraph from
> persistent storage (the directory which you have configured via
> `high-availability.storageDir`) but is not able to do so. One reason could
> be that the JobGraph file has been removed by a third party, for example. I
> think the cause of the FlinkException could shed light on it. Could you
> verify that the JobGraph file is still accessible?
>
> Cheers,
> Till
>
> On Wed, May 8, 2019 at 11:22 AM Manjusha Vuyyuru 
> wrote:
>
>> Any update on this from community side?
>>
>> On Tue, May 7, 2019 at 6:43 PM Manjusha Vuyyuru 
>> wrote:
>>
>>> im using 1.7.2.
>>>
>>>
>>> On Tue, May 7, 2019 at 5:50 PM miki haiat  wrote:
>>>
 Which flink version are you using?
 I had similar  issues with 1.5.x

 On Tue, May 7, 2019 at 2:49 PM Manjusha Vuyyuru <
 vmanjusha@gmail.com> wrote:

> Hello,
>
> I have a flink setup with two job managers coordinated by zookeeper.
>
> I see the below exception and both jobmanagers are going down:
>
> 2019-05-07 08:29:13,346 INFO
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
> Released locks of job graph f8eb1b482d8ec8c1d3e94c4d0f79df77 from 
> ZooKeeper.
> 2019-05-07 08:29:13,346 ERROR
> org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
> Fatal error occurred in the cluster entrypoint.
> java.lang.RuntimeException: org.apache.flink.util.FlinkException:
> Could not retrieve submitted JobGraph from state handle under
> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
> at
> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
> at
> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
> at
> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
> at
> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
> at
> akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
> at
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
> at
> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
> at
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
> at
> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
> at
> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
> submitted JobGraph from state handle under
> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
> handle is broken. Try cleaning the state handle store.
> at
> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
> at
> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
> at
> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
> ... 9 more
>
>
> Can someone please help me understand in detail on what is causing
> this exception. I can see zookeeper not able to retrieve job graph. What
> could be the reason for this?
>
> This is second time that my setup is going down with this excepton,
> first time i cleared jobgraph folder in zookeeper and restarted, now again
> faced with same issue.
>
> Since this is production setup this way of outage is not at all
> expected :(. Can someone help me how to give a permanent fix to this 
> issue?
>
>
> Thanks,
> Manju
>
>
2019-05-07 08:28:16,658 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 

Re: Class loader premature closure - NoClassDefFoundError: org/apache/kafka/clients/NetworkClient

2019-05-08 Thread Till Rohrmann
Thanks for reporting this issue Chris. It looks indeed as if FLINK-10455
has not been fully fixed. I've reopened it and linked this mailing list
thread. If you want, then you could write to the JIRA thread as well. What
would be super helpful is if you manage to create a reproducing example for
further debugging.

Cheers,
Till

On Tue, May 7, 2019 at 4:04 PM Rohan Thimmappa 
wrote:

It is a blocker for exactly-once support with the Flink Kafka producer.
>
> This issue reported and closed. but still reproducible
> https://issues.apache.org/jira/browse/FLINK-10455
>
> On Mon, May 6, 2019 at 10:20 AM Slotterback, Chris <
> chris_slotterb...@comcast.com> wrote:
>
>> Hey Flink users,
>>
>>
>>
>> Currently using Flink 1.7.2 with a job using FlinkKafkaProducer with its
>> write semantic set to Semantic.EXACTLY_ONCE. When there is a job failure
>> and restart (in our case from checkpoint timeout), it begins a failure loop
>> that requires a cancellation and resubmission to fix. The expected and
>> desired outcome should be a recovery from failure and the job restarts
>> successfully. Some digging revealed an issue where the class loader closes
>> before the connection to kafka is fully terminated resulting in a
>> NoClassDefFoundError. A description of what is happening has already been
>> described here:
>> https://heap.io/blog/engineering/missing-scala-class-noclassdeffounderror,
>> though we are experiencing this with kafka, not Redis:
>>
>>
>>
>>
>> 2019-05-03 15:14:18,780 ERROR
>> org.apache.kafka.common.utils.KafkaThread - Uncaught
>> exception in thread 'kafka-producer-network-thread | producer-80':
>>
>> java.lang.NoClassDefFoundError: org/apache/kafka/clients/NetworkClient$1
>>
>> at
>> org.apache.kafka.clients.NetworkClient.processDisconnection(NetworkClient.java:658)
>>
>> at
>> org.apache.kafka.clients.NetworkClient.handleDisconnections(NetworkClient.java:805)
>>
>> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:520)
>>
>> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:226)
>>
>> at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
>>
>> at java.lang.Thread.run(Thread.java:748)
>>
>>
>>
>>
>> Interestingly, this only happens when we extend the FlinkKafkaProducer
>> for the purposes of setting the write semantic to EXACTLY_ONCE. When
>> running with the default FlinkKafkaProducer (using Semantic.AT_LEAST_ONCE),
>> the class loader has no issues disconnecting the kafka client on job
>> failure, and the job recovers just fine. We are not doing anything
>> particularly strange in our extended producer as far as I can tell:
>>
>>
>>
>> public class CustomFlinkKafkaProducer extends FlinkKafkaProducer {
>>
>>   public CustomFlinkKafkaProducer(Properties properties, String topicId,
>>       AvroKeyedSerializer serializationSchema) {
>>     super(
>>         topicId,
>>         serializationSchema,
>>         properties,
>>         Optional.of(new FlinkFixedPartitioner<>()),
>>         Semantic.EXACTLY_ONCE,
>>         DEFAULT_KAFKA_PRODUCERS_POOL_SIZE);
>>   }
>>
>>   public static Properties getPropertiesFromBrokerList(String brokerList) {
>>     […]
>>   }
>> }
>> }
>>
>>
>>
>>
>>
>
>
> --
> Thanks
> Rohan
>


Re: Migration from flink 1.7.2 to 1.8.0

2019-05-08 Thread Till Rohrmann
Hi Farouk,

from the stack trace alone I cannot say much. Would it be possible to share
a minimal example which reproduces the problem?

My suspicion is that OperatorChain.java:294 produces a null value.
Said differently, somehow there is no StreamConfig registered for the
given outputId. One thing you could check is whether you have compiled the
Flink job against Flink 1.8.0 and not 1.7.2. Maybe something changed w.r.t.
the output enumeration.

Cheers,
Till

On Tue, May 7, 2019 at 5:34 PM Farouk  wrote:

> Hi
>
> We are migrating our app to Flink 1.8.0.
>
> We built a docker image like this as Hadoop is not anymore bundled :
>
> FROM myrepo:5/flink:1.8.0-scala_2.11-alpine
>
> ADD --chown=flink:flink
> https://my-artifactory-repo/artifactory/my-repo/org/apache/flink/flink-shaded-hadoop2-uber/2.8.3-1.8.0/flink-shaded-hadoop2-uber-2.8.3-1.8.0.jar
> /opt/flink/lib
>
> When running Flink, we are facing the stack trace below  :
>
> java.lang.NullPointerException
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:284)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createChainedOperator(OperatorChain.java:360)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:296)
> at
> org.apache.flink.streaming.runtime.tasks.OperatorChain.(OperatorChain.java:133)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:267)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> at java.lang.Thread.run(Thread.java:748)
>
> Any idea on what's happening ?
>
> I think it's a problem with classloading.
>
> With Flink 1.7.2, every thing works fine
>
> Thanks
> Farouk
>


Re: I want to use MapState on an unkeyed stream

2019-05-08 Thread Till Rohrmann
Hi,

if you want to increase the parallelism you could also pick a key randomly
from a set of keys. The price you would pay is a shuffle operation (network
I/O) which would not be needed if you were using the unkeyed stream and
used the operator list state.

However, with keyed state you could also use Flink's
RocksDBKeyedStateBackend which allows to go out of core if your state size
should grow very large.

Cheers,
Till

On Tue, May 7, 2019 at 5:57 PM an0  wrote:

> But I only have one stream, nothing to connect it to.
>
> On 2019/05/07 00:15:59, Averell  wrote:
> > From my understanding, having a fake keyBy (stream.keyBy(r =>
> "dummyString"))
> > means there would be only one slot handling the data.
> > Would a broadcast function [1] work for your case?
> >
> > Regards,
> > Averell
> >
> > [1]
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/broadcast_state.html
> >
> >
> >
> > --
> > Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> >
>


Re: Getting async function call terminated with an exception

2019-05-08 Thread Till Rohrmann
Hi Avi,

you need to complete the given resultFuture and not return a future. You
can do this via resultFuture.complete(r).
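
For reference, a minimal corrected sketch of the asyncInvoke from the mail
below (assuming the Scala API, where ResultFuture.complete takes an Iterable of
results; Foo and ScoredFoo are the types from the original example):

override def asyncInvoke(input: Foo, resultFuture: ResultFuture[ScoredFoo]): Unit = {
  val r = ScoredFoo(Foo("a"), 80)
  // complete the provided ResultFuture instead of returning a Future
  resultFuture.complete(Iterable(r))
}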

Cheers,
Till

On Tue, May 7, 2019 at 8:30 PM Avi Levi  wrote:

> Hi,
> We are using Flink 1.8.0 (but the following also happens in 1.7.2). I tried
> a very simple unordered async call:
> override def asyncInvoke(input: Foo, resultFuture:
> ResultFuture[ScoredFoo]) : Unit  = {
>val r = ScoredFoo(Foo("a"), 80)
>Future.successful(r)
>}
>
> Running this stream seems to be stuck in some infinite loop until it
> crashes on a timeout exception:
>
> java.lang.Exception: An async function call terminated with an exception.
> Failing the AsyncWaitOperator.
> at
> org.apache.flink.streaming.api.operators.async.Emitter.output(Emitter.java:137)
> at
> org.apache.flink.streaming.api.operators.async.Emitter.run(Emitter.java:85)
> at java.base/java.lang.Thread.run(Thread.java:844)
> Caused by: java.util.concurrent.ExecutionException:
> java.util.concurrent.TimeoutException: Async function call has timed out.
> at
> java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
> at
> java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
> at
> org.apache.flink.streaming.api.operators.async.queue.StreamRecordQueueEntry.get(StreamRecordQueueEntry.java:68)
> at
> org.apache.flink.streaming.api.operators.async.Emitter.output(Emitter.java:129)
> ... 2 common frames omitted
> Caused by: java.util.concurrent.TimeoutException: Async function call has
> timed out.
> at
> org.apache.flink.streaming.api.scala.async.AsyncFunction.timeout(AsyncFunction.scala:60)
> at
> org.apache.flink.streaming.api.scala.async.AsyncFunction.timeout$(AsyncFunction.scala:59)
> at
> com.lookalike.analytic.utils.LookalikeScoreEnrich.timeout(LookalikeScoreEnrich.scala:18)
> at
> org.apache.flink.streaming.api.scala.AsyncDataStream$$anon$3.timeout(AsyncDataStream.scala:301)
> at
> org.apache.flink.streaming.api.operators.async.AsyncWaitOperator$1.onProcessingTime(AsyncWaitOperator.java:211)
> at
> org.apache.flink.streaming.runtime.tasks.SystemProcessingTimeService$TriggerTask.run(SystemProcessingTimeService.java:285)
> at
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:514)
> at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> at
> java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:299)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
> at
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
> ... 1 common frames omitted
>
> Please advise , Thanks
> Avi
>
>


Re: Testing with ProcessingTime and FromElementsFunction with TestEnvironment

2019-05-08 Thread Till Rohrmann
Hi Steve,

I think the reason for the different behaviour is due to the way event time
and processing time are implemented.

When you are using event time, watermarks need to travel through the
topology denoting the current event time. When your source terminates, the
system will send a watermark with Long.MAX_VALUE through the topology. This
will effectively trigger the completion of all pending event time
operations.

In the case of processing time, Flink does not do this. Instead it simply
relies on the processing time clocks on each machine. Hence, there is no
way for Flink to tell the different machines that their respective
processing time clocks should proceed to a certain time in case of a
shutdown. Instead you should make sure that you don't terminate the job
before a certain time (processing time) has passed. You could do this by
adding a sleep to your source function after you've output all records and
just before leaving the source loop.
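
For illustration, a minimal sketch of such a source (the names and the sleep
budget are made up; it just has to exceed the largest processing-time
gap/timer in the job):

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class ElementsThenSleepSource implements SourceFunction<String> {

    private volatile boolean running = true;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        for (String element : new String[] {"a", "b", "c"}) {
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(element);
            }
        }
        // keep the source, and thus the job, alive long enough for the
        // processing-time session windows to fire before shutdown
        long deadline = System.currentTimeMillis() + 30_000;
        while (running && System.currentTimeMillis() < deadline) {
            Thread.sleep(100);
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}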

Cheers,
Till

On Tue, May 7, 2019 at 11:49 PM Steven Nelson 
wrote:

> Hello!
>
> I am trying to write a test that runs in the TestEnviroment. I create a
> process that uses ProcessingTime, has a source constructed from a
> FromElementsFunction and runs data through a Keyed Stream into
> a ProcessingTimeSessionWindows.withGap().
>
> The problem is that it appears that the env.execute method returns
> immediately after the session closes, not allowing the events to be
> released from the window before shutdown occurs. This used to work when I
> used EventTime.
>
> Thoughts?
> -Steve
>


Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Till Rohrmann
Hi Manju,

could you share the full logs or at least the full stack trace of the
exception with us?

I suspect that after a failover Flink tries to restore the JobGraph from
persistent storage (the directory which you have configured via
`high-availability.storageDir`) but is not able to do so. One reason could
be that the JobGraph file has been removed by a third party, for example. I
think the cause of the FlinkException could shed light on it. Could you
verify that the JobGraph file is still accessible?

Cheers,
Till

On Wed, May 8, 2019 at 11:22 AM Manjusha Vuyyuru 
wrote:

> Any update on this from community side?
>
> On Tue, May 7, 2019 at 6:43 PM Manjusha Vuyyuru 
> wrote:
>
>> im using 1.7.2.
>>
>>
>> On Tue, May 7, 2019 at 5:50 PM miki haiat  wrote:
>>
>>> Which flink version are you using?
>>> I had similar  issues with 1.5.x
>>>
>>> On Tue, May 7, 2019 at 2:49 PM Manjusha Vuyyuru 
>>> wrote:
>>>
 Hello,

 I have a flink setup with two job managers coordinated by zookeeper.

 I see the below exception and both jobmanagers are going down:

 2019-05-07 08:29:13,346 INFO
 org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
 Released locks of job graph f8eb1b482d8ec8c1d3e94c4d0f79df77 from 
 ZooKeeper.
 2019-05-07 08:29:13,346 ERROR
 org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal
 error occurred in the cluster entrypoint.
 java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could
 not retrieve submitted JobGraph from state handle under
 /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
 handle is broken. Try cleaning the state handle store.
 at
 org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
 at
 org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
 at
 java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
 at
 java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
 at
 java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
 at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
 at
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
 at
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: org.apache.flink.util.FlinkException: Could not retrieve
 submitted JobGraph from state handle under
 /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
 handle is broken. Try cleaning the state handle store.
 at
 org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
 at
 org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
 at
 org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
 at
 org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
 at
 org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
 at
 org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
 ... 9 more


 Can someone please help me understand in detail on what is causing this
 exception. I can see zookeeper not able to retrieve job graph. What could
 be the reason for this?

 This is second time that my setup is going down with this excepton,
 first time i cleared jobgraph folder in zookeeper and restarted, now again
 faced with same issue.

 Since this is production setup this way of outage is not at all
 expected :(. Can someone help me how to give a permanent fix to this issue?


 Thanks,
 Manju




Re: taskmanager.tmp.dirs vs io.tmp.dirs

2019-05-08 Thread Till Rohrmann
Hi Flavio,

taskmanager.tmp.dirs is the deprecated configuration key which has been
superseded by the io.tmp.dirs configuration option. In the future, you
should use io.tmp.dirs.
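
For example, in flink-conf.yaml (the paths are placeholders):

io.tmp.dirs: /mnt/disk1/flink-tmp,/mnt/disk2/flink-tmp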

Cheers,
Till

On Wed, May 8, 2019 at 3:32 PM Flavio Pompermaier 
wrote:

> Hi to all,
> looking at
> https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html it's
> not very clear to me the difference between these two settings (actually I
> always used the same value for the two).
>
> My understanding is that taskmanager.tmp.dirs is used to spill memory when
> there's no more RAM available, while io.tmp.dirs for all other situations
> (but which are they exactly?).
>
> Thanks in advance,
> Flavio
>


taskmanager.tmp.dirs vs io.tmp.dirs

2019-05-08 Thread Flavio Pompermaier
Hi to all,
looking at
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html it's
not very clear to me the difference between these two settings (actually I
always used the same value for the two).

My understanding is that taskmanager.tmp.dirs is used to spill memory when
there's no more RAM available, while io.tmp.dirs for all other situations
(but which are they exactly?).

Thanks in advance,
Flavio


Re: flink 1.7 HA production setup going down completely

2019-05-08 Thread Manjusha Vuyyuru
Any update on this from community side?

On Tue, May 7, 2019 at 6:43 PM Manjusha Vuyyuru 
wrote:

> im using 1.7.2.
>
>
> On Tue, May 7, 2019 at 5:50 PM miki haiat  wrote:
>
>> Which flink version are you using?
>> I had similar  issues with 1.5.x
>>
>> On Tue, May 7, 2019 at 2:49 PM Manjusha Vuyyuru 
>> wrote:
>>
>>> Hello,
>>>
>>> I have a flink setup with two job managers coordinated by zookeeper.
>>>
>>> I see the below exception and both jobmanagers are going down:
>>>
>>> 2019-05-07 08:29:13,346 INFO
>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore  -
>>> Released locks of job graph f8eb1b482d8ec8c1d3e94c4d0f79df77 from ZooKeeper.
>>> 2019-05-07 08:29:13,346 ERROR
>>> org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal
>>> error occurred in the cluster entrypoint.
>>> java.lang.RuntimeException: org.apache.flink.util.FlinkException: Could
>>> not retrieve submitted JobGraph from state handle under
>>> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
>>> handle is broken. Try cleaning the state handle store.
>>> at
>>> org.apache.flink.util.ExceptionUtils.rethrow(ExceptionUtils.java:199)
>>> at
>>> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:74)
>>> at
>>> java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602)
>>> at
>>> java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
>>> at
>>> java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442)
>>> at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
>>> at
>>> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>>> at
>>> scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
>>> Caused by: org.apache.flink.util.FlinkException: Could not retrieve
>>> submitted JobGraph from state handle under
>>> /147dd022ec91f7381ad4ca3d290387e9. This indicates that the retrieved state
>>> handle is broken. Try cleaning the state handle store.
>>> at
>>> org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore.recoverJobGraph(ZooKeeperSubmittedJobGraphStore.java:208)
>>> at
>>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJob(Dispatcher.java:696)
>>> at
>>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobGraphs(Dispatcher.java:681)
>>> at
>>> org.apache.flink.runtime.dispatcher.Dispatcher.recoverJobs(Dispatcher.java:662)
>>> at
>>> org.apache.flink.runtime.dispatcher.Dispatcher.lambda$null$26(Dispatcher.java:821)
>>> at
>>> org.apache.flink.util.function.FunctionUtils.lambda$uncheckedFunction$2(FunctionUtils.java:72)
>>> ... 9 more
>>>
>>>
>>> Can someone please help me understand in detail on what is causing this
>>> exception. I can see zookeeper not able to retrieve job graph. What could
>>> be the reason for this?
>>>
>>> This is second time that my setup is going down with this excepton,
>>> first time i cleared jobgraph folder in zookeeper and restarted, now again
>>> faced with same issue.
>>>
>>> Since this is production setup this way of outage is not at all expected
>>> :(. Can someone help me how to give a permanent fix to this issue?
>>>
>>>
>>> Thanks,
>>> Manju
>>>
>>>


Re: Unable to build flink from source

2019-05-08 Thread Chesnay Schepler
You're likely using Java 9+, but 1.3.3 only supports Java 8 (and maybe 
still 7).


On 06/05/2019 03:20, syed wrote:

Hi
I am trying to build flink 1.3.3 from source using IntelliJ IDEA Ultimate
2019.1 IDE. When I build the project, I am receiving the following error

java package sun.misc does not exist

I am following the instructions for importing Flink into an IDE, available at
this link:

https://ci.apache.org/projects/flink/flink-docs-release-1.8/flinkDev/ide_setup.html


Can some one guide me how can I fix this issue?
Thanks.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/