Re: java.io.IOException: NSS is already initialized

2018-11-02 Thread Hao Sun
Same environment, new error.
I can run the same Docker image on my local Mac, but on K8s it gives
me this error.
I cannot think of any difference between local Docker and K8s Docker.

Any hint would be helpful. Thanks


2018-11-02 23:29:32,981 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph - Job
ConnectedStreams maxwell.accounts ()
switched from state RUNNING to FAILING.
AsynchronousException{java.lang.Exception: Could not materialize checkpoint
235 for operator Source: KafkaSource(maxwell.accounts) ->
MaxwellFilter->Maxwell(maxwell.accounts) ->
FixedDelayWatermark(maxwell.accounts) ->
MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
influxdbSink(maxwell.accounts) (1/1).}
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 235 for
operator Source: KafkaSource(maxwell.accounts) ->
MaxwellFilter->Maxwell(maxwell.accounts) ->
FixedDelayWatermark(maxwell.accounts) ->
MaxwellFPSEvent->InfluxDBData(maxwell.accounts) -> Sink:
influxdbSink(maxwell.accounts) (1/1).
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
... 6 more
Caused by: java.util.concurrent.ExecutionException:
java.lang.NoClassDefFoundError: Could not initialize class
sun.security.ssl.SSLSessionImpl
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
at
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
... 5 more
Caused by: java.lang.NoClassDefFoundError: Could not initialize class
sun.security.ssl.SSLSessionImpl
at sun.security.ssl.SSLSocketImpl.init(SSLSocketImpl.java:604)
at sun.security.ssl.SSLSocketImpl.<init>(SSLSocketImpl.java:572)
at
sun.security.ssl.SSLSocketFactoryImpl.createSocket(SSLSocketFactoryImpl.java:110)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket(SSLConnectionSocketFactory.java:365)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket(SSLConnectionSocketFactory.java:355)
at
org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ssl.SdkTLSSocketFactory.connectSocket(SdkTLSSocketFactory.java:132)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:142)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:359)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.ClientConnectionManagerFactory$Handler.invoke(ClientConnectionManagerFactory.java:76)
at
org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.conn.$Proxy4.connect(Unknown
Source)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.establishRoute(MainClientExec.java:381)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:237)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at
org.apache.flink.fs.s3presto.shaded.org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at
org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.apache.client.impl.SdkHttpClient.execute(SdkHttpClient.java:72)
at
org.apache.flink.fs.s3presto.shaded.com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneReque
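A "NoClassDefFoundError: Could not initialize class" means the class's static
initializer already failed once earlier; only that first failure carries the real
root cause (in containers this is often a broken or missing cacerts file, or the
"NSS is already initialized" error from the subject line). A minimal probe that
can be compiled and run inside the suspect image to surface the first failure
directly; the class name and target host are illustrative assumptions:

import javax.net.ssl.SSLSocketFactory;

public class SslProbe {
    public static void main(String[] args) throws Exception {
        // Forces JSSE/TLS initialization, the same path the shaded AWS
        // HTTP client takes when checkpointing to S3; if the security
        // provider is broken in this image, the underlying exception
        // (not the masked NoClassDefFoundError) is thrown here.
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (java.net.Socket socket = factory.createSocket("s3.amazonaws.com", 443)) {
            System.out.println("TLS socket created OK: " + socket);
        }
    }
}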

Re: Savepoint failed with error "Checkpoint expired before completing"

2018-11-02 Thread Gagan Agrawal
Great, thanks for sharing that info.

Gagan

On Thu, Nov 1, 2018 at 1:50 PM Yun Tang  wrote:

> Haha, actually externalized checkpoints also support parallelism changes;
> you could read my email
> 
> posted on the dev mailing list.
>
> Best
> Yun Tang
> --
> *From:* Gagan Agrawal 
> *Sent:* Thursday, November 1, 2018 13:38
> *To:* myas...@live.com
> *Cc:* happydexu...@gmail.com; user@flink.apache.org
> *Subject:* Re: Savepoint failed with error "Checkpoint expired before
> completing"
>
> Thanks Yun for your inputs. Yes, increasing the checkpoint timeout helps and
> we are able to take savepoints now. In our case we wanted to increase parallelism,
> so I believe a savepoint is the only option, as a checkpoint doesn't support
> code/parallelism changes.
>
> Gagan
>
> On Wed, Oct 31, 2018 at 8:46 PM Yun Tang  wrote:
>
> Hi Gagan
>
> A savepoint generally takes more time than the usual incremental
> checkpoint; you could try to increase the checkpoint timeout [1]:
>
>env.getCheckpointConfig().setCheckpointTimeout(90);
>
> If you just want to resume the previous job without changing the
> state backend, I think you could also try to resume from a retained
> checkpoint without triggering a savepoint [2].
>
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/stream/state/checkpointing.html#enabling-and-configuring-checkpointing
>
> [2]
> https://ci.apache.org/projects/flink/flink-docs-release-1.6/ops/state/checkpoints.html#resuming-from-a-retained-checkpoint
>
> Best
> Yun Tang
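For reference, a minimal sketch of the configuration being suggested. Note that
setCheckpointTimeout() takes milliseconds, so the bare "90" in the quoted snippet
has most likely lost digits in the archive; the interval and timeout values below
are purely illustrative:

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointConfigSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.enableCheckpointing(5 * 60 * 1000); // checkpoint every 5 minutes (ms)
        // Raise the timeout well above the 10-minute default so a full
        // savepoint over a large state has time to complete.
        env.getCheckpointConfig().setCheckpointTimeout(30 * 60 * 1000);
        // Retain checkpoints on cancellation so the job can be resumed
        // from one without triggering a savepoint [2].
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
    }
}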
>
> --
> *From:* Gagan Agrawal 
> *Sent:* Wednesday, October 31, 2018 19:03
> *To:* happydexu...@gmail.com
> *Cc:* user@flink.apache.org
> *Subject:* Re: Savepoint failed with error "Checkpoint expired before
> completing"
>
> Hi Henry,
> Thanks for your response. However, we don't face this issue during normal
> runs, as we have incremental checkpoints. Only when we try to take a savepoint
> (which tries to save the entire state in one go) do we face this problem.
>
> Gagan
>
> On Wed, Oct 31, 2018 at 11:41 AM 徐涛  wrote:
>
> Hi Gagan,
> I have run into the checkpoint timeout error too.
> In my case it was not due to a big checkpoint size, but to a slow
> sink causing high backpressure on the upstream operators. The barriers
> may then take a long time to arrive at the sink.
> Please check whether that is the case for you.
>
> Best
> Henry
>
> > On 30 Oct 2018, at 6:07 PM, Gagan Agrawal wrote:
> >
> > Hi,
> > We have a Flink job (Flink version 1.6.1) which unions 2 streams to pass
> through a custom KeyedProcessFunction with RocksDB state store, which finally
> creates another stream into Kafka. The current checkpoint size is around
> ~100 GB, and checkpoints are saved to S3 at a 5-minute interval with incremental
> checkpointing enabled. Checkpoints mostly finish in less than 1 min. We are
> running this job on YARN with the following parameters:
> >
> > -yn 10  (10 task managers)
> > -ytm 2048 (2 GB each)
> > - Operator parallelism is also 10.
> >
> > While trying to run a savepoint on this job, it runs for ~10 mins and then
> throws the following error. It looks like the checkpoint default timeout of 10 mins
> is causing this. What is the recommended way to run a savepoint for such a job?
> Should we increase the checkpoint default timeout of 10 mins? Also, currently our
> state size is 100 GB, but it is expected to grow up to 1 TB. Is Flink a good fit
> for use cases with that much state? Also, how much time is a savepoint expected
> to take with such state size and parallelism on YARN? Any other
> recommendation would be of great help.
> >
> > org.apache.flink.util.FlinkException: Triggering a savepoint for the job
> 434398968e635a49329f59a019b41b6f failed.
> >   at
> org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:714)
> >   at
> org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:692)
> >   at
> org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
> >   at
> org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:689)
> >   at
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1059)
> >   at
> org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
> >   at java.security.AccessController.doPrivileged(Native Method)
> >   at javax.security.auth.Subject.doAs(Subject.java:422)
> >   at
> org.apache.hadoop.security.UserGro

Re: FileNotFoundException on starting the job

2018-11-02 Thread mina...@gmail.com
Thank you, Alex, much appreciated.
I'll check if changing the temporary I/O directory helps to resolve the issue.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Kinesis Connector

2018-11-02 Thread Hequn Cheng
Hi Steve,

I think we can check the following things step by step:
1. Confirm that the data source actually has data.
2. Take a look at the TaskManager and JobManager logs and check whether there
are exceptions.
3. Take a thread dump to see what the TaskManager is doing.

Best, Hequn
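On point 1, one common cause of "connects but never receives data" with this
connector is the initial stream position, which defaults to LATEST, so nothing
arrives unless new records are being written. A minimal hedged consumer sketch;
the stream name, region and schema are illustrative assumptions, and credentials
are taken from the default AWS provider chain:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisProbeJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        Properties props = new Properties();
        props.put(ConsumerConfigConstants.AWS_REGION, "us-east-1");
        // Read from the beginning of the stream so already-written records show up.
        props.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "TRIM_HORIZON");

        DataStream<String> records = env.addSource(
                new FlinkKinesisConsumer<>("my-stream", new SimpleStringSchema(), props));
        records.print(); // records should appear in the TaskManager stdout
        env.execute("kinesis-probe");
    }
}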


On Fri, Nov 2, 2018 at 10:28 PM Steve Bistline 
wrote:

> I have tried just about everything to get a simple Flink application to
> consume from Kinesis. The application appears to connect (I think), with no
> complaints... but it never receives any data, even with a very simple Java
> app (see attached).
>
> Any help would be very much appreciated.
>
> Thanks
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
>


Kinesis Connector

2018-11-02 Thread Steve Bistline
I have tried just about everything to get a simple Flink application to
consume from Kinesis. The application appears to connect (I think), with no
complaints... but it never receives any data, even with a very simple Java app
(see attached).

Any help would be very much appreciated.

Thanks



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Using a ProcessFunction as a "Source"

2018-11-02 Thread Aljoscha Krettek
As an update, there is now also this FLIP for the source refactoring: 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface

> On 1. Nov 2018, at 20:47, Addison Higham  wrote:
> 
> This is fairly stale, but getting back to this:
> 
> We ended up going the route of using the Operator API and implementing
> something similar to the `readFile` API: one real source function that
> reads out splits, and then a small abstraction over
> AbstractStreamOperator, a `MessagableSourceFunction`, that had a similar API
> to ProcessFunction.
> 
> It is a little bit more to deal with, but wasn't too bad all told.
> 
> I am hoping to get the code cleaned up and post it at least as an example of 
> how to use the Operator API for some more advanced use cases.
> 
> Aljoscha: That looks really interesting! I actually saw that too late to 
> consider something like that, but seems like a good change!
> 
> Thanks for the input.
> 
> Addison
> On Thu, Aug 30, 2018 at 4:07 AM Aljoscha Krettek wrote:
> Hi Addison,
> 
> for a while now different ideas about reworking the Source interface have 
> been floated. I implemented a prototype that showcases my favoured approach 
> for such a new interface: 
> https://github.com/aljoscha/flink/commits/refactor-source-interface 
> 
> 
> This basically splits the Source into two parts: a SplitEnumerator and a 
> SplitReader. The enumerator is responsible for discovering what should be 
> read and the reader is responsible for reading splits. In this model, the 
> SplitReader does not necessarily have to sit at the beginning of the 
> pipeline, it could sit somewhere in the middle and the splits don't have to 
> necessarily come from the enumerator but could come from a different source.
> 
> I think this could fit the use case that you're describing.
> 
> Best,
> Aljoscha
> 
>> On 25. Aug 2018, at 11:45, Chesnay Schepler wrote:
>> 
>> The SourceFunction interface is rather flexible so you can do pretty much 
>> whatever you want. Exact implementation depends on whether control messages 
>> are pulled or pushed to the source; in the first case you'd simply block 
>> within "run()" on the external call, in the latter you'd have it block on a 
>> queue of some sort that is fed by another thread waiting for messages.
>> 
>> AFAIK you should never use the collector outside of "processElement".
>> 
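A minimal sketch of the second variant Chesnay describes, a source blocking on a
queue that another thread feeds; the class and field names are illustrative, not
from this thread:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class QueueBackedSource implements SourceFunction<String> {
    private static final long serialVersionUID = 1L;

    private volatile boolean running = true;
    private transient BlockingQueue<String> queue;

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        queue = new LinkedBlockingQueue<>();
        // Here a separate thread (e.g. one waiting for control messages)
        // would be started and handed a reference to `queue`.
        while (running) {
            // Poll with a timeout so cancel() is noticed promptly.
            String msg = queue.poll(100, TimeUnit.MILLISECONDS);
            if (msg != null) {
                // Emit under the checkpoint lock, as required for exactly-once.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(msg);
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }
}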
>> On 25.08.2018 05:15, vino yang wrote:
>>> Hi Addison,
>>> 
>>> There are a few things I don't understand. Does your source generate its
>>> messages itself? Why can't the source receive input? If it cannot accept
>>> input, then why is it called a source? Isn't the kafka-connector a source
>>> that takes input?
>>> 
>>> If you mean that under normal circumstances it can't receive a separate
>>> input of control messages, there are some ways to solve that:
>>> 
>>> 1) Access external systems in your source to get or subscribe to control
>>> messages, for example via Zookeeper.
>>> 2) If your source is followed by a map or process operator, they can
>>> be chained together as a "big" source, and then you can pass your control
>>> messages via Flink's new "broadcast state" feature. See this blog post for
>>> details. [1]
>>> 3) Mix control messages with normal messages in the same message flow.
>>> After a control message is parsed, the corresponding action is taken. Of
>>> course, this kind of design is not really recommended.
>>> 
>>> [1]: 
>>> https://data-artisans.com/blog/a-practical-guide-to-broadcast-state-in-apache-flink
>>>  
>>> 
>>> 
>>> Thanks, vino.
>>> 
>>> On Sat, Aug 25, 2018 at 12:46 AM, Addison Higham wrote:
>>> HI,
>>> 
>>> I am writing a parallel source function that ideally needs to receive some
>>> messages as control information (specifically, a state message on where to
>>> start reading from a Kinesis stream). As far as I can tell, there isn't a
>>> way to make a SourceFunction receive input (which makes sense), so I am
>>> thinking it makes sense to use a ProcessFunction that will occasionally
>>> receive control messages and mostly just output a lot of messages.
>>> 
>>> This works from an API perspective, with a few different options, I could 
>>> either:
>>> 
>>> A) have the processElement function block on calling the loop that will
>>> produce messages, or
>>> B) have the processElement function return (by passing the collector on and
>>> starting the reading on a different thread), but continue to produce
>>> messages downstream
>>> 
>>> This obviously does raise some concerns though:
>>> 
>>> 1. Does this break any assumptions Flink has about the message lifecycle? Option A
>>> of blocking on processElement for very long periods seems straightforward
>>> but less than ideal, not to mention not being able to handle any other
>

Re: Window State is not being store on check-pointing

2018-11-02 Thread sohimankotia
Thanks. It solved the problem.



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Question about slot and yarn vcores

2018-11-02 Thread sohimankotia
Let's assume I have a YARN cluster with 3 nodes and 3 vcores per node, so the total
available cores = 9.

Now if I spin up a Flink job with taskmanagers = 3 and number of slots per task
manager = 2, what will happen:


1. 3 JVMs will be initiated (one for each task manager).
2. Each JVM will run 2 threads for tasks.
3. Will each thread use 1 vcore from YARN, or will both threads share the
same vcore?



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Starting a seperate Java process within a Flink cluster

2018-11-02 Thread Jeff Zhang
The error is most likely due to a classpath issue, because the classpath is
different when you run a Flink program in the IDE and when you run it on a cluster.

And starting another JVM process in a SourceFunction doesn't seem like a good
approach to me; is it possible for you to do the buffering in your custom
SourceFunction instead?
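One concrete classpath difference worth checking: when a job is submitted with
"./flink run", user classes are loaded by Flink's user-code classloader and are
not on java.class.path, so a child JVM built from that property cannot see them.
A hedged sketch of a workaround, resolving the jar that actually contains the
class; it assumes the class is packaged in a plain, accessible jar:

import java.net.URISyntaxException;

public final class CodeLocation {

    private CodeLocation() {}

    // Returns the filesystem path of the jar (or class directory) that
    // contains the given class. Pass this as -cp to the child process
    // instead of System.getProperty("java.class.path").
    public static String of(Class<?> clazz) throws URISyntaxException {
        return clazz.getProtectionDomain()
                .getCodeSource()
                .getLocation()
                .toURI()
                .getPath();
    }
}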


On Fri, Nov 2, 2018 at 5:22 PM, Ly, The Anh wrote:

> Yes, I did. It is definitely there. I tried and made a separate Maven
> project to test whether something was wrong with my jar.
> The resulting shaded jar of that test project was fine, and the
> message-buffer-process was running with that test jar.
>
>
> On 02.11.2018 at 04:47, Yun Tang wrote:
> Hi
>
> Since you use the message-buffer-process as a dependency and the error
> tells you the class was not found, have you checked whether your application
> jar actually contains the wanted MessageBufferProcess.class? If it is not
> there, try using the maven-assembly or maven-shade plugin to include
> your classes.
>
> Best
> Yun Tang
> --
> *From:* Ly, The Anh 
> *Sent:* Friday, November 2, 2018 6:33
> *To:* user@flink.apache.org
> *Subject:* Starting a seperate Java process within a Flink cluster
>
>
> Hello,
>
>
> I am currently working on my masters and I encountered a difficult
> problem.
>
>
> Background (for context): I am trying to connect different data stream
> processors. Therefore I am using Flink's internal mechanisms of creating
> custom sinks and sources to receive from and send to different data stream
> processors. I am starting a separate
>
> process (the message-buffer-process) in those custom sinks and sources
> to communicate with and buffer data into that message-buffer-process. My
> implementation is created with Maven and it could potentially be added as
> a dependency.
>
>
> Problem: I already tested my implementation by adding it as a dependency
> to a simple Flink word-count example. The test was within an IDE, where it
> works perfectly fine. But when I package that Flink word-count example and
> try
>
> to run it with "./flink run " or by uploading and submitting it as a job,
> it tells me that my buffer-process class could not be found:
>
> In German: "Fehler: Hauptklasse
> de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess konnte nicht
> gefunden oder geladen werden"
>
> Roughly translated: "Error: Main class
> de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess could not be
> found or loaded"
>
>
> Code snippets:
>
> Example - Adding my custom sink to send data to another data stream
> processor:
>
> dataStream.addSink(
>     (SinkFunction) DSPConnectorFactory
>         .getInstance()
>         .createSinkConnector(
>             new DSPConnectorConfig
>                 .Builder("localhost", 9656)
>                 .withDSP("flink")
>                 .withBufferConnectorString("buffer-connection-string")
>                 .withHWM(20)
>                 .withTimeout(1)
>                 .build()));
>
>
>
> The way I am trying to start the separate buffer-process:
>
> JavaProcessBuilder.exec(MessageBufferProcess.class, connectionString,
>         addSentMessagesFrame);
>
> How JavaProcessBuilder.exec looks:
>
> public static Process exec(Class javaClass, String connectionString,
>         boolean addSentMessagesFrame) throws IOException, InterruptedException {
>     String javaHome = System.getProperty("java.home");
>     String javaBin = javaHome + File.separator + "bin" + File.separator + "java";
>     String classpath = System.getProperty("java.class.path");
>     String className = javaClass.getCanonicalName();
>
>     System.out.println("Trying to build process " + classpath + " " + className);
>
>     ProcessBuilder builder = new ProcessBuilder(
>             javaBin, "-cp", classpath, className, connectionString,
>             Boolean.toString(addSentMessagesFrame));
>
>     builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
>     builder.redirectError(ProcessBuilder.Redirect.INHERIT);
>
>     Process process = builder.start();
>     return process;
> }
>
> I also tried running that message-buffer process separately, in another Maven
> project, from its packaged .jar file. That worked perfectly fine too, which is
> why I am assuming that my approach is not appropriate for running in Flink.
> Did I miss something, or does my approach simply not work within
> Flink's context? I hope the information I gave you is sufficient to help you
> understand my issue. If you need any more information, feel free to message
> me!
>
> Thanks for any help!
>
>  With best regards
>
>
>


Re: FileNotFoundException on starting the job

2018-11-02 Thread Alexander Smirnov
my guess is that the tmp directory got cleaned on your host and Flink couldn't
restore its state from it upon startup.

Take a look at the
https://ci.apache.org/projects/flink/flink-docs-stable/ops/config.html#configuring-temporary-io-directories
article; I think it is relevant.

On Thu, Nov 1, 2018 at 8:51 PM Dmitry Minaev  wrote:

> Hi everyone,
>
> I'm having an issue when restarting a job in Flink. I'm doing a simple
> stop with a savepoint and then a start from the savepoint. Savepoints are
> stored in a separate folder; there is no configuration for the "/tmp" folder in
> my setup. There is only 1 task manager and parallelism is 1.
>
> I'm getting FileNotFoundException:
>
> 31 Oct 2018 23:40:35,837 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph -
> filter-business-metrics -> Sink: data_feed (1/1)
> (51ce53532932c33805291dc188d2f99e) switched from DEPLOYING to RUNNING.
> 31 Oct 2018 23:40:35,837 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph -
> agents-working-on-interactions (1/1) (72a916158d07f2353fb270848d95ba2f)
> switched from DEPLOYING to RUNNING.
> 31 Oct 2018 23:40:35,929 INFO
> org.apache.flink.runtime.executiongraph.ExecutionGraph -
> interaction-details (1/1) (c004e64e90c0dbd3bc007459bc3d7420) switched from
> RUNNING to FAILED.
> java.io.FileNotFoundException:
> /tmp/flink-io-7bfd6603-c115-463d-bcfc-b97e31be5a37/f7ce787242e6afd91c3cbeccc2f74bc4a7dd0e6e600ff83e51bc5be9a95750f9.0.buffer
> (No such file or directory)
> at java.io.RandomAccessFile.open0(Native Method)
> at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
> at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
> at
> org.apache.flink.streaming.runtime.io.BufferSpiller.createSpillingChannel(BufferSpiller.java:259)
> at
> org.apache.flink.streaming.runtime.io.BufferSpiller.<init>(BufferSpiller.java:120)
> at
> org.apache.flink.streaming.runtime.io.BarrierBuffer.<init>(BarrierBuffer.java:149)
> at
> org.apache.flink.streaming.runtime.io.StreamInputProcessor.<init>(StreamInputProcessor.java:129)
> at
> org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.init(OneInputStreamTask.java:56)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:235)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> at java.lang.Thread.run(Thread.java:748)
>
> I've checked the logs and there are no errors prior to that. The job was
> stopped with no issues, and it was starting normally and passed multiple
> operators setting them to RUNNING state. But for several other operators it
> throws this FileNotFoundException.
>
> Any help is appreciated.
>
> -- Regards, Dmitry
>


Re: Starting a seperate Java process within a Flink cluster

2018-11-02 Thread Ly, The Anh
Yes, I did. It is definitely there. I tried and made a separate Maven project
to test whether something was wrong with my jar.
The resulting shaded jar of that test project was fine, and the
message-buffer-process was running with that test jar.


On 02.11.2018 at 04:47, Yun Tang wrote:
Hi

Since you use the message-buffer-process as a dependency and the error tells
you the class was not found, have you checked whether your application jar
actually contains the wanted MessageBufferProcess.class? If it is not there,
try using the maven-assembly or maven-shade plugin to
include your classes.

Best
Yun Tang

From: Ly, The Anh 
Sent: Friday, November 2, 2018 6:33
To: user@flink.apache.org
Subject: Starting a seperate Java process within a Flink cluster


Hello,


I am currently working on my masters and I encountered a difficult problem.


Background (for context): I am trying to connect different data stream
processors. Therefore I am using Flink's internal mechanisms of creating custom
sinks and sources to receive from and send to different data stream processors.
I am starting a separate

process (the message-buffer-process) in those custom sinks and sources to
communicate with and buffer data into that message-buffer-process. My
implementation is created with Maven and it could potentially be added as a
dependency.


Problem: I already tested my implementation by adding it as a dependency to a
simple Flink word-count example. The test was within an IDE, where it works
perfectly fine. But when I package that Flink word-count example and try

to run it with "./flink run " or by uploading and submitting it as a job, it
tells me that my buffer-process class could not be found:

In German: "Fehler: Hauptklasse 
de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess konnte nicht 
gefunden oder geladen werden"

Roughly translated: "Error: Main class 
de.tuberlin.mcc.geddsprocon.messagebuffer.MessageBufferProcess could not be 
found or loaded"


Code snippets:

Example - Adding my custom sink to send data to another data stream processor:

dataStream.addSink(
    (SinkFunction) DSPConnectorFactory
        .getInstance()
        .createSinkConnector(
            new DSPConnectorConfig
                .Builder("localhost", 9656)
                .withDSP("flink")
                .withBufferConnectorString("buffer-connection-string")
                .withHWM(20)
                .withTimeout(1)
                .build()));



The way I am trying to start the separate buffer-process:

JavaProcessBuilder.exec(MessageBufferProcess.class, connectionString,
        addSentMessagesFrame);

How JavaProcessBuilder.exec looks:

public static Process exec(Class javaClass, String connectionString,
        boolean addSentMessagesFrame) throws IOException, InterruptedException {
    String javaHome = System.getProperty("java.home");
    String javaBin = javaHome + File.separator + "bin" + File.separator + "java";
    String classpath = System.getProperty("java.class.path");
    String className = javaClass.getCanonicalName();

    System.out.println("Trying to build process " + classpath + " " + className);

    ProcessBuilder builder = new ProcessBuilder(
            javaBin, "-cp", classpath, className, connectionString,
            Boolean.toString(addSentMessagesFrame));

    builder.redirectOutput(ProcessBuilder.Redirect.INHERIT);
    builder.redirectError(ProcessBuilder.Redirect.INHERIT);

    Process process = builder.start();
    return process;
}

I also tried running that message-buffer process separately, in another Maven
project, from its packaged .jar file. That worked perfectly fine too, which is why
I am assuming that my approach is not appropriate for running in Flink.
Did I miss something, or does my approach simply not work within
Flink's context? I hope the information I gave you is sufficient to help you
understand my issue. If you need any more information, feel free to message
me!

Thanks for any help!

 With best regards



Re: Why dont't have a csv formatter for kafka table source

2018-11-02 Thread Timo Walther

I already answered his question but forgot to CC the mailing list:

Hi Jocean,

a standards-compliant CSV format for the Kafka table source is in the
making right now. There is a PR that implements it [1], but it needs
another review pass. It is high on my priority list, and I hope we can
finalize it after the 1.7 release is out. Feel free to help here by
reviewing and trying it out.


Regards,
Timo

[1] https://github.com/apache/flink/pull/6541
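Until that lands, the "Could not find a suitable table factory for ...
DeserializationSchemaFactory" error can be avoided by registering the source with
a format whose factory already ships, such as JSON. A hedged sketch against the
Flink 1.6 descriptor API; the topic, fields and properties are illustrative
assumptions, and flink-json must be on the classpath:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.StreamTableEnvironment;
import org.apache.flink.table.descriptors.Json;
import org.apache.flink.table.descriptors.Kafka;
import org.apache.flink.table.descriptors.Schema;

public class KafkaJsonTableSource {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        tableEnv.connect(
                new Kafka()
                        .version("0.11")
                        .topic("events")
                        .property("bootstrap.servers", "localhost:9092"))
                .withFormat(new Json().deriveSchema())
                .withSchema(new Schema()
                        .field("user", Types.STRING)
                        .field("amount", Types.DOUBLE))
                .inAppendMode()
                .registerTableSource("events");
    }
}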


On 02.11.18 at 10:11, Till Rohrmann wrote:

Hi Jocean,

these kinds of issues should go to the user mailing list. I've cross-posted
it there and put dev in BCC.


Cheers,
Till

On Fri, Nov 2, 2018 at 6:43 AM Jocean shi wrote:


Hi all,
     I have encountered an error when I want to register a table
from Kafka
using the CSV formatter.
     The error is "Could not find a suitable table factory for
'org.apache.flink.table.factories.DeserializationSchemaFactory'".

Jocean





Re: Why dont't have a csv formatter for kafka table source

2018-11-02 Thread Till Rohrmann
Hi Jocean,

these kinds of issues should go to the user mailing list. I've cross-posted
it there and put dev in BCC.

Cheers,
Till

On Fri, Nov 2, 2018 at 6:43 AM Jocean shi  wrote:

> Hi all,
>  I have encountered an error when I want to register a table from Kafka
> using the CSV formatter.
>  The error is "Could not find a suitable table factory for
> 'org.apache.flink.table.factories.DeserializationSchemaFactory'".
>
> Jocean
>


Re: Flink CEP Watermark Exception

2018-11-02 Thread Dawid Wysakowicz
Hi Austin,

Could you provide the jobmanager and taskmanager logs for a failed run?
The exception you've posted is thrown while processing messages, rather
than during restore, but you said the job failed to restore the checkpoint, so
how come it is processing messages? Could you also describe, step by step, the
exact conditions under which the "IllegalStateException: Could not find previous
entry with key" happens?

The first two CEP issues you've linked concern a very old Flink
version (1.0.3); the CEP library has been heavily reworked since then, and I
would not look for any similarities in those cases.

Best,

Dawid

On 01/11/2018 14:24, Austin Cawley-Edwards wrote:
> Hi Dawid,
>
> Thank you for your reply. I'm out for the next few days, so I hope you
> don't mind me cc'ing my team here. We all really appreciate you and
> the rest of the people monitoring the mailing list.
>
>
> We've only seen this SharedBuffer problem in production, after sending
> around 20 GB of data through. In the Flink UI, we see the checkpoint
> status as:
>
> "Checkpoint failed: Checkpoint Coordinator is suspending."
>
> It then tries to restore the last previously succeeded checkpoint, but
> cannot, and throws the SharedBuffer exception. Our state size is around
> 200 MB when it fails.
>
> Unfortunately, the platform we are running our cluster on only
> supports up to Flink 1.5. We will continue trying to find a
> reproducible example for you. I have found a few other people with a
> similar problem (linked at the bottom), but none seem to have been
> resolved.
>
> Our CEP setup looks generally like this:
>
> Pattern<AlertEvent, ?> pattern = Pattern.<AlertEvent>begin("alertOne")
>     .where(new SimpleCondition<AlertEvent>() {
>         @Override
>         public boolean filter(AlertEvent event) {
>             return event.level > 0;
>         }
>     })
>     .next("alertTwo").subtype(AlertEvent.class)
>     .where(new IterativeCondition<AlertEvent>() {
>         @Override
>         public boolean filter(AlertEvent subEvent,
>                 Context<AlertEvent> ctx) throws Exception {
>             double surprisePercent = subEvent.surprise();
>             Iterable<AlertEvent> previousEvents
>                     = ctx.getEventsForPattern("alertOne");
>             for (AlertEvent prevEvent : previousEvents) {
>                 double prevSurprisePercent = prevEvent.surprise();
>                 if (prevSurprisePercent > surprisePercent) {
>                     return false;
>                 }
>             }
>             return true;
>         }
>     })
>     .next("alertThree").subtype(AlertEvent.class)
>     .where(new IterativeCondition<AlertEvent>() {
>         @Override
>         public boolean filter(AlertEvent subEvent,
>                 Context<AlertEvent> ctx) throws Exception {
>             double surprisePercent = subEvent.surprise();
>             Iterable<AlertEvent> previousEvents =
>                     ctx.getEventsForPattern("alertTwo");
>             for (AlertEvent prevEvent : previousEvents) {
>                 double prevSurprisePercent = prevEvent.surprise();
>                 if (prevSurprisePercent > surprisePercent) {
>                     return false;
>                 }
>             }
>             return true;
>         }
>     });
>
> PatternStream<AlertEvent> alertPatternStream =
>         CEP.pattern(alertEvents, pattern);
>
> DataStream<Alert> confirmedAlerts = alertPatternStream
>     .select(new PatternSelectFunction<AlertEvent, Alert>() {
>         private static final long serialVersionUID = 1L;
>         @Override
>         public Alert select(Map<String, List<AlertEvent>> patternIn) {
>             List<AlertEvent> alertEventList = new ArrayList<>();
>             // Create an alert composed of three escalating events
>             alertEventList.add(patternIn.get("alertOne").get(0));
>             alertEventList.add(patternIn.get("alertTwo").get(0));
>             alertEventList.add(patternIn.get("alertThree").get(0));
>             Alert confirmedAlert = new Alert(alertEventList);
>             return confirmedAlert;
>         }
>     })
>     .uid("confirmedAlerts")
>     .returns(Alert.class)
>     .keyBy("id");
>
> Once again thank you,
> Austin
>
>
> - 
> https://stackoverflow.com/questions/36917569/flink-streaming-cep-could-not-find-previous-shared-buffer-entry-with-key
> - 
> https://mail-archives.apache.org/mod_mbox/flink-user/201604.mbox/%3CCAC27z=nd0sx_ogbh0vuvzwetcke2t40tbxht5huh8uyzdte...@mail.gmail.com%3E
>  
> -
> https://mail-archives.apache.org/mod_mbox/flink-user/201702.mbox/%3CCAGr9p8CzT3=cr+cOas=3k0becmiybfv+kt1fxv_rf8xabtu...@mail.gmail.com%3E
>  
>
> On Thu, Nov 1, 2018 at 4:44 AM Dawid Wysakowicz wrote:
>
> Hi Austin,
>
> Could you elaborate a bit more on what you mean by "after a
> checkpoint fails"? What is the reason the checkpoint fails? Would
> it be possible for you to prepare some reproducible example of
> that problem? Finally, I would also recommend trying out Flink
> 1.6.x, as we reworked the underlying structure for CEP, the
> SharedBuffer.
>
> Best,
>
> Dawid
>
> On 30/10/2018 20:59, Austin Cawley-Edwards wrote:
>> Following up, we are using Flink 1.5.0 and Flink-CEP 2.11.
>>
>> Thanks,
>> Austin
>>
>> On Tue, Oct 30, 2018 at 3:58 PM Austin Cawley-Edwards wrote:
>>
>> Hi there,
>>
>> We have a streaming application that uses CEP processing, but we
>> are getting this error fairly frequently after a checkpoint
>> fails, though we are not sure if it is related. We have implemented
>> both `hashCode` and `equals()` using
>> `Objects.hash(...properties)` and basic equali
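CEP's SharedBuffer looks entries up by equality, so `equals()` and `hashCode()`
must be consistent over the same, stable set of fields. A hedged sketch of that
contract for the event type used above; the field names are illustrative and
surprise() is a placeholder:

import java.util.Objects;

public class AlertEvent {
    public String id;
    public double level;
    public long timestamp;

    // Placeholder for the metric the pattern conditions above compare.
    public double surprise() {
        return level;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof AlertEvent)) return false;
        AlertEvent that = (AlertEvent) o;
        return Double.compare(level, that.level) == 0
                && timestamp == that.timestamp
                && Objects.equals(id, that.id);
    }

    @Override
    public int hashCode() {
        // Must hash exactly the fields compared in equals().
        return Objects.hash(id, level, timestamp);
    }
}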