Re: Taskmanager process memory increasing always

2018-09-05 Thread YennieChen88
As far as I know, RocksDB mainly uses off-heap memory, which is not managed by
the JVM. You could monitor the off-heap memory of the taskmanager process with
a dedicated tool such as gperftools...



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: flink list and flink run commands timeout

2018-09-05 Thread Gary Yao
Hi Jason,

From the stacktrace it seems that you are using the 1.4.0 client to list jobs
on a 1.5.x cluster. This will not work. You have to use the 1.5.x client.
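For example, a sketch of pointing a matching client at the cluster (the install path and host are assumptions):

/opt/flink-1.5.3/bin/flink list -m jobmanager-host:8081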

Best,
Gary

On Wed, Sep 5, 2018 at 5:35 PM, Jason Kania  wrote:

> Hello,
>
> Thanks for the response. I had already tried setting the log level to
> debug in log4j-cli.properties, logback-console.xml, and 
> log4j-console.properties
> but no additional relevant information comes out. On the server, all that
> comes out are zookeeper ping responses:
>
> 2018-09-05 15:16:56,786 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for
> sessionid: 0x3659b60bcb50076 after 1ms
>
> The client log indicates only the following (but we are not using hadoop):
>
> 2018-09-05 15:19:53,339 WARN  org.apache.flink.client.cli.CliFrontend
>- Could not load CLI class org.apache.flink.yarn.cli.
> FlinkYarnSessionCli.
> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:264)
> at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(
> CliFrontend.java:1208)
> at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(
> CliFrontend.java:1164)
> at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.
> java:1090)
> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.
> Configuration
> at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> ... 5 more
>
>
> and
>
> 2018-09-05 15:19:53,881 ERROR org.apache.flink.shaded.
> curator.org.apache.curator.ConnectionState  - Authentication failed
>
>
> despite the zookeeper being configured as 'open' and latest logs showing
> data being read from zookeeper.
>
> 2018-09-05 15:19:54,274 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Reading reply
> sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null
> finished:false header:: 1,3  replyHeader:: 1,47244656277,0  request::
> '/flink,F  response:: s{47244656196,47244656196,
> 1536110417531,1536110417531,0,1,0,0,0,1,47244656197}
>
>
> Much like the basic log output, the detailed trace shows no additional
> information, just a gap after waiting for the response:
>
> 2018-09-05 15:19:54,313 INFO  org.apache.flink.client.cli.CliFrontend
>- Waiting for response...
> 2018-09-05 15:20:07,635 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for
> sessionid: 0x265a12437df0074 after 1ms
> 2018-09-05 15:20:20,976 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for
> sessionid: 0x265a12437df0074 after 1ms
> 2018-09-05 15:20:24,311 INFO  org.apache.flink.runtime.rest.RestClient
>   - Shutting down rest endpoint.
> 2018-09-05 15:20:24,317 INFO  org.apache.flink.runtime.rest.RestClient
>   - Rest endpoint shutdown complete.
> 2018-09-05 15:20:24,318 INFO  org.apache.flink.runtime.leaderretrieval.
> ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService
> /leader/rest_server_lock.
> 2018-09-05 15:20:24,320 INFO  org.apache.flink.runtime.leaderretrieval.
> ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService
> /leader/dispatcher_lock.
> 2018-09-05 15:20:24,320 DEBUG org.apache.flink.shaded.
> curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - Closing
> 2018-09-05 15:20:24,321 INFO  org.apache.flink.shaded.
> curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  -
> backgroundOperationsLoop exiting
> 2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.
> curator.org.apache.curator.CuratorZookeeperClient  - Closing
> 2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.
> curator.org.apache.curator.ConnectionState  - Closing
> 2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ZooKeeper  - Closing session:
> 0x265a12437df0074
> 2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Closing client for session:
> 0x265a12437df0074
> 2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Reading reply
> sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null
> finished:false header:: 11,-11  replyHeader:: 11,47244656278,0  request::
> null response:: null
> 2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ClientCnxn  - Disconnecting client for
> session: 0x265a12437df0074
> 2018-09-05 15:20:24,330 INFO  org.apache.flink.shaded.
> zookeeper.org.apache.zookeeper.ZooKeeper  - Session: 

Re: AskTimeoutException when canceling job with savepoint on flink 1.6.0

2018-09-05 Thread Gary Yao
Hi Jelmer,

I saw that you have already found the JIRA issue tracking this problem [1]
but
I will still answer on the mailing list for transparency.

The timeout for "cancel with savepoint" should be RpcUtils.INF_TIMEOUT.
Unfortunately, Flink is currently not respecting this timeout. A pull request
is already available and is expected to be merged within the next few days [2].
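Until that fix is released, a possible stop-gap (an assumption on my side, not something verified in this thread) is to raise the cluster-side timeouts in flink-conf.yaml; note that, as Jelmer points out, this affects all RPCs of Flink's actor system, not just cancel-with-savepoint:

akka.ask.timeout: 10 min      # server-side ask timeout (duration string)
web.timeout: 600000           # REST handler timeout in milliseconds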

Best,
Gary

[1] https://issues.apache.org/jira/browse/FLINK-10193
[2] https://github.com/apache/flink/pull/6601

On Thu, Sep 6, 2018 at 4:24 AM, vino yang  wrote:

> Hi Jelmer,
>
> Here's a similar question, and you can refer to the discussion options.[1]
>
> [1]: http://mail-archives.apache.org/mod_mbox/flink-user/
> 201808.mbox/%3CCAMJEyBa9zJX_huqTLxDCu87hpHRVRzXZoYJpQxzXDk
> q2h_k...@mail.gmail.com%3E
>
> Hi Till and Chesnay,
>
> Recently, several users have encountered this problem in the past month.
> Maybe the community should give priority to the stability of this part or
> list the guidelines in the official document FAQ?
>
> Thanks, vino.
>
jelmer wrote on Wednesday, September 5, 2018 at 8:48 PM:
>
>> I am trying to upgrade a job from flink 1.4.2 to 1.6.0
>>
>> When we do a deploy we cancel the job with a savepoint then deploy the
>> new version of the job from that savepoint. Because our jobs tend to have a
>> lot of state it often takes multiple minutes for our savepoints to
>> complete.
>>
>> On flink 1.4.2 we set *akka.client.timeout* to a high value to make sure
>> the request did not timeout
>>
>> However on flink 1.6.0 I get an *AskTimeoutException*  and increasing
>> *akka.client.timeout* only works if i apply it to the running flink
>> process.
>> Applying it to just the flink client does nothing.
>>
>> I am reluctant to configure this on the container itself because afaik it
>> applies to everything inside of flink's internal actor system not just to
>> creating savepoints.
>>
>> What is the correct way to use cancel with savepoint for jobs with lots
>> of state in flink 1.6.0 ?
>>
>> I Attached the error.
>>
>>
>>
>>


Re: Job goes down after 1/145 TMs is lost (NoResourceAvailableException)

2018-09-05 Thread Subramanya Suresh
Hi,
With some additional research,

*Before the flag*
I realized that for failed containers that exited for a specific reason, we
were still requesting a new TM container and launching a TM. But for the
"Detected unreachable: [akka.tcp://fl...@blahabc.sfdc.net:123]" issue I do not
see the container marked as failed or a subsequent request for a TM.

*After `taskmanager.exit-on-fatal-akka-error: true`*
I have not seen any "unreachable: [akka.tcp://fl...@blahabc.sfdc.net:123]" since
we turned on taskmanager.exit-on-fatal-akka-error: true. I am not sure whether
that is a coincidence or a direct effect of this change.

*Our Issue:*
I realized we are still exiting the application, i.e. failing, when the
containers are lost. The reason is that before
org.apache.flink.yarn.YarnFlinkResourceManager is able to acquire a new
TM container, launch the TM, and have it reported as started, the
org.apache.flink.runtime.jobmanager.scheduler
throws a NoResourceAvailableException that causes a failure. In our case we
had a fixed restart strategy with 5 attempts, and we run out of them because of
this. I am looking to solve this with a FailureRateRestartStrategy over a 2
minute interval (10 second restart delay, >12 failures), which lets the TM
come back (it takes about 50 seconds).
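For reference, a sketch of the failure-rate restart strategy just described, using the values from this paragraph (class and method names per the public RestartStrategies API):

import java.util.concurrent.TimeUnit;

import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.time.Time;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RestartStrategySketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // fail the job only if more than 12 failures occur within a 2 minute interval,
        // waiting 10 seconds between restart attempts (the replacement TM needs ~50 seconds)
        env.setRestartStrategy(RestartStrategies.failureRateRestart(
                12,
                Time.of(2, TimeUnit.MINUTES),
                Time.of(10, TimeUnit.SECONDS)));
    }
}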

*Flink Bug*
But I cannot help but wonder why there is no interaction between the
ResourceManager and the JobManager, i.e. why does the JobManager continue
processing despite not having the required TMs?

Logs to substantiate what I said above (only relevant).

2018-09-03 06:17:13,932 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Container
container_e28_1535135887442_1381_01_97 completed successfully with
diagnostics: Container released by application


2018-09-03 06:34:19,214 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://
fl...@hello-world9-27-crz.ops.sfdc.net:46219] has failed, address is now
gated for [5000] ms. Reason: [Disassociated]
2018-09-03 06:34:19,214 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Container
container_e27_1535135887442_1381_01_000102 failed. Exit status: -102
2018-09-03 06:34:19,215 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Diagnostics
for container container_e27_1535135887442_1381_01_000102 in state COMPLETE
: exitStatus=-102 diagnostics=Container preempted by scheduler
2018-09-03 06:34:19,215 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Total
number of failed containers so far: 1
2018-09-03 06:34:19,215 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Requesting
new TaskManager container with 2 megabytes memory. Pending requests: 1
2018-09-03 06:34:19,216 INFO  org.apache.flink.yarn.YarnJobManager
- Task manager akka.tcp://
fl...@hello-world9-27-crz.ops.sfdc.net:46219/user/taskmanager terminated.
2018-09-03 06:34:19,218 INFO
org.apache.flink.runtime.executiongraph.ExecutionGraph- Job
streaming-searches-prod (96d3b4f60a80a898f44f87c5b06f6981) switched from
state RUNNING to FAILING.
java.lang.Exception: TaskManager was lost/killed:
container_e27_1535135887442_1381_01_000102 @
hello-world9-27-crz.ops.sfdc.net (dataPort=40423)
2018-09-03 06:34:19,466 INFO
org.apache.flink.runtime.instance.InstanceManager -
Unregistered task manager hello-world9-27-crz.ops.sfdc.net/11.11.35.220.
Number of registered task managers 144. Number of available slots 720

2018-09-03 06:34:24,717 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Received
new container: container_e28_1535135887442_1381_01_000147 - Remaining
pending container requests: 0
2018-09-03 06:34:24,717 INFO
org.apache.flink.yarn.YarnFlinkResourceManager- Launching
TaskManager in container ContainerInLaunch @ 1535956464717: Container:
[ContainerId: container_e28_1535135887442_1381_01_000147, NodeId:
hello-world9-27-crz.ops.sfdc.net:8041, NodeHttpAddress:
hello-world9-27-crz.ops.sfdc.net:8042, Resource: ,
Priority: 0, Token: Token { kind: ContainerToken, service: 11.11.35.220:8041
}, ] on host hello-world9-27-crz.ops.sfdc.net
2018-09-03 06:34:29,256 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://
fl...@hello-world9-27-crz.ops.sfdc.net:46219] has failed, address is now
gated for [5000] ms. Reason: [Association failed with [akka.tcp://
fl...@hello-world9-27-crz.ops.sfdc.net:46219]] Caused by: [Connection
refused: hello-world9-27-crz.ops.sfdc.net/11.11.35.220:46219]
2018-09-03 06:34:34,284 WARN  akka.remote.transport.netty.NettyTransport
- Remote connection to [null] failed with
java.net.ConnectException: Connection refused:
hello-world9-27-crz.ops.sfdc.net/11.11.35.220:46219
2018-09-03 06:34:34,284 WARN  akka.remote.ReliableDeliverySupervisor
- Association with remote system [akka.tcp://
fl...@hello-world9-27-crz.ops.sfdc.net:46219] has failed, address is now
gated for 

ListState - elements order

2018-09-05 Thread Alexey Trenikhun
Hello,
Does keyed managed ListState preserve element order? For example, if I call
listState.add(e1); listState.add(e2); listState.add(e3);, does ListState
guarantee that listState.get() will return the elements in the order they were
added (e1, e2, e3)?

Alexey




How to simulate a blocking (sink) node?

2018-09-05 Thread 陈梓立
Hi all,

Recently I wanted to write a test for a batch case in which some of the tasks
are FINISHED.

I tried to write a finite SourceFunction and an (expectedly) blocking
SinkFunction, but the job went to FINISHED. This surprises me. Why could the
sink FINISH in such a case?
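For context, a sketch of the kind of blocking sink I mean (this is not the attached WeirdJob.java, just an illustration):

import java.util.concurrent.CountDownLatch;

import org.apache.flink.streaming.api.functions.sink.SinkFunction;

public class BlockingSink<T> implements SinkFunction<T> {

    // never counted down, so invoke() blocks forever on the first record
    private static final CountDownLatch BLOCK = new CountDownLatch(1);

    @Override
    public void invoke(T value) throws Exception {
        BLOCK.await();
    }
}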

The job and log are attached.

Best,
tison.


WeirdJob.java
Description: Binary data


jobmanager.log
Description: Binary data


Re: AskTimeoutException when canceling job with savepoint on flink 1.6.0

2018-09-05 Thread vino yang
Hi Jelmer,

Here's a similar question; you can refer to that discussion. [1]

[1]:
http://mail-archives.apache.org/mod_mbox/flink-user/201808.mbox/%3ccamjeyba9zjx_huqtlxdcu87hphrvrzxzoyjpqxzxdkq2h_k...@mail.gmail.com%3E

Hi Till and Chesnay,

Several users have encountered this problem in the past month.
Maybe the community should give priority to the stability of this part, or
add guidance to the FAQ in the official documentation?

Thanks, vino.

jelmer wrote on Wednesday, September 5, 2018 at 8:48 PM:

> I am trying to upgrade a job from flink 1.4.2 to 1.6.0
>
> When we do a deploy we cancel the job with a savepoint then deploy the new
> version of the job from that savepoint. Because our jobs tend to have a lot
> of state it often takes multiple minutes for our savepoints to complete.
>
> On flink 1.4.2 we set *akka.client.timeout* to a high value to make sure
> the request did not timeout
>
> However on flink 1.6.0 I get an *AskTimeoutException*  and increasing
> *akka.client.timeout* only works if i apply it to the running flink
> process.
> Applying it to just the flink client does nothing.
>
> I am reluctant to configure this on the container itself because afaik it
> applies to everything inside of flink's internal actor system not just to
> creating savepoints.
>
> What is the correct way to use cancel with savepoint for jobs with lots of
> state in flink 1.6.0 ?
>
> I Attached the error.
>
>
>
>


Increased Size of Incremental Checkpoint

2018-09-05 Thread burgesschen
Hi guys,
I enabled incremental Flink checkpointing for my Flink job. I had the job read
messages at a stable rate, and for each message the job stores something in
keyed state. My question is: since the state grows by the same amount every
minute, shouldn't the incremental checkpoint size remain relatively constant as
well? Why is it increasing, as shown in the picture?
Thank you!
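For context, this is roughly how the job enables incremental checkpoints (a sketch; the checkpoint URI and the one-minute interval are assumptions, and incremental mode applies to the RocksDB backend):

import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class IncrementalCheckpointSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // second argument = true enables incremental checkpoints (RocksDB backend only)
        env.setStateBackend(new RocksDBStateBackend("hdfs:///flink/checkpoints", true));
        env.enableCheckpointing(60_000); // one checkpoint per minute
    }
}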



 



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: REST: reading completed jobs' details

2018-09-05 Thread Chesnay Schepler
No, the cluster isn't shared. For each job a separate cluster is spun up 
when calling execute(), at the end of which it is shut down.


For explicit creation and shutdown of a cluster I would suggest
executing your jobs as a test that contains a MiniClusterResource.
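A rough sketch of what that could look like (treat the exact package names and the configuration builder as assumptions; they differ between Flink versions):

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.test.util.MiniClusterResource;
import org.apache.flink.test.util.MiniClusterResourceConfiguration;
import org.junit.ClassRule;
import org.junit.Test;

public class MyJobsTest {

    // the cluster is started before the tests and shut down afterwards
    @ClassRule
    public static final MiniClusterResource MINI_CLUSTER = new MiniClusterResource(
            new MiniClusterResourceConfiguration.Builder()
                    .setNumberTaskManagers(1)
                    .setNumberSlotsPerTaskManager(4)
                    .build());

    @Test
    public void runJobSequence() throws Exception {
        // getExecutionEnvironment() inside the test targets the shared mini cluster
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // ... build the jobs under test here ...
        env.execute("first job");
    }
}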


On 05.09.2018 20:59, Miguel Coimbra wrote:

Thanks for the reply.

However, I think my case differs because I am running a sequence of 
independent Flink jobs on the same environment instance.

I only create the LocalExecutionEnvironment once.

The web manager shows the job ID changing correctly every time a new 
job is executed.


Since it is the same execution environment (and therefore the same 
cluster instance I imagine), those completed jobs should show as well, no?


On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler wrote:


When you create an environment that way, then the cluster is
shutdown once the job completes.
The WebUI can _appear_ as still working since all the files, and
data about the job, is cached in the browser.

On 05.09.2018 17:39, Miguel Coimbra wrote:

Hello,

I'm having difficulty reading the status (such as time taken for
each dataflow operator in a job) of jobs that have completed.

First, when I click on "Completed jobs" on the web interface (by
default at 8081), no job shows up.
I see jobs that exist as "Running", but as soon as they finish, I
would expect them to appear in the "Complete jobs" section, but
no luck.

Consider that I am running locally (web UI is running, I checked
and it is available via browser) on 8081.
None of these links worked for checking jobs that have already
finished, such as the job ID 618fac9da6ea458f5091a9c40e54cbcc
that had been running:

http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc

I'm running with a LocalExecutionEnvironment with with the method:
ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
I hope anyone may be able to help.

Best,








Re: numRunningJobs metric

2018-09-05 Thread Bryant Baltes
Ok, thanks!

On Wed, Sep 5, 2018 at 2:11 PM Chesnay Schepler  wrote:

> This is a known issue with no workaround except switching the cluster to
> legacy mode: https://issues.apache.org/jira/browse/FLINK-10135
>
> On 05.09.2018 19:58, Bryant Baltes wrote:
> > Hello All,
> >
> > I have recently made the upgrade from Flink 1.3.2 to 1.5.2.
> > Everything has gone well, except I'm no longer seeing the
> > numRunningJobs jobmanager metric.  We use this for some notifications
> > and recovery.  According to the docs, it should still be available.
> > Has something changed or maybe I'm missing something?
> >
> > Thanks!
>
>
>


Re: numRunningJobs metric

2018-09-05 Thread Chesnay Schepler
This is a known issue with no workaround except switching the cluster to 
legacy mode: https://issues.apache.org/jira/browse/FLINK-10135
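For reference, switching to legacy mode is a flink-conf.yaml setting (a sketch, per the 1.5.x configuration docs):

mode: legacy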


On 05.09.2018 19:58, Bryant Baltes wrote:

Hello All,

I have recently made the upgrade from Flink 1.3.2 to 1.5.2.  
Everything has gone well, except I'm no longer seeing the 
numRunningJobs jobmanager metric.  We use this for some notifications 
and recovery.  According to the docs, it should still be available.  
Has something changed or maybe I'm missing something?


Thanks!





Re: REST: reading completed jobs' details

2018-09-05 Thread Miguel Coimbra
Thanks for the reply.

However, I think my case differs because I am running a sequence of
independent Flink jobs on the same environment instance.
I only create the LocalExecutionEnvironment once.

The web manager shows the job ID changing correctly every time a new job is
executed.

Since it is the same execution environment (and therefore the same cluster
instance I imagine), those completed jobs should show as well, no?

On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler  wrote:

> When you create an environment that way, then the cluster is shutdown once
> the job completes.
> The WebUI can _appear_ as still working since all the files, and data
> about the job, is cached in the browser.
>
> On 05.09.2018 17:39, Miguel Coimbra wrote:
>
> Hello,
>
> I'm having difficulty reading the status (such as time taken for each
> dataflow operator in a job) of jobs that have completed.
>
> First, when I click on "Completed jobs" on the web interface (by default
> at 8081), no job shows up.
> I see jobs that exist as "Running", but as soon as they finish, I would
> expect them to appear in the "Complete jobs" section, but no luck.
>
> Consider that I am running locally (web UI is running, I checked and it is
> available via browser) on 8081.
> None of these links worked for checking jobs that have already finished,
> such as the job ID 618fac9da6ea458f5091a9c40e54cbcc that had been running:
>
> http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
> http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc
>
> I'm running with a LocalExecutionEnvironment with with the method:
>
> ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
>
> I hope anyone may be able to help.
>
> Best,
>
>
>
>


numRunningJobs metric

2018-09-05 Thread Bryant Baltes
Hello All,

I have recently made the upgrade from Flink 1.3.2 to 1.5.2.  Everything has
gone well, except I'm no longer seeing the numRunningJobs jobmanager
metric.  We use this for some notifications and recovery.  According to the
docs, it should still be available.  Has something changed or maybe I'm
missing something?

Thanks!


Re: REST: reading completed jobs' details

2018-09-05 Thread Chesnay Schepler
When you create an environment that way, the cluster is shut down
once the job completes.
The WebUI can _appear_ as still working since all the files, and data 
about the job, is cached in the browser.


On 05.09.2018 17:39, Miguel Coimbra wrote:

Hello,

I'm having difficulty reading the status (such as time taken for each 
dataflow operator in a job) of jobs that have completed.


First, when I click on "Completed jobs" on the web interface (by 
default at 8081), no job shows up.
I see jobs that exist as "Running", but as soon as they finish, I 
would expect them to appear in the "Complete jobs" section, but no luck.


Consider that I am running locally (web UI is running, I checked and 
it is available via browser) on 8081.
None of these links worked for checking jobs that have already 
finished, such as the job ID 618fac9da6ea458f5091a9c40e54cbcc that had 
been running:


http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc

I'm running with a LocalExecutionEnvironment with with the method:
ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
I hope anyone may be able to help.

Best,






REST: reading completed jobs' details

2018-09-05 Thread Miguel Coimbra
Hello,

I'm having difficulty reading the status (such as time taken for each
dataflow operator in a job) of jobs that have completed.

First, when I click on "Completed jobs" on the web interface (by default at
8081), no job shows up.
I see jobs that exist as "Running", but as soon as they finish, I would
expect them to appear in the "Complete jobs" section, but no luck.

Consider that I am running locally (web UI is running, I checked and it is
available via browser) on 8081.
None of these links worked for checking jobs that have already finished,
such as the job ID 618fac9da6ea458f5091a9c40e54cbcc that had been running:

http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc

I'm running with a LocalExecutionEnvironment with the method:

ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)

I hope anyone may be able to help.

Best,


Re: flink list and flink run commands timeout

2018-09-05 Thread Jason Kania
Hello,
Thanks for the response. I had already tried setting the log level to debug in 
log4j-cli.properties, logback-console.xml, and log4j-console.properties but no 
additional relevant information comes out. On the server, all that comes out 
are zookeeper ping responses:
2018-09-05 15:16:56,786 DEBUG 
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping 
response for sessionid: 0x3659b60bcb50076 after 1ms

The client log indicates only the following (but we are not using hadoop):
2018-09-05 15:19:53,339 WARN  org.apache.flink.client.cli.CliFrontend - Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:264)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFrontend.java:1208)
        at org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFrontend.java:1164)
        at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more

and 
2018-09-05 15:19:53,881 ERROR 
org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - 
Authentication failed

despite the zookeeper being configured as 'open' and latest logs showing data 
being read from zookeeper.
2018-09-05 15:19:54,274 DEBUG 
org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Reading 
reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null 
finished:false header:: 1,3  replyHeader:: 1,47244656277,0  request:: '/flink,F 
 response:: 
s{47244656196,47244656196,1536110417531,1536110417531,0,1,0,0,0,1,47244656197}

Much like the basic log output, the detailed trace shows no additional 
information, just a gap after waiting for the response:
2018-09-05 15:19:54,313 INFO  org.apache.flink.client.cli.CliFrontend - Waiting for response...
2018-09-05 15:20:07,635 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for sessionid: 0x265a12437df0074 after 1ms
2018-09-05 15:20:20,976 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Got ping response for sessionid: 0x265a12437df0074 after 1ms
2018-09-05 15:20:24,311 INFO  org.apache.flink.runtime.rest.RestClient - Shutting down rest endpoint.
2018-09-05 15:20:24,317 INFO  org.apache.flink.runtime.rest.RestClient - Rest endpoint shutdown complete.
2018-09-05 15:20:24,318 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService /leader/rest_server_lock.
2018-09-05 15:20:24,320 INFO  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService  - Stopping ZooKeeperLeaderRetrievalService /leader/dispatcher_lock.
2018-09-05 15:20:24,320 DEBUG org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - Closing
2018-09-05 15:20:24,321 INFO  org.apache.flink.shaded.curator.org.apache.curator.framework.imps.CuratorFrameworkImpl  - backgroundOperationsLoop exiting
2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.CuratorZookeeperClient  - Closing
2018-09-05 15:20:24,322 DEBUG org.apache.flink.shaded.curator.org.apache.curator.ConnectionState  - Closing
2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Closing session: 0x265a12437df0074
2018-09-05 15:20:24,323 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Closing client for session: 0x265a12437df0074
2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Reading reply sessionid:0x265a12437df0074, packet:: clientPath:null serverPath:null finished:false header:: 11,-11  replyHeader:: 11,47244656278,0  request:: null response:: null
2018-09-05 15:20:24,329 DEBUG org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - Disconnecting client for session: 0x265a12437df0074
2018-09-05 15:20:24,330 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ZooKeeper  - Session: 0x265a12437df0074 closed
2018-09-05 15:20:24,330 INFO  org.apache.flink.shaded.zookeeper.org.apache.zookeeper.ClientCnxn  - EventThread shut down for session: 0x265a12437df0074
2018-09-05 15:20:24,330 ERROR org.apache.flink.client.cli.CliFrontend - Error while running the command.



   On Wednesday, September 5, 2018, 3:41:29 a.m. EDT, Chesnay Schepler 
 wrote:  
 
  Please 

Re: Flink parquet read.write performance

2018-09-05 Thread clay4444
hi Regina

I've just started using Flink, and recently I've been asked to store Flink data
on HDFS in Parquet format. I've found several examples on GitHub and in the
community, but they always have bugs. I see your storage directory, and
that's what I want, so I'd like to ask you to reply to me with a slightly
more complete example or two.

The result is similar to the following:

237.6 M  /19_partMapper-m-00018.snappy.parquet

240.7 M  /1_partMapper-m-9.snappy.parquet

240.7 M  /20_partMapper-m-8.snappy.parquet

236.5 M  /2_partMapper-m-00020.snappy.parquet


I look forward to your reply. Thank you very much!




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Queryable State

2018-09-05 Thread Sameer Wadkar
I have used connected streams where one part of the connected stream maintains 
state and the other part consumes it. 

However, it was not queryable externally. For state that is queryable externally,
you are right: you probably need another operator to store the state and support
queryability.
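For illustration, a minimal sketch of the connected-streams pattern mentioned above (the types, key handling and summing logic are assumptions): the first input updates keyed state, the second reads it. Both inputs would be keyed on the same key via connect(...).keyBy(...) before calling process(...) with this function.

import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
import org.apache.flink.util.Collector;

public class StateSharingSketch
        extends CoProcessFunction<Tuple2<Integer, Integer>, Integer, Tuple2<Integer, Integer>> {

    private transient ValueState<Integer> sum;

    @Override
    public void open(Configuration parameters) {
        sum = getRuntimeContext().getState(new ValueStateDescriptor<>("sum", Integer.class));
    }

    @Override
    public void processElement1(Tuple2<Integer, Integer> value, Context ctx,
                                Collector<Tuple2<Integer, Integer>> out) throws Exception {
        // first input maintains the per-key state
        Integer current = sum.value();
        sum.update(current == null ? value.f1 : current + value.f1);
    }

    @Override
    public void processElement2(Integer key, Context ctx,
                                Collector<Tuple2<Integer, Integer>> out) throws Exception {
        // second input "queries" the state for the same key
        Integer current = sum.value();
        out.collect(Tuple2.of(key, current == null ? 0 : current));
    }
}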

Sent from my iPhone

> On Sep 5, 2018, at 7:26 AM, Darshan Singh  wrote:
> 
> Hi All,
> 
> I was playing with queryable state. As queryable stream can not be modified 
> how do I use the output of say my reduce function for further processing.
> 
> Below is 1 example. I can see that I am processing the data twice. One for 
> the Queryable stream and once for the my original stream. That means state 
> will be kept twice as well?
> 
> In simple term I would like to query the state from rf reduce function and I 
> would like my stream to be written to Kafka. If I use like below I am able to 
> do so but it seems to me that there is duplicate state storing as well as 
> duplicate work is being done.
> 
> Is there any alternate for what I am trying to achieve this?
> 
> Thanks
> 
> public class reducetest {
> 
> public static void main(String[] args) throws Exception {
> 
> 
> // set up streaming execution environment
> StreamExecutionEnvironment env = 
> StreamExecutionEnvironment.getExecutionEnvironment();
> 
> env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
> env.setParallelism(1);
> List nums = Arrays.asList(1,2,3,4,5,6,7,8,9,10);
> 
> ReducingStateDescriptor> descriptor =
> new ReducingStateDescriptor>(
> "sum", // the state name
> new rf(),
> TypeInformation.of(new TypeHint Integer>>() {})); // type information
> descriptor.setQueryable("reduce");
> 
> DataStream> ds1 = 
> env.fromCollection(nums).map(new MapFunction Integer>>() {
> @Override
> public Tuple2 map(Integer integer) throws 
> Exception {
> return Tuple2.of(integer%2,integer);
> }
> }).keyBy(0)
> ;
> 
> ((KeyedStream) ds1).asQueryableState("reduce", descriptor);
> 
> 
> DataStream ds2 = ((KeyedStream) ds1).reduce( 
> descriptor.getReduceFunction());
> 
> //ds1.print();
> ds2.print();
> 
> System.out.println(env.getExecutionPlan());
> 
> env.execute("re");
> }
> 
> 
> 
> static class  rf implements ReduceFunction> {
> 
> @Override
> public Tuple2 reduce(Tuple2 e, 
> Tuple2 n) throws Exception {
> return Tuple2.of(e.f0, e.f1 + n.f1);
> 
> }
> }
> 
> }


AskTimeoutException when canceling job with savepoint on flink 1.6.0

2018-09-05 Thread jelmer
I am trying to upgrade a job from flink 1.4.2 to 1.6.0

When we do a deploy we cancel the job with a savepoint then deploy the new
version of the job from that savepoint. Because our jobs tend to have a lot
of state it often takes multiple minutes for our savepoints to complete.

On flink 1.4.2 we set *akka.client.timeout* to a high value to make sure
the request did not timeout

However, on Flink 1.6.0 I get an *AskTimeoutException*, and increasing
*akka.client.timeout* only works if I apply it to the running Flink process.
Applying it to just the Flink client does nothing.

I am reluctant to configure this on the container itself because afaik it
applies to everything inside of flink's internal actor system not just to
creating savepoints.

What is the correct way to use cancel with savepoint for jobs with lots of
state in flink 1.6.0 ?

I Attached the error.
Cancelling job 9068efa57d6599e7d6a71b7f7eac7d2f with savepoint to default 
savepoint directory.


 The program finished with the following exception:

org.apache.flink.util.FlinkException: Could not cancel job 
9068efa57d6599e7d6a71b7f7eac7d2f.
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$6(CliFrontend.java:604)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:979)
at org.apache.flink.client.cli.CliFrontend.cancel(CliFrontend.java:596)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1053)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1120)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1692)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1120)
Caused by: java.util.concurrent.ExecutionException: 
java.util.concurrent.CompletionException: akka.pattern.AskTimeoutException: Ask 
timed out on [Actor[akka://flink/user/jobmanager_1#172582095]] after [1 
ms]. Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at 
java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
at 
java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
at 
org.apache.flink.client.program.rest.RestClusterClient.cancelWithSavepoint(RestClusterClient.java:407)
at 
org.apache.flink.client.cli.CliFrontend.lambda$cancel$6(CliFrontend.java:602)
... 9 more
Caused by: java.util.concurrent.CompletionException: 
akka.pattern.AskTimeoutException: Ask timed out on 
[Actor[akka://flink/user/jobmanager_1#172582095]] after [1 ms]. 
Sender[null] sent message of type 
"org.apache.flink.runtime.rpc.messages.LocalFencedMessage".
at 
java.util.concurrent.CompletableFuture.encodeRelay(CompletableFuture.java:326)
at 
java.util.concurrent.CompletableFuture.completeRelay(CompletableFuture.java:338)
at 
java.util.concurrent.CompletableFuture.uniRelay(CompletableFuture.java:911)
at 
java.util.concurrent.CompletableFuture$UniRelay.tryFire(CompletableFuture.java:899)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:770)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at 
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at 
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at 
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at 
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:603)
at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126)
at 
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at 
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at 
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at 
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329)
at 
akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280)
at 

Queryable State

2018-09-05 Thread Darshan Singh
Hi All,

I was playing with queryable state. As a queryable stream cannot be modified,
how do I use the output of, say, my reduce function for further processing?

Below is one example. I can see that I am processing the data twice: once for
the queryable stream and once for my original stream. Does that mean the state
will be kept twice as well?

In simple terms, I would like to query the state from the rf reduce function and
I would also like my stream to be written to Kafka. If I do it as shown below I
am able to do so, but it seems to me that the state is stored twice and duplicate
work is being done.

Is there an alternative way to achieve this?

Thanks

import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.functions.ReduceFunction;
import org.apache.flink.api.common.state.ReducingStateDescriptor;
import org.apache.flink.api.common.typeinfo.TypeHint;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.KeyedStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class reducetest {

    public static void main(String[] args) throws Exception {

        // set up streaming execution environment
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
        env.setParallelism(1);
        List<Integer> nums = Arrays.asList(1,2,3,4,5,6,7,8,9,10);

        ReducingStateDescriptor<Tuple2<Integer, Integer>> descriptor =
                new ReducingStateDescriptor<>(
                        "sum", // the state name
                        new rf(),
                        TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {})); // type information
        descriptor.setQueryable("reduce");

        DataStream<Tuple2<Integer, Integer>> ds1 =
                env.fromCollection(nums).map(new MapFunction<Integer, Tuple2<Integer, Integer>>() {
                    @Override
                    public Tuple2<Integer, Integer> map(Integer integer) throws Exception {
                        return Tuple2.of(integer % 2, integer);
                    }
                }).keyBy(0);

        ((KeyedStream) ds1).asQueryableState("reduce", descriptor);

        DataStream<Tuple2<Integer, Integer>> ds2 =
                ((KeyedStream) ds1).reduce(descriptor.getReduceFunction());

        //ds1.print();
        ds2.print();

        System.out.println(env.getExecutionPlan());

        env.execute("re");
    }

    static class rf implements ReduceFunction<Tuple2<Integer, Integer>> {

        @Override
        public Tuple2<Integer, Integer> reduce(Tuple2<Integer, Integer> e,
                                               Tuple2<Integer, Integer> n) throws Exception {
            return Tuple2.of(e.f0, e.f1 + n.f1);
        }
    }

}


Re: Missing Calcite SQL functions in table API

2018-09-05 Thread Fabian Hueske
Hi

You are using SQL syntax in a Table API query. You have to stick to Table
API syntax or use SQL as

tEnv.sqlQuery("SELECT col1,CONCAT('field1:',col2,',field2:',CAST(col3 AS
string)) FROM csvTable")

The Flink documentation lists all supported functions for Table API [1] and
SQL [2].

Best, Fabian

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/tableApi.html
[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/table/sql.html

2018-09-05 12:22 GMT+02:00 françois lacombe :

> Hi all,
>
> I'm trying to use CONVERT or CAST functions from Calcite docs to query
> some table with Table API.
> https://calcite.apache.org/docs/reference.html
>
> csv_table.select("col1,CONCAT('field1:',col2,',field2:',CAST(col3 AS
> string))");
> col3 is actually described as int the CSV schema and CONCAT doesn't like
> it.
>
> An exception is thrown "Undefined function: CAST"
>
> The docs mention that SQL implementation is based on Calcite and is there
> a list of available functions please?
> May I skip some useful dependency?
>
>
> Thanks in advance
>
> François
>
>


Missing Calcite SQL functions in table API

2018-09-05 Thread françois lacombe
Hi all,

I'm trying to use CONVERT or CAST functions from Calcite docs to query some
table with Table API.
https://calcite.apache.org/docs/reference.html

csv_table.select("col1,CONCAT('field1:',col2,',field2:',CAST(col3 AS
string))");
col3 is actually described as int in the CSV schema, and CONCAT doesn't like it.

An exception is thrown "Undefined function: CAST"

The docs mention that the SQL implementation is based on Calcite; is there a
list of available functions, please?
Am I missing some required dependency?


Thanks in advance

François


ODP: How to add jvm Options when using Flink dashboard?

2018-09-05 Thread Dominik Wosiński
Hey,
You can't, as Chesnay has already said, but for your use case you could pass
program arguments instead of a JVM option and they will work equally well.
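For example, a sketch of reading the config location from a program argument (the argument name and the Typesafe calls are assumptions about your setup):

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;
import org.apache.flink.api.java.utils.ParameterTool;

public class ArgsConfigSketch {
    public static void main(String[] args) {
        // e.g. submitted with program arguments: --config.resource dev.conf
        ParameterTool params = ParameterTool.fromArgs(args);
        String resource = params.get("config.resource", "dev.conf");
        Config config = ConfigFactory.parseResources(resource)
                .withFallback(ConfigFactory.load()) // fall back to reference.conf defaults
                .resolve();
        System.out.println(config.origin().description());
    }
}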
Best Regards,
Dom.


Sent from Mail for Windows 10

From: Chesnay Schepler
Sent: Wednesday, September 5, 2018 11:43
To: zpp; user@flink.apache.org
Subject: Re: How to add jvm Options when using Flink dashboard?

You can't set JVM options when submitting through the Dashboard. This cannot be 
implemented since no separate JVM is spun up when you submit a job that way.

On 05.09.2018 11:41, zpp wrote:
I wrote a task using Typesafe config. It must be pointed config file position 
using jvm Options like "-Dconfig.resource=dev.conf".
How can I do that with Flink dashboard?
Thanks for the help!



 




Re: How to add jvm Options when using Flink dashboard?

2018-09-05 Thread Chesnay Schepler
You can't set JVM options when submitting through the Dashboard. This 
cannot be implemented since no separate JVM is spun up when you submit a 
job that way.


On 05.09.2018 11:41, zpp wrote:
I wrote a task using Typesafe config. It must be pointed config file 
position using jvm Options like "-Dconfig.resource=dev.conf".

How can I do that with Flink dashboard?
Thanks for the help!






How to add jvm Options when using Flink dashboard?

2018-09-05 Thread zpp
I wrote a task using Typesafe config. The config file location must be specified
using a JVM option like "-Dconfig.resource=dev.conf".
How can I do that with Flink dashboard?
Thanks for the help!

Does Flink support keyed watermarks?

2018-09-05 Thread HarshithBolar
We collect driving data from thousands of users; each vehicle is associated
with an IMEI (a unique code). The device installed in these vehicles emits GPS
points at 5 second intervals. My requirement is to assemble all the GPS
points that belong to a single trip and construct a Trip object, for a given
IMEI.

I am using event time and Session windows to detect the end of a trip (10
minutes of non reception of GPS coordinates), and another 15 minutes of
allowed lateness to wait for late events. The watermark then gets advanced
to the latest received event time. Let's say this is for IMEI=100. Now, if I
receive data for IMEI's 1 through 99 that have event time behind this
watermark, all that data will be deemed late and won't be processed.

In other words, if one vehicle's data advances the watermark, then the data
from all other vehicles will be considered late, because Watermarks are
global.

Given my problem, is there a way I can implement different watermarks for
different keys? If not directly possible, is there some way I can simulate
it to suit my application?

Any help would be greatly appreciated!
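For context, a sketch of the setup described above (the GpsPoint type, key selector and the reduce placeholder are assumptions; only the 10-minute session gap and the 15-minute allowed lateness come from the description, and timestamps/watermarks are assumed to be assigned upstream):

import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.assigners.EventTimeSessionWindows;
import org.apache.flink.streaming.api.windowing.time.Time;

public class TripAssemblySketch {

    // assumed event type; the real one would carry the full GPS fields
    public static class GpsPoint {
        public String imei;
        public long eventTime;
    }

    public static DataStream<GpsPoint> tripWindows(DataStream<GpsPoint> points) {
        return points
                .keyBy(new KeySelector<GpsPoint, String>() {
                    @Override
                    public String getKey(GpsPoint p) {
                        return p.imei;
                    }
                })
                .window(EventTimeSessionWindows.withGap(Time.minutes(10))) // trip ends after 10 min of silence
                .allowedLateness(Time.minutes(15))                         // wait 15 more min for late points
                .reduce((a, b) -> b); // placeholder; the real job would assemble a Trip object here
    }
}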



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Does Flink support keyed watermarks? If not, is there any plan of implementing it in future versions? What are my alternatives?

2018-09-05 Thread HarshithBolar
We collect driving data from thousands of users, each vehicle is associated
with a IMEI (unique code). The device installed in these vehicles emits GPS
points in 5 second intervals. My requirement is to assemble all the GPS
points that belong to a single trip and construct a Trip object, for a given
IMEI.

I am using event time and Session windows to detect the end of a trip (10
minutes of non reception of GPS coordinates), and another 15 minutes of
allowed lateness to wait for late events. The watermark then gets advanced
to the latest received event time. Let's say this is for IMEI=100. Now, if I
receive data for IMEI's 1 through 99 that have event time behind this
watermark, all that data will be deemed late and won't be processed.

In other words, if one vehicle's data advances the watermark, then the data
from all other vehicles will be considered late, because Watermarks are
global.

Given my problem, is there a way I can implement different watermarks for
different keys? If not directly possible, is there some way I can simulate
it to suit my application?

Any help would be greatly appreciated!



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Does Flink support keyed watermarks? If not, is there any plan of implementing it soon? What are my alternatives?

2018-09-05 Thread HarshithBolar
We collect driving data from thousands of users, each vehicle is associated
with a IMEI (unique code).  The device installed in these vehicles emits GPS
points in 5 second intervals. My requirement is to assemble all the GPS
points that belong to a single trip and construct a Trip object, for a given
IMEI.

I am using event time and Session windows to detect the end of a trip (10
minutes of non reception of GPS coordinates), and another 15 minutes of
allowed lateness to wait for late events. The watermark then gets advanced
to the latest received event time. Let's say this is for IMEI=100. Now, if I
receive data for IMEI's 1 through 99 that have event time behind this
watermark, all that data will be deemed late and won't be processed. 

In other words, if one vehicle's data advances in event time, then the data
from all other vehicles will be considered late, because Watermarks are
global.

Given my problem, is there a way I can implement different watermarks for
different streams? If not directly possible, is there some way I can
simulate it to suit my application?

Any help would be greatly appreciated!



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: flink list and flink run commands timeout

2018-09-05 Thread Chesnay Schepler
Please enable DEBUG logging for the client and TRACE logging for the 
cluster.


For the client, look for log messages starting with "Sending request 
of", this will contain the host and port that requests are sent to by 
the client. Verify that these are correct and match the host/port that 
you use when accessing the web UI.


For the server, look for log messages starting with "Received request", 
so we can figure out whether the request at least arrives.
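For example (a sketch of the log4j 1.x properties shipped in the distribution's conf/ directory):

# conf/log4j-cli.properties (client side)
log4j.rootLogger=DEBUG, file

# conf/log4j.properties (cluster side)
log4j.rootLogger=TRACE, file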


On 05.09.2018 00:53, Jason Kania wrote:
I have upgraded from Flink 1.4.0 to Flink 1.5.3 with a three node 
cluster configured with HA. Now I am encountering an issue where the 
flink command line operations timeout. The exception generated is very 
poor because it only indicates a timeout and not the reason or what it 
was trying to do:


>./flink list -f
Waiting for response...


 The program finished with the following exception:
org.apache.flink.util.FlinkException: Failed to retrieve job list.
at 
org.apache.flink.client.cli.CliFrontend.listJobs(CliFrontend.java:433)
at 
org.apache.flink.client.cli.CliFrontend.lambda$list$0(CliFrontend.java:416)
at 
org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:960)
at 
org.apache.flink.client.cli.CliFrontend.list(CliFrontend.java:413)
at 
org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1028)
at 
org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at 
org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at 
org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: 
org.apache.flink.runtime.concurrent.FutureUtils$RetryException: Could 
not complete the operation. Exception is not retryable.
at 
org.apache.flink.runtime.concurrent.FutureUtils.lambda$retryOperationWithDelay$5(FutureUtils.java:213)
at 
java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760)
at 
java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736)
at 
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at 
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at 
org.apache.flink.runtime.concurrent.FutureUtils$Timeout.run(FutureUtils.java:793)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.CompletionException: 
java.util.concurrent.TimeoutException
at 
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at 
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at 
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at 
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)

... 10 more
Caused by: java.util.concurrent.TimeoutException

The web interface shows the 2 job managers and 3 task managers that 
are talking with one another.


I have looked at the zookeeper data and it is all present.

I have tried running the command on multiple nodes and they all give 
the same error.


I looked for a verbose or debug option for the commands but found nothing.

Suggestions on this?

Thanks,

Jason





Re: Flink 1.5.2 query

2018-09-05 Thread Chesnay Schepler
You will either have to create a custom Flink build (i.e. there's no 
option to change this behavior) or disable all WARN messages by the 
CliFrontend class with the logger configuration.
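For the second option, a sketch (log4j 1.x syntax, e.g. in conf/log4j-cli.properties):

log4j.logger.org.apache.flink.client.cli.CliFrontend=ERROR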


On 03.09.2018 15:21, Sarathy, Parth wrote:

The message is at WARN level and not at INFO or DEBUG level, so our log
analysis tool considers this an issue. Can the CLI print this message at
INFO/DEBUG level?

-Original Message-
From: Chesnay Schepler [mailto:ches...@apache.org]
Sent: Monday, September 3, 2018 6:32 PM
To: Parth Sarathy ; user@flink.apache.org
Subject: Re: Flink 1.5.2 query

Cannot be avoided. The CLI eagerly loads client classes for YARN, which, as you
can see, fails since the hadoop classes aren't available.
If you don't use YARN you can safely ignore this.

On 03.09.2018 14:37, Parth Sarathy wrote:

Hi,
When using flink 1.5.2, “Apache Flink Only” binary
(flink-1.5.2-bin-scala_2.11), following  error is seen in client log:

2018-08-30 10:56:59.129 [main] WARN
org.apache.flink.client.cli.CliFrontend
- Could not load CLI class org.apache.flink.yarn.cli.FlinkYarnSessionCli.
java.lang.NoClassDefFoundError:
org/apache/hadoop/yarn/exceptions/YarnException
  at java.lang.Class.forName0(Native Method) ~[na:1.8.0_171]
  at java.lang.Class.forName(Class.java:264) ~[na:1.8.0_171]
  at
org.apache.flink.client.cli.CliFrontend.loadCustomCommandLine(CliFront
end.java:1208)
[flink-dist_2.11-1.5.2.jar:1.5.2]
  at
org.apache.flink.client.cli.CliFrontend.loadCustomCommandLines(CliFron
tend.java:1164)
[flink-dist_2.11-1.5.2.jar:1.5.2]
  at
org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1090)
[flink-dist_2.11-1.5.2.jar:1.5.2]
Caused by: java.lang.ClassNotFoundException:
org.apache.hadoop.yarn.exceptions.YarnException
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
~[na:1.8.0_171]
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
~[na:1.8.0_171]
  at
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
~[na:1.8.0_171]
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
~[na:1.8.0_171]
  ... 5 common frames omitted

There are no functional failures, but how to avoid this exception in log?

Thanks,
Parth Sarathy




--
Sent from:
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/





Re: Ask about running multiple jars for different stream jobs

2018-09-05 Thread Chesnay Schepler
The most likely explanation is that you haven't configured the cluster 
to have enough resources to run multiple jobs.
Try increasing taskmanager.numberOfTaskSlots: 
https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#taskmanager-numberoftaskslots-1
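For example, in flink-conf.yaml (the value is an assumption; it needs to cover the total parallelism of both jobs together with the number of TaskManagers):

taskmanager.numberOfTaskSlots: 4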


On 04.09.2018 15:13, Rad Rad wrote:

Hi All,

Kindly, could I ask about running multiple jars for different stream jobs in
a cluster? I have two jars, each of which joins different data streams. I
submitted the first one and it works fine; when I tried to submit the second
one, it failed. When I go to the running tab, i.e.
http://mycluster:8081/#/running-jobs, just the first one is running.

The same problem occurs when I try it standalone.
My question is: how can I run two jars (streaming jobs) at the same time
on a Flink cluster?


Thanks.
   




--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/





Re: 答复: Flink1.6.0 submit job and got "No content to map due to end-of-input" Error

2018-09-05 Thread Chesnay Schepler
Did you upgrade both the client and cluster to 1.6.0? The server 
returned a completely empty response which shouldn't be possible if it 
runs 1.6.0.
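For example, a sketch of keeping the client-side Maven dependencies aligned with the cluster version (artifact ids are assumptions based on the connector jars mentioned below):

<!-- keep every Flink dependency on the same version as the cluster -->
<properties>
  <flink.version>1.6.0</flink.version>
</properties>
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-streaming-java_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-connector-kafka-0.10_2.11</artifactId>
    <version>${flink.version}</version>
  </dependency>
</dependencies>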


On 05.09.2018 07:27, 潘 功森 wrote:


Hi  Vino,

Below are the dependencies I used, please have a look.

I found they also include flink-connector-kafka-0.10_2.11-1.6.0.jar and
flink-connector-kafka-0.9_2.11-1.6.0.jar, and I don't know whether that has
any effect.


yours,

Gongsen

Sent from the Windows 10 Mail app



*From:* vino yang
*Sent:* Wednesday, September 5, 2018 10:35:59 AM
*To:* d...@flink.apache.org
*Cc:* user
*Subject:* Re: Flink1.6.0 submit job and got "No content to map due to
end-of-input" Error

Hi Pangongsen,

Did you upgrade the Flink-related dependencies you use at the same
time? In other words, are the dependencies consistent with the Flink version?


Thanks, vino.

潘 功森 <pangong...@hotmail.com> wrote on Tuesday, September 4, 2018 at 10:07 PM:


Hi all,
I submit the jar to Flink in the following way:

StreamExecutionEnvironment env =
StreamExecutionEnvironment.createRemoteEnvironment(config.clusterIp,
config.clusterPort,
config.clusterFlinkJar);


I used Flink 1.3.2 before and it worked fine. But I upgraded it
to 1.6.0, and I got the error below:

2018-09-04 21:38:32.039 [ERROR] [flink-rest-client-netty-19-1]
org.apache.flink.runtime.rest.RestClient - Unexpected plain-text
response:

2018-09-04 21:38:32.137 [ERROR] [flink-rest-client-netty-18-1]
org.apache.flink.runtime.rest.RestClient - Response was not valid
JSON.


org.apache.flink.shaded.jackson2.com.fasterxml.jackson.databind.JsonMappingException:
No content to map due to end-of-input


Could you give me some advice to fix it?

yours,
Gongsen