Re: Flink REST API async?

2017-10-19 Thread Francisco Gonzalez Barea
Hello,

Going back on this thread, quick question: Will this be supported in next Flink 
version? If not, when is it expected to be included?

Regards


On 8 Aug 2017, at 15:46, Aljoscha Krettek 
<aljos...@apache.org<mailto:aljos...@apache.org>> wrote:

I quickly talked to Till about this. The new JobManager, once FLIP-6 is 
implemented, will have a new REST endpoint that allows submitting a JobGraph 
directly. With this, we no longer have to execute the user main() method in the 
WebRuntimeMonitor (which is a component that the current JobManager process 
loads to serve the web frontend and the REST interface).

This should solve the problem, but unfortunately it doesn't solve your current 
problem.

Best,
Aljoscha
On 8. Aug 2017, at 10:26, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:

Aha ok… Thanks for your answer Eron.

Regards


On 7 Aug 2017, at 19:04, Eron Wright 
<eronwri...@gmail.com<mailto:eronwri...@gmail.com>> wrote:

When you submit a program via the REST API, the main method executes inside the 
JobManager process.Unfortunately a static variable is used to establish the 
execution environment that the program obtains from 
`ExecutionEnvironment.getExecutionEnvironment()`.  From the stack trace it 
appears that two main methods are executing simultaneously and one is 
corrupting the other.

On Mon, Aug 7, 2017 at 8:21 AM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hi there!

We are doing some POCs submitting jobs remotely to Flink. We tried with Flink 
CLI and now we´re testing the Rest API.

So the point is that when we try to execute a set of requests in an async way 
(using CompletableFutures) only a couple of them run successfully. For the rest 
we get the exception copied at the end of the email.

Do you know the reason for this?

Thanks in advance!!
Regards,

org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
  at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
  at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
  at 
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
 at 
org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
 at 
org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.handler.codec.ByteToMessageD

Re: Problem in Flink 1.3.2 with Mesos task managers offers

2017-09-20 Thread Francisco Gonzalez Barea
Hello Eron,

Thank you for your reply, we will take a look at this.

Regards


On 19 Sep 2017, at 22:37, Eron Wright 
<eronwri...@gmail.com<mailto:eronwri...@gmail.com>> wrote:

Hello, the current behavior is that Flink holds onto received offers for up to 
two minutes while it attempts to provision the TMs.   Flink can combine small 
offers to form a single TM, to combat fragmentation that develops over time in 
a Mesos cluster.   Are you saying that unused offers aren't being released 
after two minutes?

There's a log entry you should see in the JM log whenever an offer is released:
LOG.info<http://LOG.info>(s"Declined offer ${lease.getId} from 
${lease.hostname()} "
  + s"of ${lease.memoryMB()} MB, ${lease.cpuCores()} cpus.")

The timeout value isn't configurable at the moment, but if you're willing to 
experiment by building Flink from source, you may adjust the two minute timeout 
to something lower as follows.   In the `MesosFlinkResourceManager` class, edit 
the `createOptimizer` method to call `withLeaseOfferExpirySecs` on the 
`TaskScheduler.Builder` object.

Let us know if that helps and we'll make the timeout configurable.
-Eron

On Tue, Sep 19, 2017 at 8:58 AM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hello guys,

We have a flink 1.3.2 session deployed from Marathon json to Mesos with some of 
the following parameters as environment variables:


"flink_mesos.initial-tasks": "8",
"flink_mesos.resourcemanager.tasks.mem": "4096",

And other environment variables including zookeeper, etc.

The mesos cluster is used for diferents applications (kafka, ad-hoc...), and 
have fragmentation into the agents. Our problem is that the flink session is 
getting all offers, even small ones. In case there are not enough offers to 
suit that configuration, it gets all of them, so there are no resources and 
offers free for other applications.

So the question would be what is the right configuration in these cases to 
avoid using all resources for the same flink session.

Thanks in advance.
Regards

This message is private and confidential. If you have received this message in 
error, please notify the sender or 
serviced...@piksel.com<mailto:serviced...@piksel.com> and remove it from your 
system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road 
SE, Suite 400, Atlanta, GA 
30339<https://maps.google.com/?q=2100+Powers+Ferry+Road+SE,+Suite+400,+Atlanta,+GA+30339=gmail=g>




Problem in Flink 1.3.2 with Mesos task managers offers

2017-09-19 Thread Francisco Gonzalez Barea
Hello guys,

We have a flink 1.3.2 session deployed from Marathon json to Mesos with some of 
the following parameters as environment variables:


"flink_mesos.initial-tasks": "8",
"flink_mesos.resourcemanager.tasks.mem": "4096",

And other environment variables including zookeeper, etc.

The mesos cluster is used for diferents applications (kafka, ad-hoc...), and 
have fragmentation into the agents. Our problem is that the flink session is 
getting all offers, even small ones. In case there are not enough offers to 
suit that configuration, it gets all of them, so there are no resources and 
offers free for other applications.

So the question would be what is the right configuration in these cases to 
avoid using all resources for the same flink session.

Thanks in advance.
Regards

This message is private and confidential. If you have received this message in 
error, please notify the sender or serviced...@piksel.com and remove it from 
your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road 
SE, Suite 400, Atlanta, GA 30339


Query Rpc address/port remotely?

2017-08-17 Thread Francisco Gonzalez Barea
Hey guys!

I´ve got a new question.

Having a Flink v1.3.0 running on Mesos, is there any remotely way (rest, or 
another) to query which is the rpc address and port to connect that Flink via 
akka?

We´ve been taking a look at the rest API but the config endpoint doesn’t seem 
to provide this information.

Thanks in advance!

Regards,
Kurro
This message is private and confidential. If you have received this message in 
error, please notify the sender or serviced...@piksel.com and remove it from 
your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road 
SE, Suite 400, Atlanta, GA 30339


Re: Flink REST API async?

2017-08-08 Thread Francisco Gonzalez Barea
Aha ok… Thanks for your answer Eron.

Regards


On 7 Aug 2017, at 19:04, Eron Wright 
<eronwri...@gmail.com<mailto:eronwri...@gmail.com>> wrote:

When you submit a program via the REST API, the main method executes inside the 
JobManager process.Unfortunately a static variable is used to establish the 
execution environment that the program obtains from 
`ExecutionEnvironment.getExecutionEnvironment()`.  From the stack trace it 
appears that two main methods are executing simultaneously and one is 
corrupting the other.

On Mon, Aug 7, 2017 at 8:21 AM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hi there!

We are doing some POCs submitting jobs remotely to Flink. We tried with Flink 
CLI and now we´re testing the Rest API.

So the point is that when we try to execute a set of requests in an async way 
(using CompletableFutures) only a couple of them run successfully. For the rest 
we get the exception copied at the end of the email.

Do you know the reason for this?

Thanks in advance!!
Regards,

org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
  at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
  at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
  at 
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
 at 
org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
 at 
org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 at 
io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.r

Flink REST API async?

2017-08-07 Thread Francisco Gonzalez Barea
Hi there!

We are doing some POCs submitting jobs remotely to Flink. We tried with Flink 
CLI and now we´re testing the Rest API.

So the point is that when we try to execute a set of requests in an async way 
(using CompletableFutures) only a couple of them run successfully. For the rest 
we get the exception copied at the end of the email.

Do you know the reason for this?

Thanks in advance!!
Regards,

org.apache.flink.client.program.ProgramInvocationException: The main method 
caused an error.
  at 
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:545)
  at 
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:419)
  at 
org.apache.flink.client.program.OptimizerPlanEnvironment.getOptimizedPlan(OptimizerPlanEnvironment.java:80)
 at 
org.apache.flink.client.program.ClusterClient.getOptimizedPlan(ClusterClient.java:318)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarActionHandler.getJobGraphAndClassLoader(JarActionHandler.java:72)
 at 
org.apache.flink.runtime.webmonitor.handlers.JarRunHandler.handleJsonRequest(JarRunHandler.java:61)
 at 
org.apache.flink.runtime.webmonitor.handlers.AbstractJsonRequestHandler.handleRequest(AbstractJsonRequestHandler.java:41)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandler.respondAsLeader(RuntimeMonitorHandler.java:109)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:97)
 at 
org.apache.flink.runtime.webmonitor.RuntimeMonitorHandlerBase.channelRead0(RuntimeMonitorHandlerBase.java:44)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at io.netty.handler.codec.http.router.Handler.routed(Handler.java:62)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:57)
 at 
io.netty.handler.codec.http.router.DualAbstractHandler.channelRead0(DualAbstractHandler.java:20)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:159)
 at 
org.apache.flink.runtime.webmonitor.HttpRequestHandler.channelRead0(HttpRequestHandler.java:65)
 at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:242)
 at 
io.netty.channel.CombinedChannelDuplexHandler.channelRead(CombinedChannelDuplexHandler.java:147)
 at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:339)
 at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:324)
 at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:847)
 at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
 at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
 at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
 at 
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
 at java.lang.Thread.run(Thread.java:748)
 \nCaused by: java.util.ConcurrentModificationException
 at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
 at java.util.ArrayList$Itr.next(ArrayList.java:851)
 at 
org.apache.flink.api.java.operators.OperatorTranslation.translateToPlan(OperatorTranslation.java:49)
 at 
org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:1065)
 at 

Re: Flink CLI cannot submit job to Flink on Mesos

2017-08-01 Thread Francisco Gonzalez Barea
Hey! It´s working now!!

I will do a summary for those who might have the same problem in the future:

- Flink 1.3.0 dockerized on Mesos:
- Add the HA configuration values in your flink app: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#config-file-flink-confyaml
- Add the Mesos HA configuration values in your flink app: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#high-availability

- Flink CLI 1.3.0 on my local machine (make sure you use the same version!!)
- Add same HA configuration values in your flink CLI configuration: 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#config-file-flink-confyaml


With those steps, my ./fink run command it´s working like a charm.

Thank you very much guys!

Regards,
Francisco


<https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#config-file-flink-confyaml>
<https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#config-file-flink-confyaml><https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/jobmanager_high_availability.html#config-file-flink-confyaml>

On 1 Aug 2017, at 10:24, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:

Hi Stephan,

So, do you mean to remove the “-m” param from the flink CLI call? And on the 
other hand, that I should add the Zookeeper configuration in both sides, the 
remote flink and locally in the flink CLI config, right?

Regards


On 31 Jul 2017, at 22:21, Stephan Ewen 
<se...@apache.org<mailto:se...@apache.org>> wrote:

Hi Francisco!

Can you drop the explicit address of the jobmanager? The client should pick up 
that address automatically from ZooKeeper as well (together with the HA leader 
session ID).

Please check if you have the ZooKeeper HA config entries in the config used by 
the CLI.

Stephan


On Mon, Jul 31, 2017 at 6:27 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hi again,

On the other hand, we are running the following flink CLI command:

./flink run -d -m ${jobmanager.rpc.address}:${jobmanager.rpc.port}  
${our-program-jar} ${our-program-params}

Maybe is the command what we are using wrongly?

Thank you

On 28 Jul 2017, at 11:07, Till Rohrmann 
<trohrm...@apache.org<mailto:trohrm...@apache.org>> wrote:

Hi Francisco,

have you set the right high-availability configuration options in your client 
configuration as described here [1]? If not, then Flink is not able to find the 
correct JobManager because it retrieves the address as well as a fencing token 
(called leader session id) from the HA store (ZooKeeper).

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#high-availability

Cheers,
Till

On Thu, Jul 27, 2017 at 6:20 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hello,

We´re having lot of issues while trying to submit a job remotely using the 
Flink CLI command line tool. We have tried different configurations but in all 
of them we get errors from AKKA while trying to connect. I will try to 
summarise the configurations we´ve tried.

- Flink 1.3.0 deployed within a docker container on a Mesos cluster (using 
Marathon)
- This flink has the property jobmanager.rpc.address as a hostname (i.e. kind 
of ip-X.eu<http://ip-x.eu/>.west-1.compute.internal)
- Use the same version for Flink Client remotely (e.g. in my laptop).

When I try to submit the job using the command flink run -m myHostName:myPort 
(the same in jobmanager.rpc.address and jobmanager.rpc.port) after some time 
waiting I get the trace at the end of this email. In the flink side we get this 
error from AKKA:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has 
failed, address is now gated for [5000] ms. Reason: [Association failed with 
[akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: 
/10.203.23.24:24469<http://10.203.23.24:24469/>]

After reading a bit, it seems there´re some problems related to akka resolving 
hostnames to ips, so we decided to startup the same flink but changing 
jobmanager.rpc.address to have the direct ip (i.e. kind of XX.XXX.XX.XX). In 
this case I´m getting same trace (at the end of the email) from the client side 
and this one from the Flink server:

Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader 
session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received 
leader session ID ----.

We have tried some other stuff but without success… any clue that could help us?

T

Re: Flink CLI cannot submit job to Flink on Mesos

2017-08-01 Thread Francisco Gonzalez Barea
Hi Stephan,

So, do you mean to remove the “-m” param from the flink CLI call? And on the 
other hand, that I should add the Zookeeper configuration in both sides, the 
remote flink and locally in the flink CLI config, right?

Regards


On 31 Jul 2017, at 22:21, Stephan Ewen 
<se...@apache.org<mailto:se...@apache.org>> wrote:

Hi Francisco!

Can you drop the explicit address of the jobmanager? The client should pick up 
that address automatically from ZooKeeper as well (together with the HA leader 
session ID).

Please check if you have the ZooKeeper HA config entries in the config used by 
the CLI.

Stephan


On Mon, Jul 31, 2017 at 6:27 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hi again,

On the other hand, we are running the following flink CLI command:

./flink run -d -m ${jobmanager.rpc.address}:${jobmanager.rpc.port}  
${our-program-jar} ${our-program-params}

Maybe is the command what we are using wrongly?

Thank you

On 28 Jul 2017, at 11:07, Till Rohrmann 
<trohrm...@apache.org<mailto:trohrm...@apache.org>> wrote:

Hi Francisco,

have you set the right high-availability configuration options in your client 
configuration as described here [1]? If not, then Flink is not able to find the 
correct JobManager because it retrieves the address as well as a fencing token 
(called leader session id) from the HA store (ZooKeeper).

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#high-availability

Cheers,
Till

On Thu, Jul 27, 2017 at 6:20 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hello,

We´re having lot of issues while trying to submit a job remotely using the 
Flink CLI command line tool. We have tried different configurations but in all 
of them we get errors from AKKA while trying to connect. I will try to 
summarise the configurations we´ve tried.

- Flink 1.3.0 deployed within a docker container on a Mesos cluster (using 
Marathon)
- This flink has the property jobmanager.rpc.address as a hostname (i.e. kind 
of ip-X.eu<http://ip-x.eu/>.west-1.compute.internal)
- Use the same version for Flink Client remotely (e.g. in my laptop).

When I try to submit the job using the command flink run -m myHostName:myPort 
(the same in jobmanager.rpc.address and jobmanager.rpc.port) after some time 
waiting I get the trace at the end of this email. In the flink side we get this 
error from AKKA:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has 
failed, address is now gated for [5000] ms. Reason: [Association failed with 
[akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: 
/10.203.23.24:24469<http://10.203.23.24:24469/>]

After reading a bit, it seems there´re some problems related to akka resolving 
hostnames to ips, so we decided to startup the same flink but changing 
jobmanager.rpc.address to have the direct ip (i.e. kind of XX.XXX.XX.XX). In 
this case I´m getting same trace (at the end of the email) from the client side 
and this one from the Flink server:

Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader 
session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received 
leader session ID ----.

We have tried some other stuff but without success… any clue that could help us?

Thanks in advance!

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: JobManager did not respond within 6 milliseconds
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at 
javax.security.auth.Subject.do<http://javax.security.auth.subject.do/>As(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)

Re: Flink CLI cannot submit job to Flink on Mesos

2017-07-31 Thread Francisco Gonzalez Barea
Hi again,

On the other hand, we are running the following flink CLI command:

./flink run -d -m ${jobmanager.rpc.address}:${jobmanager.rpc.port}  
${our-program-jar} ${our-program-params}

Maybe is the command what we are using wrongly?

Thank you

On 28 Jul 2017, at 11:07, Till Rohrmann 
<trohrm...@apache.org<mailto:trohrm...@apache.org>> wrote:

Hi Francisco,

have you set the right high-availability configuration options in your client 
configuration as described here [1]? If not, then Flink is not able to find the 
correct JobManager because it retrieves the address as well as a fencing token 
(called leader session id) from the HA store (ZooKeeper).

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#high-availability

Cheers,
Till

On Thu, Jul 27, 2017 at 6:20 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hello,

We´re having lot of issues while trying to submit a job remotely using the 
Flink CLI command line tool. We have tried different configurations but in all 
of them we get errors from AKKA while trying to connect. I will try to 
summarise the configurations we´ve tried.

- Flink 1.3.0 deployed within a docker container on a Mesos cluster (using 
Marathon)
- This flink has the property jobmanager.rpc.address as a hostname (i.e. kind 
of ip-X.eu<http://ip-x.eu/>.west-1.compute.internal)
- Use the same version for Flink Client remotely (e.g. in my laptop).

When I try to submit the job using the command flink run -m myHostName:myPort 
(the same in jobmanager.rpc.address and jobmanager.rpc.port) after some time 
waiting I get the trace at the end of this email. In the flink side we get this 
error from AKKA:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has 
failed, address is now gated for [5000] ms. Reason: [Association failed with 
[akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: 
/10.203.23.24:24469<http://10.203.23.24:24469/>]

After reading a bit, it seems there´re some problems related to akka resolving 
hostnames to ips, so we decided to startup the same flink but changing 
jobmanager.rpc.address to have the direct ip (i.e. kind of XX.XXX.XX.XX). In 
this case I´m getting same trace (at the end of the email) from the client side 
and this one from the Flink server:

Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader 
session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received 
leader session ID ----.

We have tried some other stuff but without success… any clue that could help us?

Thanks in advance!

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: JobManager did not respond within 6 milliseconds
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
Caused by: org.apache.flink.runtime.client.JobTimeoutException: JobManager did 
not respond within 6 milliseconds
at 
org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:426)
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:451)
... 15 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[6 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:1

Re: Flink CLI cannot submit job to Flink on Mesos

2017-07-31 Thread Francisco Gonzalez Barea
Hi Till,

Thanks for your answer.

We have reviewed the configuration and everything seems fine in our side…  But 
we´re still getting the message:

“Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 041b67c7ef765c2f61bd69c2b9dacbce),DETACHED)) because the expected leader 
session ID 9e9e4e4b-1236-4140-9156-fd207929aab5 did not equal the received 
leader session ID ----.”

The point is we have another configuration using Flink 1.1.3 on YARN, and it´s 
working cool. And if I take a look at the configuration values, the main 
difference I can see (apart from mesos/yarn config parameters) is that in yarn 
the jobmanager.rpc.address is an ip and on mesos it´s a hostname. Might this be 
related?

Thanks in advance.


On 28 Jul 2017, at 11:07, Till Rohrmann 
<trohrm...@apache.org<mailto:trohrm...@apache.org>> wrote:

Hi Francisco,

have you set the right high-availability configuration options in your client 
configuration as described here [1]? If not, then Flink is not able to find the 
correct JobManager because it retrieves the address as well as a fencing token 
(called leader session id) from the HA store (ZooKeeper).

[1] 
https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/mesos.html#high-availability

Cheers,
Till

On Thu, Jul 27, 2017 at 6:20 PM, Francisco Gonzalez Barea 
<francisco.gonza...@piksel.com<mailto:francisco.gonza...@piksel.com>> wrote:
Hello,

We´re having lot of issues while trying to submit a job remotely using the 
Flink CLI command line tool. We have tried different configurations but in all 
of them we get errors from AKKA while trying to connect. I will try to 
summarise the configurations we´ve tried.

- Flink 1.3.0 deployed within a docker container on a Mesos cluster (using 
Marathon)
- This flink has the property jobmanager.rpc.address as a hostname (i.e. kind 
of ip-X.eu<http://ip-x.eu/>.west-1.compute.internal)
- Use the same version for Flink Client remotely (e.g. in my laptop).

When I try to submit the job using the command flink run -m myHostName:myPort 
(the same in jobmanager.rpc.address and jobmanager.rpc.port) after some time 
waiting I get the trace at the end of this email. In the flink side we get this 
error from AKKA:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has 
failed, address is now gated for [5000] ms. Reason: [Association failed with 
[akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: 
/10.203.23.24:24469<http://10.203.23.24:24469/>]

After reading a bit, it seems there´re some problems related to akka resolving 
hostnames to ips, so we decided to startup the same flink but changing 
jobmanager.rpc.address to have the direct ip (i.e. kind of XX.XXX.XX.XX). In 
this case I´m getting same trace (at the end of the email) from the client side 
and this one from the Flink server:

Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader 
session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received 
leader session ID ----.

We have tried some other stuff but without success… any clue that could help us?

Thanks in advance!

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: JobManager did not respond within 6 milliseconds
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
Caused by: org.apache.flink.runtime.client.JobTimeoutException: JobManager did 
not respond within 6 milliseconds
at 
org.apache.flink.runtime.client.JobClient.

Flink CLI cannot submit job to Flink on Mesos

2017-07-27 Thread Francisco Gonzalez Barea
Hello,

We´re having lot of issues while trying to submit a job remotely using the 
Flink CLI command line tool. We have tried different configurations but in all 
of them we get errors from AKKA while trying to connect. I will try to 
summarise the configurations we´ve tried.

- Flink 1.3.0 deployed within a docker container on a Mesos cluster (using 
Marathon)
- This flink has the property jobmanager.rpc.address as a hostname (i.e. kind 
of ip-X.eu.west-1.compute.internal)
- Use the same version for Flink Client remotely (e.g. in my laptop).

When I try to submit the job using the command flink run -m myHostName:myPort 
(the same in jobmanager.rpc.address and jobmanager.rpc.port) after some time 
waiting I get the trace at the end of this email. In the flink side we get this 
error from AKKA:

Association with remote system [akka.tcp://flink@10.203.23.24:24469] has 
failed, address is now gated for [5000] ms. Reason: [Association failed with 
[akka.tcp://flink@10.203.23.24:24469]] Caused by: [Connection refused: 
/10.203.23.24:24469]

After reading a bit, it seems there´re some problems related to akka resolving 
hostnames to ips, so we decided to startup the same flink but changing 
jobmanager.rpc.address to have the direct ip (i.e. kind of XX.XXX.XX.XX). In 
this case I´m getting same trace (at the end of the email) from the client side 
and this one from the Flink server:

Discard message 
LeaderSessionMessage(----,SubmitJob(JobGraph(jobId:
 b25d5c5ced962632abc5ee9ef867792e),DETACHED)) because the expected leader 
session ID b4f53899-5d70-467e-8e9d-e56eeb60b6e3 did not equal the received 
leader session ID ----.

We have tried some other stuff but without success… any clue that could help us?

Thanks in advance!

org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: JobManager did not respond within 6 milliseconds
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:454)
at 
org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:99)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:400)
at 
org.apache.flink.client.program.DetachedEnvironment.finalizeExecute(DetachedEnvironment.java:76)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:345)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:831)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:256)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1073)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1120)
at org.apache.flink.client.CliFrontend$2.call(CliFrontend.java:1117)
at 
org.apache.flink.runtime.security.HadoopSecurityContext$1.run(HadoopSecurityContext.java:43)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
at 
org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:40)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1116)
Caused by: org.apache.flink.runtime.client.JobTimeoutException: JobManager did 
not respond within 6 milliseconds
at 
org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:426)
at 
org.apache.flink.client.program.ClusterClient.runDetached(ClusterClient.java:451)
... 15 more
Caused by: java.util.concurrent.TimeoutException: Futures timed out after 
[6 milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:190)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.result(package.scala:190)
at scala.concurrent.Await.result(package.scala)
at 
org.apache.flink.runtime.client.JobClient.submitJobDetached(JobClient.java:423)
... 16 more




This message is private and confidential. If you have received this message in 
error, please notify the sender or serviced...@piksel.com and remove it from 
your system.

Piksel Inc is a company registered in the United States, 2100 Powers Ferry Road 
SE, Suite 400, Atlanta, GA 30339