Exactly, that was the problem. I hadn't realized the restructured cluster channels all communication through the REST port.
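For the archives, the fix amounted to a one-line change along these lines (the host name and jar path below are placeholders for my actual setup):

    // Submit to the REST port (8081 by default) instead of the legacy
    // JobManager RPC port 6123.
    final ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
            "localhost", 8081, "target/my-job.jar");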
Thanks again.

Best,

On Thu, 6 Sep 2018 at 17:57, Chesnay Schepler <ches...@apache.org> wrote:

> Did you by chance use the RemoteEnvironment and pass in 6123 as the port?
> If so, try using 8081 instead, which is the REST port.
>
> On 06.09.2018 18:24, Miguel Coimbra wrote:
>
> > Hello Chesnay,
> >
> > Thanks for the information.
> >
> > I decided to move straight away to launching a standalone cluster.
> > I'm now having another problem when trying to submit a job through my
> > Java program after launching the standalone cluster.
> >
> > I configured the cluster (flink-1.6.0/conf/flink-conf.yaml) to use 2
> > TaskManager instances and assigned port ranges for most Flink cluster
> > entities (to avoid port collisions with more than 1 TaskManager):
> >
> > query.server.ports: 30000-35000
> > query.proxy.ports: 35001-40000
> > taskmanager.rpc.port: 45001-50000
> > taskmanager.data.port: 50001-55000
> > blob.server.port: 55001-60000
> >
> > I'm launching on Linux with:
> >
> > ./start-cluster.sh
> >
> > Starting cluster.
> > Starting standalonesession daemon on host xxxxxxx.
> > Starting taskexecutor daemon on host xxxxxxx.
> > [INFO] 1 instance(s) of taskexecutor are already running on xxxxxxx.
> > Starting taskexecutor daemon on host xxxxxxx.
> >
> > However, my Java program hangs as soon as I make an execute() call
> > (for example by calling count() on a DataSet).
> >
> > Checking the JobManager log, I find the following exception whenever my
> > Java program calls execute() on the ExecutionEnvironment (whether run
> > via Maven on the terminal or from IntelliJ IDEA):
> >
> > WARN akka.remote.transport.netty.NettyTransport -
> > Remote connection to [/127.0.0.1:47774] failed with
> > org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:
> > Adjusted frame length exceeds 10485760: 1347375960 - discarded
> >
> > I checked that the problem occurs on a count(), so I don't think it has
> > to do with the JobManager/TaskManagers trying to exchange excessively
> > big messages.
> >
> > While searching, I tried to make sure my program compiles with the same
> > library versions as those in this cluster version of Flink.
> >
> > I downloaded the Apache Flink 1.6 binaries to launch the cluster:
> >
> > https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
> >
> > I then checked the library versions used in the pom.xml of the 1.6.0
> > branch of the Flink repository:
> >
> > https://github.com/apache/flink/blob/release-1.6/pom.xml
> >
> > In my project's pom.xml, I have the following:
> >
> > <properties>
> >   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >   <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
> >   <maven.compiler.source>1.8</maven.compiler.source>
> >   <maven.compiler.target>1.8</maven.compiler.target>
> >   <flink.version>1.6.0</flink.version>
> >   <slf4j.version>1.7.7</slf4j.version>
> >   <log4j.version>1.2.17</log4j.version>
> >   <scala.version>2.11.12</scala.version>
> >   <scala.binary.version>2.11</scala.binary.version>
> >   <akka.version>2.4.20</akka.version>
> >   <junit.version>4.12</junit.version>
> >   <junit.jupiter.version>5.0.0</junit.jupiter.version>
> >   <junit.vintage.version>${junit.version}.1</junit.vintage.version>
> >   <junit.platform.version>1.0.1</junit.platform.version>
> >   <aspectj.version>1.9.1</aspectj.version>
> > </properties>
> >
> > My project's dependency versions match those of the Flink 1.6 repository
> > (for libraries such as Akka).
> > However, I'm having difficulty understanding what else may be causing
> > this problem.
> >
> > Thanks for your attention.
> > Best,
> >
> > On Wed, 5 Sep 2018 at 20:18, Chesnay Schepler <ches...@apache.org> wrote:
> >
>> No, the cluster isn't shared. For each job a separate cluster is spun up
>> when calling execute(), and it is shut down once the job completes.
>>
>> For explicit creation and shutdown of a cluster, I would suggest
>> executing your jobs as a test that contains a MiniClusterResource.
>>
>> On 05.09.2018 20:59, Miguel Coimbra wrote:
>>
>> Thanks for the reply.
>>
>> However, I think my case differs because I am running a sequence of
>> independent Flink jobs on the same environment instance.
>> I only create the LocalExecutionEnvironment once.
>>
>> The web manager shows the job ID changing correctly every time a new job
>> is executed.
>>
>> Since it is the same execution environment (and therefore the same
>> cluster instance, I imagine), those completed jobs should show up as
>> well, no?
>>
>> On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> When you create an environment that way, the cluster is shut down
>>> once the job completes.
>>> The WebUI can _appear_ to still be working since all the files, and
>>> data about the job, are cached in the browser.
>>>
>>> On 05.09.2018 17:39, Miguel Coimbra wrote:
>>>
>>> Hello,
>>>
>>> I'm having difficulty reading the status (such as the time taken by
>>> each dataflow operator in a job) of jobs that have completed.
>>>
>>> First, when I click on "Completed Jobs" in the web interface (by
>>> default at 8081), no job shows up.
>>> I see jobs listed as "Running", but as soon as they finish, I would
>>> expect them to appear in the "Completed Jobs" section, but no luck.
>>>
>>> Consider that I am running locally (the web UI is running; I checked
>>> and it is available via browser) on 8081.
>>> None of these links worked for checking jobs that have already
>>> finished, such as the job with ID 618fac9da6ea458f5091a9c40e54cbcc that
>>> had been running:
>>>
>>> http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
>>> http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc
>>>
>>> I'm running with a LocalExecutionEnvironment created with the method:
>>>
>>> ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
>>>
>>> I hope anyone may be able to help.
>>>
>>> Best,
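P.S. For anyone finding this thread in the archives later: a minimal sketch of
the MiniClusterResource-style test Chesnay suggests above. Note that the class
names below (MiniClusterWithClientResource, MiniClusterResourceConfiguration)
are the flink-test-utils spelling of more recent releases; in 1.6 the rule is
called MiniClusterResource and its constructor differs slightly, and the job
inside the test is only a placeholder:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.Assert;
    import org.junit.ClassRule;
    import org.junit.Test;

    public class MyJobITCase {

        // One mini cluster shared by every test in this class: started
        // before the first test, shut down after the last one, so all
        // jobs submitted via execute()/count() land on the same cluster.
        @ClassRule
        public static final MiniClusterWithClientResource FLINK =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void countsThreeElements() throws Exception {
            // getExecutionEnvironment() picks up the mini cluster context.
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            long count = env.fromElements(1, 2, 3).count(); // placeholder job
            Assert.assertEquals(3L, count);
        }
    }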