Exactly, that was the problem. I hadn't realized the restructured cluster channels all communication through the REST port.
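For the archives, the fix amounted to a one-line change along these lines (the host name and jar path below are placeholders for my actual setup):

    // Submit to the REST port (8081 by default) instead of the legacy
    // JobManager RPC port 6123.
    final ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
            "localhost", 8081, "target/my-job.jar");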
Thanks again.

Best,

On Thu, 6 Sep 2018 at 17:57, Chesnay Schepler <ches...@apache.org> wrote:

> Did you by chance use the RemoteEnvironment and pass in 6123 as the port?
> If so, try using 8081 instead, which is the REST port.
>
> On 06.09.2018 18:24, Miguel Coimbra wrote:
>
> > Hello Chesnay,
> >
> > Thanks for the information.
> >
> > I decided to move straight away to launching a standalone cluster.
> > I'm now having another problem when trying to submit a job through my
> > Java program after launching the standalone cluster.
> >
> > I configured the cluster (flink-1.6.0/conf/flink-conf.yaml) to use 2
> > TaskManager instances and assigned port ranges for most Flink cluster
> > entities (to avoid port collisions with more than 1 TaskManager):
> >
> > query.server.ports: 30000-35000
> > query.proxy.ports: 35001-40000
> > taskmanager.rpc.port: 45001-50000
> > taskmanager.data.port: 50001-55000
> > blob.server.port: 55001-60000
> >
> > I'm launching on Linux with:
> >
> > ./start-cluster.sh
> >
> > Starting cluster.
> > Starting standalonesession daemon on host xxxxxxx.
> > Starting taskexecutor daemon on host xxxxxxx.
> > [INFO] 1 instance(s) of taskexecutor are already running on xxxxxxx.
> > Starting taskexecutor daemon on host xxxxxxx.
> >
> > However, my Java program hangs as soon as I make an execute() call
> > (for example by calling count() on a DataSet).
> >
> > Checking the JobManager log, I find the following exception whenever my
> > Java program calls execute() on the ExecutionEnvironment (whether run
> > via Maven on the terminal or from IntelliJ IDEA):
> >
> > WARN akka.remote.transport.netty.NettyTransport -
> > Remote connection to [/127.0.0.1:47774] failed with
> > org.apache.flink.shaded.akka.org.jboss.netty.handler.codec.frame.TooLongFrameException:
> > Adjusted frame length exceeds 10485760: 1347375960 - discarded
> >
> > I checked that the problem occurs on a count(), so I don't think it has
> > to do with the JobManager/TaskManagers trying to exchange excessively
> > big messages.
> >
> > While searching, I tried to make sure my program compiles with the same
> > library versions as those in this cluster version of Flink.
> >
> > I downloaded the Apache Flink 1.6 binaries to launch the cluster:
> >
> > https://www.apache.org/dyn/closer.lua/flink/flink-1.6.0/flink-1.6.0-bin-scala_2.11.tgz
> >
> > I then checked the library versions used in the pom.xml of the 1.6.0
> > branch of the Flink repository:
> >
> > https://github.com/apache/flink/blob/release-1.6/pom.xml
> >
> > In my project's pom.xml, I have the following:
> >
> > <properties>
> >   <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
> >   <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
> >   <maven.compiler.source>1.8</maven.compiler.source>
> >   <maven.compiler.target>1.8</maven.compiler.target>
> >   <flink.version>1.6.0</flink.version>
> >   <slf4j.version>1.7.7</slf4j.version>
> >   <log4j.version>1.2.17</log4j.version>
> >   <scala.version>2.11.12</scala.version>
> >   <scala.binary.version>2.11</scala.binary.version>
> >   <akka.version>2.4.20</akka.version>
> >   <junit.version>4.12</junit.version>
> >   <junit.jupiter.version>5.0.0</junit.jupiter.version>
> >   <junit.vintage.version>${junit.version}.1</junit.vintage.version>
> >   <junit.platform.version>1.0.1</junit.platform.version>
> >   <aspectj.version>1.9.1</aspectj.version>
> > </properties>
> >
> > My project's dependency versions match those of the Flink 1.6 repository
> > (for libraries such as Akka).
> > However, I'm having difficulty understanding what else may be causing
> > this problem.
> >
> > Thanks for your attention.
> > Best,
> >
> > On Wed, 5 Sep 2018 at 20:18, Chesnay Schepler <ches...@apache.org> wrote:
> >
>> No, the cluster isn't shared. For each job a separate cluster is spun up
>> when calling execute(), and it is shut down once the job completes.
>>
>> For explicit creation and shutdown of a cluster, I would suggest
>> executing your jobs as a test that contains a MiniClusterResource.
>>
>> On 05.09.2018 20:59, Miguel Coimbra wrote:
>>
>> Thanks for the reply.
>>
>> However, I think my case differs because I am running a sequence of
>> independent Flink jobs on the same environment instance.
>> I only create the LocalExecutionEnvironment once.
>>
>> The web manager shows the job ID changing correctly every time a new job
>> is executed.
>>
>> Since it is the same execution environment (and therefore the same
>> cluster instance, I imagine), those completed jobs should show up as
>> well, no?
>>
>> On Wed, 5 Sep 2018 at 18:40, Chesnay Schepler <ches...@apache.org> wrote:
>>
>>> When you create an environment that way, the cluster is shut down
>>> once the job completes.
>>> The WebUI can _appear_ to still be working since all the files, and
>>> data about the job, are cached in the browser.
>>>
>>> On 05.09.2018 17:39, Miguel Coimbra wrote:
>>>
>>> Hello,
>>>
>>> I'm having difficulty reading the status (such as the time taken by
>>> each dataflow operator in a job) of jobs that have completed.
>>>
>>> First, when I click on "Completed Jobs" in the web interface (by
>>> default at 8081), no job shows up.
>>> I see jobs listed as "Running", but as soon as they finish, I would
>>> expect them to appear in the "Completed Jobs" section, but no luck.
>>>
>>> Consider that I am running locally (the web UI is running; I checked
>>> and it is available via browser) on 8081.
>>> None of these links worked for checking jobs that have already
>>> finished, such as the job with ID 618fac9da6ea458f5091a9c40e54cbcc that
>>> had been running:
>>>
>>> http://127.0.0.1:8081/jobs/618fac9da6ea458f5091a9c40e54cbcc
>>> http://127.0.0.1:8081/completed-jobs/618fac9da6ea458f5091a9c40e54cbcc
>>>
>>> I'm running with a LocalExecutionEnvironment created with the method:
>>>
>>> ExecutionEnvironment.createLocalEnvironmentWithWebUI(conf)
>>>
>>> I hope anyone may be able to help.
>>>
>>> Best,
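P.S. For anyone finding this thread in the archives later: a minimal sketch of
the MiniClusterResource-style test Chesnay suggests above. Note that the class
names below (MiniClusterWithClientResource, MiniClusterResourceConfiguration)
are the flink-test-utils spelling of more recent releases; in 1.6 the rule is
called MiniClusterResource and its constructor differs slightly, and the job
inside the test is only a placeholder:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.test.util.MiniClusterWithClientResource;
    import org.junit.Assert;
    import org.junit.ClassRule;
    import org.junit.Test;

    public class MyJobITCase {

        // One mini cluster shared by every test in this class: started
        // before the first test, shut down after the last one, so all
        // jobs submitted via execute()/count() land on the same cluster.
        @ClassRule
        public static final MiniClusterWithClientResource FLINK =
                new MiniClusterWithClientResource(
                        new MiniClusterResourceConfiguration.Builder()
                                .setNumberTaskManagers(1)
                                .setNumberSlotsPerTaskManager(2)
                                .build());

        @Test
        public void countsThreeElements() throws Exception {
            // getExecutionEnvironment() picks up the mini cluster context.
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            long count = env.fromElements(1, 2, 3).count(); // placeholder job
            Assert.assertEquals(3L, count);
        }
    }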