Hello,
this is the task manager log, but it does not change after I run the
program. I think the Flink planner has a problem with my program. It
cannot even start the job.
Best,
Alieh
2018-12-10 12:20:20,386 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
--------------------------------------------------------------------------------
2018-12-10 12:20:20,387 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Starting
TaskManager (Version: 1.6.0, Rev:ff472b4, Date:07.08.2018 @ 13:31:13 UTC)
2018-12-10 12:20:20,387 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - OS current
user: alieh
2018-12-10 12:20:20,609 WARN org.apache.hadoop.util.NativeCodeLoader
- Unable to load native-hadoop library for your platform... using
builtin-java classes where applicable
2018-12-10 12:20:20,768 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Current
Hadoop/Kerberos user: alieh
2018-12-10 12:20:20,769 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM: Java
HotSpot(TM) 64-Bit Server VM - Oracle Corporation - 1.8/25.161-b12
2018-12-10 12:20:20,769 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum heap
size: 922 MiBytes
2018-12-10 12:20:20,769 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JAVA_HOME:
/usr/lib/jvm/java-8-oracle
2018-12-10 12:20:20,774 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Hadoop
version: 2.4.1
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - JVM Options:
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -XX:+UseG1GC
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xms922M
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - -Xmx922M
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
-XX:MaxDirectMemorySize=8388607T
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
-Dlog.file=/home/alieh/flink-1.6.0/log/flink-alieh-taskexecutor-0-alieh-P67A-D3-B3.log
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
-Dlog4j.configuration=file:/home/alieh/flink-1.6.0/conf/log4j.properties
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
-Dlogback.configurationFile=file:/home/alieh/flink-1.6.0/conf/logback.xml
2018-12-10 12:20:20,775 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Program
Arguments:
2018-12-10 12:20:20,776 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - --configDir
2018-12-10 12:20:20,776 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
/home/alieh/flink-1.6.0/conf
2018-12-10 12:20:20,776 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Classpath:
/home/alieh/flink-1.6.0/lib/flink-python_2.11-1.6.0.jar:/home/alieh/flink-1.6.0/lib/flink-shaded-hadoop2-uber-1.6.0.jar:/home/alieh/flink-1.6.0/lib/log4j-1.2.17.jar:/home/alieh/flink-1.6.0/lib/slf4j-log4j12-1.7.7.jar:/home/alieh/flink-1.6.0/lib/flink-dist_2.11-1.6.0.jar:::
2018-12-10 12:20:20,776 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner -
--------------------------------------------------------------------------------
2018-12-10 12:20:20,777 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Registered UNIX
signal handlers for [TERM, HUP, INT]
2018-12-10 12:20:20,785 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - Maximum number
of open file descriptors is 1048576.
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.rpc.address, localhost
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.rpc.port, 6123
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: jobmanager.heap.size, 1024m
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: taskmanager.heap.size, 1024m
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: taskmanager.numberOfTaskSlots, 1
2018-12-10 12:20:20,803 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: parallelism.default, 1
2018-12-10 12:20:20,804 INFO
org.apache.flink.configuration.GlobalConfiguration - Loading
configuration property: rest.port, 8081
2018-12-10 12:20:20,912 INFO
org.apache.flink.runtime.security.modules.HadoopModule - Hadoop user set
to alieh (auth:SIMPLE)
2018-12-10 12:20:21,131 WARN org.apache.flink.configuration.Configuration
- Config uses deprecated configuration key 'jobmanager.rpc.address'
instead of proper key 'rest.address'
2018-12-10 12:20:21,135 INFO
org.apache.flink.runtime.util.LeaderRetrievalUtils - Trying to
select the network interface and address to use by connecting to the leading
JobManager.
2018-12-10 12:20:21,136 INFO
org.apache.flink.runtime.util.LeaderRetrievalUtils - TaskManager
will try to connect for 10000 milliseconds before falling back to heuristics
2018-12-10 12:20:21,145 INFO org.apache.flink.runtime.net.ConnectionUtils
- Retrieved new target address localhost/127.0.0.1:6123.
2018-12-10 12:20:21,204 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerRunner - TaskManager
will use hostname/address 'alieh-P67A-D3-B3' (127.0.1.1) for communication.
2018-12-10 12:20:21,208 INFO
org.apache.flink.runtime.rpc.akka.AkkaRpcServiceUtils - Starting
AkkaRpcService at alieh-p67a-d3-b3:0.
2018-12-10 12:20:21,805 INFO akka.event.slf4j.Slf4jLogger
- Slf4jLogger started
2018-12-10 12:20:21,898 INFO akka.remote.Remoting
- Starting remoting
2018-12-10 12:20:22,091 INFO akka.remote.Remoting
- Remoting started; listening on addresses
:[akka.tcp://flink@alieh-p67a-d3-b3:44267]
2018-12-10 12:20:22,117 INFO
org.apache.flink.runtime.metrics.MetricRegistryImpl - No metrics
reporter configured, no metrics will be exposed/reported.
2018-12-10 12:20:22,124 INFO org.apache.flink.runtime.blob.PermanentBlobCache
- Created BLOB cache storage directory
/tmp/blobStore-32ec7a05-737e-4b46-b716-3a0831683c47
2018-12-10 12:20:22,127 INFO org.apache.flink.runtime.blob.TransientBlobCache
- Created BLOB cache storage directory
/tmp/blobStore-4b33c843-b7d3-45dc-814f-850e8c6be21a
2018-12-10 12:20:22,136 INFO
org.apache.flink.runtime.io.network.netty.NettyConfig - NettyConfig
[server address: alieh-P67A-D3-B3/127.0.1.1, server port: 0, ssl enabled:
false, memory segment size (bytes): 32768, transport type: NIO, number of
server threads: 1 (manual), number of client threads: 1 (manual), server
connect backlog: 0 (use Netty's default), client connect timeout (sec): 120,
send/receive buffer size (bytes): 0 (use Netty's default)]
2018-12-10 12:20:22,166 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerServices - Temporary file
directory '/tmp': total 450 GB, usable 91 GB (20.22% usable)
2018-12-10 12:20:22,211 INFO
org.apache.flink.runtime.io.network.buffer.NetworkBufferPool - Allocated 102
MB for network buffer pool (number of memory segments: 3278, bytes per segment:
32768).
2018-12-10 12:20:22,256 INFO
org.apache.flink.runtime.query.QueryableStateUtils - Could not load
Queryable State Client Proxy. Probable reason: flink-queryable-state-runtime is
not in the classpath. To enable Queryable State, please move the
flink-queryable-state-runtime jar from the opt to the lib folder.
2018-12-10 12:20:22,256 INFO
org.apache.flink.runtime.query.QueryableStateUtils - Could not load
Queryable State Server. Probable reason: flink-queryable-state-runtime is not
in the classpath. To enable Queryable State, please move the
flink-queryable-state-runtime jar from the opt to the lib folder.
2018-12-10 12:20:22,257 INFO
org.apache.flink.runtime.io.network.NetworkEnvironment - Starting the
network environment and its components.
2018-12-10 12:20:22,289 INFO
org.apache.flink.runtime.io.network.netty.NettyClient - Successful
initialization (took 31 ms).
2018-12-10 12:20:22,325 INFO
org.apache.flink.runtime.io.network.netty.NettyServer - Successful
initialization (took 35 ms). Listening on SocketAddress /127.0.1.1:46127.
2018-12-10 12:20:22,326 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerServices - Limiting
managed memory to 0.7 of the currently free heap space (640 MB), memory will be
allocated lazily.
2018-12-10 12:20:22,329 INFO
org.apache.flink.runtime.io.disk.iomanager.IOManager - I/O manager
uses directory /tmp/flink-io-4f10dc60-3805-4c50-85a1-497c99dfb20c for spill
files.
2018-12-10 12:20:22,387 INFO
org.apache.flink.runtime.taskexecutor.TaskManagerConfiguration - Messages have
a max timeout of 10000 ms
2018-12-10 12:20:22,394 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService
- Starting RPC endpoint for
org.apache.flink.runtime.taskexecutor.TaskExecutor at
akka://flink/user/taskmanager_0 .
2018-12-10 12:20:22,406 INFO
org.apache.flink.runtime.taskexecutor.JobLeaderService - Start job
leader service.
2018-12-10 12:20:22,407 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Connecting to
ResourceManager
akka.tcp://flink@localhost:6123/user/resourcemanager(00000000000000000000000000000000).
2018-12-10 12:20:22,409 INFO org.apache.flink.runtime.filecache.FileCache
- User file cache uses directory
/tmp/flink-dist-cache-058052c5-36cc-432f-88eb-8acf7dc5f1f1
2018-12-10 12:20:22,743 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Resolved
ResourceManager address, beginning registration
2018-12-10 12:20:22,743 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Registration at
ResourceManager attempt 1 (timeout=100ms)
2018-12-10 12:20:22,814 INFO
org.apache.flink.runtime.taskexecutor.TaskExecutor - Successful
registration at resource manager
akka.tcp://flink@localhost:6123/user/resourcemanager under registration id
ba9dd638db7ebccde63a3e0df420a990.
On 12/10/2018 12:14 PM, Piotr Nowojski wrote:
Hi,
Have you checked the task manager logs?
Piotrek
On 8 Dec 2018, at 12:23, Alieh <sae...@informatik.uni-leipzig.de> wrote:
Hello Piotrek,
thank you for your answer. I installed Flink on a local cluster and
used the GUI to monitor the task managers. It seems the program
*does not start at all*. The whole time, only the job manager is
struggling... For very small toy examples, after a long time (during
which I see the job manager logs I mentioned before), the job is
started and executes in 2 seconds.
Best,
Alieh
On 12/07/2018 10:43 AM, Piotr Nowojski wrote:
Hi,
Please investigate the logs and standard output/error of the task manager that
has failed (the logs that you showed are from the job manager). There is
probably some obvious error/exception explaining why it has failed. The most
common reasons are:
- out of memory
- long GC pause
- seg fault or other error from some native library
- task manager killed via for example SIGKILL
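
When going through a task manager log for the reasons listed above, it can help to filter out the routine INFO lines first. The following is only a minimal Python sketch, not part of Flink: it assumes the log uses the "timestamp LEVEL logger - message" layout seen in this thread, and the function names (parse_records, suspicious) and the keyword list are hypothetical choices for illustration. A task manager killed via SIGKILL usually leaves no message at all, so an abruptly ending log is itself a hint.

```python
import re

# Start of a log record in the layout seen in this thread, e.g.
# "2018-12-10 12:20:20,609 WARN org.apache.hadoop.util.NativeCodeLoader ..."
RECORD_START = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+"
    r"(TRACE|DEBUG|INFO|WARN|ERROR)\b\s*(.*)$"
)


def parse_records(lines):
    """Group raw lines into (timestamp, level, message) tuples.

    Lines that do not start with a timestamp (wrapped text, stack traces)
    are appended to the message of the preceding record.
    """
    records = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = RECORD_START.match(line)
        if match:
            timestamp, level, rest = match.groups()
            records.append([timestamp, level, rest.strip()])
        elif records and line.strip():
            records[-1][2] = (records[-1][2] + " " + line.strip()).strip()
    return [tuple(record) for record in records]


# Hypothetical keyword list; extend it to match what you are looking for.
DEFAULT_KEYWORDS = ("OutOfMemoryError", "Exception", "SIGSEGV")


def suspicious(records, keywords=DEFAULT_KEYWORDS):
    """Return records at WARN/ERROR level or mentioning one of the keywords."""
    return [
        record for record in records
        if record[1] in ("WARN", "ERROR")
        or any(keyword in record[2] for keyword in keywords)
    ]


if __name__ == "__main__":
    # Two records copied from the log in this thread, wrapped as in the mail.
    sample = [
        "2018-12-10 12:20:20,387 INFO",
        "org.apache.flink.runtime.taskexecutor.TaskManagerRunner - OS current",
        "user: alieh",
        "2018-12-10 12:20:20,609 WARN org.apache.hadoop.util.NativeCodeLoader",
        "- Unable to load native-hadoop library for your platform... using",
        "builtin-java classes where applicable",
    ]
    for timestamp, level, message in suspicious(parse_records(sample)):
        print(timestamp, level, message)
```

To use it on a real file, pass the open file object to parse_records, since it only needs an iterable of lines. Long GC pauses will generally not show up this way unless GC logging is enabled separately.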
Piotrek
On 6 Dec 2018, at 17:34, Alieh <sae...@informatik.uni-leipzig.de> wrote:
Hello all,
I have an algorithm x() that contains several joins and uses Gelly's
ConnectedComponents three times. The problem is that if I call x() inside a
script more than three times, I receive the messages listed below in the log
and the program somehow stops. It happens even if I run it with a toy example
of a graph with fewer than 10 vertices. Do you have any clue what the problem
is?
Cheers,
Alieh
129149 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger
heartbeat request.
129149 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Trigger
heartbeat request.
129150 [flink-akka.actor.default-dispatcher-20] DEBUG
org.apache.flink.runtime.taskexecutor.TaskExecutor - Received heartbeat
request from e80ec35f3d0a04a68000ecbdc555f98b.
129150 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received
heartbeat from 78cdd7a4-0c00-4912-992f-a2990a5d46db.
129151 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Received
new slot report from TaskManager 78cdd7a4-0c00-4912-992f-a2990a5d46db.
129151 [flink-akka.actor.default-dispatcher-22] DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Received
slot report from instance 4c3e3654c11b09fbbf8e993a08a4c2da.
129200 [flink-akka.actor.default-dispatcher-15] DEBUG
org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Release
TaskExecutor 4c3e3654c11b09fbbf8e993a08a4c2da because it exceeded the idle
timeout.
129200 [flink-akka.actor.default-dispatcher-15] DEBUG
org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - Worker
78cdd7a4-0c00-4912-992f-a2990a5d46db could not be stopped.