Kafka Offset commit failed on partition

2019-11-26 Thread PedroMrChaves
Hello, 

Since the last update to the universal Kafka connector, I'm getting the
following error fairly often.

/2019-11-18 15:42:52,689 ERROR
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator - [Consumer
clientId=consumer-10, groupId=srx-consumer-group] Offset commit failed on
partition events-4 at offset 38173628004: The request timed out.
 2019-11-18 15:42:52,707 WARN
org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher -
Committing offsets to Kafka failed. This does not compromise Flink's
checkpoints.
 2019-11-18 15:42:52,707 WARN
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase - Async
Kafka commit failed./

---


I've been investigating and I can't find what is causing this. We've been
monitoring several metrics like the recordsConsumed, fetch-size-avg and
fetch-rate but the values are the same when the error happens and when it
doesn't. So we know there isn't a peak of events or a larger fetched size
when the problem occurs. We also monitor other metrics like CPU, Memory,
GCs, Network IO, Network connections and Disk IO but we haven't found
anything out of the ordinary.  

Our job has two source nodes reading from two distinct Kafka topics; the
problem happens on both source nodes.
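
For reference, a minimal sketch of how each of those sources is wired up. The
topic name, deserialization schema and property handling below are illustrative
placeholders (the usual flink-connector-kafka imports are assumed), not the
actual job code:

Properties props = new Properties();
props.setProperty("bootstrap.servers", brokers);   // placeholder variable
props.setProperty("group.id", "srx-consumer-group");

FlinkKafkaConsumer<byte[]> eventsSource = new FlinkKafkaConsumer<>(
        "events",                                   // one of the two topics
        new AbstractDeserializationSchema<byte[]>() {
            @Override
            public byte[] deserialize(byte[] message) {
                return message;
            }
        },
        props);

// With checkpointing enabled, the connector commits offsets back to Kafka when a
// checkpoint completes; those commits are only used for external monitoring and
// do not affect Flink's own fault tolerance (hence the warning in the log above).
eventsSource.setCommitOffsetsOnCheckpoints(true);
env.addSource(eventsSource).name("events-source");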

*Flink Version: *1.8.2
*Kafka Version:* 2.3.0

*My kafka consumer Properties:*
/2019-11-14 16:51:20,142 INFO 
org.apache.kafka.clients.consumer.ConsumerConfig  -
ConsumerConfig values:
auto.commit.interval.ms = 5000
auto.offset.reset = latest
bootstrap.servers = [hidden]
check.crcs = true
client.id =
connections.max.idle.ms = 54
default.api.timeout.ms = 6
enable.auto.commit = false
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = srx-consumer-group
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 30
max.poll.records = 500
metadata.max.age.ms = 30
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 3
partition.assignment.strategy = [class
org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 3
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 6
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = SSL
send.buffer.bytes = 131072
session.timeout.ms = 1
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = /etc/pki/java/ca.ks
ssl.keystore.password = [hidden]
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = /etc/pki/java/ca.ks
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class
org.apache.kafka.common.serialization.ByteArrayDeserializer
/

*Commit latency vs commits failed:* [chart attachment not preserved in the archive]

We have checkpointing enabled (synchronous checkpoints every 10 secs). We
also have the job configured to 

Re: Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-22 Thread PedroMrChaves
Unfortunately the audit logs for HDFS were not enabled. We will enable them
and post the results when the problem happens again. Nonetheless, we don't
have any other process using Hadoop besides Flink.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-21 Thread PedroMrChaves
The issue happened again.

/AsynchronousException{java.lang.Exception: Could not materialize checkpoint
47400 for operator ENRICH (1/4).}
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 47400 for
operator ENRICH (1/4).
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
... 6 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
Could not flush and close the file system output stream to
hdfs:/flink/data/checkpoints/da596dd6909fe250ccf553f9c31e000c/chk-47400/72df57cb-2286-4c1a-ade6-79891f39b150
in order to obtain the stream state handle
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
at
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
... 5 more
Caused by: java.io.IOException: Could not flush and close the file system
output stream to
hdfs:/flink/data/checkpoints/da596dd6909fe250ccf553f9c31e000c/chk-47400/72df57cb-2286-4c1a-ade6-79891f39b150
in order to obtain the stream state handle
at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:326)
at
org.apache.flink.runtime.state.DefaultOperatorStateBackend$DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackend.java:767)
at
org.apache.flink.runtime.state.DefaultOperatorStateBackend$DefaultOperatorStateBackendSnapshotStrategy$1.callInternal(DefaultOperatorStateBackend.java:696)
at
org.apache.flink.runtime.state.AsyncSnapshotCallable.call(AsyncSnapshotCallable.java:76)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
... 7 more
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
does not exist:
/flink/data/checkpoints/da596dd6909fe250ccf553f9c31e000c/chk-47400/72df57cb-2286-4c1a-ade6-79891f39b150
(inode 188591918) Holder DFSClient_NONMAPREDUCE_-2031154839_1 does not have
any open files.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2668)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:621)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:601)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2713)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:882)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:555)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at

Re: Received fatal alert: certificate_unknown

2019-05-17 Thread PedroMrChaves
We found the issue.

Certificate validation was using the DNSName from the certificate, and we were
accessing the endpoint via localhost.
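
(For anyone hitting the same problem: one way to check which DNS names a
certificate actually carries is to inspect its SubjectAlternativeName extension;
the keystore path and alias below are illustrative, not our exact setup.)

keytool -list -v -keystore /etc/pki/java/flink.keystore -alias flink | grep -A2 SubjectAlternativeName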



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-16 Thread PedroMrChaves
Hello Andrey,

The audit log doesn't have anything that would point to the checkpoint file
being deleted. The only thing worth mentioning is the following line.

/2019-05-15 10:01:39,082 INFO
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: BLOCK*
blk_1248714854_174974084 is COMMITTED but not COMPLETE(numNodes= 0 < 
minimum = 1) in file
/flink/data/checkpoints/76f7b4f5c679e8f2d822c9c3c73faf5d/chk-65912/68776faf-b687-403b-ba0c-17419f8684dc/

Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-16 Thread PedroMrChaves
Hello, 

Thanks for the help.
I've attached the logs. Our cluster has 2 job managers (HA) and 4 task
managers. 

logs.tgz (attachment)

Regards,
Pedro



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Checkpoints periodically fail with hdfs as the state backend - Could not flush and close the file system output stream

2019-05-15 Thread PedroMrChaves
Hello,

Every once in a while our checkpoints fail with the following exception:

/AsynchronousException{java.lang.Exception: Could not materialize checkpoint
65912 for operator AGGREGATION-FILTER (2/2).}
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointExceptionHandler.tryHandleCheckpointException(StreamTask.java:1153)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:947)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:884)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.Exception: Could not materialize checkpoint 65912 for
operator AGGREGATION-FILTER (2/2).
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.handleExecutionException(StreamTask.java:942)
... 6 more
Caused by: java.util.concurrent.ExecutionException: java.io.IOException:
Could not flush and close the file system output stream to
hdfs:/flink/data/checkpoints/76f7b4f5c679e8f2d822c9c3c73faf5d/chk-65912/68776faf-b687-403b-ba0c-17419f8684dc
in order to obtain the stream state handle
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:53)
at
org.apache.flink.streaming.api.operators.OperatorSnapshotFinalizer.<init>(OperatorSnapshotFinalizer.java:53)
at
org.apache.flink.streaming.runtime.tasks.StreamTask$AsyncCheckpointRunnable.run(StreamTask.java:853)
... 5 more
Caused by: java.io.IOException: Could not flush and close the file system
output stream to
hdfs:/flink/data/checkpoints/76f7b4f5c679e8f2d822c9c3c73faf5d/chk-65912/68776faf-b687-403b-ba0c-17419f8684dc
in order to obtain the stream state handle
at
org.apache.flink.runtime.state.filesystem.FsCheckpointStreamFactory$FsCheckpointStateOutputStream.closeAndGetHandle(FsCheckpointStreamFactory.java:328)
at
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:454)
at
org.apache.flink.runtime.state.DefaultOperatorStateBackend$1.performOperation(DefaultOperatorStateBackend.java:359)
at
org.apache.flink.runtime.io.async.AbstractAsyncCallableWithResources.call(AbstractAsyncCallableWithResources.java:75)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.flink.util.FutureUtil.runIfNotDoneAndGet(FutureUtil.java:50)
... 7 more
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
does not exist:
/flink/data/checkpoints/76f7b4f5c679e8f2d822c9c3c73faf5d/chk-65912/68776faf-b687-403b-ba0c-17419f8684dc
(inode 181723246) Holder DFSClient_NONMAPREDUCE_-10072319_1 does not have
any open files.
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2668)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFileInternal(FSDirWriteFileOp.java:621)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.completeFile(FSDirWriteFileOp.java:601)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.completeFile(FSNamesystem.java:2713)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.complete(NameNodeRpcServer.java:882)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.complete(ClientNamenodeProtocolServerSideTranslatorPB.java:555)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:447)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:989)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:847)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:790)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2486)

at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1489)
at org.apache.hadoop.ipc.Client.call(Client.java:1435)
at org.apache.hadoop.ipc.Client.call(Client.java:1345)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
at

Received fatal alert: certificate_unknown

2019-05-14 Thread PedroMrChaves
Every time that I access Flink's WEB UI I get the following exception:

/2019-05-14 12:31:47,837 WARN 
org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint- Unhandled
exception
org.apache.flink.shaded.netty4.io.netty.handler.codec.DecoderException:
javax.net.ssl.SSLException: Received fatal alert: certificate_unknown
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:459)
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:392)
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInputClosed(ByteToMessageDecoder.java:359)
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.channelInactive(ByteToMessageDecoder.java:342)
at
org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.channelInactive(SslHandler.java:1028)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.fireChannelInactive(AbstractChannelHandlerContext.java:224)
at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline$HeadContext.channelInactive(DefaultChannelPipeline.java:1429)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:245)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannelHandlerContext.invokeChannelInactive(AbstractChannelHandlerContext.java:231)
at
org.apache.flink.shaded.netty4.io.netty.channel.DefaultChannelPipeline.fireChannelInactive(DefaultChannelPipeline.java:947)
at
org.apache.flink.shaded.netty4.io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:822)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
at
org.apache.flink.shaded.netty4.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
at
org.apache.flink.shaded.netty4.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
at java.lang.Thread.run(Thread.java:748)
Caused by: javax.net.ssl.SSLException: Received fatal alert:
certificate_unknown
at sun.security.ssl.Alerts.getSSLException(Alerts.java:208)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1647)
at sun.security.ssl.SSLEngineImpl.fatal(SSLEngineImpl.java:1615)
at sun.security.ssl.SSLEngineImpl.recvAlert(SSLEngineImpl.java:1781)
at
sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1070)
at
sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896)
at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766)
at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
at
org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler$SslEngineType$3.unwrap(SslHandler.java:294)
at
org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1275)
at
org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1177)
at
org.apache.flink.shaded.netty4.io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1221)
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489)
at
org.apache.flink.shaded.netty4.io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428)
... 17 more/


Our SSL setup:

security.ssl.internal.enabled: true
security.ssl.internal.keystore:  /etc/pki/java/flink.keystore
security.ssl.internal.keystore-password: 
security.ssl.internal.key-password: 
security.ssl.internal.truststore:  /etc/pki/java/flink.truststore
security.ssl.internal.truststore-password: 

security.ssl.rest.enabled: true
security.ssl.rest.keystore:  /etc/pki/java/flink.keystore
security.ssl.rest.keystore-password: 
security.ssl.rest.key-password: 
security.ssl.rest.truststore:  /etc/pki/java/flink.truststore
security.ssl.rest.truststore-password: 
security.ssl.rest.authentication-enabled: false

security.ssl.verify-hostname: false

Our truststore contains the CA certificate and the keystore contains the
issued certificate and the private key entry, as recommended. 

Flink version: 1.7.2

--
Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Re: CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello Stefan,

Thank you for the reply.

I've taken a heap dump from a development cluster using jmap and analysed
it. For the tests we restarted the cluster and then left a job running for
a few minutes. After that, we restarted the job a couple of times and
stopped it. After leaving the cluster with no running jobs for 20 minutes we
took a heap dump.

We found that a thread which consumes data from Kafka was still running,
with a lot of finalizer calls, as shown in the attached screenshot.


I will deploy a job without a Kafka consumer to see if the code cache still
increases (all of our clusters have problems with the code cache;
coincidentally, all of the deployed jobs read from Kafka).
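
As a side note on monitoring this: the code cache is exposed as a memory pool
MXBean, so something along the lines of the sketch below (not our actual
tooling) can be used to log its usage from inside the JVM:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheUsage {
    public static void main(String[] args) {
        // On JDK 8 the pool is named "Code Cache"; on JDK 9+ it is split into
        // several "CodeHeap ..." pools, hence the contains() check.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getName().contains("Code")) {
                System.out.printf("%s: used=%d bytes, max=%d bytes%n",
                        pool.getName(), pool.getUsage().getUsed(), pool.getUsage().getMax());
            }
        }
    }
}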


Best Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


CodeCache is full - Issues with job deployments

2018-12-11 Thread PedroMrChaves
Hello,

Every time I deploy a Flink job the code cache increases, which is expected.
However, when I stop and start the job, or when it restarts, the code cache
continues to increase.

Screenshot_2018-12-11_at_11.png (attachment)


I've added the flags "-XX:+PrintCompilation -XX:ReservedCodeCacheSize=350m
-XX:-UseCodeCacheFlushing" to the Flink task managers and job managers, but the
cache doesn't decrease very much, as shown in the screenshot above.
Even if I stop all the jobs, the cache doesn't decrease.
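
For reference, one way to pass such options to both job managers and task
managers is via env.java.opts in flink-conf.yaml; the line below simply mirrors
the flags quoted above (illustration only, not a recommendation):

env.java.opts: "-XX:+PrintCompilation -XX:ReservedCodeCacheSize=350m -XX:-UseCodeCacheFlushing"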

This gets to a point where I get the error "CodeCache is full. Compiler has
been disabled".

I've attached the task managers' output with the "-XX:+PrintCompilation" flag
activated.

flink-flink-taskexecutor.out (attachment)

Flink: 1.6.2
Java:  openjdk version "1.8.0_191"

Best Regards,
Pedro Chaves.




-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Get savepoint status fails - Flink 1.6.2

2018-11-15 Thread PedroMrChaves
Hello,

I've tried with different (jobId, triggerId) pairs but it doesn't work.


Regards,
Pedro Chaves.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Get savepoint status fails - Flink 1.6.2

2018-11-13 Thread PedroMrChaves
Hello,

I am trying to get the status of a savepoint using the REST API, but the GET
request fails with the error shown below.

/curl -k
https://localhost:8081/jobs/c78511cf0dc10c1e9f7db17566522d5b/savepoints/51c174eab1efd2c1354282f52f37fadb
{"errors":["Operation not found under key:
org.apache.flink.runtime.rest.handler.job.AsynchronousJobOperationKey@848b6c1c"]}
/

Additionally, the commands:
/GET request to /jobs/:jobid/cancel-with-savepoint/ 
GET request to /jobs/:jobid/cancel-with-savepoint/in-progress/:requestId/

no longer work, but they are mentioned in the docs
(https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#job-cancellation)

How can I monitor savepoints in version 1.6.2?
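
For context, this is the flow I am attempting, based on my reading of the 1.6
REST docs (the target directory is a placeholder and the job/trigger ids are
the ones from the example above), so treat it as a sketch:

# trigger a savepoint; the response should contain a request/trigger id
curl -k -X POST https://localhost:8081/jobs/c78511cf0dc10c1e9f7db17566522d5b/savepoints \
     -H "Content-Type: application/json" \
     -d '{"target-directory": "hdfs:///flink/savepoints", "cancel-job": false}'

# poll the savepoint status with the returned trigger id
curl -k https://localhost:8081/jobs/c78511cf0dc10c1e9f7db17566522d5b/savepoints/51c174eab1efd2c1354282f52f37fadb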


Best Regards,
Pedro.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Could not retrieve the redirect address - No REST endpoint has been started

2018-09-25 Thread PedroMrChaves
Hello,

Thank you for the reply.

The problem sometimes happens when there is a jobmanager failover. I've
attached the jobmanager logs for further debugging. 

flink-flink-jobmanager-1-demchcep00-01.log (attachment)

Thank you and Regards,
Pedro Chaves. 



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: ***UNCHECKED*** Error while confirming Checkpoint

2018-09-24 Thread PedroMrChaves
Hello Stefan, 

Thank you for the help.

I've actually lost those logs due to several cluster restarts that we did,
which caused log rotation (limit = 5 versions).
The log lines that I posted were the only ones that showed signs of a
problem.

*The configuration of the job is as follows:*

/private static final int DEFAULT_MAX_PARALLELISM = 16;
private static final int CHECKPOINTING_INTERVAL = 1000;
private static final int MIN_PAUSE_BETWEEN_CHECKPOINTS = 1000;
private static final int CHECKPOINT_TIMEOUT = 6;
private static final int INTERVAL_BETWEEN_RESTARTS = 120;
(...)

environment.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
environment.setMaxParallelism(DEFAULT_MAX_PARALLELISM);
environment.enableCheckpointing(CHECKPOINTING_INTERVAL, CheckpointingMode.EXACTLY_ONCE);
environment.getCheckpointConfig().setMinPauseBetweenCheckpoints(MIN_PAUSE_BETWEEN_CHECKPOINTS);
environment.getCheckpointConfig().setCheckpointTimeout(CHECKPOINT_TIMEOUT);
environment.setRestartStrategy(RestartStrategies.noRestart());
environment.setParallelism(parameters.getInt(JOB_PARALLELISM));/

*The Kafka consumer/producer configuration is:*

/properties.put("value.deserializer",
    "org.apache.kafka.common.serialization.StringDeserializer");
properties.put("max.request.size", "1579193");
properties.put("processing.guarantee", "exactly_once");
properties.put("isolation.level", "read_committed");/



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


***UNCHECKED*** Error while confirming Checkpoint

2018-09-19 Thread PedroMrChaves
Hello,

I have a running Flink job that reads data from one Kafka topic, applies
some transformations and writes data back into another Kafka topic. The job
sometimes restarts due to the following error:

/java.lang.RuntimeException: Error while confirming checkpoint
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1260)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: checkpoint completed, but no
transaction pending
at
org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
at
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:258)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:130)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:650)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1255)
... 5 more
2018-09-18 22:00:10,716 INFO 
org.apache.flink.runtime.executiongraph.ExecutionGraph- Could not
restart the job Alert_Correlation (3c60b8670c81a629716bb2e42334edea) because
the restart strategy prevented it.
java.lang.RuntimeException: Error while confirming checkpoint
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1260)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: checkpoint completed, but no
transaction pending
at
org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
at
org.apache.flink.streaming.api.functions.sink.TwoPhaseCommitSinkFunction.notifyCheckpointComplete(TwoPhaseCommitSinkFunction.java:258)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.notifyOfCompletedCheckpoint(AbstractUdfStreamOperator.java:130)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:650)
at org.apache.flink.runtime.taskmanager.Task$3.run(Task.java:1255)
... 5 more/

My state is very small for this particular job, just a few KBs.


 


Flink Version: 1.4.2
State Backend: hadoop 2.8

Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Flink JMX Metrics

2018-09-10 Thread PedroMrChaves
Hello,

I've upgraded my cluster from version 1.4.2 to 1.5.3. After the upgrade I
noticed that some of the metrics reported via JMX, like the number of running
jobs, are missing.


I've listed all of the domains and this is what I have:
/$>domains
#following domains are available
JMImplementation
com.sun.management
java.lang
java.nio
java.util.logging
org.apache.flink.jobmanager.Status.JVM.CPU.Load
org.apache.flink.jobmanager.Status.JVM.CPU.Time
org.apache.flink.jobmanager.Status.JVM.ClassLoader.ClassesLoaded
org.apache.flink.jobmanager.Status.JVM.ClassLoader.ClassesUnloaded
org.apache.flink.jobmanager.Status.JVM.GarbageCollector.PS_MarkSweep.Count
org.apache.flink.jobmanager.Status.JVM.GarbageCollector.PS_MarkSweep.Time
org.apache.flink.jobmanager.Status.JVM.GarbageCollector.PS_Scavenge.Count
org.apache.flink.jobmanager.Status.JVM.GarbageCollector.PS_Scavenge.Time
org.apache.flink.jobmanager.Status.JVM.Memory.Direct.Count
org.apache.flink.jobmanager.Status.JVM.Memory.Direct.MemoryUsed
org.apache.flink.jobmanager.Status.JVM.Memory.Direct.TotalCapacity
org.apache.flink.jobmanager.Status.JVM.Memory.Heap.Committed
org.apache.flink.jobmanager.Status.JVM.Memory.Heap.Max
org.apache.flink.jobmanager.Status.JVM.Memory.Heap.Used
org.apache.flink.jobmanager.Status.JVM.Memory.Mapped.Count
org.apache.flink.jobmanager.Status.JVM.Memory.Mapped.MemoryUsed
org.apache.flink.jobmanager.Status.JVM.Memory.Mapped.TotalCapacity
org.apache.flink.jobmanager.Status.JVM.Memory.NonHeap.Committed
org.apache.flink.jobmanager.Status.JVM.Memory.NonHeap.Max
org.apache.flink.jobmanager.Status.JVM.Memory.NonHeap.Used
org.apache.flink.jobmanager.Status.JVM.Threads.Count
org.apache.flink.jobmanager.job.downtime
org.apache.flink.jobmanager.job.fullRestarts
org.apache.flink.jobmanager.job.lastCheckpointAlignmentBuffered
org.apache.flink.jobmanager.job.lastCheckpointDuration
org.apache.flink.jobmanager.job.lastCheckpointExternalPath
org.apache.flink.jobmanager.job.lastCheckpointRestoreTimestamp
org.apache.flink.jobmanager.job.lastCheckpointSize
org.apache.flink.jobmanager.job.numberOfCompletedCheckpoints
org.apache.flink.jobmanager.job.numberOfFailedCheckpoints
org.apache.flink.jobmanager.job.numberOfInProgressCheckpoints
org.apache.flink.jobmanager.job.restartingTime
org.apache.flink.jobmanager.job.totalNumberOfCheckpoints
org.apache.flink.jobmanager.job.uptime/

But the numRunningJobs is not there.
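
(For reference, the domain listing above came from a JMX console; it can be
reproduced programmatically along these lines. The service URL below is just an
example; the actual host/port depend on how the JMX reporter is configured.)

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class ListFlinkDomains {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi"); // illustrative port
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Print only the Flink metric domains, mirroring the listing above.
            for (String domain : connection.getDomains()) {
                if (domain.startsWith("org.apache.flink")) {
                    System.out.println(domain);
                }
            }
        } finally {
            connector.close();
        }
    }
}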

Was there any change on the metrics that are reported? 

Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Could not retrieve the redirect address - No REST endpoint has been started

2018-08-02 Thread PedroMrChaves
Hello,

It happens whether the web UI is open or not, and once it happens the UI no
longer works. When this happens I have to restart the job managers.

regards,
Pedro.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Could not retrieve the redirect address - No REST endpoint has been started

2018-08-01 Thread PedroMrChaves
Hello,

I have a running standalone Flink cluster with 2 task managers and 2 job
managers (one task manager and one job manager per machine).
Sometimes, when I restart the cluster, I get the following error message:
/
java.util.concurrent.CompletionException:
org.apache.flink.util.FlinkException: No REST endpoint has been started for
the JobManager.
at
java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at
java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at
java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:593)
at
java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577)
at
java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474)
at
java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1977)
at
org.apache.flink.runtime.concurrent.FutureUtils$1.onComplete(FutureUtils.java:442)
at akka.dispatch.OnComplete.internal(Future.scala:258)
at akka.dispatch.OnComplete.internal(Future.scala:256)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:186)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:183)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at
org.apache.flink.runtime.concurrent.Executors$DirectExecutionContext.execute(Executors.java:83)
at
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at scala.concurrent.Promise$class.complete(Promise.scala:55)
at
scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:157)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:237)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:36)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:63)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:78)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at
scala.concurrent.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:55)
at
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at
scala.concurrent.BatchingExecutor$Batch.run(BatchingExecutor.scala:54)
at
scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601)
at
scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:106)
at
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599)
at
scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:44)
at
scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:252)
at akka.pattern.PromiseActorRef.$bang(AskSupport.scala:534)
at akka.remote.DefaultMessageDispatcher.dispatch(Endpoint.scala:97)
at
akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:982)
at akka.actor.Actor$class.aroundReceive(Actor.scala:502)
at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:446)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:526)
at akka.actor.ActorCell.invoke(ActorCell.scala:495)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:257)
at akka.dispatch.Mailbox.run(Mailbox.scala:224)
at akka.dispatch.Mailbox.exec(Mailbox.scala:234)
at
scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.util.FlinkException: No REST endpoint has been
started for the JobManager./

which prevents access to the web interface.

I am using version 1.4.2.

Any idea on what might be causing this?

Regards,
Pedro.





-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Subtasks - Uneven allocation

2018-04-20 Thread PedroMrChaves
That is only used to split the load across all of the subtasks, which I am
already doing.
It is not related to the allocation.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Subtasks - Uneven allocation

2018-04-18 Thread PedroMrChaves
Hello,

I have a job with one asynchronous operator (i.e. it implements
AsyncFunction). This operator spawns multiple threads that perform heavy,
CPU-bound tasks.

I have a Flink standalone cluster deployed on two machines with 32 cores and
128 GB of RAM each; each machine runs one task manager and one job manager.
When I deploy the job, all of the subtasks from the async operator end up on
the same machine, which causes it to have a much higher CPU load than the
other.

I've researched ways to overcome this issue, but I haven't found a solution
to my problem. 
Ideally, the subtasks would be evenly split across both machines. 

Can this problem be solved somehow? 

Regards,
Pedro Chaves. 



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Metric Registry Warnings

2018-03-20 Thread PedroMrChaves
I have multiple sources but with distinct names and UIDs.

More information about my execution environment:

Flink Version: 1.4.2 bundled with hadoop 2.8
State backend: Hadoop 2.8
Job compiled for version 1.4.2 using the Scala version libs from Scala
version 2.11. 

I am using the com.github.davidb metrics-influxdb reporter to export the
metrics to InfluxDB, as exemplified here:
https://github.com/jgrier/flink-stuff/tree/master/flink-influx-reporter, but
recompiled for version 1.4.2.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Error while reporting metrics - ConcurrentModificationException

2018-03-20 Thread PedroMrChaves
Hello,

I get the following error while trying to report metrics to InfluxDB using
the Dropwizard reporter.

2018-03-20 13:51:00,288 WARN 
org.apache.flink.runtime.metrics.MetricRegistryImpl   - Error while
reporting metrics
java.util.ConcurrentModificationException
at
java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:719)
at 
java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:742)
at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
at java.util.HashSet.<init>(HashSet.java:120)
at
org.apache.kafka.common.internals.PartitionStates.partitionSet(PartitionStates.java:65)
at
org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedPartitions(SubscriptionState.java:298)
at
org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$ConsumerCoordinatorMetrics$1.measure(ConsumerCoordinator.java:906)
at 
org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:61)
at 
org.apache.kafka.common.metrics.KafkaMetric.value(KafkaMetric.java:52)
at
org.apache.flink.streaming.connectors.kafka.internals.metrics.KafkaMetricWrapper.getValue(KafkaMetricWrapper.java:35)
at
org.apache.flink.streaming.connectors.kafka.internals.metrics.KafkaMetricWrapper.getValue(KafkaMetricWrapper.java:26)
at
org.apache.flink.dropwizard.metrics.FlinkGaugeWrapper.getValue(FlinkGaugeWrapper.java:36)
at
metrics_influxdb.measurements.MeasurementReporter.fromGauge(MeasurementReporter.java:163)
at
metrics_influxdb.measurements.MeasurementReporter.report(MeasurementReporter.java:55)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.report(ScheduledDropwizardReporter.java:231)
at
org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:417)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

Any ideas on what might be the problem?




-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Metric Registry Warnings

2018-03-20 Thread PedroMrChaves
Hello,

I still have the same issue with Flink Version 1.4.2.

java.lang.IllegalArgumentException: A metric named
.taskmanager.6aa8d13575228d38ae4abdfb37fa229e.CDC.Source:
EVENTS.1.numRecordsIn already exists
at
com.codahale.metrics.MetricRegistry.register(MetricRegistry.java:91)
at
org.apache.flink.dropwizard.ScheduledDropwizardReporter.notifyOfAddedMetric(ScheduledDropwizardReporter.java:131)
at
org.apache.flink.runtime.metrics.MetricRegistryImpl.register(MetricRegistryImpl.java:319)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:370)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:303)
at
org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:293)
at
org.apache.flink.runtime.metrics.groups.OperatorIOMetricGroup.<init>(OperatorIOMetricGroup.java:40)
at
org.apache.flink.runtime.metrics.groups.OperatorMetricGroup.<init>(OperatorMetricGroup.java:48)
at
org.apache.flink.runtime.metrics.groups.TaskMetricGroup.addOperator(TaskMetricGroup.java:146)
at
org.apache.flink.streaming.api.operators.AbstractStreamOperator.setup(AbstractStreamOperator.java:182)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.setup(AbstractUdfStreamOperator.java:82)
at
org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:136)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:231)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
at java.lang.Thread.run(Thread.java:748)



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Getting warning messages (...hdfs.DataStreamer - caught exception) while running Flink with Hadoop as the state backend

2018-03-01 Thread PedroMrChaves
No. It is just a log message with no apparent side effects. 



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Getting warning messages (...hdfs.DataStreamer - caught exception) while running Flink with Hadoop as the state backend

2018-03-01 Thread PedroMrChaves
While my Flink job is running I keep getting the following warning message in
the log:

/2018-02-23 09:08:11,681 WARN  org.apache.hadoop.hdfs.DataStreamer - Caught exception
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at
org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:927)
at
org.apache.hadoop.hdfs.DataStreamer.endBlock(DataStreamer.java:578)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:755)/

But the job keeps running, apparently without issues.

Flink Version: 1.4.0 bundled with Hadoop 2.8

Hadoop version: 2.8.3

Any ideas on what might be the problem?



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Best way to setup different log files for distinct jobs

2017-10-10 Thread PedroMrChaves
Hello,

I'm using logback as my logging framework. I would like to set up Flink so
that each job writes to a different log file. Any ideas on how I could do that?

I am running flink in a standalone cluster with version 1.3.2.

Regards,
Pedro Chaves.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: Could not initialize keyed state backend.

2017-09-18 Thread PedroMrChaves
Hello,

I thought that the checkpoints would be propagated to all the machines in
the cluster when using a local filesystem. 

Thank you,
Regards.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: FlinkKafkaConsumer010 - Memory Issue

2017-09-18 Thread PedroMrChaves
Hello,

Sorry for the delay.

The buffer memory of the Kafka consumer was piling up. Once I updated to the
1.3.2 version the problem no longer occurred. 

Pedro.



-
Best Regards,
Pedro Chaves
--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/


Re: FlinkKafkaConsumer010 - Memory Issue

2017-07-25 Thread PedroMrChaves
Hello,

Thank you for the reply. 

The problem is not that the task manager uses a lot of memory; the problem
is that every time I cancel and re-submit the job, the task manager does not
release the previously allocated memory.

Regards,
Pedro Chaves.



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaConsumer010-Memory-Issue-tp14342p14445.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


FlinkKafkaConsumer010 - Memory Issue

2017-07-19 Thread PedroMrChaves
Hello,

Whenever I submit a job to Flink that retrieves data from Kafka, the memory
consumption continuously increases. I've changed the max heap memory from
2 GB to 4 GB and even to 6 GB, but the memory consumption keeps reaching the
limit.

A simple job that shows this behaviour is shown below.

/
/*
 * Execution Environment Setup
 */
final StreamExecutionEnvironment environment =
        getGlobalJobConfiguration(configDir, configurations);

/**
 * Collect event data from Kafka
 */
DataStreamSource<String> s = environment.addSource(new FlinkKafkaConsumer010<>(
        configurations.get(ConfigKeys.KAFKA_INPUT_TOPIC),
        new SimpleStringSchema(),
        getKafkaConfiguration(configurations)));

s.filter(new FilterFunction<String>() {
    @Override
    public boolean filter(String value) throws Exception {
        return false;
    }
}).print();

private static Properties getKafkaConfiguration(ParameterTool configurations) {
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers",
            configurations.get(ConfigKeys.KAFKA_HOSTS));
    properties.put("group.id",
            "flink-consumer-" + UUID.randomUUID().toString());
    properties.put("value.deserializer",
            "org.apache.kafka.common.serialization.StringDeserializer");
    properties.put("security.protocol",
            configurations.get(ConfigKeys.KAFKA_SECURITY_PROTOCOL));
    properties.put("ssl.truststore.location",
            configurations.get(ConfigKeys.KAFKA_SSL_TRUSTSTORE_LOCATION));
    properties.put("ssl.truststore.password",
            configurations.get(ConfigKeys.KAFKA_SSL_TRUSTSTORE_PASSWORD));
    properties.put("ssl.keystore.location",
            configurations.get(ConfigKeys.KAFKA_SSL_KEYSTORE_LOCATION));
    properties.put("ssl.keystore.password",
            configurations.get(ConfigKeys.KAFKA_SSL_KEYSTORE_PASSWORD));
    return properties;
}
/


Moreover, when I stop the job, the task manager does not terminate the Kafka
connection and the memory is kept allocated. To stop this, I have to kill
the task manager process.

*My Flink version: 1.2.1
Kafka consumer: 010
Kafka version: 2_11_0.10.1.0-2*

I've activated the /taskmanager.debug.memory.startLogThread/ property to log
memory usage every 5 seconds and attached the log with the results.

The output of free -m before submitting the job:

/              total        used        free      shared  buff/cache   available
Mem:          15817         245       14755          24         816       15121
Swap:             0           0           0/

The output of free -m after the job has been running for about 5 minutes:

/              total        used        free      shared  buff/cache   available
Mem:          15817        9819        5150          24         847        5547
Swap:             0           0           0/

taskmanager.log (attachment)





-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/FlinkKafkaConsumer010-Memory-Issue-tp14342.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Can AsyncFunction be applied to connected streams

2017-07-07 Thread PedroMrChaves
Hello,

I wanted to keep the data locally. If I associated the fetched metadata with
each event (in an enrichment phase) it would considerably increase their size,
since the metadata that I need to process the event in the async I/O stage is
too large.

Regards,
Pedro. 



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-AsyncFunction-be-applied-to-connected-streams-tp14137p14148.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Can AsyncFunction be applied to connected streams

2017-07-06 Thread PedroMrChaves
Hello,

Is there a way to apply the AsyncFunction to connected streams, like in a
CoFlatMap?
I would like to connect streams of different types and process one of them
based on the state created by the other, in an asynchronous fashion.
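
For context, this is roughly how the async operator is attached to a single
stream today (EnrichFunction, the timeout and the capacity are placeholders,
and the exact AsyncFunction callback type differs between Flink versions):

DataStream<Event> enriched = AsyncDataStream.unorderedWait(
        events,                       // a single input stream
        new EnrichFunction(),         // implements AsyncFunction<Event, Event>
        5000, TimeUnit.MILLISECONDS,  // timeout per async request
        100);                         // max in-flight requests

What I am missing is an equivalent that takes two connected inputs.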

Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Can-AsyncFunction-be-applied-to-connected-streams-tp14137.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Isolate Tasks - Run Distinct Tasks in Different Task Managers

2017-03-14 Thread PedroMrChaves
Can YARN provide task isolation?




-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Isolate-Tasks-Run-Distinct-Tasks-in-Different-Task-Managers-tp12104p12201.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Remove Accumulators at runtime

2017-03-08 Thread PedroMrChaves
Hi,

I'm building a system that maintains a set of rules that can be dynamically
added/removed. I wanted to count every element that matches each rule in an
accumulator (I have several parallel instances). If a rule is removed, its
accumulator should be removed as well.





-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Remove-Accumulators-at-runtime-tp12106p12119.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Isolate Tasks - Run Distinct Tasks in Different Task Managers

2017-03-08 Thread PedroMrChaves
Thanks for the response.

I would like to ensure that the map operator is not in the same task manager
as the window/apply operator, regardless of the number of slots of each task
manager.



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Isolate-Tasks-Run-Distinct-Tasks-in-Different-Task-Managers-tp12104p12118.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Remove Accumulators at runtime

2017-03-08 Thread PedroMrChaves
Hello,

We can add an accumulator using the following call:
getRuntimeContext().addAccumulator(NAME, ACCUMULATOR);
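
For example, in my case the registration happens in open() of a rich function,
roughly like this (the counter name and the function type are illustrative):

public class RuleMatchCounter extends RichMapFunction<String, String> {
    private final IntCounter matches = new IntCounter();

    @Override
    public void open(Configuration parameters) {
        // one accumulator per rule; "rule-42-matches" is a placeholder name
        getRuntimeContext().addAccumulator("rule-42-matches", matches);
    }

    @Override
    public String map(String value) {
        matches.add(1);
        return value;
    }
}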

Is there a way to remove the added accumulators at runtime? 

Regards,
Pedro Chaves



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Remove-Accumulators-at-runtime-tp12106.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Isolate Tasks - Run Distinct Tasks in Different Task Managers

2017-03-08 Thread PedroMrChaves
Hello,

Assuming that I have the following Job Graph,
   
(Source) -> (map) -> (KeyBy | Window | apply) -> (Sink)

Is there a way to ensure that the map operator (and all its subtasks) runs on
a different task manager than the (KeyBy | Window | apply) operator?

This would allow JVM memory isolation without using YARN.
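
For illustration, the graph above expressed in code with explicit slot sharing
groups (operator implementations are placeholders). As far as I understand,
separate slot sharing groups only force the operators into different slots, not
necessarily onto different task managers, which is exactly the guarantee I am
missing:

stream
    .map(new HeavyMapFunction()).slotSharingGroup("map-group")
    .keyBy(event -> event.getKey())
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .apply(new MyWindowFunction()).slotSharingGroup("window-group")
    .addSink(new MySink());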

Regards,
Pedro





-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Isolate-Tasks-Run-Distinct-Tasks-in-Different-Task-Managers-tp12104.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Problem - Negative currentWatermark if the watermark assignment is made before connecting the streams

2016-11-29 Thread PedroMrChaves
Hi Vinay ,

I'm simply using Netbeans Debugger.

Regards,
Pedro



-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Problem-Negative-currentWatermark-if-the-watermark-assignment-is-made-before-connecting-the-streams-tp10315p10381.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-24 Thread PedroMrChaves
The best answer I can give you is the one given in the post. Currently,
there is no way of dynamically changing the patterns.
The only way would be to dive into Flink's core code and change the way
operators are shipped to the cluster.

On Thu, Nov 24, 2016 at 3:34 PM, kaelumania [via Apache Flink User Mailing
List archive.]  wrote:

> I also found this really interesting post:
> http://stackoverflow.com/questions/40018199/flink-and-dynamic-templates-recognition




-
Best Regards,
Pedro Chaves
--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/What-is-the-best-way-to-load-add-patterns-dynamically-at-runtime-with-Flink-tp9461p10328.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.

Problem - Negative currentWatermark if the watermark assignment is made before connecting the streams

2016-11-23 Thread PedroMrChaves
Hello,

I have an application which has two different streams of data: one
represents a set of events and the other a set of rules that need to be
matched against the events. In order to do this I use a CoFlatMap operator.
The problem is that if I assign the timestamps and watermarks after the
streams have been connected, everything works fine, but if I do it before, I
get a negative *currentWatermark* at the window and the operations on
windows have no effect. What could be the problem?

If I assign *before* the connect: (screenshot not preserved in the archive)

If I assign *after* the connect: (screenshot not preserved in the archive)

*Main *Code:

/DataStream<CSVEvent> sourceStream = environment
        .addSource(new SampleDataGenerator(sourceData, true)).name("Source").setParallelism(1)
        .assignTimestampsAndWatermarks(new TimestampAssigner());
*// if I assign the timestamps here the watermark seen at the window is
negative and the operations are not applied*

DataStream<String> rulesStream = environment
        .socketTextStream(monitorAddress, monitorPort, DELIMITER)
        .name("Rules Stream")
        .setParallelism(1);

SplitStream processedStream = sourceStream.connect(rulesStream)
        .flatMap(new RProcessor(rulesPath)).name("RBProcessor").setParallelism(1)
        //.assignTimestampsAndWatermarks(new DynamicTimestampAssigner()).name("Assign Timestamps").setParallelism(1)
        *// If I assign the watermarks here everything works fine*
        .split(new Spliter());

processedStream
        .select(RuleOperations.WINDOW_AGGRATION)
        .keyBy(new DynamicKeySelector())
        .window(new DynamicSlidingWindowAssigner())
        .apply(new AggregationOperation()).name("Aggregation Operation").setParallelism(1)
        .print().name("Windowed Rule Output").setParallelism(1);

(..omitted details..)/

*Timestamps and watermarks assigner:*

/
public class TimestampAssigner implements
AssignerWithPeriodicWatermarks<CSVEvent> {

private final long MAX_DELAY = 2000; // 2 seconds
private long currentMaxTimestamp;
private long lastEmittedWatermark = Long.MIN_VALUE;

@Override
public long extractTimestamp(CSVEvent element, long
previousElementTimestamp) {
long timestamp =
Long.parseLong(element.event.get(element.getTimeField()));
currentMaxTimestamp = Math.max(timestamp, currentMaxTimestamp);
return timestamp;
}

@Override
public Watermark getCurrentWatermark() {
// return the watermark as current highest timestamp minus the
out-of-orderness bound
long potentialWM = currentMaxTimestamp - MAX_DELAY;
if (potentialWM >= lastEmittedWatermark) {
lastEmittedWatermark = potentialWM;
}
return new Watermark(lastEmittedWatermark);
}

}/

Regards,
Pedro Chaves





-
Best Regards,
Pedro Chaves


Processing streams of events with unpredictable delays

2016-11-09 Thread PedroMrChaves
Hello,

I have a stream source of events. Each event is assigned a timestamp by the
machine that generated it, and those events are then retrieved by other
machines (collectors). Finally, the collectors send the events to Flink. In
Flink, when I receive those events I extract their timestamps and process them
in a windowed fashion.

The problem is that the event timestamps are unpredictable because the
collectors can fail. When a collector fails and restarts, it will keep sending
the events that it didn't send before, so those events can have a delay of many
hours or days (depending on how long the collector was down).

I am trying to think of a way to process those delayed events. As a first
approach I could allow an arbitrary lateness (when assigning watermarks), and
when an event arrives late I can still process it if it is within the maximum
lateness (a sketch of that approach follows below). The problem is that the
collectors are very unpredictable and I can't set an arbitrary lateness of
several days because the memory consumption would keep growing.
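
A minimal, self-contained sketch of that first approach, written against the
Flink 1.8-era DataStream API. The Event POJO, keys, window sizes and lateness
values below are made up for illustration: a bounded allowed lateness keeps the
window state bounded, and a side output catches whatever arrives even later so
it can be handled separately.

import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.timestamps.BoundedOutOfOrdernessTimestampExtractor;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.OutputTag;

public class BoundedLatenessJob {

    // Minimal stand-in for the real event type of this thread.
    public static class Event {
        public String user;
        public long timestamp;
        public Event() {}
        public Event(String user, long timestamp) { this.user = user; this.timestamp = timestamp; }
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);

        DataStream<Event> events = env.fromElements(
                new Event("alice", 1_000L), new Event("alice", 2_000L), new Event("bob", 500L));

        OutputTag<Event> tooLateTag = new OutputTag<Event>("too-late") {};

        SingleOutputStreamOperator<Event> windowed = events
                .assignTimestampsAndWatermarks(
                        new BoundedOutOfOrdernessTimestampExtractor<Event>(Time.minutes(10)) {
                            @Override
                            public long extractTimestamp(Event e) {
                                return e.timestamp;
                            }
                        })
                .keyBy("user")
                .timeWindow(Time.minutes(5))
                .allowedLateness(Time.hours(1))     // bounded, so window state stays bounded
                .sideOutputLateData(tooLateTag)     // anything later than watermark + 1h lands here
                .reduce((a, b) -> a);               // placeholder aggregation

        windowed.print("on-time-or-within-lateness");
        // Events that missed even the allowed lateness; these could be written back
        // to the persistent store mentioned above and reprocessed separately.
        windowed.getSideOutput(tooLateTag).print("too-late");

        env.execute("bounded-lateness-sketch");
    }
}

The allowed lateness is what bounds the state, and anything later than that (for
example a collector that was down for days) shows up only in the side output,
where it can be redirected instead of being silently dropped.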

So I'm trying to figure out a way to recover the events when a collector stops
and restarts. All the events that arrive at my Flink job are stored in a
persistent storage, so if a collector restarts I can retrieve the events that
belong to the same time window as the late events. The problem is that I need
to keep processing those late events in the same way I would if they were
arriving on time, but I don't know how I can do that with Flink or if it's
even possible.

Depicted in the figure below is an example of my use case (the screenshot is
not preserved in the archive).

Events A,B,C,D,E,F,G arrive on time. Then the collector fails, and when it
restarts it sends the events H,I,J,K,L,M that were generated much earlier than
the current time.

Regards,
Pedro Chaves





Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-03 Thread PedroMrChaves
Hi,

Thank you for the response.

Can you give me an example?
I'm new to Flink and I still don't understand all the constructs.

I also read this article:
https://techblog.king.com/rbea-scalable-real-time-analytics-king/. They use
a similar approach, but I am still not understanding how to assign windows.
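
A rough sketch of the suggestion in the quoted reply below, a WindowAssigner
that reads the window back out of the element, assuming the element emitted by
the CoFlatMap carries its own window bounds as a Tuple3<event, windowStart,
windowEnd> (that tuple layout is an assumption of mine, not something from the
thread).

import java.util.Collection;
import java.util.Collections;

import org.apache.flink.api.common.ExecutionConfig;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.WindowAssigner;
import org.apache.flink.streaming.api.windowing.triggers.EventTimeTrigger;
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

// A WindowAssigner that simply reads the window out of the element itself.
public class ElementCarriedWindows extends WindowAssigner<Object, TimeWindow> {

    @Override
    @SuppressWarnings("unchecked")
    public Collection<TimeWindow> assignWindows(Object element, long timestamp,
                                                WindowAssignerContext context) {
        // Assumption: the processor emits Tuple3<event, windowStart, windowEnd>.
        Tuple3<Object, Long, Long> carried = (Tuple3<Object, Long, Long>) element;
        return Collections.singletonList(new TimeWindow(carried.f1, carried.f2));
    }

    @Override
    public Trigger<Object, TimeWindow> getDefaultTrigger(StreamExecutionEnvironment env) {
        return EventTimeTrigger.create();
    }

    @Override
    public TypeSerializer<TimeWindow> getWindowSerializer(ExecutionConfig executionConfig) {
        return new TimeWindow.Serializer();
    }

    @Override
    public boolean isEventTime() {
        return true;
    }
}

It would then be plugged in with something like
stream.keyBy(...).window(new ElementCarriedWindows()).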

Regards,
Pedro Chaves



On Thu, Nov 3, 2016 at 6:02 PM, Aljoscha Krettek [via Apache Flink User
Mailing List archive.] <ml-node+s2336050n9882...@n4.nabble.com> wrote:

> Hi Pedro,
> you can have dynamic windows by assigning the windows to elements in your
> Processor (so you would need to extend that type to have a field for the
> window). Then, you can write a custom WindowAssigner that will simply get
> the window from an event and assign that as the internal window.
>
> Please let me know if you need more details.
>
> Cheers,
> Aljoscha
>
> On Thu, 3 Nov 2016 at 18:40 PedroMrChaves <[hidden email]> wrote:
>
>> Hello,
>>
>> Your tip was very helpful and I took a similar approach.
>>
>> I have something like this:
>>
>> class Processor extends RichCoFlatMapFunction<Event, Rule, String> {
>>     public void flatMap1(Event event, Collector<String> out) {
>>         process(event, out); // run the JavaScript rules against the incoming events
>>     }
>>
>>     public void flatMap2(Rule rule, Collector<String> out) {
>>         addNewRule(rule); // add the rule to the list of existing rules
>>     }
>> }
>>
>> But now I face a new challenge: I don't have access to the windowed
>> constructs of Flink, and I can't dynamically create new window aggregations
>> inside the flatMap. At least not that I know of.
>>
>> Did you face a similar problem? Any Ideas?
>>
>> Thank you and regards,
>> Pedro Chaves
>>
>>
>>





Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-11-03 Thread PedroMrChaves
Hello,

Your tip was very helpful and I took a similar approach.

I have something like this:

class Processor extends RichCoFlatMapFunction<Event, Rule, String> {
    public void flatMap1(Event event, Collector<String> out) {
        process(event, out); // run the JavaScript rules against the incoming events
    }

    public void flatMap2(Rule rule, Collector<String> out) {
        addNewRule(rule); // add the rule to the list of existing rules
    }
}

But now I face a new challenge: I don't have access to the windowed
constructs of Flink, and I can't dynamically create new window aggregations
inside the flatMap. At least not that I know of.

Did you face a similar problem? Any Ideas?

Thank you and regards,
Pedro Chaves





Re: Best Practices/Advice - Execution of jobs

2016-11-03 Thread PedroMrChaves
Thank you.





Re: NotSerializableException: jdk.nashorn.api.scripting.NashornScriptEngine

2016-11-02 Thread PedroMrChaves
Hello,

I'm having the exact same problem.
I'm using a filter function on a DataStream.
My Flink version is 1.1.3.

What could be the problem?
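
A minimal sketch of the usual workaround, on the assumption that the exception
comes from the ScriptEngine being captured inside the function object when the
job graph is serialized: keep the engine in a transient field and create it in
open() of a rich function, which runs on the task manager after
deserialization. The rule below is a made-up placeholder.

import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

public class ScriptedFilter extends RichFilterFunction<String> {

    private transient ScriptEngine engine; // not serialized with the function

    @Override
    public void open(Configuration parameters) throws Exception {
        // Created on the task manager, after the function has been deserialized.
        engine = new ScriptEngineManager().getEngineByName("nashorn");
    }

    @Override
    public boolean filter(String event) throws Exception {
        engine.put("event", event);
        return (Boolean) engine.eval("event.indexOf('denied') >= 0");
    }
}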


Regards,
Pedro Chaves.





Best Practices/Advice - Execution of jobs

2016-11-02 Thread PedroMrChaves
Hello,

I'm trying to build a stream event correlation engine with Flink and I have
some questions regarding the execution of jobs.

In my architecture I need to have different sources of data, let's say for
instance:

/firewallStream = environment.addSource([FirewalLogsSource]);
proxyStream = environment.addSource([ProxyLogsSource]);/
and for each of these sources I need to apply a set of rules.
So let's say I have a job that has the proxy stream data as a source, with the
following rules:

///Abnormal Request Method
stream.[RuleLogic].addSink([output])
//Web Service on Non-Typical Port
stream.[RuleLogic].addSink([output])
//Possible Brute Force 
stream.[RuleLogic].addSink([output])/

These rules will probably scale to be in the order of 15 to 20 rules.
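
For reference, a self-contained sketch of several rules fanning out from one
shared source inside a single job; the socket source, host/port and rule
conditions below are placeholders, not anything from this thread.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FanOutRulesJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // One shared source; every rule below reuses the same stream, so the
        // source is only read once per job.
        DataStream<String> proxyStream = env.socketTextStream("localhost", 9999);

        // Rule 1: abnormal request method.
        proxyStream.filter(line -> line.contains("method=TRACE"))
                   .print("abnormal-request-method");

        // Rule 2: web service on a non-typical port.
        proxyStream.filter(line -> line.contains(":8443"))
                   .print("non-typical-port");

        // All of the rules end up in a single job graph, submitted in one go.
        env.execute("proxy-rules");
    }
}

Splitting the rules across several jobs instead would mainly buy operational
isolation (one rule can be redeployed without restarting the rest) at the cost
of each job reading the source independently.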

What is the best approach in this case:
1. Should I create 2 jobs, one for each source, where each job has the 15-20
rules?
2. Should I split the rules into several jobs?
3. Other options?


Thank you and Regards,
Pedro Chaves.







Elasticsearch sink: Java.lang.NoSuchMethodError: org.elasticsearch.common.settings.Settings.settingsBuilder

2016-10-28 Thread PedroMrChaves
Hello,

I am using Flink to write data to Elasticsearch.

Flink version: 1.1.3
Elasticsearch version: 2.4.1

But I am getting the following error:

/10/28/2016 18:58:56 Job execution switched to status FAILING.
java.lang.NoSuchMethodError:
org.elasticsearch.common.settings.Settings.settingsBuilder()Lorg/elasticsearch/common/settings/Settings$Builder;
at
org.apache.flink.streaming.connectors.elasticsearch2.ElasticsearchSink.open(ElasticsearchSink.java:162)
at
org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:38)
at
org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:91)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:376)
at
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:256)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
at java.lang.Thread.run(Thread.java:745)/

This is the code I use to configure the sink (similar to the taxi ride
example in
https://www.elastic.co/blog/building-real-time-dashboard-applications-with-apache-flink-elasticsearch-and-kibana)
 

/private void elasticSink() {
    Map<String, String> config = new HashMap<>();
    // This instructs the sink to emit after every element, otherwise they would be buffered
    config.put("bulk.flush.max.actions", "10");
    config.put("cluster.name", "elasticdemo");

    List<InetSocketAddress> transports = new ArrayList<>();
    try {
        transports.add(new InetSocketAddress(InetAddress.getByName("localhost"), 9200));
    } catch (UnknownHostException ex) {
        Logger.getLogger(CEPEngine.class.getName()).log(Level.SEVERE, null, ex);
    }

    stream.addSink(new ElasticsearchSink<>(
            config,
            transports,
            new AccessDataInsert()));
}/

What could be the problem?

Regards,
Pedro Chaves






Re: FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-27 Thread PedroMrChaves
Hello,

I am using version 1.2-SNAPSHOT.
I will try with a stable version to see if the problem persists.

Regards,
Pedro Chaves. 





FlinkKafkaConsumerBase - Received confirmation for unknown checkpoint

2016-10-21 Thread PedroMrChaves
Hello,

I am getting the following warning upon executing a checkpoint:

/2016-10-21 16:31:54,229 INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering
checkpoint 5 @ 1477063914229
2016-10-21 16:31:54,233 INFO 
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed
checkpoint 5 (in 3 ms)
2016-10-21 16:31:54,234 WARN 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  -
Received confirmation for unknown checkpoint id 5/

This is the code I have to set up the environment and the Kafka consumer:

/**
 * Flink execution environment configuration
 */
private void setupEnvironmnet() {
    environment = StreamExecutionEnvironment.getExecutionEnvironment();
    environment.enableCheckpointing(CHECKPOINTING_INTERVAL);
    tableEnvironment = TableEnvironment.getTableEnvironment(environment);
}

/**
 * Kafka Consumer configuration
 */
private void kafkaConsumer(String server, String topic) {
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", server);
    properties.setProperty("group.id", "Demo");
    stream = environment.addSource(new FlinkKafkaConsumer09<>(topic, new SimpleStringSchema(), properties))
            .map(new Parser());
}


Any idea what the problem might be?

Thank you and regards,
Pedro Chaves





Re: Flink SQL Stream Parser based on calcite

2016-10-18 Thread PedroMrChaves
Thank you.
Great presentation about the high-level translation process.

Regards,
Pedro





Re: Flink SQL Stream Parser based on calcite

2016-10-17 Thread PedroMrChaves
Thank you for the response. 

I'm not understanding where something like this,

/SELECT * WHERE action='denied'/

gets translated into something similar in the Flink Stream API,

/.filter(new FilterFunction<Event>() {
    public boolean filter(Event event) {
        return event.action.equals("denied");
    }
});/

or if that happens at all. My idea was to extend the library to support other
unsupported calls, like TUMBLE -> timeWindow, but it's probably more complex
than I'm thinking.

Regards.






Flink SQL Stream Parser based on calcite

2016-10-17 Thread PedroMrChaves
Hello,

I am pretty new to Apache Flink.

I am trying to figure out how Flink parses an Apache Calcite SQL query into its
own streaming API, in order to maybe extend it, because, as far as I know, many
operations are still being developed and not currently supported (like TUMBLE
windows). I need to be able to load rules from a file, like so:

/tableEnv.sql([File])../

In order to do that I need a fully functional streaming SQL parser.
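
A minimal sketch of the "load rules from a file" part, assuming a table named
eventData has already been registered on tableEnv; the file path is made up,
and sql(String) is the method name in the releases referenced in [1] (later
versions renamed it sqlQuery).

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.flink.api.table.Table;

// Read the query text from a file and hand it to the Table API.
String query = new String(
        Files.readAllBytes(Paths.get("/etc/flink-rules/denied-actions.sql")),
        StandardCharsets.UTF_8);

Table result = tableEnv.sql(query);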

I am currently analyzing the StreamTableEnvironment class from GitHub [1] in
order to understand the sql method, but I can't figure out where the parsing
happens.

Can someone point me in the right direction? 


[1] 
https://github.com/apache/flink/blob/d7b59d761601baba6765bb4fc407bcd9fd6a9387/flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/StreamTableEnvironment.scala

  

Best Regards,
Pedro Chaves







JsonMappingException: No content to map due to end-of-input

2016-10-13 Thread PedroMrChaves
Hello,

I recently started programming with the Apache Flink API. I am trying to get
input directly from Kafka in a JSON format with the following code:

/private void kafkaConsumer(String server, String topic) {
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", server);
    properties.setProperty("group.id", "Demo");
    stream = environment.addSource(new FlinkKafkaConsumer09<>(topic, new JSONDeserializationSchema(), properties))
            .map(new MapFunction<ObjectNode, Event>() {
                @Override
                public Event map(ObjectNode value) throws Exception {
                    return new Event(Integer.parseInt(value.get("id").asText()),
                            value.get("user").asText(),
                            value.get("action").asText(),
                            value.get("ip").asText());
                }
            });
}/

But I always get the following error:


/17:56:46,335 ERROR org.apache.flink.runtime.taskmanager.Task - Task execution failed.
com.fasterxml.jackson.databind.JsonMappingException: No content to map due
to end-of-input
 at [Source: [B@69a90966; line: 1, column: 1]
at
com.fasterxml.jackson.databind.JsonMappingException.from(JsonMappingException.java:148)
at
com.fasterxml.jackson.databind.ObjectMapper._initForReading(ObjectMapper.java:3095)
at
com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:3036)
at
com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:2215)
at
org.apache.flink.streaming.util.serialization.JSONDeserializationSchema.deserialize(JSONDeserializationSchema.java:38)
at
org.apache.flink.streaming.util.serialization.JSONDeserializationSchema.deserialize(JSONDeserializationSchema.java:30)
at
org.apache.flink.streaming.util.serialization.KeyedDeserializationSchemaWrapper.deserialize(KeyedDeserializationSchemaWrapper.java:39)
at
org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher.run(Kafka09Fetcher.java:227)
at java.lang.Thread.run(Thread.java:745)/

What am I doing wrong?

Attached is the JSON sample that I am using.

Thank you and Regards.

log.json
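
A defensive sketch, on the assumption that the failures come from empty Kafka
records (Jackson throws exactly this "end-of-input at line 1, column 1" error
when it is asked to parse zero bytes): a custom deserialization schema that
turns empty payloads into an empty ObjectNode, which can then be filtered out
before the map function. The class and package names follow the Flink version
used in this thread.

import java.io.IOException;

import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.typeutils.TypeExtractor;
import org.apache.flink.streaming.util.serialization.DeserializationSchema;

public class LenientJsonDeserializationSchema implements DeserializationSchema<ObjectNode> {

    private transient ObjectMapper mapper;

    @Override
    public ObjectNode deserialize(byte[] message) throws IOException {
        if (mapper == null) {
            mapper = new ObjectMapper();
        }
        if (message == null || message.length == 0) {
            // Empty record: return an empty node instead of letting Jackson fail.
            return mapper.createObjectNode();
        }
        return (ObjectNode) mapper.readTree(message);
    }

    @Override
    public boolean isEndOfStream(ObjectNode nextElement) {
        return false;
    }

    @Override
    public TypeInformation<ObjectNode> getProducedType() {
        return TypeExtractor.getForClass(ObjectNode.class);
    }
}

The map function could then skip nodes that have no "id" field (value.has("id"))
instead of hitting the exception.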

  





Re: Error with table sql query - No match found for function signature TUMBLE(<TIME>, <INTERVAL_DAY_TIME>)

2016-10-13 Thread PedroMrChaves
I have this so far:

result = eventData
        .filter(new FilterFunction<Event>() {
            public boolean filter(Event event) {
                return event.action.equals("denied");
            }
        })
        .keyBy(0)
        .timeWindow(Time.seconds(10))
        .apply("???")
        .filter(new FilterFunction<Integer>() {
            public boolean filter(Integer count) {
                return count.intValue() > 5;
            }
        });

I don't know what I can put in the apply function in order to get the count of
(user, ip) pairs whose action is "denied".
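
One possible way to fill in the "???" above (a sketch only): a WindowFunction
that counts the elements per key in each window, so the trailing filter can
keep the (user, ip) pairs with more than 5 denied actions. It would be used as
.apply(new CountInWindow<Event>()), with Event being the type from this thread.

import org.apache.flink.api.java.tuple.Tuple;
import org.apache.flink.streaming.api.functions.windowing.WindowFunction;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

// Counts the elements of one key and one window and emits the count.
public class CountInWindow<T> implements WindowFunction<T, Integer, Tuple, TimeWindow> {

    @Override
    public void apply(Tuple key, TimeWindow window, Iterable<T> elements, Collector<Integer> out) {
        int count = 0;
        for (T ignored : elements) {
            count++;
        }
        out.collect(count);
    }
}

If eventData is a stream of a POJO, the keying would be keyBy("user", "ip")
rather than keyBy(0); either way the key arrives in the WindowFunction as a
Tuple.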

Regards.





Re: Error with table sql query - No match found for function signature TUMBLE(<TIME>, <INTERVAL_DAY_TIME>)

2016-10-13 Thread PedroMrChaves
Hi, 

Thanks for the response.
What would be the easiest way to do this query using the DataStream API?

Thank you.







Error with table sql query - No match found for function signature TUMBLE(<TIME>, <INTERVAL_DAY_TIME>)

2016-10-12 Thread PedroMrChaves
Hello,

I am trying to build a query using the StreamTableEnvironment API. I am
building these queries with tableEnvironment.sql("QUERY") so that in the future
I can load them from a file.

Code source:

Table accesses = tableEnvironment.sql(
        "SELECT STREAM TUMBLE_END(rowtime, INTERVAL '1' HOUR) AS rowtime, user,ip "
        + "FROM eventData "
        + "WHERE action='denied' "
        + "GROUP BY TUMBLE(rowtime, INTERVAL '1' HOUR) user,ip"
        + " HAVING COUNT(user,ip) > 5");

But I always get the following error:

/Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1,
column 120 to line 1, column 153: No match found for function signature
TUMBLE(<TIME>, <INTERVAL_DAY_TIME>)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405)
at
org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765)
at
org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3929)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.handleUnresolvedFunction(SqlValidatorImpl.java:1544)
at
org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:278)
at
org.apache.calcite.sql.SqlFunction.deriveType(SqlFunction.java:222)
at
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4266)
at
org.apache.calcite.sql.validate.SqlValidatorImpl$DeriveTypeVisitor.visit(SqlValidatorImpl.java:4253)
at org.apache.calcite.sql.SqlCall.accept(SqlCall.java:135)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveTypeImpl(SqlValidatorImpl.java:1462)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.deriveType(SqlValidatorImpl.java:1445)
at org.apache.calcite.sql.SqlNode.validateExpr(SqlNode.java:233)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateGroupClause(SqlValidatorImpl.java:3305)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateSelect(SqlValidatorImpl.java:2959)
at
org.apache.calcite.sql.validate.SelectNamespace.validateImpl(SelectNamespace.java:60)
at
org.apache.calcite.sql.validate.AbstractNamespace.validate(AbstractNamespace.java:86)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateNamespace(SqlValidatorImpl.java:845)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateQuery(SqlValidatorImpl.java:831)
at org.apache.calcite.sql.SqlSelect.validate(SqlSelect.java:208)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validateScopedExpression(SqlValidatorImpl.java:807)
at
org.apache.calcite.sql.validate.SqlValidatorImpl.validate(SqlValidatorImpl.java:523)
at
org.apache.flink.api.table.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:84)
... 10 more
Caused by: org.apache.calcite.sql.validate.SqlValidatorException: No match
found for function signature TUMBLE(<TIME>, <INTERVAL_DAY_TIME>)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native
Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405)
at
org.apache.calcite.runtime.Resources$ExInst.ex(Resources.java:514)
/

What am I doing wrong?

Regards.





Re: What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-12 Thread PedroMrChaves
I've been thinking of several options to solve this problem:

1. I can use Flink savepoints to save the application state, change the jar
file (with the patterns added/changed) and submit a new job. The problems in
this case are handling the savepoints correctly and the fact that, because I
must stop and restart the job, the events will be delayed.

2. I can compile Java code at runtime using the Java compiler API, although I
don't know if this would be a viable solution (a rough sketch of this follows
below the list).

3. I can use a scripting language like you did, but I would lose the ability to
use the native Flink library, which is available in Scala and Java.
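
The sketch referred to in option 2, purely as an illustration that the JDK can
compile and load a rule class at runtime. The paths, class name and rule method
are made up, and none of this deals with shipping the compiled class to the
task managers.

import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import javax.tools.ToolProvider;

public class RuntimeCompiledRuleDemo {

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get("/tmp/flink-rules");
        Files.createDirectories(dir);

        // A rule received at runtime, written out as a plain Java source file.
        Path source = dir.resolve("DeniedActionRule.java");
        Files.write(source, (
                "public class DeniedActionRule {\n"
              + "    public boolean matches(String event) {\n"
              + "        return event.contains(\"denied\");\n"
              + "    }\n"
              + "}\n").getBytes("UTF-8"));

        // Compile next to the source file; a non-zero return code means compilation errors.
        int result = ToolProvider.getSystemJavaCompiler().run(null, null, null, source.toString());
        if (result != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Load the freshly compiled class and invoke it reflectively.
        try (URLClassLoader loader = new URLClassLoader(new URL[]{dir.toUri().toURL()})) {
            Object rule = loader.loadClass("DeniedActionRule").newInstance();
            Object hit = rule.getClass().getMethod("matches", String.class)
                    .invoke(rule, "action=denied user=bob");
            System.out.println(hit); // prints: true
        }
    }
}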







What is the best way to load/add patterns dynamically (at runtime) with Flink?

2016-10-11 Thread PedroMrChaves
Hello,

I am new to Apache Flink and am trying to build a CEP engine using Flink's API.
One of the requirements is the ability to add/change patterns at runtime for
anomaly detection (maintaining the system's availability). Any ideas of how I
could do that?

For instance, if I have a stream of security events (accesses,
authentications, etc.) and a pattern for detecting anomalies, I would like to
be able to change that pattern's parameters: instead of detecting the
occurrence of events A->B->C, I would like to change the condition on B to B'
in order to have a new rule. Moreover, I would like to be able to create new
patterns dynamically as new use cases arise.
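
For illustration, the kind of A->B->C pattern referred to above, written with
Flink CEP. The SimpleCondition class shown here belongs to more recent Flink
CEP releases, and the Event type and conditions are made up; the condition on
"B" is exactly the part that would need to be swappable at runtime.

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;

Pattern<Event, ?> abc = Pattern.<Event>begin("A")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return e.getType().equals("access");
            }
        })
        .next("B")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                // This is the condition that would be changed to B' at runtime.
                return e.getType().equals("authentication");
            }
        })
        .next("C")
        .where(new SimpleCondition<Event>() {
            @Override
            public boolean filter(Event e) {
                return e.getType().equals("denied");
            }
        });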

Best Regards,
Pedro Chaves
 


