[jira] [Commented] (FLINK-14197) Increasing trend for state size of keyed stream using ProcessWindowFunction with ProcessingTimeSessionWindows

2019-09-24 Thread Ken Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16937009#comment-16937009
 ] 

Ken Krugler commented on FLINK-14197:
-

On the Flink user mailing list, [this 
thread|http://mail-archives.apache.org/mod_mbox/flink-user/201702.mbox/%3c58c0a1c0-8715-4614-ae45-d96afbce3...@mediamath.com%3e]
 seems relevant. 

Also, I didn't see this question on the mailing list. That's where you 
typically want to start; only file a bug report in Jira once committers 
have provided some input on whether your issue is in fact a bug and, if so, 
what the root cause is. Thanks.

> Increasing trend for state size of keyed stream using ProcessWindowFunction 
> with ProcessingTimeSessionWindows
> -
>
> Key: FLINK-14197
> URL: https://issues.apache.org/jira/browse/FLINK-14197
> Project: Flink
>  Issue Type: Bug
>  Components: Runtime / Checkpointing, Runtime / State Backends
>Affects Versions: 1.9.0
> Environment: Tested with:
>  * Local Flink Mini Cluster running from IDE
>  * Flink standalone cluster run in docker
>Reporter: Oliver Kostera
>Priority: Major
>
> I'm using *ProcessWindowFunction* in a keyed stream with the following 
> definition:
> {code:java}
> final SingleOutputStreamOperator processWindowFunctionStream =
>     keyedStream.window(ProcessingTimeSessionWindows.withGap(Time.milliseconds(100)))
>         .process(new CustomProcessWindowFunction()).uid(PROCESS_WINDOW_FUNCTION_OPERATOR_ID)
>         .name("Process window function");
> {code}
> My checkpointing configuration uses the RocksDB state backend with 
> incremental checkpointing and EXACTLY_ONCE mode.
> At runtime I noticed that even though data ingestion is static (same keys 
> and message frequency), the state size of the process window operator keeps 
> increasing. I tried to reproduce it with a minimal similar setup here: 
> https://github.com/loliver1234/flink-process-window-function and was 
> able to do so.
> Testing conditions:
> - RabbitMQ source with exactly-once guarantee and 65k prefetch count
> - RabbitMQ sink to collect messages
> - Simple ProcessWindowFunction that only passes messages through
> - Stream time characteristic set to TimeCharacteristic.ProcessingTime
> Testing scenario:
> - Start the Flink job and check the initial state size - State Size: 127 KB
> - Start sending messages, 1,000 unique keys (always the same set) every 1s 
> (they do not fall within the 100ms session gap, so each message should 
> create a new window)
> - The state of the process window operator keeps increasing - after 1 
> million messages the state ended up around 2 MB
> - Stop sending messages and wait until the Rabbit queue is fully consumed 
> and a few checkpoints go by
> - Expected the state size to decrease to the base value, but it stayed at 
> 2 MB
> - Continued sending messages with the same keys, and the state kept its 
> increasing trend.
> What I checked:
> - Registration and deregistration of timestamps set for time windows - each 
> registration matched its deregistration
> - Checked that in fact there are no window merges
> - Tried a custom Trigger disabling window merges and setting the 
> onProcessingTime trigger to TriggerResult.FIRE_AND_PURGE (see the trigger 
> sketch after this description) - same state behavior
> On a staging environment, we noticed that the state for that operator keeps 
> increasing indefinitely, after some months reaching 1.5 GB for 100k 
> unique keys
> Flink commit id: 9c32ed9
>  
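For reference, a minimal sketch of a fire-and-purge processing-time trigger (an 
illustrative reconstruction, not the reporter's actual code: it mirrors Flink's 
built-in {{ProcessingTimeTrigger}} but returns FIRE_AND_PURGE, and it keeps 
merge support, which the merging {{ProcessingTimeSessionWindows}} assigner 
requires):

{code:java}
import org.apache.flink.streaming.api.windowing.triggers.Trigger;
import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

public class PurgingProcessingTimeTrigger extends Trigger<Object, TimeWindow> {

    @Override
    public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
        // Fire once processing time passes the end of the session window.
        ctx.registerProcessingTimeTimer(window.maxTimestamp());
        return TriggerResult.CONTINUE;
    }

    @Override
    public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
        // FIRE_AND_PURGE emits the window contents and then clears them.
        return TriggerResult.FIRE_AND_PURGE;
    }

    @Override
    public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
        return TriggerResult.CONTINUE;
    }

    @Override
    public boolean canMerge() {
        return true;
    }

    @Override
    public void onMerge(TimeWindow window, OnMergeContext ctx) {
        ctx.registerProcessingTimeTimer(window.maxTimestamp());
    }

    @Override
    public void clear(TimeWindow window, TriggerContext ctx) {
        ctx.deleteProcessingTimeTimer(window.maxTimestamp());
    }
}
{code}

It would be attached with {{.trigger(new PurgingProcessingTimeTrigger())}} 
right after the {{.window(...)}} call.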



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (FLINK-14056) AtOm+Electron+TypedE-Mail

2019-09-11 Thread Ken Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-14056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16928011#comment-16928011
 ] 

Ken Krugler commented on FLINK-14056:
-

Seems like this user should get blocked...

> AtOm+Electron+TypedE-Mail
> -
>
> Key: FLINK-14056
> URL: https://issues.apache.org/jira/browse/FLINK-14056
> Project: Flink
>  Issue Type: Bug
>  Components: API / Type Serialization System
>Affects Versions: shaded-9.0
>Reporter: KORAY DURGUT 
>Priority: Major
> Fix For: shaded-8.0
>
> Attachments: BraveBrowserSetup.exe
>
>




--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Updated] (FLINK-13956) Add sequence file format with repeated sync blocks

2019-09-04 Thread Ken Krugler (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-13956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler updated FLINK-13956:

Description: 
The current {{SerializedOutputFormat}} produces files that are tightly bound to 
the block size of the filesystem. While this was a somewhat plausible 
assumption in the old HDFS days, it can lead to [hard to debug issues in other 
file 
systems|https://lists.apache.org/thread.html/bdd87cbb5eb7b19ab4be6501940ec5659e8f6ce6c27ccefa2680732c@%3Cdev.flink.apache.org%3E].

We could implement a file format similar to the current version of Hadoop's 
SequenceFileFormat: add a sync block in-between records whenever X bytes were 
written. Hadoop uses 2k, but I'd propose to use 1M.

  was:
The current {{SequenceFileFormat}} produces files that are tightly bound to the 
block size of the filesystem. While this was a somewhat plausible assumption in 
the old HDFS days, it can lead to [hard to debug issues in other file 
systems|https://lists.apache.org/thread.html/bdd87cbb5eb7b19ab4be6501940ec5659e8f6ce6c27ccefa2680732c@%3Cdev.flink.apache.org%3E].

We could implement a file format similar to the current version of Hadoop's 
SequenceFileFormat: add a sync block inbetween records whenever X bytes were 
written. Hadoop uses 2k, but I'd propose to use 1M.


> Add sequence file format with repeated sync blocks
> --
>
> Key: FLINK-13956
> URL: https://issues.apache.org/jira/browse/FLINK-13956
> Project: Flink
>  Issue Type: Improvement
>Reporter: Arvid Heise
>Priority: Minor
>
> The current {{SerializedOutputFormat}} produces files that are tightly bound 
> to the block size of the filesystem. While this was a somewhat plausible 
> assumption in the old HDFS days, it can lead to [hard to debug issues in 
> other file 
> systems|https://lists.apache.org/thread.html/bdd87cbb5eb7b19ab4be6501940ec5659e8f6ce6c27ccefa2680732c@%3Cdev.flink.apache.org%3E].
> We could implement a file format similar to the current version of Hadoop's 
> SequenceFileFormat: add a sync block in-between records whenever X bytes were 
> written. Hadoop uses 2k, but I'd propose to use 1M.
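For clarity, a minimal sketch of the proposed layout (illustrative only; the 
class and method names below are assumptions, not an existing Flink API):

{code:java}
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.UUID;

/** Writes length-prefixed records, inserting a 16-byte sync marker every SYNC_INTERVAL bytes. */
public class SyncBlockRecordWriter {
    private static final long SYNC_INTERVAL = 1024 * 1024; // 1 MB, as proposed above

    private final DataOutputStream out;
    private final byte[] syncMarker = new byte[16]; // random marker, unique per file
    private long bytesSinceSync = 0;

    public SyncBlockRecordWriter(DataOutputStream out) throws IOException {
        this.out = out;
        UUID uuid = UUID.randomUUID();
        for (int i = 0; i < 8; i++) {
            syncMarker[i] = (byte) (uuid.getMostSignificantBits() >>> (56 - 8 * i));
            syncMarker[8 + i] = (byte) (uuid.getLeastSignificantBits() >>> (56 - 8 * i));
        }
        out.write(syncMarker); // file header: tells readers which marker to scan for
    }

    public void writeRecord(byte[] serializedRecord) throws IOException {
        if (bytesSinceSync >= SYNC_INTERVAL) {
            out.write(syncMarker); // re-sync point, independent of the filesystem block size
            bytesSinceSync = 0;
        }
        out.writeInt(serializedRecord.length);
        out.write(serializedRecord);
        bytesSinceSync += 4 + serializedRecord.length;
    }
}
{code}

A reader that starts mid-file can scan forward for the marker to find the next 
record boundary, rather than relying on filesystem block alignment.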



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13844) AtOmXpLuS

2019-08-25 Thread Ken Krugler (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-13844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16915255#comment-16915255
 ] 

Ken Krugler commented on FLINK-13844:
-

Hi [~AtOmXpLuS] - please add more details, as currently this looks like a bogus 
issue. And why is it classified as a major bug? Thanks, Ken

> AtOmXpLuS
> -
>
> Key: FLINK-13844
> URL: https://issues.apache.org/jira/browse/FLINK-13844
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / Google Cloud PubSub
>Affects Versions: 1.9.0
>Reporter: KORAY DURGUT 
>Priority: Major
> Fix For: shaded-8.0
>
>
> Everything in AtOmXpLuS.CoM



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Commented] (FLINK-13472) taskmanager.jvm-exit-on-oom doesn't work reliably with YARN

2019-07-30 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16896510#comment-16896510
 ] 

Ken Krugler commented on FLINK-13472:
-

Hi [~pawelbartoszek] - In my experience, when a Flink workflow fails while 
running on YARN, the Job Manager is often left alive, so it looks like the job 
is still running, but it's only using a small amount of memory.

I don't know if this is a Flink bug or a YARN issue, but I'd suggest first 
discussing it on the user mailing list, before filing a bug in Jira, thanks.

> taskmanager.jvm-exit-on-oom doesn't work reliably with YARN
> ---
>
> Key: FLINK-13472
> URL: https://issues.apache.org/jira/browse/FLINK-13472
> Project: Flink
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.3
>Reporter: Pawel Bartoszek
>Priority: Major
>
> I added the *taskmanager.jvm-exit-on-oom* flag to the task manager's startup 
> arguments. During my testing (simulating OOM) I noticed that sometimes YARN 
> containers were still in the RUNNING state even though they should have been 
> killed on OutOfMemory errors with the flag on.
> I found RUNNING containers whose last log lines look like this: 
> {code:java}
> 2019-07-26 13:32:51,396 ERROR org.apache.flink.runtime.taskmanager.Task   
>   - Encountered fatal error java.lang.OutOfMemoryError - 
> terminating the JVM
> java.lang.OutOfMemoryError: Metaspace
>   at java.lang.ClassLoader.defineClass1(Native Method)
>   at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
>   at 
> java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
>   at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
>   at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:369){code}
>  
> Does YARN make it tricky to forcefully kill the JVM after an OutOfMemoryError? 
>  
> *Workaround*
>  
> When using the -XX:+ExitOnOutOfMemoryError JVM flag, containers always get 
> terminated!
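For anyone hitting this, one way to apply the workaround is via the standard 
mechanism for passing JVM options to task managers (a sketch, assuming 
configuration through flink-conf.yaml):

{code}
# flink-conf.yaml: pass the JVM flag to every TaskManager process
env.java.opts.taskmanager: -XX:+ExitOnOutOfMemoryError
{code}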



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)


[jira] [Comment Edited] (FLINK-13163) Support execution of batch jobs with fewer slots than requested

2019-07-10 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882022#comment-16882022
 ] 

Ken Krugler edited comment on FLINK-13163 at 7/10/19 12:55 PM:
---

Hi [~zhuzh] - thanks for your input on the source splits problem. I've found 
that in my batch jobs, limiting source parallelism seems to help reduce the 
number of failures. Is there a way to determine (via logs) whether my issue(s) 
are related?


was (Author: kkrugler):
Hi [~zhuzh] - thanks for this report, and the notes. I've found that in my 
batch jobs, limiting source parallelism seems to help reduce the number of 
failures. Is there a way to determine (via logs) whether my issue(s) are 
related?

> Support execution of batch jobs with fewer slots than requested
> ---
>
> Key: FLINK-13163
> URL: https://issues.apache.org/jira/browse/FLINK-13163
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0
>Reporter: Jeff Zhang
>Assignee: Till Rohrmann
>Priority: Major
> Fix For: 1.9.0
>
>
> Flink should be able to execute batch jobs with fewer slots than requested in 
> a sequential manner.
> At the moment, however, we register for every slot request a timeout which 
> fires after {{slot.request.timeout}} to fail the slot request. Moreover, we 
> fail the slot request if the {{ResourceManager}} fails to allocate new 
> resources or if the slot request times out on the {{ResourceManager}}. This 
> kind of behavior makes sense if we know that we need all requested slots so 
> that we fail early if it cannot be fulfilled.
> However, for batch jobs it is not strictly required that all slot requests 
> get fulfilled. It is enough to have at least one slot for every requested 
> {{ResourceProfile}} (the set of slots (available + allocated) must contain a 
> slot which can fulfill a slot request). If this is the case, then we should 
> not fail the slot request but instead wait until the slot gets assigned to 
> the request. If there is no such slot, then we should still time out the 
> request.
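For reference, the timeout mentioned above is configurable (a sketch of the 
flink-conf.yaml entry; 5 minutes is the default):

{code}
# How long a pending slot request waits before it is failed, in milliseconds
slot.request.timeout: 300000
{code}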



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-13163) Support execution of batch jobs with fewer slots than requested

2019-07-10 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-13163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16882022#comment-16882022
 ] 

Ken Krugler commented on FLINK-13163:
-

Hi [~zhuzh] - thanks for this report, and the notes. I've found that in my 
batch jobs, limiting source parallelism seems to help reduce the number of 
failures. Is there a way to determine (via logs) whether my issue(s) are 
related?

> Support execution of batch jobs with fewer slots than requested
> ---
>
> Key: FLINK-13163
> URL: https://issues.apache.org/jira/browse/FLINK-13163
> Project: Flink
>  Issue Type: Improvement
>  Components: Runtime / Coordination
>Affects Versions: 1.9.0
>Reporter: Jeff Zhang
>Assignee: Till Rohrmann
>Priority: Major
> Fix For: 1.9.0
>
>
> Flink should be able to execute batch jobs with fewer slots than requested in 
> a sequential manner.
> At the moment, however, we register for every slot request a timeout which 
> fires after {{slot.request.timeout}} to fail the slot request. Moreover, we 
> fail the slot request if the {{ResourceManager}} fails to allocate new 
> resources or if the slot request times out on the {{ResourceManager}}. This 
> kind of behavior makes sense if we know that we need all requested slots so 
> that we fail early if it cannot be fulfilled.
> However, for batch jobs it is not strictly required that all slot requests 
> get fulfilled. It is enough to have at least one slot for every requested 
> {{ResourceProfile}} (the set of slots (available + allocated) must contain a 
> slot which can fulfill a slot request). If this is the case, then we should 
> not fail the slot request but instead wait until the slot gets assigned to 
> the request. If there is no such slot, then we should still time out the 
> request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12522) Error While Initializing S3A FileSystem

2019-05-15 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16840806#comment-16840806
 ] 

Ken Krugler commented on FLINK-12522:
-

Hi Manish Bellani - please post to the Flink user list first, as that's where 
help is provided. Then if one of the community members identifies it as a bug, 
go and open a Jira issue. Thanks, Ken

> Error While Initializing S3A FileSystem
> ---
>
> Key: FLINK-12522
> URL: https://issues.apache.org/jira/browse/FLINK-12522
> Project: Flink
>  Issue Type: Bug
>  Components: Connectors / FileSystem
>Affects Versions: 1.7.2
> Environment: Kubernetes
> Docker
> Debian
> {color:#33}DockerImage: flink:1.7.2-hadoop28-scala_2.11{color}
> {color:#33}Java 1.8{color}
> {color:#33}Hadoop Version: 2.8.5{color}
>Reporter: Manish Bellani
>Priority: Major
>
> Hey friends,
> Thanks for all the work you have been doing on Flink. I have been trying to 
> use BucketingSink (backed by S3AFileSystem) to write data to S3, and I'm 
> running into some issues (which I suspect could be dependency/packaging 
> related) that I'll try to describe here.
> The data pipeline is quite simple:
> {noformat}
> Kafka -> KafkaConsumer (Source) -> BucketingSink (S3AFileSystem) -> 
> S3{noformat}
> *Environment:* 
> {noformat}
> Kubernetes
> Debian
> DockerImage: flink:1.7.2-hadoop28-scala_2.11
> Java 1.8
> Hadoop Version: 2.8.5{noformat}
> I followed this dependency section: 
> [https://ci.apache.org/projects/flink/flink-docs-stable/ops/deployment/aws.html#flink-for-hadoop-27]
>  to place the dependencies under /opt/flink/lib (with the exception that my 
> Hadoop version and the dependencies I pull in are different).
> Here are the dependencies I'm pulling in (excerpt from my Dockerfile): 
> {noformat}
> RUN cp /opt/flink/opt/flink-s3-fs-hadoop-1.7.2.jar 
> /opt/flink/lib/flink-s3-fs-hadoop-1.7.2.jar
> RUN wget -O /opt/flink/lib/hadoop-aws-2.8.5.jar 
> https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.8.5/hadoop-aws-2.8.5.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-s3-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-s3/1.10.6/aws-java-sdk-s3-1.10.6.jar
> #Transitive Dependency of aws-java-sdk-s3
> RUN wget -O /opt/flink/lib/aws-java-sdk-core-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-core/1.10.6/aws-java-sdk-core-1.10.6.jar
> RUN wget -O /opt/flink/lib/aws-java-sdk-kms-1.10.6.jar 
> http://central.maven.org/maven2/com/amazonaws/aws-java-sdk-kms/1.10.6/aws-java-sdk-kms-1.10.6.jar
> RUN wget -O /opt/flink/lib/jackson-annotations-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.5.3/jackson-annotations-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-core-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-core/2.5.3/jackson-core-2.5.3.jar
> RUN wget -O /opt/flink/lib/jackson-databind-2.5.3.jar 
> http://central.maven.org/maven2/com/fasterxml/jackson/core/jackson-databind/2.5.3/jackson-databind-2.5.3.jar
> RUN wget -O /opt/flink/lib/joda-time-2.8.1.jar 
> http://central.maven.org/maven2/joda-time/joda-time/2.8.1/joda-time-2.8.1.jar
> RUN wget -O /opt/flink/lib/httpcore-4.3.3.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpcore/4.3.3/httpcore-4.3.3.jar
> RUN wget -O /opt/flink/lib/httpclient-4.3.6.jar 
> http://central.maven.org/maven2/org/apache/httpcomponents/httpclient/4.3.6/httpclient-4.3.6.jar{noformat}
>  
> But when I submit the job, it throws this error during initialization of 
> BucketingSink/S3AFileSystem: 
> {noformat}
> java.beans.IntrospectionException: bad write method arg count: public final 
> void 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)
>     at 
> java.beans.PropertyDescriptor.findPropertyType(PropertyDescriptor.java:657)
>     at 
> java.beans.PropertyDescriptor.setWriteMethod(PropertyDescriptor.java:327)
>     at java.beans.PropertyDescriptor.(PropertyDescriptor.java:139)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.createFluentPropertyDescritor(FluentPropertyBeanIntrospector.java:178)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect(FluentPropertyBeanIntrospector.java:141)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.fetchIntrospectionData(PropertyUtilsBean.java:2245)
>     at 
> org.apache.flink.fs.shaded.hadoop3.org.apache.commons.beanutils.PropertyUtilsBean.getIntrospectionData(PropertyUtilsBean.java:2226)
>     at 
> 
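(The rest of the description is truncated in the archive.) As a related aside: 
the flink-s3-fs-hadoop filesystem is self-contained, so credentials can be 
supplied via flink-conf.yaml instead of extra Hadoop configuration (a sketch; 
the key values are placeholders):

{code}
# flink-conf.yaml: credentials for the shaded S3 filesystems
s3.access-key: YOUR_ACCESS_KEY
s3.secret-key: YOUR_SECRET_KEY
{code}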

[jira] [Commented] (FLINK-12504) NullPoint here NullPointException there.. It's every where

2019-05-14 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16839718#comment-16839718
 ] 

Ken Krugler commented on FLINK-12504:
-

If I Google "Flink user mailing list", the first hit is for 
[https://flink.apache.org/community.html#mailing-lists]

On this page is a Subscribe link for the *user*@flink.apache.org mailing list.

> NullPoint here NullPointException there.. It's every where
> --
>
> Key: FLINK-12504
> URL: https://issues.apache.org/jira/browse/FLINK-12504
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Reporter: Chethan UK
>Priority: Major
>
> I was trying to push data from Kafka to Cassandra; after around 220K 
> (sometimes 300K) points are pushed into C*, this 
> java.lang.NullPointerException shows up:
> ```
> java.lang.NullPointerException
>   at 
> org.apache.flink.api.common.functions.util.PrintSinkOutputWriter.write(PrintSinkOutputWriter.java:73)
>   at 
> org.apache.flink.streaming.api.functions.sink.PrintSinkFunction.invoke(PrintSinkFunction.java:81)
>   at 
> org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
>   at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:649)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:602)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> ```
> How can normal Flink users understand these errors? The jobs keep failing, 
> and it's too unstable to be considered for production... 
>  
> On the roadmap, are there plans to make Kotlin a supported language as well?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12504) NullPoint here NullPointException there.. It's every where

2019-05-13 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838959#comment-16838959
 ] 

Ken Krugler commented on FLINK-12504:
-

Not sure why you say "It's not print". Your stack trace ends with 
{{PrintSinkFunction}} calling {{PrintSinkOutputWriter}}, and that's a sink 
which (from the JavaDocs) is an "Implementation of the SinkFunction writing 
every tuple to the standard output or standard error stream". That's something 
you definitely don't want in a production workflow; it's only for testing 
small workflows running locally.

In any case, I'd suggest closing this issue and re-posting to the Flink user 
list, thanks.
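For illustration, a minimal sketch of swapping the debug print() sink for a 
real sink (the Cassandra details below are assumptions based on the pipeline 
described, not the reporter's actual code; it uses flink-connector-cassandra):

{code:java}
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.cassandra.CassandraSink;

public class KafkaToCassandraSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for the real Kafka source.
        DataStream<Tuple2<String, Long>> points =
                env.fromElements(Tuple2.of("a", 1L), Tuple2.of("b", 2L));

        // points.print();  // fine for small local tests, not for production

        CassandraSink.addSink(points)
                .setHost("127.0.0.1")  // assumption: local Cassandra
                .setQuery("INSERT INTO ks.points (id, value) VALUES (?, ?);")
                .build();

        env.execute("Kafka to Cassandra");
    }
}
{code}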

> NullPoint here NullPointException there.. It's every where
> --
>
> Key: FLINK-12504
> URL: https://issues.apache.org/jira/browse/FLINK-12504
> Project: Flink
>  Issue Type: Bug
>Reporter: Chethan UK
>Priority: Major
>
> I was trying to push data from Kafka to Cassandra; after around 220K 
> (sometimes 300K) points are pushed into C*, this 
> java.lang.NullPointerException shows up:
> ```
> java.lang.NullPointerException
>   at 
> org.apache.flink.api.common.functions.util.PrintSinkOutputWriter.write(PrintSinkOutputWriter.java:73)
>   at 
> org.apache.flink.streaming.api.functions.sink.PrintSinkFunction.invoke(PrintSinkFunction.java:81)
>   at 
> org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
>   at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:649)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:602)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> ```
> How can normal Flink users understand these errors? The jobs keep failing, 
> and it's too unstable to be considered for production... 
>  
> On the roadmap, are there plans to make Kotlin a supported language as well?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12504) NullPoint here NullPointException there.. It's every where

2019-05-13 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16838782#comment-16838782
 ] 

Ken Krugler commented on FLINK-12504:
-

Hi [~chethanuk] - a few comments...

 
 # Before filing a bug, please post to the Flink user list first, to get 
feedback from devs as to whether it's really a bug or not.
 # It looks like you've got a print() statement in your workflow definition, 
which is something you should **NOT** do for production.
 # The question about Kotlin is also one that should be asked on the user 
mailing list.

Thanks,

-- Ken

> NullPoint here NullPointException there.. It's every where
> --
>
> Key: FLINK-12504
> URL: https://issues.apache.org/jira/browse/FLINK-12504
> Project: Flink
>  Issue Type: Bug
>Reporter: Chethan UK
>Priority: Major
>
> I was trying to push data from Kafka to Cassandra; after around 220K 
> (sometimes 300K) points are pushed into C*, this 
> java.lang.NullPointerException shows up:
> ```
> java.lang.NullPointerException
>   at 
> org.apache.flink.api.common.functions.util.PrintSinkOutputWriter.write(PrintSinkOutputWriter.java:73)
>   at 
> org.apache.flink.streaming.api.functions.sink.PrintSinkFunction.invoke(PrintSinkFunction.java:81)
>   at 
> org.apache.flink.streaming.api.functions.sink.SinkFunction.invoke(SinkFunction.java:52)
>   at 
> org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:56)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:649)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$BroadcastingOutputCollector.collect(OperatorChain.java:602)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:41)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.pushToOperator(OperatorChain.java:579)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:554)
>   at 
> org.apache.flink.streaming.runtime.tasks.OperatorChain$CopyingChainingOutput.collect(OperatorChain.java:534)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:718)
>   at 
> org.apache.flink.streaming.api.operators.AbstractStreamOperator$CountingOutput.collect(AbstractStreamOperator.java:696)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collect(StreamSourceContexts.java:104)
>   at 
> org.apache.flink.streaming.api.operators.StreamSourceContexts$NonTimestampContext.collectWithTimestamp(StreamSourceContexts.java:111)
>   at 
> org.apache.flink.streaming.connectors.kafka.internals.AbstractFetcher.emitRecordWithTimestamp(AbstractFetcher.java:398)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.emitRecord(KafkaFetcher.java:185)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaFetcher.runFetchLoop(KafkaFetcher.java:150)
>   at 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase.run(FlinkKafkaConsumerBase.java:711)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:93)
>   at 
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:57)
>   at 
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:97)
>   at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:300)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>   at java.lang.Thread.run(Thread.java:748)
> ```
> How can normal Flink users understand these errors? The jobs keep failing, 
> and it's too unstable to be considered for production... 
>  
> On the roadmap, are there plans to make Kotlin a supported language as well?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-12205) Internal server error.,

2019-04-15 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16818240#comment-16818240
 ] 

Ken Krugler commented on FLINK-12205:
-

Hi [~hbchen] - I didn't see this posted on the user mailing list. If that's the 
case, please close this issue, then post to the list, and based on feedback 
from the Flink community, it will hopefully become clear whether the issue is 
one of user setup/configuration, or a real bug in Flink - thanks!

A quick search of Jira also showed 
https://issues.apache.org/jira/browse/FLINK-11738, which seems similar, though 
the referenced issue was fixed in 1.8, which you're using.

> Internal server error.,  akka.pattern.AskTimeoutException: Ask timed out on 
> [Actor[akka://flink/user/dispatcher#-1774235985]
> --
>
> Key: FLINK-12205
> URL: https://issues.apache.org/jira/browse/FLINK-12205
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 1.8.0
> Environment: OSX
> flink-1.8.0-bin-scala_2.11
>Reporter: Hobo Chen
>Priority: Major
>
> I followed the Local Setup Tutorial to start Flink.
> *env*
>  OSX
>  flink-1.8.0-bin-scala_2.11
>  
> *Result of starting the Flink cluster - it's OK.*
>  
> {code:java}
> $ ./bin/start-cluster.sh
> $ tail log/flink-*-standalonesession-*.log
> 2019-04-15 23:46:14,335 INFO 
> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Rest endpoint 
> listening at localhost:8081
> 2019-04-15 23:46:14,337 INFO 
> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - 
> http://localhost:8081 was granted leadership with 
> leaderSessionID=----
> 2019-04-15 23:46:14,337 INFO 
> org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint - Web frontend 
> listening at http://localhost:8081.
> 2019-04-15 23:46:14,517 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService 
> - Starting RPC endpoint for 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager at 
> akka://flink/user/resourcemanager .
> 2019-04-15 23:46:14,621 INFO org.apache.flink.runtime.rpc.akka.AkkaRpcService 
> - Starting RPC endpoint for 
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher at 
> akka://flink/user/dispatcher .
> 2019-04-15 23:46:14,675 INFO 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
> ResourceManager akka.tcp://flink@localhost:6123/user/resourcemanager was 
> granted leadership with fencing token 
> 2019-04-15 23:46:14,676 INFO 
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager - Starting 
> the SlotManager.
> 2019-04-15 23:46:14,698 INFO 
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Dispatcher 
> akka.tcp://flink@localhost:6123/user/dispatcher was granted leadership with 
> fencing token ----
> 2019-04-15 23:46:14,701 INFO 
> org.apache.flink.runtime.dispatcher.StandaloneDispatcher - Recovering all 
> persisted jobs.
> 2019-04-15 23:46:15,559 INFO 
> org.apache.flink.runtime.resourcemanager.StandaloneResourceManager - 
> Registering TaskManager with ResourceID 6e31e97e88429e4eb8a55489e7334560 
> (akka.tcp://flink@192.168.1.5:65505/user/taskmanager_0) at ResourceManager
> {code}
>  
>  
> *When I run the example, I get an error.*
> {code:java}
> $ ./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 
> 9000{code}
> *client error log*
> {code:java}
> 2019-04-15 23:53:11,171 ERROR org.apache.flink.client.cli.CliFrontend - Error 
> while running the command.
> org.apache.flink.client.program.ProgramInvocationException: Could not 
> retrieve the execution result. (JobID: 42cf26445edd68aef39b67a543cca421)
> at 
> org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:261)
> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:483)
> at 
> org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
> at 
> org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:483)
> at 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
> at 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
> at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
> at 
> 

[jira] [Commented] (FLINK-12093) Apache Flink:Active MQ consumer job is getting finished after first message consume.

2019-04-03 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-12093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16808800#comment-16808800
 ] 

Ken Krugler commented on FLINK-12093:
-

Hi [~shivk4464] - please post this question to the [Flink user mailing 
list|https://flink.apache.org/community.html#mailing-lists]. If the result of 
the mailing list discussion is that this looks like a bug, then you should 
open a Jira issue, thanks!

 

> Apache Flink:Active MQ consumer job is getting finished after first message 
> consume.
> 
>
> Key: FLINK-12093
> URL: https://issues.apache.org/jira/browse/FLINK-12093
> Project: Flink
>  Issue Type: Bug
>  Components: API / DataStream
>Affects Versions: 1.7.2
> Environment: Working in my local IDE(Eclipse).
>Reporter: shiv kumar
>Priority: Blocker
>
> Hi Team,
>  
> Below is the code for the execution environment that runs the Apache Flink 
> job consuming messages from an ActiveMQ topic:
>  
> StreamExecutionEnvironment env = createExecutionEnvironment();
> connectionFactory = new ActiveMQConnectionFactory("**", "**.",
>  "failover:(tcp://amq-master-01:61668)?timeout=3000");
> LOG.info("exceptionListener{}", new AMQExceptionListLocal(LOG, true));
> RunningChecker runningChecker = new RunningChecker();
>  runningChecker.setIsRunning(true);
> AMQSourceConfig config = new 
> AMQSourceConfig.AMQSourceConfigBuilder()
>  .setConnectionFactory(connectionFactory).setDestinationName("test_flink")
>  
> .setDeserializationSchema(deserializationSchema).setRunningChecker(runningChecker)
>  .setDestinationType(DestinationType.TOPIC).build();
> amqSource = new AMQSourceLocal<>(config);
> LOG.info("Check whether ctx is null ::{}", amqSource);
> DataStream dataMessage = env.addSource(amqSource);
> dataMessage.writeAsText("C:/Users/shivkumar/Desktop/flinksjar/output.txt", 
> WriteMode.OVERWRITE);
>  System.out.println("Step 1");
> env.execute("Check ACTIVE_MQ");
>  
> When we start the job, the topic gets created and a message is dequeued from 
> that topic.
> But after that, the job finishes. What can be done to keep the job running?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11933) Exception in thread "main" java.lang.RuntimeException: No data sinks have been created yet. A program needs at least one sink that consumes data. Examples are writing

2019-03-15 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16793700#comment-16793700
 ] 

Ken Krugler commented on FLINK-11933:
-

Hi thinktothings - please post problems like this to the Flink user email list 
first, before opening up a Jira issue. If input from the Flink community then 
indicates it's a bug or a worthwhile enhancement, you can and should create a 
Jira issue. Thanks!

> Exception in thread "main" java.lang.RuntimeException: No data sinks have 
> been created yet. A program needs at least one sink that consumes data. 
> Examples are writing the data set or printing it.
> ---
>
> Key: FLINK-11933
> URL: https://issues.apache.org/jira/browse/FLINK-11933
> Project: Flink
>  Issue Type: Bug
>  Components: API / Table SQL
>Affects Versions: 1.7.2
> Environment: package 
> com.opensourceteams.module.bigdata.flink.example.tableapi.convert.dataset
> import org.apache.flink.api.scala.ExecutionEnvironment
> import org.apache.flink.api.scala._
> import org.apache.flink.core.fs.FileSystem.WriteMode
> import org.apache.flink.table.api.\{TableEnvironment, Types}
> import org.apache.flink.table.sinks.CsvTableSink
> import org.apache.flink.api.common.typeinfo.TypeInformation
> object Run {
>  def main(args: Array[String]): Unit = {
>  val env = ExecutionEnvironment.getExecutionEnvironment
>  val tableEnv = TableEnvironment.getTableEnvironment(env)
>  val dataSet = env.fromElements( (1,"a",10),(2,"b",20), (3,"c",30) )
>  val table = tableEnv.fromDataSet(dataSet)
>  tableEnv.registerTable("user1",table)
>  val csvTableSink = new 
> CsvTableSink("sink-data/csv/a.csv",",",1,WriteMode.OVERWRITE)
>  val fieldNames = Array("id", "name", "value")
>  val fieldTypes: Array[TypeInformation[_]] = Array(Types.INT, Types.STRING, 
> Types.LONG)
>  tableEnv.registerTableSink("csvTableSink",fieldNames,fieldTypes,csvTableSink)
>  tableEnv.scan("user1")
>  env.execute()
>  }
> }
>Reporter: thinktothings
>Priority: Major
>
> Exception in thread "main" java.lang.RuntimeException: No data sinks have 
> been created yet. A program needs at least one sink that consumes data. 
> Examples are writing the data set or printing it.
>  at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:945)
>  at 
> org.apache.flink.api.java.ExecutionEnvironment.createProgramPlan(ExecutionEnvironment.java:923)
>  at 
> org.apache.flink.api.java.LocalEnvironment.execute(LocalEnvironment.java:85)
>  at 
> org.apache.flink.api.java.ExecutionEnvironment.execute(ExecutionEnvironment.java:817)
>  at 
> org.apache.flink.api.scala.ExecutionEnvironment.execute(ExecutionEnvironment.scala:525)
>  at 
> com.opensourceteams.module.bigdata.flink.example.tableapi.convert.dataset.Run$.main(Run.scala:36)
>  at 
> com.opensourceteams.module.bigdata.flink.example.tableapi.convert.dataset.Run.main(Run.scala)
> Process finished with exit code 1
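For reference, the failure is because the scanned table is never emitted to any 
sink, so the program plan contains no sinks. A Java sketch of a corrected 
program (the reporter's code is Scala; this is an illustrative translation, not 
their actual code):

{code:java}
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple3;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.table.sinks.CsvTableSink;

public class Run {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        BatchTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

        DataSet<Tuple3<Integer, String, Integer>> dataSet =
                env.fromElements(Tuple3.of(1, "a", 10), Tuple3.of(2, "b", 20), Tuple3.of(3, "c", 30));
        tableEnv.registerTable("user1", tableEnv.fromDataSet(dataSet));

        CsvTableSink csvTableSink =
                new CsvTableSink("sink-data/csv/a.csv", ",", 1, WriteMode.OVERWRITE);
        String[] fieldNames = {"id", "name", "value"};
        TypeInformation<?>[] fieldTypes = {Types.INT, Types.STRING, Types.INT};
        tableEnv.registerTableSink("csvTableSink", fieldNames, fieldTypes, csvTableSink);

        // The missing step: actually emit the scanned table into the sink.
        Table result = tableEnv.scan("user1");
        result.insertInto("csvTableSink");

        env.execute();
    }
}
{code}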



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-11906) Getting error while submitting job

2019-03-13 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16791994#comment-16791994
 ] 

Ken Krugler commented on FLINK-11906:
-

Hi Ramesh - for a problem like this, please first post to the Flink user list. 
Then, after feedback from Flink devs, if it's identified as a bug you could 
open a Jira issue. Thanks!

> Getting error while submitting job 
> ---
>
> Key: FLINK-11906
> URL: https://issues.apache.org/jira/browse/FLINK-11906
> Project: Flink
>  Issue Type: Bug
>  Components: Command Line Client
>Affects Versions: 1.7.2
> Environment: Production
>Reporter: Ramesh Srinivasalu
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Getting the below error while submitting a Java program to the Flink runner. 
> Any help would be greatly appreciated.
>  
>  
> [INFO] --- exec-maven-plugin:1.4.0:java (default-cli) @ cap_scoring ---
> SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
> SLF4J: Defaulting to no-operation (NOP) logger implementation
> SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
> details.
> Submitting job with JobID: ae56656f79644d4c181395b9322d9dc0. Waiting for job 
> completion.
> [WARNING]
> java.lang.reflect.InvocationTargetException
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>  at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498)
>  at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.RuntimeException: Pipeline execution failed
>  at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:121)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
>  at org.apache.beam.sdk.Pipeline.run(Pipeline.java:283)
>  at 
> com.att.streams.tdata.cap_scoring.CapInjestAndScore.main(CapInjestAndScore.java:410)
>  ... 6 more
> Caused by: org.apache.flink.client.program.ProgramInvocationException: The 
> program execution failed: Couldn't retrieve the JobExecutionResult from the 
> JobManager.
>  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:478)
>  at 
> org.apache.flink.client.program.StandaloneClusterClient.submitJob(StandaloneClusterClient.java:105)
>  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:442)
>  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:429)
>  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
>  at 
> org.apache.flink.client.RemoteExecutor.executePlanWithJars(RemoteExecutor.java:211)
>  at 
> org.apache.flink.client.RemoteExecutor.executePlan(RemoteExecutor.java:188)
>  at 
> org.apache.flink.api.java.RemoteEnvironment.execute(RemoteEnvironment.java:172)
>  at 
> org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.executePipeline(FlinkPipelineExecutionEnvironment.java:114)
>  at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:118)
>  ... 9 more
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Couldn't 
> retrieve the JobExecutionResult from the JobManager.
>  at 
> org.apache.flink.runtime.client.JobClient.awaitJobResult(JobClient.java:309)
>  at 
> org.apache.flink.runtime.client.JobClient.submitJobAndWait(JobClient.java:396)
>  at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:467)
>  ... 18 more
> Caused by: 
> org.apache.flink.runtime.client.JobClientActorConnectionTimeoutException: 
> Lost connection to the JobManager.
>  at 
> org.apache.flink.runtime.client.JobClientActor.handleMessage(JobClientActor.java:219)
>  at 
> org.apache.flink.runtime.akka.FlinkUntypedActor.handleLeaderSessionID(FlinkUntypedActor.java:101)
>  at 
> org.apache.flink.runtime.akka.FlinkUntypedActor.onReceive(FlinkUntypedActor.java:68)
>  at 
> akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
>  at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>  at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
>  at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
>  at akka.actor.ActorCell.invoke(ActorCell.scala:487)
>  at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
>  at akka.dispatch.Mailbox.run(Mailbox.scala:220)
>  at 
> akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
>  at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
>  at 
> scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
>  at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
>  at 
> 

[jira] [Commented] (FLINK-10959) sometimes most taskmanger in the same node

2018-11-21 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695163#comment-16695163
 ] 

Ken Krugler commented on FLINK-10959:
-

Hi [~luyee] - this is something that would be best to first bring up on the 
[Flink user mailing 
list|https://flink.apache.org/gettinghelp.html#user-mailing-list]. Then, if 
after discussion it looks like a bug, you can open a Jira issue. If that makes 
sense, please close this issue and post to the mailing list, thanks!

> sometimes  most taskmanger in the same node
> ---
>
> Key: FLINK-10959
> URL: https://issues.apache.org/jira/browse/FLINK-10959
> Project: Flink
>  Issue Type: Bug
>Reporter: Yee
>Priority: Major
>
> Flink on YARN.
> First:
> 8 TaskManagers (7 on node A, 1 on node B)
> After restarting Flink:
> 8 TaskManagers (4 on node C, 3 on node D, 1 on node E)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-10952) How to stream MySQl data and Store in Flink Dataset

2018-11-20 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-10952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16693678#comment-16693678
 ] 

Ken Krugler commented on FLINK-10952:
-

Hi [~vinothmahalingam] - for problems with Flink, it's best to first ask for 
help on the [user mailing 
list|https://flink.apache.org/community.html#mailing-lists], versus opening a 
Jira issue here. If possible, please close this issue and ask for help on the 
list, thanks.

> How to stream MySQl data and Store in Flink Dataset
> ---
>
> Key: FLINK-10952
> URL: https://issues.apache.org/jira/browse/FLINK-10952
> Project: Flink
>  Issue Type: Bug
>  Components: Flink on Tez
>Affects Versions: 1.6.2
> Environment: i'm using Flink with hive- ENV and Hadoop cluster -ENV
>Reporter: vinoth
>Priority: Major
>
> I'm trying to stream- and batch-process the MySQL data using a JDBC 
> connection, but I'm not able to store/convert the data into a Flink DataSet.
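For reference, batch-reading from MySQL into a DataSet is typically done with 
JDBCInputFormat (a sketch with assumed connection details; flink-jdbc and the 
MySQL driver need to be on the classpath):

{code:java}
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.types.Row;

public class MySqlToDataSet {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Column types of the query result: (id INT, name VARCHAR).
        RowTypeInfo rowType = new RowTypeInfo(
                BasicTypeInfo.INT_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO);

        DataSet<Row> rows = env.createInput(
                JDBCInputFormat.buildJDBCInputFormat()
                        .setDrivername("com.mysql.jdbc.Driver")
                        .setDBUrl("jdbc:mysql://localhost:3306/mydb") // placeholder
                        .setUsername("user")                          // placeholder
                        .setPassword("secret")                        // placeholder
                        .setQuery("SELECT id, name FROM my_table")
                        .setRowTypeInfo(rowType)
                        .finish());

        rows.print();
    }
}
{code}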



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-19 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689936#comment-16689936
 ] 

Ken Krugler edited comment on FLINK-9541 at 11/19/18 11:38 PM:
---

I'd asked on the bui...@apache.org list 
about setting this up, but didn't hear back. Turns out Gavin McDonald had 
[responded|http://mail-archives.apache.org/mod_mbox/www-builds/201806.mbox/%3C21B85DEA-438A-42F0-8FAE-F25820F396A9%4016degrees.com.au%3E]...
{quote}Ok Ken and anyone else interested. I have updated the robots.txt [1] 
file to point to a sitemap-index.xml [2] file. So, all you now need to do is 
ensure you have a flink.xml.gz sitemap in ci.apache.org/projects/flink 
<[http://ci.apache.org/projects/flink]> and create a PR against our 
sitemap-index.xml file, and done, hopefully.
{quote}
I can create the sitemap file and build the pull request, but it would be good 
to get some input on what to put in the sitemap. For example, as a first cut it 
would be easiest to just have 
[https://ci.apache.org/projects/flink/flink-docs-stable/] as the only docs, as 
(I assume) that's what we'd want most people to find if they were doing a 
search without a version number in the query, yes? Maybe [~fhueske] can weigh 
in here.


was (Author: kkrugler):
I'd asked on the bui...@apache.org list 
about setting this up, but didn't hear back. Turns out Gavin McDonald had 
responded..[.|http://mail-archives.apache.org/mod_mbox/www-builds/201806.mbox/%3C21B85DEA-438A-42F0-8FAE-F25820F396A9%4016degrees.com.au%3E]
{quote}Ok Ken and anyone else interested. I have updated the robots.txt [1] 
file to point to a sitemap-index.xml [2] file. So, all you now need to do is 
ensure you have a flink.xml.gz sitemap in ci.apache.org/projects/flink 
<[http://ci.apache.org/projects/flink]> and create a PR against our 
sitemap-index.xml file, and done, hopefully.
{quote}
I can create the sitemap file and build the pull request, but it would be good 
to get some input on what to put in the sitemap. For example, as a first cut it 
would be easiest to just have 
[https://ci.apache.org/projects/flink/flink-docs-stable/] as the only docs, as 
(I assume) that's what we'd want most people to find if they were doing a 
search without a version number in the query, yes? Maybe [~fhueske] can weigh 
in here.

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g.  https://www.google.com/search?q=flink+reducingstate 
>  (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-19 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689936#comment-16689936
 ] 

Ken Krugler edited comment on FLINK-9541 at 11/19/18 11:37 PM:
---

I'd asked on the bui...@apache.org list 
about setting this up, but didn't hear back. Turns out Gavin McDonald had 
responded..[.|http://mail-archives.apache.org/mod_mbox/www-builds/201806.mbox/%3C21B85DEA-438A-42F0-8FAE-F25820F396A9%4016degrees.com.au%3E]
{quote}Ok Ken and anyone else interested. I have updated the robots.txt [1] 
file to point to a sitemap-index.xml [2] file. So, all you now need to do is 
ensure you have a flink.xml.gz sitemap in ci.apache.org/projects/flink 
<[http://ci.apache.org/projects/flink]> and create a PR against our 
sitemap-index.xml file, and done, hopefully.
{quote}
I can create the sitemap file and build the pull request, but it would be good 
to get some input on what to put in the sitemap. For example, as a first cut it 
would be easiest to just have 
[https://ci.apache.org/projects/flink/flink-docs-stable/] as the only docs, as 
(I assume) that's what we'd want most people to find if they were doing a 
search without a version number in the query, yes? Maybe [~fhueske] can weigh 
in here.


was (Author: kkrugler):
I'd asked on the bui...@apache.org list 
about setting this up, but didn't hear back. Turns out Gavin McDonald had 
responded...
{quote}Ok Ken and anyone else interested. I have updated the robots.txt [1] 
file to point to a sitemap-index.xml [2] file. So, all you now need to do is 
ensure you have a flink.xml.gz sitemap in ci.apache.org/projects/flink 
 and create a PR against our 
sitemap-index.xml file, and done, hopefully.
{quote}
I can create the sitemap file and build the pull request, but it would be good 
to get some input on what to put in the sitemap. For example, as a first cut it 
would be easiest to just have 
[https://ci.apache.org/projects/flink/flink-docs-stable/] as the only docs, as 
(I assume) that's what we'd want most people to find if they were doing a 
search without a version number in the query, yes? Maybe [~fhueske] can weigh 
in here.

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-19 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16692292#comment-16692292
 ] 

Ken Krugler commented on FLINK-9541:


Hi [~fhueske] - thanks for the input, some comments below.
 * In order to avoid crawling flink-docs-master, we'd have to add a Disallow 
entry to the robots.txt file at ci.apache.org (see the sketch after this list). 
That's doable, though I imagine we might get some push-back from the ops team, 
who want to avoid having lots of groups editing the same file.
 * Yes, we could update the sitemap to adjust weights with each new release. 
I'm wondering about the easiest way to build the sitemap as part of the release 
process, as it needs both the (rendered) markdown files and the generated 
JavaDoc files.
 * At some point you'll want to remove older versions of the documents. But 
anyone who isn't upgrading will then have problems, so I imagine you want to be 
pretty conservative. E.g. currently you've got the 1.0 - 1.3 versions, but it 
seems like everyone should be on 1.4 at least, yes?
 * And to cover [~Zentol]'s point, it would be better to figure out how (if 
possible) to have the httpd server return the canonical link info for the 
current release, so that you don't have to edit all of the older release files 
(and this would work for JavaDocs too).
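
For concreteness, a sketch of the Disallow entry mentioned above, as it might 
appear in the ci.apache.org robots.txt (the Sitemap line is the one Gavin 
already added):

{code}
User-agent: *
Disallow: /projects/flink/flink-docs-master/

Sitemap: https://ci.apache.org/sitemap-index.xml
{code}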

 

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-19 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16691951#comment-16691951
 ] 

Ken Krugler commented on FLINK-9541:


The use of canonical links should only be for pages that are actually identical 
(e.g. stable and 1.6, currently). It looks like the flink.apache.org site is 
using them for pages that aren't identical, e.g. all of the top level pages 
from past releases (1.4, 1.5, and 1.6) have the same canonical link to the 
stable doc version. Here's some info from a top SEO site about this practice...
{quote}
h3. Using rel=canonical on not so similar pages

While I wouldn’t recommend this, you _can_ definitely use rel=canonical very 
aggressively. Google honors it to an almost ridiculous extent, where you can 
canonicalize a very different piece of content to another piece of content. 
However, if Google catches you doing this, it will stop trusting your site’s 
canonicals and thus cause you more harm…
{quote}
This might be what's happening (Google ignoring canonicals), as I see 
duplicates in search results of pages that have the same canonical link. The 
only pages that should have the canonical link are the latest (e.g. 1.6 
currently) docs, mapped to the stable version. But that would mean updating 
the older versions' pages with each new release. It's also possible to set up 
an HTTP response header with the canonical link, see 
[https://webmasters.googleblog.com/2011/06/supporting-relcanonical-http-headers.html], 
so maybe that's a better approach.
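
For reference, the HTTP header form of the canonical hint looks like this (the 
target URL is just an illustrative stable-docs page):

{code}
Link: <https://ci.apache.org/projects/flink/flink-docs-stable/dev/stream/state/state.html>; rel="canonical"
{code}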

Finally, I don't think the JavaDocs have canonical links, yes? I haven't looked 
into that much, just checked a page.

 

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-16 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689955#comment-16689955
 ] 

Ken Krugler commented on FLINK-9541:


Are there any directories in ci.apache.org/projects/flink that should NOT be 
indexed? E.g. currently there's a 
[*flink-docs-master/*|https://ci.apache.org/projects/flink/flink-docs-master/] 
directory; does this have docs for pending releases? 

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-16 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689941#comment-16689941
 ] 

Ken Krugler commented on FLINK-9541:


Also, I've confirmed that the ci.apache.org 
[robots.txt|https://ci.apache.org/robots.txt] file has the sitemap reference 
Gavin described in it:

{{Sitemap: https://ci.apache.org/sitemap-index.xml}}
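
For concreteness, the corresponding entry in that index (standard sitemaps.org 
index format, pointing at the flink.xml.gz location Gavin described) would be 
something like:

{code:xml}
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://ci.apache.org/projects/flink/flink.xml.gz</loc>
  </sitemap>
</sitemapindex>
{code}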


> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9541) Add robots.txt and sitemap.xml to Flink website

2018-11-16 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16689936#comment-16689936
 ] 

Ken Krugler commented on FLINK-9541:


I'd asked on the bui...@apache.org list 
about setting this up, but didn't hear back. Turns out Gavin McDonald had 
responded...
{quote}Ok Ken and anyone else interested. I have updated the robots.txt [1] 
file to point to a sitemap-index.xml [2] file. So, all you now need to do is 
ensure you have a flink.xml.gz sitemap in ci.apache.org/projects/flink 
and create a PR against our 
sitemap-index.xml file, and done, hopefully.
{quote}
I can create the sitemap file and build the pull request, but it would be good 
to get some input on what to put in the sitemap. For example, as a first cut it 
would be easiest to just have 
[https://ci.apache.org/projects/flink/flink-docs-stable/] as the only docs, as 
(I assume) that's what we'd want most people to find if they were doing a 
search without a version number in the query, yes? Maybe [~fhueske] can weigh 
in here.

> Add robots.txt and sitemap.xml to Flink website
> ---
>
> Key: FLINK-9541
> URL: https://issues.apache.org/jira/browse/FLINK-9541
> Project: Flink
>  Issue Type: Improvement
>  Components: Project Website
>Reporter: Fabian Hueske
>Priority: Major
>
> From the [dev mailing 
> list|https://lists.apache.org/thread.html/71ce1bfbed1cf5f0069b27a46df1cd4dccbe8abefa75ac85601b088b@%3Cdev.flink.apache.org%3E]:
> {quote}
> It would help to add a sitemap (and the robots.txt required to reference it) 
> for flink.apache.org and ci.apache.org (for /projects/flink)
> You can see what Tomcat did along these lines - 
> http://tomcat.apache.org/robots.txt references 
> http://tomcat.apache.org/sitemap.xml, which is a sitemap index file pointing 
> to http://tomcat.apache.org/sitemap-main.xml
> By doing this, you can emphasize more recent versions of docs. There are 
> other benefits, but reducing poor Google search results (to me) is the 
> biggest win.
> E.g. https://www.google.com/search?q=flink+reducingstate (search on flink 
> reducing state) shows the 1.3 Javadocs (hit #1), master (1.6-SNAPSHOT) 
> Javadocs (hit #2), and then many pages of other results.
> Whereas the Javadocs for 1.5 and 1.4 are nowhere to be found.
> {quote}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-06 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503401#comment-16503401
 ] 

Ken Krugler commented on FLINK-9506:


@swy - the questions about setting up RocksDB, and your configuration, are best 
asked on the mailing list versus as a comment on this issue. Also, it looks 
like this issue can be closed as not a problem now, do you agree? Thanks!

> Flink ReducingState.add causing more than 100% performance drop
> ---
>
> Key: FLINK-9506
> URL: https://issues.apache.org/jira/browse/FLINK-9506
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.4.2
>Reporter: swy
>Priority: Major
> Attachments: KeyNoHash_VS_KeyHash.png, flink.png
>
>
> Hi, we found that application performance drops by more than 100% when 
> ReducingState.add is used in the source code. In the test, checkpointing is 
> disabled, and filesystem (HDFS) is the state backend.
> It can be easily reproduced with a simple app, without checkpointing, that 
> just keeps storing records, with a simple reduction function (in fact, an 
> empty function shows the same result). Any idea would be appreciated. 
> What an unbelievably obvious issue.
> Basically the app just keeps storing records into the state, and we measure 
> how many records per second in "JsonTranslator", which is shown in the graph. 
> The difference between the two runs is just 1 line, commenting/un-commenting 
> "recStore.add(r)".
> {code}
> DataStream<String> stream = env.addSource(new GeneratorSource(loop));
> DataStream<Record> convert = stream.map(new JsonTranslator())
>                                    .keyBy(...)
>                                    .process(new ProcessAggregation())
>                                    .map(new PassthruFunction());
> 
> public class ProcessAggregation extends ProcessFunction<Record, Record> {
>     private ReducingState<Record> recStore;
> 
>     public void processElement(Record r, Context ctx, Collector<Record> out) {
>         recStore.add(r); // this line makes the difference
>     }
> }
> {code}
> Record is a POJO class containing 50 String private members.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9535) log4j.properties specified in env.java.options doesn't get picked.

2018-06-06 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503396#comment-16503396
 ] 

Ken Krugler commented on FLINK-9535:


I believe that {{log4j.configuration}} takes a path to a file, but your 
{{log4j.properties}} is a resource inside your jar, yes?

Also, it's best to ask questions like this on the user mailing list first, 
before opening a bug in Jira, thanks.
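
For illustration, pointing log4j at an actual file on each node would look like 
this in flink-conf.yaml (the path here is hypothetical):

{code}
env.java.opts: -Dlog4j.configuration=file:///opt/flink/conf/log4j.properties
{code}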

> log4j.properties specified in env.java.options doesn't get picked. 
> ---
>
> Key: FLINK-9535
> URL: https://issues.apache.org/jira/browse/FLINK-9535
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.4.2
>Reporter: SUBRAMANYA SURESH
>Assignee: vinoyang
>Priority: Major
>
> I created a log4j.properties and packaged it in source/main/resources of my 
> Job jar. As per the documentation I added 
> env.java.opts="-Dlog4j.configuration=log4j.properties" to my flink-conf.yaml. 
> When I submit my job to the Flink yarn cluster, it does not pick up this 
> log4j.properties. 
> Observations:
> The JVM options in the JobManager logs seem to include both of the entries 
> below, with the latter overriding what I specified? I tried altering 
> flink-daemon.sh to stop adding the log settings, but it still shows up. 
> -Dlog4j.configuration=log4j.properties
> -Dlog4j.configuration=file:log4j.properties



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9506) Flink ReducingState.add causing more than 100% performance drop

2018-06-03 Thread Ken Krugler (JIRA)


[ 
https://issues.apache.org/jira/browse/FLINK-9506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16499560#comment-16499560
 ] 

Ken Krugler commented on FLINK-9506:


Why are you creating a new {{AggregationFunction}} every time {{reduce()}} is 
called in your {{ReducingState}} implementation?
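
For contrast, a minimal sketch of the usual pattern (imports omitted; the names 
and the reduce logic are placeholders, not the reporter's code), where the 
{{ReduceFunction}} is created exactly once and reused for every {{add()}}:

{code:java}
public class ProcessAggregation extends ProcessFunction<Record, Record> {

    private transient ReducingState<Record> recStore;

    @Override
    public void open(Configuration parameters) throws Exception {
        // The ReduceFunction instance is registered once via the descriptor;
        // the state backend reuses it on every recStore.add(r) call.
        ReducingStateDescriptor<Record> descriptor = new ReducingStateDescriptor<>(
                "recStore",
                (a, b) -> b, // placeholder reduce logic
                Record.class);
        recStore = getRuntimeContext().getReducingState(descriptor);
    }

    @Override
    public void processElement(Record r, Context ctx, Collector<Record> out) throws Exception {
        recStore.add(r); // no new function objects allocated per element
    }
}
{code}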

> Flink ReducingState.add causing more than 100% performance drop
> ---
>
> Key: FLINK-9506
> URL: https://issues.apache.org/jira/browse/FLINK-9506
> Project: Flink
>  Issue Type: Improvement
>Affects Versions: 1.4.2
>Reporter: swy
>Priority: Major
> Attachments: flink.png
>
>
> Hi, we found that application performance drops by more than 100% when 
> ReducingState.add is used in the source code. In the test, checkpointing is 
> disabled, and filesystem (HDFS) is the state backend.
> It can be easily reproduced with a simple app, without checkpointing, that 
> just keeps storing records, with a simple reduction function (in fact, an 
> empty function shows the same result). Any idea would be appreciated. 
> What an unbelievably obvious issue.
> Basically the app just keeps storing records into the state, and we measure 
> how many records per second in "JsonTranslator", which is shown in the graph. 
> The difference between the two runs is just 1 line, commenting/un-commenting 
> "recStore.add(r)".
> {code}
> DataStream<String> stream = env.addSource(new GeneratorSource(loop));
> DataStream<Record> convert = stream.map(new JsonTranslator())
>                                    .keyBy(...)
>                                    .process(new ProcessAggregation())
>                                    .map(new PassthruFunction());
> 
> public class ProcessAggregation extends ProcessFunction<Record, Record> {
>     private ReducingState<Record> recStore;
> 
>     public void processElement(Record r, Context ctx, Collector<Record> out) {
>         recStore.add(r); // this line makes the difference
>     }
> }
> {code}
> Record is a POJO class containing 50 String private members.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9442) Flink Scaling not working

2018-05-26 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16491708#comment-16491708
 ] 

Ken Krugler commented on FLINK-9442:


Hi there - before opening a bug in Jira, it would be best to post to the Flink 
user mailing list (with these same details, which are great, though you always 
want to include the version of Flink being used).

> Flink Scaling not working
> -
>
> Key: FLINK-9442
> URL: https://issues.apache.org/jira/browse/FLINK-9442
> Project: Flink
>  Issue Type: Bug
>Reporter: swy
>Priority: Major
>
> Hi,
>  
> We are in the middle of testing the scaling ability of Flink, but we found 
> that scaling is not working, no matter whether we add more slots or more Task 
> Managers. We would expect linear, or at least close-to-linear, scaling 
> performance, but the results even show degradation. Any comments are 
> appreciated.
>  
> Test Details,
>  
> -VMWare vsphere
> - Just a simple pass-through test:
>     - auto-gen source, 3mil records, each 1kb in size, parallelism=1
>     - source passes into the next map operator, which just returns the same 
> record and sends a counter to statsD; parallelism in the test cases = 2, 4, 6
>  - 3 TMs, 6 slots total (2/TM); each JM/TM has 32 vCPUs, 100GB memory
>  - Result:
>   - 2 slots: 26 seconds, 3mil/26=115k TPS
>   - 4 slots: 23 seconds, 3mil/23=130k TPS
>   - 6 slots: 22 seconds, 3mil/22=136k TPS
>  
> As shown, the scaling is almost nothing, capped at ~120k TPS. Any clue? 
> Thanks.
>  
>  
>  
>  public class passthru extends RichMapFunction<String, String> {
>      public void open(Configuration configuration) throws Exception {
>          ... ... 
>          stats = new NonBlockingStatsDClient();
>      }
>      public String map(String value) throws Exception { 
>          ... ...
>          stats.increment(); 
>          return value;
>      }
>  }
>  public class datagen extends RichSourceFunction<String> {
>      ... ...
>      public void run(SourceContext<String> ctx) throws Exception {
>          int i = 0;
>          while (run) {
>              String idx = String.format("%09d", i);
>              ctx.collect("{\"...\"}");
>              i++;
>              if (i == loop) 
>                  run = false;
>          }
>      }
>      ... ...
>  }
>  public class Job {
>      public static void main(String[] args) throws Exception {
>          ... ...
>          DataStream<String> stream = env.addSource(new datagen(loop)).rebalance();
>          DataStream<String> convert = stream.map(new passthru(statsdUrl));
>          env.execute("Flink");
>      } 
>  }
> The reason for this sample test is that the Kafka source 
> FlinkKafkaConsumer011 faces the same issue and is not scalable, even though 
> FlinkKafkaConsumer011 already uses RichParallelSourceFunction. We always set 
> the Kafka partition count equal to the total number of TM slots, but the 
> result is still capped and does not improve linearly.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9300) Improve error message when in-memory state is too large

2018-05-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-9300:
--

 Summary: Improve error message when in-memory state is too large
 Key: FLINK-9300
 URL: https://issues.apache.org/jira/browse/FLINK-9300
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.4.2
Reporter: Ken Krugler


Currently, the {{MemCheckpointStreamFactory.checkSize()}} method can throw an 
{{IOException}} via:


{code:java}
throw new IOException(
    "Size of the state is larger than the maximum permitted memory-backed state. Size="
        + size + " , maxSize=" + maxSize
        + " . Consider using a different state backend, like the File System State backend.");
{code}

But this will happen even if you’re using the File System State backend.

This came up here: 
[https://stackoverflow.com/questions/50149005/ioexception-size-of-the-state-is-larger-than-the-maximum-permitted-memory-backe]

We could change the message to be:
{quote}Please consider increasing the maximum permitted memory size, increasing 
the task manager parallelism, or using a non-memory-based state backend such as 
RocksDB.
{quote}
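
For illustration, a sketch of the revised throw (same check, just the new 
message):

{code:java}
throw new IOException(
    "Size of the state is larger than the maximum permitted memory-backed state. Size="
        + size + ", maxSize=" + maxSize
        + ". Please consider increasing the maximum permitted memory size, increasing"
        + " the task manager parallelism, or using a non-memory-based state backend"
        + " such as RocksDB.");
{code}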
 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-9299) ProcessWindowFunction documentation Java examples have errors

2018-05-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-9299:
--

 Summary: ProcessWindowFunction documentation Java examples have 
errors
 Key: FLINK-9299
 URL: https://issues.apache.org/jira/browse/FLINK-9299
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Affects Versions: 1.4.2
Reporter: Ken Krugler


In looking at 
[https://ci.apache.org/projects/flink/flink-docs-release-1.4/dev/stream/operators/windows.html#processwindowfunction-with-incremental-aggregation],
 I noticed a few errors (a corrected sketch follows the list)...
 * "This allows to incrementally compute windows" should be "This allows it to 
incrementally compute windows"
 * The {{DataStream}} input declaration is missing its element type; it should 
be {{DataStream<Tuple2<String, Long>> input = ...;}}
 * The getResult() method needs to cast one of the accumulator values to a 
double, if that's what it is going to return.
 * MyProcessWindowFunction needs to extend, not implement, ProcessWindowFunction
 * MyProcessWindowFunction needs to implement a process() method, not an 
apply() method.
 * The call to .timeWindow takes a Time parameter, not a window assigner.
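
For concreteness, here's a sketch of how the corrected example could look 
(assuming the current {{AggregateFunction}} API, where {{add()}} returns the 
updated accumulator; it computes a per-key average of the Long field):

{code:java}
import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.windowing.ProcessWindowFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.streaming.api.windowing.windows.TimeWindow;
import org.apache.flink.util.Collector;

DataStream<Tuple2<String, Long>> input = ...;

input
    .keyBy(<key selector>)
    .timeWindow(Time.minutes(5)) // takes a Time, not a window assigner
    .aggregate(new AverageAggregate(), new MyProcessWindowFunction());

// Incremental aggregation: sums the Long field and counts events per window.
public static class AverageAggregate
        implements AggregateFunction<Tuple2<String, Long>, Tuple2<Long, Long>, Double> {
    @Override
    public Tuple2<Long, Long> createAccumulator() {
        return new Tuple2<>(0L, 0L);
    }

    @Override
    public Tuple2<Long, Long> add(Tuple2<String, Long> value, Tuple2<Long, Long> acc) {
        return new Tuple2<>(acc.f0 + value.f1, acc.f1 + 1L);
    }

    @Override
    public Double getResult(Tuple2<Long, Long> acc) {
        return ((double) acc.f0) / acc.f1; // explicit cast to double
    }

    @Override
    public Tuple2<Long, Long> merge(Tuple2<Long, Long> a, Tuple2<Long, Long> b) {
        return new Tuple2<>(a.f0 + b.f0, a.f1 + b.f1);
    }
}

// extends (not implements) ProcessWindowFunction, and overrides process(), not apply().
public static class MyProcessWindowFunction
        extends ProcessWindowFunction<Double, Tuple2<String, Double>, String, TimeWindow> {
    @Override
    public void process(String key, Context context,
                        Iterable<Double> averages, Collector<Tuple2<String, Double>> out) {
        out.collect(new Tuple2<>(key, averages.iterator().next()));
    }
}
{code}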

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9120) Task Manager Fault Tolerance issue

2018-04-02 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16422768#comment-16422768
 ] 

Ken Krugler commented on FLINK-9120:


Hi [~dhirajpraj] - this _might_ be a bug, but it's best to first ask questions 
on the mailing list before opening a Jira issue, thanks.

> Task Manager Fault Tolerance issue
> --
>
> Key: FLINK-9120
> URL: https://issues.apache.org/jira/browse/FLINK-9120
> Project: Flink
>  Issue Type: Bug
>  Components: Cluster Management, Configuration, Core
>Affects Versions: 1.4.2
>Reporter: dhiraj prajapati
>Priority: Critical
>
> Hi, 
> I have set up a Flink 1.4 cluster with 1 job manager and two task managers. 
> The configs taskmanager.numberOfTaskSlots and parallelism.default were set 
> to 2 on each node. I submitted a job to this cluster and it runs fine. To 
> test fault tolerance, I killed one task manager. I was expecting the job to 
> run fine because one of the 2 task managers was still up and running. 
> However, the job failed. Am I missing something? 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-9058) Relax ListState.addAll() and ListState.update() to take Iterable

2018-03-22 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-9058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1640#comment-1640
 ] 

Ken Krugler commented on FLINK-9058:


Would it make sense to file a related enhancement issue, for the 
{{ListCheckpointed}} methods to take/return iterables to avoid wasteful memory 
allocations, for the case where the state isn't also an in-memory list?
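
For reference, the interface as it stands (Flink's {{ListCheckpointed}}) forces 
a {{List}} in both directions, which is what such an enhancement would relax:

{code:java}
public interface ListCheckpointed<T extends Serializable> {
    List<T> snapshotState(long checkpointId, long timestamp) throws Exception;
    void restoreState(List<T> state) throws Exception;
}
{code}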

> Relax ListState.addAll() and ListState.update() to take Iterable
> 
>
> Key: FLINK-9058
> URL: https://issues.apache.org/jira/browse/FLINK-9058
> Project: Flink
>  Issue Type: Improvement
>  Components: State Backends, Checkpointing
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.5.0
>
>
> [~srichter] What do you think about this. None of the implementations require 
> the parameter to actually be a list and allowing an {{Iterable}} there allows 
> calling it in situations where all you have is an {{Iterable}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8883) ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as fatal

2018-03-06 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler updated FLINK-8883:
---
Summary: ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as 
fatal  (was: ExceptionUtils.rethrowIfFatalError should tread ThreadDeath as 
fatal.)

> ExceptionUtils.rethrowIfFatalError should treat ThreadDeath as fatal
> 
>
> Key: FLINK-8883
> URL: https://issues.apache.org/jira/browse/FLINK-8883
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
>Priority: Major
> Fix For: 1.5.0, 1.6.0
>
>
> Thread deaths leave code in inconsistent state and should thus always be 
> forwarded as fatal exceptions that cannot be handled in any way.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-8849) Wrong link from concepts/runtime to doc on chaining

2018-03-04 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-8849:
--

 Summary: Wrong link from concepts/runtime to doc on chaining
 Key: FLINK-8849
 URL: https://issues.apache.org/jira/browse/FLINK-8849
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: Ken Krugler


On https://ci.apache.org/projects/flink/flink-docs-master/concepts/runtime.html 
there's a link to "chaining docs" that currently points at:

https://ci.apache.org/projects/flink/flink-docs-master/dev/datastream_api.html#task-chaining-and-resource-groups

but it should link to:

https://ci.apache.org/projects/flink/flink-docs-master/dev/stream/operators/#task-chaining-and-resource-groups




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8848) bin/start-cluster.sh won't start jobmanager on master machine.

2018-03-04 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16385157#comment-16385157
 ] 

Ken Krugler commented on FLINK-8848:


Hi [~manifoldQAQ] - in general it's best to first post to the user mailing 
list, and then if (after discussion) it looks like a bug, you could open a Jira 
issue, thanks!

> bin/start-cluster.sh won't start jobmanager on master machine.
> --
>
> Key: FLINK-8848
> URL: https://issues.apache.org/jira/browse/FLINK-8848
> Project: Flink
>  Issue Type: Bug
>  Components: Cluster Management, Configuration
>Affects Versions: 1.4.1
> Environment: I login to a remote Ubuntu 16.04 server via ssh.
>  
> conf/masters: hostname:port
>Reporter: Yesheng Ma
>Priority: Major
>  Labels: easyfix, newbie, usability
>
> When I execute bin/start-cluster.sh on the master machine, the command 
> `nohup /bin/bash -l 
> /state/partition1/ysma/flink-1.4.1/bin/jobmanager.sh start cluster ...` is 
> executed, which does not start the job manager.
> I think there might be something wrong with the `-l` argument, since when I 
> use bin/jobmanager.sh start directly, everything is fine. Kindly point out if 
> I've done any configuration wrong.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available

2018-02-26 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16377679#comment-16377679
 ] 

Ken Krugler commented on FLINK-7477:


The odd stuff (some of which might be bogus)...
 # I had to explicitly add {{kryo.serializers}} as a dependency.
 # Ditto for {{org.jdom:jdom}}, which our {{Tika}} dependency should have 
pulled in transitively, but it was missing.
 # A bunch of stuff with getting integration tests working (including 
{{maven-failsafe-plugin}} and {{build-helper-maven-plugin}} among others), but 
that just happened to be at the same time as the AWS client class issue, so 
unrelated.

Not sure how different our (non-flink) shaded exclusion list wound up being 
from "regular" Flink, here's what it is now:

 
{code:java}
log4j:log4j
org.scala-lang:scala-library
org.scala-lang:scala-compiler
org.scala-lang:scala-reflect
com.data-artisans:flakka-actor_*
com.data-artisans:flakka-remote_*
com.data-artisans:flakka-slf4j_*
io.netty:netty-all
io.netty:netty
commons-fileupload:commons-fileupload
org.apache.avro:avro
commons-collections:commons-collections
org.codehaus.jackson:jackson-core-asl
org.codehaus.jackson:jackson-mapper-asl
com.thoughtworks.paranamer:paranamer
org.xerial.snappy:snappy-java
org.apache.commons:commons-compress
org.tukaani:xz
com.esotericsoftware.kryo:kryo
com.esotericsoftware.minlog:minlog
org.objenesis:objenesis
com.twitter:chill_*
com.twitter:chill-java
commons-lang:commons-lang
junit:junit
org.apache.commons:commons-lang3
org.slf4j:slf4j-api
org.slf4j:slf4j-log4j12
log4j:log4j
org.apache.commons:commons-math
org.apache.sling:org.apache.sling.commons.json
commons-logging:commons-logging
commons-codec:commons-codec
stax:stax-api
com.typesafe:config
org.uncommons.maths:uncommons-maths
com.github.scopt:scopt_*
commons-io:commons-io
commons-cli:commons-cli
{code}

> Use "hadoop classpath" to augment classpath when available
> --
>
> Key: FLINK-7477
> URL: https://issues.apache.org/jira/browse/FLINK-7477
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.4.0
>
>
> Currently, some cloud environments don't properly put the Hadoop jars into 
> {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should 
> check in {{config.sh}} if the {{hadoop}} binary is on the path and augment 
> our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in 
> our scripts.
> This will improve the out-of-box experience of users that otherwise have to 
> manually set {{HADOOP_CLASSPATH}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8717) Flink seems to deadlock due to buffer starvation when iterating

2018-02-21 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16371539#comment-16371539
 ] 

Ken Krugler commented on FLINK-8717:


Hi [~rrevol] - sadly the iteration improvements (FLIP-15, FLIP-16) seem to have 
stalled out.

> Flink seems to deadlock due to buffer starvation when iterating
> ---
>
> Key: FLINK-8717
> URL: https://issues.apache.org/jira/browse/FLINK-8717
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.4.0
> Environment: Windows 10 Pro 64-bit
> Core i7-6820HQ @ 2.7 GHz
> 16GB RAM
> Flink 1.4
> Scala client
> Scala 2.11.7
>  
>Reporter: Romain Revol
>Priority: Major
> Attachments: threadDump.txt
>
>
> We are encountering what looks like a deadlock of Flink in one of our jobs 
> with an "iterate" in it.
> I've reduced the job use case to the example in this gist: 
> [https://gist.github.com/rrevol/06ddfecd5f5ac7cbc67785b5d3a84dd4]
> Note that:
>  * varying the parallelism affects how quickly the deadlock occurs, but it 
> always occurs
>  * varying MAX_LOOP_NB does affect the deadlock: the higher it is, the faster 
> we encounter the deadlock. If MAX_LOOP_NB == 1, there is no deadlock. This 
> suggests that it happens when the number of iterations reaches some threshold.
> From the [^threadDump.txt], it looks like some starvation over buffer 
> allocation; maybe backpressure has flaws with iterate, but I may be mistaken 
> since I don't know Flink internals well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8717) Flink seems to deadlock due to buffer starvation when iterating

2018-02-20 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16370301#comment-16370301
 ] 

Ken Krugler commented on FLINK-8717:


Hi [~rrevol] - I believe what you're seeing is caused by circular network 
buffer deadlock due to fan-out, not starvation. See [Piotr's response to my 
email|http://mail-archives.apache.org/mod_mbox/flink-user/201801.mbox/%3CBFD8C506-5B41-47D8-B735-488D03842051%40data-artisans.com%3E].
 If this matches what you're seeing, then please close this issue as "not a 
bug". BTW, I'll be talking about this issue at Flink Forward during my session 
on creating a web crawler with Flink.

> Flink seems to deadlock due to buffer starvation when iterating
> ---
>
> Key: FLINK-8717
> URL: https://issues.apache.org/jira/browse/FLINK-8717
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.4.0
> Environment: Windows 10 Pro 64-bit
> Core i7-6820HQ @ 2.7 GHz
> 16GB RAM
> Flink 1.4
> Scala client
> Scala 2.11.7
>  
>Reporter: Romain Revol
>Priority: Major
> Attachments: threadDump.txt
>
>
> We are encountering what looks like a deadlock of Flink in one of our jobs 
> with an "iterate" in it.
> I've reduced the job use case to the example in this gist: 
> [https://gist.github.com/rrevol/06ddfecd5f5ac7cbc67785b5d3a84dd4]
> Note that:
>  * varying the parallelism affects how quickly the deadlock occurs, but it 
> always occurs
>  * varying MAX_LOOP_NB does affect the deadlock: the higher it is, the faster 
> we encounter the deadlock. If MAX_LOOP_NB == 1, there is no deadlock. This 
> suggests that it happens when the number of iterations reaches some threshold.
> From the [^threadDump.txt], it looks like some starvation over buffer 
> allocation; maybe backpressure has flaws with iterate, but I may be mistaken 
> since I don't know Flink internals well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available

2018-02-18 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16368677#comment-16368677
 ] 

Ken Krugler commented on FLINK-7477:


It works (at least when running with YARN via EMR). I believe that's because 
the version of Hadoop on the EMR master matches what we're running against; on 
my machine, I have to switch between multiple versions of Hadoop for various 
(consulting) clients who are on different versions of Hadoop, and my {{hadoop}} 
symlink wound up pointing to a different version of Hadoop than what Flink was 
using.

Related note - the 1.4 release fixed some shading issues we were running into 
with AWS client classes (mostly around {{HttpCore}} stuff), but to get 
everything working properly I felt like I did some voodoo with class exclusions 
in the {{maven-shade-plugin}} section of my {{pom.xml}}, which still feels 
fragile.

> Use "hadoop classpath" to augment classpath when available
> --
>
> Key: FLINK-7477
> URL: https://issues.apache.org/jira/browse/FLINK-7477
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.4.0
>
>
> Currently, some cloud environments don't properly put the Hadoop jars into 
> {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should 
> check in {{config.sh}} if the {{hadoop}} binary is on the path and augment 
> our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in 
> our scripts.
> This will improve the out-of-box experience of users that otherwise have to 
> manually set {{HADOOP_CLASSPATH}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available

2018-02-16 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16367750#comment-16367750
 ] 

Ken Krugler commented on FLINK-7477:


Hi [~aljoscha] - I encountered this issue when running locally (using 
{{bin/start-local.sh}}). And yes, on YARN I would expect that the Hadoop jars 
are added to the classpath on the nodes. The challenge comes from code that 
executes as part of creating/submitting the job, where it also needs Hadoop (or 
AWS) support, but you don't want to include those jars in the uber jar for 
obvious reasons. In that case ensuring the Hadoop/etc jars are on the classpath 
when main() is executing, _and_ they match the version being used by YARN, is 
critical and is a common source of problems (for Flink and regular Hadoop jobs).
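
For reference, the usual way to make this explicit on the client is a shell 
setup along these lines, which only helps if the {{hadoop}} command resolves to 
the same Hadoop version that YARN is running:

{code}
export HADOOP_CLASSPATH=$(hadoop classpath)
{code}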

> Use "hadoop classpath" to augment classpath when available
> --
>
> Key: FLINK-7477
> URL: https://issues.apache.org/jira/browse/FLINK-7477
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.4.0
>
>
> Currently, some cloud environments don't properly put the Hadoop jars into 
> {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should 
> check in {{config.sh}} if the {{hadoop}} binary is on the path and augment 
> our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in 
> our scripts.
> This will improve the out-of-box experience of users that otherwise have to 
> manually set {{HADOOP_CLASSPATH}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8658) NoClassDefFoundError: Could not initialize class org.elasticsearch.transport.client.PreBuiltTransportClient

2018-02-14 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16364597#comment-16364597
 ] 

Ken Krugler commented on FLINK-8658:


Hi [~sathyadev] - if you're sure this is a Flink bug, then opening a Jira is 
fine. But if it's not yet clear where the problem lies, you should first write 
to the user mailing list, and request input from the community. If that's the 
case, please close this issue and send the email, thanks.

> NoClassDefFoundError: Could not initialize class 
> org.elasticsearch.transport.client.PreBuiltTransportClient
> ---
>
> Key: FLINK-8658
> URL: https://issues.apache.org/jira/browse/FLINK-8658
> Project: Flink
>  Issue Type: Bug
>  Components: ElasticSearch Connector
>Affects Versions: 1.4.0
> Environment: Flink 1.4.0 , Easticsearch 5.1.1, Scala 2.11.0, kafka 
> 0.10.0, Ubuntu 17.04.
>  
>  
>Reporter: sathiyarajan
>Priority: Major
>  Labels: features
>   Original Estimate: 4m
>  Remaining Estimate: 4m
>
>  
> NoClassDefFoundError: Could not initialize class 
> org.elasticsearch.transport.client.PreBuiltTransportClient is displayed while 
> running Flink 1.4.0 with Elasticsearch 5.1.1.
>  
> Note: In IntelliJ SBT it's working fine, but the error appears when deploying 
> the same jar in Flink 1.4.0.
> Stack Trace:
> NoClassDefFoundError: Could not initialize class 
> org.elasticsearch.transport.client.PreBuiltTransportClient at
>  
> org.apache.flink.streaming.connectors.elasticsearch5.Elasticsearch5ApiCallBridge.createClient(Elasticsearch5ApiCallBridge.java:73)
> at 
> org.apache.flink.streaming.connectors.elasticsearch.ElasticsearchSinkBase.open(ElasticsearchSinkBase.java:281)
> at 
> org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:36)
> at 
> org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:102)
> at 
> org.apache.flink.streaming.api.operators.StreamSink.open(StreamSink.java:48)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.openAllOperators(StreamTask.java:393)
> at 
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:254)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:718)
> at java.lang.Thread.run(Thread.java:748)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-8625) Move OutputFlusher thread to Netty scheduled executor

2018-02-09 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler updated FLINK-8625:
---
Summary: Move OutputFlusher thread to Netty scheduled executor  (was: Move 
OutputFlasherThread to Netty scheduled executor)

> Move OutputFlusher thread to Netty scheduled executor
> -
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
>  Issue Type: Sub-task
>  Components: Network
>Reporter: Piotr Nowojski
>Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not 
> currently busy. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Closed] (FLINK-8615) Configuring Apache Flink Local Set up with Pseudo distributed Yarn

2018-02-08 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler closed FLINK-8615.
--
Resolution: Not A Bug

> Configuring Apache Flink Local Set up with Pseudo distributed Yarn
> --
>
> Key: FLINK-8615
> URL: https://issues.apache.org/jira/browse/FLINK-8615
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
> Environment: Mac Os, Hadoop 2.8.3 and Flink 1.4.0
>Reporter: Karrtik Iyer
>Priority: Major
>
> I have set up HADOOP 2.8.3 and YARN on my Mac machine in Pseudo distributed 
> mode, following this blog: [Hadoop In Pseudo Distributed 
> Mode|http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html]
>  I have been able to successfully start HDFS and YARN, and also to submit 
> MapReduce jobs.
> After that I downloaded Apache Flink 1.4 from 
> [here|http://redrockdigimark.com/apachemirror/flink/flink-1.4.0/flink-1.4.0-bin-hadoop28-scala_2.11.tgz].
>  Now I am trying to set up Flink on the above Yarn cluster by following the 
> steps 
> [here|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/yarn_setup.html].
>  When I try to run /bin/yarn-session.sh -n 4 -jm 1024 -tm 4096, I am getting 
> below error which I am unable to resolve. Can someone please advise and help 
> me with the same? 
>  
> {{Error while deploying YARN cluster: Couldn't deploy Yarn session cluster 
> java.lang.RuntimeException: Couldn't deploy Yarn session cluster at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:372)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:679)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511)
>  Caused by: org.apache.flink.configuration.IllegalConfigurationException: The 
> number of virtual cores per node were configured with 1 but Yarn only has -1 
> virtual cores available. Please note that the number of virtual cores is set 
> to the number of task slots by default unless configured in the Flink config 
> with 'yarn.containers.vcores.' at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:265)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:415)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:367)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8615) Configuring Apache Flink Local Set up with Pseudo distributed Yarn

2018-02-08 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357916#comment-16357916
 ] 

Ken Krugler commented on FLINK-8615:


https://flink.apache.org/community.html

> Configuring Apache Flink Local Set up with Pseudo distributed Yarn
> --
>
> Key: FLINK-8615
> URL: https://issues.apache.org/jira/browse/FLINK-8615
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
> Environment: Mac Os, Hadoop 2.8.3 and Flink 1.4.0
>Reporter: Karrtik Iyer
>Priority: Major
>
> I have set up HADOOP 2.8.3 and YARN on my Mac machine in Pseudo distributed 
> mode, following this blog: [Hadoop In Pseudo Distributed 
> Mode|http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html]
>  I have been able to successfully start HDFS and YARN, and also to submit 
> MapReduce jobs.
> After that I downloaded Apache Flink 1.4 from 
> [here|http://redrockdigimark.com/apachemirror/flink/flink-1.4.0/flink-1.4.0-bin-hadoop28-scala_2.11.tgz].
>  Now I am trying to set up Flink on the above Yarn cluster by following the 
> steps 
> [here|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/yarn_setup.html].
>  When I try to run /bin/yarn-session.sh -n 4 -jm 1024 -tm 4096, I am getting 
> below error which I am unable to resolve. Can someone please advise and help 
> me with the same? 
>  
> {{Error while deploying YARN cluster: Couldn't deploy Yarn session cluster 
> java.lang.RuntimeException: Couldn't deploy Yarn session cluster at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:372)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:679)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511)
>  Caused by: org.apache.flink.configuration.IllegalConfigurationException: The 
> number of virtual cores per node were configured with 1 but Yarn only has -1 
> virtual cores available. Please note that the number of virtual cores is set 
> to the number of task slots by default unless configured in the Flink config 
> with 'yarn.containers.vcores.' at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:265)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:415)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:367)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-8615) Configuring Apache Flink Local Set up with Pseudo distributed Yarn

2018-02-08 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-8615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16357418#comment-16357418
 ] 

Ken Krugler commented on FLINK-8615:


Hi Karrtik - for a support/configuration question like this, please post to the 
Flink user list first, versus opening a bug in Jira. It would be great if you 
could close this issue, thanks.

> Configuring Apache Flink Local Set up with Pseudo distributed Yarn
> --
>
> Key: FLINK-8615
> URL: https://issues.apache.org/jira/browse/FLINK-8615
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.4.0
> Environment: Mac Os, Hadoop 2.8.3 and Flink 1.4.0
>Reporter: Karrtik Iyer
>Priority: Major
>
> I have set up HADOOP 2.8.3 and YARN on my Mac machine in Pseudo distributed 
> mode, following this blog: [Hadoop In Pseudo Distributed 
> Mode|http://zhongyaonan.com/hadoop-tutorial/setting-up-hadoop-2-6-on-mac-osx-yosemite.html]
>  I have been able to successfully start HDFS and YARN, and also to submit 
> MapReduce jobs.
> After that I downloaded Apache Flink 1.4 from 
> [here|http://redrockdigimark.com/apachemirror/flink/flink-1.4.0/flink-1.4.0-bin-hadoop28-scala_2.11.tgz].
>  Now I am trying to set up Flink on the above Yarn cluster by following the 
> steps 
> [here|https://ci.apache.org/projects/flink/flink-docs-release-1.4/ops/deployment/yarn_setup.html].
>  When I try to run /bin/yarn-session.sh -n 4 -jm 1024 -tm 4096, I am getting 
> below error which I am unable to resolve. Can someone please advise and help 
> me with the same? 
>  
> {{Error while deploying YARN cluster: Couldn't deploy Yarn session cluster 
> java.lang.RuntimeException: Couldn't deploy Yarn session cluster at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:372)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.run(FlinkYarnSessionCli.java:679)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:514)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli$1.call(FlinkYarnSessionCli.java:511)
>  at java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:422) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
>  at 
> org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>  at 
> org.apache.flink.yarn.cli.FlinkYarnSessionCli.main(FlinkYarnSessionCli.java:511)
>  Caused by: org.apache.flink.configuration.IllegalConfigurationException: The 
> number of virtual cores per node were configured with 1 but Yarn only has -1 
> virtual cores available. Please note that the number of virtual cores is set 
> to the number of task slots by default unless configured in the Flink config 
> with 'yarn.containers.vcores.' at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.isReadyForDeployment(AbstractYarnClusterDescriptor.java:265)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deployInternal(AbstractYarnClusterDescriptor.java:415)
>  at 
> org.apache.flink.yarn.AbstractYarnClusterDescriptor.deploySessionCluster(AbstractYarnClusterDescriptor.java:367)}}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-7477) Use "hadoop classpath" to augment classpath when available

2018-01-31 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16347685#comment-16347685
 ] 

Ken Krugler commented on FLINK-7477:


I posted to the mailing list about an issue that this change seemed to create 
for me, but didn't hear back.
{quote}With Flink 1.4 and FLINK-7477, I ran into a problem with jar versions 
for HttpCore, when using the AWS SDK to read from S3.
I believe the issue is that even when setting classloader.resolve-order to 
child-first in flink-conf.yaml, the change to put all jars returned by “hadoop 
classpath” on the classpath means that classes in these jars are found before 
the classes in my shaded Flink uber jar.
If I ensure that I don’t have the “hadoop” command set up on my Bash path, then 
I don’t run into this issue.
Does this make sense, or is there something else going on that I can fix to 
avoid this situation?{quote}
 
Any input? Thanks...Ken
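
For reference, the setting mentioned in the quote, as it goes in 
flink-conf.yaml:

{code}
classloader.resolve-order: child-first
{code}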

> Use "hadoop classpath" to augment classpath when available
> --
>
> Key: FLINK-7477
> URL: https://issues.apache.org/jira/browse/FLINK-7477
> Project: Flink
>  Issue Type: Bug
>  Components: Startup Shell Scripts
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>Priority: Major
> Fix For: 1.4.0
>
>
> Currently, some cloud environments don't properly put the Hadoop jars into 
> {{HADOOP_CLASSPATH}} (or don't set {{HADOOP_CLASSPATH}}) at all. We should 
> check in {{config.sh}} if the {{hadoop}} binary is on the path and augment 
> our {{INTERNAL_HADOOP_CLASSPATHS}} with the result of {{hadoop classpath}} in 
> our scripts.
> This will improve the out-of-box experience of users that otherwise have to 
> manually set {{HADOOP_CLASSPATH}}.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (FLINK-7732) Test instability in Kafka end-to-end test (invalid Kafka offset)

2017-09-28 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-7732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler updated FLINK-7732:
---
Summary: Test instability in Kafka end-to-end test (invalid Kafka offset)  
(was: test instability in Kafa end-to-end test (invalid Kafka offset))

> Test instability in Kafka end-to-end test (invalid Kafka offset)
> 
>
> Key: FLINK-7732
> URL: https://issues.apache.org/jira/browse/FLINK-7732
> Project: Flink
>  Issue Type: Bug
>  Components: Kafka Connector, Tests
>Affects Versions: 1.4.0
>Reporter: Nico Kruber
>Priority: Critical
>  Labels: test-stability
>
> In a test run with unrelated changes in the network stack, the Kafka 
> end-to-end test was failing with an invalid offset:
> {code}
> 2017-09-28 06:34:10,736 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka version : 0.10.2.1
> 2017-09-28 06:34:10,744 INFO  org.apache.kafka.common.utils.AppInfoParser 
>   - Kafka commitId : e89bffd6b2eff799
> 2017-09-28 06:34:14,549 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-f67551b8-8867-41c5-b36e-4196d0f11e30:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-28 06:34:14,573 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-f67551b8-8867-41c5-b36e-4196d0f11e30:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-28 06:34:14,686 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-f67551b8-8867-41c5-b36e-4196d0f11e30:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-28 06:34:14,687 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Marking 
> the coordinator testing-gce-f67551b8-8867-41c5-b36e-4196d0f11e30:9092 (id: 
> 2147483647 rack: null) dead for group myconsumer
> 2017-09-28 06:34:14,792 INFO  
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator  - Discovered 
> coordinator testing-gce-f67551b8-8867-41c5-b36e-4196d0f11e30:9092 (id: 
> 2147483647 rack: null) for group myconsumer.
> 2017-09-28 06:34:15,068 INFO  
> org.apache.flink.runtime.state.DefaultOperatorStateBackend- 
> DefaultOperatorStateBackend snapshot (In-Memory Stream Factory, synchronous 
> part) in thread Thread[Async calls on Source: Custom Source -> Map -> Sink: 
> Unnamed (1/1),5,Flink Task Threads] took 948 ms.
> 2017-09-28 06:34:15,164 WARN  
> org.apache.flink.streaming.connectors.kafka.internal.Kafka09Fetcher  - 
> Committing offsets to Kafka failed. This does not compromise Flink's 
> checkpoints.
> java.lang.IllegalArgumentException: Invalid offset: -915623761772
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:687)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.doCommitOffsetsAsync(ConsumerCoordinator.java:531)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:499)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1181)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:217)
> 2017-09-28 06:34:15,171 ERROR 
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase  - Async 
> Kafka commit failed.
> java.lang.IllegalArgumentException: Invalid offset: -915623761772
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:687)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.doCommitOffsetsAsync(ConsumerCoordinator.java:531)
>   at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsAsync(ConsumerCoordinator.java:499)
>   at 
> org.apache.kafka.clients.consumer.KafkaConsumer.commitAsync(KafkaConsumer.java:1181)
>   at 
> org.apache.flink.streaming.connectors.kafka.internal.KafkaConsumerThread.run(KafkaConsumerThread.java:217)
> {code}
> https://travis-ci.org/apache/flink/jobs/280722829
> [~pnowojski] did a first analysis that revealed this:
> In 
> org/apache/flink/streaming/connectors/kafka/internal/Kafka09Fetcher.java:229 
> this is being sent:
> {{long offsetToCommit = lastProcessedOffset + 1;}}
> {{lastProcessedOffset}} comes from:
> {{org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumerBase#snapshotState}}
>  either lines 741 or 749
> The value that we see is strangely similar to 
> {{org.apache.flink.streaming.connectors.kafka.internals.KafkaTopicPartitionStateSentinel#GROUP_OFFSET}}
> {code}
> /**
>  

[jira] [Commented] (FLINK-3839) Support wildcards in classpath parameters

2016-06-13 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15327406#comment-15327406
 ] 

Ken Krugler commented on FLINK-3839:


-C (or --classpath) seems like the right approach.

Interesting question about whether to treat a trailing / as the wildcard (as per 
the URLClassLoader docs) or force it to be /* (as per Java's -cp parameter). 
I'd favor the latter, as it feels more consistent with what I'd expect.

> Support wildcards in classpath parameters
> -
>
> Key: FLINK-3839
> URL: https://issues.apache.org/jira/browse/FLINK-3839
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ken Krugler
>Assignee: Robert Thorman
>Priority: Minor
>
> Currently you can only specify a single explicit jar with the CLI --classpath 
> file:// parameter. Java (since 1.6) has allowed you to use -cp with a path 
> ending in /* as a way of adding every file that ends in .jar in a 
> directory.
> This would simplify things, e.g. when running on EMR you have to add roughly 
> 120 jars explicitly, but these are all located in just two directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3839) Support wildcards in classpath parameters

2016-06-12 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15326513#comment-15326513
 ] 

Ken Krugler commented on FLINK-3839:


Hi Robert,

From my email on this...

bq. But this doesn’t seem to work. I believe it’s because JDK tools do the 
expansion before creating the URLs used by the classloader, but Flink code 
doesn’t do any such special processing, and just creates URLs - see 
ProgramOptions.java, via {{classpaths.add(new URL(path));}}

So take a look at ProgramOptions.java as a place to start.
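
For illustration, a minimal sketch (not Flink's actual code) of what that 
expansion could look like, reusing the same java.net.URL handling that 
ProgramOptions.java already does for plain paths:
{code:java}
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

public class ClasspathWildcardExpander {

    public static List<URL> expand(String path) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        if (path.endsWith("/*")) {
            // Expand the wildcard ourselves, the way the JDK tools do for -cp.
            File dir = new File(path.substring(0, path.length() - 2));
            File[] jars = dir.listFiles((d, name) -> name.endsWith(".jar"));
            if (jars != null) {
                for (File jar : jars) {
                    urls.add(jar.toURI().toURL());
                }
            }
        } else {
            // Current behavior: a single explicit jar URL.
            urls.add(new URL(path));
        }
        return urls;
    }
}
{code}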

> Support wildcards in classpath parameters
> -
>
> Key: FLINK-3839
> URL: https://issues.apache.org/jira/browse/FLINK-3839
> Project: Flink
>  Issue Type: Improvement
>Reporter: Ken Krugler
>Assignee: Robert Thorman
>Priority: Minor
>
> Currently you can only specify a single explicit jar with the CLI --classpath 
> file:// parameter. Java (since 1.6) has allowed you to use -cp with a path 
> ending in /* as a way of adding every file that ends in .jar in a 
> directory.
> This would simplify things, e.g. when running on EMR you have to add roughly 
> 120 jars explicitly, but these are all located in just two directories.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3985) A metric with the name * was already registered

2016-05-28 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15305411#comment-15305411
 ] 

Ken Krugler commented on FLINK-3985:


Any chance this is related to the change made for [FLINK-3880]?

> A metric with the name * was already registered
> ---
>
> Key: FLINK-3985
> URL: https://issues.apache.org/jira/browse/FLINK-3985
> Project: Flink
>  Issue Type: Bug
>  Components: Metrics
>Affects Versions: 1.1.0
>Reporter: Robert Metzger
>  Labels: test-stability
>
> The YARN tests detected the following failure while running WordCount.
> {code}
> 2016-05-27 21:50:48,230 INFO  org.apache.flink.yarn.YarnTaskManager   
>   - Received task CHAIN DataSource (at main(WordCount.java:70) 
> (org.apache.flink.api.java.io.TextInputFormat)) -> FlatMap (FlatMap at 
> main(WordCount.java:80)) -> Combine(SUM(1), at main(WordCount.java:83) (1/2)
> 2016-05-27 21:50:48,231 ERROR org.apache.flink.metrics.reporter.JMXReporter   
>   - A metric with the name 
> org.apache.flink.metrics:key0=testing-worker-linux-docker-6e03e1e8-3385-linux-1,key1=taskmanager,key2=ee7c10183f32c9a96f8e7cfd873863d1,key3=WordCount_Example,key4=CHAIN_DataSource_(at_main(WordCount.java-70)_(org.apache.flink.api.java.io.TextInputFormat))_->_FlatMap_(FlatMap_at_main(WordCount.java-80))_->_Combine(SUM(1)-_at_main(WordCount.java-83),name=numBytesIn
>  was already registered.
> javax.management.InstanceAlreadyExistsException: 
> org.apache.flink.metrics:key0=testing-worker-linux-docker-6e03e1e8-3385-linux-1,key1=taskmanager,key2=ee7c10183f32c9a96f8e7cfd873863d1,key3=WordCount_Example,key4=CHAIN_DataSource_(at_main(WordCount.java-70)_(org.apache.flink.api.java.io.TextInputFormat))_->_FlatMap_(FlatMap_at_main(WordCount.java-80))_->_Combine(SUM(1)-_at_main(WordCount.java-83),name=numBytesIn
>   at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
>   at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
>   at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>   at 
> org.apache.flink.metrics.reporter.JMXReporter.notifyOfAddedMetric(JMXReporter.java:76)
>   at 
> org.apache.flink.metrics.MetricRegistry.register(MetricRegistry.java:177)
>   at 
> org.apache.flink.metrics.groups.AbstractMetricGroup.addMetric(AbstractMetricGroup.java:191)
>   at 
> org.apache.flink.metrics.groups.AbstractMetricGroup.counter(AbstractMetricGroup.java:144)
>   at 
> org.apache.flink.metrics.groups.IOMetricGroup.(IOMetricGroup.java:40)
>   at 
> org.apache.flink.metrics.groups.TaskMetricGroup.(TaskMetricGroup.java:74)
>   at 
> org.apache.flink.metrics.groups.JobMetricGroup.addTask(JobMetricGroup.java:74)
>   at 
> org.apache.flink.metrics.groups.TaskManagerMetricGroup.addTaskForJob(TaskManagerMetricGroup.java:86)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.submitTask(TaskManager.scala:1093)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$handleTaskMessage(TaskManager.scala:442)
>   at 
> org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$handleMessage$1.applyOrElse(TaskManager.scala:284)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LeaderSessionMessageFilter$$anonfun$receive$1.applyOrElse(LeaderSessionMessageFilter.scala:36)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
>   at 
> scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:33)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.apply(LogMessages.scala:28)
>   at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
>   at 
> org.apache.flink.runtime.LogMessages$$anon$1.applyOrElse(LogMessages.scala:28)
>   at 

[jira] [Commented] (FLINK-3895) Remove Focs for Program Packaging via Plans

2016-05-10 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15278231#comment-15278231
 ] 

Ken Krugler commented on FLINK-3895:


Maybe "Remove Docs for Program..."? :)


> Remove Focs for Program Packaging via Plans
> ---
>
> Key: FLINK-3895
> URL: https://issues.apache.org/jira/browse/FLINK-3895
> Project: Flink
>  Issue Type: Sub-task
>  Components: Documentation
>Affects Versions: 1.0.2
>Reporter: Stephan Ewen
> Fix For: 1.1.0
>
>
> As a first step, we should remove the docs that describe packaging via plans, 
> in order to avoid confusion among users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3880) Use ConcurrentHashMap for Accumulators

2016-05-06 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3880:
--

 Summary: Use ConcurrentHashMap for Accumulators
 Key: FLINK-3880
 URL: https://issues.apache.org/jira/browse/FLINK-3880
 Project: Flink
  Issue Type: Improvement
Affects Versions: 1.1.0
Reporter: Ken Krugler
Priority: Minor


I was looking at improving DataSet performance - this is for a job created 
using the Cascading-Flink planner for Cascading 3.1.

While doing a quick "poor man's profiler" session with one of the TaskManager 
processes, I noticed that many (most?) of the threads that were actually 
running were in this state:

{code:java}
"DataSource (/working1/terms) (8/20)" daemon prio=10 tid=0x7f55673e0800 
nid=0x666a runnable [0x7f556abcf000]
   java.lang.Thread.State: RUNNABLE
at java.util.Collections$SynchronizedMap.get(Collections.java:2037)
- locked <0x0006e73fe718> (a java.util.Collections$SynchronizedMap)
at 
org.apache.flink.api.common.functions.util.AbstractRuntimeUDFContext.getAccumulator(AbstractRuntimeUDFContext.java:162)
at 
org.apache.flink.api.common.functions.util.AbstractRuntimeUDFContext.getLongCounter(AbstractRuntimeUDFContext.java:113)
at 
com.dataartisans.flink.cascading.runtime.util.FlinkFlowProcess.getOrInitCounter(FlinkFlowProcess.java:245)
at 
com.dataartisans.flink.cascading.runtime.util.FlinkFlowProcess.increment(FlinkFlowProcess.java:128)
at 
com.dataartisans.flink.cascading.runtime.util.FlinkFlowProcess.increment(FlinkFlowProcess.java:122)
at 
cascading.tap.hadoop.util.MeasuredRecordReader.next(MeasuredRecordReader.java:65)
at cascading.scheme.hadoop.SequenceFile.source(SequenceFile.java:97)
at 
cascading.tuple.TupleEntrySchemeIterator.getNext(TupleEntrySchemeIterator.java:166)
at 
cascading.tuple.TupleEntrySchemeIterator.hasNext(TupleEntrySchemeIterator.java:139)
at 
com.dataartisans.flink.cascading.runtime.source.TapSourceStage.readNextRecord(TapSourceStage.java:70)
at 
com.dataartisans.flink.cascading.runtime.source.TapInputFormat.reachedEnd(TapInputFormat.java:175)
at 
org.apache.flink.runtime.operators.DataSourceTask.invoke(DataSourceTask.java:173)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
at java.lang.Thread.run(Thread.java:745)
{code}

It looks like Cascading is asking Flink to increment a counter with each Tuple 
read, and that in turn is often blocked on getting access to the Accumulator 
object in a map. The map in question is a Collections$SynchronizedMap, so every 
get() contends on a single monitor; using a ConcurrentHashMap (for example) 
would reduce this contention.
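
A minimal sketch of the proposed change (class and field names are 
illustrative, not Flink's actual code in AbstractRuntimeUDFContext):
{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class AccumulatorRegistrySketch {

    // Before: Collections.synchronizedMap(new HashMap<>()) - every get()
    // acquires the same monitor, which is what the stack trace above shows.

    // After: reads are lock-free, and writes only contend per hash bin.
    private final Map<String, Object> accumulators = new ConcurrentHashMap<>();

    public Object getAccumulator(String name) {
        // No monitor acquisition on this hot path anymore.
        return accumulators.get(name);
    }
}
{code}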




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3878) File cache doesn't support multiple duplicate temp directories

2016-05-05 Thread Ken Krugler (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15273572#comment-15273572
 ] 

Ken Krugler commented on FLINK-3878:


I confirmed this fixes the issue - I was able to run with four copies of the 
temp directory (on AWS EMR).
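
For anyone reproducing this, a configuration along these lines (paths made up; 
key and separator as in the config docs referenced in the issue) triggers it:
{code}
taskmanager.tmp.dirs: /mnt/tmp:/mnt/tmp:/mnt/tmp:/mnt/tmp
{code}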

> File cache doesn't support multiple duplicate temp directories
> --
>
> Key: FLINK-3878
> URL: https://issues.apache.org/jira/browse/FLINK-3878
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime, Local Runtime
>Affects Versions: 1.1.0, 1.0.2
>Reporter: Ken Krugler
> Attachments: FLINK-3878.patch
>
>
> Based on 
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html, you 
> should be able to specify the same temp directory name multiple times. This 
> works for some of the Flink infrastructure (e.g. the TaskManager's temp file 
> directory), but not for FileCache.
> The problem is that the FileCache() constructor tries to use the same random 
> directory name for each of the specified temp dir locations, so after the 
> first directory is created, creating the second one fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-3878) File cache doesn't support multiple duplicate temp directories

2016-05-05 Thread Ken Krugler (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ken Krugler updated FLINK-3878:
---
Attachment: FLINK-3878.patch

> File cache doesn't support multiple duplicate temp directories
> --
>
> Key: FLINK-3878
> URL: https://issues.apache.org/jira/browse/FLINK-3878
> Project: Flink
>  Issue Type: Bug
>Affects Versions: 1.0.2
>Reporter: Ken Krugler
> Attachments: FLINK-3878.patch
>
>
> Based on 
> https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html, you 
> should be able to specify the same temp directory name multiple times. This 
> works for some of the Flink infrastructure (e.g. the TaskManager's temp file 
> directory), but not for FileCache.
> The problem is that the FileCache() constructor tries to use the same random 
> directory name for each of the specified temp dir locations, so after the 
> first directory is created, creating the second one fails.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3878) File cache doesn't support multiple duplicate temp directories

2016-05-05 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3878:
--

 Summary: File cache doesn't support multiple duplicate temp 
directories
 Key: FLINK-3878
 URL: https://issues.apache.org/jira/browse/FLINK-3878
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.0.2
Reporter: Ken Krugler


Based on 
https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html, you 
should be able to specify the same temp directory name multiple times. This 
works for some of the Flink infrastructure (e.g. the TaskManager's temp file 
directory), but not for FileCache.

The problem is that the FileCache() constructor tries to use the same random 
directory name for each of the specified temp dir locations, so after the first 
directory is created, creating the second one fails.
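
A minimal sketch of the fix idea (illustrative, not the actual FileCache code): 
generate a fresh random subdirectory name per configured temp dir, instead of 
reusing one random name across all of them:
{code:java}
import java.io.File;
import java.util.UUID;

public class FileCacheDirsSketch {

    public static File[] createStorageDirs(String[] tempDirs) {
        File[] storageDirs = new File[tempDirs.length];
        for (int i = 0; i < tempDirs.length; i++) {
            // Fresh random name per entry, so "/tmp:/tmp" yields two distinct
            // directories instead of a failed second mkdirs().
            File dir = new File(tempDirs[i], "flink-dist-cache-" + UUID.randomUUID());
            if (!dir.mkdirs()) {
                throw new RuntimeException("Could not create storage directory " + dir);
            }
            storageDirs[i] = dir;
        }
        return storageDirs;
    }
}
{code}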



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3839) Support wildcards in classpath parameters

2016-04-27 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3839:
--

 Summary: Support wildcards in classpath parameters
 Key: FLINK-3839
 URL: https://issues.apache.org/jira/browse/FLINK-3839
 Project: Flink
  Issue Type: Improvement
Reporter: Ken Krugler
Priority: Minor


Currently you can only specify a single explicit jar with the CLI --classpath 
file:// parameter. Java (since 1.6) has allowed you to use -cp with a path 
ending in /* as a way of adding every file that ends in .jar in a 
directory.

This would simplify things, e.g. when running on EMR you have to add roughly 
120 jars explicitly, but these are all located in just two directories.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-3838) CLI parameter parser is munging application params

2016-04-27 Thread Ken Krugler (JIRA)
Ken Krugler created FLINK-3838:
--

 Summary: CLI parameter parser is munging application params
 Key: FLINK-3838
 URL: https://issues.apache.org/jira/browse/FLINK-3838
 Project: Flink
  Issue Type: Bug
  Components: Command-line client
Affects Versions: 1.0.2
Reporter: Ken Krugler
Priority: Minor


If parameters for an application use a single '-' (e.g. -maxtasks), the CLI 
argument parser munges them. The app gets passed either just the parameter name 
(e.g. 'maxtasks') when the start of the parameter doesn't match a Flink 
parameter, or two values, with the first being the part that matched (e.g. 
'-m') and the second being the rest (e.g. 'axtasks').

The parser should ignore everything after the jar path parameter.

Note that using a double dash (e.g. --maxtasks) does seem to work.
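
To make the behavior concrete, a hypothetical invocation (the jar name and 
option are made up):
{code}
bin/flink run myjob.jar -maxtasks 20
# '-m' matches a Flink option, so instead of '-maxtasks' the application
# sees the split values '-m' and 'axtasks' (followed by '20').

bin/flink run myjob.jar --maxtasks 20
# With a double dash, the parameter reaches the application intact.
{code}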



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)