[ANNOUNCE] Apache Flink Kubernetes Operator 1.8.0 released

2024-03-25 Thread Maximilian Michels
The Apache Flink community is very happy to announce the release of the Apache Flink Kubernetes Operator version 1.8.0. The Flink Kubernetes Operator allows users to manage their Apache Flink applications on Kubernetes through all aspects of their lifecycle. Release highlights: - Flink

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-12 Thread Maximilian Michels
king forward to your feedback, thanks~ > > [1] https://github.com/apache/flink/pull/23247#discussion_r1422626734 > [2] > https://github.com/apache/flink/assets/38427477/642c57e0-b415-4326-af05-8b506c5fbb3a > [3] https://issues.apache.org/jira/browse/FLINK-33736 > > Best, > Ru

Re: [DISCUSS] Change the default restart-strategy to exponential-delay

2023-12-07 Thread Maximilian Michels
Hey Rui, +1 for changing the default restart strategy to exponential-delay. This is something all users eventually run into. They end up changing the restart strategy to exponential-delay. I think the current defaults are quite balanced. Restarts happen quickly enough unless there are consecutive
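
For reference, the same strategy can also be pinned explicitly per job; a minimal sketch, assuming a Flink release whose RestartStrategies factory exposes exponentialDelayRestart (the parameter values are illustrative, not the proposed defaults):

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ExponentialDelayRestartExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // Back off 1s, 2s, 4s, ... capped at 1 min; reset after 1h without failures; 10% jitter.
            env.setRestartStrategy(RestartStrategies.exponentialDelayRestart(
                    Time.seconds(1),   // initial backoff
                    Time.minutes(1),   // max backoff
                    2.0,               // backoff multiplier
                    Time.hours(1),     // reset backoff threshold
                    0.1));             // jitter factor
            // ... define and execute the actual pipeline here
        }
    }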

Re: Flink operator autoscaler scaling down

2023-11-11 Thread Maximilian Michels
Hi Yang, We're always open to changes / additions to the autoscaler logic and metric collection. Preferably, we change these directly in the autoscaler implementation, without adding additional processes or controllers. Let us know how your experiments go! If you want to contribute, a JIRA with a

Re: [ANNOUNCE] Apache Flink Kubernetes Operator 1.5.0 released

2023-05-23 Thread Maximilian Michels
Niceee. Thanks for managing the release, Gyula! -Max On Wed, May 17, 2023 at 8:25 PM Márton Balassi wrote: > > Thanks, awesome! :-) > > On Wed, May 17, 2023 at 2:24 PM Gyula Fóra wrote: >> >> The Apache Flink community is very happy to announce the release of Apache >> Flink Kubernetes

Re: support escaping `#` in flink job spec in Flink-operator

2022-11-08 Thread Maximilian Michels
parser. So it needs to be something else. In the meantime, we could at least add support for escapes in the configuration parser. CC dev mailing list -Max On Tue, Nov 8, 2022 at 2:26 PM Maximilian Michels wrote: > The job fails when starting because its arguments are passed through the >

Re: support escaping `#` in flink job spec in Flink-operator

2022-11-08 Thread Maximilian Michels
The job fails when starting because its arguments are passed through the Flink configuration in application deployment mode. >This is a known limit of the current Flink options parser. Refer to FLINK-15358[1] for more information. Exactly. The issue stems from the

Re: Flink SQL and checkpoints and savepoints

2021-01-30 Thread Maximilian Michels
It is true that there are no strict upgrade guarantees. However, looking at the code, it appears RowSerializer supports adding new fields to Row - as long as no fields are modified or deleted. Haven't tried this out but it looks like the code would only restore existing fields and incorporate

Re: Webinar: Unlocking the Power of Apache Beam with Apache Flink

2020-05-28 Thread Maximilian Michels
Thanks to everyone who joined and asked questions. Really enjoyed this new format! -Max On 28.05.20 08:09, Marta Paes Moreira wrote: > Thanks for sharing, Aizhamal - it was a great webinar! > > Marta > > On Wed, 27 May 2020 at 23:17, Aizhamal Nurmamat kyzy > mailto:aizha...@apache.org>> wrote:

Re: Flink logging issue with logback

2020-01-09 Thread Maximilian Michels
FYI, there is also a PR: https://github.com/apache/flink/pull/10811 On 09.01.20 01:53, Bajaj, Abhinav wrote: Thanks Dawid, Max and Yang for confirming the issue and providing potential workaround. On 1/8/20, 3:24 AM, "Maximilian Michels" wrote: Interesting that we c

Re: Flink logging issue with logback

2020-01-08 Thread Maximilian Michels
Interesting that we came across this problem at the same time. We have observed this with Lyft's K8s operator which uses the Rest API for job submission, much like the Flink dashboard. Note that you can restore the original stdout/stderr in your program: private static void
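
The code sample is cut off by the archive; a minimal sketch of the idea — capture the JVM's original streams before anything redirects them, then swap them back — could look like this (the method name is illustrative, not the one from the original mail):

    import java.io.PrintStream;

    public final class StdStreams {
        // Captured at class-load time, before the job submission machinery redirects them.
        private static final PrintStream ORIGINAL_OUT = System.out;
        private static final PrintStream ORIGINAL_ERR = System.err;

        private static void restoreOriginalStreams() {
            System.setOut(ORIGINAL_OUT);
            System.setErr(ORIGINAL_ERR);
        }

        public static void main(String[] args) {
            restoreOriginalStreams();
            System.out.println("printed to the real stdout again");
        }
    }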

Re: Default restart behavior with checkpointing

2016-12-06 Thread Maximilian Michels
Very good question! As the documentation mentions, the old way was to use `setNumberOfExecutionRetries` but it has been replaced by `setRestartStrategy`. If you don't configure anything, then your job will _not_ be restarted. However, if you have enabled checkpointing, then your application will
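
A short sketch of the two settings mentioned above (the interval and retry values are illustrative):

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class RestartConfigExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            env.enableCheckpointing(10_000); // checkpoint every 10 seconds
            // Make the restart behavior explicit instead of relying on the checkpointing default:
            env.setRestartStrategy(RestartStrategies.fixedDelayRestart(3, Time.seconds(10)));
            // ... define and execute the actual pipeline here
        }
    }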

Re: How to let 1.1.3 not drop late events as verion 1.0.3 does

2016-11-29 Thread Maximilian Michels
Setting allowedLateness to Long.MAX_VALUE and returning TriggerResult.FIRE_AND_PURGE in the custom trigger should do the trick. -Max On Mon, Nov 28, 2016 at 2:57 PM, vinay patil wrote: > Hi Sendoh, > > I have used the Custom Trigger which is same as 1.0.3
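
A compact sketch of that combination; the trigger below is a stand-in for the poster's custom trigger, which the archive does not show:

    import org.apache.flink.streaming.api.windowing.triggers.Trigger;
    import org.apache.flink.streaming.api.windowing.triggers.TriggerResult;
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow;

    // Fires and purges on every element, so late records are emitted instead of dropped.
    public class FireAndPurgeTrigger extends Trigger<Object, TimeWindow> {

        @Override
        public TriggerResult onElement(Object element, long timestamp, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.FIRE_AND_PURGE;
        }

        @Override
        public TriggerResult onProcessingTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public TriggerResult onEventTime(long time, TimeWindow window, TriggerContext ctx) {
            return TriggerResult.CONTINUE;
        }

        @Override
        public void clear(TimeWindow window, TriggerContext ctx) {
            // no custom trigger state to clean up
        }

        // Usage sketch:
        //   stream.keyBy(...)
        //         .window(TumblingEventTimeWindows.of(Time.minutes(1)))
        //         .allowedLateness(Time.milliseconds(Long.MAX_VALUE))
        //         .trigger(new FireAndPurgeTrigger())
        //         .apply(...)
    }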

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-11-22 Thread Maximilian Michels
r to add it to the ship files in > the > cluster descriptor" Can you please tell me how to do this ? > > Regards, > Vinay Patil > > On Fri, Nov 18, 2016 at 4:12 AM, Maximilian Michels [via Apache Flink User > Mailing List archive.] <[hidden email]> wrote: >> >> Th

Re: FLINK-2821 - Flink on Kubernetes

2016-11-22 Thread Maximilian Michels
Hi Aparup, Could you go into a bit more detail on what you're trying to do and what kind of errors you're facing? Thanks, Max -Max On Fri, Nov 18, 2016 at 1:29 AM, Aparup Banerjee (apbanerj) wrote: > Hi Max, > > > > I am running into an issue on running flink on

Re: Can not stop cluster gracefully

2016-11-22 Thread Maximilian Michels
The stop script relies on a file in the /tmp directory (location can be changed by setting env.pid.dir in the Flink config). If that file somehow gets cleanup up occasionally, the stop script can't find the process identifiers inside that file to kill the processes. Another explanation could be

Re: Flink application and curator integration issues

2016-11-22 Thread Maximilian Michels
As far as I know we're shading Curator so you shouldn't run into class conflicts. Have you checked that Curator is included in your jar? -Max On Tue, Nov 22, 2016 at 9:30 AM, Liu Tongwei wrote: > Hi all, > > I'm using flink 1.1.3. I need to use the curator inside the

Re: PathIsNotEmptyDirectoryException in Namenode HDFS log when using Jobmanager HA in YARN

2016-11-22 Thread Maximilian Michels
This could be related to https://issues.apache.org/jira/browse/FLINK-5063 where some issues related to the cleanup of checkpointing files were fixed. -Max On Mon, Nov 21, 2016 at 10:05 PM, static-max wrote: > Update: I deleted the /flink/recovery folder on HDFS and

Re: ContinuousEventTimeTrigger breaks coGrouped windowed streams?

2016-11-22 Thread Maximilian Michels
Hi William, I've reproduced your example locally for some toy data and everything was working as expected (with the early triggering). So I'm assuming either there is something wrong with your input data or the behavior doesn't always manifest. Here's the example I run in case you want to try:

Re: flink-dist shading

2016-11-22 Thread Maximilian Michels
Hi Craig, I've left a comment on the original Maven JIRA issue to revive the discussion. For BigTop, you can handle this in the build script by building flink-dist again after a successful build. That will always work independently of the Maven 3.x version. -Max On Mon, Nov 21, 2016 at 6:27

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-11-17 Thread Maximilian Michels
The JVM only accepts Jar files in the classpath. You will have to load your custom files from the working directory of the node where the lib directory is shipped. By the way, the /lib directory is meant for Jar files. If you want to ship a custom file, it's better to add it to the ship files in

Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs

2016-11-04 Thread Maximilian Michels
Hi Anchit, The documentation mentions that you need Zookeeper in addition to setting the application attempts. Zookeeper is needed to retrieve the current leader for the client and to filter out old leaders in case multiple exist (old processes could even stay alive in Yarn). Moreover, it is

Re: Freeing resources in SourceFunction

2016-11-03 Thread Maximilian Michels
For your use case you should use the close() method which is always called upon shutdown of your source. The cancel() is only called when you explicitly cancel your job. -Max On Thu, Nov 3, 2016 at 2:45 PM, Yury Ruchin wrote: > Hello, > > I'm writing a custom source
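
A sketch of how the two hooks are typically split (the file-based resource is just an example):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class FileLineSource extends RichSourceFunction<String> {

        private volatile boolean running = true;
        private transient BufferedReader reader;

        @Override
        public void open(Configuration parameters) throws Exception {
            reader = new BufferedReader(new FileReader("/tmp/input.txt"));
        }

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            String line;
            while (running && (line = reader.readLine()) != null) {
                ctx.collect(line);
            }
        }

        @Override
        public void cancel() {
            // Invoked only on explicit cancellation: just ask run() to stop.
            running = false;
        }

        @Override
        public void close() throws Exception {
            // Invoked on every shutdown path: free resources here.
            if (reader != null) {
                reader.close();
            }
        }
    }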

Re: Testing DataStreams

2016-11-03 Thread Maximilian Michels
Hi Juan, StreamingMultipleProgramsTestBase is in the testing scope. Thus, is it not bundled in the normal jars. You would have to add the flink-test-utils_2.10 module. It is true that there is no guide. There is https://github.com/ottogroup/flink-spector for testing streaming pipelines. For

Re: BoundedOutOfOrdernessTimestampExtractor and timestamps in the future

2016-11-03 Thread Maximilian Michels
The BoundedOutOfOrdernessTimestampExtractor is not really useful if you have outliers because you always set the Watermark to the element with the largest timestamp minus the out-of-orderness. If your data is of such nature, you will have to implement a custom Watermark extractor to deal with
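
One possible shape of such a custom extractor, written against the 2016-era periodic-watermark interface; MyEvent and its getter are placeholders, and the clamping policy is only an example:

    import org.apache.flink.streaming.api.functions.AssignerWithPeriodicWatermarks;
    import org.apache.flink.streaming.api.watermark.Watermark;

    // Lets outliers keep their own timestamps but stops them from advancing the watermark.
    public class OutlierTolerantWatermarks implements AssignerWithPeriodicWatermarks<MyEvent> {

        private static final long MAX_OUT_OF_ORDERNESS_MS = 10_000; // 10 seconds
        private static final long MAX_FUTURE_DRIFT_MS = 60_000;     // ignore timestamps > now + 1 minute

        // Start low enough that the first watermark does not underflow.
        private long maxPlausibleTimestamp = Long.MIN_VALUE + MAX_OUT_OF_ORDERNESS_MS;

        @Override
        public long extractTimestamp(MyEvent element, long previousElementTimestamp) {
            long ts = element.getEventTimeMillis();
            if (ts <= System.currentTimeMillis() + MAX_FUTURE_DRIFT_MS) {
                maxPlausibleTimestamp = Math.max(maxPlausibleTimestamp, ts);
            }
            return ts;
        }

        @Override
        public Watermark getCurrentWatermark() {
            return new Watermark(maxPlausibleTimestamp - MAX_OUT_OF_ORDERNESS_MS);
        }
    }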

Re: native snappy library not available

2016-10-19 Thread Maximilian Michels
The Hadoop config of your Hadoop installation which is loaded in SequenceFileWriter.open() needs to be configured to have "io.compression.codecs" set to include "SnappyCodec". This is probably described in the Hadoop documentation. -Max On Wed, Oct 19, 2016 at 6:09 PM,

Re: Read Apache Kylin from Apache Flink

2016-10-19 Thread Maximilian Michels
Thanks for the guide, Alberto! -Max On Tue, Oct 18, 2016 at 10:20 PM, Till Rohrmann wrote: > Great to see Alberto. Thanks for sharing it with the community :-) > > Cheers, > Till > > On Tue, Oct 18, 2016 at 7:40 PM, Alberto Ramón > wrote: >> >>

Re: First Program with WordCount - Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/flink/api/common/functions/FlatMapFunction

2016-10-19 Thread Maximilian Michels
This usually happens when you enable the 'build-jar' profile from within IntelliJ. This profile assumes you have a Flink installation in the class path which is only true if you submit the job to an existing Flink cluster. -Max On Mon, Oct 17, 2016 at 10:50 AM, Stefan Richter

Re: Executing a flink program

2016-10-10 Thread Maximilian Michels
Whoops, I meant Flink and not Beam (I had just visited the Beam mailing list). -Max On Mon, Oct 10, 2016 at 12:08 PM, Maximilian Michels <m...@apache.org> wrote: > Normally, you should be able to directly execute your Beam program > from within your IDE. It automatically starts a l

Re: Executing a flink program

2016-10-10 Thread Maximilian Michels
Normally, you should be able to directly execute your Beam program from within your IDE. It automatically starts a local cluster with the resources needed for the job. Which Beam version are you using? Could you post some of the code your executing? -Max On Sat, Oct 8, 2016 at 7:51 PM, Dayong

Re: Best way to trigger dataset sampling

2016-09-28 Thread Maximilian Michels
actly what I was looking for. What do you mean for 'the best thing > is if you keep a local copy of your sampling jars and work directly with > them'? > > Best, > Flavio > > On Tue, Sep 27, 2016 at 2:35 PM, Maximilian Michels <m...@apache.org> wrote: >> >> Hi Flav

Re: Best way to trigger dataset sampling

2016-09-27 Thread Maximilian Michels
eed to tell the cluster the main class and the parameters to run > the job (and where the jar file is on HDFS). > > Best, > Flavio > > On Tue, Sep 27, 2016 at 12:06 PM, Maximilian Michels <m...@apache.org> > wrote: > >> Hi Flavio, >> >> Do you want to

Re: Best way to trigger dataset sampling

2016-09-27 Thread Maximilian Michels
Hi Flavio, Do you want to sample from a running batch job? That would be like Queryable State in streaming jobs but it is not supported in batch mode. Cheers, Max On Mon, Sep 26, 2016 at 6:13 PM, Flavio Pompermaier wrote: > Hi to all, > > I have a use case where I need

Re: flink run throws NPE, JobSubmissionResult is null when interactive and not isDetached()

2016-09-26 Thread Maximilian Michels
ent-fabric.com> wrote: > On Tue, Sep 20, 2016 at 12:49 PM, Maximilian Michels <m...@apache.org> wrote: >> >> Hi Luis, >> >> That looks like a bug but looking at the code I don't yet see how it may >> occur. We definitely need more information to reproduce it. Do you

Re: RawSchema as deserialization schema

2016-09-05 Thread Maximilian Michels
Just implement DeserializationSchema and return the byte array from Kafka. Byte array serialization poses no problem to the Flink serialization. On Mon, Sep 5, 2016 at 3:50 PM, Swapnil Chougule wrote: > I am using Kafka consumer in flink 1.1.1 with Kafka 0.8.2. I want to
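
A minimal sketch of such a pass-through schema, against the 1.x-era interface in org.apache.flink.streaming.util.serialization (the interface moved to org.apache.flink.api.common.serialization in later releases):

    import java.io.IOException;
    import org.apache.flink.api.common.typeinfo.PrimitiveArrayTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;

    // Hands the raw Kafka record bytes straight to the pipeline.
    public class RawBytesSchema implements DeserializationSchema<byte[]> {

        @Override
        public byte[] deserialize(byte[] message) throws IOException {
            return message;
        }

        @Override
        public boolean isEndOfStream(byte[] nextElement) {
            return false;
        }

        @Override
        public TypeInformation<byte[]> getProducedType() {
            return PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO;
        }
    }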

Re: Kafka SimpleStringConsumer NPE

2016-09-05 Thread Maximilian Michels
Your Kafka topic seems to contain null values. By default, Flink will just forward null values to the DeserializationSchema which has to take care of null values. The SimpleStringSchema doesn't do that and fails with a NullPointerException. Thus, you need an additional check in your
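
A hedged sketch of the null check suggested above, using the same DeserializationSchema interface; mapping null records to an empty string (and filtering them downstream) is just one possible policy:

    import java.nio.charset.StandardCharsets;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.streaming.util.serialization.DeserializationSchema;

    // Null-tolerant variant of SimpleStringSchema.
    public class NullSafeStringSchema implements DeserializationSchema<String> {

        @Override
        public String deserialize(byte[] message) {
            return message == null ? "" : new String(message, StandardCharsets.UTF_8);
        }

        @Override
        public boolean isEndOfStream(String nextElement) {
            return false;
        }

        @Override
        public TypeInformation<String> getProducedType() {
            return BasicTypeInfo.STRING_TYPE_INFO;
        }
    }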

Re: fromParallelCollection

2016-09-05 Thread Maximilian Michels
Please give us a bit more insight on what you're trying to do. On Sat, Sep 3, 2016 at 5:01 AM, wrote: > Hi, > val env = StreamExecutionEnvironment.getExecutionEnvironment > val tr = env.fromParallelCollection(data) > > the data i do not know initialize,some one can tell

Re: Flink Kafka more consumers than partitions

2016-09-05 Thread Maximilian Michels
Thanks for letting us know! On Sat, Sep 3, 2016 at 12:42 PM, neo21 zerro wrote: > Hi all, > > It turns out that there were other factors influencing my performance tests. > (actually hbase) > Hence, more consumers than partitions in Flink was not the problem. > Thanks for

Re: [SUGGESTION] Stack Overflow Documentation for Apache Flink

2016-09-05 Thread Maximilian Michels
Hi! This looks neat. Let's try it out. I just voted. Cheers, Max On Sun, Sep 4, 2016 at 8:11 PM, Vishnu Viswanath wrote: > Hi All, > > Why don't we make use of Stackoverflow's new documentation feature to do > some documentation of Apache Flink. > > To start, at

[DISCUSS] Storm 1.x.x support in the compatibility layer

2016-09-02 Thread Maximilian Michels
This should be of concern mostly to the users of the Storm compatibility layer: We just received a pull request [1] for updating the Storm compatibility layer to support Storm versions >= 1.0.0. This is a major change because all Storm imports have changed their namespace due to package renaming.

Re: Delaying starting the jobmanager in yarn?

2016-08-26 Thread Maximilian Michels
You too! On Fri, Aug 26, 2016 at 4:15 PM, Niels Basjes <ni...@basjes.nl> wrote: > Thanks! > I'm going to work with this next week. > > Have a nice weekend. > > Niels > > On Fri, Aug 26, 2016 at 2:49 PM, Maximilian Michels <m...@apache.org> wrote: >> >

Re: Running multiple jobs on yarn (without yarn-session)

2016-08-25 Thread Maximilian Michels
t if you see a way to fix this. > I consider it fine if this requires an extra call to the system indicating > that this is a 'mulitple job' situation. > > I created https://issues.apache.org/jira/browse/FLINK-4495 for you > > Niels Basjes > > On Thu, Aug 25, 2016 at 3:34 PM,

Re: Running multiple jobs on yarn (without yarn-session)

2016-08-25 Thread Maximilian Michels
Hi Niels, This is with 1.1.1? We could fix this in the upcoming 1.1.2 release by only using automatic shut down for detached jobs. In all other cases we should be able to shutdown from the client side after running all jobs. The only downside I see is that Flink clusters may actually never be

Re: Delaying starting the jobmanager in yarn?

2016-08-25 Thread Maximilian Michels
Hi Niels, If you're using 1.1.1, then you can instantiate the YarnClusterDescriptor and supply it with the Flink jar and configuration and subsequently call `deploy()` on it to receive a ClusterClient for Yarn which you can submit programs using the `run(PackagedProgram program, String args)`

Re: Dealing with Multiple sinks in Flink

2016-08-25 Thread Maximilian Michels
I'm assuming there is something wrong with your Watermark/Timestamp assigner. Could you share some of the code? On Wed, Aug 24, 2016 at 9:54 PM, vinay patil wrote: > Hi, > > Just an update, the window is not getting triggered when I change the > parallelism to more than

Re: Regarding Global Configuration in Flink

2016-08-25 Thread Maximilian Michels
Hi! Are you referring to the GlobalConfiguration class? That used to be a singleton class in Flink version < 1.2.x which would load the configuration only once per VM, if it found a config file. It allowed operations that could change that config after it had been loaded. It has since then been

Re: How to set a custom JAVA_HOME when run flink on YARN?

2016-08-25 Thread Maximilian Michels
Preferably, you set that directly in the config using env.java.home: /path/to/java/home If unset, Flink will use the $JAVA_HOME environment variable. Cheers, Max On Thu, Aug 25, 2016 at 10:39 AM, Renkai wrote: > I think I solved myself,just add -yD

Re: How to get latency info from benchmark

2016-08-24 Thread Maximilian Michels
I believe the AnalyzeTool is for processing logs of a different benchmark. CC Jamie and Robert who worked on the benchmark. On Wed, Aug 24, 2016 at 3:25 AM, Eric Fukuda wrote: > Hi, > > I'm trying to benchmark Flink without Kafka as mentioned in this post >

Re: Setting up zeppelin with flink

2016-08-24 Thread Maximilian Michels
Hi! There are some people familiar with the Zeppelin integration. CCing Till and Trevor. Otherwise, you could also send this to the Zeppelin community. Cheers, Max On Wed, Aug 24, 2016 at 12:58 PM, Frank Dekervel wrote: > Hello, > > for reference: > > i already found out that

Re: How to share text file across tasks at run time in flink.

2016-08-24 Thread Maximilian Michels
Hi! 1. The community is working on adding side inputs to the DataStream API. That will allow you to easily distribute data to all of your workers. 2. In the meantime, you could use `.broadcast()` on a DataSet to broadcast data to all workers. You still have to join that data with another stream
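
A small sketch of the DataSet broadcast pattern from point 2 (names are illustrative):

    import java.util.List;
    import org.apache.flink.api.common.functions.RichMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.configuration.Configuration;

    public class BroadcastSetExample {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Integer> toBroadcast = env.fromElements(1, 2, 3);
            DataSet<String> data = env.fromElements("a", "b");

            data.map(new RichMapFunction<String, String>() {
                private List<Integer> lookup;

                @Override
                public void open(Configuration parameters) {
                    // The broadcast set is materialized on every parallel instance.
                    lookup = getRuntimeContext().getBroadcastVariable("lookup");
                }

                @Override
                public String map(String value) {
                    return value + ":" + lookup.size();
                }
            }).withBroadcastSet(toBroadcast, "lookup")
              .print();
        }
    }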

Re: Dealing with Multiple sinks in Flink

2016-08-24 Thread Maximilian Michels
Hi Vinay, Does this only happen with the S3 file system or also with your local file system? Could you share some example code or log output of your running job? Best, Max On Wed, Aug 24, 2016 at 4:20 AM, Vinay Patil wrote: > Hi, > > In our flink pipeline we are

Re: FLINK-4329 fix version

2016-08-24 Thread Maximilian Michels
Added a fix version 1.1.2 and 1.2.0 because a pull request is under way. On Tue, Aug 23, 2016 at 1:17 PM, Ufuk Celebi wrote: > On Tue, Aug 23, 2016 at 12:28 PM, Yassine Marzougui > wrote: >> The fix version of FLINK-4329 in JIRA is set to 1.1.1, but 1.1.1

Re: Checking for existance of output directory/files before running a batch job

2016-08-24 Thread Maximilian Michels
Forgot to mention, this is on the master. For Flink < 1.2.x, you will have to use GlobalConfiguration.get(); On Wed, Aug 24, 2016 at 12:23 PM, Maximilian Michels <m...@apache.org> wrote: > Hi Niels, > > The problem is that such method only works reliably if the cluster > conf

Re: Checking for existance of output directory/files before running a batch job

2016-08-24 Thread Maximilian Michels
cal, Yarn, > Mesos, etc.) without any problems. > > What do you guys think? > Is this desirable? Possible? > > Niels. > > > > On Fri, Aug 19, 2016 at 3:22 PM, Robert Metzger <rmetz...@apache.org> wrote: >> >> Ooops. Looks like Google Mail / Apache / the in

Re: "Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-24 Thread Maximilian Michels
ager.rpc.port. When I tried setting localhost and 6123 > respectively, it worked. > > Regards, > Hironori > > 2016-08-24 0:54 GMT+09:00 Maximilian Michels <m...@apache.org>: >> Created an issue and fix should be there soon: >> https://issues.apache.org/jira/browse/

Re: "Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-23 Thread Maximilian Michels
Created an issue and fix should be there soon: https://issues.apache.org/jira/browse/FLINK-4454 Thanks, Max On Tue, Aug 23, 2016 at 4:38 PM, Maximilian Michels <m...@apache.org> wrote: > Hi! > > Yes, this is a bug. However, there seems to be something wrong with > the config

Re: "Failed to retrieve JobManager address" in Flink 1.1.1 with JM HA

2016-08-23 Thread Maximilian Michels
Hi! Yes, this is a bug. However, there seems to be something wrong with the config directory because Flink fails to load the default value ("localhost") from the config. If you had a default value for the job manager in flink-conf.yaml, it wouldn't fail but only display a wrong job manager url.

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-23 Thread Maximilian Michels
tion). Check the hostnames; in > configuration, there are aliases used and the difference from fqdn may > be the cause, judging by the log (exception at line 87)... > > http://pastebin.com/iimPVbXB > > Thanks, > Mira > > > > Maximilian Michels píše v Pá 19. 0

Re: Checking for existance of output directory/files before running a batch job

2016-08-19 Thread Maximilian Michels
HI Niels, Have you tried specifying the fully-qualified path? The default is the local file system. For example, hdfs:///path/to/foo If that doesn't work, do you have the same Hadoop configuration on the machine where you test? Cheers, Max On Thu, Aug 18, 2016 at 2:02 PM, Niels Basjes

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-19 Thread Maximilian Michels
for debian. I can try >> to >> use the binary release for hadoop 2.6.0. >> >> Regarding zookeeper, we do not share instances between dev and >> production. >> >> Thanks, >> Miroslav >> >> Maximilian Michels píše v

Re: off heap memory deallocation

2016-08-18 Thread Maximilian Michels
Hi, Off-heap memory currently only gets deallocated once MaxDirectMemory has been reached. We can't manually clear the memory because some of the code assumes that it can still access old memory after it has been released. In case of offheap memory, that would give us a segmentation fault. We

Re: flink on yarn - Fatal error in AM: The ContainerLaunchContext was not set

2016-08-18 Thread Maximilian Michels
Hi Miroslav, From the logs it looks like you're using Flink version 1.0.x. The ContainerLaunchContext is always set by Flink. I'm wondering why this error can still occur. Are you using the default Hadoop version that comes with Flink (2.3.0)? You could try the Hadoop 2.6.0 build of Flink. Does

Re: Programmatically Creating a Flink Cluster On YARN

2016-08-17 Thread Maximilian Michels
Hi Benjamin, Apologies for the late reply. In the latest code base and also Flink 1.1.1, the Flink configuration doesn't have to be loaded via a file location read from an environment variable and it doesn't throw an exception if it can't find the config upfront (phew). Instead, you can also

Re: Using CustomPartitionerWrapper with KeyedStream

2016-08-12 Thread Maximilian Michels
Hi Philippe, There is no particular reason other than hash partitioning is a sensible default for most users. It seems like this is rarely an issue. When the number of keys is close to the parallelism, having idle partitions is usually not a problem due to low data volume. I see that it could be

Re: ValueState is missing

2016-08-12 Thread Maximilian Michels
You're clearing the "handState" on "GameEndHistory". I'm assuming this event comes in before "CommCardHistory" where you check the state. On Fri, Aug 12, 2016 at 6:59 AM, Dong-iL, Kim wrote: > in my code, is the config of ExecutionEnv alright? > > >> On Aug 11, 2016, at 8:47

Re: Unit tests failing, losing stream contents

2016-08-12 Thread Maximilian Michels
Hi David, You're starting two executions at the same time (in different threads). Here's why: Execution No 1 DataStreamUtils.collect(..) starts a Thread which executes your job and collects stream elements. It runs asynchronously. The collect(..) method returns after starting the thread.
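
A sketch of a test that avoids the double execution: iterate the collect() iterator and do not call env.execute() as well. DataStreamUtils lived in flink-streaming-contrib under org.apache.flink.contrib.streaming at the time (the package moved later):

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.flink.contrib.streaming.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CollectOnce {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            DataStream<Integer> stream = env.fromElements(1, 2, 3, 4);

            // collect() runs the job itself in a background thread -- no separate env.execute() call.
            Iterator<Integer> it = DataStreamUtils.collect(stream);

            List<Integer> results = new ArrayList<>();
            while (it.hasNext()) {
                results.add(it.next());
            }
            System.out.println(results);
        }
    }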

Re: Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-12 Thread Maximilian Michels
Hi Sameer, If you use Event Time you should make sure to assign Watermarks and Timestamps at the source. As you already observed, Flink may get stuck otherwise because it waits for Watermarks to progress in time. There is no timeout for windows. However, you can implement that logic in your

Re: Within interval for CEP - Wall Clock based or Event Timestamp based?

2016-08-12 Thread Maximilian Michels
Hi Sameer, That depends on the time characteristic you have chosen. If you have set it to event time [1] then it will use event time, otherwise the default is to use processing time. When using event time, the element's timestamp is used to assign it to the specified time windows in the

Re: Container running beyond physical memory limits when processing DataStream

2016-07-29 Thread Maximilian Michels
Hi Jack, Considering the type of job you're running, you shouldn't run out of memory. Could it be that the events are quite large strings? It could be that the TextOutputFormat doesn't write to disk fast enough and accumulates memory. Actually, it doesn't perform regular flushing which could be

Re: how does flink assign windows to task

2016-07-29 Thread Maximilian Michels
Hi Vishnu Viswanath, The keyed elements are spread across the 50 task slots (assuming you have a parallelism of 50) using hash partitioning on the keys. Each task slot runs one or multiple operators (depending on the slot sharing options). One of them is a WindowOperator which will decide when to

Re: hash preservation

2016-07-29 Thread Maximilian Michels
Hi Robert, >Unfortunately, during the Flink map and reduce phases the objects change >their hash codes and become inconsistent with the keys of the original hashmap If objects change their hash code values, then this means they are not equal anymore. If this is not desired then the
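
For illustration, a key type whose hash stays stable throughout the job because equals/hashCode depend only on an immutable identity field (mutable payload fields are deliberately excluded):

    import java.util.Objects;

    public final class UserKey {
        private final String userId;   // immutable identity, drives equals/hashCode
        private long lastSeen;         // mutable payload, excluded from equals/hashCode

        public UserKey(String userId) {
            this.userId = userId;
        }

        public void setLastSeen(long lastSeen) {
            this.lastSeen = lastSeen;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) {
                return true;
            }
            if (!(o instanceof UserKey)) {
                return false;
            }
            return userId.equals(((UserKey) o).userId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId);
        }
    }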

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-28 Thread Maximilian Michels
Hi Konstantin, If you come from traditional on-premise installations it may seem counter-intuitive to start a Flink cluster for each job. However, in today's cluster world it is not a problem to request containers on demand and spawn a new Flink cluster for each job. Per job clusters are

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-07-25 Thread Maximilian Michels
Hi! In the latest master and in the upcoming 1.1, all files in the lib folder will be shipped to the Yarn cluster and added to the class path. In Flink version <= 1.0.x no files will be added to the ship files by default (only the flink-dist*.jar will be shipped). Regardless of the version, if

Re: Flink Dashboard stopped showing list of uploaded jars

2016-07-20 Thread Maximilian Michels
:19:00,552 INFO > org.apache.flink.runtime.webmonitor.WebRuntimeMonitor - Using directory > /some/dir for web frontend JAR file uploads > > if you manually delete the offending jar file from that directory it could > solve your problem. > > Cheers, > Aljoscha > > On Wed, 20 Jul 2016 at 15:37 Maximilian M

Re: Elasticsearch connector and number of shards

2016-07-20 Thread Maximilian Michels
The connector doesn't cover this use case. Through the API you need to use the IndicesAdminClient: https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-admin-indices.html Otherwise Elasticsearch creates an index with shards automatically. We could add support for configuring

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
opology. Cannot execute.* > * at * > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getStreamGraph(StreamExecutionEnvironment.java:1195) > at > org.apache.flink.streaming.api.environment.LocalStreamEnvironment.execute(LocalStreamEnvironment.java:86) > at > org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(S

Re: Class loading and job versioning

2016-07-20 Thread Maximilian Michels
the jobs libs should have precedence and no versioning problem should have > happened. > > Regards, > > Michal > > > > On 20.07.2016 14:00, Maximilian Michels wrote: >> >> Hi Michal, >> >> I couldn't find Joda in flink-dist. Possibly there is

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
Open(); > columns[4] = (double) value.getVolume(); > return (new Tuple2<String, Double[]>(value.getId(), columns)); > } > }); > > Regards, > Subash Basnet > > On Wed, Jul 20, 2016 at 12:20 PM, Maximilian Michels <m...@apache.org> wrote: >> >> This

Re: Class loading and job versioning

2016-07-20 Thread Maximilian Michels
Hi Michal, I couldn't find Joda in flink-dist. Possibly there is some other clash? There are two potential issues here: 1) Flink shades some libraries (Guava) but not all. If you use a version of a library in your Flink job which doesn't match the one in flink-dist, you're bound for trouble.

Re: Data point goes missing within iteration

2016-07-20 Thread Maximilian Michels
s cause this... > > Paris > > PS: on my yet incomplete PR (I know I know) I basically disabled queue > polling timeouts since the checkpoint overhead on the StreamIterationHead > almost always led to record loss. > https://github.com/apache/flink/pull/1668 > > > On 20 J

Re: Understanding iteration error message

2016-07-20 Thread Maximilian Michels
Hi, It's stating that you can't use a DataStream which was not part of the iteration. It works with `newCentroids` because it is part of the loop. The only way to get the centroids DataStream in, is to union/join it with the `newCentroids` stream. Cheers, Max On Wed, Jul 20, 2016 at 11:33 AM,

Re: Logical plan optimization with Calcite

2016-07-20 Thread Maximilian Michels
Hi Gallenvara, As far as I know, the Table API is now translated into a Calcite plan which is then optimized according to Calcite's optimization rules. Cheers, Max On Wed, Jul 20, 2016 at 7:24 AM, gallenvara wrote: > > Hello, everyone. I'm new to Calcite and have some

Re: DataStreamUtils not working properly

2016-07-20 Thread Maximilian Michels
Just tried the following and it worked: public static void main(String[] args) throws IOException { StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); final DataStreamSource source = env.fromElements(1, 2, 3, 4); source.print(); final Iterator
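
The sample is truncated by the archive; a hedged reconstruction of a complete, runnable version in the same spirit (everything after `final Iterator` is guessed from the thread topic, not quoted):

    import java.util.Iterator;
    import org.apache.flink.contrib.streaming.DataStreamUtils;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class CollectExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            final DataStreamSource<Integer> source = env.fromElements(1, 2, 3, 4);
            source.print();
            final Iterator<Integer> it = DataStreamUtils.collect(source);
            while (it.hasNext()) {
                System.out.println(it.next());
            }
        }
    }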

Re: Issue with running Flink Python jobs on cluster

2016-07-19 Thread Maximilian Michels
Hi! HDFS is mentioned in the docs but not explicitly listed as a requirement: https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/python.html#project-setup I suppose the Python API could also distribute its libraries through Flink's BlobServer. Cheers, Max On Tue, Jul 19, 2016 at

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-30 Thread Maximilian Michels
ved the first exception. Anyway on >> the 80GiB dataset I struggle with the second exception. >> >> Regards, >> Andrea >> >> 2016-06-28 12:08 GMT+02:00 Maximilian Michels <m...@apache.org>: >>> >>> Hi Andrea, >>> >>> The number of

Re: How to avoid breaking states when upgrading Flink job?

2016-06-30 Thread Maximilian Michels
Hi Josh, You have to assign UIDs to all operators to change the topology. Plus, you have to add dummy operators for all UIDs which you removed; this is a limitation currently because Flink will attempt to find all UIDs of the old job. Cheers, Max On Wed, Jun 29, 2016 at 9:00 PM, Josh

Re: Adding 3rd party moving average and other 'indicators'

2016-06-28 Thread Maximilian Michels
Hi Anton, I would suggest you simply put your moving average code in a MapFunction where you can keep track of the current average using a class field. Cheers, Max On Fri, Jun 24, 2016 at 10:05 PM, Anton wrote: > Hello > > I'm currently trying to learn Flink. And so far am
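
A bare-bones sketch of that suggestion; note the running state lives in a plain field, so it is per-parallel-instance and not checkpointed:

    import org.apache.flink.api.common.functions.MapFunction;

    // Emits the cumulative moving average of all values seen by this parallel instance.
    public class MovingAverage implements MapFunction<Double, Double> {

        private long count = 0L;
        private double sum = 0.0;

        @Override
        public Double map(Double value) {
            count++;
            sum += value;
            return sum / count;
        }
    }

Applied as stream.map(new MovingAverage()); a keyed, fault-tolerant variant would move the running sum and count into Flink state instead.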

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-28 Thread Maximilian Michels
Hi Andrea, The number of network buffers should be sufficient. Actually, assuming you have 16 task slots on each of the 25 nodes, it should be enough to have 16^2 * 25 * 4 = 25600 network buffers. See https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#background So we

Re: Question regarding logging capabilities in flink

2016-06-24 Thread Maximilian Michels
Hi, Flink prints the Yarn application id during deployment of the cluster. You can then query the logs from Yarn using the `yarn logs -applicationId ` command. Please have a look at https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs Cheers, Max On Thu,

Re: State key serializer has not been configured in the config.

2016-06-24 Thread Maximilian Michels
+1 for a more helpful error message. @Jacob Would you mind opening a JIRA issue at https://issues.apache.org/jira/browse/FLINK? On Thu, Jun 23, 2016 at 11:31 AM, Chesnay Schepler wrote: > We should adjust the error message to contain the keyed stream thingy. > > > On

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-15 Thread Maximilian Michels
n) cluster cluster to the to-be-created cluster. Will be fixed in 1.1 and probably backported to 1.0.4. On Wed, Jun 15, 2016 at 6:05 PM, Maximilian Michels <m...@apache.org> wrote: > Hi Arnaud, > > One issue per thread please. That makes things a lot easier for us :) > &g

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-15 Thread Maximilian Michels
Hi Arnaud, One issue per thread please. That makes things a lot easier for us :) Something positive first: We are reworking the resuming of existing Flink Yarn applications. It'll be much easier to resume a cluster simply using the Yarn ID or re-discovering the Yarn session using the properties

Re: Application log on Yarn FlinkCluster

2016-06-15 Thread Maximilian Michels
n running a Spark job for example on the same setup, the yarn aggregated > log contains all the information printed out by the application. > > Cheers, > Theofilos > > > On 6/15/2016 10:14 AM, Maximilian Michels wrote: > > Please use the `yarn logs -applicationId ` to retrieve

Re: Application log on Yarn FlinkCluster

2016-06-15 Thread Maximilian Michels
1),3) > 2> ((25,11),4) > 2> ((46,44),2 > .." > > However, the yarn aggregated log contains only the jobmanager output. Is > this expected or could it indicate a problem with my hadoop logging > configuration not picking up taskmanager logs? > > Cheers, > T

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Maximilian Michels
. > > If you are interested in this we can work together on adding proper support > for TTL (time-to-live) to the Flink state abstraction. > > Cheers, > Aljoscha > > On Mon, 13 Jun 2016 at 12:21 Maximilian Michels <m...@apache.org> wrote: >> >> Hi Josh, >&g

Re: HBase reads and back pressure

2016-06-13 Thread Maximilian Michels
Thanks! On Mon, Jun 13, 2016 at 12:34 PM, Christophe Salperwyck wrote: > Hi, > I vote on this issue and I agree this would be nice to have. > > Thx! > Christophe > > 2016-06-13 12:26 GMT+02:00 Aljoscha Krettek : >> >> Hi, >> I'm afraid this
