Re: Features with major priority/future release/s

2015-12-09 Thread Maximilian Michels
Hi Ovidiu, This is still subject to community discussion. We created a wiki page to keep track of the essential features to be included in 1.0: https://cwiki.apache.org/confluence/display/FLINK/1.0+Release Otherwise, everything which is marked "Fix version 1.0.0" in JIRA is a candidate to be

Re: Read Kafka topic from the beginning

2015-12-03 Thread Maximilian Michels
gt; On Dec 3, 2015, at 6:11 PM, Vladimir Stoyak <vsto...@yahoo.com> wrote: >> >> As far as I know "auto.offset.reset" what to do if offset it not available >> or out of bound? >> >> Vladimir >> >> >> On Thursday, December 3, 2015 5:58

Re: Flink job on secure Yarn fails after many hours

2015-12-02 Thread Maximilian Michels
Hi Niels, Sorry for hear you experienced this exception. From a first glance, it looks like a bug in Hadoop to me. > "Not retrying because the invoked method is not idempotent, and unable to > determine whether it was invoked" That is nothing to worry about. This is Hadoop's internal retry

Re: Using Flink with Scala 2.11 and Java 8

2015-12-07 Thread Maximilian Michels
Hi Cory, Thanks for reporting the issue. Scala should run independently of the Java version. We are already using ASM version 5.0.4. However, some code uses the ASM4 op codes which don't seem to be work with Java 8. This needs to be fixed. I'm filing a JIRA. Cheers, Max On Mon, Dec 7, 2015 at

Re: Using Flink with Scala 2.11 and Java 8

2015-12-07 Thread Maximilian Michels
For completeness, could you provide a stack trace of the error message? On Mon, Dec 7, 2015 at 6:56 PM, Maximilian Michels <m...@apache.org> wrote: > Hi Cory, > > Thanks for reporting the issue. Scala should run independently of the > Java version. We are already using ASM vers

Re: Flink Storm

2015-12-03 Thread Maximilian Michels
Hi Naveen, I think you're not using the latest 1.0-SNAPSHOT. Did you build from source? If so, you need to build again because the snapshot API has been updated recently. Best regards, Max On Thu, Dec 3, 2015 at 6:40 PM, Madhire, Naveen wrote: > Hi, > > I am

Re: Iterative queries on Flink

2015-12-02 Thread Maximilian Michels
Hi Flavio, I was working on this some time ago but it didn't make it in yet and priorities shifted a bit. The pull request is here: https://github.com/apache/flink/pull/640 The basic idea is to remove Flink's ResultPartition buffers in memory lazily, i.e. keep them as long as enough memory is

Re: Using Flink with Scala 2.11 and Java 8

2015-12-10 Thread Maximilian Michels
Hi Cory, The issue has been fixed in the master and the latest Maven snapshot. https://issues.apache.org/jira/browse/FLINK-3143 Cheers, Max On Tue, Dec 8, 2015 at 12:35 PM, Maximilian Michels <m...@apache.org> wrote: > Thanks for the stack trace, Cory. Looks like you were on the rig

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Maximilian Michels
I know this has been fixed already but, out of curiosity, could you point me to the Kafka JIRA issue for this bug? From the Flink issue it looks like this is a Zookeeper version mismatch. On Tue, Dec 1, 2015 at 5:16 PM, Robert Metzger wrote: > Hi Gyula, > > no, I didn't ;)

Re: NPE with Flink Streaming from Kafka

2015-12-01 Thread Maximilian Michels
Thanks! I've linked the issue in JIRA. On Tue, Dec 1, 2015 at 5:39 PM, Robert Metzger <rmetz...@apache.org> wrote: > I think its this one https://issues.apache.org/jira/browse/KAFKA-824 > > On Tue, Dec 1, 2015 at 5:37 PM, Maximilian Michels <m...@apache.org> wrote: >&

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
Hi Madhukar, Thanks for your question. When you instantiate the FlinkKafkaConsumer, you supply a DeserializationSchema in the constructor. You simply create a class which implements DeserializationSchema and contains the KafkaAvroDecoder with the schema registry. Like so: public class

Re: YARN High Availability

2015-11-19 Thread Maximilian Michels
The docs have been updated. On Thu, Nov 19, 2015 at 12:36 PM, Ufuk Celebi wrote: > I’ve added a note about this to the docs and asked Max to trigger a new build > of them. > > Regarding Aljoscha’s idea: I like it. It is essentially a shortcut for > configuring the root path. >

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
32) >> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) >> at >> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) >> at >> java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) >> at >>

Re: How to pass hdp.version to flink on yarn

2015-11-24 Thread Maximilian Michels
Hi Jagat, I think your issue here are not the JVM options. You are missing shell environment variables during the container launch. Adding those to the user's .bashrc or .profile should fix the problem. Best regards, Max On Mon, Nov 23, 2015 at 10:14 PM, Jagat Singh

Re: Running Flink in Cloudfoundry Environment

2015-11-24 Thread Maximilian Michels
Hi Madhukar, I'm not too familiar with Cloudfoundry but seems like you would have to write a service integration. Ideally, a Hadoop YARN service is already available in your Cloudfoundry environment. You could then create an application which deploys a Flink job on YARN. Best regards, Max On

Re: Kafka + confluent schema registry Avro parsing

2015-11-19 Thread Maximilian Michels
run(SourceStreamTask.java:55) > at > org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:218) > at org.apache.flink.runtime.taskmanager.Task.run(Task.java:584) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.lang.NullPointerException > at test.flink.MyAvro

Re: HBase reads and back pressure

2016-06-13 Thread Maximilian Michels
Hi Christophe, A fold function has two inputs: The state and a record to update the state with. So you can update the SummaryStatistics (state) with each Put (input). Cheers, Max On Mon, Jun 13, 2016 at 11:04 AM, Christophe Salperwyck wrote: > Thanks for the

Re: NotSerializableException

2016-06-13 Thread Maximilian Michels
Is there an issue or a fix for proper use of the ClojureCleaner in CoGroup.where()? On Fri, Jun 10, 2016 at 8:07 AM, Aljoscha Krettek wrote: > Hi, > yes, I was talking about a Flink bug. I forgot to mention the work-around > that Stephan mentioned. > > On Thu, 9 Jun 2016 at

Re: Application log on Yarn FlinkCluster

2016-06-13 Thread Maximilian Michels
Hi Theofilos, Flink doesn't send the local client output to the Yarn cluster. I think this will only change once we move the entire execution of the Job to the cluster framework. All output of the actual Flink job should be within the JobManager or TaskManager logs. There is something wrong with

Re: How to maintain the state of a variable in a map transformation.

2016-06-13 Thread Maximilian Michels
Hi Ravikumar, In short: No, you can't use closures to maintain a global state. If you want to keep an always global state, you'll have to use parallelism 1 or an external data store to keep that global state. Is it possible to break up your global state into a set of local states which can be

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Maximilian Michels
Hi Josh, I'm not a RocksDB expert but the workaround you described should work. Just bear in mind that accessing RocksDB concurrently with a Flink job can result in an inconsistent state. Make sure to perform atomic updates and clear the RocksDB cache for the item. Cheers, Max On Mon, Jun 13,

Re: Accessing StateBackend snapshots outside of Flink

2016-06-13 Thread Maximilian Michels
. > > If you are interested in this we can work together on adding proper support > for TTL (time-to-live) to the Flink state abstraction. > > Cheers, > Aljoscha > > On Mon, 13 Jun 2016 at 12:21 Maximilian Michels <m...@apache.org> wrote: >> >> Hi Josh, >&g

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-15 Thread Maximilian Michels
Hi Arnaud, One issue per thread please. That makes things a lot easier for us :) Something positive first: We are reworking the resuming of existing Flink Yarn applications. It'll be much easier to resume a cluster using simply the Yarn ID or re-discoering the Yarn session using the properties

Re: Application log on Yarn FlinkCluster

2016-06-15 Thread Maximilian Michels
1),3) > 2> ((25,11),4) > 2> ((46,44),2 > .." > > However, the yarn aggregated log contains only the jobmanager output. Is > this expected or could it indicate a problem with my hadoop logging > configuration not picking up taskmanager logs? > > Cheers, > T

Re: Application log on Yarn FlinkCluster

2016-06-15 Thread Maximilian Michels
n running a Spark job for example on the same setup, the yarn aggregated > log contains all the information printed out by the application. > > Cheers, > Theofilos > > > On 6/15/2016 10:14 AM, Maximilian Michels wrote: > > Please use the `yarn logs -applicationId ` to retrieve

Re: Yarn batch not working with standalone yarn job manager once a persistent, HA job manager is launched ?

2016-06-15 Thread Maximilian Michels
n) cluster cluster to the to-be-created cluster. Will be fixed in 1.1 and probably backported to 1.0.4. On Wed, Jun 15, 2016 at 6:05 PM, Maximilian Michels <m...@apache.org> wrote: > Hi Arnaud, > > One issue per thread please. That makes things a lot easier for us :) > &g

Re: HBase reads and back pressure

2016-06-13 Thread Maximilian Michels
Thanks! On Mon, Jun 13, 2016 at 12:34 PM, Christophe Salperwyck wrote: > Hi, > I vote on this issue and I agree this would be nice to have. > > Thx! > Christophe > > 2016-06-13 12:26 GMT+02:00 Aljoscha Krettek : >> >> Hi, >> I'm afraid this

Re: whats is the purpose or impact of -yst( --yarnstreaming ) argument

2016-05-26 Thread Maximilian Michels
The "-yst" or "-yarnstreaming" parameter doesn't have an effect anymore because the streaming mode has been removed. I filed an issue some weeks ago: https://issues.apache.org/jira/browse/FLINK-3890 On Wed, May 25, 2016 at 10:27 PM, Aljoscha Krettek wrote: > Hi Prateek, >

Re: Apache Beam and Flink

2016-05-26 Thread Maximilian Michels
Small addition: The Flink Runner translates into the DataSet or DataStream API depending on the "streaming" flag of the PipelineOptions. The default mode is batch. Ultimately, this flag we be removed and replaced with an automated decision depending on the sources used. On Thu, May 26, 2016 at

Re: Type of TypeVariable 'OT' in 'class org.apache.flink.api.common.io.RichInputFormat' could not be determined.

2016-06-01 Thread Maximilian Michels
Hi Robertson, You need to supply a TypeInformation for the data read from the InputFormat. val dataset = env.createInput(input, new TupleTypeInfo(Tuple1.class, BasicTypeInfo.STRING_TYPE_INFO)) should do the trick. Cheers, Max On Tue, May 31, 2016 at 1:13 PM, Robertson Williams

Re: Context-specific step function in Iteration

2016-06-01 Thread Maximilian Michels
Hi Martin, No worries. Thanks for letting us know! Cheers, Max On Mon, May 30, 2016 at 9:17 AM, Martin Junghanns wrote: > Hi again, > > I had a bug in my logic. It works as expected (which is perfect). > > So maybe for others: > > Problem: > - execute

Re: No key found restore States

2016-06-01 Thread Maximilian Michels
> I'm using a keyby but would like to store the state. > > Thus what's the way to go? > > How do I have to handle the state in option 2). > > Could you give an example? > > Thanks > --Simon > >> On 01 Jun 2016, at 15:55, Maximilian Michels <m...@apache.org>

Re: State key serializer has not been configured in the config.

2016-06-24 Thread Maximilian Michels
+1 for a more helpful error message. @Jacob Would you mind opening a JIRA issue at https://issues.apache.org/jira/browse/FLINK? On Thu, Jun 23, 2016 at 11:31 AM, Chesnay Schepler wrote: > We should adjust the error message to contain the keyed stream thingy. > > > On

Re: Question regarding logging capabilities in flink

2016-06-24 Thread Maximilian Michels
Hi, Flink prints the Yarn application id during deployment of the cluster. You can then query the logs from Yarn using the `yarn logs -applicationId ` command. Please have a look at https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#logs Cheers, Max On Thu,

Re: Flink plus Elastic Search plus Kibana

2016-01-18 Thread Maximilian Michels
Hi Sendoh, At the time the article was created, Elasticsearch 2.0 was only in the making and by the time of publishing it had just been released. That's why we used version 1.7.3. There is currently no 2.X version of the Flink adapter but that will change very soon. There is an issue and a

Re: Redeployements and state

2016-01-18 Thread Maximilian Michels
The documentation layout changed in the master. Then new URL: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html On Thu, Jan 14, 2016 at 2:21 PM, Niels Basjes wrote: > Yes, that is exactly the type of solution I was looking for. > > I'll dive

Re: Security in Flink

2016-01-18 Thread Maximilian Michels
Hi Welly, There is no fixed timeline yet but we plan to make progress in terms of authentication and encryption after the 1.0.0 release. Cheers, Max On Wed, Jan 13, 2016 at 8:34 AM, Welly Tambunan wrote: > Hi Stephan, > > Thanks a lot for the explanation. > > Is there any

Re: Flink java.io.FileNotFoundException Exception with Peel Framework

2016-06-28 Thread Maximilian Michels
Hi Andrea, The number of network buffers should be sufficient. Actually, assuming you have 16 task slots on each of the 25 nodes, it should be enough to have 16^2 * 25 * 4 = 14400 network buffers. See https://ci.apache.org/projects/flink/flink-docs-master/setup/config.html#background So we

Re: Adding 3rd party moving average and other 'indicators'

2016-06-28 Thread Maximilian Michels
Hi Anton, I would suggest you simply put your moving average code in a MapFunction where you can keep track of the current average using a class field. Cheers, Max On Fri, Jun 24, 2016 at 10:05 PM, Anton wrote: > Hello > > I'm currently trying to learn Flink. And so far am

Re: Error while reading binary file

2016-02-08 Thread Maximilian Michels
Hi Saliya, Thanks for your question. Flink's type analyzer couldn't extract the type information. You may implement the ResultTypeQueryable interface in your custom source. That way you can manually specify the correct type. If that doesn't help you, could you please share more of the stack

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Maximilian Michels
Hi Nick, SinkFunction just implements user-defined functions on incoming elements. OutputFormat offers more lifecycle methods. Thus it is a more powerful interface. The OutputFormat originally comes from the batch API, whereas the SinkFunction originates from streaming. Those were more separate

Re: taskmanager.memory.off-heap bc requirement not documented

2016-02-05 Thread Maximilian Michels
Hi Flavio, You're right, it is not documented. However, there is a message that explains that bc is required when you start the cluster. On many clusters it is preinstalled. Please create a JIRA if you like. Cheers, Max On Fri, Feb 5, 2016 at 5:28 PM, Flavio Pompermaier

Re: Flink on YARN: Stuck on "Trying to register at JobManager"

2016-02-06 Thread Maximilian Michels
Hi Pieter, Which version of Flink are you using? It appears you've created a Flink YARN cluster but you can't reach the JobManager afterwards. Cheers, Max On Sat, Feb 6, 2016 at 1:42 PM, Pieter Hameete wrote: > Hi Robert, > > unfortunately there are no signs of what is

Re: Failed to submit 0.10.1

2016-02-08 Thread Maximilian Michels
2.11(//apache.mirrors.spacedump.net/flink/flink-0.10.1/flink-0.10.1-bin-hadoop27-scala_2.11.tgz), > Is there a different version of client lib for scala 2.11? > > > Best, > > Andrew > > On 08 Feb 2016, at 11:30, Maximilian Michels <m...@apache.org> wrote: > > Hi Andr

Re: OutputFormat vs SinkFunction

2016-02-08 Thread Maximilian Michels
ormat. Having > two separate class hierarchies is not helpful, hence the adapter. Much of > this code already exists in the implementation of FileSinkFunction, so the > project already supports it in a limited way. > > On Mon, Feb 8, 2016 at 4:16 AM, Maximilian Michels <m...@apache.org>

Re: Flink 1.0-SNAPSHOT scala 2.11 in S3 has scala 2.10

2016-02-10 Thread Maximilian Michels
Hi David, Just had a check as well. Can't find a 2.10 Jar in the lib folder. Cheers, Max On Wed, Feb 10, 2016 at 6:17 AM, Chiwan Park wrote: > Hi David, > > I just downloaded the "flink-1.0-SNAPSHOT-bin-hadoop2_2.11.tgz” but there is > no jar compiled with Scala 2.10.

Re: Flink cluster and Java 8

2016-02-04 Thread Maximilian Michels
Hi Flavio, To address your points: 1) It runs. That's fine. 2) It doesn't work to run a Java 8 compiled Flink job with Java 7 Flink cluster if you use Java 8 non-backwards-compatible features in your job. 3) I compile Flink daily with Java 8. Also, we have Travis CI tests which uses OpenJDK and

Re: Flink cluster and Java 8

2016-02-04 Thread Maximilian Michels
rget in the maven java compiler. > If you change them to 1.8 flink-core doesn't compile anymore. > > On Thu, Feb 4, 2016 at 4:23 PM, Maximilian Michels <m...@apache.org> wrote: >> >> Hi Flavio, >> >> To address your points: >> >> 1) It runs. That's f

Re: flink-storm FlinkLocalCluster issue

2016-02-29 Thread Maximilian Michels
Hi Zhang, Please have a look here for the 1.0.0-rc2: Binaries: http://people.apache.org/~rmetzger/flink-1.0.0-rc2/ Maven repository: https://repository.apache.org/content/repositories/orgapacheflink-1064 Cheers, Max On Sat, Feb 27, 2016 at 4:00 AM, #ZHANG SHUHAO#

Re: Dataset filter improvement

2016-02-23 Thread Maximilian Michels
: > Hi Max, > why do I need to register them? My job runs without problem also without > that. > The only problem with my POJOs was that I had to implement equals and hash > correctly, Flink didn't enforce me to do it but then results were wrong :( > > > > On Wed, Feb 17, 20

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
Hi Stefano, The Job should stop temporarily but then be resumed by the new JobManager. Have you increased the number of execution retries? AFAIK, it is set to 0 by default. This will not re-run the job, even in HA mode. You can enable it on the StreamExecutionEnvironment. Otherwise, you have

Re: Issues testing Flink HA w/ ZooKeeper

2016-02-15 Thread Maximilian Michels
tests (I'm not expert enough to not read the docs ;)) but it doesn't > mention some specific requirement regarding the execution retries, I'll > check it out, thank! > > On Mon, Feb 15, 2016 at 12:51 PM, Maximilian Michels <m...@apache.org> wrote: >> >> Hi Stefano,

Re: Dataset filter improvement

2016-02-17 Thread Maximilian Michels
Hi Flavio, Stephan was referring to env.registerType(ExtendedClass1.class); env.registerType(ExtendedClass2.class); Cheers, Max On Wed, Feb 10, 2016 at 12:48 PM, Flavio Pompermaier wrote: > What do you mean exactly..? Probably I'm missing something here..remember > that

Re: Regarding Concurrent Modification Exception

2016-02-17 Thread Maximilian Michels
HI Biplob, Could you please supply some sample code? Otherwise it is tough to debug this problem. Cheers, Max On Tue, Feb 16, 2016 at 2:46 PM, Biplob Biswas wrote: > Hi, > > No, we don't start a flink job inside another job, although the job creation > was done in a

Re: Flink Streaming - WriteAsText

2016-03-01 Thread Maximilian Michels
Hey Ankur, If the output after cancelling the job is correct, I assume the changes haven't been flushed to disk before. For further investigation, could you share some code with us? Cheers, Max On Wed, Feb 24, 2016 at 8:54 PM, Ankur Sharma wrote: > Hey, > > I am

Re: Flink job on secure Yarn fails after many hours

2016-03-19 Thread Maximilian Michels
0 AM, Thomas Lamirault > <thomas.lamira...@ericsson.com> wrote: >> >> Hi Max, >> >> I will try these workaround. >> Thanks >> >> Thomas >> >> >> De : Maximilian Michels [m...@apache.org] >

Re: 404 error for Flink examples

2016-03-11 Thread Maximilian Michels
Thanks for noticing, Janardhan. Fixed for the next release. On Fri, Mar 11, 2016 at 6:38 AM, janardhan shetty wrote: > Thanks Balaji. > > This needs to be updated in the Job.java file of quickstart application. > I am using 1.0 version > > On Thu, Mar 10, 2016 at 9:23 PM,

Re: Flink job on secure Yarn fails after many hours

2016-03-15 Thread Maximilian Michels
Maximilian, > > I just downloaded the version from your google drive and used that to run my > test topology that accesses HBase. > I deliberately started it twice to double the chance to run into this > situation. > > I'll keep you posted. > > Niels > > > On

Re: JobManager Dashboard and YARN execution

2016-03-09 Thread Maximilian Michels
Hi Andrea, The dashboard is available in both cases. It only shows the job manager logs. For the task manager, you will have to use the Yarn commands. Cheers, Max On Wed, Mar 9, 2016 at 12:47 PM, Andrea Sella wrote: > Hi, > I am experimenting the integration between

Re: [ANNOUNCE] Flink 1.0.0 has been released

2016-03-09 Thread Maximilian Michels
Great to have this out! @Radu: You may use this LinkedIn post: https://www.linkedin.com/groups/7414853/7414853-6113008761373290497 @Igor: Having a one month window should work fine. The CEP library only keeps track of the current state of the events which enables you to process large amounts of

Re: Attempting to refactor Kafka Example using Flink 0.10.2 with Scala 2.11

2016-03-09 Thread Maximilian Michels
Hi Prez, It appears Spring's Classloader is not set up correctly. Unfortunately, I'm not familiar with the way Springboot works. You added flink-connector-kafka-0.9_2.10 but also mentioned you're using Scala 2.11. That is bound to cause troubles :) Cheers, Max On Thu, Mar 3, 2016 at 8:02 PM,

Re: Error when accessing secure HDFS with standalone Flink

2016-03-19 Thread Maximilian Michels
ry deployed job run as > the user that ran the start-cluster.sh script (same behavior as running a > YARN session)? Or users can kinit on each node and then submit jobs that > will be individually run with their credentials? > > Thanks again. > > On Wed, Mar 16, 2016 at 10:30 A

Re: Error when accessing secure HDFS with standalone Flink

2016-03-16 Thread Maximilian Michels
Hi Stefano, You have probably seen https://ci.apache.org/projects/flink/flink-docs-release-1.0/setup/config.html#kerberos ? Currently, all nodes need to be authenticated with the Kerberos before Flink is started (not just the JobManager). Could it be that the start-cluster.sh script actually is

Re: Read a given list of HDFS folder

2016-03-29 Thread Maximilian Michels
Hi Gwenhael, That is not possible right now. As a workaround, you could have three DataSets that are constructed by reading recursively from each directory and unify these later. Alternatively, moving/linking the directories in a different location would also work. I agree that it would be nice

Re: Output from Beam (on Flink) to Kafka

2016-03-21 Thread Maximilian Michels
Sorry. Wrong mailing list... On Mon, Mar 21, 2016 at 11:47 AM, Maximilian Michels <m...@apache.org> wrote: > FYI: The Runner registration has been fixed. The Flink runner > explicitly registers as of [1]. Also, the SDK tries to look up the > PipelineRunner class in case it has not

Re: Output from Beam (on Flink) to Kafka

2016-03-21 Thread Maximilian Michels
On Sat, Mar 19, 2016 at 6:43 PM, Maximilian Michels <m...@apache.org> wrote: > Great to see such a lively discussion here. > > I think we'll support sinks through the Write interface (like in > batched execution) and also have a dedicated wrapper for the Flink > sinks. Th

Re: Kerberos on YARN: delegation or proxying?

2016-03-06 Thread Maximilian Michels
Hi Stefano, That is currently a limitation of the Kerberos implementation. The Kerberos authentication is performed only once the Flink cluster is brought up. The Yarn session is then tight to a particular user's ticket. Note, that you need at least Hadoop version 2.6.1 or higher to run

Re: Flink on YARN: long-running session vs. one-off jobs

2016-03-07 Thread Maximilian Michels
n! >> >> On Mon, Mar 7, 2016 at 2:38 PM, Maximilian Michels <m...@apache.org> wrote: >>> >>> Hi Stefano, >>> >>> Essentially the Yarn Session is not much different from a per-job Yarn >>> cluster. In either case, a Flink cluster

Re: readTextFile is not working for StreamExecutionEnvironment

2016-03-02 Thread Maximilian Michels
Hi Balaji, You forgot to execute your Flink program using env.execute(); Cheers, Max On Wed, Mar 2, 2016 at 1:36 PM, Balaji Rajagopalan wrote: > def main(args: Array[String]): Unit = { > > > > val env: StreamExecutionEnvironment = >

Re: FYI: Updated Slides Section

2016-04-04 Thread Maximilian Michels
Hi Ufuk, Thanks for updating the page. The "latest documentation" points to the page itself and not the documentation. I've fixed that and added the slides from Big Data Warsaw. Cheers, Max On Mon, Apr 4, 2016 at 12:09 PM, Ufuk Celebi wrote: > @Paris: Just added it. Thanks for

Re: CEP blog post

2016-04-04 Thread Maximilian Michels
Made a few suggestions. Reads well, Till! On Mon, Apr 4, 2016 at 10:10 AM, Ufuk Celebi wrote: > Same here. > > +1 to publish > > On Mon, Apr 4, 2016 at 10:04 AM, Aljoscha Krettek wrote: >> Hi, >> I like it. Very dense and very focused on the example but I

Re: Failed to stream on Yarn cluster

2016-04-28 Thread Maximilian Michels
Hi Patcharee, What do you mean by "nothing happened"? There is no output? Did you check the logs? Cheers, Max On Thu, Apr 28, 2016 at 12:10 PM, patcharee wrote: > Hi, > > I am testing the streaming wiki example - >

Re: Gracefully stop long running streaming job

2016-04-26 Thread Maximilian Michels
I have to warn you that the Storm SpoutWrapper and the TwitterSource are currently the only stoppable sources. However, we could make more stoppable, e.g. the KafkaConsumer. On Tue, Apr 19, 2016 at 12:38 AM, Robert Schmidtke wrote: > I'm on 0.10.2 which seems to be still

Re: YARN terminating TaskNode

2016-04-26 Thread Maximilian Michels
ped on me is the allocation of memory for jni libs. I > do use a native library in my application, which is likely the culprit. I > need to account for its memory footprint when doing my memory calculations. > > Thanks, > Timur > > > On Mon, Apr 25, 2016 at 10:28 AM, Maximil

Re: Need a working example to read/write avro data using FlinkKafkaProducer / Consumer

2016-04-27 Thread Maximilian Michels
Hi Kaniska, I've replied to your mail on the Beam user mailing list. Cheers, Max On Wed, Apr 27, 2016 at 4:56 PM, kaniska Mandal wrote: > I am facing some issues while reading / writing Avro data. > > Attached here the corresponding files and avro-generated pojo. > >

Re: YARN terminating TaskNode

2016-04-25 Thread Maximilian Michels
Hi Timur, Which version of Flink are you using? Could you share the entire logs? Thanks, Max On Mon, Apr 25, 2016 at 12:05 PM, Robert Metzger wrote: > Hi Timur, > > The reason why we only allocate 570mb for the heap is because you are > allocating most of the memory as off

Re: Flink on Yarn - ApplicationMaster command

2016-04-25 Thread Maximilian Michels
ID = jobRes.getJobID(); > } catch (ProgramInvocationException ex) { > Logger.getLogger(YarnRunner.class.getName()).log(Level.SEVERE, null, > } > > > Thanks, > Theofilos > > > On 2016-04-22 16:05, Maximilian Michels wrote: > > Hi Theofilos, > > Assuming

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Maximilian Michels
Hi Simon, You'll have to write the property file to disk first to load it using the ParameterTool.fromPropertiesFile method. For example: // copy config from Java resource to a file File configOnDisk = new File("/path/to/config.properties");

Re: Import Configuration File in Flink Cluster

2016-05-23 Thread Maximilian Michels
ers > Simon > > > On 23 May 2016, at 16:30, Stefano Baghino <stefano.bagh...@radicalbit.io> > wrote: > > Are you using Maven to package your project? I believe the resources > plugin[1] can suit your needs. > > [1]: > http://maven.apache.org/plugins/maven-resource

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Maximilian Michels
What error do you get when you don't register the Kryo serializer? On Mon, May 23, 2016 at 11:57 AM, Flavio Pompermaier wrote: > With this last settings I was able to terminate the job the second time I > retried to run it, without restarting the cluster.. > If I don't

Re: Combining streams with static data and using REST API as a sink

2016-05-23 Thread Maximilian Michels
Hi Josh, 1) Use a RichFunction which has an `open()` method to load data (e.g. from a database) at runtime before the processing starts. 2) No that's fine. If you want your Rest API Sink to interplay with checkpointing (for fault-tolerance), this is a bit tricky though depending on the

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread Maximilian Michels
void open(Configuration parameters) throws Exception { > parameters.getInteger("myInt", -1); > // .. do > > > > Cheers > Simon > > On 23 May 2016, at 14:01, Maximilian Michels <m...@apache.org> wrote: > > Hi Simon, > > A

Re: Logging Exceptions

2016-05-23 Thread Maximilian Michels
Hi David, I'm afraid Flink logs all exceptions. You'll find the exceptions in the /log directory. Cheers, Max On Mon, May 23, 2016 at 6:18 PM, David Kim wrote: > Hello! > > Just wanted to check up on this. :) > > I grepped around for `log.error` and it *seems*

Re: 1.1-snapshot issues

2016-05-19 Thread Maximilian Michels
This should be resolved according to Apache Infra. On Tue, May 17, 2016 at 11:28 PM, Henry Saputra wrote: > Looks like it has been resolved, Could you try it again? > > On Tue, May 17, 2016 at 7:02 AM, Stefano Baghino > wrote: >> >> I

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-23 Thread Maximilian Michels
Hi Flavio, These error messages are quite odd. Looks like an off by one error in the serializer/deserializer. Must be somehow related to the Kryo serialization stack because it doesn't seem to occur with Flink's serialization system. Does the job run fine if you don't register the custom Kryo

Re: [RichFlattMapfunction] Configuration File

2016-05-23 Thread Maximilian Michels
Hi Simon, As Aljoscha said, the best way is to supply the configuration as class fields. Alternatively, if you overload the open(..) method, it should also show up in the Properties/Configuration tab on the Web interface. Cheers, Max On Mon, May 23, 2016 at 11:43 AM, simon peyer

Re: Combining streams with static data and using REST API as a sink

2016-05-24 Thread Maximilian Michels
gt; job to be running constantly. > > Josh > > On Mon, May 23, 2016 at 5:56 PM, Maximilian Michels <m...@apache.org> wrote: >> >> Hi Josh, >> >> 1) Use a RichFunction which has an `open()` method to load data (e.g. from a >> database) at runtime before t

Re: Checking actual config values used by TaskManager

2016-05-02 Thread Maximilian Michels
Hi Ken, When you're running Yarn, the Flink configuration is created once and shared among all nodes (JobManager and TaskManagers). Please have a look at the JobManager tab on the web interface. It shows you the configuration. Cheers, Max On Fri, Apr 29, 2016 at 3:18 PM, Ken Krugler

Re: Flink on Yarn - ApplicationMaster command

2016-04-19 Thread Maximilian Michels
Hi Theofilos, I'm not sure whether I understand correctly what you are trying to do. I'm assuming you don't want to use the command-line client. You can setup the Yarn cluster in your code manually using the FlinkYarnClient class. The deploy() method will give you a FlinkYarnCluster which you

Re: Master (1.1-SNAPSHOT) Can't run on YARN

2016-04-21 Thread Maximilian Michels
Hi Stefano, Thanks for reporting. I wasn't able to reproduce the problem. I ran ./bin/yarn-session.sh -n 1 -s 2 -jm 2048 -tm 2048 on a Yarn cluster and it created a Flink cluster with a JobManager and a TaskManager with two task slots. By the way, if you omit the "-s 2" flag, then the default is

Re: add FLINK_LIB_DIR to classpath on yarn -> add properties file to class path on yarn

2016-07-25 Thread Maximilian Michels
Hi! In the latest master and in the upcoming 1.1, all files in the lib folder will be shipped to the Yarn cluster and added to the class path. In Flink version <= 1.0.x no files will be added to the ship files by default (only the flink-dist*.jar will be shipped). Regardless of the version, if

Re: Killing Yarn Session Leaves Lingering Flink Jobs

2016-07-28 Thread Maximilian Michels
Hi Konstantin, If you come from traditional on-premise installations it may seem counter-intuitive to start a Flink cluster for each job. However, in today's cluster world it is not a problem to request containers on demand and spawn a new Flink cluster for each job. Per job clusters are

Re: hash preservation

2016-07-29 Thread Maximilian Michels
Hi Robert, >Unfortunately, during the Flink map and reduce phases the objects change >their hash codes and become inconsistent with the keys of the original hashmap If objects change their hash code values, then this means they are not equal anymore. If this is not desired then the

Re: how does flink assign windows to task

2016-07-29 Thread Maximilian Michels
Hi Vishnu Viswanath, The keyed elements are spread across the 50 task slots (assuming you have a parallelism of 50) using hash partitioning on the keys. Each task slot runs one or multiple operators (depending on the slot sharing options). One of them is a WindowOperator which will decide when to

Re: Container running beyond physical memory limits when processing DataStream

2016-07-29 Thread Maximilian Michels
Hi Jack, Considering the type of job you're running, you shouldn't run out of memory. Could it be that the events are quite large strings? It could be that the TextOutputFormat doesn't write to disk fast enough and accumulates memory. Actually, it doesn't perform regular flushing which could be

Re: Within interval for CEP - Wall Clock based or Event Timestamp based?

2016-08-12 Thread Maximilian Michels
Hi Sameer, That depends on the time characteristic you have chosen. If you have set it to event time [1] then it will use event time, otherwise the default is to use processing time. When using event time, the element's timestamp is used to assign it to the specified time windows in the

Re: Multiple Partitions (Source Functions) -> Event Time -> Watermarks -> Trigger

2016-08-12 Thread Maximilian Michels
Hi Sameer, If you use Event Time you should make sure to assign Watermarks and Timestamps at the source. As you already observed, Flink may get stuck otherwise because it waits for Watermarks to progress in time. There is no timeout for windows. However, you can implement that logic in your

Re: Unit tests failing, losing stream contents

2016-08-12 Thread Maximilian Michels
Hi David, You're starting two executions at the same time (in different threads). Here's why: Execution No 1 DataStreamUtils.collect(..) starts a Thread which executes your job and collects stream elements. It runs asynchronously. The collect(..) method returns after starting the thread.

Re: ValueState is missing

2016-08-12 Thread Maximilian Michels
You're clearing the "handState" on "GameEndHistory". I'm assuming this event comes in before "CommCardHistory" where you check the state. On Fri, Aug 12, 2016 at 6:59 AM, Dong-iL, Kim wrote: > in my code, is the config of ExecutionEnv alright? > > >> On Aug 11, 2016, at 8:47

Re: Using CustomPartitionerWrapper with KeyedStream

2016-08-12 Thread Maximilian Michels
Hi Philippe, There is no particular reason other than hash partitioning is a sensible default for most users. It seems like this is rarely an issue. When the number of keys is close to the parallelism, having idle partitions is usually not a problem due to low data volume. I see that it could be

<    1   2   3   >