Re: Batch Processing Fault Tolerance (DataSet API)

2016-02-22 Thread Till Rohrmann
ted > to [2] ? > > [2] https://issues.apache.org/jira/browse/FLINK-3047 > > Best, > Ovidiu > > On 22 Feb 2016, at 18:13, Till Rohrmann wrote: > > Hi Ovidiu, > > at the moment Flink's batch fault tolerance restarts the whole job in case > of a failure. However, par

Re: Flink packaging makes life hard for SBT fat jar's

2016-02-22 Thread Till Rohrmann
Hi Shikhar, you're right that including a connector dependency would have let us spot the problem earlier. In fact, any project building a fat jar with SBT would have failed without setting the flink dependencies to provided. The problem is that the template is a general purpose template. Thus, i

Re: Best way to process data in many files? (FLINK-BATCH)

2016-02-23 Thread Till Rohrmann
Hi Tim, depending on how you create the DataSource fileList, Flink will schedule the downstream operators differently. If you used the ExecutionEnvironment.fromCollection method, then it will create a DataSource with a CollectionInputFormat. This kind of DataSource will only be executed with a deg

Re: Dataset filter improvement

2016-02-23 Thread Till Rohrmann
Registering a data type is only relevant for the Kryo serializer or if you want to serialize a subclass of a POJO. Registering has the advantage that you assign an id to the class which is written instead of the full class name. The latter is usually much longer than the id. Cheers, Till On Tue,

Re: How to use all available task managers

2016-02-24 Thread Till Rohrmann
Hi Saiph, I think the configuration value should be parallelism.default: 6. That will execute jobs which have no parallelism defined with a DOP of 6. Cheers, Till On Wed, Feb 24, 2016 at 1:43 AM, Saiph Kappa wrote: > Hi, > > I am running a flink stream application on a cluster with 6 slave

Re: Dataset filter improvement

2016-02-24 Thread Till Rohrmann
an Tuple serialization but much >faster than Kryo) >- What if I call env.registerTypeWithKryoSerializer()? Why should I >specify a serializer for Kryo? > > Best, > Flavio > > > On Tue, Feb 23, 2016 at 4:08 PM, Till Rohrmann > wrote: > >> Registering a d

Re: Error when executing job

2016-02-24 Thread Till Rohrmann
I assume that you included the flink-connector-twitter dependency in your job jar, right? Alternatively, you might also put the jar in the lib folder on each of your machines. Cheers, Till ​ On Wed, Feb 24, 2016 at 10:38 AM, ram kumar wrote: > Hi, > > > getting below error when executing twitte

Re: downloading dependency in apache flink

2016-02-24 Thread Till Rohrmann
Hi Pankaj, are you creating a fat jar when you create your user code jar? This can be done using maven's shade plugin or the assembly plugin. We provide a maven archetype to set up a pom file which will make sure that a fat jar is built [1]. [1] https://ci.apache.org/projects/flink/flink-docs-mast

Re: Underlying TaskManager's Actor System

2016-02-24 Thread Till Rohrmann
Hi Andrea, no there isn’t. But you can always start your own ActorSystem in a stateful operator. Cheers, Till ​ On Wed, Feb 24, 2016 at 11:57 AM, Andrea Sella wrote: > Hi, > There is a way to access to the underlying TaskManager's Actor System? > > Thank you in advance, > Andrea >

Re: Best way to process data in many files? (FLINK-BATCH)

2016-02-24 Thread Till Rohrmann
CollectionInputFormat, IteratorInputFormat and the JDBCInputFormat. I hope this helps. Cheers, Till ​ On Tue, Feb 23, 2016 at 3:44 PM, Tim Conrad wrote: > Hi Till (and others). > > Thank you very much for your helpful answer. > > On 23.02.2016 14:20, Till Rohrmann wrote: > > [...] In c

Re: downloading dependency in apache flink

2016-02-24 Thread Till Rohrmann
What is the error message you receive? On Wed, Feb 24, 2016 at 1:49 PM, Pankaj Kumar wrote: > Hi Till , > > I was able to make fat jar, but i am not able to execute this jar through > flink command line. > > On Wed, Feb 24, 2016 at 4:31 PM, Till Rohrmann > wrote: > >

Re: Best way to process data in many files? (FLINK-BATCH)

2016-02-24 Thread Till Rohrmann
If I’m not mistaken, then this shouldn’t solve the scheduling peculiarity of Flink. Flink will still deploy the tasks of the flat map operation to the machine where the source task is running. Only after this machine has no more slots left, other machines will be used as well. I think that you don

Re: Compile issues with Flink 1.0-SNAPSHOT and Scala 2.11

2016-02-24 Thread Till Rohrmann
What is currently the error you observe? It might help to clear org.apache.flink in the ivy cache once in a while. Cheers, Till On Wed, Feb 24, 2016 at 6:09 PM, Cory Monty wrote: > We're still seeing this issue in the latest SNAPSHOT version. Do you have > any suggestions to resolve the error?

Re: Compile issues with Flink 1.0-SNAPSHOT and Scala 2.11

2016-02-24 Thread Till Rohrmann
ffect in > the past. > > On Wed, Feb 24, 2016 at 11:34 AM, Till Rohrmann > wrote: > >> What is currently the error you observe? It might help to clear >> org.apache.flink in the ivy cache once in a while. >> >> Cheers, >> Till >> >> On Wed, F

Re: loss of TaskManager

2016-02-25 Thread Till Rohrmann
Hi Christoph, have you tried setting the blocks parameter of the SVM algorithm? That basically decides how many features are grouped together in one block. The lower the value is the more feature vectors are grouped together and, thus, the size of the block is increased. Increasing this value migh

Re: Counting tuples within a window in Flink Stream

2016-02-26 Thread Till Rohrmann
Hi Saiph, you can do it the following way: input.keyBy(0).timeWindow(Time.seconds(10)).fold(0, new FoldFunction, Integer>() { @Override public Integer fold(Integer integer, Tuple2 o) throws Exception { return integer + 1; } }); Cheers, Till ​ On Thu, Feb 25, 2016 at 7:58 PM,
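The generic type parameters of the FoldFunction in the snippet above appear to have been lost in the archive. As a rough illustration of what the fold computes, here is a plain-Java sketch that does not use the Flink API: for the elements of a single window it starts from 0 for every key and adds 1 per element. The class name WindowCountSketch, the use of Map.Entry as a stand-in for Tuple2, and the String/Integer element types are assumptions made for this example only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class WindowCountSketch {

    // Same shape as the fold in the message: ignore the element's payload, add one.
    static Integer fold(Integer accumulator, Map.Entry<String, Integer> element) {
        return accumulator + 1;
    }

    // Applies the fold per key to the elements of ONE window, starting from 0.
    static Map<String, Integer> countPerKey(List<Map.Entry<String, Integer>> window) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> element : window) {
            Integer current = counts.getOrDefault(element.getKey(), 0);
            counts.put(element.getKey(), fold(current, element));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical window contents: (key, payload) pairs.
        List<Map.Entry<String, Integer>> window = new ArrayList<>();
        window.add(new SimpleEntry<String, Integer>("a", 10));
        window.add(new SimpleEntry<String, Integer>("b", 20));
        window.add(new SimpleEntry<String, Integer>("a", 30));
        System.out.println(countPerKey(window)); // prints {a=2, b=1}
    }
}
```

In the real job the grouping by key and the assignment of elements to 10 second windows is done by keyBy and timeWindow; only the fold body corresponds to user code.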

Re: Kafka issue

2016-02-26 Thread Till Rohrmann
Hi Gyula, could it be that you compiled against a different Scala version than the one you're using for running the job? This usually happens when you compile against 2.10 and let it run with version 2.11. Cheers, Till On Fri, Feb 26, 2016 at 10:09 AM, Gyula Fóra wrote: > Hey, > > For one of o

Re: flink-storm FlinkLocalCluster issue

2016-02-26 Thread Till Rohrmann
Hi Shuhao, the configuration you’re providing is only used for the storm compatibility layer and not Flink itself. When you run your job locally, the LocalFlinkMiniCluster should be started with as many slots as your maximum degree of parallelism is in your topology. You can check this in FlinkLoc

Re: Flink packaging makes life hard for SBT fat jar's

2016-03-02 Thread Till Rohrmann
Hi Shikhar, that is a problem we just found out today. The problem is that the scala.binary.version was not properly replaced in the parent pom so that it resolves to 2.10 [1]. Max already opened a PR to fix this problem. With the next release candidate, this should be fixed. [1] https://issues.a

Re: Kafka issue

2016-03-03 Thread Till Rohrmann
PSHOT yourself or are you relying on the snapshot repository? >>>>> >>>>> We had issues in the past that jars in the snapshot repo were incorrect >>>>> >>>>> On Fri, Feb 26, 2016 at 10:45 AM, Gyula Fóra wrote: >>>>>>

Re: YARN JobManager HA using wrong network interface

2016-03-03 Thread Till Rohrmann
Hi Max, the problem is that before starting the TM, we have to find the network interface which is reachable by the other machines. So what we do is to connect to the current JobManager. If it should happen, as in your case, that the JobManager just died and the new JM address has not been written

Re: Flink CEP Pattern Matching

2016-03-03 Thread Till Rohrmann
Hi Jerry, at the moment it is not yet possible to access previous elements in the filter function of an individual element. Therefore, you have to check for the condition “B is 5 days after A” in the final select statement. Giving this context to the where clause would be indeed a nice addition to

Re: YARN JobManager HA using wrong network interface

2016-03-03 Thread Till Rohrmann
ult back to > the hostname/interface that is configured on the machine. > > > On Thu, Mar 3, 2016 at 10:43 AM, Till Rohrmann > wrote: > >> Hi Max, >> >> the problem is that before starting the TM, we have to find the network >> interface which is

Re: YARN JobManager HA using wrong network interface

2016-03-03 Thread Till Rohrmann
behavior, introduced in HA. > Originally, if the connection attempts failed, it always returned the > InetAddress.getLocalHost() > interface. > I think we should change it back to that, because that interface is by far > the best possible heuristic. > > On Thu, Mar 3, 2016 at 11:

Re: YARN JobManager HA using wrong network interface

2016-03-03 Thread Till Rohrmann
> TNG Technology Consulting GmbH, Betastr. 13a, 85774 Unterföhring > Geschäftsführer: Henrik Klagges, Christoph Stock, Dr. Robert Dahlke > Sitz: Unterföhring * Amtsgericht München * HRB 135082 > > Am 03.03.2016 um 12:29 schrieb Till Rohrmann : > > No I don't think this behaviour has

Re: Flink CEP Pattern Matching

2016-03-03 Thread Till Rohrmann
o get, for example, the last > event of a window, lets say a 5 second window? > > Rgds, > > Vitor Vieira > @notvitor > > 2016-03-03 7:29 GMT-03:00 Till Rohrmann : > >> Hi Jerry, >> >> at the moment it is not yet possible to access previous elements in the >

Re: Flink packaging makes life hard for SBT fat jar's

2016-03-12 Thread Till Rohrmann
Great to hear Shikhar :-) Cheers, Till On Mar 4, 2016 3:51 AM, "shikhar" wrote: > Thanks Till. I can confirm that things are looking good with RC5. > sbt-assembly works well with the flink-kafka connector dependency not > marked > as "provided". > > > > -- > View this message in context: > http:

Re: TimeWindow not getting last elements any longer with flink 1.0 vs 0.10.1

2016-03-14 Thread Till Rohrmann
Hi Arnaud, with version 1.0 the behaviour for window triggering in case of a finite stream was slightly changed. If you use event time, then all unfinished windows are triggered in case that your stream ends. This can be motivated by the fact that the end of a stream is equivalent to no elements w

Re: Integration Alluxio and Flink

2016-03-14 Thread Till Rohrmann
Yes it seems as if you have a netty version conflict. Maybe the alluxio-core-client.jar pulls in an incompatible netty version. Could you check whether this is the case? But maybe you also have another dependency which pulls in a wrong netty version, since the Alluxio documentation indicates that

Re: Integration Alluxio and Flink

2016-03-14 Thread Till Rohrmann
ied to downgrade the Alluxio's netty version from 4.0.28.Final to > 4.0.27.Final to align Flink and Alluxio dependencies. First of all, Flink > 1.0.0 uses 4.0.27.Final, is it correct? Btw it doesn't work, same error as > above. > > BR, > Andrea > > 2016-03-14 15:30 GMT

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
Hi Ravinder, could you tell us what's written in the taskmanager log of the failing taskmanager? There should be some kind of failure why the taskmanager stopped working. Moreover, given that you have 64 GB of main memory, you could easily give 50GB as heap memory to each taskmanager. Cheers, Ti

Re: XGBoost4J: Portable Distributed XGboost in Flink

2016-03-15 Thread Till Rohrmann
Great to hear Tianqi :-) I will try it out. Cheers, Till On Tue, Mar 15, 2016 at 12:41 AM, Tianqi Chen wrote: > Hi Flink Community: > I am sending this email to let you know we just release XGBoost4J > which also runs on Flink. In short, XGBoost is a machine learning package > that is used

Re: MatrixMultiplication

2016-03-15 Thread Till Rohrmann
Hi Lydia, the implementation looks correct. What you could do to speed up the computation is to exploit existing partitionings in order to avoid unnecessary network shuffles. Moreover, you could block your matrices to increase the data granularity at the cost of parallelism. Cheers, Till On Mon,

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
rdCount program for streaming and batch >> respectively? >> >> – Ufuk >> >> On Tue, Mar 15, 2016 at 10:22 AM, Till Rohrmann >> wrote: >> > Hi Ravinder, >> > >> > could you tell us what's written in the taskmanager log of the failing >>

Re: OutofMemoryError: Java heap space & Loss of Taskmanager

2016-03-15 Thread Till Rohrmann
askManager >- Determined BLOB server address to be /10.155.208.156:59504. > Starting BLOB cache. > 09:55:38,536 INFO org.apache.flink.runtime.blob.BlobCache > - Created BLOB cache storage directory > /tmp/blobStore-8e88302d-3303-4c80-8613-f0be13911fb2 > 09:56:48,371

Re: Integration Alluxio and Flink

2016-03-15 Thread Till Rohrmann
nt. Do I need to specify the hadoop > configuration via code or core-site.xml is enough? > > Thank you again, > Andrea > > 2016-03-14 17:28 GMT+01:00 Till Rohrmann : > >> Hi Andrea, >> >> the problem won’t be netty-all but netty, I suspect. Flink is using >> ve

Re: JobManager Dashboard and YARN execution

2016-03-15 Thread Till Rohrmann
Hi Andrea, there is also a PR [1] which will allow you to access the TaskManager logs via the UI. [1] https://github.com/apache/flink/pull/1790 Cheers, Till On Wed, Mar 9, 2016 at 1:58 PM, Stephan Ewen wrote: > Hi! > > Yes, the dashboard is available in both cases. It is proxied through the >

Re: realtion between operator and task

2016-03-16 Thread Till Rohrmann
Hi Radu, the mapping which StreamOperator is executed by which StreamTask happens first in the StreamGraph.addOperator method. However, there is a second step in the StreamingJobGraphGenerator.createChain where chainable operators are chained and then executed by a single StreamTask. The construct

Re: operators

2016-03-18 Thread Till Rohrmann
Hi Radu, the API call slotSharingGroup was introduced with version 1.0. In the version 0.10 there was something similar called startNewResourceGroup, but it was somewhat broken. Therefore, I would recommend you upgrading to version 1.0. You can find the description of the new method here [1]. The

Re: How to start with the first Kafka Message

2016-03-19 Thread Till Rohrmann
Hi Dominique, have you tried setting the Kafka property props.put("auto.offset.reset", "smallest");? Cheers, Till ​ On Thu, Mar 17, 2016 at 1:39 PM, Dominique Rondé < dominique.ro...@codecentric.de> wrote: > Hi folks, > > i have a kafka topic with messages from the last 7 days. Now i have a new

Re: degree of Parallelism

2016-03-19 Thread Till Rohrmann
Hi Ahmed, if you don't set the parallelism in your program then depending on how you execute your program different parallelisms will be used. If you execute it in your IDE, then the number of cores will be used as parallelism. If you submit it to a cluster without specifying the parallelism via t

Re: Flink CEP Pattern Matching

2016-03-19 Thread Till Rohrmann
cribed in the standard. >> >> Having this pattern matching CEP functionality in Flink is a killing >> feature IMHO. >> >> Best Regards, >> >> Jerry >> >> >> On Thu, Mar 3, 2016 at 8:47 AM, Till Rohrmann >> wrote: >> >

Re: define no. of nodes via source code

2016-03-21 Thread Till Rohrmann
Hi Subash, you can use ExecutionEnvironment env = ...; env.setParallelism(dop) for that. Cheers, Till ​ On Mon, Mar 21, 2016 at 3:42 PM, subash basnet wrote: > Hello all, > > Using the flink-webclient we have the options to define no. of parallelism > and the same no. i.e. taskmanager.numberO

Re: Scala syntax AllWindowFunction ?

2016-03-22 Thread Till Rohrmann
Hi Bart, there are multiple ways how to specify a window function using the Scala API. The most scalaesque way would probably be to use an anonymous function: val env = StreamExecutionEnvironment.getExecutionEnvironment val input = env.fromElements(1,2,3,4,5,7) val pair = input.map(x => (x, x))

Re: Connecting to a remote jobmanager - problem with Akka remote

2016-03-22 Thread Till Rohrmann
Hi Simone, can your problem be related to this mail thread [1]? [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Flink-1-0-0-JobManager-is-not-running-in-Docker-Container-on-AWS-td10711.html Cheers, Till On Tue, Mar 22, 2016 at 1:22 PM, Simone Robutti < simone.robu...@radicalbi

Re: normalizing DataSet with cross()

2016-03-22 Thread Till Rohrmann
Hi Lydia, I tried to reproduce your problem but I couldn't. Can it be that you have somewhere a non deterministic operation in your program or do you read the data from a source with varying data? Maybe you could send us a compilable and complete program which reproduces your problem. Cheers, Til

Re: normalizing DataSet with cross()

2016-03-22 Thread Till Rohrmann
csvReader.fieldDelimiter(","); > csvReader.includeFields("ttt"); > return csvReader.types(Integer.class, Integer.class, Double.class); > } > > > Am 22.03.2016 um 14:47 schrieb Till Rohrmann : > > Hi Lydia, > > I tried to reproduce your prob

Re: normalizing DataSet with cross()

2016-03-22 Thread Till Rohrmann
Tue, Mar 22, 2016 at 3:31 PM, Lydia Ickler wrote: > Sorry I was not clear: > I meant the initial DataSet is changing. Not the ds. :) > > > > Am 22.03.2016 um 15:28 schrieb Till Rohrmann : > > From the code extract I cannot tell what could be wrong because the code > lo

Re: TopologyBuilder throws java.lang.ExceptionInInitializerError

2016-03-23 Thread Till Rohrmann
Hi, have you tried clearing your m2 repository? It would also be helpful to see your dependencies (pom.xml). Cheers, Till On Tue, Mar 22, 2016 at 10:41 PM, Sharma, Samiksha wrote: > Hi, > > I am converting a storm topology to Flink-storm topology using the > flink-storm dependency. When I run

Re: does reduce function has a bug

2016-03-24 Thread Till Rohrmann
Hi Balaji, the output you see is the correct output since you're computing a continuous reduce of the incoming data. Since you haven't defined a time frame for your reduce computation you either would have to wait for all eternity to output the final result or you output every time you've generate
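To make the "continuous reduce" behaviour concrete, here is a plain-Java sketch (not Flink code) of a reduce that emits an updated result for every incoming element instead of waiting for the end of the input. The class name RollingReduceSketch and the use of a List as a stand-in for a stream are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;

public class RollingReduceSketch {

    // Emits one intermediate result per input element, as a reduce without a window would.
    static <T> List<T> rollingReduce(List<T> input, BinaryOperator<T> reducer) {
        List<T> emitted = new ArrayList<>();
        T current = null;
        boolean first = true;
        for (T element : input) {
            // The first element is emitted as-is; later ones are combined with the running value.
            current = first ? element : reducer.apply(current, element);
            first = false;
            emitted.add(current);
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(rollingReduce(input, Integer::sum)); // prints [1, 3, 6, 10]
    }
}
```

Only the last emitted value would be the "final" result, and on an unbounded stream there is no last value, which is why a time frame has to be defined to obtain a single result per window.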

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-29 Thread Till Rohrmann
Hi Gna, there are no utilities yet to do that but you can do it manually. In the end, a model is simply a Flink DataSet which you can serialize to some file. Upon reading this DataSet you simply have to give it to your algorithm to be used as the model. The following code snippet illustrates this

Re: DataSet.randomSplit()

2016-03-29 Thread Till Rohrmann
Hi, I think Ufuk is completely right. As far as I know, we don't support this function and nobody's currently working on it. If you like, then you could take the lead there. Cheers, Till On Mon, Mar 28, 2016 at 10:50 PM, Ufuk Celebi wrote: > Hey Gna! I think that it's not on the road map at th

Re: threads, parallelism and task managers

2016-03-29 Thread Till Rohrmann
Hi, for what do you use the ExecutionContext? That should actually be something which you shouldn’t be concerned with since it is only used internally by the runtime. Cheers, Till ​ On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli wrote: > Well, in theory yes. Each task has a thread, but only

Re: threads, parallelism and task managers

2016-03-29 Thread Till Rohrmann
In fact, I don't use it. I just had to crawl back the runtime > implementation to get to the point where parallelism was switching from 32 > to 8. > > saluti, > Stefano > > 2016-03-29 12:24 GMT+02:00 Till Rohrmann : > >> Hi, >> >> for what do you

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-03-30 Thread Till Rohrmann
> >> >> *case* *class* WeightVector(weights: Vector, intercept: Double) *extends* >> Serializable {} >> >> >> However, I will use the approach to write out the weights as text. >> >> >> On Tue, Mar 29, 2016 at 5:01 AM, Till Rohrmann >> wrote: >> >

Re: withBroadcastSet for a DataStream missing?

2016-03-31 Thread Till Rohrmann
Hi Stavros, you might be able to solve your problem using a CoFlatMap operation with iterations. You would use one of the inputs for the iteration on which you broadcast the model updates to every operator. On the other input you would receive the data points which you want to cluster. As output y

Re: wait until BulkIteration finishes

2016-03-31 Thread Till Rohrmann
Hi Lydia, all downstream operators which depend on the bulk iteration will wait implicitly until data from the iteration operator is available. Cheers, Till On Thu, Mar 31, 2016 at 9:39 AM, Lydia Ickler wrote: > Hi all, > > is there a way to tell the program that it should wait until the > Bul

Re: Why Scala Option is not a valid key?

2016-03-31 Thread Till Rohrmann
Actually I think that it’s not correct that the OptionType cannot be used as a key type. In fact it is similar to a composite type and should be usable as a key iff its element can be used as a key. Then we only have to provide an OptionTypeComparator which will compare the elements if they are se

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Till Rohrmann
Hi Tarandeep, the number of elements in each partition should stay constant. In fact the elements in each partition should not change. Cheers, Till On Wed, Mar 30, 2016 at 8:14 AM, Tarandeep Singh wrote: > Hi, > > I am looking at implementation of zipWithIndex in DataSetUtils- > > https://gith

Re: DataSetUtils zipWithIndex question

2016-03-31 Thread Till Rohrmann
nager..right? > Then, each partition is divided again at the task manager to maximize the > slot usage..is it correct? > In every case, there will be a case where at least one partition is > smaller than the others...am I wrong? Am I confusing some term..? > > Best, > Flavio >

Re: wait until BulkIteration finishes

2016-03-31 Thread Till Rohrmann
one sends already by default? > > Best regards, > Lydia > > > Am 31.03.2016 um 12:01 schrieb Till Rohrmann : > > Hi Lydia, > > all downstream operators which depend on the bulk iteration will wait > implicitly until data from the iteration operator is available. > >

CEP blog post

2016-04-01 Thread Till Rohrmann
Hi Flink community, I've written a short blog [1] post about Flink's new CEP library which basically showcases its functionality using a monitoring example. I would like to publish the post on the flink.apache.org blog next week, if nobody objects. Feedback is highly appreciated :-) [1] https://d

Re: CEP blog post

2016-04-04 Thread Till Rohrmann
e put it on Flink Blog > > Cheers > Gen > > > On Fri, Apr 1, 2016 at 9:56 PM, Till Rohrmann > wrote: > >> Hi Flink community, >> >> I've written a short blog [1] post about Flink's new CEP library which >> basically showcases its functionali

Re: CEP API: Question on FollowedBy

2016-04-05 Thread Till Rohrmann
Hi Anwar, yes, once we have published the introductory blog post about the CEP library, we will also publish a more in-depth description of the approach we have implemented. To spoil it a little bit: We have mainly followed the paper “Efficient Pattern Matching over Event Streams” for the implemen

Re: CEP API: Question on FollowedBy

2016-04-05 Thread Till Rohrmann
Yes exactly. This is a feature which we still have to add. On Tue, Apr 5, 2016 at 1:07 PM, Anwar Rizal wrote: > Thanks Till. > > The only way I can change the behavior would be to post filter the result > then. > > Anwar. > > On Tue, Apr 5, 2016 at 11:41 AM, Till Ro

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
Hi Balaji, from the stack trace it looks as if you cannot open a connection to redis. Have you checked that you can access redis from all your TaskManager nodes? Cheers, Till On Wed, Apr 6, 2016 at 7:46 AM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > I am trying to use AWS EMR ya

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-06 Thread Till Rohrmann
Hi Norman, which version of Flink are you using? We recently fixed some issues with the CEP library which looked similar to your error message. The problem occurred when using the CEP library with processing time. Switching to event or ingestion time solves the problem. The fixes to make it also

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
> redisClient.set(k,v,exTime) > } > > > def get(k: String): Option[String] = { > import scala.concurrent.duration._ > val f = redisClient.get[String](k) > Await.result(f, 1.seconds) //FIXME - really bad need to return future > here. > } > > } > > > O

Re: CEP blog post

2016-04-06 Thread Till Rohrmann
only work > with 1.0.1. > > On Mon, Apr 4, 2016 at 3:35 PM, Till Rohrmann > wrote: > > Thanks a lot to all for the valuable feedback. I've incorporated your > > suggestions and will publish the article, once Flink 1.0.1 has been > released > > (we need 1

Re: RemoteTransportException when trying to redis in flink code

2016-04-06 Thread Till Rohrmann
client machine > GlobalConfiguration params will passed on to the task manager nodes, as > well, it was not and values from default was getting pickup, which was > localhost 6379 and there was no redis running in localhost of task manager. > > balaji > > On Wed, Apr 6, 201

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-07 Thread Till Rohrmann
Hi Norman, this error is exactly what I thought I had fixed. I guess there is still another case where a premature pruning can happen in the SharedBuffer. Could you maybe send me the example code with which you could produce the error. The input data would also be very helpful. Then I can debug it

Re: Using native libraries in Flink EMR jobs

2016-04-07 Thread Till Rohrmann
Hi Timur, what you can try doing is to pass the JVM parameter -Djava.library.path= via the env.java.opts to the system. You simply have to add env.java.opts: "-Djava.library.path=" in the flink-conf.yaml or via -Denv.java.opts="-Djava.library.path=", if I’m not mistaken. Cheers Till ​ On Thu, Ap

Re: Using native libraries in Flink EMR jobs

2016-04-07 Thread Till Rohrmann
For passing the dynamic property directly when running things on YARN, you have to use -yDenv.java.opts="..." ​ On Thu, Apr 7, 2016 at 11:42 AM, Till Rohrmann wrote: > Hi Timur, > > what you can try doing is to pass the JVM parameter > -Djava.library.path= via the env.j

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-07 Thread Till Rohrmann
Hi Norman, could you provide me an example input data set which produces the error? E.g. the list of strings you inserted into Kafka/read from Kafka? Cheers, Till On Thu, Apr 7, 2016 at 11:05 AM, norman sp wrote: > Hi Till, > thank you. here's the code: > > public class CepStorzSimulator { > >

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-12 Thread Till Rohrmann
> > behrouz.derakhshan@ > > >> wrote: > > > >> Is there a reasons the Predictor or Estimator class don't have read and > >> write methods for saving and retrieving the model? I couldn't find Jira > >> issues for it. Does it make sense to

Re: Flink ML 1.0.0 - Saving and Loading Models to Score a Single Feature Vector

2016-04-12 Thread Till Rohrmann
Sorry, I had a mistake in my example code. I thought the model would be stored as a (Option[DataSet[Factors]], Option[DataSet[Factors]]) but instead it’s stored as Option[(DataSet[Factors], DataSet[Factors])]. So the code should be val als = ALS() als.fit(input) val alsModelOpt = als.factorsOpt

Re: Task Slots and Heterogeneous Tasks

2016-04-15 Thread Till Rohrmann
Hi Maxim, concerning your second part of the question: The managed memory of a TaskManager is first split among the available slots. Each slot portion of the managed memory is again split among all operators which require managed memory when a pipeline is executed. In contrast to that, the heap me
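The two-level split described above can be worked through with some simple arithmetic. The following plain-Java sketch assumes that both splits are equal shares, which the message does not state explicitly; the class name ManagedMemorySketch and the example numbers are hypothetical.

```java
public class ManagedMemorySketch {

    // First level: the TaskManager's managed memory is split among its slots.
    // ASSUMPTION: equal shares per slot.
    static long bytesPerSlot(long managedBytes, int slots) {
        return managedBytes / slots;
    }

    // Second level: a slot's share is split among the operators of the pipeline
    // that require managed memory. ASSUMPTION: equal shares per operator.
    static long bytesPerOperator(long managedBytes, int slots, int operatorsNeedingMemory) {
        return bytesPerSlot(managedBytes, slots) / operatorsNeedingMemory;
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long managed = 4096L * mib; // hypothetical: 4 GiB of managed memory
        System.out.println(bytesPerSlot(managed, 4) / mib);        // prints 1024
        System.out.println(bytesPerOperator(managed, 4, 2) / mib); // prints 512
    }
}
```

Under these assumptions, adding slots to a TaskManager or adding memory-hungry operators to a pipeline both shrink the managed memory available to each operator.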

Re: Configuring task slots and parallelism for single node Maven executed

2016-04-18 Thread Till Rohrmann
Hi Prez, 1. the configuration setting taskmanager.numberOfTaskSlots says with how many task slots a TaskManager will be started. As a rough rule of thumb, set this value to the number of cores of the machine the TM is running on. See this link [1] for further information. The conf

Re: withBroadcastSet for a DataStream missing?

2016-04-18 Thread Till Rohrmann
ach operator before the the next window > processing begins? > > Thnx! > > > On Fri, Apr 1, 2016 at 10:51 PM, Stavros Kontopoulos < > st.kontopou...@gmail.com> wrote: > >> Ok thnx Till i will give it a shot! >> >> On Thu, Mar 31, 2016 at 11:25 AM, Till Roh

Re: fan out parallel-able operator sub-task beyond total slots number

2016-04-18 Thread Till Rohrmann
Hi Chen, two subtasks of the same operator can never be executed within the same slot/pipeline. The `slotSharingGroup` only allows you to control which subtasks of different operators can be executed alongside each other in the same slot. It basically allows you to break pipelines into smaller ones. Therefo

Re: providing java system arguments(-D) to specific job

2016-04-18 Thread Till Rohrmann
That is correct. You can provide it also as a property to the CLI: -Denv.java.opts="-Dmy-prop=bla -Dmyprop2=bla2" Cheers, Till On Sun, Apr 17, 2016 at 3:56 PM, Igor Berman wrote: > for the sake of history(at task manager level): > in conf/flink-conf.yaml > env.java.opts: -Dmy-prop=bla -Dmy-prop

Re: Task Slots and Heterogeneous Tasks

2016-04-18 Thread Till Rohrmann
tency external service is going to be put in a separate slot. Is it > possible to indicate to the Flink that subtasks of a particular operation > can be collocated in a slot, as such subtasks are IO bound and require no > shared memory? > > On Fri, Apr 15, 2016 at 5:31 AM, Till Roh

Re: Flink + S3

2016-04-19 Thread Till Rohrmann
Hi Michael-Keith, you can use S3 as the checkpoint directory for the filesystem state backend. This means that whenever a checkpoint is performed the state data will be written to this directory. The same holds true for the zookeeper recovery storage directory. This directory will contain the sub

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
Hi Yifei, if you don't wanna implement your own join operator, then you could also chain two join operations. I created a small example to demonstrate that: https://gist.github.com/tillrohrmann/c074b4eedb9deaf9c8ca2a5e124800f3. However, bear in mind that for this approach you will construct two wi

Re: Turn off logging in Flink

2016-04-19 Thread Till Rohrmann
Hi Sendoh, you have to edit your log4j.properties file to set log4j.rootLogger=OFF in order to turn off the logger. Depending on how you run Flink and where you wanna turn off the logging, you either have to edit the log4j.properties file in the FLINK_HOME/conf directory or the one in your project whi

Re: class java.util.UUID is not a valid POJO type

2016-04-19 Thread Till Rohrmann
Hi Leonard, the UUID class cannot be treated as a POJO by Flink, because it is lacking the public getters and setters for mostSigBits and leastSigBits. However, it should be possible to treat it as a generic type. I think the difference is that you cannot use key expressions and key indices to def

Re: adding source not serializable exception in streaming implementation

2016-04-19 Thread Till Rohrmann
I assume that the provided FetchStock code is not complete. As the exception indicates, you somehow store a LocalStreamEnvironment in your source function. The StreamExecutionEnvironments are not serializable and cannot be part of the source function’s closure. Cheers, Till On Tue, Apr 19, 2016

Re: logback.xml and logback-yarn.xml rollingpolicy configuration

2016-04-19 Thread Till Rohrmann
Have you made sure that Flink is using logback [1]? [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/best_practices.html#using-logback-instead-of-log4j Cheers, Till On Tue, Apr 19, 2016 at 2:01 PM, Balaji Rajagopalan < balaji.rajagopa...@olacabs.com> wrote: > The are two files in

Re: Flink CEP AbstractCEPPatternOperator fail after event detection

2016-04-19 Thread Till Rohrmann
Hi Norman, sorry for the late reply. I finally found time and could, thanks to you, reproduce the problem. The problem was that the window borders were treated differently in two parts of the code. Now the left border of a window is inclusive and the right border (late elements) is exclusive. I've
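
An inclusive left border and an exclusive right border describe the half-open interval [start, start + length). As a small sketch in plain Java (the class WindowBorders and the method inWindow are invented for this illustration and are not the CEP operator's code):

```java
class WindowBorders {
    // True if the timestamp lies in the half-open interval [windowStart, windowStart + windowLength).
    static boolean inWindow(long timestamp, long windowStart, long windowLength) {
        return timestamp >= windowStart && timestamp < windowStart + windowLength;
    }

    public static void main(String[] args) {
        System.out.println(inWindow(0L, 0L, 10L));   // left border, inside the window
        System.out.println(inWindow(10L, 0L, 10L));  // right border, outside the window
    }
}
```

With this convention an element at the right border belongs to the next window, so no element falls into two adjacent windows of the same length.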

Re: YARN session application attempts

2016-04-19 Thread Till Rohrmann
Hi Stefano, Hadoop has supported this feature since version 2.6.0. You can define a time interval for the maximum number of application attempts. This means that you have to observe this number of application failures in a time interval before failing the application ultimately. Flink will activate thi

Re: Does Flink support joining multiple streams based on event time window now?

2016-04-19 Thread Till Rohrmann
ry them out. I have > another question. > > Since S2 may be days delayed, there may be lots of windows and a large > amount of data stored in memory waiting for computation. How does Flink > deal with that? > > Thanks, > > Yifei > > On Tue, Apr 19, 2016 at

Re: Trying to detecting changes

2016-04-20 Thread Till Rohrmann
You could use CEP for that. First you would create a pattern of two states which matches everything. In the select function you could then check whether both elements are different. However, this would be a bit of overkill for this simple use case. You could for example simply use a flat
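
The core of the check, comparing each element with its predecessor, can be sketched in plain Java without any Flink API. The class ChangeDetection and the method changedValues are hypothetical names for this illustration; in a streaming job the previous element would have to be kept in operator state rather than in a local variable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

class ChangeDetection {
    // Emits every element that differs from the element directly before it.
    static <T> List<T> changedValues(List<T> input) {
        List<T> changes = new ArrayList<>();
        T previous = null;
        boolean first = true;
        for (T current : input) {
            // The first element has no predecessor, so it never counts as a change.
            if (!first && !Objects.equals(previous, current)) {
                changes.add(current);
            }
            previous = current;
            first = false;
        }
        return changes;
    }

    public static void main(String[] args) {
        System.out.println(changedValues(Arrays.asList(1, 1, 2, 2, 3, 1))); // prints [2, 3, 1]
    }
}
```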

Re: Access Flink UI for jobs submitted using Eclipse

2016-04-20 Thread Till Rohrmann
Have you created a RemoteExecutionEnvironment to submit your job from within the IDE to the running cluster? See here [1] for more information. [1] https://ci.apache.org/projects/flink/flink-docs-master/apis/cluster_execution.html Cheers, Till On Wed, Apr 20, 2016 at 3:41 PM, Ritesh Kumar Sing

Re: How to perform this join operation?

2016-04-20 Thread Till Rohrmann
Hi Elias, sorry for the late reply. You're right that with the windowed join you would have to deal with pairs where the timestamp of (x,y) is not necessarily earlier than the timestamp of z. Moreover, by using sliding windows you would receive duplicates as you've described. Using tumbling window

Re: Custom state values in CEP

2016-04-22 Thread Till Rohrmann
Hi Sowmya, I'm afraid at the moment it is not possible to store custom state in the filter or select function. If you need access to the whole sequence of matching events then you have to put this code in the select clause of your pattern stream. Cheers, Till On Fri, Apr 22, 2016 at 7:55 AM, Sow

Re: Flink YARN job manager web port

2016-04-22 Thread Till Rohrmann
Hi Shannon, if you need this feature (assigning a range of web server ports) for your use case, then we would have to add it. If you want to do it, then it would help us a lot. I think the documentation is a bit outdated here. The port is either chosen from the range of ports or an ephemeral port is

Re: Programatic way to get version

2016-04-22 Thread Till Rohrmann
But be aware that this method only returns a non-null string if the binaries have been built with Maven. Otherwise it will return null. Cheers, Till On Fri, Apr 22, 2016 at 12:12 AM, Trevor Grant wrote: > dug through the codebase, in case any others want to know: > > import org.apache.flink.run

Re: How to fetch kafka Message have [KEY,VALUE] pair

2016-04-22 Thread Till Rohrmann
Depending on how the key value pair is encoded, you could use the TypeInformationKeyValueSerializationSchema where you provide the BasicTypeInfo.STRING_TYPE_INFO and PrimitiveArrayTypeInfo.BYTE_PRIMITIVE_ARRAY_TYPE_INFO as the key and value type information. But this only works if your data was ser

Re: Job hangs

2016-04-26 Thread Till Rohrmann
Could you share the logs with us, Timur? That would be very helpful. Cheers, Till On Apr 26, 2016 3:24 AM, "Timur Fayruzov" wrote: > Hello, > > Now I'm at the stage where my job seems to completely hang. Source code is > attached (it won't compile but I think gives a very good idea of what > happ

Re: Control Trigger behavior based on external datasource

2016-04-26 Thread Till Rohrmann
Hi Hironori, I would go with the second approach, because it is not guaranteed that all events of a given key have been received by the window operator if the data source says that all events for this key have been read. The events might still be in flight. Furthermore, it integrates more nicely w
