Re: Flink Yarn Cluster specifying Job Manager IP

2016-05-30 Thread arpit srivastava
Will try and come back with feedback if its possible. Thanks, Arpit On Tue, May 31, 2016 at 12:00 PM, Alexis Gendronneau < a.gendronn...@gmail.com> wrote: > Hi Arpit, > > I'm not sure of what you tries to do, but if you want yarn to execute a > type of job on particular nodes, you may find a way

Re: Flink Yarn Cluster specifying Job Manager IP

2016-05-30 Thread Alexis Gendronneau
Hi Arpit, I'm not sure of what you tries to do, but if you want yarn to execute a type of job on particular nodes, you may find a way using node labels : https://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/NodeLabel.html I'm not sure you'll be able to specify that the jobmanager ha

Alter Flink's execution graph at run-time

2016-05-30 Thread Xtra Coder
Hello, I'm curious about ability to alter processing of streams in Flink at run-time. Potential use-case may look like following: 1. I have a stream already running (i.e. data processing is already started) in the Flink's cluster 2. At some point of time I decide that I need to add some more st

Re: Why Scala Option is not a valid key?

2016-05-30 Thread Chiwan Park
I’ve merged a patch [1] for this issue. Now we can use Option as a key. [1]: https://git-wip-us.apache.org/repos/asf?p=flink.git;a=commit;h=c60326f85faaa38bcc359d555cd2d2818ef2e4e7 Regards, Chiwan Park > On Apr 5, 2016, at 2:08 PM, Chiwan Park wrote: > > I just found that Timur created a JIRA

Flink Yarn Cluster specifying Job Manager IP

2016-05-30 Thread arpit srivastava
Hi, I have yarn cluster with 7 nodes. Five nodes -16gb ram One node - 8gb ram One yarn resourcemanager - 8gb I want to specifically use 8gb machine (not resourcemanager) to act as job manager and other five nodes as task managers. Is there a way to do that ?

Re: Executing detached data stream programs

2016-05-30 Thread Robert Metzger
Hi Jordan, the ./bin/flink client's run command has a -d / --detached flag for detached execution. However, this doesn't allow you to programatically control the running job. What you probably have to do is using the RemoteEnvironment submitting the job in a blocking way using a separate thread. T

Re: Unsatisfied Link Error

2016-05-30 Thread arpit srivastava
I am not sure but the reason could be that you are not submitting "fat-jar" containing all the libs when submitting it to job manager. When you run it through intellij it takes care of including the libraries. To create fat jar use maven-shade-plugin. Let me know if it works. Thanks, Arpit On

Re: Different log4j.properies

2016-05-30 Thread Stephan Ewen
I think "log4j.properties" is also used for YARN (it is included in the shipped bundle, together with jars). Otherwise it is correct. Would be good to write that down somewhere... On Mon, May 30, 2016 at 6:30 PM, Konstantin Knauf < konstantin.kn...@tngtech.com> wrote: > Hi everyone, > > basic q

Re: Different log4j.properies

2016-05-30 Thread Chesnay Schepler
i believe the log4j.properties are always used for JM/TM logs, and log4j-yarn-session.properties strictly for pure YARN stuff. On 30.05.2016 18:30, Konstantin Knauf wrote: Hi everyone, basic question, but I just want to check if my current understanding of the log4j property files inside flink'

Different log4j.properies

2016-05-30 Thread Konstantin Knauf
Hi everyone, basic question, but I just want to check if my current understanding of the log4j property files inside flink's conf directory is correct. log4j-cli.properties: Used by the client "flink run/list" for code not executed in the cluster log4j-yarn-session.properties Used by the client w

Re: dataset flatmap with multiple output types

2016-05-30 Thread Robert Metzger
Hi, Alexis is right. The original data set is only read once and the two flatMaps run in parallel on multiple machines in the cluster. Regards, Robert On Fri, May 27, 2016 at 11:10 PM, Alexis Gendronneau < a.gendronn...@gmail.com> wrote: > Hi Jon, > > I'm pretty sure your input will be processed

Re: re: Elegantly sharing state in a streaming environment

2016-05-30 Thread leon_mclare
Dear Philippe, that is exactly what i need. Thank you for the concise explanation. This approach is excellent, as it also permits the values to be easily updated externally. Kind regards Leon 30. May 2016 14:31 by philippe.capar...@orange.fr: > > > Just transform the list in a DataStream. A

Executing detached data stream programs

2016-05-30 Thread Jordan Ganoff
Hi, I plan to leveraging Flink data stream programs within a larger application. I’d like to be able to execute a data stream program in detached mode directly from the StreamExecutionEnvironment similar to how I can execute a program in blocking mode. I was expecting to find StreamExecutionEn

Re: Parallel read text

2016-05-30 Thread Robert Metzger
Hi David, I guess you can verify it by adding custom log statements into the Flink code (therefore, you need to recompile Flink). Maybe a debugger is also sufficient (if you are running Flink locally). We are currently reworking the reading of static files for the streaming environment. Maybe its

Re: Visualize result of Flink job

2016-05-30 Thread Robert Metzger
Hi, sounds doable. I think it should be easy to set up a first proof of concept for this. Let us know if you need any further assistance. Regards, Robert On Mon, May 30, 2016 at 2:29 PM, Palle wrote: > Hi Robert. > > Thank you for the answer. > > I am looking at a rate of max 10.000 elements /

re: Elegantly sharing state in a streaming environment

2016-05-30 Thread Philippe CAPARROY
Just transform the list in a DataStream. A datastream can be finite. One solution, in the context of a Streaming environment is to use Kafka, or any other distributed broker, although Flink ships with a KafkaSource.   1)Create a Kafka Topic dedicated to your list of key/values. Inject your va

Re: Visualize result of Flink job

2016-05-30 Thread Palle
Hi Robert. Thank you for the answer. I am looking at a rate of max 10.000 elements / 10 seconds, so Elastic/Kibana is probably the way to go. I'll find a way to model it. Thanks. /Palle - Original meddelelse - > Fra: Robert Metzger > Til: user@flink.apache.org > Dato: Man, 30. maj 2

Re: Flink Kafka from Time Offset

2016-05-30 Thread Robert Metzger
Hi Simon, Timestamp-awareness has been added to Kafka 0.10 only [1]. I'm not sure if the 0.9 connector code of Flink will support Kafka 0.10 immediately. Another way of handling the issue would be - Implementing a custom offset-metadata system (storing the timestamp for some of the offsets (say ev

Re: withBroadcastSet for a DataStream missing?

2016-05-30 Thread souibil
Hi Stavros, I have same problem as you and i try to solve it , did you find some solution meanwhile? thankyou -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/withBroadcastSet-for-a-DataStream-missing-tp5793p7262.html Sent from the Apache Fl

Flink Kafka from Time Offset

2016-05-30 Thread simon peyer
Hi together I'm using flink 1.0.1 and a FlinkKafkaConsumer09. I'm very interested in getting data from a specific Time offset in Kafka. Is there a property which can do this? Or is there another way of handling such issues? I'm also using checkpointing. If I deploy a new pipeline with the same i

Elegantly sharing state in a streaming environment

2016-05-30 Thread leon_mclare
Hello Flink team, How can i partition and share static state among instances of a streaming operator? I have a huge list of keys and values, which are used to filter tuples in a stream. The list does not change. Currently i am sharing the list with each operator instance via the constructor, a

Re: Write matrix/vector

2016-05-30 Thread Chiwan Park
Hi Lydia, `FlinkMLTools.persist` method is used to save ML models and can be used to save Matrix and Vector object. Note that the method uses TypeSerializerOutputFormat which is binary output format. Regards, Chiwan Park > On May 30, 2016, at 11:31 AM, Lydia Ickler wrote: > > Hi, > > I woul

Re: Visualize result of Flink job

2016-05-30 Thread Robert Metzger
Hi Palle, I think there is currently no way of sending the data from a streaming Flink job into Zeppelin. What rate / amount of data do you expect to send every 10 seconds to the visualization tool? People have used Flink -> ES -> Kibana for this purpose in the past [1], but I think you can not se

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-30 Thread Robert Metzger
Hi Flavio, can you privately share the source code of your Flink job with me? I'm wondering whether the issue might be caused by a version mixup between different versions on the cluster (different JVM versions? or different files in the lib/ folder?), How are you deploying the Flink job? Regard

Re: Flink Upgrades and Job Upgrades

2016-05-30 Thread Ufuk Celebi
2) What about the following: You trigger a savepoint and deploy the 2nd version from the savepoint. You let both job versions run at the same time and switch the consumers of the job to the new version (e.g. Kafka output topic v2). On the Flink side, this should be possible, but moves the problem t

Re: Weird Kryo exception (Unable to find class: java.ttil.HashSet)

2016-05-30 Thread Flavio Pompermaier
I tried to reproduce the error on a subset of the data and actually reducing the available memory and increasing a lot the gc (creating a lot of useless objects in one of the first UDFs) caused this error: Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exc

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Aljoscha Krettek
Thanks for the feedback! :-) I already read the comments on the file. On Mon, 30 May 2016 at 11:10 Gyula Fóra wrote: > Thanks Aljoscha :) I added some comments that might seem relevant from the > users point of view. > > Gyula > > Aljoscha Krettek ezt írta (időpont: 2016. máj. 30., > H, 10:33):

Re: Incremental updates

2016-05-30 Thread Aljoscha Krettek
Hi, the state will be kept indefinitely but we are planning to introduce a setting that would allow setting a time-to-live on state. I think this is exactly what you would need. As an alternative, maybe you could implement your program using windows? In this way you would also bound how long state

Re: Flink Upgrades and Job Upgrades

2016-05-30 Thread Aljoscha Krettek
Hi, hot update of a running cluster is not possible right now. And there is also no one working on this for the near future. We are aware that this would be nice to have, though. For 2), this is possible, but not without stopping the job. Savepoints is the feature that was introduced for that: htt

Unsatisfied Link Error

2016-05-30 Thread Debaditya Roy
I was trying to run a basic program in java by submitting to the job manager in Flink. I have a native library from open CV. When I try to submit the job I get "java.lang.UnsatisfiedLinkError: no opencv_java310 in java.library.path", however when I run it on Intellij by setting up the flink executi

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Gyula Fóra
Thanks Aljoscha :) I added some comments that might seem relevant from the users point of view. Gyula Aljoscha Krettek ezt írta (időpont: 2016. máj. 30., H, 10:33): > Hi, > I created a new doc specifically about the interplay of lateness and > window state garbage collection: > https://docs.goo

Re: Extracting Timestamp in MapFunction

2016-05-30 Thread Aljoscha Krettek
Hi, right now the only way of getting at the timestamps is writing a custom operator and using that with DataStream.transform(). Take a look at StreamMap, which is the operator implementation that executes MapFunctions. I think the links in the doc should point to https://ci.apache.org/projects/fl

Re: [DISCUSS] Allowed Lateness in Flink

2016-05-30 Thread Aljoscha Krettek
Hi, I created a new doc specifically about the interplay of lateness and window state garbage collection: https://docs.google.com/document/d/1vgukdDiUco0KX4f7tlDJgHWaRVIU-KorItWgnBapq_8/edit?usp=sharing There is still some stuff that needs to be figured out, both in the new doc and the existing do

Re: Writing test for Flink streaming jobs

2016-05-30 Thread lofifnc
Hi, Flinkspector is indeed a good choice to circumvent this problem as it specifically has several mechanisms to deal with these synchronization problems. Unfortunately, I'm still looking for a reasonable solution to support checking of scala types. Maybe I will provide a version in which you can

Re: Visualize result of Flink job

2016-05-30 Thread Palle
OK, I found product that seems to be what I am looking for: Apache Zeppelin. I will have a look into that one. If anyone can point me to an example (Git) outputting data from Flink to the Zeppelin Notebook I would be happy. - Original meddelelse - > Fra: Palle > Til: user@flink.apache.

Re: sparse matrix

2016-05-30 Thread Simone Robutti
Hello, right now Flink's local matrices are rather raw and for this kind of usage, you should rely on Breeze. If you need to perform operations, slicing in this case, they are a better option if you don't want to reimplement everything. In case you already developed against Flink's matrices, ther

Re: Context-specific step function in Iteration

2016-05-30 Thread Martin Junghanns
Hi again, I had a bug in my logic. It works as expected (which is perfect). So maybe for others: Problem: - execute superstep-dependent UDFs on datasets which do not have access to the iteration context Solution: - add dummy element to the working set (W) at the beginning of the step functi