verbose console

2015-09-02 Thread Michele Bertoni
Hi everybody, I just found that in version 0.9.1 it is possible to disable that verbose console. Can you please explain how to do it, both in the IDE and in a local environment? Especially in the IDE I am able to set log4j properties for my own logger, but everything I try for Flink's internal one does not work

Travis updates on Github

2015-09-02 Thread Sachin Goel
Hi all, Is there some issue with the Travis integration? The last three pull requests do not have their build status on the GitHub page. The builds are getting triggered though. Regards Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685

Re: Travis updates on Github

2015-09-02 Thread Robert Metzger
Hi Sachin, I also noticed that the GitHub integration is not working properly. I'll ask the Apache Infra team. On Wed, Sep 2, 2015 at 10:20 AM, Sachin Goel wrote: > Hi all > Is there some issue with travis integration? The last three pull requests > do not have their

Bug broadcasting objects (serialization issue)

2015-09-02 Thread Andres R. Masegosa
Hi, I get a bug when trying to broadcast a list of integers created with the primitive "Arrays.asList(...)". For example, if you try to run this "wordcount" example, you can reproduce the bug. public class WordCountExample { public static void main(String[] args) throws Exception {

Re: question on flink-storm-examples

2015-09-02 Thread Matthias J. Sax
Hi, StormFileSpout uses a simple FileReader internally and cannot deal with HDFS. It would be a nice extension to have. I just opened a JIRA for it: https://issues.apache.org/jira/browse/FLINK-2606 Jerry, feel free to work on this feature and contribute code to Flink ;) -Matthias On 09/02/2015 07:52

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
If I'm right, all tests use either the MultipleProgramTestBase or JavaProgramTestBase. Those shut down the cluster explicitly anyway. I will check whether this is the case. Regards Sachin -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Wed, Sep 2, 2015 at 9:40 PM, Till Rohrmann

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Till Rohrmann
Oh sorry, then I got the wrong context. I somehow thought it was about test cases because I read `MultipleProgramTestBase` etc. Sorry my bad. On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel wrote: > I was under the impression that the @AfterClass annotation can only be >

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Till Rohrmann
Why is it not possible to shut down the local cluster? Can’t you shut it down in the @AfterClass method? On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel wrote: > Yes. That will work too. However, then it isn't possible to shut down the > local cluster. [Is it necessary

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
I was under the impression that the @AfterClass annotation can only be used in test classes. Even so, the idea is that a user program running in the IDE should not be starting up the cluster several times [my primary concern is the addition of the persist operator], and we certainly cannot ask the

NPE thrown when using Storm Kafka Spout with Flink

2015-09-02 Thread Jerry Peng
Hello, When I try to run a storm topology with a Kafka Spout on top of Flink, I get an NPE at: 15:00:32,853 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask - Error closing stream operators after an exception. java.lang.NullPointerException at

Re: How to force the parallelism on small streams?

2015-09-02 Thread Matthias J. Sax
Hi, If I understand you correctly, you want to have 100 mappers. Thus you need to apply the .setParallelism() after .map() > addSource(myFileSource).rebalance().map(myFileMapper).setParallelism(100) The order of commands you used sets the dop for the source to 100 (which might be ignored, if
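
For readers skimming the thread, a minimal sketch of the placement rule Matthias describes. The source and mapper here are stand-ins, not the code from the original mail; the API shown is the 0.9-era Java DataStream API:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class ParallelismPlacement {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();

            // setParallelism() configures the operator it directly follows:
            // here the mapper runs with 100 parallel instances, while the
            // source keeps its own parallelism.
            env.fromElements("a", "b", "c")        // stand-in source
               .rebalance()
               .map(new MapFunction<String, String>() {
                   @Override
                   public String map(String s) {
                       return s.toUpperCase();
                   }
               }).setParallelism(100);

            env.execute("parallelism placement sketch");
        }
    }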

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Stephan Ewen
You can always shut down a cluster manually (via shutdown()), and if the JVM simply exits, all is well too. Crucial cleanup is in shutdown hooks. On Wed, Sep 2, 2015 at 6:22 PM, Till Rohrmann wrote: > If I'm not mistaken, then the cluster should be properly

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Till Rohrmann
Maybe we can create a single PlanExecutor for the LocalEnvironment which is used when calling execute. This of course entails that we don’t call stop on the LocalCluster. For cases where the program exits after calling execute, this should be fine because all resources will then be released

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Okay. No problem. Any suggestions for the correct context though? :') I don't think something like a {{FlinkProgram}} class is a good idea [User would need to override a {{program}} method and we will make sure the cluster is setup only once and torn down properly only after the user code

RE: How to force the parallelism on small streams?

2015-09-02 Thread LINZ, Arnaud
Hi, You are right, but in fact it does not solve my problem, since I have parallelism 100 everywhere. Each of my 100 sources gives only a few lines (say 14 max), and only the first 14 next nodes will receive data. Same problem when replacing rebalance() with shuffle(). But I found a workaround:

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
I'm not sure what you mean by "Crucial cleanup is in shutdown hooks". Could you elaborate? -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Wed, Sep 2, 2015 at 10:25 PM, Stephan Ewen wrote: > You can always shut down a cluster manually (via shutdown()) and if

Re: nosuchmethoderror

2015-09-02 Thread Ferenc Turi
Ok. As far as I can see, only the method name was changed. It was an unnecessary modification which caused the incompatibility. F. On Wed, Sep 2, 2015 at 8:53 PM, Márton Balassi wrote: > Dear Ferenc, > > The Kafka consumer implementation was modified from 0.9.0 to 0.9.1, >

max-fan

2015-09-02 Thread Greg Hogan
When workers spill more than 128 files, I have seen these fully merged into one or more much larger files. Does the following parameter allow more files to be stored without requiring the intermediate merge-sort? I have changed it to 1024 without effect. Also, it appears that the entire set of

Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-02 Thread Welly Tambunan
Hi All, I would like to filter some items from the event stream. I think there are two ways of doing this: using the regular pipeline filter(...).map(...), or using flatMap to do both in the same operator. Is there any performance improvement if we use flatMap? As that will be done in one

Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )

2015-09-02 Thread Gyula Fóra
Hey Welly, If you call filter and map one after the other like you mentioned, these operators will be chained and executed as if they were running in the same operator. The only small performance overhead comes from the fact that the output of the filter will be copied before passing it as input
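
To make the comparison concrete, a small sketch of the fused variant; the predicate and transform are placeholder logic, assuming integer events:

    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.util.Collector;

    // One operator doing the work of filter(x > 0) followed by map(x * 2):
    // emit zero or one record per input element.
    public class FilterAndMap implements FlatMapFunction<Integer, Integer> {
        @Override
        public void flatMap(Integer value, Collector<Integer> out) {
            if (value > 0) {            // the filter step
                out.collect(value * 2); // the map step
            }
        }
    }

As Gyula notes, chaining makes filter(...).map(...) behave almost identically, so the fused flatMap mostly saves the copy between the two chained operators.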

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
Hi Robert and Jay, Thanks for your answers. The petstore jobs could indeed be used as a Rosetta Code for Flink and Spark. Regarding the memory requirements, that is very good news to me; just 2GB of RAM is certainly a modest amount of memory, and you can even use some Single Board Computers for

nosuchmethoderror

2015-09-02 Thread Ferenc Turi
Hi, I tried to use the latest 0.9.1 release but I got: java.lang.NoSuchMethodError: org.apache.flink.util.NetUtils.ensureCorrectHostnamePort(Ljava/lang/String;)V at com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink.<init>(FlinkKafkaTopicWriterSink.java:69) at

Re: nosuchmethoderror

2015-09-02 Thread Márton Balassi
Dear Ferenc, The Kafka consumer implementation was modified from 0.9.0 to 0.9.1; please use the new code. [1] I suspect that your com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink depends on the way the Flink code used to look in 0.9.0. If you take a closer look, Robert changed the

Re: Hardware requirements and learning resources

2015-09-02 Thread Juan Rodríguez Hortalá
Answering to myself, I have found some nice training material at http://dataartisans.github.io/flink-training. There are even videos on YouTube for some of the slides - http://dataartisans.github.io/flink-training/overview/intro.html https://www.youtube.com/watch?v=XgC6c4Wiqvs -

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Maximilian Michels
Hi Andres, Thank you for reporting the problem and including the code to reproduce it. I think there is a problem with the class serialization or deserialization. Arrays.asList uses a private ArrayList class (java.util.Arrays$ArrayList) which is not the one you would normally use
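
A common workaround consistent with this diagnosis (not quoted from the thread) is to copy the list into a plain java.util.ArrayList before broadcasting, so no private JDK class ends up in the shipped job:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class BroadcastListWorkaround {
        public static void main(String[] args) {
            // Arrays.asList() returns the private inner class
            // java.util.Arrays$ArrayList; copying into a plain ArrayList
            // keeps only public, well-known classes in the closure.
            List<Integer> data = new ArrayList<>(Arrays.asList(1, 2, 3));
            System.out.println(data);
            // e.g. dataSet.map(...).withBroadcastSet(env.fromCollection(data), "ints");
        }
    }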

Re: Bigpetstore - Flink integration

2015-09-02 Thread Robert Metzger
Okay, I see. As I said before, I was not able to reproduce the serialization issue you've reported. Can you maybe post the exception you are seeing? On Wed, Sep 2, 2015 at 3:32 PM, jay vyas wrote: > Hey, thanks! > > Those are just seeds, the files aren't large. > >

Re: Bigpetstore - Flink integration

2015-09-02 Thread jay vyas
Hey, thanks! Those are just seeds, the files aren't large. The scale-out data is the transactions. The seed data needs to be the same, shipped to ALL nodes, and then the nodes generate transactions. On Wed, Sep 2, 2015 at 9:21 AM, Robert Metzger wrote: > I'm starting a

Re: Bigpetstore - Flink integration

2015-09-02 Thread Stephan Ewen
If a lot of the data is generated locally, this may face the same issue as Greg did with oversized payloads (dropped by Akka). On Wed, Sep 2, 2015 at 3:21 PM, Robert Metzger wrote: > I'm starting a new discussion thread for the bigpetstore-flink integration > ... > > > I

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Stephan Ewen
Yes, even serialize in the constructor. Then the failure (if serialization does not work) comes immediately. On Wed, Sep 2, 2015 at 4:02 PM, Maximilian Michels wrote: > Nice suggestion. So you want to serialize and deserialize the InputFormats > on the Client to check whether

Re: Bigpetstore - Flink integration

2015-09-02 Thread jay vyas
hmmm interesting looks to be working magically now... :) I must have written some code late at night that magically fixed it and forgot. The original errors I was getting were Kryo-related. The objects aren't being serialized on write to anything useful, but that's an easy fix, I'm sure. Onward

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Maximilian Michels
Nice suggestion. So you want to serialize and deserialize the InputFormats on the Client to check whether they can be transferred correctly? Merely serializing is not enough because the above Exception occurs during deserialization. On Wed, Sep 2, 2015 at 2:29 PM, Stephan Ewen
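
A minimal sketch of the round-trip check Max and Stephan are discussing, using plain JDK serialization. The helper name is hypothetical; the idea is that the client would run something like this over the InputFormat before shipping it:

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;

    public final class SerializationCheck {

        // Round-trips an object through Java serialization so that both
        // serialization and deserialization failures surface eagerly on
        // the client, not later on the workers.
        static Object roundTrip(Object o) throws Exception {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
                oos.writeObject(o);
            }
            try (ObjectInputStream ois = new ObjectInputStream(
                    new ByteArrayInputStream(bos.toByteArray()))) {
                return ois.readObject();
            }
        }
    }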

Re: verbose console

2015-09-02 Thread Maximilian Michels
Hi Michele, Please supply a log4j.properties file path as a Java VM property like so: -Dlog4j.configuration=/path/to/log4j.properties Your IDE should have an option to adjust VM arguments. Cheers, Max On Wed, Sep 2, 2015 at 9:10 AM, Michele Bertoni wrote: > Hi

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Maximilian Michels
Here's the JIRA issue: https://issues.apache.org/jira/browse/FLINK-2608 On Wed, Sep 2, 2015 at 12:49 PM, Maximilian Michels wrote: > Hi Andreas, > > Thank you for reporting the problem and including the code to reproduce > the problem. I think there is a problem with the class

Re: Hardware requirements and learning resources

2015-09-02 Thread Robert Metzger
Hi Juan, I think the recommendations in the Spark guide are quite good, and are similar to what I would recommend for Flink as well. Depending on the workloads you are interested in running, you can certainly use Flink with less than 8 GB per machine. I think you can start Flink TaskManagers with 500

Re: Hardware requirements and learning resources

2015-09-02 Thread Jay Vyas
Just running the main class is sufficient > On Sep 2, 2015, at 8:59 AM, Robert Metzger wrote: > > Hey jay, > > How can I reproduce the error? > >> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas wrote: >> We're also working on a bigpetstore

Re: Hardware requirements and learning resources

2015-09-02 Thread Robert Metzger
Hey jay, How can I reproduce the error? On Wed, Sep 2, 2015 at 2:56 PM, jay vyas wrote: > We're also working on a bigpetstore implementation of flink which will > help onboard spark/mapreduce folks. > > I have prototypical code here that runs a simple job in

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Stephan Ewen
We should try to improve the exception here. More people will run into this issue and the exception should help them understand it well. How about we do eager serialization into a set of byte arrays? Then the serializability issue comes immediately when the program is constructed, rather than

Re: Hardware requirements and learning resources

2015-09-02 Thread jay vyas
We're also working on a bigpetstore implementation for Flink which will help onboard Spark/MapReduce folks. I have prototypical code here that runs a simple job in memory; contributions welcome. Right now there is a serialization error: https://github.com/bigpetstore/bigpetstore-flink. On Wed,

Re: Hardware requirements and learning resources

2015-09-02 Thread Robert Metzger
@Jay: I've looked into your code, but I was not able to reproduce the issue. I'll start a new discussion thread on the user@flink list for the Flink-BigPetStore discussion. I don't want to take over Juan's hardware-requirements discussion ;) On Wed, Sep 2, 2015 at 3:01 PM, Jay Vyas

Re: verbose console

2015-09-02 Thread Michele Bertoni
Totally, it is the first time I am using them and I thought they were the same. I will try it asap, thanks. On 02 Sep 2015, at 16:41, Till Rohrmann wrote: Can it be that you confuse the logback configuration file with the

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Yes. That will work too. However, then it isn't possible to shut down the local cluster. [Is it necessary to do so, or does it shut down automatically when the program exits? I'm not entirely sure.] -- Sachin Goel Computer Science, IIT Delhi m. +91-9871457685 On Wed, Sep 2, 2015 at 7:59 PM,

Re: Bug broadcasting objects (serialization issue)

2015-09-02 Thread Stephan Ewen
I see. Manual serialization implies also manual deserialization (on the workers only), which would give a better exception. BTW: There is an opportunity to fix two problems with one patch: The framesize overflow for the input format, and the serialization. On Wed, Sep 2, 2015 at 4:16 PM,

Re: verbose console

2015-09-02 Thread Michele Bertoni
Thanks for answering. Yes, I did that; in fact it is working for my logger, but I don't know how to limit the Flink logger using the log4j properties. If I use the line suggested by the docs, I get an error parsing the log4j properties file: level should not be in that position. Instead, doing
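
For reference, the usual log4j 1.x way to quiet Flink's internal loggers without touching your own is a logger-level entry such as log4j.logger.org.apache.flink=WARN in log4j.properties. A programmatic sketch of the same idea, assuming log4j 1.x on the classpath (which Flink 0.9 uses via slf4j):

    import org.apache.log4j.Level;
    import org.apache.log4j.Logger;

    public class QuietFlinkLogging {
        public static void main(String[] args) {
            // Raise the threshold for Flink's internal loggers only;
            // loggers outside org.apache.flink keep their configured level.
            Logger.getLogger("org.apache.flink").setLevel(Level.WARN);
            // ... build and execute the Flink program as usual ...
        }
    }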

Re: Multiple restarts of Local Cluster

2015-09-02 Thread Stephan Ewen
Have a look at some other tests, like the checkpointing tests. They start one cluster manually and keep it running. They connect against it using the remote environment ("localhost", miniCluster.getJobManagerRpcPort()). That works nicely... On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel
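
A sketch of the pattern Stephan describes: start one cluster for the whole test class, run every job against it via a remote environment, and tear it down once. The MiniCluster interface below is a hypothetical stand-in for the real mini-cluster handle (e.g. the 0.9-era LocalFlinkMiniCluster); createRemoteEnvironment("localhost", port) is taken from his mail.

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.junit.AfterClass;
    import org.junit.BeforeClass;
    import org.junit.Test;

    public class SharedClusterTest {

        // Hypothetical stand-in for the real mini-cluster handle.
        interface MiniCluster {
            int getJobManagerRpcPort();
            void shutdown();
        }

        private static MiniCluster miniCluster;

        @BeforeClass
        public static void startCluster() {
            // Start the real mini cluster once here and keep it running.
            miniCluster = null; // placeholder: assign the started cluster
        }

        @AfterClass
        public static void stopCluster() {
            miniCluster.shutdown(); // torn down exactly once
        }

        @Test
        public void runJob() throws Exception {
            // Each execution connects to the already-running cluster
            // instead of starting (and stopping) a fresh one.
            ExecutionEnvironment env = ExecutionEnvironment.createRemoteEnvironment(
                    "localhost", miniCluster.getJobManagerRpcPort());
            env.fromElements(1, 2, 3).print();
        }
    }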

Multiple restarts of Local Cluster

2015-09-02 Thread Sachin Goel
Hi all, While using LocalEnvironment, in case the program triggers execution several times, the {{LocalFlinkMiniCluster}} is started as many times. This can consume a lot of time in setting up and tearing down the cluster. Further, this hinders new functionality I'm working on based on
