Hi Gyula,
Thanks for your response. It seems I will use filter and map for now, as that
really makes the intention clear and is not a big performance hit.
Thanks again.
Cheers
On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra wrote:
> Hey Welly,
>
> If you call filter and map one after the other li
Hey Welly,
If you call filter and map one after the other like you mentioned, these
operators will be chained and executed as if they were running in the same
operator.
The only small performance overhead comes from the fact that the output of
the filter will be copied before passing it as input t
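For illustration, a minimal, self-contained sketch against the DataStream API
(the class name and sample records are made up): the two variants below are
functionally equivalent, and with chaining the filter/map variant runs in a
single task just like the flatMap variant.

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class FilterMapVsFlatMap {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<String> lines = env.fromElements("keep:1", "drop:2", "keep:3");

        // Variant 1: filter followed by map. The two operators are chained and
        // run in the same task; the only extra cost is the copy of each record
        // that passes the filter before it is handed to the map.
        DataStream<String> viaFilterMap = lines
                .filter(s -> s.startsWith("keep"))
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String s) {
                        return s.toUpperCase();
                    }
                });

        // Variant 2: the same logic fused into a single flatMap.
        DataStream<String> viaFlatMap = lines
                .flatMap(new FlatMapFunction<String, String>() {
                    @Override
                    public void flatMap(String s, Collector<String> out) {
                        if (s.startsWith("keep")) {
                            out.collect(s.toUpperCase());
                        }
                    }
                });

        viaFilterMap.print();
        viaFlatMap.print();
        env.execute("filter+map vs flatMap");
    }
}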
Hi All,
I would like to filter some items from the event stream. I think there are
two ways of doing this: using the regular pipeline filter(...).map(...), or
using flatMap to do both in the same operator.
Is there any performance improvement if we use flatMap? As that will be done
in one o
When workers spill more than 128 files, I have seen these fully merged into
one or more much larger files. Does the following parameter allow more
files to be stored without requiring the intermediate merge-sort? I have
changed it to 1024 without effect. Also, it appears that the entire set of
smal
OK. As far as I can see, only the method name was changed. It was an
unnecessary modification that caused the incompatibility.
F.
On Wed, Sep 2, 2015 at 8:53 PM, Márton Balassi
wrote:
> Dear Ferenc,
>
> The Kafka consumer implementation was modified from 0.9.0 to 0.9.1,
> please use the new code. [1]
>
>
Dear Ferenc,
The Kafka consumer implementation was modified from 0.9.0 to 0.9.1, please
use the new code. [1]
I suspect that your com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink
depends on the way the Flink code used to look in 0.9.0; if you take a
closer look, Robert changed the function
Hi,
I tried to use the latest 0.9.1 release but I got:
java.lang.NoSuchMethodError:
org.apache.flink.util.NetUtils.ensureCorrectHostnamePort(Ljava/lang/String;)V
at
com.nventdata.kafkaflink.sink.FlinkKafkaTopicWriterSink.<init>(FlinkKafkaTopicWriterSink.java:69)
at
com.nventdata.kafkaflink.sink.FlinkKa
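As an aside, one way to avoid this class of problem is to not depend on
Flink-internal utilities from user code at all. A minimal, made-up sketch of
parsing a "host:port" string directly in the sink's own code (the helper name
and checks are illustrative, not part of any Flink API):

import java.net.InetSocketAddress;

public final class HostPortParser {

    private HostPortParser() {
    }

    // Parses "host:port" without relying on Flink-internal utilities, so the
    // user code does not break when those internals change between releases.
    public static InetSocketAddress parse(String hostnamePort) {
        int colon = hostnamePort.lastIndexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException(
                    "Expected \"host:port\" but got: " + hostnamePort);
        }
        String host = hostnamePort.substring(0, colon);
        int port = Integer.parseInt(hostnamePort.substring(colon + 1));
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return new InetSocketAddress(host, port);
    }
}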
Hi Robert and Jay,
Thanks for your answers. The petstore jobs could indeed be used as a Rosetta
Code for Flink and Spark.
Regarding the memory requirements, that is very good news to me. Just 2 GB
of RAM is certainly a modest amount of memory; you could even use some
single-board computers for that
Hi Kostas,
Thanks a lot for your answer. It's nice to know there are more training
videos on their way, they will be on my watch list. I guess you'll be using
the data Artisans channel for the new videos too.
Greetings,
Juan
2015-09-02 14:30 GMT+02:00 Kostas Tzoumas :
> Hi Juan,
>
> Flink is
Thanks for clarifying. shuffle() is similar to rebalance(); however,
it redistributes randomly and not in a round-robin fashion.
The problem you describe sounds like a bug to me, though. I have included
the dev list. Maybe someone else can step in so we can determine whether
there is a bug or not.
-Matthias
O
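For reference, a minimal sketch of the difference (sample data and parallelism
chosen arbitrarily): rebalance() distributes records round-robin over the
downstream tasks, while shuffle() picks a random downstream task for each
record.

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RebalanceVsShuffle {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Integer> numbers = env.fromElements(1, 2, 3, 4, 5, 6, 7, 8);

        // Round-robin redistribution: each downstream printer gets every n-th record.
        numbers.rebalance().print().setParallelism(4);

        // Random redistribution: each record goes to a randomly picked printer.
        numbers.shuffle().print().setParallelism(4);

        env.execute("rebalance vs shuffle sketch");
    }
}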
Hello,
When I try to run a storm topology with a Kafka Spout on top of Flink, I
get an NPE at:
15:00:32,853 ERROR org.apache.flink.streaming.runtime.tasks.StreamTask
- Error closing stream operators after an exception.
java.lang.NullPointerException
at storm.kafka.KafkaSpout.close(KafkaSp
I'm not sure what you mean by "Crucial cleanup is in shutdown hooks". Could
you elaborate?
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 10:25 PM, Stephan Ewen wrote:
> You can always shut down a cluster manually (via shutdown()) and if the
> JVM simply exi
You can always shut down a cluster manually (via shutdown()), and if the JVM
simply exits, all is well too. Crucial cleanup is done in shutdown hooks.
On Wed, Sep 2, 2015 at 6:22 PM, Till Rohrmann
wrote:
> If I'm not mistaken, then the cluster should be properly terminated when
> it gets garbage
If I'm not mistaken, then the cluster should be properly terminated when it
gets garbage collected, and thus also when the main method exits.
On Wed, Sep 2, 2015 at 6:14 PM, Sachin Goel
wrote:
> If I'm right, all Tests use either the MultipleProgramTestBase or
> JavaProgramTestBase. Those shut dow
Hi,
You are right, but in fact it does not solve my problem, since I have
parallelism 100 everywhere. Each of my 100 sources emits only a few lines (say
14 at most), and only the first 14 downstream nodes will receive data.
The same problem occurs when replacing rebalance() with shuffle().
But I found a workaround: s
If I'm right, all tests use either the MultipleProgramTestBase or
JavaProgramTestBase. Those shut down the cluster explicitly anyway.
I will check whether this is the case.
Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 9:40 PM, Till Rohrmann
Okay. No problem.
Any suggestions for the correct context though? :')
I don't think something like a {{FlinkProgram}} class is a good idea [the user
would need to override a {{program}} method, and we would make sure the
cluster is set up only once and torn down properly only after the user code
finishes
Maybe we can create a single PlanExecutor for the LocalEnvironment which is
used when calling execute. This of course entails that we don’t call stop
on the LocalCluster. For cases where the program exits after calling
execute, this should be fine because all resources will then be released
anyway.
Oh sorry, then I got the wrong context. I somehow thought it was about test
cases because I read `MultipleProgramTestBase` etc. Sorry my bad.
On Wed, Sep 2, 2015 at 6:00 PM, Sachin Goel
wrote:
> I was under the impression that the @AfterClass annotation can only be
> used in test classes.
> Even
I was under the impression that the @AfterClass annotation can only be used
in test classes.
Even so, the idea is that a user program running in the IDE should not be
starting up the cluster several times [my primary concern is the addition
of the persist operator], and we certainly cannot ask the
Hi,
If I understand you correctly, you want to have 100 mappers. Thus, you
need to apply .setParallelism() after .map():
> addSource(myFileSource).rebalance().map(myFileMapper).setParallelism(100)
The order of calls you used sets the DOP for the source to 100 (which
might be ignored, if the
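A minimal, runnable sketch of this placement rule (the sample elements stand
in for the real file-name source, and the mapper body stands in for the heavy
per-file work):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ParallelismPlacement {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-in for a source that emits only a handful of file names.
        DataStream<String> fileNames = env.fromElements("part-00", "part-01", "part-02");

        // setParallelism() configures the operator it directly follows, so to
        // get 100 parallel mappers it has to come after map(), not after the
        // source or the rebalance().
        fileNames
                .rebalance()
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String fileName) {
                        // Placeholder for the heavy per-file work.
                        return "processed " + fileName;
                    }
                })
                .setParallelism(100)
                .print();

        env.execute("parallelism placement sketch");
    }
}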
Why is it not possible to shut down the local cluster? Can’t you shut it
down in the @AfterClass method?
On Wed, Sep 2, 2015 at 4:56 PM, Sachin Goel
wrote:
> Yes. That will work too. However, then it isn't possible to shut down the
> local cluster. [Is it necessary to do so or does it shut dow
Hi,
I have a source that provides only a few items, since it gives file names to the
mappers. The mapper opens the file and processes its records. As the files are
huge, one input line (a filename) results in substantial work for the next stage.
My topology looks like:
addSource(myFileSource).rebalance().setParal
Yes. That will work too. However, then it isn't possible to shut down the
local cluster. [Is it necessary to do so or does it shut down automatically
when the program exits? I'm not entirely sure.]
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
On Wed, Sep 2, 2015 at 7:59 PM, Steph
Totally, it is the first time I am using them and I thought they were the same.
I will try it ASAP.
Thanks
On 2 Sep 2015, at 16:41, Till Rohrmann <trohrm...@apache.org> wrote:
Can it be that you confuse the logback configuration file with the log4j
configuration file
Can it be that you are confusing the logback configuration file with the log4j
configuration file? The log4j.properties file should look like:
log4j.rootLogger=INFO, console
# Log all infos to the console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n
Have a look at some other tests, like the checkpointing tests. They start
one cluster manually and keep it running. They connect against it using the
remote environment ("localhost", miniCluster.getJobManagerRpcPort()).
That works nicely...
On Wed, Sep 2, 2015 at 4:23 PM, Sachin Goel
wrote:
> H
Thanks for answering.
Yes, I did that; in fact it is working for my logger, but I don't know how to
limit the Flink logger using the log4j properties.
If I use this line, as suggested by the docs, I get an error in the log4j
properties file: the level should not be in that position.
Instead, doing som
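In case it helps, a minimal log4j.properties sketch of the usual way to do
this with plain log4j 1.x: keep your own logging at INFO and add a per-package
entry that raises the threshold for Flink's packages only.

log4j.rootLogger=INFO, console

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{HH:mm:ss,SSS} %-5p %-60c %x - %m%n

# Raise the level for Flink's internal loggers only
log4j.logger.org.apache.flink=WARN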
I see.
Manual serialization also implies manual deserialization (on the workers
only), which would give a better exception.
BTW: there is an opportunity to fix two problems with one patch: the
framesize overflow for the input format, and the serialization.
On Wed, Sep 2, 2015 at 4:16 PM, Maximil
Hmmm, interesting, it looks to be working magically now... :) I must have
written some code late at night that magically fixed it and forgot. The
original errors I was getting were Kryo-related.
The objects aren't being serialized on write to anything useful, but that's
an easy fix, I'm sure.
Onward and
Hi all
While using LocalEnvironment, if the program triggers execution several
times, the {{LocalFlinkMiniCluster}} is started just as many times. This
can consume a lot of time in setting up and tearing down the cluster.
Further, this interferes with a new functionality I'm working on based on
persis
OK, but that would not prevent the above error, right? Serializing is
not the issue here.
Nevertheless, it would catch all errors during initial serialization.
Deserializing has its own hazards due to possible classloader issues.
On Wed, Sep 2, 2015 at 4:05 PM, Stephan Ewen wrote:
> Yes, even ser
Hi Michele,
Please supply a log4j.properties file path as a Java VM property like
so: -Dlog4j.configuration=/path/to/log4j.properties
Your IDE should have an option to adjust VM arguments.
Cheers,
Max
On Wed, Sep 2, 2015 at 9:10 AM, Michele Bertoni
wrote:
> Hi everybody, I just found that in v
Yes, even serialize in the constructor. Then the failure (if serialization
does not work) comes immediately.
On Wed, Sep 2, 2015 at 4:02 PM, Maximilian Michels wrote:
> Nice suggestion. So you want to serialize and deserialize the InputFormats
> on the Client to check whether they can be transfe
Nice suggestion. So you want to serialize and deserialize the InputFormats
on the Client to check whether they can be transferred correctly? Merely
serializing is not enough because the above Exception occurs during
deserialization.
On Wed, Sep 2, 2015 at 2:29 PM, Stephan Ewen wrote:
> We should
If a lot of the data is generated locally, this may run into the same issue
Greg had with oversized payloads (dropped by Akka).
On Wed, Sep 2, 2015 at 3:21 PM, Robert Metzger wrote:
> I'm starting a new discussion thread for the bigpetstore-flink integration
> ...
>
>
> I took a closer look into
Hey, thanks!
Those are just seeds, the files aren't large.
The scale out data is the transactions.
The seed data needs to be the same, shipped to ALL nodes, and then
the nodes generate transactions.
On Wed, Sep 2, 2015 at 9:21 AM, Robert Metzger wrote:
> I'm starting a new discussion thread
Okay, I see.
As I said before, I was not able to reproduce the serialization issue
you've reported.
Can you maybe post the exception you are seeing?
On Wed, Sep 2, 2015 at 3:32 PM, jay vyas
wrote:
> Hey, thanks!
>
> Those are just seeds, the files aren't large.
>
> The scale out data is the tra
I'm starting a new discussion thread for the bigpetstore-flink integration
...
I took a closer look into the code you've posted.
It seems to me that you are generating a lot of data locally on the client,
before you actually submit a job to Flink. (Both "customers" and "stores"
are generated loca
@Jay: I've looked into your code, but I was not able to reproduce the
issue.
I'll start a new discussion thread on the user@flink list for the
Flink-BigPetStore discussion. I don't want to take over Juan's
hardware-requirements discussion ;)
On Wed, Sep 2, 2015 at 3:01 PM, Jay Vyas
wrote:
> Jus
Just running the main class is sufficient
> On Sep 2, 2015, at 8:59 AM, Robert Metzger wrote:
>
> Hey jay,
>
> How can I reproduce the error?
>
>> On Wed, Sep 2, 2015 at 2:56 PM, jay vyas wrote:
>> We're also working on a bigpetstore implementation of flink which will help
>> onboard spark/m
Hey jay,
How can I reproduce the error?
On Wed, Sep 2, 2015 at 2:56 PM, jay vyas
wrote:
> We're also working on a bigpetstore implementation of flink which will
> help onboard spark/mapreduce folks.
>
> I have prototypical code here that runs a simple job in memory,
> contributions welcome,
>
>
We're also working on a BigPetStore implementation for Flink which will help
onboard Spark/MapReduce folks.
I have prototypical code here that runs a simple job in memory; contributions
are welcome. Right now there is a serialization error:
https://github.com/bigpetstore/bigpetstore-flink .
On Wed, Se
I have actually run Flink nodes with 50 MB of memory and processed multiple
gigabytes, but that is truly a toy setup for experimentation.
As Robert said, a Mini-Cluster with two local workers (each around 300-400
MB memory) plus a master node (200-300 MB) gives you 1 GB of total needed
memory and
Hi Juan,
I think the recommendations in the Spark guide are quite good, and are
similar to what I would recommend for Flink as well.
Depending on the workloads you are interested in running, you can certainly use
Flink with less than 8 GB per machine. I think you can start Flink
TaskManagers with 500
Hi Juan,
Flink is quite nimble regarding hardware requirements; people have run it on
old-ish laptops as well as on the largest instances available from cloud
providers. I will let others chime in with more details.
I am not aware of something along the lines of a cheatsheet that you
mention. If you actually
We should try to improve the exception here. More people will run into this
issue and the exception should help them understand it well.
How about we do eager serialization into a set of byte arrays? Then the
serializability issue comes immediately when the program is constructed,
rather than late
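A rough sketch of that idea (not the actual Flink change, just an illustration
with made-up names): round-trip the object through Java serialization as soon
as it is handed over, so a non-serializable user function or input format
fails at program construction time.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public final class EagerSerializationCheck {

    private EagerSerializationCheck() {
    }

    // Serializes and immediately deserializes the given object. If either step
    // fails, the user gets the error at program-construction time instead of a
    // cryptic failure during job submission or on the workers.
    @SuppressWarnings("unchecked")
    public static <T extends Serializable> T roundTrip(T object) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(object);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalArgumentException(
                    "Object cannot be serialized for shipping to the cluster: " + object, e);
        }
    }
}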
Answering myself: I have found some nice training material at
http://dataartisans.github.io/flink-training. There are even YouTube videos
for some of the slides:
- http://dataartisans.github.io/flink-training/overview/intro.html
https://www.youtube.com/watch?v=XgC6c4Wiqvs
- http://da
Here's the JIRA issue: https://issues.apache.org/jira/browse/FLINK-2608
On Wed, Sep 2, 2015 at 12:49 PM, Maximilian Michels wrote:
> Hi Andreas,
>
> Thank you for reporting the problem and including the code to reproduce
> the problem. I think there is a problem with the class serialization or
>
Hi list,
I'm new to Flink, and I find this project very interesting. I have
experience with Apache Spark, and from what I've seen so far I find that Flink
provides an API at a similar abstraction level but based on single-record
processing instead of batch processing. I've read on Quora that Flink
exten
Hi Andreas,
Thank you for reporting the problem and including the code to reproduce it.
I think there is a problem with the class serialization or deserialization.
Arrays.asList uses a private ArrayList class (java.util.Arrays$ArrayList)
which is not the one you would normally use
(java.u
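A hedged workaround sketch, assuming the private Arrays$ArrayList class is
indeed the culprit: copy the list into a plain java.util.ArrayList before
handing it to Flink (written against a recent Flink version, where print()
triggers execution).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class BroadcastListWorkaround {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // new ArrayList<>(...) copies the elements out of the private
        // java.util.Arrays$ArrayList into the regular java.util.ArrayList,
        // which serializes and deserializes without surprises.
        List<Integer> numbers = new ArrayList<>(Arrays.asList(1, 2, 3));

        DataSet<Integer> data = env.fromCollection(numbers);
        data.print();
    }
}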
Hi Sachin,
I also noticed that the GitHub integration is not working properly. I'll
ask the Apache Infra team.
On Wed, Sep 2, 2015 at 10:20 AM, Sachin Goel
wrote:
> Hi all
> Is there some issue with travis integration? The last three pull requests
> do not have their build status on Github page
Hi,
I run into a bug when trying to broadcast a list of integers created with
"Arrays.asList(...)".
For example, if you try to run this "wordcount" example, you can
reproduce the bug:
public class WordCountExample {
public static void main(String[] args) throws Exception {
Hi,
StormFileSpout uses a simple FileReader internally and cannot deal with
HDFS. It would be a nice extension to have. I just opened a JIRA for it:
https://issues.apache.org/jira/browse/FLINK-2606
Jerry, feel free to work on this feature and contribute code to Flink ;)
-Matthias
On 09/02/2015 07:52 A
Hi all
Is there some issue with the Travis integration? The last three pull requests
do not have their build status on the GitHub page. The builds are getting
triggered, though.
Regards
Sachin
-- Sachin Goel
Computer Science, IIT Delhi
m. +91-9871457685
Hi everybody, I just found that in version 0.9.1 it is possible to disable
that verbose console output. Can you please explain how to do it, both in the
IDE and in the local environment?
Especially in the IDE, I am able to set the log4j properties for my own logger,
but everything I try for the Flink-internal one does not work