Re: flink-dist shading

2016-11-18 Thread Tzu-Li (Gordon) Tai
Hi Craig, I think the email wasn't sent to the ‘dev’ list, somehow. Have you tried this: mvn clean install -DskipTests # In Maven 3.3 the shading of flink-dist doesn't work properly in one run, so we need to run mvn for flink-dist again. cd flink-dist mvn clean install -DskipTests I agree that

Error while running Yahoo Streaming Benchmarks on a single machine

2016-11-18 Thread Muhammad Haseeb Javed
I am trying to run the Yahoo Streaming Benchmarks on a single machine right now. When I run them for Flink I am getting the following error: org.apache.flink.client.program.ProgramInvocationException: The program execution failed: Job was cancelled. at

Re: flink-dist shading

2016-11-18 Thread Foster, Craig
I’m not even sure this was delivered to the ‘dev’ list but I’ll go ahead and forward the same email to the ‘user’ list since I haven’t seen a response. --- I’m following up on the issue in FLINK-5013 about flink-dist specifically requiring Maven 3.0.5 through to <3.3. This affects people

Flink streaming with 1+ TB of managed state

2016-11-18 Thread Steven Ruppert
Hi, Is anybody currently running flink streaming with north of a terabyte (TB) of managed state? If you are, can you share your experiences wrt hardware, tuning, recovery situations, etc? I'm evaluating flink for a use case I estimate will take around 5TB of state in total, but looking at the

Re: Flink survey by data Artisans

2016-11-18 Thread Shannon Carey
There's a newline that disrupts the URL. http://www.surveygizmo.com/s3/3166399/181bdb611f22 Not: http://www.surveygizmo.com/s3/ 3166399/181bdb611f22

Re: Flink survey by data Artisans

2016-11-18 Thread Vishnu Viswanath
Works for me also. On Fri, Nov 18, 2016 at 12:35 PM, Stephan Ewen wrote: > Just checked it, the link works for me. > > > On Fri, Nov 18, 2016 at 7:20 PM, amir bahmanyari > wrote: > >> [image: Inline image] >> >> >> -- >>

Re: Flink survey by data Artisans

2016-11-18 Thread Stephan Ewen
Just checked it, the link works for me. On Fri, Nov 18, 2016 at 7:20 PM, amir bahmanyari wrote: > [image: Inline image] > > > -- > *From:* Kostas Tzoumas > *To:* "d...@flink.apache.org" ;

Re: Flink survey by data Artisans

2016-11-18 Thread amir bahmanyari
From: Kostas Tzoumas To: "d...@flink.apache.org" ; user@flink.apache.org Sent: Friday, November 18, 2016 7:28 AM Subject: Flink survey by data Artisans Hi everyone! The Apache Flink community has evolved quickly over the past 2+ years,

Re: CodeAnalysisMode in Flink

2016-11-18 Thread Fabian Hueske
Hi Vinay, not sure why it's not working, but maybe TImo (in CC) can help. Best, Fabian 2016-11-18 17:41 GMT+01:00 Vinay Patil : > Hi, > > According to JavaDoc if I use the below method > *env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.HINT);* > > ,it will print

Re: Any way to increase sort buffer size?

2016-11-18 Thread Gábor Hermann
Hi Fabian, Thanks for your answer! I see that it's not a lightweight change. I guess it's easier if I find a workaround for using smaller objects. Cheers, Gabor On 2016-11-18 11:02, Fabian Hueske wrote: Hi Gabor, I don't think there is a way to tune the memory settings for specific

Re: Subtask keeps on discovering new Kinesis shard when using Kinesalite

2016-11-18 Thread Philipp Bussche
Hello Gordon thank you for the patch. I can confirm that discovery looks good now and it does not re discover shards every few seconds. I will do more testing with this now but it looks very good already ! Thanks, Philipp -- View this message in context:

CodeAnalysisMode in Flink

2016-11-18 Thread Vinay Patil
Hi, According to JavaDoc if I use the below method *env.getConfig().setCodeAnalysisMode(CodeAnalysisMode.HINT);* ,it will print the program improvements to log, however these details are not getting printed to log, I have kept the log4j in resources folder and kept the log level to ALL. I want

Re: Regarding time window based on the values received in the stream

2016-11-18 Thread Abdul Salam Shaikh
Hello Mr Hueske, Thank you for reaching out to my query. The example stated in the documentation is the same use case for me where I am trying to build a prototype regarding a traffic metric in Germany as a part of my thesis. The data is received from multiple detectors and there is a field

Re: spark vs flink batch performance

2016-11-18 Thread Gábor Gévay
> "For csv reading, i deliberately did not use csv reader since i want to run > same code across spark and flink." > > If your objective deviates from writing and running the fastest Spark and > fastest Flink programs, then your comparison is worthless. Well, I don't really agree with this. I

Re: spark vs flink batch performance

2016-11-18 Thread Greg Hogan
"For csv reading, i deliberately did not use csv reader since i want to run same code across spark and flink." If your objective deviates from writing and running the fastest Spark and fastest Flink programs, then your comparison is worthless. On Fri, Nov 18, 2016 at 5:37 AM, CPC

Re: Type of TypeVariable 'K' in 'class <> could not be determined

2016-11-18 Thread Vasiliki Kalavri
Thanks Timo for the explanation and Wouter for reporting this. I've located two more instances of this in the Graph class and created FLINK5097. I'll ping you after I open the PR if that's OK. -Vasia. On 18 November 2016 at 12:22, Timo Walther wrote: > Yes. I don't know if

Re: Type of TypeVariable 'K' in 'class <> could not be determined

2016-11-18 Thread Timo Walther
Yes. I don't know if it solve this problem but in general if the input type is known it should be passed for input type inference. Am 18/11/16 um 11:28 schrieb Vasiliki Kalavri: Hi Timo, thanks for looking into this! Are you referring to the 4th argument in [1]? Thanks, -Vasia. [1]:

Re: spark vs flink batch performance

2016-11-18 Thread CPC
Thank you Flavio. I will generate flamegraph for flink and compare them. On 18 November 2016 at 13:43, Flavio Pompermaier wrote: > I think this could be very helpful for your study: > > http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance- >

Re: spark vs flink batch performance

2016-11-18 Thread Flavio Pompermaier
I think this could be very helpful for your study: http://db-blog.web.cern.ch/blog/luca-canali/2016-09-spark-20-performance-improvements-investigated-flame-graphs Best, Flavio On Fri, Nov 18, 2016 at 11:37 AM, CPC wrote: > Hi Gabor, > > Thank you for your kind response. I

Re: spark vs flink batch performance

2016-11-18 Thread CPC
Hi Gabor, Thank you for your kind response. I forget to mention that i have actually three workers. This is why i set default paralelism to 6. For csv reading, i deliberately did not use csv reader since i want to run same code across spark and flink. Collect is returning 40k records which is

Re: Type of TypeVariable 'K' in 'class <> could not be determined

2016-11-18 Thread Vasiliki Kalavri
Hi Timo, thanks for looking into this! Are you referring to the 4th argument in [1]? Thanks, -Vasia. [1]: https://github.com/apache/flink/blob/master/ flink-libraries/flink-gelly/src/main/java/org/apache/ flink/graph/Graph.java#L506 On 18 November 2016 at 10:25, Timo Walther

Re: Any way to increase sort buffer size?

2016-11-18 Thread Fabian Hueske
Hi Gabor, I don't think there is a way to tune the memory settings for specific operators. For that you would need to change the memory allocation in the optimizers, which is possible but not a lightweight change either. If you want to get something working, you could add a method to the API to

Re: spark vs flink batch performance

2016-11-18 Thread Gábor Gévay
Hello, Your program looks mostly fine, but there are a few minor things that might help a bit: Parallelism: In your attached flink-conf.yaml, you have 2 task slots per task manager, and if you have 1 task manager, then your total number of task slots is also 2. However, your default parallelism

Re: Type of TypeVariable 'K' in 'class <> could not be determined

2016-11-18 Thread Timo Walther
I think I identified the problem. Input type inference can not be used because of missing information. @Vasia: Why is the TypeExtractor in Graph. mapVertices(mapper) called without information about the input type? Isn't the input of the MapFunction known at this point? ( vertices.getType())

Re: Why use Kafka after all?

2016-11-18 Thread Fabian Hueske
A common choice is Apache Avro. You can to define a schema for you Pojos and generate serializers and deserializers. 2016-11-18 5:11 GMT+01:00 Matt : > Just to be clear, what I'm looking for is a way to serialize a POJO class > for Kafka but also for Flink, I'm not sure

Re: Regarding time window based on the values received in the stream

2016-11-18 Thread Fabian Hueske
Hi, that does not sound like a time window problem because there is not time-related condition to split the windows. I think you can implement that with a GlobalWindow and a custom trigger. The documentation about global windows, triggers, and evictors [1] and this blogpost [2] might be helpful

Re: Cross product of datastream and dataset

2016-11-18 Thread Fabian Hueske
Hi, it is not possible to mix the DataSet and DataStream APIs at the moment. If the DataSet is constant and not too big (which I assume, since otherwise crossing would be extremely expensive), you can load the data into a stateful MapFunction. For that you can implement a RichFlatMapFunction and

Re: fileInpuFormat of type json

2016-11-18 Thread Fabian Hueske
Hi, there are no built-in methods to read JSON. How this can be done depends on the formatting of the JSON. If you have a collection of JSON objects which are separated by newline (and there are no other newlines) you can read the file with env.readTextFile(). This will give you one String per