Flink packaging makes life hard for SBT fat jars

2016-02-12 Thread shikhar
Repro at https://github.com/shikhar/flink-sbt-fatjar-troubles, run `sbt assembly`. A fat jar seems like the best way to provide jobs for Flink to execute. I am declaring deps like: "org.apache.flink" %% "flink-clients" % "1.0-SNAPSHOT" % "provided" "org.apache.flink" %% "flink-streaming
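A minimal build.sbt sketch of the pattern the message describes: Flink core marked `"provided"` so sbt-assembly excludes it from the fat jar (the cluster supplies it at runtime), plus a typical merge strategy for duplicate files from transitive dependencies. Versions and the exact artifact list are assumptions; check your Flink release.

```scala
// build.sbt sketch (assumed versions; the cluster provides Flink itself)
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-clients"         % "1.0-SNAPSHOT" % "provided",
  "org.apache.flink" %% "flink-streaming-scala" % "1.0-SNAPSHOT" % "provided"
)

// common sbt-assembly merge strategy for clashing META-INF entries
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case _                             => MergeStrategy.first
}
```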

Re: streaming using DeserializationSchema

2016-02-12 Thread Nick Dimiduk
My input file contains newline-delimited JSON records, one per text line. The records on the Kafka topic are JSON blobs encoded as UTF-8 and written as bytes. On Fri, Feb 12, 2016 at 1:41 PM, Martin Neumann wrote: > I'm trying the same thing now. > > I guess you need to read the file as byte arra

Re: streaming using DeserializationSchema

2016-02-12 Thread Martin Neumann
I'm trying the same thing now. I guess you need to read the file as byte arrays somehow to make it work. What read function did you use? The mapper is not hard to write but the byte array stuff gives me a headache. cheers Martin On Fri, Feb 12, 2016 at 9:12 PM, Nick Dimiduk wrote: > Hi Mart

Re: streaming using DeserializationSchema

2016-02-12 Thread Nick Dimiduk
Hi Martin, I have the same use case. I wanted to be able to load from dumps of data in the same format as is on the Kafka queue. I created a new application main; call it the "job" instead of the "flow". I refactored my code a bit for building the flow so all of that can be reused via factory methods.
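A self-contained sketch of the idea discussed in this thread: re-encode each text line as the UTF-8 bytes it would carry on the topic, then run it through the same byte-array deserializer used for the Kafka stream. `RecordDeserializer` here is a hypothetical stand-in for Flink's `DeserializationSchema`; all names are assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class FileAsKafkaSource {
    // Stand-in for Flink's DeserializationSchema#deserialize(byte[])
    interface RecordDeserializer<T> {
        T deserialize(byte[] message);
    }

    // Encode each line exactly as it would appear on the topic (UTF-8 bytes),
    // then reuse the Kafka-side deserializer unchanged.
    static <T> List<T> readLines(List<String> lines, RecordDeserializer<T> schema) {
        return lines.stream()
                .map(l -> schema.deserialize(l.getBytes(StandardCharsets.UTF_8)))
                .collect(Collectors.toList());
    }
}
```

In a real job the same schema instance would back both the file-based "job" main and the Kafka-based "flow" main, so the conversion code is written once.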

writeAsCSV with partitionBy

2016-02-12 Thread Srikanth
Hello, Is there a Hive (or Spark DataFrame) partitionBy equivalent in Flink? I'm looking to save output as CSV files partitioned by two columns (date and hour). The partitionBy DataSet API is more about partitioning the data on a column for further processing. I'm thinking there is no direct
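A plain-Java sketch of what a Hive-style output partitioning would do: bucket rows by (date, hour) and derive a `date=…/hour=…` path per bucket. The record layout (date in column 0, hour in column 1) is an assumption; in Flink one would key the dataset on these columns and give each group its own output path, since `writeAsCsv` itself has no partitionBy.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class PartitionedCsvSink {
    // Hive-style partition path, e.g. out/date=2016-02-12/hour=05
    static String partitionPath(String base, String date, int hour) {
        return String.format("%s/date=%s/hour=%02d", base, date, hour);
    }

    // Group CSV rows by their target partition path.
    static Map<String, List<String>> bucket(List<String[]> rows, String base) {
        return rows.stream().collect(Collectors.groupingBy(
                r -> partitionPath(base, r[0], Integer.parseInt(r[1])),
                Collectors.mapping(r -> String.join(",", r), Collectors.toList())));
    }
}
```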

Re: Master Thesis [Apache-flink paper references]

2016-02-12 Thread subash basnet
Hello Matthias, Thank you very much :) Best Regards, Subash Basnet On Fri, Feb 12, 2016 at 8:22 PM, Matthias J. Sax wrote: > You might want to check out the Stratosphere project web site: > http://stratosphere.eu/project/publications/ > > -Matthias > > On 02/12/2016 05:52 PM, subash basnet wro

Re: Master Thesis [Apache-flink paper references]

2016-02-12 Thread Matthias J. Sax
You might want to check out the Stratosphere project web site: http://stratosphere.eu/project/publications/ -Matthias On 02/12/2016 05:52 PM, subash basnet wrote: > Hello all, > > I am currently doing master's thesis on Apache-flink. It would be really > helpful to know about the reference paper

Master Thesis [Apache-flink paper references]

2016-02-12 Thread subash basnet
Hello all, I am currently doing my master's thesis on Apache Flink. It would be really helpful to know about the reference papers followed for the development/background of Flink. It would help me build solid background knowledge to analyze Flink. Currently I am reading all the related materials f

Re: How to convert List to flink DataSet

2016-02-12 Thread subash basnet
Hello Fabian, Thank you for the response, but I have been stuck on how to iterate over the DataSet, perform operations, and return a new modified DataSet, similar to a list operation, as shown below. E.g., currently I am doing the following: for (Centroid centroid : centroids.collect()) { for
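The usual fix for a collect()-and-loop pattern like the one quoted is to express the per-element work as a map function so it stays inside the DataSet and runs distributed. Below is a self-contained sketch of the inner logic (nearest-centroid lookup) as a pure function one could wrap in a Flink `MapFunction`; the `Point`/centroid shape is an assumption.

```java
import java.util.List;

public class NearestCentroid {
    record Point(double x, double y) {}

    // Pure per-element logic: index of the closest centroid by squared distance.
    // In Flink this body would live in a MapFunction, with the centroid list
    // supplied via a broadcast set rather than collect().
    static int nearest(Point p, List<Point> centroids) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < centroids.size(); i++) {
            Point c = centroids.get(i);
            double d = (p.x() - c.x()) * (p.x() - c.x())
                     + (p.y() - c.y()) * (p.y() - c.y());
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return best;
    }
}
```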

Re: Merge or minus Dataset API missing

2016-02-12 Thread Flavio Pompermaier
Ah ok, I didn't know about it! Thanks Till and Fabian! On Fri, Feb 12, 2016 at 5:11 PM, Fabian Hueske wrote: > Hi Flavio, > > If I got it right, you can use a FullOuterJoin. > It will give you both elements on a match and otherwise a left or a right > element and null. > > Best, Fabian > > 2016-

Re: Merge or minus Dataset API missing

2016-02-12 Thread Fabian Hueske
Hi Flavio, If I got it right, you can use a FullOuterJoin. It will give you both elements on a match and otherwise a left or a right element and null. Best, Fabian 2016-02-12 16:48 GMT+01:00 Flavio Pompermaier : > Hi to all, > > I have a use case where I have to merge 2 datasets but I can't fin

Re: Merge or minus Dataset API missing

2016-02-12 Thread Till Rohrmann
Why don’t you simply use a fullOuterJoin to do that? Cheers, Till On Fri, Feb 12, 2016 at 4:48 PM, Flavio Pompermaier wrote: > Hi to all, > > I have a use case where I have to merge 2 datasets but I can't find a > direct dataset API to do that. > I want to execute some function when there's a

Merge or minus Dataset API missing

2016-02-12 Thread Flavio Pompermaier
Hi to all, I have a use case where I have to merge 2 datasets but I can't find a direct DataSet API to do that. I want to execute some function when there's a match and otherwise pass on the non-null element. At the moment I can do this in a fairly complicated way (I want to avoid broadcasting becaus
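A self-contained sketch of the merge semantics the thread settles on, mirroring what a keyed full outer join gives you: when a key exists on both sides apply a function, otherwise keep whichever side is present. Plain maps stand in for the two DataSets; in Flink this would be `left.fullOuterJoin(right).where(key).equalTo(key)` with the same logic in the join function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BinaryOperator;

public class MergeDatasets {
    // On a key match apply f; otherwise the single present value survives,
    // which is exactly the match/else-non-null behavior of a full outer join.
    static <K, V> Map<K, V> merge(Map<K, V> left, Map<K, V> right, BinaryOperator<V> f) {
        Map<K, V> out = new HashMap<>(left);
        right.forEach((k, v) -> out.merge(k, v, f));
        return out;
    }
}
```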

Re: Compile issues with Flink 1.0-SNAPSHOT and Scala 2.11

2016-02-12 Thread Cory Monty
Thanks, Stephan. Everything is back to normal for us. Cheers, Cory On Fri, Feb 12, 2016 at 6:54 AM, Stephan Ewen wrote: > Hi Cory! > > We found the problem. There is a development fork of Flink for Stream SQL, > whose CI infrastructure accidentally also deployed snapshots and overwrote > some

Re: Compile issues with Flink 1.0-SNAPSHOT and Scala 2.11

2016-02-12 Thread Stephan Ewen
Hi Cory! We found the problem. There is a development fork of Flink for Stream SQL, whose CI infrastructure accidentally also deployed snapshots and overwrote some of the proper master branch snapshots. That's why the snapshots got inconsistent. We fixed that, and newer snapshots should be online

Re: consume kafka stream with flink

2016-02-12 Thread Robert Metzger
Hi Tanguy, I would recommend to refer to the documentation of the specific Flink version you are using. This is the documentation for 1.0-SNAPSHOT: https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/connectors/kafka.html and this is the doc for 0.10.x: https://ci.apache.org/pro

consume kafka stream with flink

2016-02-12 Thread Tanguy Racinet
Hello, I am currently trying to develop an algorithm for mining frequent itemsets over a data stream. I am using Kafka to generate the stream; however, I cannot manage to link Flink to Kafka. The code presented here works, but only with Flink version 0.9.1 https://github.com/dataArtisans/kafka
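Since the problem here is version mismatch, the usual first check is that the Kafka connector artifact matches the Flink version in the build. A build.sbt dependency sketch (artifact names and versions are assumptions; the connector artifact was renamed between Flink releases, so confirm against the docs for your version):

```scala
// build.sbt sketch: connector version must match the Flink core version
libraryDependencies ++= Seq(
  "org.apache.flink" %% "flink-streaming-scala"     % "1.0-SNAPSHOT",
  "org.apache.flink" %% "flink-connector-kafka-0.8" % "1.0-SNAPSHOT"
)
```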

Re: streaming using DeserializationSchema

2016-02-12 Thread Martin Neumann
It's not only about testing; I will also need to run things against different datasets. I want to reuse as much of the code as possible to load the same data from a file instead of Kafka. Is there a simple way of loading the data from a file using the same conversion classes that I would use to tra