Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels
Hi Mihail, Thanks for the code. I'm trying to reproduce the problem now. On Wed, Jul 1, 2015 at 8:30 PM, Mihail Vieru vi...@informatik.hu-berlin.de wrote: Hi Max, thank you for your reply. I wanted to rule out all other factors before writing back. I've attached my code and …

RE: How to cancel a Flink DataSource from the driver code?

2015-07-02 Thread LINZ, Arnaud
Hi Stephan, I think that clean shutdown is a major feature for building a complex, persistent service that uses Flink Streaming for a data-quality-critical task, and I'll mark my code with a // FIXME comment while waiting for this feature to become available! Greetings, Arnaud From: ewenstep...@gmail.com …
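For context, the usual cooperative-cancellation pattern for a streaming source looks like the sketch below. This is a minimal illustration, not Arnaud's actual code, and the exact SourceFunction signature varies across Flink versions; the missing piece the thread is about is triggering cancel() cleanly from the driver.

    import org.apache.flink.streaming.api.functions.source.SourceFunction;

    // A source that stops cleanly once Flink calls cancel().
    public class CancellableSource implements SourceFunction<Long> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<Long> ctx) throws Exception {
            long next = 0;
            while (running) {
                ctx.collect(next++); // emit until cancelled
            }
        }

        @Override
        public void cancel() {
            running = false; // flips the flag; run() then exits its loop
        }
    }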

Re: Data Source from Cassandra

2015-07-02 Thread Stephan Ewen
Hi! If there is a CassandraSource for Hadoop, you can also use that with the HadoopInputFormat wrapper. If you want to implement a Flink-specific source, extending InputFormat is the right thing to do. A user has started to implement a Cassandra sink in this fork (may be able to reuse some code …)
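A rough sketch of the first option, reading Cassandra through Flink's Hadoop wrapper, might look like this. CqlInputFormat and its Long/Row key-value types are assumptions based on Cassandra's Hadoop integration, and the wrapper's package moved between Flink versions, so check the classes shipped with your versions.

    import org.apache.cassandra.hadoop.cql3.CqlInputFormat; // assumed class
    import com.datastax.driver.core.Row;                    // assumed value type
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.hadoop.mapreduce.HadoopInputFormat;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.hadoop.mapreduce.Job;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    Job job = Job.getInstance();
    // Cassandra connection details (hosts, keyspace, table) go into
    // job.getConfiguration(), e.g. via Cassandra's ConfigHelper classes.
    HadoopInputFormat<Long, Row> input =
        new HadoopInputFormat<>(new CqlInputFormat(), Long.class, Row.class, job);
    DataSet<Tuple2<Long, Row>> rows = env.createInput(input);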

Re: Data Source from Cassandra

2015-07-02 Thread Welly Tambunan
Hi Stephan, Thanks a lot! I'll take a look. Cheers On Thu, Jul 2, 2015 at 6:05 PM, Stephan Ewen se...@apache.org wrote: Hi! If there is a CassandraSource for Hadoop, you can also use that with the HadoopInputFormat wrapper. If you want to implement a Flink-specific source, …

Re: Flink 0.9 built with Scala 2.11

2015-07-02 Thread Chiwan Park
@Alexander I'm happy to hear that you want to help; I'd really appreciate it. :) Regards, Chiwan Park On Jul 2, 2015, at 2:57 PM, Alexander Alexandrov alexander.s.alexand...@gmail.com wrote: @Chiwan: let me know if you need hands-on support. I'll be more than happy to …

Re: writeAsCsv not writing anything on HDFS when WriteMode set to OVERWRITE

2015-07-02 Thread Maximilian Michels
The problem is that your input and output path are the same. Because Flink executes in a pipelined fashion, all the operators come up at once. When you set WriteMode.OVERWRITE for the sink, it deletes the path before writing anything. That means that by the time your DataSource reads the input path, the data has already been deleted.
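In code, the fix is simply to write to a distinct path (the HDFS paths below are made up for illustration):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.core.fs.FileSystem.WriteMode;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> data = env
        .readCsvFile("hdfs:///data/records")          // source path
        .types(String.class, Integer.class);
    // Writing back to hdfs:///data/records with OVERWRITE would delete the
    // input before the pipelined source reads it. Use a different path:
    data.writeAsCsv("hdfs:///data/records-out", WriteMode.OVERWRITE);
    env.execute();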

Re: Batch Processing as Streaming

2015-07-02 Thread Welly Tambunan
Thanks Stephan, that's clear! Cheers On Thu, Jul 2, 2015 at 6:13 PM, Stephan Ewen se...@apache.org wrote: Hi! I am actually working on getting more docs out there; there is a lack right now, agreed. Concerning your questions: (1) Batch programs basically recover from the data sources …

POJO coGroup on null value

2015-07-02 Thread Flavio Pompermaier
Hi to all, I'd like to join two datasets of POJOs, say for example: Person: - name - birthPlaceId; Place: - id - name. I'd like to do people.coGroup(places).where("birthPlaceId").equalTo("id").with(...). However, not all people have a birthPlaceId value in my use case, so I get a …

Re: POJO coGroup on null value

2015-07-02 Thread Stephan Ewen
Hi Flavio! Keys cannot be null in Flink; that is a contract deep in the system. Filter out the null-valued elements, or, if you want them in the result, I would try to use a special value for null. That should do it. BTW: in SQL, joining on null usually filters out elements, as key operations …
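Applied to Flavio's example, both suggestions look roughly like this. Person and Place are his POJOs; the sentinel string is an assumption, so pick a value that can never collide with a real Place id.

    import org.apache.flink.api.common.functions.FilterFunction;
    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;

    // Option 1: drop people without a birth place before the coGroup.
    DataSet<Person> withPlace = people.filter(new FilterFunction<Person>() {
        @Override
        public boolean filter(Person p) {
            return p.birthPlaceId != null;
        }
    });

    // Option 2: keep them by substituting a sentinel key.
    DataSet<Person> patched = people.map(new MapFunction<Person, Person>() {
        @Override
        public Person map(Person p) {
            if (p.birthPlaceId == null) {
                p.birthPlaceId = "__NO_BIRTH_PLACE__"; // sentinel, never a real id
            }
            return p;
        }
    });

    // The coGroup then proceeds as in the original question:
    // patched.coGroup(places).where("birthPlaceId").equalTo("id").with(...);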

Re: POJO coGroup on null value

2015-07-02 Thread Flavio Pompermaier
OK, thanks for the help Stephan! On 2 Jul 2015 20:05, Stephan Ewen se...@apache.org wrote: Hi Flavio! Keys cannot be null in Flink; that is a contract deep in the system. Filter out the null-valued elements, or, if you want them in the result, I would try to use a special value for null. …

TeraSort on Flink and Spark

2015-07-02 Thread Dongwon Kim
Hello, I'd like to share my code for TeraSort on Flink and Spark, which uses the same range partitioner as Hadoop TeraSort: https://github.com/eastcirclek/terasort I also wrote a short report on it: http://eastcirclek.blogspot.kr/2015/06/terasort-for-spark-and-flink-with-range.html In the blog …
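The core idea, partitioning by key range so that partition order implies global order and then sorting each partition locally, maps onto Flink's DataSet API roughly as below. This is a simplified sketch, not Dongwon's code: his partitioner follows Hadoop TeraSort's 10-byte key splits, while this one naively splits on the first byte of the key.

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.common.operators.Order;

    public class FirstByteRangePartitioner implements Partitioner<String> {
        @Override
        public int partition(String key, int numPartitions) {
            int b = key.isEmpty() ? 0 : (key.charAt(0) & 0xFF);
            return b * numPartitions / 256; // contiguous key range per partition
        }
    }

    // keyed is a DataSet<Tuple2<String, String>> of (key, record) pairs.
    keyed.partitionCustom(new FirstByteRangePartitioner(), 0)
         .sortPartition(0, Order.ASCENDING)   // local sort per partition
         .writeAsCsv("hdfs:///terasort/output");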