Re: Reading a CSV file with null values

2015-10-26 Thread Maximilian Michels
As far as I know, null support was removed from the Table API because it was not consistently supported across all operations. See https://issues.apache.org/jira/browse/FLINK-2236 On Fri, Oct 23, 2015 at 7:18 PM, Shiti Saxena wrote: > For a similar problem where we
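A common workaround, hinted at in this thread, is to bypass readCsvFile and parse the lines by hand, mapping empty fields to null on a POJO (Flink's POJO serializer tolerates null fields, unlike the tuple serializer). A minimal sketch, assuming a two-column file; the path, class, and field names are made up:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class NullSafeCsvRead {

        // POJO with nullable fields; the POJO serializer handles nulls,
        // unlike the tuple serializer.
        public static class Record {
            public String name; // may be null
            public String city; // may be null
            public Record() {}
        }

        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<Record> records = env
                .readTextFile("hdfs:///path/to/input.csv") // hypothetical path
                .map(new MapFunction<String, Record>() {
                    @Override
                    public Record map(String line) {
                        // split with limit -1 so trailing empty fields are kept
                        String[] fields = line.split(",", -1);
                        Record r = new Record();
                        r.name = fields[0].isEmpty() ? null : fields[0];
                        r.city = fields[1].isEmpty() ? null : fields[1];
                        return r;
                    }
                });

            records.print();
        }
    }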

Re: Is 'DataStream.writeAsCsv' supposed to work like this?

2015-10-26 Thread Márton Balassi
Hey Max, The solution I am proposing is not flushing on every record, but making sure to forward the flushing from the sink function to the output format whenever it is triggered. Practically this means that the buffering is done (almost) solely in the sink and no longer in the output format.
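A rough sketch of what such a forwarding sink could look like; this is not Márton's actual patch, and the class name and the Flushable check are invented for illustration:

    import java.io.Flushable;
    import org.apache.flink.api.common.io.OutputFormat;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

    // Records are handed straight to the wrapped format; a flush is only
    // forwarded when one is actually triggered (e.g. on close), instead
    // of flushing on every record.
    public class FlushForwardingSink<IN> extends RichSinkFunction<IN> {

        private final OutputFormat<IN> format;

        public FlushForwardingSink(OutputFormat<IN> format) {
            this.format = format;
        }

        @Override
        public void open(Configuration parameters) throws Exception {
            format.configure(parameters);
            format.open(getRuntimeContext().getIndexOfThisSubtask(),
                        getRuntimeContext().getNumberOfParallelSubtasks());
        }

        @Override
        public void invoke(IN value) throws Exception {
            format.writeRecord(value); // buffered by the format, no flush here
        }

        // Forward a triggered flush to the output format, if it supports one.
        public void flush() throws Exception {
            if (format instanceof Flushable) {
                ((Flushable) format).flush();
            }
        }

        @Override
        public void close() throws Exception {
            flush();
            format.close();
        }
    }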

Re: Specially introduced Flink to chinese users in CNCC(China National Computer Congress)

2015-10-26 Thread Maximilian Michels
Hi Liang, We greatly appreciate that you introduced Flink to the Chinese users at CNCC! We would love to hear how people like Flink. Please keep us up to date and point the users to the mailing list or Stack Overflow if they have any difficulties. Best regards, Max On Sat, Oct 24, 2015 at 5:48 PM,

Re: Reading a CSV file with null values

2015-10-26 Thread Philip Lee
Thanks for your reply. What if I do not use the Table API? The error happens when using just env.readCsvFile(). I heard that using RowSerializer would handle the null values, but a TypeInformation error occurs when the data is converted On Mon, Oct 26, 2015 at 10:26 AM, Maximilian Michels
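If the TypeInformation error stems from Flink failing to infer the result type of a conversion, one general escape hatch is to declare the type explicitly on the operator with returns(...). A minimal sketch of that mechanism, not specific to RowSerializer; the path and the map function are placeholders:

    import org.apache.flink.api.common.functions.MapFunction;
    import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class ExplicitTypeInfo {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            DataSet<String> trimmed = env
                .readTextFile("hdfs:///path/to/input.csv") // hypothetical path
                .map(new MapFunction<String, String>() {
                    @Override
                    public String map(String line) {
                        return line.trim();
                    }
                })
                // Hand Flink the result type explicitly instead of
                // relying on type extraction.
                .returns(BasicTypeInfo.STRING_TYPE_INFO);

            trimmed.print();
        }
    }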

Re: Reading null values from datasets

2015-10-26 Thread Maximilian Michels
As far as I know, null support was removed from the Table API because it was not consistently supported across all operations. See https://issues.apache.org/jira/browse/FLINK-2236 On Fri, Oct 23, 2015 at 7:20 PM, Shiti Saxena wrote: > For a similar problem where we

Re: Is 'DataStream.writeAsCsv' supposed to work like this?

2015-10-26 Thread Márton Balassi
Hey Rex, Writing half-baked records is definitely unwanted, thanks for spotting this. Most likely it can be solved by adding a flush at the end of every invoke call; let me check. Best, Marton On Mon, Oct 26, 2015 at 7:56 AM, Rex Ge wrote: > Hi, flinkers! > > I'm new to
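Schematically, and reusing the hypothetical FlushForwardingSink sketched earlier in this digest, flushing at the end of every invoke call would look like this:

    @Override
    public void invoke(IN value) throws Exception {
        format.writeRecord(value);
        flush(); // flush after every record so nothing half-written lingers
    }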

Is 'DataStream.writeAsCsv' supposed to work like this?

2015-10-26 Thread Rex Ge
Hi, flinkers! I'm new to this whole thing, and it seems to me that 'org.apache.flink.streaming.api.datastream.DataStream.writeAsCsv(String, WriteMode, long)' does not work properly. To be specific, data are not flushed at the update frequency when writing to HDFS. What makes it more disturbing is
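For context, the call under discussion looks roughly like this in the Flink version of that era; the source, output path, and 1000 ms interval are placeholders:

    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.core.fs.FileSystem.WriteMode;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class WriteAsCsvExample {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

            DataStream<Tuple2<String, Integer>> stream = env.fromElements(
                new Tuple2<>("a", 1),
                new Tuple2<>("b", 2));

            // Expectation described in the report above: buffered records
            // become visible in the target file roughly every 1000 ms.
            // The reported bug is that this flushing did not happen
            // when writing to HDFS.
            stream.writeAsCsv("hdfs:///tmp/out.csv", WriteMode.OVERWRITE, 1000L);

            env.execute("writeAsCsv update-frequency example");
        }
    }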

Re: Is 'DataStream.writeAsCsv' supposed to work like this?

2015-10-26 Thread Márton Balassi
The problem persists in the current master; all that is missing is a format.flush() here [1]. I'll do a quick hotfix, thanks for the report again! [1] https://github.com/apache/flink/blob/master/flink-streaming-java/src/main/java/org/apache/flink/streaming/api/functions/sink/FileSinkFunction.java#L99

Wrong owner of HDFS output folder

2015-10-26 Thread Flavio Pompermaier
Hi to all, when I run my job on my Hadoop cluster (both from the command line and from the web app), the output of my job (on HDFS) works fine as long as I set the write parallelism to 1 (the output file is owned by the user running the job). If I leave the default parallelism (>1), the job fails because it
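For reference, pinning only the sink to parallelism 1, rather than the whole job, would look like this in the DataSet API (output path and data are placeholders):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;

    public class SingleWriterJob {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            DataSet<String> data = env.fromElements("a", "b", "c");

            // With parallelism 1 the sink writes a single file owned by the
            // submitting user; with parallelism > 1 the output directory is
            // created by the JobManager, which is where the ownership
            // problem described in this thread comes from.
            data.writeAsText("hdfs:///tmp/out").setParallelism(1);

            env.execute("single-writer job");
        }
    }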

Re: Wrong owner of HDFS output folder

2015-10-26 Thread Maximilian Michels
The problem is that non-root processes may not be able to read root-owned files/folders. Therefore, we cannot really check as a non-root user whether root-owned clusters have been started. It's better not to run Flink with root permissions. You're welcome. Cheers, Max On Mon, Oct 26, 2015 at

Re: Wrong owner of HDFS output folder

2015-10-26 Thread Maximilian Michels
Hi Flavio, Are you running your Flink cluster with root permissions? The directory that holds the output splits is created by the JobManager, so if you run the JobManager with root permissions, it will create a folder owned by root. If the task managers are not run with root permissions, this could

FastR-Flink: a new open source Truffle project

2015-10-26 Thread Juan Fumero
Hi everyone, we have just published a new open source Truffle project, FastR-Flink. It is available at https://bitbucket.org/allr/fastr-flink FastR is an implementation of the R language on top of Truffle and Graal [3], developed by Purdue University, Johannes Kepler University and Oracle Labs

Re: Error running an hadoop job from web interface

2015-10-26 Thread Flavio Pompermaier
Now that I've recompiled Flink and restarted the web client, everything works fine. However, when I select the job I want to run, I see parallelism 1 in the right panel, but when I click the "Run Job" button with "show optimizer plan" checked, I see parallelism 36. Is that a bug in the first preview? On

Re: Error running an hadoop job from web interface

2015-10-26 Thread Flavio Pompermaier
No, I just use the default parallelism On Mon, Oct 26, 2015 at 3:05 PM, Maximilian Michels wrote: > Did you set the default parallelism of the cluster to 36? This is because > the plan gets optimized against the cluster configuration when you try to > run the uploaded program.
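For what it's worth, the cluster-wide default Max refers to comes from flink-conf.yaml (the parallelism.default entry), and a program can override it locally; a minimal sketch:

    import org.apache.flink.api.java.ExecutionEnvironment;

    public class ParallelismOverride {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
            // Override the cluster-wide default (parallelism.default in
            // flink-conf.yaml) for this program only.
            env.setParallelism(1);
            env.fromElements(1, 2, 3).print();
        }
    }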

Re: Error running an hadoop job from web interface

2015-10-26 Thread Maximilian Michels
That's odd. Does it also execute with parallelism 36 then? On Mon, Oct 26, 2015 at 3:06 PM, Flavio Pompermaier wrote: > No, I just use the default parallelism > > On Mon, Oct 26, 2015 at 3:05 PM, Maximilian Michels > wrote: > >> Did you set the default