Re: Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API

2015-10-19 Thread Maximilian Michels
Hi Philip, Thank you for your questions. I think you have mapped the HIVE functions to the Flink ones correctly. Just a remark on the ORDER BY. You wrote that it produces a total order of all the records. In this case, you'd have to do a SortPartition operation with parallelism set to 1. This is
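
A minimal sketch of what Max describes, in the Java DataSet API (the Tuple2 data set and field indexes are placeholders, not from the original thread):

    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    DataSet<Tuple2<String, Integer>> data = env.fromElements(
            new Tuple2<>("b", 2), new Tuple2<>("a", 1), new Tuple2<>("c", 3));
    // Parallelism 1 means a single partition, so the partition-local
    // sort yields a total order, like Hive's ORDER BY.
    data.sortPartition(0, Order.ASCENDING)
        .setParallelism(1)
        .print();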

Re: Powered by Flink

2015-10-19 Thread Fabian Hueske
Thanks for starting this Kostas. I think the list is quite hidden in the wiki. Should we link from flink.apache.org to that page? Cheers, Fabian 2015-10-19 14:50 GMT+02:00 Kostas Tzoumas : > Hi everyone, > > I started a "Powered by Flink" wiki page, listing some of the >

Re: Powered by Flink

2015-10-19 Thread Márton Balassi
Thanks for starting and big +1 for making it more prominent. On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote: > Thanks for starting this Kostas. > > I think the list is quite hidden in the wiki. Should we link from > flink.apache.org to that page? > > Cheers, Fabian > >

Re: Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API

2015-10-19 Thread Maximilian Michels
Hi Philip, You're welcome. Just a small correction: Hive's SORT BY should be DataSet.groupBy(key).sortGroup(key) in Flink. This ensures the records within each group are sorted in the reducer that follows. No need to set the parallelism to 1. Best, Max On Mon, Oct 19, 2015 at 1:28 PM, Philip Lee
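
A sketch of Max's correction, assuming the same placeholder DataSet<Tuple2<String, Integer>> named data as above:

    import org.apache.flink.api.common.functions.GroupReduceFunction;
    import org.apache.flink.api.common.operators.Order;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    // Hive's SORT BY: records are sorted within each group/reducer,
    // not globally, so parallelism can stay above 1.
    data.groupBy(0)
        .sortGroup(1, Order.ASCENDING)
        .reduceGroup(new GroupReduceFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {
            @Override
            public void reduce(Iterable<Tuple2<String, Integer>> values,
                               Collector<Tuple2<String, Integer>> out) {
                for (Tuple2<String, Integer> v : values) {
                    out.collect(v); // values of one key arrive sorted by field 1
                }
            }
        });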

Re: Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API

2015-10-19 Thread Philip Lee
Thanks, Fabian. I just want to check one thing again. As you said, [Distribute By] is partitionByHash(), and [Sort By] should be sortGroup() on Flink. However, [Cluster By] consists of partitionByHash() and *sortPartition()*. As far as I know, [Cluster By] is the same as the combination with
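
The combination Philip describes, as a hedged two-liner on the placeholder data set (field index 0 standing in for the key):

    // CLUSTER BY key ~ DISTRIBUTE BY key followed by SORT BY key:
    // hash-partition on the key, then sort every partition by it.
    data.partitionByHash(0)
        .sortPartition(0, Order.ASCENDING);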

Powered by Flink

2015-10-19 Thread Kostas Tzoumas
Hi everyone, I started a "Powered by Flink" wiki page, listing some of the organizations that are using Flink: https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink If you would like to be added to the list, just send me a short email with your organization's name and a description

Re: Powered by Flink

2015-10-19 Thread Fabian Hueske
Sounds good +1 2015-10-19 14:57 GMT+02:00 Márton Balassi : > Thanks for starting and big +1 for making it more prominent. > > On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote: > >> Thanks for starting this Kostas. >> >> I think the list is quite

Re: Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API

2015-10-19 Thread Fabian Hueske
Hi Philip, here are a few additions to what Max said: - ORDER BY: As Max said, Flink's sortPartition() only sorts within a partition and does not produce a total order. You can either set the parallelism to 1 as Max suggested or use a custom partitioner to range partition the data. - SORT BY: From
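
A sketch of Fabian's second option, range partitioning with a custom Partitioner before the partition-local sort (the integer key range [0, 100) is an assumed example, not from the thread):

    import org.apache.flink.api.common.functions.Partitioner;
    import org.apache.flink.api.common.operators.Order;

    // Partition i then holds smaller keys than partition i+1, giving a
    // total order across partitions without dropping to parallelism 1.
    data.partitionCustom(new Partitioner<Integer>() {
            @Override
            public int partition(Integer key, int numPartitions) {
                // naive range partitioner, assuming keys fall in [0, 100)
                return Math.min(key * numPartitions / 100, numPartitions - 1);
            }
        }, 1)                               // partition on field 1
        .sortPartition(1, Order.ASCENDING); // sort within each range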

Re: Hi, Flink people, a question about translation from HIVE Query to Flink function by using Table API

2015-10-19 Thread Fabian Hueske
The difference between a partition and a group is the following: - A partition refers to all records that are processed by a task instance or sub task. If you use hash partitioning, all elements that share the same key will be in one partition, but usually there will be more than one key in a
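
Fabian's distinction, illustrated on the placeholder data set from the sketches above:

    import org.apache.flink.api.common.functions.MapPartitionFunction;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;

    // A partition is physical: one mapPartition() call sees everything its
    // parallel task received -- usually records of many different keys.
    data.partitionByHash(0)
        .mapPartition(new MapPartitionFunction<Tuple2<String, Integer>, Integer>() {
            @Override
            public void mapPartition(Iterable<Tuple2<String, Integer>> partition,
                                     Collector<Integer> out) {
                int count = 0;
                for (Tuple2<String, Integer> ignored : partition) {
                    count++;
                }
                out.collect(count); // record count per partition, across all keys
            }
        });
    // A group is logical: data.groupBy(0).reduceGroup(...) invokes the reduce
    // function once per distinct key, as in the SORT BY sketch above.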

Re: Powered by Flink

2015-10-19 Thread Kostas Tzoumas
yes, definitely. How about a link under the Community drop-down that points to the wiki page? On Mon, Oct 19, 2015 at 2:53 PM, Fabian Hueske wrote: > Thanks for starting this Kostas. > > I think the list is quite hidden in the wiki. Should we link from > flink.apache.org to

Re: Powered by Flink

2015-10-19 Thread Suneel Marthi
+1 to this. On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote: > Sounds good +1 > > 2015-10-19 14:57 GMT+02:00 Márton Balassi : > > > Thanks for starting and big +1 for making it more prominent. > > > > On Mon, Oct 19, 2015 at 2:53 PM, Fabian

[no subject]

2015-10-19 Thread Jakob Ericsson
Hello, We are running into a strange problem with Direct Memory buffers. From what I know, we are not using any direct memory buffers inside our code. This is a pretty trivial streaming application, just doing some deduplication and a union of some Kafka streams. /Jakob 2015-10-19 13:27:59,064 INFO

Re: Powered by Flink

2015-10-19 Thread Timo Walther
+1 for adding it to the website instead of wiki. "Who is using Flink?" is always a difficult question to answer for interested users. On 19.10.2015 15:08, Suneel Marthi wrote: +1 to this. On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote:

Re: Powered by Flink

2015-10-19 Thread Anwar Rizal
Nice indeed :-) On Mon, Oct 19, 2015 at 3:08 PM, Suneel Marthi wrote: > +1 to this. > > On Mon, Oct 19, 2015 at 3:00 PM, Fabian Hueske wrote: > >> Sounds good +1 >> >> 2015-10-19 14:57 GMT+02:00 Márton Balassi : >> >> >

Re:

2015-10-19 Thread Maximilian Michels
Hi Jakob, Thank you for reporting the bug. Could you please post your configuration here? In particular, could you please tell us the value of the following configuration variables: taskmanager.heap.mb taskmanager.network.numberOfBuffers taskmanager.memory.off-heap Are you running the Flink
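
For reference, the corresponding flink-conf.yaml fragment might look like this (the values below are illustrative, not recommendations):

    taskmanager.heap.mb: 2048
    taskmanager.network.numberOfBuffers: 2048
    taskmanager.memory.off-heap: false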

Re: Powered by Flink

2015-10-19 Thread Fabian Hueske
@Timo: The proposal was to keep the list in the wiki (can be easily extended) but link from the main website to the wiki page. 2015-10-19 15:16 GMT+02:00 Timo Walther : > +1 for adding it to the website instead of wiki. > "Who is using Flink?" is always a question difficult

Re: Powered by Flink

2015-10-19 Thread Maximilian Michels
+1 Let's collect in the Wiki for now. At some point in time, we might want to have a dedicated page on the Flink homepage. On Mon, Oct 19, 2015 at 3:31 PM, Timo Walther wrote: > Ah ok, sorry. I think linking to the wiki is also ok. > > > On 19.10.2015 15:18, Fabian Hueske

Re: ExecutionEnvironment setConfiguration API

2015-10-19 Thread Fabian Hueske
I think it's not a nice solution to check for the type of the returned execution environment to determine whether it is a local or a remote execution environment. Wouldn't it be better to add a method isLocal() to ExecutionEnvironment? Cheers, Fabian 2015-10-14 19:14 GMT+02:00 Flavio
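
A minimal sketch of the type check Fabian finds inelegant; note that isLocal() is the proposed, not-yet-existing method:

    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.LocalEnvironment;

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
    // today: instanceof check against a concrete environment class
    boolean isLocal = env instanceof LocalEnvironment;
    // proposal: boolean isLocal = env.isLocal();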

Re:

2015-10-19 Thread Jakob Ericsson
Hi, See answers below. /Jakob On Mon, Oct 19, 2015 at 4:03 PM, Maximilian Michels wrote: > Hi Jakob, > > Thank you for reporting the bug. Could you please post your > configuration here? In particular, could you please tell us the value > of the following configuration

Re:

2015-10-19 Thread Maximilian Michels
I forgot to ask you: Which version of Flink are you using? 0.9.1 or 0.10-SNAPSHOT? On Mon, Oct 19, 2015 at 5:05 PM, Maximilian Michels wrote: > Hi Jakob, > > Thanks. Flink allocates its network memory as direct memory outside > the normal Java heap. By default, that is 64MB but

Re:

2015-10-19 Thread Maximilian Michels
Hi Jakob, Thanks. Flink allocates its network memory as direct memory outside the normal Java heap. By default, that is 64MB but can grow up to 128MB on heavy network transfer. How much memory does your machine have? Could it be that your upper memory bound is lower than 2048 + 128 MB? Best, Max
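
Spelled out, Max's check is simple arithmetic:

    required memory ~ taskmanager.heap.mb + network direct memory ceiling
                    ~ 2048 MB + 128 MB = 2176 MB

If the machine's (or container's) limit is below roughly 2176 MB, direct-buffer allocation can fail even though the heap alone fits.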

Re:

2015-10-19 Thread Gyula Fóra
It's 0.10-SNAPSHOT Gyula Maximilian Michels wrote (on Mon, Oct 19, 2015, 17:13): > I forgot to ask you: Which version of Flink are you using? 0.9.1 or > 0.10-SNAPSHOT? > > On Mon, Oct 19, 2015 at 5:05 PM, Maximilian Michels > wrote: > > Hi Jakob, > >

Flink+avro integration

2015-10-19 Thread Andrew Whitaker
I'm doing some research on Flink + Avro integration, and I've come across "org.apache.flink.api.java.io.AvroInputFormat" as a way to create a stream of Avro objects from a file. I had the following questions: 1. Is this the extent of Flink's integration with Avro? If I wanted to read

Re: Flink Data Stream Union

2015-10-19 Thread Anwar Rizal
Do you really need to iterate? On Mon, Oct 19, 2015 at 5:42 PM, flinkuser wrote: > > Here is my code snippet but I find the union operator not workable. > > DataStream msgDataStream1 = env.addSource((new > SocketSource(hostName1,port,'\n',-1))).filter(new >

Re:

2015-10-19 Thread Maximilian Michels
When was the last time you updated your 0.10-SNAPSHOT Flink cluster? If it has been more than a couple of weeks, then I'd advise you to update to the latest snapshot version. There has been an issue with the calculation of the off-heap memory limit in the past. Thanks, Max On Mon, Oct 19, 2015

Flink Data Stream Union

2015-10-19 Thread flinkuser
Here is my code snippet, but I find that the union operator does not work. DataStream msgDataStream1 = env.addSource((new SocketSource(hostName1,port,'\n',-1))).filter(new MessageFilter()).setParallelism(1); DataStream msgDataStream2 = env.addSource((new
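
A self-contained union sketch using Flink's built-in socket source in place of the poster's custom SocketSource (host names and port are placeholders):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    DataStream<String> stream1 = env.socketTextStream("host1", 9000);
    DataStream<String> stream2 = env.socketTextStream("host2", 9000);
    // union merges the two streams into one; no iteration is needed here
    stream1.union(stream2).print();
    env.execute("union example");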

Re:

2015-10-19 Thread Jakob Ericsson
The revision is "Starting JobManager (Version: 0.10-SNAPSHOT, Rev:c82ebbf, Date:15.10.2015 @ 11:34:01 CEST)" We have a lot of memory left on the machine. I have increased it quite a lot. What are your thoughts on memory configuration? If I understand Flink correctly, you should only have one

Re: Distribute DataSet to subset of nodes

2015-10-19 Thread Stefan Bunk
Hi Fabian, I implemented your approach from above. However, the runtime decides to run two subtasks on the same node, resulting in an out-of-memory error. I thought partitioning really does partition the data to nodes, but now it seems like it's partitioning to tasks, and tasks can be on the same

Re: Flink+avro integration

2015-10-19 Thread Márton Balassi
Hi Andrew, 1a, In general Flink can read and write Avro data through the AvroInputFormat and AvroOutputFormat in both the batch and the streaming API. For example, you can write the following: StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); DataStream
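
A hedged completion of the truncated example, assuming the streaming environment's createInput() entry point and an Avro-generated placeholder class MyAvroType (the path is also a placeholder):

    import org.apache.flink.api.java.io.AvroInputFormat;
    import org.apache.flink.core.fs.Path;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // read a file of Avro records as a stream of deserialized objects
    DataStream<MyAvroType> avroStream = env.createInput(
            new AvroInputFormat<>(new Path("hdfs:///data/records.avro"), MyAvroType.class));
    avroStream.print();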

Re:

2015-10-19 Thread Maximilian Michels
You can see the revision number and the build date in the JobManager log file, e.g. "Starting JobManager (Version: 0.10-SNAPSHOT, Rev:1b79bc1, Date:18.10.2015 @ 20:15:08 CEST)" On Mon, Oct 19, 2015 at 5:53 PM, Maximilian Michels wrote: > When was the last time you updated your