Re: Hi, Flink people, a question about translation from HIVE Query to Flink fucntioin by using Table API

2015-10-19 Thread Maximilian Michels
Hi Philip, Thank you for your questions. I think you have mapped the Hive functions to the Flink ones correctly. Just a remark on the ORDER BY. You wrote that it produces a total order of all the records. In this case, you'd have to do a sortPartition operation with parallelism set to 1. This is

Re: Hi, Flink people, a question about translation from HIVE Query to Flink fucntioin by using Table API

2015-10-19 Thread Maximilian Michels
Hi Philip, You're welcome. Just a small correction: Hive's SORT BY should be DataSet.groupBy(key).sortGroup(key) in Flink. This ensures sorted grouped records within the reducer that follows. No need to set the parallelism to 1. Best, Max On Mon, Oct 19, 2015 at 1:28 PM, Philip Lee

Re: Hi, Flink people, a question about translation from HIVE Query to Flink fucntioin by using Table API

2015-10-19 Thread Philip Lee
Thanks, Fabian. I just want to check one thing again. As you said, [Distribute By] is partitionByHash(), and [Sort By] should be sortGroup() in Flink. However, [Cluster By] consists of partitionByHash().*sortPartition()*. As far as I know, [Cluster By] is the same as the combination with

Re: Hi, Flink people, a question about translation from HIVE Query to Flink fucntioin by using Table API

2015-10-19 Thread Fabian Hueske
Hi Philip, here are a few additions to what Max said: - ORDER BY: As Max said, Flink's sortPartition() only sorts within a partition and does not produce a total order. You can either set the parallelism to 1, as Max suggested, or use a custom partitioner to range-partition the data. - SORT BY: From
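
To illustrate the ORDER BY point, here is a minimal plain-Python sketch (not Flink code). It simulates sorting inside each partition and compares hash partitioning, a single partition (parallelism 1), and range partitioning. The helper names hash_partition, range_partition and sort_each_partition are hypothetical and only loosely mirror what partitionByHash(), a custom range partitioner and sortPartition() would do.

```python
import bisect

def hash_partition(records, parallelism):
    # Assumption: integers are "hashed" with a plain modulo so the result is deterministic.
    parts = [[] for _ in range(parallelism)]
    for r in records:
        parts[r % parallelism].append(r)
    return parts

def range_partition(records, boundaries):
    # Partition i receives values <= boundaries[i]; the last partition receives the rest.
    parts = [[] for _ in range(len(boundaries) + 1)]
    for r in records:
        parts[bisect.bisect_left(boundaries, r)].append(r)
    return parts

def sort_each_partition(parts):
    # Each partition is sorted independently, with no coordination between partitions.
    return [sorted(p) for p in parts]

def concat(parts):
    # Read the partitions back one after another, in partition order.
    return [r for p in parts for r in p]

records = [7, 2, 9, 4, 1, 8, 3, 6]

hashed = concat(sort_each_partition(hash_partition(records, 2)))
single = concat(sort_each_partition(hash_partition(records, 1)))
ranged = concat(sort_each_partition(range_partition(records, [4])))

print(hashed)  # sorted within each partition only
print(single)  # totally ordered
print(ranged)  # totally ordered
```

With hash partitioning, each partition is sorted but the concatenation is not. With one partition, or with range partitioning, reading the partitions in order gives a total order.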

Re: Hi, Flink people, a question about translation from HIVE Query to Flink fucntioin by using Table API

2015-10-19 Thread Fabian Hueske
The difference between a partition and a group is the following: - A partition refers to all records that are processed by a task instance or sub task. If you use hash partitioning, all elements that share the same key will be in one partition, but usually there will be more than one key in a
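
The partition/group distinction can be sketched in plain Python (again a simulation, not Flink code). It assumes a group means all records that share one key, and uses a modulo in place of a real hash function. The names partition_by_key and groups_in are hypothetical.

```python
from collections import defaultdict

def partition_by_key(records, parallelism):
    # All records with the same key land in the same partition.
    parts = [[] for _ in range(parallelism)]
    for key, value in records:
        parts[key % parallelism].append((key, value))
    return parts

def groups_in(partition):
    # Within one partition, a group is the set of records sharing a single key.
    groups = defaultdict(list)
    for key, value in partition:
        groups[key].append(value)
    return dict(groups)

records = [(1, "a"), (2, "b"), (3, "c"), (1, "d"), (4, "e"), (3, "f")]

parts = partition_by_key(records, 2)
groups_per_partition = [groups_in(p) for p in parts]

for i, groups in enumerate(groups_per_partition):
    print(f"partition {i}: keys {sorted(groups)} -> {groups}")
```

Here each of the two partitions holds two keys, so each partition contains two groups. No key appears in more than one partition.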