Re: sorting in hive -- general
Thank you...

On Mon, Mar 9, 2015 at 2:23 AM, r7raul1...@163.com wrote:

Read this article:
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/

then read:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy
Re: sorting in hive -- general
Read this article:
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/

then read:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy

r7raul1...@163.com

From: max scalf
Date: 2015-03-08 07:02
To: HDP mailing list; Hive Mailing List
Subject: sorting in hive -- general

Hello all,

I am new to Hadoop and Hive in general. I am reading Hadoop: The Definitive Guide by Tom White, and on page 504, in the Hive chapter, Tom says the following about sorting:

Sorting and Aggregating

Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY performs a parallel total sort of the input (like that described in “Total Sort” on page 261). When a globally sorted result is not required—and in many cases it isn’t—you can use Hive’s nonstandard extension, SORT BY, instead. SORT BY produces a sorted file per reducer.

My question is: what exactly does he mean by a globally sorted result? If the SORT BY operation produces a sorted file per reducer, does that mean that at the end of the sort all the reducer outputs are put back together to give the correct result?
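Whether several reducers can still yield a globally sorted result depends on how keys are routed to them. The sketch below is a toy Python model of range partitioning, not Hive or Hadoop code: the function names, the split points 3 and 6, and the sample keys are all made up for illustration. Each key goes to a reducer according to its key range, each reducer sorts only its own keys, and reading the outputs in reducer order gives one sorted sequence.

```python
from bisect import bisect_right
from typing import List


def range_partition(keys: List[int], split_points: List[int]) -> List[List[int]]:
    """Route each key to a reducer by key range (hypothetical helper).

    With split_points [3, 6]: keys below 3 go to reducer 0,
    keys from 3 up to 5 go to reducer 1, keys of 6 and above to reducer 2.
    """
    parts: List[List[int]] = [[] for _ in range(len(split_points) + 1)]
    for key in keys:
        parts[bisect_right(split_points, key)].append(key)
    return parts


def sort_each_reducer(parts: List[List[int]]) -> List[List[int]]:
    """Each reducer sorts only the keys it received."""
    return [sorted(part) for part in parts]


keys = [7, 2, 9, 4, 1, 8, 3, 6]
reducer_outputs = sort_each_reducer(range_partition(keys, [3, 6]))
merged = [k for part in reducer_outputs for k in part]

print(reducer_outputs)  # [[1, 2], [3, 4], [6, 7, 8, 9]]
print(merged)           # [1, 2, 3, 4, 6, 7, 8, 9]
```

The point of the range-based routing is that no key in reducer 0 can be larger than any key in reducer 1, so plain concatenation is enough. With an arbitrary assignment of keys to reducers, that property does not hold.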
sorting in hive -- general
Hello all,

I am new to Hadoop and Hive in general. I am reading Hadoop: The Definitive Guide by Tom White, and on page 504, in the Hive chapter, Tom says the following about sorting:

*Sorting and Aggregating*

*Sorting data in Hive can be achieved by using a standard ORDER BY clause. ORDER BY performs a parallel total sort of the input (like that described in “Total Sort” on page 261). When a globally sorted result is not required—and in many cases it isn’t—you can use Hive’s nonstandard extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*

My question is: what exactly does he mean by a globally sorted result? If the SORT BY operation produces a sorted file per reducer, does that mean that at the end of the sort all the reducer outputs are put back together to give the correct result?
Re: sorting in hive -- general
A SORT BY query produces multiple independent files; ORDER BY produces just one.

SORT BY is usually used together with DISTRIBUTE BY. In older Hive versions (0.7), the two might be used to implement a local sort within each partition, similar to RANK() OVER (PARTITION BY A ORDER BY B).
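As a rough illustration of the difference, here is a toy Python model. It is not how Hive is implemented: the modulo rule for assigning rows to reducers, the function names, and the sample data are assumptions made for the example. It mimics SORT BY as "each reducer sorts only what it received" and ORDER BY as "one sorted output over all rows".

```python
from typing import List


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """Mimic SORT BY: one sorted output per reducer, no global order."""
    # Assumption: rows are spread over reducers by a simple modulo rule.
    buckets: List[List[int]] = [[] for _ in range(num_reducers)]
    for row in rows:
        buckets[row % num_reducers].append(row)
    return [sorted(bucket) for bucket in buckets]


def order_by(rows: List[int]) -> List[int]:
    """Mimic ORDER BY: a single sorted output covering every row."""
    return sorted(rows)


rows = [7, 2, 9, 4, 1, 8, 3, 6]
per_reducer = sort_by(rows, num_reducers=2)
concatenated = [r for part in per_reducer for r in part]

print(per_reducer)     # [[2, 4, 6, 8], [1, 3, 7, 9]]
print(concatenated)    # [2, 4, 6, 8, 1, 3, 7, 9]
print(order_by(rows))  # [1, 2, 3, 4, 6, 7, 8, 9]
```

Each per-reducer list is sorted, but putting the lists one after another does not give a sorted sequence. That is the sense in which SORT BY output is not globally sorted: the reducer outputs are not merged back together.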