Re: sorting in hive -- general

2015-03-09 Thread max scalf
Thank you...

On Mon, Mar 9, 2015 at 2:23 AM, r7raul1...@163.com <r7raul1...@163.com> wrote:

 read this article
 http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/


 then read
 https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy

 --
 r7raul1...@163.com


 *From:* max scalf <oracle.bl...@gmail.com>
 *Date:* 2015-03-08 07:02
 *To:* HDP mailing list <u...@hadoop.apache.org>; Hive Mailing List <user@hive.apache.org>
 *Subject:* sorting in hive -- general


Re: sorting in hive -- general

2015-03-09 Thread r7raul1...@163.com
Read this article:
http://www.philippeadjiman.com/blog/2009/12/20/hadoop-tutorial-series-issue-2-getting-started-with-customized-partitioning/

Then read:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy



r7raul1...@163.com
 
From: max scalf
Date: 2015-03-08 07:02
To: HDP mailing list; Hive Mailing List
Subject: sorting in hive -- general


sorting in hive -- general

2015-03-07 Thread max scalf
Hello all,

I am new to Hadoop and Hive in general. I am reading Hadoop: The Definitive
Guide by Tom White, and on page 504, in the Hive chapter, Tom says the
following about sorting:

*Sorting and Aggregating*
*Sorting data in Hive can be achieved by using a standard ORDER BY clause.
ORDER BY performs a parallel total sort of the input (like that described
in “Total Sort” on page 261). When a globally sorted result is not
required—and in many cases it isn’t—you can use Hive’s nonstandard
extension, SORT BY, instead. SORT BY produces a sorted file per reducer.*
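
For readers wondering what the "parallel total sort" in the excerpt refers to, here is a minimal Python sketch of the range-partitioning idea behind it (the general approach that Hadoop's TotalOrderPartitioner is built around). It is an illustration, not Hive's or Hadoop's code: the helper names range_partition and parallel_total_sort, the sample size, and the way split points are taken from a sorted random sample are all assumptions. Because each reducer receives a contiguous range of keys, reading the reducers' sorted files in reducer order gives one globally sorted result.

```python
import bisect
import random
from typing import List


def range_partition(key: int, split_points: List[int]) -> int:
    """Hypothetical helper: reducer i receives keys >= split_points[i-1]
    and < split_points[i]; split_points must be non-decreasing."""
    return bisect.bisect_right(split_points, key)


def parallel_total_sort(keys: List[int], num_reducers: int,
                        sample_size: int = 100, seed: int = 0) -> List[List[int]]:
    """Simulate a total sort spread over num_reducers reducers (assumed scheme)."""
    if not keys:
        return [[] for _ in range(num_reducers)]
    rng = random.Random(seed)

    # 1. Sample the input and take num_reducers - 1 evenly spaced split
    #    points from the sorted sample.
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = len(sample) / num_reducers
    split_points = [sample[int(step * i)] for i in range(1, num_reducers)]

    # 2. "Shuffle": route every key to a reducer by key range.
    reducers: List[List[int]] = [[] for _ in range(num_reducers)]
    for key in keys:
        reducers[range_partition(key, split_points)].append(key)

    # 3. Each reducer sorts only its own keys (in parallel on a real cluster),
    #    producing one sorted output file per reducer.
    return [sorted(part) for part in reducers]


if __name__ == "__main__":
    data = [random.randint(0, 1000) for _ in range(50)]
    files = parallel_total_sort(data, num_reducers=4)
    merged = [k for f in files for k in f]
    print(merged == sorted(data))  # True: reducer order matches key order
```

With any assignment of rows to reducers that does not follow key ranges, the same concatenation would generally not be sorted, which is the situation SORT BY leaves you in.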


My question is: what exactly does he mean by a globally sorted result? If
SORT BY produces a sorted file per reducer, does that mean that at the end
of the sort the outputs of all the reducers are put back together to give
the correct result?
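
To see what "globally sorted" means in practice, here is a small Python sketch that mimics the two clauses on a toy list. It is an illustration under stated assumptions, not Hive's implementation: sort_by spreads rows over reducers round-robin purely for the demo (how Hive actually assigns rows to reducers is not modeled), and order_by is modeled as a single reducer. Each SORT BY file is sorted on its own, but simply concatenating the files is not; getting one ordered result needs an extra merge step, which SORT BY does not perform.

```python
import heapq
from typing import List


def sort_by(rows: List[int], num_reducers: int) -> List[List[int]]:
    """Mimic SORT BY: each reducer sorts only the rows it received.
    Rows are assigned round-robin here, an assumption made for the demo."""
    reducers: List[List[int]] = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)
    return [sorted(part) for part in reducers]  # one sorted "file" per reducer


def order_by(rows: List[int]) -> List[List[int]]:
    """Mimic ORDER BY: all rows pass through one reducer, giving one ordered file."""
    return [sorted(rows)]


rows = [5, 3, 9, 1, 7, 2, 8, 4, 6]
files = sort_by(rows, num_reducers=3)
print(files)                          # [[1, 5, 8], [3, 4, 7], [2, 6, 9]]
print([r for f in files for r in f])  # [1, 5, 8, 3, 4, 7, 2, 6, 9]: not globally sorted
print(list(heapq.merge(*files)))      # [1, 2, 3, 4, 5, 6, 7, 8, 9]: only after a merge
print(order_by(rows))                 # [[1, 2, 3, 4, 5, 6, 7, 8, 9]]
```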


Re: sorting in hive -- general

2015-03-07 Thread Alexander Pivovarov
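A SORT BY query produces multiple independent files, each sorted on its own.

ORDER BY produces just one file, since all rows go through a single reducer.

SORT BY is usually used together with DISTRIBUTE BY.
In older Hive versions (0.7), which had no windowing functions, the two
could be used to implement a local sort within a partition,
similar to RANK() OVER (PARTITION BY A ORDER BY B).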
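
The DISTRIBUTE BY plus SORT BY pattern can be sketched in Python as well, assuming rows of (a, b) pairs and the query shape DISTRIBUTE BY a SORT BY a, b. The CRC32 hash used to pick a reducer and the helper names are hypothetical choices for the illustration, and the ranking pass is only a stand-in for whatever reducer script or UDF you would run over the sorted stream. DISTRIBUTE BY a sends all rows with the same a to the same reducer, SORT BY a, b makes those rows contiguous and ordered, and one pass per reducer can then assign ranks with the same tie handling as RANK().

```python
import zlib
from itertools import groupby
from typing import List, Tuple

Row = Tuple[str, int]  # (a, b): a is the partition column, b the ordering column


def distribute_by_sort_by(rows: List[Row], num_reducers: int) -> List[List[Row]]:
    """Hypothetical helper mimicking DISTRIBUTE BY a SORT BY a, b."""
    reducers: List[List[Row]] = [[] for _ in range(num_reducers)]
    for a, b in rows:
        # DISTRIBUTE BY a: a deterministic hash of a picks the reducer, so every
        # row with the same a lands on the same reducer.
        reducers[zlib.crc32(a.encode("utf-8")) % num_reducers].append((a, b))
    # SORT BY a, b: each reducer sorts its own rows; rows sharing a become contiguous.
    return [sorted(part) for part in reducers]


def rank_within_partition(reducer_outputs: List[List[Row]]) -> List[Tuple[str, int, int]]:
    """One pass over each sorted reducer output, numbering rows per a-value.
    Ties in b share a rank and leave a gap afterwards, as RANK() does."""
    ranked = []
    for part in reducer_outputs:
        for a, group in groupby(part, key=lambda row: row[0]):
            rank, prev_b = 0, None
            for position, (_, b) in enumerate(group, start=1):
                if b != prev_b:
                    rank, prev_b = position, b
                ranked.append((a, b, rank))
    return ranked


rows = [("x", 3), ("y", 1), ("x", 1), ("y", 5), ("x", 3), ("z", 2)]
for row in sorted(rank_within_partition(distribute_by_sort_by(rows, num_reducers=2))):
    print(row)
```

The ranking only works because DISTRIBUTE BY keeps each a-value on a single reducer; with SORT BY alone, rows of one a-value could be split across reducers and ranked separately.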


On Sat, Mar 7, 2015 at 3:02 PM, max scalf <oracle.bl...@gmail.com> wrote:
