---------- Forwarded message ---------- From: Lefty Leverenz <leftylever...@gmail.com> Date: Tue, Mar 31, 2015 at 1:47 AM Subject: Re: ORDER BY clause in Hive To: Mich Talebzadeh <m...@peridale.co.uk>
Hive as I see it does not support ORDER BY *Column position*. It only > supports ORDER BY *Column name*. > That's just in Hive release 0.10.0 and earlier. In release 0.11.0+ you can set the configuration parameter <https://cwiki.apache.org/confluence/display/Hive/AdminManual+Configuration#AdminManualConfiguration-ConfiguringHive> *hive.groupby.orderby.position.**alias* to true, and then you can use column positions in ORDER BY. Here's the new documentation: - hive.groupby.orderby.position.alias <https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-hive.groupby.orderby.position.alias> - Order By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy> - Group By <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy#LanguageManualGroupBy-GroupBySyntax> -- Lefty On Mon, Mar 30, 2015 at 4:46 AM, Mich Talebzadeh <m...@peridale.co.uk> wrote: > Gents, > > > > Hive as I see it does not support ORDER BY *Column position*. It only > supports ORDER BY *Column name*. > > > > Thanks > > > > > > Mich Talebzadeh > > > > http://talebzadehmich.wordpress.com > > > > *Publications due shortly:* > > *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache* > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Ltd, its > subsidiaries or their employees, unless expressly so stated. It is the > responsibility of the recipient to ensure that this email is virus free, > therefore neither Peridale Ltd, its subsidiaries nor their employees accept > any responsibility. > > > > *From:* Gopal Vijayaraghavan [mailto:go...@hortonworks.com] *On Behalf Of > *Gopal Vijayaraghavan > *Sent:* 30 March 2015 05:26 > *To:* user@hive.apache.org > *Cc:* Lefty Leverenz > > *Subject:* Re: ORDER BY clause in Hive > > > > Hi Lefty, > > > > Couldn’t find the documentation for what > hive.groupby.orderby.position.alias=true does. > > > > I suspect that might be what Mich was looking for (though I tend to write > the column names explicitly). > > > > Cheers, > > Gopal > > > > *From: *Lefty Leverenz <leftylever...@gmail.com> > *Reply-To: *"user@hive.apache.org" <user@hive.apache.org> > *Date: *Sunday, March 29, 2015 at 8:32 PM > *To: *"user@hive.apache.org" <user@hive.apache.org> > *Subject: *Re: ORDER BY clause in Hive > > > > I added information about this in the Order By > <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#LanguageManualSortBy-SyntaxofOrderBy> > section of the wiki. Thanks, Mich and Gopal! > > > -- Lefty > > > > On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <m...@peridale.co.uk> > wrote: > > Hi Lefty, Gopal, > > > > It appears that ORDER BY 3 is not interpreted as ORDERR BY “the result set > column three” which standard SQL as evident from Oracle and Sybase does. > > > > So I made it an ORDER BY from the result set EXPLICITELY as shown below > and it worked OK > > > > SELECT > > rs.Customer_ID > > , rs.Number_of_orders > > , rs.Total_customer_amount > > , rs.Average_order > > , rs.Standard_deviation > > FROM > > ( > > SELECT cust_id AS Customer_ID, > > COUNT(amount_sold) AS Number_of_orders, > > SUM(amount_sold) AS Total_customer_amount, > > AVG(amount_sold) AS Average_order, > > STDDEV(amount_sold) AS Standard_deviation > > FROM sales > > GROUP BY cust_id > > HAVING SUM(amount_sold) > 94000 > > AND AVG(amount_sold) < STDDEV(amount_sold) > > ) rs > > ORDER BY > > *rs.Total_customer_amount* > > ; > > > > > > > +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ > > | rs.customer_id | rs.number_of_orders | rs.total_customer_amount | > rs.average_order | rs.standard_deviation | > > > +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ > > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > > > +-----------------+----------------------+---------------------------+---------------------+------------------------+--+ > > 9 rows selected (209.699 seconds) > > > > > > Regards, > > > > Mich Talebzadeh > > > > http://talebzadehmich.wordpress.com > > > > *Publications due shortly:* > > *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache* > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Ltd, its > subsidiaries or their employees, unless expressly so stated. It is the > responsibility of the recipient to ensure that this email is virus free, > therefore neither Peridale Ltd, its subsidiaries nor their employees accept > any responsibility. > > > > *From:* Mich Talebzadeh [mailto:m...@peridale.co.uk] > *Sent:* 29 March 2015 00:11 > *To:* user@hive.apache.org > *Subject:* ORDER BY clause in Hive > > > > Hi, > > > > Can someone point me to doc or otherwise to see if ORDER BY clause in Hive > is working OK > > > > I have a simple aggregate query as follows: > > > > SELECT cust_id AS Customer_ID, > > COUNT(amount_sold) AS Number_of_orders, > > SUM(amount_sold) AS Total_customer_amount, > > AVG(amount_sold) AS Average_order, > > STDDEV(amount_sold) AS Standard_deviation > > FROM sales > > GROUP BY cust_id > > HAVING SUM(amount_sold) > 94000 > > AND AVG(amount_sold) < STDDEV(amount_sold) > > ORDER BY 3 ; > > > > The original table and data are from Oracle sh.sales table > > > > Oracle comes back for this query with > > > > CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER > STANDARD_DEVIATION > > ----------- ---------------- --------------------- ------------- > ------------------ > > 1743 238 94786.13 398.26105 > 582.26845 > > 429 231 94819.41 410.473636 > 615.038404 > > 2994 227 94862.61 417.89696 > 624.53793 > > 6395 268 97010.48 361.979403 > 577.991448 > > 12783 240 97573.55 406.556458 > 591.6785 > > 4974 235 98006.16 417.047489 > 625.670115 > > 42167 266 98585.96 370.62391 > 592.079099 > > 10747 256 99578.09 388.976914 > 601.938312 > > 11407 248 103412.66 416.986532 > 623.479751 > > > > 9 rows selected. > > > > Ordered by TOTAL_CUSTOMER_AMOUNT > > > > And hive returns for the same query > > > > > +--------------+-------------------+------------------------+---------------------+---------------------+--+ > > | customer_id | number_of_orders | total_customer_amount | > average_order | standard_deviation | > > > +--------------+-------------------+------------------------+---------------------+---------------------+--+ > > | 42167.0 | 266 | 98585.96000000002 | > 370.6239097744362 | 590.965120684093 | > > | 12783.0 | 240 | 97573.54999999996 | > 406.5564583333332 | 590.4445500393804 | > > | 11407.0 | 248 | 103412.65999999995 | > 416.9865322580643 | 622.221465710723 | > > | 10747.0 | 256 | 99578.08999999997 | > 388.9769140624999 | 600.7615005975689 | > > | 6395.0 | 268 | 97010.47999999998 | > 361.97940298507456 | 576.9120977984521 | > > | 4974.0 | 235 | 98006.16000000002 | > 417.0474893617022 | 624.337482834059 | > > | 2994.0 | 227 | 94862.61000000006 | > 417.89696035242315 | 623.1607772763742 | > > | 1743.0 | 238 | 94786.12999999993 | > 398.2610504201678 | 581.0439095219863 | > > | 429.0 | 231 | 94819.41000000006 | > 410.4736363636366 | 613.7057080691426 | > > > +--------------+-------------------+------------------------+---------------------+---------------------+--+ > > 9 rows selected (215.774 seconds) > > > > But ordering in Hive does not seem to be correct! Please note ordering is > on column three, *total_customer_amount* > > > > I also tried this in Sybase and got the same as Oracle. > > > > Adaptive Server cpu time: 100 ms. > > Customer_ID Number_of_orders > Total_customer_amount > Average_order Standard_deviation > > -------------------- ---------------- > ----------------------------------------- > ---------------------------------------------------- > --------------------------- > > 1743 238 > 94786.13 > 398.2610504201680 582.268450 > > 429 231 > 94819.41 > 410.4736363636363 615.038404 > > 2994 227 > 94862.61 > 417.8969603524229 624.537930 > > 6395 268 > 97010.48 > 361.9794029850746 577.991448 > > 12783 240 > 97573.55 > 406.5564583333333 591.678500 > > 4974 235 > 98006.16 > 417.0474893617021 625.670115 > > 42167 266 > 98585.96 > 370.6239097744360 592.079099 > > 10747 256 > 99578.09 388.9769140625000 > 601.938312 > > 11407 248 > 103412.66 > 416.9865322580645 623.479751 > > > > I tried Google search and seems to be different suggestions. May be I have > to rewrite the code? > > > > Thanks > > > > Mich Talebzadeh > > > > http://talebzadehmich.wordpress.com > > > > *Publications due shortly:* > > *Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and > Coherence Cache* > > > > NOTE: The information in this email is proprietary and confidential. This > message is for the designated recipient only, if you are not the intended > recipient, you should destroy it immediately. Any information in this > message shall not be understood as given or endorsed by Peridale Ltd, its > subsidiaries or their employees, unless expressly so stated. It is the > responsibility of the recipient to ensure that this email is virus free, > therefore neither Peridale Ltd, its subsidiaries nor their employees accept > any responsibility. > > > > >