Hi Lefty,

Couldn¹t find the documentation for what
hive.groupby.orderby.position.alias=true does.

I suspect that might be what Mich was looking for (though I tend to write
the column names explicitly).

Cheers,
Gopal

From:  Lefty Leverenz <leftylever...@gmail.com>
Reply-To:  "user@hive.apache.org" <user@hive.apache.org>
Date:  Sunday, March 29, 2015 at 8:32 PM
To:  "user@hive.apache.org" <user@hive.apache.org>
Subject:  Re: ORDER BY clause in Hive

I added information about this in the Order By
<https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SortBy#Lang
uageManualSortBy-SyntaxofOrderBy>  section of the wiki.  Thanks, Mich and
Gopal!

-- Lefty

On Sun, Mar 29, 2015 at 5:24 AM, Mich Talebzadeh <m...@peridale.co.uk>
wrote:
> Hi Lefty, Gopal,
>  
> It appears that ORDER BY 3 is not interpreted as ORDERR BY ³the result set
> column three² which standard SQL as evident from Oracle and Sybase does.
>  
> So I made it an ORDER BY from the result set EXPLICITELY as shown below and it
> worked OK
>  
> SELECT
>           rs.Customer_ID
>         , rs.Number_of_orders
>         , rs.Total_customer_amount
>         , rs.Average_order
>         , rs.Standard_deviation
> FROM
> (
>         SELECT cust_id AS Customer_ID,
>         COUNT(amount_sold) AS Number_of_orders,
>         SUM(amount_sold) AS Total_customer_amount,
>         AVG(amount_sold) AS Average_order,
>         STDDEV(amount_sold) AS Standard_deviation
>         FROM sales
>         GROUP BY cust_id
>         HAVING SUM(amount_sold) > 94000
>         AND AVG(amount_sold) < STDDEV(amount_sold)
> ) rs
> ORDER BY
>          rs.Total_customer_amount
> ;
>  
>  
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | rs.customer_id  | rs.number_of_orders  | rs.total_customer_amount  |
> rs.average_order   | rs.standard_deviation  |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> | 1743.0          | 238                  | 94786.12999999993         |
> 398.2610504201678   | 581.0439095219863      |
> | 429.0           | 231                  | 94819.41000000006         |
> 410.4736363636366   | 613.7057080691426      |
> | 2994.0          | 227                  | 94862.61000000006         |
> 417.89696035242315  | 623.1607772763742      |
> | 6395.0          | 268                  | 97010.47999999998         |
> 361.97940298507456  | 576.9120977984521      |
> | 12783.0         | 240                  | 97573.54999999996         |
> 406.5564583333332   | 590.4445500393804      |
> | 4974.0          | 235                  | 98006.16000000002         |
> 417.0474893617022   | 624.337482834059       |
> | 42167.0         | 266                  | 98585.96000000002         |
> 370.6239097744362   | 590.965120684093       |
> | 10747.0         | 256                  | 99578.08999999997         |
> 388.9769140624999   | 600.7615005975689      |
> | 11407.0         | 248                  | 103412.65999999995        |
> 416.9865322580643   | 622.221465710723       |
> +-----------------+----------------------+---------------------------+--------
> -------------+------------------------+--+
> 9 rows selected (209.699 seconds)
>  
>  
> Regards,
>  
> 
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>  
> 
> From: Mich Talebzadeh [mailto:m...@peridale.co.uk]
> Sent: 29 March 2015 00:11
> To: user@hive.apache.org
> Subject: ORDER BY clause in Hive
> 
>  
> Hi,
>  
> Can someone point me to doc or otherwise to see if ORDER BY clause in Hive is
> working OK
>  
> I have a simple aggregate query as follows:
>  
> SELECT cust_id AS Customer_ID,
> COUNT(amount_sold) AS Number_of_orders,
> SUM(amount_sold) AS Total_customer_amount,
> AVG(amount_sold) AS Average_order,
> STDDEV(amount_sold) AS Standard_deviation
> FROM sales
> GROUP BY cust_id
> HAVING SUM(amount_sold) > 94000
> AND AVG(amount_sold) < STDDEV(amount_sold)
> ORDER BY 3 ;
>  
> The original table and data are from Oracle sh.sales table
>  
> Oracle comes back for this query with
>  
> CUSTOMER_ID NUMBER_OF_ORDERS TOTAL_CUSTOMER_AMOUNT AVERAGE_ORDER
> STANDARD_DEVIATION
> ----------- ---------------- --------------------- -------------
> ------------------
>        1743              238              94786.13     398.26105
> 582.26845
>         429              231              94819.41    410.473636
> 615.038404
>        2994              227              94862.61     417.89696
> 624.53793
>        6395              268              97010.48    361.979403
> 577.991448
>       12783              240              97573.55    406.556458
> 591.6785
>        4974              235              98006.16    417.047489
> 625.670115
>       42167              266              98585.96     370.62391
> 592.079099
>       10747              256              99578.09    388.976914
> 601.938312
>       11407              248             103412.66    416.986532
> 623.479751
>  
> 9 rows selected.
>  
> Ordered by TOTAL_CUSTOMER_AMOUNT
>  
> And hive returns for the same query
>  
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | customer_id  | number_of_orders  | total_customer_amount  |    average_order
> | standard_deviation  |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> | 42167.0      | 266               | 98585.96000000002      |
> 370.6239097744362   | 590.965120684093    |
> | 12783.0      | 240               | 97573.54999999996      |
> 406.5564583333332   | 590.4445500393804   |
> | 11407.0      | 248               | 103412.65999999995     |
> 416.9865322580643   | 622.221465710723    |
> | 10747.0      | 256               | 99578.08999999997      |
> 388.9769140624999   | 600.7615005975689   |
> | 6395.0       | 268               | 97010.47999999998      |
> 361.97940298507456  | 576.9120977984521   |
> | 4974.0       | 235               | 98006.16000000002      |
> 417.0474893617022   | 624.337482834059    |
> | 2994.0       | 227               | 94862.61000000006      |
> 417.89696035242315  | 623.1607772763742   |
> | 1743.0       | 238               | 94786.12999999993      |
> 398.2610504201678   | 581.0439095219863   |
> | 429.0        | 231               | 94819.41000000006      |
> 410.4736363636366   | 613.7057080691426   |
> +--------------+-------------------+------------------------+-----------------
> ----+---------------------+--+
> 9 rows selected (215.774 seconds)
>  
> But ordering in Hive does not seem to be correct! Please note ordering is on
> column three, total_customer_amount
>  
> I also tried this in Sybase and got the same as Oracle.
>  
> Adaptive Server cpu time: 100 ms.
> Customer_ID          Number_of_orders Total_customer_amount
> Average_order                                        Standard_deviation
> -------------------- ----------------
> -----------------------------------------
> ----------------------------------------------------
> ---------------------------
>                 1743              238
> 94786.13                                    398.2610504201680
> 582.268450
>                   429              231
> 94819.41                                    410.4736363636363
> 615.038404
>                  2994              227
> 94862.61                                    417.8969603524229
> 624.537930
>                  6395              268
> 97010.48                                    361.9794029850746
> 577.991448
>                 12783              240
> 97573.55                                    406.5564583333333
> 591.678500
>                  4974              235
> 98006.16                                    417.0474893617021
> 625.670115
>                 42167              266
> 98585.96                                    370.6239097744360
> 592.079099
>                 10747              256
> 99578.09                                    388.9769140625000
> 601.938312
>                 11407              248
> 103412.66                                    416.9865322580645
> 623.479751
>  
> I tried Google search and seems to be different suggestions. May be I have to
> rewrite the code?
>  
> Thanks
>  
> Mich Talebzadeh
>  
> http://talebzadehmich.wordpress.com
>  
> Publications due shortly:
> Creating in-memory Data Grid for Trading Systems with Oracle TimesTen and
> Coherence Cache
>  
> NOTE: The information in this email is proprietary and confidential. This
> message is for the designated recipient only, if you are not the intended
> recipient, you should destroy it immediately. Any information in this message
> shall not be understood as given or endorsed by Peridale Ltd, its subsidiaries
> or their employees, unless expressly so stated. It is the responsibility of
> the recipient to ensure that this email is virus free, therefore neither
> Peridale Ltd, its subsidiaries nor their employees accept any responsibility.
>  



Reply via email to