Re: GROUP BY Issue

Gourav Sengupta Mon, 10 Jun 2013 12:23:50 -0700

Hi Shahab,

It would be great if someone could delete this email from the PIG group. I
am aware of the mistake and posted this issue to the HIVE group almost
immediately.


Regards,
Gourav


On Mon, Jun 10, 2013 at 5:28 PM, Shahab Yunus <[email protected]> wrote:

> Gourav, this is not a HIVE mailing list. It is PIG's.
>
> Regards,
> Shahab
>
>
> On Mon, Jun 10, 2013 at 10:39 AM, Gourav Sengupta
> <[email protected]> wrote:
>
> > Hi,
> >
> > When I run the following query, I get multiple records with the same
> > value of F1:
> >
> > SELECT F1, COUNT(*)
> > FROM
> > (
> > SELECT F1, F2, COUNT(*)
> > FROM TABLE1
> > GROUP BY F1, F2
> > ) a
> > GROUP BY F1;
> >
> > As far as I can tell, the number of records returned for each F1 value
> > depends on the number of reducers.
> >
> > Replicating the test scenario:
> > STEP1: Download the dataset available at
> > http://snap.stanford.edu/data/amazon0302.html
> >
> > STEP2: Open the file and delete the header
> >
> > STEP3: hadoop fs -mkdir /test
> >
> > STEP4: hadoop fs -put amazon0302.txt /test
> >
> > STEP5: create external table test (f1 int, f2 int) row format delimited
> > fields terminated by '\t' lines terminated by '\n' stored as textfile
> > location '/test';
> >
> > STEP6: create table test1 location '/test1' as select left_table.* from
> > (select * from test where f1<10000) left_table join (select * from test
> > where f1 < 10000) right_table;
> >
> > STEP7: hadoop fs -mkdir /test2
> >
> > STEP8: create table test2 location '/test2' as select f1, count(*) from
> > (select f1, f2, count(*) from test1 group by f1, f2) a group by f1;
> >
> > STEP9: select * from test2 where f1 = 9887;
> >
> > ENVIRONMENT:
> > HADOOP 2.0.4
> > HIVE 0.11
> >
> > Please do let me know whether I am doing anything wrong.
> >
> >
> > Thanks and Regards,
> > Gourav Sengupta
> >
>
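
For reference, here is a minimal sketch of what the nested GROUP BY quoted above is expected to return: exactly one row per F1, with the count of distinct (F1, F2) pairs. It uses Python's built-in sqlite3 module purely as a stand-in for Hive, so it says nothing about Hive 0.11's actual behaviour. The helper name rows_per_f1 and the sample rows are hypothetical and are not taken from amazon0302.txt.

```python
import sqlite3


def rows_per_f1(pairs):
    """Run the thread's nested GROUP BY over (f1, f2) pairs.

    Returns a list of (f1, number_of_distinct_f2) rows, ordered by f1.
    SQLite is only a stand-in here; this is not a Hive reproduction.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (f1 INTEGER, f2 INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", pairs)
    rows = conn.execute(
        """
        SELECT f1, COUNT(*) AS n
        FROM (
            SELECT f1, f2, COUNT(*) AS c
            FROM table1
            GROUP BY f1, f2
        ) a
        GROUP BY f1
        ORDER BY f1
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sample data with repeated (f1, f2) pairs.
sample = [(1, 10), (1, 10), (1, 20), (2, 30), (2, 30), (2, 40), (2, 50)]
result = rows_per_f1(sample)
print(result)

# Each f1 value should appear once; a repeated f1 would indicate the
# kind of duplicate rows described in the original report.
f1_values = [f1 for f1, _ in result]
print(len(f1_values) == len(set(f1_values)))
```

The same uniqueness check can be applied to the output of STEP9: if any f1 value such as 9887 comes back more than once from test2, the outer GROUP BY has not collapsed the groups as expected.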
