[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-18 Thread Namit Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Namit Jain updated HIVE-3699:
-

   Resolution: Fixed
Fix Version/s: 0.11.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed. Thanks, Navis.

> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
>Assignee: Navis
> Fix For: 0.11.0
>
> Attachments: HIVE-3699.D7743.1.patch, HIVE-3699.D7743.2.patch, 
> HIVE-3699.D7743.3.patch, HIVE-3699_hive-0.9.1.patch.txt
>
>
> (Note: This might be related to HIVE-2750)
> I am running a query with multiple INSERT OVERWRITE clauses into multiple 
> tables so that the dataset is scanned only once, and I end up with all of 
> these tables having the same content! It seems that the GROUP BY query that 
> returns results is overwriting all the temp tables.
> Oddly enough, if I add further GROUP BY queries into additional temp tables, 
> grouped by a different field, then all the temp tables, including the ones 
> that would otherwise have had the wrong content, are correctly populated.
> This is the misbehaving query:
> FROM nikon
> INSERT OVERWRITE TABLE e1
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
> INSERT OVERWRITE TABLE e2
> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
> ;
> It launches only one MR job, and here are the results. Why does table 'e1' 
> contain the results from table 'e2'?! Table 'e1' should have been empty (see 
> the individual SELECTs further below).
> hive> SELECT * from e1;
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.229 seconds
> hive> SELECT * from e2;
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 0.11 seconds
> Here are the results of the individual queries (only the second query 
> returns a result set):
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM 
> nikon
> WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
> (...)
> OK
>   <- There are no results, this is normal
> Time taken: 41.471 seconds
> hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
> WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
> (...)
> OK
> NULL  2
> 1627575 25
> 1627576 70
> 1690950 22
> 1690952 42
> 1696705 199
> 1696706 66
> 1696730 229
> 1696759 85
> 1696893 218
> Time taken: 39.607 seconds
> 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-14 Thread Phabricator (JIRA)

Phabricator updated HIVE-3699:
--

Attachment: HIVE-3699.D7743.3.patch

navis updated the revision "HIVE-3699 [jira] Multiple insert overwrite into 
multiple tables query stores same results in all tables".
Reviewers: JIRA, njain

  Updated results of multi_insert.q and multi_insert_move_tasks_share_dependencies.q


REVISION DETAIL
  https://reviews.facebook.net/D7743

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/multi_insert_gby.q
  ql/src/test/results/clientpositive/groupby_multi_single_reducer2.q.out
  ql/src/test/results/clientpositive/multi_insert.q.out
  ql/src/test/results/clientpositive/multi_insert_gby.q.out
  ql/src/test/results/clientpositive/multi_insert_move_tasks_share_dependencies.q.out

To: JIRA, njain, navis
Cc: njain



[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-14 Thread Navis (JIRA)

Navis updated HIVE-3699:


Affects Version/s: (was: 0.10.0)
   Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-07 Thread Namit Jain (JIRA)

Namit Jain updated HIVE-3699:
-

Status: Open  (was: Patch Available)

A lot of tests are failing; can you debug?


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-05 Thread Navis (JIRA)

Navis updated HIVE-3699:


Attachment: HIVE-3699_hive-0.9.1.patch.txt

Patch for the 0.9 branch.


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-03 Thread Phabricator (JIRA)

Phabricator updated HIVE-3699:
--

Attachment: HIVE-3699.D7743.2.patch

navis updated the revision "HIVE-3699 [jira] Multiple insert overwrite into 
multiple tables query stores same results in all tables".
Reviewers: JIRA

  Addressed comment


REVISION DETAIL
  https://reviews.facebook.net/D7743

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/multi_insert_gby.q
  ql/src/test/results/clientpositive/multi_insert_gby.q.out

To: JIRA, navis
Cc: njain



[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-03 Thread Navis (JIRA)

Navis updated HIVE-3699:


Status: Patch Available  (was: Open)


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-03 Thread Namit Jain (JIRA)

Namit Jain updated HIVE-3699:
-

Status: Open  (was: Patch Available)

See my comments on Phabricator.


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-03 Thread Navis (JIRA)

Navis updated HIVE-3699:


 Assignee: Navis
Affects Version/s: (was: 0.9.0)
   0.10.0
   Status: Patch Available  (was: Open)

Small bug in PPD (predicate pushdown).

As a workaround, set hive.optimize.ppd=false;
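The following minimal sketch applies that workaround to the query from the report. It makes two assumptions: the query runs in the Hive CLI, where `set` changes a property only for the current session, and the session was previously using the default value `true` for `hive.optimize.ppd`. Turning PPD off can slow down other queries, so the sketch switches it back on afterwards.

```sql
-- Workaround from this thread: disable predicate pushdown (PPD) for the
-- current session only, before running the multi-insert query.
set hive.optimize.ppd=false;

FROM nikon
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;

-- Re-enable PPD for later queries in the same session
-- (assumes the session was using the default value, true).
set hive.optimize.ppd=true;
```

Upgrading to 0.11.0, the Fix Version recorded on this issue, should make the workaround unnecessary.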


[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2013-01-03 Thread Phabricator (JIRA)

Phabricator updated HIVE-3699:
--

Attachment: HIVE-3699.D7743.1.patch

navis requested code review of "HIVE-3699 [jira] Multiple insert overwrite into 
multiple tables query stores same results in all tables".
Reviewers: JIRA

  DPAL-1952 Multiple insert overwrite into multiple tables query stores same 
results in all tables

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D7743

AFFECTED FILES
  ql/src/java/org/apache/hadoop/hive/ql/ppd/OpProcFactory.java
  ql/src/test/queries/clientpositive/multi_insert_gby.q
  ql/src/test/results/clientpositive/multi_insert_gby.q.out

To: JIRA, navis


> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: Alexandre Fouché
> Attachments: HIVE-3699.D7743.1.patch

[jira] [Updated] (HIVE-3699) Multiple insert overwrite into multiple tables query stores same results in all tables

2012-11-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/HIVE-3699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Fouché updated HIVE-3699:
---

Description: 
(Note: This might be related to HIVE-2750)

I am running a query with multiple INSERT OVERWRITE clauses into multiple tables 
in order to scan the dataset only once, and I end up with all these tables 
having the same content! It seems the GROUP BY query that returns results is 
overwriting all the temp tables.

Weirdly enough, if I add further GROUP BY queries into additional temp tables, 
grouped by a different field, then all temp tables, even the ones that would 
otherwise have had the wrong content, are correctly populated.

This is the misbehaving query:

FROM nikon
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
;

It launches only one MR job, and here are the results. Why does table 'e1' 
contain results from table 'e2'?! Table 'e1' should have been empty (see the 
individual SELECTs further below).

hive> SELECT * from e1;
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.229 seconds

hive> SELECT * from e2;
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.11 seconds


Here are the results of the individual queries (only the second query returns 
a result set):

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM nikon
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
(...)
OK
  <- There are no results, this is normal
Time taken: 41.471 seconds

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
(...)
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 39.607 seconds


  was:
I am running a query with multiple INSERT OVERWRITE clauses into multiple tables 
in order to scan the dataset only once, and I end up with all these tables 
having the same content! It seems the GROUP BY query that returns results is 
overwriting all the temp tables.

Weirdly enough, if I add further GROUP BY queries into additional temp tables, 
grouped by a different field, then all temp tables, even the ones that would 
otherwise have had the wrong content, are correctly populated.

This is the misbehaving query:

FROM nikon
INSERT OVERWRITE TABLE e1
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid
INSERT OVERWRITE TABLE e2
SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid
;

It launches only one MR job, and here are the results. Why does table 'e1' 
contain results from table 'e2'?! Table 'e1' should have been empty (see the 
individual SELECTs further below).

hive> SELECT * from e1;
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.229 seconds

hive> SELECT * from e2;
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 0.11 seconds


Here are the results of the individual queries (only the second query returns 
a result set):

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Impressions FROM nikon
WHERE qs_cs_s_cat='PRINT' GROUP BY qs_cs_s_aid;
(...)
OK
  <- There are no results, this is normal
Time taken: 41.471 seconds

hive> SELECT qs_cs_s_aid AS Emplacements, COUNT(*) AS Vues FROM nikon
WHERE qs_cs_s_cat='VIEW' GROUP BY qs_cs_s_aid;
(...)
OK
NULL  2
1627575 25
1627576 70
1690950 22
1690952 42
1696705 199
1696706 66
1696730 229
1696759 85
1696893 218
Time taken: 39.607 seconds



> Multiple insert overwrite into multiple tables query stores same results in 
> all tables
> --
>
> Key: HIVE-3699
> URL: https://issues.apache.org/jira/browse/HIVE-3699
> Project: Hive
>  Issue Type: Bug
>  Components: Query Processor
>Affects Versions: 0.9.0
> Environment: Cloudera 4.1 on Amazon Linux (rebranded Centos 6): 
> hive-0.9.0+150-1.cdh4.1.1.p0.4.el6.noarch
>Reporter: A