For problems with INSERT INTO, there are HIVE-3465 and HIVE-3676.
2013/1/27 John Omernik <j...@omernik.com>:
> I am not a code expert, but this looks very much like the bug I posted.
> However, my bug does not use INSERT OVERWRITE (just INSERT INTO), and I am
> not doing any GROUP BY (probably not an issue).
>
> Just to be clear, this is probably the same issue as mine, but someone with
> more knowledge of the underlying structures might see a meaningful difference
> between OVERWRITE and INTO.
>
>
> On Sat, Jan 26, 2013 at 9:20 AM, Philip Tromans <philip.j.trom...@gmail.com>
> wrote:
>>
>> This is a known (recently fixed) bug:
>>
>> https://issues.apache.org/jira/browse/HIVE-3699
>>
>> Phil.
>>
>>
>> On 26 January 2013 15:17, John Omernik <j...@omernik.com> wrote:
>>>
>>> I ran into an interesting bug. Basically, if your FROM () source is a
>>> partitioned table and you use a where clause that prunes partitions, every
>>> INSERT ... SELECT * WHERE x=y ignores its own where clause. This does not
>>> occur if no source partition is specified, but if the source has
>>> where partition = 'x', the where clause on each individual insert is
>>> ignored...
>>>
>>> I've included some files here:
>>>
>>> testdata.tsv - Tab-delimited data to demonstrate the issue
>>> create_tables.hive - Creates a database and tables, and loads the data
>>> from the TSV
>>>
>>> Test cases:
>>> I created these test case files so that each case contains three types of
>>> insert: 1. Load all data from the initial statement, 2. Load partial data
>>> (using a limiting clause such as where day >= '2013-01-05'), and 3. Load
>>> NO data from the initial statement (where 1 = 0).
>>>
>>> These tests were all run on Hive 0.9.
>>>
>>> multi-flat-flat.hive - The source table and the dest tables are not
>>> partitioned. The where clauses work as expected:
>>>
>>> 19 Rows loaded to multi_bug_flat
>>> 0 Rows loaded to multi_bug_flat3
>>> 15 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-part.hive - The source table and the dest tables are
>>> partitioned. The where clauses are not honored:
>>>
>>> 9 Rows loaded to multi_bug_part3
>>> 9 Rows loaded to multi_bug_part2
>>> 9 Rows loaded to multi_bug_part
>>>
>>> multi-flat-part.hive - The source table is flat and the dest tables are
>>> partitioned. The where clauses work as expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>> multi-part-flat.hive - The source table is partitioned and the dest tables
>>> are flat. The where clauses are not honored:
>>>
>>> 9 Rows loaded to multi_bug_flat
>>> 9 Rows loaded to multi_bug_flat3
>>> 9 Rows loaded to multi_bug_flat2
>>>
>>> multi-part-specified.hive - The source and dest tables are partitioned,
>>> but there is no partition pruning statement in the FROM (); this works as
>>> expected:
>>>
>>> 0 Rows loaded to multi_bug_part3
>>> 15 Rows loaded to multi_bug_part2
>>> 19 Rows loaded to multi_bug_part
>>>
>>>
>>> Thoughts?
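For reference, the multi-insert shape described in the test cases looks roughly like the following HiveQL sketch of the multi-part-flat case. It is a reconstruction, not the contents of the attached .hive files: the source table name multi_bug_src_part and its partition column part are hypothetical, and the destination tables are assumed to have the same columns as the subquery output. The three inserts follow the three insert types listed in the test cases (all rows, day >= '2013-01-05', and 1 = 0), matched to the destination tables by the row counts reported for the flat-to-flat run.

```sql
-- Hypothetical reconstruction of the multi-part-flat case.
-- multi_bug_src_part and its partition column "part" are assumed names;
-- each destination is assumed to match the subquery's column list.
FROM (
  -- Partition pruning inside the FROM () subquery: the condition
  -- under which the per-insert where clauses were reported as ignored.
  SELECT * FROM multi_bug_src_part WHERE part = 'x'
) src
-- 1. Load all rows produced by the subquery.
INSERT INTO TABLE multi_bug_flat SELECT src.*
-- 2. Load partial data; this predicate should apply to this insert only.
INSERT INTO TABLE multi_bug_flat2 SELECT src.* WHERE src.day >= '2013-01-05'
-- 3. Load no data at all.
INSERT INTO TABLE multi_bug_flat3 SELECT src.* WHERE 1 = 0;
```

With the behavior reported on Hive 0.9, all three destination tables received the same 9 rows when the source was pruned in the FROM (). In the multi-part-specified case, which has no pruning predicate in the FROM (), the per-insert where clauses were honored.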