https://bugzilla.wikimedia.org/show_bug.cgi?id=65420
--- Comment #31 from Andrew Otto <[email protected]> ---

Not sure what you mean by old and new partitions. Do you mean the single table vs. the old 4 tables? There is a difference, yes, in that you query much more data by default with the webrequest table. For example, bits is very large. If you are pretty sure you don't want bits data, add a "where webrequest_source != 'bits'" clause to the query. That will cut the data size down a lot.

I'm googling for ways to make these large queries run and am learning things, but am not yet sure. I'm also looking for errors in the logs to find out why they died. See also:

http://mail-archives.apache.org/mod_mbox/hive-user/201212.mbox/%[email protected]%3E

Also, since we were talking about HADOOP_HEAPSIZE and the Hive CLI earlier, this is the documentation on HADOOP_HEAPSIZE for the Hive CLI:

# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).

So it seems the Hive CLI itself needs a larger heap size when running over larger datasets, as we were assuming. I'm still not sure why that would be. I suppose it looks at the data before submitting the job to Hadoop?

-- 
You are receiving this mail because:
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
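[Editor's note] The bits-exclusion filter described in the comment could look roughly like this sketch. The `webrequest` table and `webrequest_source` field come from the comment itself; the COUNT query around them is only an illustrative placeholder, and whether `webrequest_source` is actually a partition column (so Hive can prune the bits data rather than scan it) is an assumption here:

```sql
-- Hypothetical query: count requests while skipping the very large bits data.
-- If webrequest_source is a partition column, this predicate lets Hive prune
-- the bits partitions entirely instead of reading and filtering them.
SELECT COUNT(*)
FROM webrequest
WHERE webrequest_source != 'bits';
```

And per the quoted hive shell-script documentation, if the Hive CLI itself runs out of heap while planning over many files or partitions, the heap can be raised in its environment before launching, e.g. `HADOOP_HEAPSIZE=2048 hive -f query.hql` (2048 MB is an arbitrary example value, not a recommendation from the comment).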
