Can you please check your JobTracker logs? This is a generic error related to grabbing the Task Attempt Log URL; the real error is in the JT logs.
On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
> Hi Dean,
>
> I tried inserting into a bucketed Hive table from a non-bucketed table
> using an INSERT OVERWRITE ... SELECT FROM clause, but I get the following
> error:
>
> ----------------------------------------------------------------------------------
> Exception in thread "Thread-225" java.lang.NullPointerException
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>     at java.lang.Thread.run(Thread.java:662)
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> ----------------------------------------------------------------------------------
>
> Both tables have the same structure, except that one has a CLUSTERED BY
> clause and the other does not.
>
> Some columns are defined as arrays of structs. The INSERT statement works
> fine if I take out those complex columns. Are there any known issues with
> loading STRUCT or ARRAY OF STRUCT fields?
>
> Thanks for your time and help.
>
> Sadu
>
> On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>> The table can be external. You should be able to use this data with other
>> tools, because all bucketing does is ensure that all occurrences of
>> records with a given key are written into the same block. This is why
>> clustered/bucketed data can be joined on those keys using map-side joins;
>> Hive knows it can cache an individual block in memory and the block will
>> hold all records across the table for the keys in that block.
>>
>> So, Java MR apps and Pig can still read the records, but they won't
>> necessarily understand how the data is organized. I.e., it might appear
>> unsorted.
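As a concrete illustration of the map-side join point above, a bucketed map join is requested in HiveQL roughly like this. This is a sketch, not anyone's actual query: the table and column names are invented, and both tables are assumed to be clustered on the join key into compatible bucket counts, which is what Hive needs to use the optimization.

```sql
-- Hypothetical tables: facts and dims are both CLUSTERED BY (user_id),
-- with bucket counts that are multiples of each other (e.g. 32 and 32).
SET hive.optimize.bucketmapjoin = true;

-- Hint the smaller table into the map side; Hive loads only the matching
-- bucket of dims for each bucket of facts, instead of the whole table.
SELECT /*+ MAPJOIN(d) */ f.user_id, f.event_time, d.segment
FROM   facts f
JOIN   dims  d ON f.user_id = d.user_id;
```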
>> Perhaps HCatalog will allow other tools to exploit the structure, but I'm
>> not sure.
>>
>> dean
>>
>> On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>> Thanks, Dean.
>>>
>>> Does that mean this bucketing is exclusively a Hive feature and not
>>> available to others like Java, Pig, etc.?
>>>
>>> And also, my final tables have to be managed tables, not external
>>> tables, right?
>>>
>>> Thanks again for your time and help.
>>>
>>> Sadu
>>>
>>> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>>> I don't know of any way to avoid creating new tables and moving the
>>>> data. In fact, that's the official way to do it, from a temp table to
>>>> the final table, so Hive can ensure the bucketing is done correctly:
>>>>
>>>> https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
>>>>
>>>> In other words, you might have a big move now, but going forward,
>>>> you'll want to stage your data in a temp table, use this procedure to
>>>> put it in the final location, then delete the temp data.
>>>>
>>>> dean
>>>>
>>>> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> We run M/R jobs to parse and process large and highly complex XML
>>>>> files into Avro files. Then we build external Hive tables on top of
>>>>> the parsed Avro files. The Hive tables are partitioned by day, but
>>>>> they are still huge partitions, and joins do not perform that well.
>>>>> So I would like to try creating buckets on the join key. How do I
>>>>> create the buckets on the existing HDFS files? I would prefer to
>>>>> avoid creating another set of (bucketed) tables and loading data from
>>>>> the non-bucketed tables into bucketed tables if at all possible. Is
>>>>> it possible to do the bucketing in Java as part of the M/R jobs while
>>>>> creating the Avro files?
>>>>>
>>>>> Any help / insight would greatly be appreciated.
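The staged-load procedure Dean recommends looks roughly like the sketch below in HiveQL. The table names, column names, bucket count, and partition value are all invented for illustration; the important part is setting hive.enforce.bucketing before the INSERT, so Hive runs one reducer per bucket and writes the bucket files itself.

```sql
-- Hypothetical final table, clustered on the join key.
CREATE TABLE events_bucketed (
  id    BIGINT,
  attrs ARRAY<STRUCT<k:STRING, v:STRING>>
)
PARTITIONED BY (day STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS AVRO;  -- assumes a Hive version with native Avro support;
                 -- older versions need the Avro SerDe spelled out

-- Populate one partition from the non-bucketed staging table; with
-- enforcement on, Hive handles the bucketing during the insert.
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE events_bucketed PARTITION (day = '2013-03-29')
SELECT id, attrs
FROM   events_staging
WHERE  day = '2013-03-29';
```

Once loaded this way, the staging data for that partition can be deleted, and future days follow the same stage-then-insert pattern.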
>>>>>
>>>>> Thank you very much for your time and help.
>>>>>
>>>>> Sadu
>>>>
>>>> --
>>>> *Dean Wampler, Ph.D.*
>>>> thinkbiganalytics.com
>>>> +1-312-339-1330
>>
>> --
>> *Dean Wampler, Ph.D.*
>> thinkbiganalytics.com
>> +1-312-339-1330