Can you please check your JobTracker logs? This is a generic error related to grabbing the Task Attempt Log URL; the real error is in the JT logs.
On Wed, Apr 3, 2013 at 7:17 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
> Hi Dean,
>
> I tried inserting into a bucketed Hive table from a non-bucketed table
> using an INSERT OVERWRITE ... SELECT FROM clause, but I get the following
> error:
>
> ----------------------------------------------------------------------------------
> Exception in thread "Thread-225" java.lang.NullPointerException
>     at org.apache.hadoop.hive.shims.Hadoop23Shims.getTaskAttemptLogUrl(Hadoop23Shims.java:44)
>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.getTaskInfos(JobDebugger.java:186)
>     at org.apache.hadoop.hive.ql.exec.JobDebugger$TaskInfoGrabber.run(JobDebugger.java:142)
>     at java.lang.Thread.run(Thread.java:662)
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> ----------------------------------------------------------------------------------
>
> Both tables have the same structure, except that one has a CLUSTERED BY
> clause and the other does not.
>
> Some columns are defined as arrays of structs. The INSERT statement works
> fine if I take out those complex columns. Are there any known issues with
> loading STRUCT or ARRAY OF STRUCT fields?
>
> Thanks for your time and help.
>
> Sadu
>
> On Sat, Mar 30, 2013 at 7:00 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>> The table can be external. You should be able to use this data with other
>> tools, because all bucketing does is ensure that all occurrences of
>> records with a given key are written into the same block. This is why
>> clustered/bucketed data can be joined on those keys using map-side joins;
>> Hive knows it can cache an individual block in memory and the block will
>> hold all records across the table for the keys in that block.
>>
>> So, Java MR apps and Pig can still read the records, but they won't
>> necessarily understand how the data is organized. I.e., it might appear
>> unsorted.
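As a concrete illustration of the map-side join point above, a bucketed map join is requested in HiveQL roughly like this. This is a sketch, not anyone's actual query: the table and column names are invented, and both tables are assumed to be clustered on the join key into compatible bucket counts, which is what Hive needs to use the optimization.

```sql
-- Hypothetical tables: facts and dims are both CLUSTERED BY (user_id),
-- with bucket counts that are multiples of each other (e.g. 32 and 32).
SET hive.optimize.bucketmapjoin = true;

-- Hint the smaller table into the map side; Hive loads only the matching
-- bucket of dims for each bucket of facts, instead of the whole table.
SELECT /*+ MAPJOIN(d) */ f.user_id, f.event_time, d.segment
FROM   facts f
JOIN   dims  d ON f.user_id = d.user_id;
```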
>> Perhaps HCatalog will allow other tools to exploit the structure, but I'm
>> not sure.
>>
>> dean
>>
>> On Sat, Mar 30, 2013 at 5:44 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>> Thanks, Dean.
>>>
>>> Does that mean this bucketing is exclusively a Hive feature and not
>>> available to others like Java, Pig, etc.?
>>>
>>> And also, my final tables have to be managed tables, not external
>>> tables, right?
>>>
>>> Thanks again for your time and help.
>>>
>>> Sadu
>>>
>>> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:
>>>> I don't know of any way to avoid creating new tables and moving the
>>>> data. In fact, that's the official way to do it, from a temp table to
>>>> the final table, so Hive can ensure the bucketing is done correctly:
>>>>
>>>> https://cwiki.apache.org/Hive/languagemanual-ddl-bucketedtables.html
>>>>
>>>> In other words, you might have a big move now, but going forward,
>>>> you'll want to stage your data in a temp table, use this procedure to
>>>> put it in the final location, then delete the temp data.
>>>>
>>>> dean
>>>>
>>>> On Fri, Mar 29, 2013 at 4:58 PM, Sadananda Hegde <saduhe...@gmail.com> wrote:
>>>>> Hello,
>>>>>
>>>>> We run M/R jobs to parse and process large and highly complex XML
>>>>> files into Avro files. Then we build external Hive tables on top of
>>>>> the parsed Avro files. The Hive tables are partitioned by day, but
>>>>> they are still huge partitions, and joins do not perform that well.
>>>>> So I would like to try creating buckets on the join key. How do I
>>>>> create the buckets on the existing HDFS files? I would prefer to
>>>>> avoid creating another set of (bucketed) tables and loading data from
>>>>> the non-bucketed tables into bucketed tables if at all possible. Is
>>>>> it possible to do the bucketing in Java as part of the M/R jobs while
>>>>> creating the Avro files?
>>>>>
>>>>> Any help / insight would greatly be appreciated.
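The staged-load procedure Dean recommends looks roughly like the sketch below in HiveQL. The table names, column names, bucket count, and partition value are all invented for illustration; the important part is setting hive.enforce.bucketing before the INSERT, so Hive runs one reducer per bucket and writes the bucket files itself.

```sql
-- Hypothetical final table, clustered on the join key.
CREATE TABLE events_bucketed (
  id    BIGINT,
  attrs ARRAY<STRUCT<k:STRING, v:STRING>>
)
PARTITIONED BY (day STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS AVRO;  -- assumes a Hive version with native Avro support;
                 -- older versions need the Avro SerDe spelled out

-- Populate one partition from the non-bucketed staging table; with
-- enforcement on, Hive handles the bucketing during the insert.
SET hive.enforce.bucketing = true;
INSERT OVERWRITE TABLE events_bucketed PARTITION (day = '2013-03-29')
SELECT id, attrs
FROM   events_staging
WHERE  day = '2013-03-29';
```

Once loaded this way, the staging data for that partition can be deleted, and future days follow the same stage-then-insert pattern.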
>>>>>
>>>>> Thank you very much for your time and help.
>>>>>
>>>>> Sadu
>>>>
>>>> --
>>>> *Dean Wampler, Ph.D.*
>>>> thinkbiganalytics.com
>>>> +1-312-339-1330
>>
>> --
>> *Dean Wampler, Ph.D.*
>> thinkbiganalytics.com
>> +1-312-339-1330