Pick the one column that is best suited to splitting the data. The split-by
column need not be unique, but its values should be more or less uniformly
distributed across the table so that each mapper processes a roughly equal
amount of data.
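
As a rough sketch of what that could look like, assuming col1 is the most
evenly distributed column of your composite key: the JDBC URL, username and
table name below are hypothetical placeholders, and the function only prints
the command so you can review it before running anything. --split-by is given
a single column here; the other key columns are simply not used for splitting.

```shell
#!/bin/sh
# Sketch only: connection string, user and table name are made-up placeholders.
# The function prints the sqoop command rather than executing it.
build_import_cmd() {
    split_col="$1"   # one column of the composite key, e.g. col1
    mappers="$2"     # number of parallel map tasks

    echo sqoop import \
        --connect "jdbc:oracle:thin:@//dbhost:1521/ORCL" \
        --username "scott" -P \
        --table "MYSCHEMA.BIG_TABLE" \
        --split-by "$split_col" \
        --num-mappers "$mappers" \
        --hive-import
}

# Example: split on col1 across 8 mappers.
build_import_cmd col1 8
```

If col1 turns out to be skewed, trying one of the other key columns in its
place is the only change needed.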

Venkat


On Thu, Feb 13, 2014 at 10:41 AM, Martin, Nick <[email protected]> wrote:

>  Forgot about the HCat integration, I'll play with that.
>
>
>
> So if my table has a multi-column pk/unique identifier
> (say...col1,col2,col3,col4 = unique identifier for a row) what would the
> syntax for that be using --split-by? I can't pass multiple columns in a
> single --split-by option as far as I know...
>
>
>
> *From:* Venkat Ranganathan [mailto:[email protected]]
> *Sent:* Thursday, February 13, 2014 1:20 PM
> *To:* [email protected]
> *Subject:* Re: Multi-Column PK in 1.4.4
>
>
>
> You can use an explicit split-by column and use multiple mappers. You
> also have the option to use hcatalog support to directly move data into
> target hive format (if you are using RCFile or ORCFile or some other format
> for the hive table).
>
>
> Venkat
>
>
>
> On Thu, Feb 13, 2014 at 8:35 AM, Martin, Nick <[email protected]> wrote:
>
> Just wanted to confirm and make sure I'm not missing anything...
>
>
>
> I'm running 1.4.4 and need to import a large-ish table (400m rows) from
> Oracle w/ a multi-column pk into Hive. That's not doable with multiple
> mappers currently, right (I'd have to go -m 1)? My only option would be
>  HBase for a multi-column key?
>
>
>
> Thanks!
> Nick
>
>
>
>
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified that
> any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender immediately
> and delete it from your system. Thank You.
>

