RE: Multi-Column PK in 1.4.4

Martin, Nick Thu, 13 Feb 2014 10:43:00 -0800

Forgot about the HCat integration, I'll play with that.

So if my table has a multi-column PK/unique identifier 
(say col1, col2, col3, col4 together uniquely identify a row), what would the 
syntax be using --split-by? As far as I know, I can't pass multiple columns in 
a single --split-by option...
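For context: `--split-by` takes a single column, but that column does not have to be unique. According to the Sqoop User Guide, Sqoop retrieves the low and high values of the split column and gives each map task an evenly sized piece of that range. One column of the composite key (say col1) can therefore serve as the split column with `-m N`. All rows that share a col1 value land in the same split, and the mappers are only balanced if col1's values are spread fairly evenly. The Python sketch below models that range splitting under simplifying assumptions (an integer column, equal-width ranges). It is not Sqoop's actual splitter code, and `split_ranges`, `in_range`, and `where_clause` are hypothetical helpers.

```python
from typing import List, Tuple

# (start, end, end_inclusive): one mapper's slice of the split column's values.
Range = Tuple[int, int, bool]


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Range]:
    """Cut [lo, hi] (the MIN and MAX of the split column) into equal-width
    ranges, one per mapper. Simplified model, not Sqoop's own splitter."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if lo == hi:
        # Every row has the same value, so a single split covers them all.
        return [(lo, hi, True)]
    span = hi - lo
    # Integer boundaries lo = b0 < b1 < ... < bn = hi; the set drops duplicates
    # when there are more mappers than distinct values in the range.
    bounds = sorted({lo + (span * i) // num_mappers for i in range(num_mappers)} | {hi})
    ranges = []
    for i in range(len(bounds) - 1):
        # Upper bound is exclusive except on the last range, so MAX is not lost.
        last = i == len(bounds) - 2
        ranges.append((bounds[i], bounds[i + 1], last))
    return ranges


def in_range(value: int, rng: Range) -> bool:
    """True if a row with this split-column value belongs to the range."""
    start, end, inclusive = rng
    return start <= value <= end if inclusive else start <= value < end


def where_clause(column: str, rng: Range) -> str:
    """Render a range as the kind of WHERE condition a mapper would run."""
    start, end, inclusive = rng
    op = "<=" if inclusive else "<"
    return f"{column} >= {start} AND {column} {op} {end}"


if __name__ == "__main__":
    # COL1 is one column of a composite key, so its values repeat.
    col1_values = [1, 1, 1, 2, 2, 5, 7, 7, 9, 10]
    for rng in split_ranges(min(col1_values), max(col1_values), 4):
        rows = sum(in_range(v, rng) for v in col1_values)
        print(where_clause("COL1", rng), "->", rows, "rows")
```

With this sample data every row falls into exactly one range, but the row counts per range are uneven. That is the skew to watch for when picking which key column to split on.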

From: Venkat Ranganathan
Sent: Thursday, February 13, 2014 1:20 PM
Subject: Re: Multi-Column PK in 1.4.4

You can use an explicit --split-by column and multiple mappers. You also 
have the option of using the HCatalog support to move data directly into the 
target Hive table's format (if you are using RCFile, ORCFile, or some other 
format for the Hive table).

Venkat
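Venkat's two suggestions map onto Sqoop command-line options. The Python sketch below only assembles the argument list and does not run Sqoop. The connection string, user, table, and database names are hypothetical placeholders, and `sqoop_import_args` is an illustrative helper, not part of Sqoop. `--split-by` together with `--num-mappers` gives the parallel import. `--hive-import`/`--hive-table` loads the result into Hive. The `--hcatalog-database`/`--hcatalog-table` pair writes through HCatalog instead, and that path is not combined with `--hive-import`.

```python
from typing import List, Optional


def sqoop_import_args(jdbc_url: str, username: str, table: str,
                      split_by: str, num_mappers: int,
                      hive_table: Optional[str] = None,
                      hcat_db: Optional[str] = None,
                      hcat_table: Optional[str] = None) -> List[str]:
    """Build a `sqoop import` argument list (hypothetical helper)."""
    if hive_table is not None and hcat_table is not None:
        raise ValueError("choose either a Hive import or an HCatalog import")
    args = ["sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "-P",                            # prompt for the password on the console
            "--table", table,
            "--split-by", split_by,          # one column; it need not be unique
            "--num-mappers", str(num_mappers)]
    if hcat_table is not None:
        # HCatalog writes in the target table's storage format (e.g. RCFile/ORC).
        if hcat_db is not None:
            args += ["--hcatalog-database", hcat_db]
        args += ["--hcatalog-table", hcat_table]
    elif hive_table is not None:
        args += ["--hive-import", "--hive-table", hive_table]
    return args


if __name__ == "__main__":
    cmd = sqoop_import_args(
        "jdbc:oracle:thin:@//dbhost:1521/ORCL",   # hypothetical Oracle service
        "scott", "BIG_TABLE", "COL1", 8,
        hcat_db="default", hcat_table="big_table")
    print(" ".join(cmd))
    # On a node with Sqoop installed, subprocess.run(cmd, check=True) would launch it.
```

Building the list rather than a single string keeps each option and value as a separate argument, which is what `subprocess.run` expects when no shell is involved.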

On Thu, Feb 13, 2014 at 8:35 AM, Martin, Nick wrote:
Just wanted to confirm and make sure I'm not missing anything...

I'm running 1.4.4 and need to import a large-ish table (400M rows) with a 
multi-column PK from Oracle into Hive. That's not doable with multiple mappers 
currently, right (I'd have to use -m 1)? Would HBase be my only option for a 
multi-column key?

Thanks!
Nick

