Re: loading datafiles in s3

2011-06-30 Thread Kennon Lee
location of the directory you wish to add, your data will be added correctly. pat From: Kennon Lee [mailto:ken...@brooklynpacket.com] Sent: Wednesday, June 29, 2011 1:27 PM To: user@hive.apache.org Subject: Re: loading

RE: loading datafiles in s3

2011-06-30 Thread Christopher, Pat
tion load. tl;dr? If you specify the full location of the directory you wish to add, your data will be added correctly. pat From: Kennon Lee [mailto:ken...@brooklynpacket.com] Sent: Wednesday, June 29, 2011 1:27 PM To: user@hive.apache.org Subject: Re: loading datafiles in s3 Thanks for

Re: loading datafiles in s3

2011-06-29 Thread Kennon Lee
Thanks for the responses. Regarding the first question, I wasn't sure what you meant by using ALTER TABLE statements to allow for non-prefixed directory names. Don't you still have to name the directories with the 'blah=' part? For instance, if we do: ALTER TABLE foo ADD PARTITION (dt='2011-06-29')
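
A minimal sketch, in Python, of the approach discussed in this thread: generating ALTER TABLE ... ADD PARTITION statements that carry an explicit LOCATION, so the S3 directory itself does not need a 'dt=' prefix in its name. The table name foo and the partition value 2011-06-29 come from the message above; the helper name add_partition_stmt, the bucket s3://example-bucket, and the directory layout are hypothetical and only there for illustration.

```python
def add_partition_stmt(table, key, value, location):
    """Build one Hive ALTER TABLE ... ADD PARTITION statement.

    The LOCATION clause points the partition at an explicit directory,
    so the directory name does not have to contain 'key=value'.
    """
    return (
        f"ALTER TABLE {table} ADD PARTITION ({key}='{value}') "
        f"LOCATION '{location}';"
    )


# Hypothetical bucket and directory layout, for illustration only.
BASE = "s3://example-bucket/logs"

# One statement per day; a generator script could emit a whole stack of these.
dates = ["2011-06-28", "2011-06-29"]
statements = [add_partition_stmt("foo", "dt", d, f"{BASE}/{d}/") for d in dates]

for stmt in statements:
    print(stmt)
```

The statements are plain strings, so they can be written to a script file and run by whatever submits the Hive job.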

Re: loading datafiles in s3

2011-06-28 Thread Igor Tatarinov
I think the answer to 1 is no, but you can confirm on the AWS EMR forum. The problem I've been having is that if you have x=foo in the prefix of your S3 path, EMR will try to use it as part of your partitioning key even if you don't want it. Say, x=foo/y=bar/data and you want to partition on y only
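
A small Python illustration of why a prefix such as x=foo is easy to mistake for a partition key: anything that scans a path for key=value segments will find both x and y in the example path above. This is only a sketch of the general idea, not EMR's or Hive's actual code, and the function name partition_segments is made up for this example.

```python
def partition_segments(path):
    """Return (key, value) pairs for every 'key=value' segment in a path."""
    pairs = []
    for segment in path.strip("/").split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            pairs.append((key, value))
    return pairs


# The example path from the message above.
found = partition_segments("x=foo/y=bar/data")
print(found)
```

Both x and y show up in the result, even though only y is wanted as a partition key.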

RE: loading datafiles in s3

2011-06-28 Thread Christopher, Pat
allo, 1. dunno. I generate my EMR scripts in a separate script, so generating a stack of 'alter table...' queries is easy for me. 2. event_b will have a null value in column 4. 2b. (you didn't ask) what happens with this row: event_c user_id france 500 afifthcolumn afifthcolumn will be truncate
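
The row handling described in this reply can be mimicked in a few lines of Python: a row with too few fields gets a null in the missing column, and a field beyond the last column is dropped. This is a sketch under the assumption of a four-column table (implied by "column 4" and "afifthcolumn"), not Hive's implementation. The event_c row is the one quoted above; the field values for event_b are made up, since that row does not appear in this snippet.

```python
def fit_row(fields, num_columns):
    """Pad a short row with None (NULL) and drop fields beyond the schema."""
    padded = list(fields) + [None] * (num_columns - len(fields))
    return padded[:num_columns]


NUM_COLUMNS = 4  # assumed table width

# Hypothetical three-field row: column 4 comes back as None (NULL).
event_b = fit_row(["event_b", "user_id", "france"], NUM_COLUMNS)

# Five-field row from the message: the fifth field is dropped.
event_c = fit_row(["event_c", "user_id", "france", "500", "afifthcolumn"], NUM_COLUMNS)

print(event_b)
print(event_c)
```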