I think the right URI scheme is s3n://abc/def. We use that with the EMR version of Hive in production.
create table test (schema string) location 's3n://abc/def'; should work.

On Tue, May 29, 2012 at 2:35 PM, Balaji Rao <[email protected]> wrote:

> To partition on s3, one would create folders like:
> s3://mybucket/path/dt=2012-05-20
>                    dt=2012-05-21
>                    dt=2012-05-22
>
> You can then use:
> create external table from_to(from_address string, to_address string)
> partitioned by (dt string) row format delimited fields terminated by
> '\t' stored as textfile location 's3://mybucket/path';
>
> Then issue the command:
> alter table from_to recover partitions;
>
> You will be able to then use the partitions:
> select from_address, to_address, dt from from_to where dt >='2012-05-21'
>
> On Tue, May 29, 2012 at 5:19 PM, Russell Jurney
> <[email protected]> wrote:
> > I get an error when I create an external table. btw - I can partition on dt
> > or from/to address. I'm just not clear on how to partition - my efforts
> > fail.
> >
> > hive> create external table from_to(from_address string, to_address string,
> > dt string)
> >     > row format delimited fields terminated by '\t' stored as textfile
> > location 's3n://rjurney_public_web/from_to_date';
> > FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid
> > hostname in URI s3n://rjurney_public_web/from_to_date
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.DDLTask
> >
> >
> > However, I just upgraded to HIVE 0.9, and it works :) No reason to use the
> > old stuff when I can scp the new one up.
> >
> > Thanks!
> >
> > On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[email protected]> wrote:
> >>
> >> If you are using hive on EMR, you can create a table directly from the
> >> data on S3:
> >>
> >> From hive, you can create tables that use S3 data like this:
> >>
> >> create external table from_to(from_address string, to_address string,
> >> dt string) row format delimited fields terminated by '\t' stored as
> >> textfile location 's3://rjurney_public_web/from_to_date';
> >>
> >> You could then:
> >> select <*> from from_to
> >>
> >> Balaji
> >>
> >> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney
> >> <[email protected]> wrote:
> >> > How do I load data from S3 into Hive using Amazon EMR? I've booted a
> >> > small cluster, and I want to load a 3-column TSV file from Pig into a
> >> > table like this:
> >> >
> >> > create table from_to (from_address string, to_address string, dt
> >> > string);
> >> >
> >> >
> >> > When I run something like this:
> >> >
> >> > load data inpath 's3n://rjurney_public_web/from_to_date' into table
> >> > from_to;
> >> >
> >> >
> >> > I get errors:
> >> >
> >> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
> >> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
> >> > systems accepted. s3n file system is not supported.
> >> >
> >> >
> >> > There is no distcp on the master node of my EMR cluster, so I can't
> >> > copy it over. I've read the documentation... and so far after a day of
> >> > trying, I can't load data into HIVE via EMR.
> >> >
> >> > What am I missing? Thanks!
> >> > --
> >> > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
> >
> >
> > --
> > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>

--
"...:::Aniket:::... Quetzalco@tl"
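
For anyone following along, here is a rough sketch of the partitioned setup discussed in this thread, reusing the from_to table and the placeholder s3://mybucket/path location from Balaji's mail. The bucket, path, and dates are placeholders, not real data. As far as I know, "recover partitions" is specific to the EMR build of Hive; on stock Apache Hive the usual counterpart is "msck repair table", and partitions can also be registered one at a time. I have not checked how either behaves on every Hive version, so treat this as a starting point only.

```sql
-- External table over the dt=YYYY-MM-DD folders under a placeholder bucket path.
-- Note that dt appears only in "partitioned by", not in the column list.
create external table from_to (from_address string, to_address string)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://mybucket/path';

-- Option 1 (EMR Hive, as in the thread): pick up every dt=... folder.
alter table from_to recover partitions;

-- Option 2 (stock Apache Hive): the usual counterpart to option 1.
msck repair table from_to;

-- Option 3: register a single partition explicitly.
alter table from_to add if not exists partition (dt='2012-05-20')
location 's3://mybucket/path/dt=2012-05-20';

-- Check what Hive now knows about, then filter on the partition column.
show partitions from_to;
select from_address, to_address, dt from from_to where dt >= '2012-05-21';
```

Only one of the three options is needed. Filtering on dt in the where clause lets Hive read just the matching folders instead of scanning the whole location.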
