The location should use the 's3://' scheme, not 's3n://'.
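
A minimal sketch of what that looks like, reusing the table and bucket from the thread below. The first statement is the DDL from earlier in the thread with the s3:// scheme. The second part is one possible way to partition on dt. The table name from_to_by_dt, the dt=... directory layout, and the date value are assumptions for illustration only and have not been checked against the actual bucket.

```sql
-- External table over the existing S3 data, using the s3:// scheme.
create external table from_to (
  from_address string,
  to_address   string,
  dt           string
)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://rjurney_public_web/from_to_date';

-- Hypothetical partitioned variant: dt moves out of the column list
-- and into the partition spec. Assumes each date's files live in their
-- own subdirectory and hold only the two address columns.
create external table from_to_by_dt (
  from_address string,
  to_address   string
)
partitioned by (dt string)
row format delimited fields terminated by '\t'
stored as textfile
location 's3://rjurney_public_web/from_to_date';

-- Register one partition explicitly (path and date are made up).
alter table from_to_by_dt add partition (dt = '2012-05-29')
location 's3://rjurney_public_web/from_to_date/dt=2012-05-29/';

-- Queries can then filter on the partition column.
select * from from_to_by_dt where dt = '2012-05-29' limit 10;
```

With an external table, Hive only records where the data lives, so nothing is copied out of S3 and dropping the table leaves the files in place.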

On Tue, May 29, 2012 at 5:19 PM, Russell Jurney <[email protected]> wrote:
> Ok, I spoke too soon. Same error. Crapola. Still working on it.
>
> On Tue, May 29, 2012 at 2:19 PM, Russell Jurney <[email protected]> wrote:
>>
>> I get an error when I create an external table. btw - I can partition on
>> dt or from/to address. I'm just not clear on how to partition - my efforts
>> fail.
>>
>> hive> create external table from_to(from_address string, to_address string, dt string)
>>     > row format delimited fields terminated by '\t' stored as textfile
>>     > location 's3n://rjurney_public_web/from_to_date';
>> FAILED: Error in metadata: java.lang.IllegalArgumentException: Invalid hostname in URI s3n://rjurney_public_web/from_to_date
>> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>>
>> However, I just upgraded to HIVE 0.9, and it works :) No reason to use
>> the old stuff when I can scp the new one up.
>>
>> Thanks!
>>
>> On Tue, May 29, 2012 at 1:34 PM, Balaji Rao <[email protected]> wrote:
>>>
>>> If you are using hive on EMR, you can create a table directly from the
>>> data on S3:
>>>
>>> From hive, you can create tables that use S3 data like this:
>>>
>>> create external table from_to(from_address string, to_address string,
>>> dt string) row format delimited fields terminated by '\t' stored as
>>> textfile location 's3://rjurney_public_web/from_to_date';
>>>
>>> You could then:
>>> select <*> from from_to
>>>
>>> Balaji
>>>
>>> On Tue, May 29, 2012 at 4:20 PM, Russell Jurney <[email protected]> wrote:
>>> > How do I load data from S3 into Hive using Amazon EMR? I've booted a
>>> > small cluster, and I want to load a 3-column TSV file from Pig into a
>>> > table like this:
>>> >
>>> > create table from_to (from_address string, to_address string, dt string);
>>> >
>>> > When I run something like this:
>>> >
>>> > load data inpath 's3n://rjurney_public_web/from_to_date' into table from_to;
>>> >
>>> > I get errors:
>>> >
>>> > FAILED: Error in semantic analysis: Line 1:17 Invalid path
>>> > 's3n://rjurney_public_web/from_to_date': only "file" or "hdfs" file
>>> > systems accepted. s3n file system is not supported.
>>> >
>>> > There is no distcp on the master node of my EMR cluster, so I can't
>>> > copy it over. I've read the documentation... and so far after a day of
>>> > trying, I can't load data into HIVE via EMR.
>>> >
>>> > What am I missing? Thanks!
>>> > --
>>> > Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>>
>> --
>> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
>
> --
> Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com
