Hi Yang,

We do this today, Camus to Hive (without Avro), with plain old tab-separated log lines.
We use the hive -f command to add dynamic partitions to the Hive table. A
bash shell script adds the time buckets into the Hive table before the
Camus job runs:

    # TABLE_NAME is exported in the environment; Hive substitutes ${env:TABLE_NAME}.
    for partition in "${@//\//,}"; do
      echo "ALTER TABLE \${env:TABLE_NAME} ADD IF NOT EXISTS PARTITION ($partition);"
    done | hive -f /dev/stdin

For example, a file produced by the Camus job:

    /user/[hive.user]/output/partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09/

The above adds the Hive partitions before the Camus job runs. It works, and
you can have any schema:

    CREATE EXTERNAL TABLE IF NOT EXISTS ${env:TABLE_NAME} (
      -- some table fields...
    )
    PARTITIONED BY (
      partition_month_utc STRING,
      partition_day_utc STRING,
      partition_minute_bucket STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS SEQUENCEFILE
    LOCATION '${env:TABLE_LOCATION_CAMUS_OUTPUT}';

I hope this helps! You will have to construct the Hive query according to
the partition definition.

Thanks,
Bhavesh

On Wed, Mar 11, 2015 at 7:24 AM, Andrew Otto <ao...@wikimedia.org> wrote:

> > Hive provides the ability to provide custom patterns for partitions. You
> > can use this in combination with MSCK REPAIR TABLE to automatically
> > detect and load the partitions into the metastore.
>
> I tried this yesterday, and as far as I can tell it doesn’t work with a
> custom partition layout. At least not with external tables. MSCK REPAIR
> TABLE reports that there are directories in the table’s location that are
> not partitions of the table, but it wouldn’t actually add the partitions
> unless the directory layout matched Hive’s default
> (key1=value1/key2=value2, etc.).
>
> > On Mar 9, 2015, at 17:16, Pradeep Gollakota <pradeep...@gmail.com> wrote:
> >
> > If I understood your question correctly, you want to be able to read the
> > output of Camus in Hive and be able to know partition values. If my
> > understanding is right, you can do so by using the following.
> >
> > Hive provides the ability to provide custom patterns for partitions.
> > You can use this in combination with MSCK REPAIR TABLE to automatically
> > detect and load the partitions into the metastore.
> >
> > Take a look at this SO answer:
> > http://stackoverflow.com/questions/24289571/hive-0-13-external-table-dynamic-partitioning-custom-pattern
> >
> > Does that help?
> >
> > On Mon, Mar 9, 2015 at 1:42 PM, Yang <teddyyyy...@gmail.com> wrote:
> >
> >> I believe many users like us would export the output from Camus as a
> >> Hive external table, but the dir structure of Camus is like
> >> /YYYY/MM/DD/xxxxxx
> >>
> >> while Hive generally expects /year=YYYY/month=MM/day=DD/xxxxxx if you
> >> define that table to be partitioned by (year, month, day). Otherwise
> >> you’d have to add those partitions created by Camus through a separate
> >> command. But in the latter case, would a Camus job create more than one
> >> partition? And how would we find out the YYYY/MM/DD values from outside?
> >> Well, you could always do something with hadoop dfs -ls and then grep
> >> the output, but it’s kind of not clean...
> >>
> >> Thanks,
> >> Yang
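Since MSCK REPAIR TABLE only recognizes the default key=value directory layout (as Andrew found), the usual workaround for Camus's /YYYY/MM/DD output is ALTER TABLE ... ADD PARTITION with an explicit LOCATION per directory. A minimal sketch of generating those statements in bash; the table name, Camus root path, and the emit_add_partition helper are hypothetical, and the directory list would really come from listing HDFS:

```shell
#!/usr/bin/env bash
# Sketch: map Camus's /YYYY/MM/DD directories onto Hive partitions with
# explicit LOCATIONs, sidestepping the key=value layout that MSCK requires.
# Pipe the output into `hive -f /dev/stdin`, or save it and run `hive -f`.

CAMUS_ROOT="/user/camus/output"   # hypothetical Camus destination dir
TABLE="camus_logs"                # hypothetical external table name

# Turn one YYYY/MM/DD directory into an ADD PARTITION statement.
emit_add_partition() {
  local dir="$1"                                 # e.g. 2015/03/11
  local y="${dir:0:4}" m="${dir:5:2}" d="${dir:8:2}"
  echo "ALTER TABLE ${TABLE} ADD IF NOT EXISTS PARTITION (year='${y}', month='${m}', day='${d}') LOCATION '${CAMUS_ROOT}/${dir}';"
}

# In practice, list the dirs with `hadoop fs -ls`; hard-coded here.
for dir in 2015/03/09 2015/03/10 2015/03/11; do
  emit_add_partition "$dir"
done
```

Because each partition carries its own LOCATION, the external table can be PARTITIONED BY (year STRING, month STRING, day STRING) while the files stay exactly where Camus wrote them.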