Hi Yang,
We do this today, Camus to Hive (without Avro), with just plain old
tab-separated log lines.
We use the hive -f command to add the partitions to the Hive table:
A Bash shell script adds the time buckets to the Hive table before the Camus job runs:
# Each argument is a Camus partition path like key1=val1/key2=val2/...;
# ${@//\//,} rewrites the slashes to commas to form a PARTITION spec.
# \${env:TABLE_NAME} is escaped so Hive (not Bash) substitutes it.
for partition in "${@//\//,}"; do
  echo "ALTER TABLE \${env:TABLE_NAME} ADD IF NOT EXISTS PARTITION ($partition);"
done | hive -f /dev/stdin
e.g. a file produced by the Camus job: /user/[hive.user]/output/
partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09
The above adds the Hive partitions before the Camus job runs. It works, and
you can have any schema:
CREATE EXTERNAL TABLE IF NOT EXISTS ${env:TABLE_NAME} (
  -- some table fields ...
)
PARTITIONED BY (
partition_month_utc STRING,
partition_day_utc STRING,
partition_minute_bucket STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS SEQUENCEFILE
LOCATION '${env:TABLE_LOCATION_CAMUS_OUTPUT}'
;
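As a concrete illustration (a sketch only; the table name my_camus_table is
an assumed example), here is how one Camus output directory name expands into
the ALTER TABLE statement, including the single quotes Hive wants around
STRING partition values:

```shell
#!/usr/bin/env bash
# Sketch: turn one Camus partition directory path into a Hive
# ALTER TABLE statement. "my_camus_table" is an assumed table name.
TABLE_NAME="my_camus_table"

path="partition_month_utc=2015-03/partition_day_utc=2015-03-11/partition_minute_bucket=2015-03-11-02-09"

# Replace '/' with ',' (as the loop above does), then wrap each
# value in single quotes, since the partition columns are STRINGs.
spec=$(echo "${path//\//,}" | sed "s/=\([^,]*\)/='\1'/g")

echo "ALTER TABLE ${TABLE_NAME} ADD IF NOT EXISTS PARTITION (${spec});"
```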
I hope this helps! You will have to construct the Hive query according
to the partitions defined.
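For example (a sketch; the table name and the choice of bucket are assumed,
and the query is only printed here rather than run), a query that prunes on
one minute bucket could be built like this:

```shell
#!/usr/bin/env bash
# Sketch: build a Hive query that prunes on one of the partition
# columns defined above. Table name and bucket value are examples.
TABLE_NAME="my_camus_table"
BUCKET="2015-03-11-02-09"

QUERY="SELECT * FROM ${TABLE_NAME} WHERE partition_minute_bucket = '${BUCKET}';"

# You would normally run it with: hive -e "$QUERY"
echo "$QUERY"
```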
Thanks,
Bhavesh
On Wed, Mar 11, 2015 at 7:24 AM, Andrew Otto <[email protected]> wrote:
> > Hive provides the ability to provide custom patterns for partitions. You
> > can use this in combination with MSCK REPAIR TABLE to automatically
> detect
> > and load the partitions into the metastore.
>
> I tried this yesterday, and as far as I can tell it doesn’t work with a
> custom partition layout. At least not with external tables. MSCK REPAIR
> TABLE reports that there are directories in the table’s location that are
> not partitions of the table, but it wouldn’t actually add the partition
> unless the directory layout matched Hive’s default
> (key1=value1/key2=value2, etc.)
>
>
>
> > On Mar 9, 2015, at 17:16, Pradeep Gollakota <[email protected]>
> wrote:
> >
> > If I understood your question correctly, you want to be able to read the
> > output of Camus in Hive and be able to know partition values. If my
> > understanding is right, you can do so by using the following.
> >
> > Hive provides the ability to provide custom patterns for partitions. You
> > can use this in combination with MSCK REPAIR TABLE to automatically
> detect
> > and load the partitions into the metastore.
> >
> > Take a look at this SO
> >
> http://stackoverflow.com/questions/24289571/hive-0-13-external-table-dynamic-partitioning-custom-pattern
> >
> > Does that help?
> >
> >
> > On Mon, Mar 9, 2015 at 1:42 PM, Yang <[email protected]> wrote:
> >
> >> I believe many users like us would export the output from camus as a
> hive
> >> external table. but the dir structure of camus is like
> >> /YYYY/MM/DD/xxxxxx
> >>
> >> while hive generally expects /year=YYYY/month=MM/day=DD/xxxxxx if you
> >> define that table to be
> >> partitioned by (year, month, day). otherwise you'd have to add those
> >> partitions created by camus through a separate command. but in the
> latter
> >> case, would a camus job create >1 partitions ? how would we find out the
> >> YYYY/MM/DD values from outside ? ---- well you could always do
> something by
> >> hadoop dfs -ls and then grep the output, but it's kind of not clean....
> >>
> >>
> >> thanks
> >> yang
> >>
>
>