On 16 Jun 2015, at 2:00, Laurent H wrote:
I've got the same issue, Jian. It would be great to get an answer from
the Oozie experts! ;)
--
Laurent HATIER - Big Data & Business Intelligence Consultant at
CapGemini
fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
<http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>
2015-06-15 12:51 GMT+02:00 朱健 <[email protected]>:
Hi,
Thanks for reading this email.
I have used Oozie for about 2 years. Now I have run into a problem
with time zones.
Because we are located in the GMT+08:00 timezone, our Hadoop system
follows the convention that every data path on HDFS is named using
GMT+08:00 time. That means:
At UTC 2015-01-01T00:00Z, the hourly output data is located under the
folder $root/2015010108, not $root/2015010100
At UTC 2015-01-01T01:00Z, the hourly output data is located under the
folder $root/2015010109, not $root/2015010101
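The naming convention above can be sketched in a few lines of Python. This is only an illustration, not part of the Hadoop setup described here: the function name `hourly_folder` is hypothetical, `$root` is kept as a literal placeholder, and GMT+08:00 is assumed to be a fixed offset with no DST.

```python
from datetime import datetime, timedelta, timezone

# Assumption: paths are named with a fixed GMT+08:00 offset (no DST).
GMT8 = timezone(timedelta(hours=8))


def hourly_folder(utc_time: datetime, root: str = "$root") -> str:
    """Return the hourly folder name for an instant given in UTC.

    Hypothetical helper: it only mirrors the convention described
    in the message, it is not an Oozie or HDFS API.
    """
    local = utc_time.astimezone(GMT8)
    return f"{root}/{local:%Y%m%d%H}"


print(hourly_folder(datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)))
print(hourly_folder(datetime(2015, 1, 1, 1, 0, tzinfo=timezone.utc)))
```

The two calls print `$root/2015010108` and `$root/2015010109`, matching the two cases listed above.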
So if I set the timezone in the coord to UTC, the Oozie job will read
the data for hour 00, but I want it to read hour 08. For me in Beijing,
China, it is natural to expect that the Oozie job reads the 08 data at
08:00 local time.
I also tried setting the timezone to GMT+08:00, but it didn't work.
It seems the timezone only affects Daylight Saving Time handling.
Currently I add 8 to the instance number in the coord as a temporary
fix: change <instance>0</instance> to <instance>8</instance>
This may be acceptable for hourly jobs, but it is really ugly for
minute-level or daily jobs, and almost unreadable for humans.
So how can I solve this problem?
Thanks,
Jian
Hi,
the timezone spec in the coordinator node only serves to figure out
whether there are 23, 24 or 25 hours on a given day (DST switches);
timezone calculations and anything else related to time offsets are
done in the datasets section; try something like:
<coordinator-app xmlns="uri:oozie:coordinator:0.1" timezone="UTC"
                 name="${appName}"
                 frequency="${coord:hours(1)}"
                 start="${startTime}"
                 end="${endTime}">
  ...
  <datasets>
    <dataset name="hourly-partition"
             frequency="${coord:hours(1)}"
             initial-instance="${startTime}"
             timezone="Asia/Shanghai">
      <uri-template><!-- whatever path -->/yyyymmddhh=${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="in" dataset="hourly-partition">
      <instance>${coord:current(coord:tzOffset()/60)}</instance>
    </data-in>
  </input-events>
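
To see why the instance expression above lands on the right folder, here is a rough Python sketch of the arithmetic. It assumes that coord:tzOffset() yields the dataset-minus-coordinator timezone difference in minutes and that the uri-template fields are filled in from the UTC time of the instance; `tz_offset_minutes` and `current` are hypothetical stand-ins, not Oozie APIs, and Asia/Shanghai is modelled as a fixed +08:00 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai is modelled as a fixed +08:00 offset (no DST).
COORD_TZ = timezone.utc
DATASET_TZ = timezone(timedelta(hours=8))


def tz_offset_minutes(nominal: datetime) -> int:
    """Hypothetical stand-in for coord:tzOffset():
    dataset offset minus coordinator offset, in minutes."""
    diff = (nominal.astimezone(DATASET_TZ).utcoffset()
            - nominal.astimezone(COORD_TZ).utcoffset())
    return int(diff.total_seconds() // 60)


def current(nominal: datetime, n: int,
            frequency: timedelta = timedelta(hours=1)) -> datetime:
    """Hypothetical stand-in for coord:current(n), assuming the dataset
    instances are aligned with the nominal time."""
    return nominal + n * frequency


nominal = datetime(2015, 1, 1, 0, 0, tzinfo=timezone.utc)
n = tz_offset_minutes(nominal) // 60   # mirrors coord:tzOffset()/60
instance = current(nominal, n)
print(n, f"{instance:%Y%m%d%H}")
```

Under these assumptions the offset is 8 instances, so the action with nominal time 2015-01-01T00:00Z resolves to 2015010108, which is the folder the job is expected to read.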
David