On 16 Jun 2015, at 2:00, Laurent H wrote:

I've got the same issue, Jian. It would be great to get an answer from the
Oozie experts! ;)

--
Laurent HATIER - Consultant Big Data & Business Intelligence at CapGemini
fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
<http://fr.linkedin.com/pub/laurent-h/25/36b/a86/>

2015-06-15 12:51 GMT+02:00 朱健 <[email protected]>:

Hi,

Thanks for reading this email.

I have used Oozie for about 2 years. Now I have run into a problem
with time zones.

Because we are located in the GMT+08:00 time zone, our Hadoop system follows
the convention that all data paths on HDFS are named using GMT+08:00 local
time. That means:
At UTC 2015-01-01T00:00Z, the hourly output data is located under the
folder $root/2015010108, not $root/2015010100
At UTC 2015-01-01T01:00Z, the hourly output data is located under the
folder $root/2015010109, not $root/2015010101
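
To make the naming convention concrete, here is a minimal Python sketch of
the mapping described above. It is only an illustration: the function name
hourly_folder is hypothetical, "$root" is kept as a literal placeholder, and
GMT+08:00 is modeled as a fixed offset with no DST.

```python
from datetime import datetime, timedelta, timezone

# Fixed GMT+08:00 offset, as described in the message (no DST involved).
LOCAL_TZ = timezone(timedelta(hours=8))


def hourly_folder(utc_instant: datetime, root: str = "$root") -> str:
    """Return the folder for an hourly instance, named by GMT+08:00 local time."""
    local = utc_instant.astimezone(LOCAL_TZ)
    return f"{root}/{local:%Y%m%d%H}"


print(hourly_folder(datetime(2015, 1, 1, 0, tzinfo=timezone.utc)))  # $root/2015010108
print(hourly_folder(datetime(2015, 1, 1, 1, tzinfo=timezone.utc)))  # $root/2015010109
```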

So if I set the timezone in the coordinator to UTC, the Oozie job reads the
data for hour 00, but I want it to read hour 08. For me in Beijing, China, it
is natural to expect the Oozie job to read the 08 data at local 08:00.

I also tried setting the timezone to GMT+08:00, but it didn't work. It seems
the timezone only affects “Daylight Saving Time” handling.

Currently I add 8 to the instance number in the coordinator as a temporary
fix: change <instance>0</instance> to <instance>8</instance>.
This may be acceptable for hourly jobs, but it is really ugly for
minute-level or daily jobs, and almost unreadable for humans.
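
A rough sketch of why the hard-coded shift gets awkward at other
frequencies, assuming the instance number counts whole dataset periods. The
5-minute frequency is a hypothetical example and instance_offset is an
illustrative helper, not an Oozie function.

```python
from fractions import Fraction

OFFSET_MINUTES = 8 * 60  # GMT+08:00 relative to UTC, in minutes


def instance_offset(frequency_minutes: int) -> Fraction:
    """Number of dataset periods needed to cover the 8-hour offset."""
    return Fraction(OFFSET_MINUTES, frequency_minutes)


for label, freq in [("hourly", 60), ("every 5 minutes", 5), ("daily", 1440)]:
    print(label, instance_offset(freq))
```

Under this simplified model the hourly shift is 8 and the 5-minute shift is
96, while the daily case comes out as 1/3, which is not a whole number of
instances.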

So how can I solve this problem?

Thanks,
Jian


Hi,

The timezone attribute on the coordinator node only serves to figure out
whether there are 23, 24 or 25 hours in a given day (DST switches); time zone
calculations and anything related to time offsets are handled in the datasets
section. Try something like:

<coordinator-app xmlns="uri:oozie:coordinator:0.1" timezone="UTC"
    name="${appName}"
    frequency="${coord:hours(1)}"
    start="${startTime}"
    end="${endTime}"
   >
...
    <datasets>
        <dataset
            name="hourly-partition"
            frequency="${coord:hours(1)}"
            initial-instance="${startTime}"
            timezone="Asia/Shanghai">
            <uri-template><!--whatever path -->/yyyymmddhh=${YEAR}${MONTH}${DAY}${HOUR}</uri-template>
        </dataset>
    </datasets>

    <input-events>
        <data-in name="in" dataset="hourly-partition">
            <instance>${coord:current(coord:tzOffset()/60)}</instance>
        </data-in>
    </input-events>
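
For intuition, here is a simplified Python model of how the instance
expression above might resolve. It is not Oozie code and rests on these
assumptions: coord:tzOffset() yields the dataset timezone offset minus the
coordinator timezone offset in minutes, coord:current(n) moves n dataset
periods from the nominal time, and the ${YEAR}${MONTH}${DAY}${HOUR}
variables are expanded in UTC. Asia/Shanghai is modeled as a fixed +08:00
offset, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DATASET_TZ = timezone(timedelta(hours=8))  # stand-in for Asia/Shanghai (no DST)
COORD_TZ = timezone.utc                    # coordinator timezone="UTC"
FREQUENCY = timedelta(hours=1)             # dataset frequency coord:hours(1)


def tz_offset_minutes(nominal: datetime) -> int:
    """Model of coord:tzOffset(): dataset offset minus coordinator offset, in minutes."""
    dataset_offset = nominal.astimezone(DATASET_TZ).utcoffset()
    coord_offset = nominal.astimezone(COORD_TZ).utcoffset()
    return int((dataset_offset - coord_offset).total_seconds() // 60)


def resolve_instance(nominal: datetime) -> str:
    """Model of coord:current(coord:tzOffset()/60), expanded as yyyymmddhh in UTC."""
    n = tz_offset_minutes(nominal) // 60
    instance = (nominal + n * FREQUENCY).astimezone(timezone.utc)
    return f"{instance:%Y%m%d%H}"


nominal = datetime(2015, 1, 1, 0, tzinfo=timezone.utc)
print(tz_offset_minutes(nominal))  # 480
print(resolve_instance(nominal))   # 2015010108
```

Under these assumptions the job running at nominal time 2015-01-01T00:00Z
picks up the partition named 2015010108, and the offset is derived from the
dataset timezone instead of being hard-coded in the instance number.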

David
