Argh ... I didn't want to know that. Just when I thought things made a slight bit of sense. Happily I have never known of any users in Eucla or the Chatham Islands. But Nepal doing that hurts.
I will have to see if I can remove that knowledge from my head.

On Thu, Dec 6, 2018 at 1:01 AM Joel Pfaff <[email protected]> wrote:

> Hello,
>
> Since I had to maintain a timezone library a while ago, I discovered
> that not all timezones are a multiple of 10 minutes away from UTC; a
> small number of them use 15-minute steps.
> See https://www.timeanddate.com/time/time-zones-interesting.html
>
> And not all timezones are less than 12 hours away from UTC:
> https://www.timeanddate.com/worldclock/kiribati/kiritimati (+14h from
> UTC).
>
> And some places have more than one DST period per year as well.
>
> Maintaining this lib was definitely a funny time.
>
> Regards, Joel
>
> On Tue, Dec 4, 2018 at 10:03 PM Ted Dunning <[email protected]> wrote:
>
> > I would parse the timestamp into seconds since epoch, then use
> > floor(ts/600) as the key to group on 10-minute boundaries.
> >
> > This works because:
> >
> > - all timezones are multiples of 10 minutes away from UTC
> > - all leap seconds are hidden in the seconds-since-epoch conversion
> > - the epoch was on an even 10-minute interval
> >
> > These all apply for any time interval that is a multiple of 10
> > minutes as well (30 minutes, hourly, daily, weekly). More advanced
> > calendar manipulation is required for things like monthly grouping
> > with timezone and DST awareness. String hacking might work if you can
> > take on some important assumptions. Putting the calendar
> > manipulations into a UDF wouldn't be hard.
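Ted's floor-division bucketing is easy to sanity-check outside of Drill. A minimal Python sketch of the arithmetic (the sample timestamps are made up for illustration):

```python
from datetime import datetime, timezone

def bucket_key(ts_epoch: int, width_s: int = 600) -> int:
    """Ted's group key: floor of epoch seconds over the bucket width."""
    return ts_epoch // width_s

def bucket_start(ts_epoch: int, width_s: int = 600) -> str:
    """Human-readable UTC start of the bucket containing ts_epoch."""
    start = bucket_key(ts_epoch, width_s) * width_s
    return datetime.fromtimestamp(start, tz=timezone.utc).strftime("%H:%M:%S")

# Two readings a few minutes apart land in the same 10-minute bucket.
a = int(datetime(2018, 12, 1, 10, 43, 43, tzinfo=timezone.utc).timestamp())
b = int(datetime(2018, 12, 1, 10, 47, 1, tzinfo=timezone.utc).timestamp())
print(bucket_key(a) == bucket_key(b))  # True
print(bucket_start(a))                 # 10:40:00
# Any width that is a multiple of 10 minutes works the same way:
print(bucket_start(a, 1800))           # 10:30:00 (30-minute buckets)
```

This only lines up with wall-clock boundaries because the epoch itself started on an even boundary and a day (86400 s) is an exact multiple of every 10-minute width, which is the point of Ted's three bullet conditions.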
> >
> > On Wed, Dec 5, 2018 at 1:40 AM John Omernik <[email protected]> wrote:
> >
> > > So one approach I have is this:
> > >
> > > select ds, ts_win, opt_string, opt_id, max(eday) as max_eday from (
> > >   select a.ds, a.ts_win, a.opt_string, b.opt_id, b.eday,
> > >          b.ts_win as rpt_ts_win, b.ts
> > >   FROM (
> > >     select distinct ds, opt_string, opt_id,
> > >            concat(substr(ts, 1, 4), '0:00') as ts_win
> > >     from dfs.prod.view_optdata where ds = '2018-12-02'
> > >     order by ds, ts_win, opt_string
> > >   ) a JOIN (
> > >     select opt_id, ds, opt_string,
> > >            concat(substr(ts, 1, 4), '0:00') as ts_win, ts, eday
> > >     from dfs.prod.view_optdata where ds = '2018-12-02'
> > >   ) b ON a.ds = b.ds and a.opt_string = b.opt_string
> > >   WHERE b.ts <= a.ts_win
> > > ) c
> > > group by ds, ts_win, opt_string, opt_id
> > > order by ds, ts_win ASC, opt_string ASC, opt_id ASC
> > >
> > > This breaks my day up into 10-minute intervals (based on string
> > > slicing) and then shows me what's been reported before each one...
> > > I think :)
> > >
> > > This is limited in that I can only do it for time intervals that
> > > can be substringed out of a time field in string format. Still open
> > > to other ideas :)
> > >
> > > On Tue, Dec 4, 2018 at 10:05 AM John Omernik <[email protected]> wrote:
> > >
> > > > Time for a fun question: how to be clever with queries!
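The concat(substr(ts, 1, 4), '0:00') expression in John's query truncates an HH:MM:SS string to the start of its 10-minute window. A quick Python check of what it produces (0-indexed ts[:4] standing in for SQL's 1-indexed substr(ts, 1, 4)):

```python
def ts_win(ts: str) -> str:
    """SQL: concat(substr(ts, 1, 4), '0:00') -- keep "HH:M", zero the rest."""
    return ts[:4] + "0:00"

print(ts_win("10:43:43"))  # 10:40:00
print(ts_win("10:55:02"))  # 10:50:00
print(ts_win("09:05:10"))  # 09:00:00
# The limitation John notes: this only yields widths you can spell by
# truncating the string (10 minutes, 1 hour), not e.g. 15 or 30 minutes.
```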
> > > > I have a table that takes readings from an IoT-type device:
> > > >
> > > > opt_id  dt          ts        eday  opt_string
> > > > 2.1.1   2018-12-01  10:43:43  12.5  1
> > > > 2.1.2   2018-12-01  10:32:43   5.5  1
> > > > 2.1.3   2018-12-01  10:55:02  20.5  1
> > > > 2.2.1   2018-12-01  10:43:43  12.2  2
> > > > 2.2.2   2018-12-01  10:12:56  43.7  2
> > > > 2.2.3   2018-12-01  10:50:23  12.2  2
> > > > 2.1.1   2018-12-01  13:43:43  45.5  1
> > > > 2.1.2   2018-12-01  13:32:43  20.5  1
> > > > 2.1.3   2018-12-01  13:55:02  45.5  1
> > > > 2.2.1   2018-12-01  13:43:43  30.2  2
> > > > 2.2.2   2018-12-01  13:12:56  81.7  2
> > > > 2.2.3   2018-12-01  13:50:23  50.2  2
> > > >
> > > > Pretty simple. Essentially, each "device" (opt_id) reports its
> > > > cumulative value for the day (eday) throughout the day, at the
> > > > given time.
> > > >
> > > > These devices belong to groups, which is its own column
> > > > (opt_string).
> > > >
> > > > So if I run a query like this:
> > > >
> > > > select opt_id, ds, max(opt_string) as opt_string,
> > > >        max(ts) as max_ts, max(Eday) as max_Eday
> > > > from `mydata`
> > > > where ds = '2018-12-01'
> > > > group by ds, opt_id
> > > > order by opt_id ASC
> > > >
> > > > I get:
> > > >
> > > > opt_id  ds          opt_string  max_ts    max_Eday
> > > > 2.1.1   2018-12-02  1           15:56:33  181.000
> > > > 2.1.3   2018-12-02  1           15:47:02  162.500
> > > > 2.1.4   2018-12-02  1           15:54:19  122.250
> > > > 2.1.5   2018-12-02  1           15:40:15  132.000
> > > > 2.1.6   2018-12-02  1           15:45:04  183.250
> > > > 2.1.7   2018-12-02  1           15:41:15  161.000
> > > > 2.1.8   2018-12-02  1           15:42:10  166.250
> > > > 2.1.9   2018-12-02  1           15:39:23  126.250
> > > > 2.2.1   2018-12-02  2           16:02:54  163.250
> > > > 2.2.2   2018-12-02  2           15:57:39  178.750
> > > > 2.2.3   2018-12-02  2           15:46:53  148.250
> > > > 2.2.4   2018-12-02  2           15:40:27  163.500
> > > > 2.2.5   2018-12-02  2           15:50:49  147.500
> > > > 2.2.6   2018-12-02  2           15:40:24  156.000
> > > > 2.2.7   2018-12-02  2           15:43:00  104.750
> > > > 2.2.8   2018-12-02  2           15:41:55  170.750
> > > > 2.2.9   2018-12-02  2           15:41:04  127.250
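Because eday is cumulative, the row with the latest ts per device also carries the day's max eday, which is exactly what the group-by above exploits. The same reduction over a few of the sample rows, sketched in Python:

```python
# A few (opt_id, ts, eday) readings from the sample table above.
rows = [
    ("2.1.1", "10:43:43", 12.5), ("2.1.1", "13:43:43", 45.5),
    ("2.1.2", "10:32:43", 5.5),  ("2.1.2", "13:32:43", 20.5),
]

latest = {}
for opt_id, ts, eday in rows:
    # Keep the reading with the max timestamp per device. eday rides along:
    # since eday only grows, max(ts) and max(eday) pick the same row.
    if opt_id not in latest or ts > latest[opt_id][0]:
        latest[opt_id] = (ts, eday)

print(latest["2.1.1"])  # ('13:43:43', 45.5)
print(latest["2.1.2"])  # ('13:32:43', 20.5)
```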
> > > > Which shows me the last time each device reported on a day, and
> > > > what the value was at that last report. Great.
> > > >
> > > > So what if I wanted to show the values through the day? Is there
> > > > a way, from a query perspective, to perhaps use windowing or
> > > > something to show results like the above raw table, but instead
> > > > of just having the random times returned, group by opt_id and get
> > > > the max_ts as I am doing, but do that for each, say, 5-minute,
> > > > 30-minute, or 60-minute block in a day?
> > > >
> > > > The best example to explain what I am looking for uses 3 groups
> > > > of 1-hour blocks: I could use a UNION like this:
> > > >
> > > > select * from (
> > > >   select opt_id, ds, max(opt_string) as opt_string,
> > > >          '10:00:00' as max_ts, max(Eday) as max_Eday
> > > >   from dfs.prod.view_optdata
> > > >   where ds = '2018-12-02' and ts <= '10:00:00'
> > > >   group by ds, opt_id
> > > >   order by opt_id ASC
> > > > )
> > > > UNION
> > > > (
> > > >   select opt_id, ds, max(opt_string) as opt_string,
> > > >          '11:00:00' as max_ts, max(Eday) as max_Eday
> > > >   from dfs.prod.view_optdata
> > > >   where ds = '2018-12-02' and ts <= '11:00:00'
> > > >   group by ds, opt_id
> > > >   order by opt_id ASC
> > > > )
> > > > UNION
> > > > (
> > > >   select opt_id, ds, max(opt_string) as opt_string,
> > > >          '12:00:00' as max_ts, max(Eday) as max_Eday
> > > >   from dfs.prod.view_optdata
> > > >   where ds = '2018-12-02' and ts <= '12:00:00'
> > > >   group by ds, opt_id
> > > >   order by opt_id ASC
> > > > )
> > > >
> > > > Where I would get something like:
> > > >
> > > > index  opt_id  ds       opt_string  max_ts    max_Eday
> > > > 0      2.1.1   12/2/18  1           10:00:00  18
> > > > 1      2.1.3   12/2/18  1           10:00:00  15.75
> > > > 2      2.1.4   12/2/18  1           10:00:00  12.5
> > > > 3      2.1.5   12/2/18  1           10:00:00  12.75
> > > > 4      2.1.6   12/2/18  1           10:00:00  18.75
> > > > 5      2.1.7   12/2/18  1           10:00:00  7.25
> > > > 6      2.1.8   12/2/18  1           10:00:00  16.25
> > > > 7      2.1.9   12/2/18  1           10:00:00  9.75
> > > > 8      2.2.1   12/2/18  2           10:00:00  18
> > > > 9      2.2.2   12/2/18  2           10:00:00  15.25
> > > > 10     2.2.3   12/2/18  2           10:00:00  2.75
> > > > 11     2.2.4   12/2/18  2           10:00:00  6.5
> > > > 12     2.2.5   12/2/18  2           10:00:00  16.25
> > > > 13     2.2.6   12/2/18  2           10:00:00  11.25
> > > > 14     2.2.7   12/2/18  2           10:00:00  13.5
> > > > 15     2.2.8   12/2/18  2           10:00:00  13.75
> > > > 16     2.2.9   12/2/18  2           10:00:00  14.5
> > > > 17     2.1.1   12/2/18  1           11:00:00  41
> > > > 18     2.1.3   12/2/18  1           11:00:00  32
> > > > 19     2.1.4   12/2/18  1           11:00:00  25
> > > > 20     2.1.5   12/2/18  1           11:00:00  26.75
> > > > 21     2.1.6   12/2/18  1           11:00:00  42.25
> > > > 22     2.1.7   12/2/18  1           11:00:00  27
> > > > 23     2.1.8   12/2/18  1           11:00:00  35.75
> > > > 24     2.1.9   12/2/18  1           11:00:00  22
> > > > 25     2.2.1   12/2/18  2           11:00:00  34.5
> > > > 26     2.2.2   12/2/18  2           11:00:00  42.5
> > > > 27     2.2.3   12/2/18  2           11:00:00  19
> > > > 28     2.2.4   12/2/18  2           11:00:00  29.25
> > > > 29     2.2.5   12/2/18  2           11:00:00  32.25
> > > > 30     2.2.6   12/2/18  2           11:00:00  30.25
> > > > 31     2.2.7   12/2/18  2           11:00:00  26.75
> > > > 32     2.2.8   12/2/18  2           11:00:00  33.5
> > > > 33     2.2.9   12/2/18  2           11:00:00  28.5
> > > > 34     2.1.1   12/2/18  1           12:00:00  81.5
> > > > 35     2.1.3   12/2/18  1           12:00:00  62.75
> > > > 36     2.1.4   12/2/18  1           12:00:00  48.5
> > > > 37     2.1.5   12/2/18  1           12:00:00  48.25
> > > > 38     2.1.6   12/2/18  1           12:00:00  80.5
> > > > 39     2.1.7   12/2/18  1           12:00:00  62
> > > > 40     2.1.8   12/2/18  1           12:00:00  66.25
> > > > 41     2.1.9   12/2/18  1           12:00:00  43.75
> > > > 42     2.2.1   12/2/18  2           12:00:00  62.75
> > > > 43     2.2.2   12/2/18  2           12:00:00  83.5
> > > > 44     2.2.3   12/2/18  2           12:00:00  49
> > > > 45     2.2.4   12/2/18  2           12:00:00  66.25
> > > > 46     2.2.5   12/2/18  2           12:00:00  56.5
> > > > 47     2.2.6   12/2/18  2           12:00:00  68
> > > > 48     2.2.7   12/2/18  2           12:00:00  43
> > > > 49     2.2.8   12/2/18  2           12:00:00  72.5
> > > > 50     2.2.9   12/2/18  2           12:00:00  49.25
> > > >
> > > > That's great, but if I wanted, say, all the hours in a day, that
> > > > UNION would get ugly. And what about 30-minute blocks? 5-minute
> > > > blocks?
> > > > Is there a way to get these types of results without the UNION
> > > > and without being explicit about the times? Any advice, including
> > > > "look at X", would be helpful. Or if you need a diversion, dig on
> > > > in!
> > > >
> > > > John
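For what it's worth, the shape of the answer John is after — one row per (boundary, device) with the latest cumulative value at or before each boundary — can be sketched outside of SQL. A hypothetical single-device example in Python (in Drill, something of the same shape falls out of joining a generated list of boundaries against the readings, as in John's self-join, or of grouping on floor(epoch/width) per Ted's suggestion):

```python
import bisect

# Hypothetical cumulative (ts, eday) readings for one device, sorted by time.
readings = [("10:12:56", 10.0), ("10:50:23", 12.2), ("13:50:23", 50.2)]
ts_list = [ts for ts, _ in readings]

def value_as_of(boundary):
    """Latest cumulative reading at or before `boundary` (None if none yet)."""
    i = bisect.bisect_right(ts_list, boundary)
    return readings[i - 1][1] if i else None

# One loop iteration per boundary replaces one UNION branch per boundary.
for hour in range(10, 15):
    b = f"{hour:02d}:00:00"
    print(b, value_as_of(b))
# 10:00:00 None
# 11:00:00 12.2
# 12:00:00 12.2
# 13:00:00 12.2
# 14:00:00 50.2
```

Note this relies on the HH:MM:SS strings sorting chronologically, the same assumption John's `ts <= '10:00:00'` comparisons already make.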
