Re: top-k function for Window

2017-01-04 Thread Andy Dang
> ROW_NUMBER() OVER (PARTITION BY time_bucket, identifier1
>                    ORDER BY count DESC) as rowNum
> FROM tablename) tmp
> WHERE rowNum <= 4

Re: top-k function for Window

2017-01-04 Thread Georg Heiler
> FROM tablename) tmp
> WHERE rowNum <= 4
> ORDER BY time_bucket, identifier1, rowNum
>
> The count and order by:
>
> SELECT time_bucket,
>        identifier1,
>        identifier2

Re: top-k function for Window

2017-01-04 Thread Koert Kuipers
> FROM tablename) tmp
> WHERE rowNum <= 4
> ORDER BY time_bucket, identifier1, rowNum
>
> The count and order by:

RE: top-k function for Window

2017-01-03 Thread Mendelson, Assaf
03, 2017 8:03 PM
To: Mendelson, Assaf
Cc: user
Subject: Re: top-k function for Window

> Furthermore, in your example you don’t even need a window function, you can
> simply use groupby and explode

Can you clarify? You need to sort somehow (be it map-side sorting or reduce-side s

Re: top-k function for Window

2017-01-03 Thread Koert Kuipers
> as rowNum
> FROM tablename) tmp
> WHERE rowNum <= 4
> ORDER BY time_bucket, identifier1, rowNum
>
> The count and order by:
>
> S

Re: top-k function for Window

2017-01-03 Thread Andy Dang
> :
>
> SELECT time_bucket,
>        identifier1,
>        identifier2,
>        count(identifier2) as myCount
> FROM table
> GROUP BY time_bucket,
>          identifier1,
>          identifier2
> ORDER BY

Re: top-k function for Window

2017-01-03 Thread HENSLEE, AUSTIN L
bucket, identifier1, identifier2
ORDER BY time_bucket, identifier1, count(identifier2) DESC

From: Andy Dang
Date: Tuesday, January 3, 2017 at 7:06 AM
To: user
Subject: top-k function for Window

Hi all,

What's the best way to do top-k with Windowing

Re: top-k function for Window

2017-01-03 Thread Andy Dang
> *From:* Andy Dang [mailto:nam...@gmail.com]
> *Sent:* Tuesday, January 03, 2017 3:07 PM
> *To:* user
> *Subject:* top-k function for Window
>
> Hi all,
>
> What's the best way to do top-k with Windowing in Dataset world?
>
> I h

RE: top-k function for Window

2017-01-03 Thread Mendelson, Assaf
[mailto:nam...@gmail.com]
Sent: Tuesday, January 03, 2017 3:07 PM
To: user
Subject: top-k function for Window

Hi all,

What's the best way to do top-k with Windowing in Dataset world?

I have a snippet of code that filters the data to the top-k, but with skewed keys:

val windo

top-k function for Window

2017-01-03 Thread Andy Dang
Hi all,

What's the best way to do top-k with Windowing in Dataset world?

I have a snippet of code that filters the data to the top-k, but with skewed keys:

val windowSpec = Window.partitionBy(skewedKeys).orderBy(dateTime)
val rank = row_number().over(windowSpec)
input.withColumn("rank", rank).
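
For readers skimming the thread, here is a rough sketch of what "top-k per partition" means, written against plain Scala collections rather than Spark. It is not the code from any message in this thread and it does nothing about the skew problem. The `Rec` case class, the `TopKSketch` object, and the sample rows are all hypothetical names made up for illustration; the fields only mirror the column names quoted in the replies (time_bucket, identifier1, identifier2, count).

```scala
// Hypothetical row type; field names mirror the columns quoted in the thread.
final case class Rec(timeBucket: String, identifier1: String, identifier2: String, count: Long)

object TopKSketch {
  // Keep the k highest-count records for each (timeBucket, identifier1) group.
  // This selects the same kind of rows as a ROW_NUMBER() window ordered by
  // count DESC followed by a filter on rowNum <= k.
  def topKPerPartition(rows: Seq[Rec], k: Int): Map[(String, String), Seq[Rec]] =
    rows
      .groupBy(r => (r.timeBucket, r.identifier1))
      .map { case (key, group) =>
        // sortBy is stable, so ties keep their input order.
        key -> group.sortBy(r => -r.count).take(k)
      }

  // Made-up sample data, for illustration only.
  val sample: Seq[Rec] = Seq(
    Rec("t1", "a", "x", 5L),
    Rec("t1", "a", "y", 9L),
    Rec("t1", "a", "z", 1L),
    Rec("t1", "b", "w", 3L)
  )
}
```

Calling `TopKSketch.topKPerPartition(TopKSketch.sample, 2)` keeps `y` and `x` for the `("t1", "a")` group and `w` for `("t1", "b")`. Sorting every group in full is the naive approach; whether a bounded structure or a pre-aggregation step helps with skewed keys is the open question of the thread.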