Hi Hayden,
The most usual form of percentile function (I believe) is where you have record values in a selection set (or "sample" from a
"population" for the astute statisticians), and you wish to know if you sorted them all by value, and then grouped them in clusters
of 1% so that you have a 100 such clusters in a row all numbered from 1 to 100, then in which n-th of those little clusters would
entry x be?
Of course in real maths the "clusters" are really exact values and non-integer and in
a set [ c | 0 < c < 100 ].
One of the short-cut methods of getting a percentile is simple indexing, but it won't give us an accurate x.xxx-th percentile, only
an integer n-th percentile - which might suffice for your application and is a lot quicker to do.
Indexing simply requires ordering the data-values from small to large, then finding "the first value bigger than your value" in the
list, subtract 1, divide by the total number of values in the list ( Count() ) and round it to find value that will be very very
close to the actual percentile (but not exact).
Similarly, finding which value represents the n-th percentile where you specify the percentile and need the value for it (as is your
question's case), you could simply find the n-th percentile index or indices and see which value(s) are in there, the last value of
which (in case of multiple indices) usually represents it the best, especially if you reverse-engineer it back to a percentile.
This shortcut in SQL terms then:
SELECT `latency` FROM `latencytable` WHERE 1 ORDER BY `latency` ASC LIMIT (SELECT (Count() / ( 100 / ?1 ) ) FROM `latencytable`
WHERE 1), 1;
The value ?1 in this case needs to be the required percentile, so if you are
looking for the 25th percentile, then ?1 = 25.
(Note: ?1 = 0 may throw exceptions )
Essentially the query counts the values, finds the x-th percentile index (call it n), then lists the table ordered ascending
starting from offset n and listing only 1 value, et voila.
Now this will be accurate enough on large samples. On really small samples where values are widely differing it will be more
accurate to find a set of every index that falls within a percentile and then averaging it - which can easily be added to the above,
but should not be needed and will take much more processing.
And if you have sets that are very very small (<50 items) then you should not
have a need to know the percentile.
Hope this helps!
Ryan
(PS. in the query, I use "WHERE 1" and lots of quotes and spaces which all help to see the query structure better, but they are very
superfluous in SQLite terms and may well be omitted.)
On 2013/11/30 01:38, Hayden Livingston wrote:
Is there any documentation or use case for percentile queries? I've only
seen the source code checked in by "drh".
Is it like the EXCEL 2013 function?
I have a table {operationId, latency) with a set of rows that have floating
values for latencies of a given operation.
I want to find the 25th %ile latency for this rowset.
(A) How do I create a Percentile Query?
(B) How do I make it work for the SQLite System.Data.Sqlite adapter in .NET?
Many thanks!
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users