[sqlite] MIN/MAX query

Jason H Thu, 18 Feb 2016 18:55:53 +0100

> Um, I understand sets; which is why I knew the naive group by wouldn't work.
> I guess I should have stated my question is HOW do I define the group so that 
> in the order of I,L, clusters of common V values are a "group". I need to 
> return the value of I and V, with the minimum and maximum L in that cluster. 
> 
> I, L and V are data coming from an upstream source. I have no control over 
> it, and would rather not write a mid-stream filter program to add an 
> additional column, but can do that if there is no way to implement the 
> "grouping" I want with the data as is.I is an arbitrary value, and L are 
> sequential numbers for a given I. V is related properties for the given I.
> 
> (Hopefully it doesn't make a difference, but I am simplifying the upstream 
> data. V is actually a half dozen or so columns, but the cluster changes if 
> any of the columns change, so for simplification, I just noted them as one 
> value column.)
> Thanks for your help.
> David
> From: R Smith <rsmith at rsweb.co.za>
>  To: sqlite-users at mailinglists.sqlite.org 
>  Sent: Thursday, February 18, 2016 4:55 AM
>  Subject: Re: [sqlite] MIN/MAX query
>   
> 
>  
> Let me see if I can unpuzzle this question - kindly say if it is not 
> correctly assumed:
> -------------------------------------
> I have a table:
> I? L V
> 1? 1 A
> 1? 2 A
> 1? 3 A
> 1? 4 B
> 1? 5 B
> 1? 6 A
> 2? 7 A
> 2? 1 C
> 2? 2? C
> 
> I want to return the minimal and maximum L for each "group" of V in a 
> given I.
> The result I want:
> I? MinL? ? MaxL V
> 1? ? 1? ? ? 3? ? ? A
> 1? ? 4? ? ? 5? ? ? B
> 1? ? 6? ? ? 7? ? ? A
> 2? ? 1? ? ? 2? ? ? C
> 
> I know I can't just do? ? SELECT I, MIN(L) as MinL, MAX(L) as MaxL, V? 
> FROM T? GROUP BY I, Vsince that would return min/max of 1 and 7 for A, 
> instead of the 2 lines I need.
> So should I word the SELECT statement?
> ---------------------------------------------------------
> 
> First of, your intended results require a fundamentally wrong assumption 
> about Sets. (SQL is essentially operating on SETs and sets have no order).
> You should really have another column, like a primary key ID that notes 
> the position of each line (so that the order of your lines matter).
> Luckily, in SQLite there is a hidden column named row_id that should 
> have that order already pegged - but you should really explicitly add 
> one since your results depend on where in the list the values appear, 
> not just what they are named.
> 
> You should then, try to imply a group identifier, i.e another column 
> that adds a group set or some way to know which A values belong 
> together. This is not obvious in a SET.
> 
> We might be able to find a way to determine grouping after-the-fact, 
> based on data shape perhaps, but need more information then.
> What's the maximum number of lines that will be in a group?
> Does the L counter determine or exclude groups in any way?
> Might I and L when considered together offer some clue as to a next 
> group starting?
> 
> We could use the row-ID distance to narrow it into groups with items 
> that have a maximum distance from each other, but then we need to know 
> what maximum and minimum distances would be, etc.
> 
> Howcome the current data set is ordered like that? Is it a time-based 
> thing? like a logger?
> 
> Some of these answers might help to make a query that can do it. This 
> would be trivial in program code though.




I think you are looking at the data wrong.  Look at it in terms of I V L, ORDER 
BY I,V,L, then you basically want the V's first L and the V's last L. (create a 
view!). However, the lines:
> 1? ? 1? ? ? 3? ? ? A
> 1? ? 6? ? ? 7? ? ? A
Conflict, and you need another column because you suppose they are different 
grouping.
Anyway, you can then use subqueries to construct your rows.
SELECT (a.min, b.max from (select min(L) from ...) as a, (select max(L) .. from 
b); will give you your min/max, you'll just have to qualify it with a join or 
where.
You then only need to join that up to the I values. 
The additional column is left as an exercise to the reader. 

I would avoid a table scan at all costs both at low-level (indexes!) and 
application level. If you do, I'd scan a view, with the data aleady in sorted 
order, so that you read a min; then you repeatedly read a max, you'll know the 
next one is a min, repeat. I've sped things up in applications by orders of 
magnitude properly using a database.  Let the DB so the heavy lifting 
(particularly when your database is not local, not typical for SQLite use 
cases, but still networks are slower than disks, and disks are slower than 
flash.)

[sqlite] MIN/MAX query

Reply via email to