Re: help with query to count rows while excluding certain rows

2016-01-01 Thread Larry Martell
On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawley
 wrote:
> On 12/31/2015 0:51, Larry Martell wrote:
>>
>> I need to count the number of rows in a table that are grouped by a
>> list of columns, but I also need to exclude rows that have more then
>> some count when grouped by a different set of columns. Conceptually,
>> this is not hard, but I am having trouble doing this efficiently.
>>
>> My first counting query would be this:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id
>>
>> But from this count I need to subtract the count of rows that have
>> more then 50 rows with a different grouping:
>>
>> SELECT count(*)
>> FROM cst_rollup
>> GROUP BY target_name_id, ep, wafer_id
>> HAVING count(*) >= 50
>>
>> As you can see, the second query has wafer_id, but the first query does
>> not.
>>
>> Currently I am doing this in python, and it's slow. In my current
>> implementation I have one query, and it selects the columns (i.e.
>> doesn't just count), and I have added wafer_id:
>>
>> SELECT target_name_id, ep, roiname, recipe_process,
>> recipe_product, recipe_layer, f_tag_bottom,
>> measname, recipe_id, wafer_id
>> FROM cst_rollup
>>
>> Then I go through the result set (which can be over 200k rows) and I
>> count the number of rows with matching (target_name_id, ep, wafer_id).
>> Then I go through the rows again and regroup them without wafer_id,
>> but skipping the rows that have more then 50 rows for that row's
>> (target_name_id, ep, wafer_id).
>>
>> Is this clear to everyone what I am trying to do?
>
>
> If I've understand this correctly, the resultset you wish to aggregate on is
> ...
>
> select target_name_id, ep, wafer_id
> from cst_rollup a
> left join (   -- exclude rows for which wafer_id count >= 50
>   select name_id, ep, wafer, count(*) n
>   from cst_rollup
>   group by target_name_id, ep, wafer_id
>   having n >= 50
> ) b using ( target_name_id, ep, wafer_id )
> where b.target_name is null ;
>
> If that's so, you could assemble that resultset in a temp table then run the
> desired aggregate query on it, or you could aggregate on it directly as a
> subquery.

That query gives:

ERROR 1137 (HY000): Can't reopen table: 'a'

-- 
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql



Re: help with query to count rows while excluding certain rows

2016-01-01 Thread Peter Brawley

On 12/31/2015 0:51, Larry Martell wrote:

I need to count the number of rows in a table that are grouped by a
list of columns, but I also need to exclude rows that have more then
some count when grouped by a different set of columns. Conceptually,
this is not hard, but I am having trouble doing this efficiently.

My first counting query would be this:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id

But from this count I need to subtract the count of rows that have
more then 50 rows with a different grouping:

SELECT count(*)
FROM cst_rollup
GROUP BY target_name_id, ep, wafer_id
HAVING count(*) >= 50

As you can see, the second query has wafer_id, but the first query does not.

Currently I am doing this in python, and it's slow. In my current
implementation I have one query, and it selects the columns (i.e.
doesn't just count), and I have added wafer_id:

SELECT target_name_id, ep, roiname, recipe_process,
recipe_product, recipe_layer, f_tag_bottom,
measname, recipe_id, wafer_id
FROM cst_rollup

Then I go through the result set (which can be over 200k rows) and I
count the number of rows with matching (target_name_id, ep, wafer_id).
Then I go through the rows again and regroup them without wafer_id,
but skipping the rows that have more then 50 rows for that row's
(target_name_id, ep, wafer_id).

Is this clear to everyone what I am trying to do?


If I've understand this correctly, the resultset you wish to aggregate 
on is ...


select target_name_id, ep, wafer_id
from cst_rollup a
left join (   -- exclude rows for which wafer_id count >= 50
  select name_id, ep, wafer, count(*) n
  from cst_rollup
  group by target_name_id, ep, wafer_id
  having n >= 50
) b using ( target_name_id, ep, wafer_id )
where b.target_name is null ;

If that's so, you could assemble that resultset in a temp table then run 
the desired aggregate query on it, or you could aggregate on it directly 
as a subquery.


PB

-


I'd like to do this all in sql with count because then I do not have
to actually return and parse the data in python.

Can anyone think of a way to do this in sql in a way that will be more
efficient then my current implementation?


Thanks!
-Larry




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe:http://lists.mysql.com/mysql