Re: help with query to count rows while excluding certain rows
On Fri, Jan 1, 2016 at 2:12 PM, Peter Brawleywrote: > On 12/31/2015 0:51, Larry Martell wrote: >> >> I need to count the number of rows in a table that are grouped by a >> list of columns, but I also need to exclude rows that have more then >> some count when grouped by a different set of columns. Conceptually, >> this is not hard, but I am having trouble doing this efficiently. >> >> My first counting query would be this: >> >> SELECT count(*) >> FROM cst_rollup >> GROUP BY target_name_id, ep, roiname, recipe_process, >> recipe_product, recipe_layer, f_tag_bottom, >> measname, recipe_id >> >> But from this count I need to subtract the count of rows that have >> more then 50 rows with a different grouping: >> >> SELECT count(*) >> FROM cst_rollup >> GROUP BY target_name_id, ep, wafer_id >> HAVING count(*) >= 50 >> >> As you can see, the second query has wafer_id, but the first query does >> not. >> >> Currently I am doing this in python, and it's slow. In my current >> implementation I have one query, and it selects the columns (i.e. >> doesn't just count), and I have added wafer_id: >> >> SELECT target_name_id, ep, roiname, recipe_process, >> recipe_product, recipe_layer, f_tag_bottom, >> measname, recipe_id, wafer_id >> FROM cst_rollup >> >> Then I go through the result set (which can be over 200k rows) and I >> count the number of rows with matching (target_name_id, ep, wafer_id). >> Then I go through the rows again and regroup them without wafer_id, >> but skipping the rows that have more then 50 rows for that row's >> (target_name_id, ep, wafer_id). >> >> Is this clear to everyone what I am trying to do? > > > If I've understand this correctly, the resultset you wish to aggregate on is > ... > > select target_name_id, ep, wafer_id > from cst_rollup a > left join ( -- exclude rows for which wafer_id count >= 50 > select name_id, ep, wafer, count(*) n > from cst_rollup > group by target_name_id, ep, wafer_id > having n >= 50 > ) b using ( target_name_id, ep, wafer_id ) > where b.target_name is null ; > > If that's so, you could assemble that resultset in a temp table then run the > desired aggregate query on it, or you could aggregate on it directly as a > subquery. That query gives: ERROR 1137 (HY000): Can't reopen table: 'a' -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql
Re: help with query to count rows while excluding certain rows
On 12/31/2015 0:51, Larry Martell wrote: I need to count the number of rows in a table that are grouped by a list of columns, but I also need to exclude rows that have more then some count when grouped by a different set of columns. Conceptually, this is not hard, but I am having trouble doing this efficiently. My first counting query would be this: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id But from this count I need to subtract the count of rows that have more then 50 rows with a different grouping: SELECT count(*) FROM cst_rollup GROUP BY target_name_id, ep, wafer_id HAVING count(*) >= 50 As you can see, the second query has wafer_id, but the first query does not. Currently I am doing this in python, and it's slow. In my current implementation I have one query, and it selects the columns (i.e. doesn't just count), and I have added wafer_id: SELECT target_name_id, ep, roiname, recipe_process, recipe_product, recipe_layer, f_tag_bottom, measname, recipe_id, wafer_id FROM cst_rollup Then I go through the result set (which can be over 200k rows) and I count the number of rows with matching (target_name_id, ep, wafer_id). Then I go through the rows again and regroup them without wafer_id, but skipping the rows that have more then 50 rows for that row's (target_name_id, ep, wafer_id). Is this clear to everyone what I am trying to do? If I've understand this correctly, the resultset you wish to aggregate on is ... select target_name_id, ep, wafer_id from cst_rollup a left join ( -- exclude rows for which wafer_id count >= 50 select name_id, ep, wafer, count(*) n from cst_rollup group by target_name_id, ep, wafer_id having n >= 50 ) b using ( target_name_id, ep, wafer_id ) where b.target_name is null ; If that's so, you could assemble that resultset in a temp table then run the desired aggregate query on it, or you could aggregate on it directly as a subquery. PB - I'd like to do this all in sql with count because then I do not have to actually return and parse the data in python. Can anyone think of a way to do this in sql in a way that will be more efficient then my current implementation? Thanks! -Larry -- MySQL General Mailing List For list archives: http://lists.mysql.com/mysql To unsubscribe:http://lists.mysql.com/mysql