As promised, https://bugs.launchpad.net/evergreen/+bug/1236979 ... note that it's quick-and-dirty, and untested as yet, and I'm about to be unavailable for the remainder of the day.
TIA for any eyes that can be provided. --miker On Tue, Oct 8, 2013 at 12:54 PM, Mike Rylander <[email protected]> wrote: > First, thanks again for this. > > Some background: this function is replacing the query generated by > the code that lives at > Open-ILS/src/perlmods/lib/OpenILS/Application/SuperCat.pm:643 (in > master). The function uses a variadic param to simplify things, but > we actually need to be able to filter on three different types of > values: circ_lib (covered in the patch supplied), copy status and copy > location. > > I'm creating a variant of the function depesz supplied that leaves > circ_lib as the primary filter axis, because it is the most likely to > be supplied. All three filters will be accepted as array arguments, > as only the last input argument is allowed to be a variadic, and using > a special-case test for an empty circ_lib array to open a slightly > different cursor. > > This special case (in practical terms, asking for installation-wide > item-ordered bibs, or "new bibs everywhere") will benefit from an > index specifically on create_date, and the other cases benefit from > the existing index on circ_lib. I'm not sure if we should add a new > index across both or not ... testing will tell if circ_lib+create_date > makes thing better or not. > > When I have a branch, I'll create an LP bug and respond here. > > --miker > > > On Mon, Oct 7, 2013 at 11:52 AM, hubert depesz lubaczewski > <[email protected]> wrote: >> Hi, >> Couple of days ago, I started to work on #2 most time consuming query >> (from pg logs of our client): >> >> SELECT "acn".record AS "record", max ("acp".create_date) AS "create_date" >> FROM asset.call_number AS "acn" >> INNER JOIN asset.COPY AS "acp" ON ("acp".call_number = "acn".id) >> WHERE ("acn".record > 0) >> AND ("acp".deleted = 'f' >> AND "acp".circ_lib IN ('279', >> '280', >> '280', >> '281', >> '281', >> '282', >> '282', >> '283', >> '283', >> '284', >> '284', >> '285', >> '285', >> '286', >> '286', >> '287', >> '287', >> '288', >> '288', >> '289', >> '289')) >> GROUP BY 1 >> ORDER BY max ("acp".create_date) DESC LIMIT 20 >> ; >> >> This query has explain analyze http://explain.depesz.com/s/RAMc >> - runtime of 13.14 seconds. >> >> >> First level of optimization was adding 2 indexes: >> >> create index copy_circ_lib on asset.copy (circ_lib); >> create index unit_circ_lib on serial.unit (circ_lib); >> >> This made the query faster - 8.719s -- http://explain.depesz.com/s/UOy >> >> Afterwards, I added another two indexes, in hopes that it will get used: >> >> create index copy_circ_lib_create_date on asset.copy (circ_lib, create_date); >> create index unit_circ_lib_create_date on serial.unit (circ_lib, >> create_date); >> >> but it wasn't used. >> >> Since I didn't see a way to optimize the query further by changing query >> or adding indexes, decided to write a function instead. >> >> First iteration of the function: >> >> CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC >> p_circ_libs INT4[] ) >> RETURNS TABLE ( record INT8, create_date timestamptz ) as $$ >> DECLARE >> v_results public.hstore := ''; >> v_record int8; >> v_temprec record; >> v_found INT4 := 0; >> BEGIN >> FOR v_temprec IN >> SELECT >> c.call_number, >> max(c.create_date) as create_date >> FROM >> asset.copy c >> WHERE c.circ_lib = any( p_circ_libs ) >> AND NOT c.deleted >> GROUP BY c.call_number >> ORDER BY max(c.create_date) DESC >> LOOP >> SELECT cn.record INTO v_record FROM asset.call_number cn WHERE cn.id >> = v_temprec.call_number; >> CONTINUE WHEN NOT FOUND; >> CONTINUE WHEN v_record <= 0; >> CONTINUE WHEN v_results ? v_record::TEXT; >> v_found := v_found + 1; >> v_results := v_results || hstore( v_record::TEXT, >> v_temprec.create_date::TEXT ); >> EXIT WHEN v_found = p_limit; >> END LOOP; >> RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) >> ORDER BY value::timestamptz DESC; >> END; >> $$ language plpgsql STRICT; >> >> It can be called like this: >> SELECT * FROM asset.some_clever_name( 20, 279, 280, 281, 282, 283, 284, 285, >> 286, 287, 288, 289 ); >> >> (i'm not good with naming stuff) >> result - 1.389s -- http://explain.depesz.com/s/IIu >> much better, but it's using the less optimal indexes (first set of two >> indexes I created). >> >> So I wrote this - second iteration of functional approach: >> >> CREATE OR REPLACE FUNCTION asset.some_clever_name( IN p_limit INT4, VARIADIC >> p_circ_libs INT4[] ) >> RETURNS TABLE ( record INT8, create_date timestamptz ) as $$ >> DECLARE >> v_results public.hstore := ''; >> v_seen public.hstore := ''; >> v_records public.hstore := ''; >> v_oldest timestamptz := NULL; >> v_c_oldest timestamptz := NULL; >> v_found INT4 := 0; >> v_circ_lib INT4; >> v_record int8; >> v_temprec record; >> v_iter INT4; >> v_cursor REFCURSOR; >> BEGIN >> FOREACH v_circ_lib IN ARRAY p_circ_libs LOOP >> v_iter := 0; >> v_seen := ''; >> v_c_oldest := NULL; >> open v_cursor NO SCROLL FOR >> SELECT c.call_number, c.create_date >> FROM asset.copy c >> WHERE c.circ_lib = v_circ_lib AND NOT c.deleted >> ORDER BY c.create_date DESC; >> >> LOOP >> FETCH v_cursor INTO v_temprec; >> EXIT WHEN NOT FOUND; >> >> v_iter := v_iter + 1; >> >> -- If we already have better data than current row (newer >> records in >> EXIT WHEN v_oldest IS NOT NULL AND v_oldest >= >> v_temprec.create_date; >> >> -- Ignore if we've seen given call number in current query (for >> current circ_lib) >> CONTINUE WHEN v_seen ? v_temprec.call_number::TEXT; >> v_seen := v_seen || public.hstore( v_temprec.call_number::TEXT, >> '1' ); >> >> -- If we don't have yet record for given call_number, we need to >> get it >> IF v_records ? v_temprec.call_number::TEXT THEN >> v_record := v_records -> v_temprec.call_number::TEXT; >> ELSE >> SELECT cn.record INTO v_record FROM asset.call_number cn >> WHERE cn.id = v_temprec.call_number; >> CONTINUE WHEN NOT FOUND; >> CONTINUE WHEN v_record <= 0; >> v_records := v_records || hstore( >> v_temprec.call_number::TEXT, v_record::TEXT ); >> END IF; >> >> -- If results already contain "better" date for given record, >> next row >> IF v_results ? v_record::TEXT THEN >> CONTINUE WHEN ( v_results -> v_record::TEXT )::timestamptz > >> v_temprec.create_date; >> END IF; >> >> v_found := v_found + 1; >> v_results := v_results || hstore( v_record::TEXT, >> v_temprec.create_date::TEXT ); >> >> IF v_c_oldest IS NULL OR v_c_oldest > v_temprec.create_date THEN >> v_c_oldest := v_temprec.create_date; >> END IF; >> >> EXIT WHEN v_found = p_limit; >> END LOOP; >> >> CLOSE v_cursor; >> >> -- Update oldest information based on oldest row added in current >> loop >> IF v_oldest IS NULL OR v_oldest < v_c_oldest THEN >> v_oldest := v_c_oldest; >> END IF; >> >> END LOOP; >> RETURN QUERY SELECT KEY::INT8, value::timestamptz FROM each(v_results) >> ORDER BY value::timestamptz DESC LIMIT p_limit; >> RETURN; >> END; >> $$ language plpgsql STRICT; >> >> It can be used in exactly the same way. And it returns, of course, the >> same results. >> >> But the time - well - it's 2.687 *ms* http://explain.depesz.com/s/NBwP >> >> This means that the 2nd function, for these database, with these >> parameters is almost 5000 times faster. >> >> The drawback is that it's not really nice function, as it uses (for >> performance) certain side-effects of plpgsql constructs (cursors). >> >> Contacted the client, and, as previously, it was authorized to get >> released to you guys, in hopes that you can make a decision whether the >> optimization is worth being included in core. >> >> Best regards, >> >> depesz >> >> -- >> The best thing about modern society is how easy it is to avoid contact with >> it. >> >> http://depesz.com/ > > > > -- > Mike Rylander > | Director of Research and Development > | Equinox Software, Inc. / Your Library's Guide to Open Source > | phone: 1-877-OPEN-ILS (673-6457) > | email: [email protected] > | web: http://www.esilibrary.com -- Mike Rylander | Director of Research and Development | Equinox Software, Inc. / Your Library's Guide to Open Source | phone: 1-877-OPEN-ILS (673-6457) | email: [email protected] | web: http://www.esilibrary.com
