#428: inveniogc -g (deleting user_queries referencing non-existent users) highly
inefficient
--------------------+-------------------------------------------------------
Reporter: fjorba | Type: defect
Status: new | Priority: minor
Milestone: | Component: WebSession
Version: | Keywords:
--------------------+-------------------------------------------------------
I've never been able to complete an inveniogc -g run (Clean expired guest
user related information. [default action]) because when it enters the
second step (deleting user_queries referencing non-existent users) it
brings my system to a halt, on both DDD and Traces.
The actual query is:
{{{
result = run_sql("""SELECT DISTINCT uq.id_user
FROM user_query AS uq LEFT JOIN user AS u
ON uq.id_user = u.id
WHERE u.id IS NULL""")
}}}
About a year ago, when migrating from a previous version, I did it manually
in awk, creating a couple of files with user_ids from both tables and
joining them using an associative array. What I learned was:
* The whole procedure (2 simple SQL selects and a gawk run) took seconds
to complete, not minutes.
* It cleared fewer than a dozen entries out of several million entries
in the 'query' table. So it is probably not worth including in the --all
option of inveniogc.
So doing it with Python dicts should probably be comparable to gawk, and
much faster than letting MySQL do the join.
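
A rough sketch of what I mean, not a tested patch. It uses sets instead of
dicts, since only the ids matter. The function name find_orphaned_user_ids
is made up for illustration, and run_sql is passed in as a parameter and
assumed to take an SQL string and return row tuples, as in the snippet
above:

```python
def find_orphaned_user_ids(run_sql):
    """Return the id_user values in user_query that have no row in user.

    run_sql is assumed to accept an SQL string and return an iterable
    of row tuples (hypothetical stand-in for the call quoted above).
    """
    # Two simple selects; no join is done on the MySQL side.
    known_ids = set(row[0] for row in run_sql("SELECT id FROM user"))
    query_ids = set(
        row[0] for row in run_sql("SELECT DISTINCT id_user FROM user_query")
    )
    # The anti-join is computed in Python as a set difference.
    return query_ids - known_ids
```

The returned ids could then be deleted from user_query in a separate step.
Both id columns are held in memory at once, which was not a problem in my
awk run but may matter on larger installations.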
--
Ticket URL: <http://invenio-software.org/ticket/428>
Invenio <http://invenio-software.org>