#428: inveniogc -g (deleting user_queries referencing non-existent users) highly
inefficient
--------------------+-------------------------------------------------------
 Reporter:  fjorba  |        Type:  defect    
   Status:  new     |    Priority:  minor     
Milestone:          |   Component:  WebSession
  Version:          |    Keywords:            
--------------------+-------------------------------------------------------
 I've never been able to complete an inveniogc -g run (Clean expired guest
 user related information. [default action]) because when it enters the
 second step (deleting user_queries referencing non-existent users) it
 halts my system, both DDD and Traces.

 The actual query is:

 {{{
     result = run_sql("""SELECT DISTINCT uq.id_user
         FROM user_query AS uq LEFT JOIN user AS u
         ON uq.id_user = u.id
         WHERE u.id IS NULL""")
 }}}

 About a year ago, when migrating from a previous version, I did it manually
 in awk, creating a couple of files with user_ids from both tables and
 joining them using an associative array.  What I learned was:

  * The whole procedure (2 simple SQL selects and a gawk run) took seconds
 to complete, not minutes.
  * It cleared fewer than a dozen entries out of several million entries
 in the 'query' table. So it is probably not worth including in the --all
 option of inveniogc.

 So doing it with Python dicts should probably be comparable to gawk, and
 much faster than letting MySQL do the join.
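
 A minimal sketch of that idea, using Python sets in place of the awk
 associative array. The helper name orphan_user_ids and the sample rows are
 hypothetical; it assumes run_sql returns rows as 1-tuples for a
 single-column SELECT, and the two simple SELECTs shown in the comments are
 only a guess at how it would be wired into inveniogc.

```python
def orphan_user_ids(query_rows, user_rows):
    """Return the sorted id_user values that have no matching user id.

    Both arguments are iterables of 1-tuples, i.e. the row shape a
    single-column SELECT is assumed to produce.
    """
    # One pass over the user table builds the lookup set.
    known_users = {row[0] for row in user_rows}
    # Set difference replaces the LEFT JOIN ... WHERE u.id IS NULL.
    return sorted({row[0] for row in query_rows} - known_users)


# Stand-in data. In inveniogc the rows would presumably come from:
#   run_sql("SELECT DISTINCT id_user FROM user_query")
#   run_sql("SELECT id FROM user")
query_rows = [(1,), (2,), (2,), (7,), (9,)]
user_rows = [(1,), (2,), (3,)]

orphans = orphan_user_ids(query_rows, user_rows)
print(orphans)  # [7, 9]
```

 The resulting ids could then be deleted from user_query one by one or in
 small batches. Memory use grows with the number of distinct user ids, not
 with the number of rows in user_query.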

-- 
Ticket URL: <http://invenio-software.org/ticket/428>
Invenio <http://invenio-software.org>
