https://bugzilla.wikimedia.org/show_bug.cgi?id=47125

       Web browser: ---
            Bug ID: 47125
           Summary: Improve performance of
                    dispatchChanges::getPendingChanges
           Product: MediaWiki extensions
           Version: unspecified
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: WikidataRepo
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

dispatchChanges has performance issues. One major bottleneck is the
getPendingChanges() function. It works by loading a block of changes, then, for
each change, loading the sitelinks of the changed item and checking whether the
target wiki is mentioned in them. This means one extra database query for each
change (by default, 1000 per batch). This is far too slow.

One solution would be to join the wb_changes table against the
wb_items_per_site table directly. This, however, would no longer work once we
have client-side usage tracking. Also, wb_changes uses a single field for the
prefixed ID of the entity, while wb_items_per_site uses one field for the
entity type and one for the numeric ID. This makes joining inefficient and
inconvenient.
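
To illustrate the ID mismatch, here is a minimal sketch in Python (not the
extension's PHP code). The function name and the prefix-to-type mapping are
assumptions made for illustration only. It shows the conversion that would be
needed before a prefixed ID such as "q42" can be compared with a separate
(entity type, numeric ID) pair.

```python
import re

# Hypothetical mapping from ID prefix to entity type. The real prefixes
# come from the Wikibase configuration, not from this sketch.
PREFIX_TO_TYPE = {"q": "item", "p": "property"}


def split_prefixed_id(prefixed_id):
    """Split a prefixed entity ID such as 'q42' into (entity_type, numeric_id)."""
    match = re.fullmatch(r"([a-z]+)(\d+)", prefixed_id.strip().lower())
    if match is None:
        raise ValueError(f"not a prefixed entity ID: {prefixed_id!r}")
    prefix, number = match.groups()
    if prefix not in PREFIX_TO_TYPE:
        raise ValueError(f"unknown ID prefix: {prefix!r}")
    return PREFIX_TO_TYPE[prefix], int(number)
```

Doing the equivalent conversion inside a SQL join condition means evaluating
a string expression for every row, which generally prevents the database from
using an index on the ID column.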

An alternative solution would be to provide a storage layer service for
a) checking, for a given client wiki, which items from a given list are used
there;
b) providing all pages on a given client wiki that use one of a list of items.
Using the first method, we could filter a given block of changes using a single
query.
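
The two lookups could be sketched roughly as follows. This is an illustration
in Python against an in-memory SQLite database, not the extension's PHP code.
The table items_per_site, its columns, all function names and the sample rows
are assumptions loosely modelled on wb_items_per_site, and changes are
represented as plain dictionaries.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory table loosely modelled on wb_items_per_site."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items_per_site ("
        " item_id INTEGER, site_id TEXT, page_title TEXT)"
    )
    db.executemany(
        "INSERT INTO items_per_site VALUES (?, ?, ?)",
        [
            (1, "enwiki", "Universe"),
            (1, "dewiki", "Universum"),
            (2, "enwiki", "Earth"),
            (3, "dewiki", "Leben"),
        ],
    )
    return db


def get_used_items(db, site_id, item_ids):
    """(a) Return the subset of item_ids used on site_id, with one query."""
    item_ids = list(item_ids)
    if not item_ids:
        return set()
    placeholders = ",".join("?" * len(item_ids))
    rows = db.execute(
        "SELECT DISTINCT item_id FROM items_per_site"
        f" WHERE site_id = ? AND item_id IN ({placeholders})",
        [site_id, *item_ids],
    )
    return {row[0] for row in rows}


def get_pages_using(db, site_id, item_ids):
    """(b) Return {item_id: [page titles]} for pages on site_id using the items."""
    item_ids = list(item_ids)
    if not item_ids:
        return {}
    placeholders = ",".join("?" * len(item_ids))
    rows = db.execute(
        "SELECT item_id, page_title FROM items_per_site"
        f" WHERE site_id = ? AND item_id IN ({placeholders})"
        " ORDER BY item_id, page_title",
        [site_id, *item_ids],
    )
    pages = {}
    for item_id, title in rows:
        pages.setdefault(item_id, []).append(title)
    return pages


def filter_changes(db, site_id, changes):
    """Keep only the changes whose item is used on site_id (one query per block)."""
    used = get_used_items(db, site_id, (c["item_id"] for c in changes))
    return [c for c in changes if c["item_id"] in used]
```

With this shape, a block of changes costs one query instead of one query per
change. A real implementation would also have to respect the database's limit
on the number of bound parameters when a block is large.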

