https://bugzilla.wikimedia.org/show_bug.cgi?id=46867

       Web browser: ---
            Bug ID: 46867
           Summary: wb_terms population strangeness
           Product: MediaWiki extensions
           Version: master
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: Unprioritized
         Component: WikidataRepo
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]
    Classification: Unclassified
   Mobile Platform: ---

Due to many hours of replag, which is only going to get worse over the next
7-8 hours (meaning at least 12 hours of replag), I've cancelled the current run
of rebuildTermsSearchKey.php.

Whilst trying to work out where to start again from:

mysql:wikiadmin@db35 [wikidatawiki]> select min(term_row_id) from wb_terms where term_search_key = '';
+------------------+
| min(term_row_id) |
+------------------+
|           247135 |
+------------------+
1 row in set (1 min 9.97 sec)

mysql:wikiadmin@db35 [wikidatawiki]> select * from wb_terms where term_row_id > 247130 limit 10;
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
| term_row_id | term_entity_id | term_entity_type | term_language | term_type | term_text          | term_search_key    |
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
|      247131 |          41253 | item             | bn            | alias     | Movie theaters     | movie theaters     |
|      247132 |          41253 | item             | bn            | alias     | Movie house        | movie house        |
|      247133 |          41253 | item             | bn            | alias     | Exhibition         | exhibition         |
|      247134 |          41253 | item             | bn            | alias     | Film theatre       | film theatre       |
|      247135 |          41253 | item             | bn            | alias     | �                   |                    |
|      247136 |          41253 | item             | bn            | alias     | সিনেমা             | সিনেমা             |
|      247137 |          41253 | item             | bn            | alias     | Film exhibitor     | film exhibitor     |
|      247138 |          41253 | item             | bn            | alias     | Matinee            | matinee            |
|      247139 |          41253 | item             | bn            | alias     | Picture house      | picture house      |
|      247140 |          41253 | item             | bn            | alias     | Moviegoer          | moviegoer          |
+-------------+----------------+------------------+---------------+-----------+--------------------+--------------------+
10 rows in set (0.05 sec)

mysql:wikiadmin@db35 [wikidatawiki]> select min(term_row_id) from wb_terms where term_row_id > 247140 AND term_search_key = '';
+------------------+
| min(term_row_id) |
+------------------+
|           254476 |
+------------------+
1 row in set (15.35 sec)

mysql:wikiadmin@db35 [wikidatawiki]> select * from wb_terms where term_row_id = 254476;
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
| term_row_id | term_entity_id | term_entity_type | term_language | term_type | term_text | term_search_key |
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
|      254476 |          41607 | item             | bn            | alias     | �          |                 |
+-------------+----------------+------------------+---------------+-----------+-----------+-----------------+
1 row in set (0.00 sec)


These term_text values show as a square box in my shell, and the resulting
term_search_key for each of them is ''.

This makes manually finding a starting point difficult, as shown above.
--only-missing would help, but the script would still have to find each of
these rows whose key is apparently still '', attempt to repopulate it, and
then move on to the next one. This might take a while.
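
One way to avoid repeating the min() queries by hand would be to walk the
empty-key rows in ascending term_row_id order, one batch at a time. The
following is only a rough Python sketch against a throwaway in-memory SQLite
table standing in for wb_terms; the helper name, the batch size, and the
U+200B sample text are all assumptions for illustration, not anything taken
from rebuildTermsSearchKey.php or the real data.

```python
import sqlite3


def empty_key_rows(conn, after_id=0, batch=1000):
    """Yield (term_row_id, hex of term_text) for rows with an empty search key.

    Rows come back in ascending term_row_id order, fetched one batch at a
    time, so each query resumes from the last id seen.
    """
    last = after_id
    while True:
        rows = conn.execute(
            "SELECT term_row_id, hex(term_text) FROM wb_terms "
            "WHERE term_row_id > ? AND term_search_key = '' "
            "ORDER BY term_row_id LIMIT ?",
            (last, batch),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last = rows[-1][0]


# Stand-in table: only the three columns the sketch needs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wb_terms ("
    "term_row_id INTEGER PRIMARY KEY, term_text TEXT, term_search_key TEXT)"
)
conn.executemany(
    "INSERT INTO wb_terms VALUES (?, ?, ?)",
    [
        (247134, "Film theatre", "film theatre"),
        (247135, "\u200b", ""),  # hypothetical text; the real value is unknown
        (247137, "Film exhibitor", "film exhibitor"),
        (254476, "\u200b", ""),  # hypothetical text; the real value is unknown
    ],
)

for row_id, text_hex in empty_key_rows(conn, batch=1):
    print(row_id, text_hex)
```

Dumping the hex of term_text alongside each id would also show whether all
the empty-key rows contain the same character.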

So my first question is: why is the term_search_key coming out as ''? Is this
correct? If necessary, we can try to get the results dumped somewhere so we
can work out what said character is. Alternatively, with the IDs above, you
might be able to find out through the end-user interface.
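
If the raw bytes can be dumped (for example with MySQL's HEX() function on
term_text for rows 247135 and 254476), something like the following Python
sketch could name the code points involved. The function name is made up, and
the sample input is purely hypothetical: E2808B is the UTF-8 encoding of
U+200B, chosen only as an example, since the actual bytes in those rows are
not known.

```python
import unicodedata


def describe_hex(hex_bytes):
    """Decode a hex dump as UTF-8 and describe each code point in it."""
    text = bytes.fromhex(hex_bytes).decode("utf-8", errors="replace")
    described = []
    for ch in text:
        name = unicodedata.name(ch, "<unnamed>")
        category = unicodedata.category(ch)
        described.append(f"U+{ord(ch):04X} {category} {name}")
    return described


# Hypothetical input only; substitute the real HEX(term_text) output.
for line in describe_hex("E2808B"):
    print(line)
```

The general category in the output would hint at whether the character is
one that a search-key normalization might reasonably reduce to nothing.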


I can/will start the script again when the replag is fixed. In the meantime,
finding out whether the above is right, wrong, or something we don't care
about would be useful.

