Hi all,

I have a script that collects data from multiple Wikipedias, and I have
started to notice that the revision table in lbwiki_p has some incorrect data.

Here is an example:
mysql> select rev_id, rev_user, rev_page, rev_deleted, rev_len,
rev_timestamp from revision where rev_id = 185751;
+--------+----------+----------+-------------+---------+----------------+
| rev_id | rev_user | rev_page | rev_deleted | rev_len | rev_timestamp  |
+--------+----------+----------+-------------+---------+----------------+
| 185751 |      580 |    83446 |           0 |    NULL | 20061203231418 |
+--------+----------+----------+-------------+---------+----------------+

mysql> select rev_id, rev_page, rev_len from revision where rev_page =
83446 and rev_timestamp < 20061203231418;
+--------+----------+---------+
| rev_id | rev_page | rev_len |
+--------+----------+---------+
| 115478 |    83446 |    NULL |
| 118003 |    83446 |    NULL |
| 118009 |    83446 |    NULL |
| 138010 |    83446 |    NULL |
+--------+----------+---------+

As I understand it, if a revision record exists, rev_len should not be NULL.
If the revision is deleted, rev_deleted should be flagged, but rev_len
should remain unchanged.

I hope someone can look into this, because people doing analysis on this
data might end up with wrong results.
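
To illustrate the concern, here is a minimal sketch of how the NULL values can
skew an analysis without any error being raised. It uses an in-memory SQLite
table as a stand-in for the real revision table (an assumption so that the
example runs on its own). The helper name rev_len_summary and the row with
rev_id 999999 are hypothetical; the other rows mirror the output above. SQL
aggregates such as AVG skip NULLs, so the average is computed from the known
lengths only, and the script has to count the missing ones explicitly.

```python
import sqlite3

# In-memory stand-in for the revision table; the real data lives in lbwiki_p.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE revision ("
    "rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_len INTEGER)"
)
rows = [
    (115478, 83446, None),
    (118003, 83446, None),
    (118009, 83446, None),
    (138010, 83446, None),
    (185751, 83446, None),
    (999999, 83446, 1200),  # hypothetical revision with a known length
]
conn.executemany("INSERT INTO revision VALUES (?, ?, ?)", rows)


def rev_len_summary(conn, page_id):
    """Return (total revisions, revisions with NULL rev_len, AVG of known rev_len)."""
    return conn.execute(
        "SELECT COUNT(*), SUM(rev_len IS NULL), AVG(rev_len) "
        "FROM revision WHERE rev_page = ?",
        (page_id,),
    ).fetchone()


total, missing, avg_known = rev_len_summary(conn, 83446)
print(total, missing, avg_known)
```

In this sketch the average reflects only the single revision with a known
length, even though the page has six revisions, so a script that does not
check the missing count would report a misleading figure.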

Best,
--
Anuradha Uduwage (Anu)
_______________________________________________
Toolserver-l mailing list ([email protected])
https://lists.wikimedia.org/mailman/listinfo/toolserver-l
Posting guidelines for this list: 
https://wiki.toolserver.org/view/Mailing_list_etiquette
