Greetings to everyone,
We have performed a restore operation (using Tibor's script,
invenio-restore-site) on a newly created Invenio site, using a backup made
on a different server containing more than 5 million records. All this was
done to "replicate" the original server. Everything went as expected: we can
see the data and perform many operations. For example, we have added new
collections, and they are perfectly usable after a proper webcoll run.
The problem is that, after inserting records, when we call bibindex like
this:
$ ..... /bin/bibindex -u admin -f 100000 -w title,author
we get this error:
....
2014-05-22 08:51:49 --> idxWORD08F adding records #3001-#4000 started
2014-05-22 08:51:52 --> Exception caught: (1062, "Duplicate entry
'3001-FUTURE' for key 'PRIMARY'")
2014-05-22 08:51:55 --> idxWORD08F normal wordtable flush started
....
with this traceback:
2014-05-22 08:53:14 --> Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/invenio/bibtask.py", line
984, in _task_run
    if callable(task_run_fnc) and task_run_fnc():
  File "/usr/local/lib/python2.7/dist-packages/invenio/bibindex_engine.py",
line 1860, in task_run_core
    wordTable.add_recIDs(final_recIDs, task_get_option("flush"))
  File "/usr/local/lib/python2.7/dist-packages/invenio/bibindex_engine.py",
line 846, in add_recIDs
    just_processed = self.add_recID_range(i_low, i_high)
  File "/usr/local/lib/python2.7/dist-packages/invenio/bibindex_engine.py",
line 940, in add_recID_range
    self.index_virtual_indexes_reversed(wlist, recID1, recID2)
  File "/usr/local/lib/python2.7/dist-packages/invenio/bibindex_engine.py",
line 1027, in index_virtual_indexes_reversed
    run_sql("INSERT INTO %s (id_bibrec,termlist,type) VALUES
(%%s,%%s,'FUTURE')" % wash_table_column_name(tab_name), (recID,
serialize_via_marshal(to_serialize))) # kwalitee: disable=sql
  File "/usr/local/lib/python2.7/dist-packages/invenio/dbquery.py", line
216, in run_sql
    rc = cur.execute(sql, param)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py", line 174, in
execute
    self.errorhandler(self, exc, value)
  File "/usr/lib/python2.7/dist-packages/MySQLdb/connections.py", line 36,
in defaulterrorhandler


Checking the indexes hasn't helped: bibindex reports that all indexes are OK
(running $ ..... /bin/bibindex -u admin -k).
We then tried a manual reindex of the tables, following the instructions
described in:
http://invenio-demo.cern.ch/help/admin/bibindex-admin-guide#4.3
But we get the same errors (while preparing the tmp tables):
...
2014-05-21 19:48:20 --> tmp_idxPAIR02F adding records #3950001-#3951000
started
2014-05-21 19:48:21 --> Exception caught: (1062, "Duplicate entry
'3950592-FUTURE' for key 'PRIMARY'")
2014-05-21 19:48:23 --> tmp_idxPAIR02F normal wordtable flush started
...
[Same stacktrace]

As a wild guess, we also tried truncating the index tables that were raising
errors during the update (e.g. for title: TRUNCATE idxWORD08F; TRUNCATE
idxWORD08R), but we got the same result. The last thing we did was to clear
the global index tables idxWORD01F/R and start bibindex again, which will
certainly take more than a day. In the meantime, we thought that someone
might already have run into a similar problem or have an idea of what to try.
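
In case it helps anyone look at the same thing, here is a minimal sketch (our
own, not part of Invenio) of a query that lists records already holding a
'FUTURE' row in a reverse index table, since that is what the duplicate-key
error seems to point at. The helper name future_rows_query and the table name
idxWORD01R are assumptions on our side; the column names id_bibrec and type
are taken from the INSERT statement in the traceback.

```python
import re


def future_rows_query(table_name):
    """Build a SELECT listing record IDs that already have a 'FUTURE' row.

    Hypothetical helper for diagnosis only; it just returns the SQL text,
    which can then be run in the mysql client or through a DB-API cursor.
    """
    # The table name is interpolated into the SQL, so accept only plain
    # identifiers (letters, digits, underscore).
    if not re.match(r"^[A-Za-z0-9_]+$", table_name):
        raise ValueError("unexpected table name: %r" % (table_name,))
    return ("SELECT id_bibrec FROM %s WHERE type='FUTURE' "
            "ORDER BY id_bibrec" % table_name)


# idxWORD01R is an assumption: replace it with whichever reverse table the
# failing INSERT is actually targeting.
print(future_rows_query("idxWORD01R"))
```

If such a query returns rows such as 3001 or 3950592, that would at least
confirm that leftover 'FUTURE' entries came along with the restored backup.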

Thanks.

-- 
Mauricio Acebal
Senior Software Engineer

Frontiers <http://www.frontiersin.org/>
Centro de Empresas - UPM
Campus de Montegancedo
28223 Pozuelo de Alarcón
Madrid
