Hello everyone,
I recently got an "AttributeError: 'JsonReader' object has no attribute
'decode'" error from the bibindex task.
What I did before getting the error (with bibsched in 'manual' mode and
no running jobs) was:
a) purge all index tables related to index id 20 (authority author), and
b) remove the 'authority author' set of fields from the authority author
index.
The first part is common practice and I don't think it could be related
to the error.
The second part is the most suspicious, because the respective index now
has NO related fields (this is how I want it to be for the time being,
until a way is found to force the tokenizer to deal only with records
from the AUTHORITY collection).
Indeed, if I reconnect the "authority author" set of fields to the
"authority author" index, things go back to normal.
Could you replicate it in a stock/demo Invenio site?
If so, I would happily create a ticket so that similar situations
(indexes with no fields to get data from) are handled properly in the
future.
The excerpt from the bibedit log is the following:
2014-02-07 12:01:35 --> No new authority records added. idxPHRASE18F is up to date
2014-02-07 12:01:35 --> idxPHRASE18F contains 230 words from 87272 records
2014-02-07 12:01:35 --> idxPHRASE18F is in consistent state
2014-02-07 12:01:35 --> idxWORD20F contains 0 words from 0 records
2014-02-07 12:01:35 --> idxWORD20F is in consistent state
2014-02-07 12:01:35 --> idxWORD20F for 204053-204053 is in consistent state
2014-02-07 12:01:35 --> idxWORD20F adding records #204053-#204053 started
2014-02-07 12:01:35 --> Exception caught: 'JsonReader' object has no attribute 'decode'
2014-02-07 12:01:36 --> idxWORD20F normal wordtable flush started
2014-02-07 12:01:36 --> ...updating 0 words into idxWORD20F started
2014-02-07 12:01:36 --> ...updating 0 words into idxWORD20F ended
2014-02-07 12:01:36 --> ...updating reverse table idxWORD20R started
2014-02-07 12:01:36 --> ...updating reverse table idxWORD20R ended
2014-02-07 12:01:36 --> idxWORD20F normal wordtable flush ended
2014-02-07 12:01:36 --> Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 996, in _task_run
    if callable(task_run_fnc) and task_run_fnc():
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 1435, in task_run_core
    wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush"))
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 821, in add_recIDs_by_date
    self.add_recIDs(alist, opt_flush)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 757, in add_recIDs
    just_processed = self.add_recID_range(i_low, i_high)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 885, in add_recID_range
    new_words = tokenizing_function(record)
  File "/opt/invenio/lib/python/invenio/bibindex_tokenizers/BibIndexAuthorTokenizer.py", line 334, in tokenize_for_words
    return self.tokenize_for_words_default(phrase)
  File "/opt/invenio/lib/python/invenio/bibindex_tokenizers/BibIndexAuthorTokenizer.py", line 299, in tokenize_for_words_default
    return super(BibIndexAuthorTokenizer, self).tokenize_for_words(phrase)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_tokenizers/BibIndexDefaultTokenizer.py", line 80, in tokenize_for_words
    phrase = wash_for_utf8(phrase)
  File "/usr/lib64/python2.6/site-packages/invenio/textutils.py", line 405, in wash_for_utf8
    text.decode("utf-8")
AttributeError: 'JsonReader' object has no attribute 'decode'
2014-02-07 12:01:37 --> Task #255 finished but not resubmitted. [CERROR]
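To illustrate how I read the traceback, here is a minimal sketch in plain Python. It is not Invenio code: FakeJsonReader, naive_wash, guarded_wash and error_message are hypothetical names of my own, and the guard is only one possible way such a case could be reported more clearly. My assumption is that, with no fields attached to the index, the tokenizer is handed the record object itself instead of a string, so the decode call inside the washing step fails.

```python
class FakeJsonReader(object):
    """Hypothetical stand-in for the record object; NOT Invenio's JsonReader."""
    pass


def naive_wash(text):
    # Mirrors only the failing call shown in the traceback:
    # it assumes that `text` always has a .decode() method.
    text.decode("utf-8")
    return text


def guarded_wash(text):
    # Hypothetical guard (an assumption, not an existing Invenio check):
    # reject anything that is not a byte string with an explicit message.
    if not isinstance(text, bytes):
        raise TypeError("expected a byte string, got %s" % type(text).__name__)
    text.decode("utf-8")
    return text


def error_message(func, arg):
    # Helper for the demonstration: returns "ExceptionName: message",
    # or None when the call succeeds.
    try:
        func(arg)
    except Exception as exc:
        return "%s: %s" % (type(exc).__name__, exc)
    return None


if __name__ == "__main__":
    record = FakeJsonReader()
    print(error_message(naive_wash, record))    # an AttributeError about 'decode'
    print(error_message(guarded_wash, record))  # a TypeError naming the bad type
    print(error_message(guarded_wash, b"Ellis, J"))  # None, the call succeeds
```

The first call fails in the same way as the bibindex task above, while the guarded variant at least names the unexpected type. Whether the proper fix belongs in the tokenizer or in the code that selects the fields for an index is something I cannot tell from the traceback alone.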
Cheers,
Theodoros Theodoropoulos