Hello everyone,

I recently got an "AttributeError: 'JsonReader' object has no attribute 'decode'" error from the bibindex task.

What I did before getting the error (with bibsched in 'manual' mode and no running jobs) was: a) purge all index tables related to index id 20 (authority author), and b) remove the 'authority author' set of fields from the authority author index.

The first part is common practice and I don't think it is related to the error. The second part is the more suspicious one, because the respective index now has NO related fields (this is how I want it to be for the time being, until a way is found to force the tokenizer to deal only with records from the AUTHORITY collection).

Indeed, if I reconnect the "authority author" set of fields to the "authority author" index, things go back to normal.

Could you replicate it in a stock/demo Invenio site?
If you can, I would happily create a ticket so that similar situations (indexes with no fields to get data from) are handled properly in the future.

The excerpt from the bibedit log is the following:

2014-02-07 12:01:35 --> No new authority records added. idxPHRASE18F is up to date
2014-02-07 12:01:35 --> idxPHRASE18F contains 230 words from 87272 records
2014-02-07 12:01:35 --> idxPHRASE18F is in consistent state
2014-02-07 12:01:35 --> idxWORD20F contains 0 words from 0 records
2014-02-07 12:01:35 --> idxWORD20F is in consistent state
2014-02-07 12:01:35 --> idxWORD20F for 204053-204053 is in consistent state
2014-02-07 12:01:35 --> idxWORD20F adding records #204053-#204053 started
2014-02-07 12:01:35 --> Exception caught: 'JsonReader' object has no attribute 'decode'
2014-02-07 12:01:36 --> idxWORD20F normal wordtable flush started
2014-02-07 12:01:36 --> ...updating 0 words into idxWORD20F started
2014-02-07 12:01:36 --> ...updating 0 words into idxWORD20F ended
2014-02-07 12:01:36 --> ...updating reverse table idxWORD20R started
2014-02-07 12:01:36 --> ...updating reverse table idxWORD20R ended
2014-02-07 12:01:36 --> idxWORD20F normal wordtable flush ended
2014-02-07 12:01:36 --> Traceback (most recent call last):
  File "/usr/lib64/python2.6/site-packages/invenio/bibtask.py", line 996, in _task_run
    if callable(task_run_fnc) and task_run_fnc():
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 1435, in task_run_core
    wordTable.add_recIDs_by_date(task_get_option("modified"), task_get_option("flush"))
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 821, in add_recIDs_by_date
    self.add_recIDs(alist, opt_flush)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 757, in add_recIDs
    just_processed = self.add_recID_range(i_low, i_high)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_engine.py", line 885, in add_recID_range
    new_words = tokenizing_function(record)
  File "/opt/invenio/lib/python/invenio/bibindex_tokenizers/BibIndexAuthorTokenizer.py", line 334, in tokenize_for_words
    return self.tokenize_for_words_default(phrase)
  File "/opt/invenio/lib/python/invenio/bibindex_tokenizers/BibIndexAuthorTokenizer.py", line 299, in tokenize_for_words_default
    return super(BibIndexAuthorTokenizer, self).tokenize_for_words(phrase)
  File "/usr/lib64/python2.6/site-packages/invenio/bibindex_tokenizers/BibIndexDefaultTokenizer.py", line 80, in tokenize_for_words
    phrase = wash_for_utf8(phrase)
  File "/usr/lib64/python2.6/site-packages/invenio/textutils.py", line 405, in wash_for_utf8
    text.decode("utf-8")
AttributeError: 'JsonReader' object has no attribute 'decode'
2014-02-07 12:01:37 --> Task #255 finished but not resubmitted. [CERROR]
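
Reading the traceback, it looks as if the record object itself (the 'JsonReader'), rather than a phrase string, reaches the tokenizer and then wash_for_utf8, which calls .decode() on it. I have not verified this in the Invenio source, so the following is only a minimal sketch of that suspected mechanism and of one possible guard. FakeRecord, wash_bytes and tokenize_guarded are hypothetical names made up for illustration; they are not Invenio functions.

```python
class FakeRecord(object):
    """Hypothetical stand-in for the record object ('JsonReader' in the log)."""


def wash_bytes(text):
    """Simplified stand-in for a UTF-8 washing helper; expects a byte string."""
    text.decode("utf-8")  # raises AttributeError if 'text' has no decode method
    return text


def tokenize_guarded(value):
    """Return whitespace-separated words, or [] for input without .decode()."""
    if not hasattr(value, "decode"):
        # Hypothetical guard: skip anything that is not a byte string,
        # e.g. a whole record passed in when the index has no fields.
        return []
    return wash_bytes(value).decode("utf-8").split()
```

Calling wash_bytes(FakeRecord()) fails with the same kind of AttributeError as in the log, while tokenize_guarded(FakeRecord()) simply yields no words. Whether silently skipping is the right behaviour for an index with no fields is a separate question, possibly one for the ticket.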


Cheers,
Theodoros Theodoropoulos
