Many thanks, Samuele. Your suggestion and patch worked as promised.
Devin
On Sep 9, 2008, at 1:28 PM, Samuele Kaplun wrote:
Dear Devin,
On Tuesday 09 September 2008 18:38:39, Devin Bougie wrote:
    stemming_language=self.stemming_language) # ,self.separators
  File "/usr/lib/python2.4/site-packages/invenio/bibindex_engine.py", line 300, in get_words_from_fulltext
    if ext[0] == '.':
IndexError: string index out of range
Thanks for reporting this. This is a bug that is currently triggered
when you try to index a document whose filename has no extension, or
has an extension that is not recognized (e.g. testword.docx in your
case). To fix it, you can apply the patch I'm attaching to
bibindex_engine.py. The bugfix will also be included in the next
release. This will solve the crash.
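
For illustration only (this is not the attached patch, whose contents
are not shown in this message): the traceback fails on `ext[0]` because
indexing an empty string raises IndexError. A minimal sketch of a
guarded check, assuming the extension is derived with os.path.splitext
and using a hypothetical helper name, could look like this:

```python
import os


def normalize_extension(filename):
    """Return the lower-cased extension without its leading dot.

    Hypothetical helper, not part of Invenio. Returns '' when the
    filename has no extension at all.
    """
    ext = os.path.splitext(filename)[1]  # '' for a name such as 'README'
    # ext[0] would raise IndexError on an empty string;
    # str.startswith() is safe for any length, including zero.
    if ext.startswith('.'):
        ext = ext[1:]
    return ext.lower()
```

With this guard, a file without an extension simply yields an empty
string instead of crashing the indexer.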
Moreover, to have special formats recognized, you should set the
following whitelist variable in invenio.conf (or invenio-local.conf);
this is the default configuration in the next release:
CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS = hpg,link,lis,llb,mat,mpp,msg,docx,docm,xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm
This way, the new Microsoft Office 2007 document formats will be
recognized.
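
As a rough sketch of how such a comma-separated whitelist could be
applied (how Invenio itself reads this variable is not shown in this
message, and both function names below are hypothetical), the value can
be split into a set and a filename's extension checked against it:

```python
# Value copied from the configuration line in this message.
ADDITIONAL_KNOWN_FILE_EXTENSIONS = (
    "hpg,link,lis,llb,mat,mpp,msg,docx,docm,"
    "xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm"
)


def parse_extension_list(value):
    """Split a comma-separated string into a set of lower-cased extensions."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def has_known_extension(filename, known):
    """Return True if the text after the last dot is in the whitelist."""
    _, dot, ext = filename.rpartition(".")
    # dot is '' when the filename contains no dot at all.
    return bool(dot) and ext.lower() in known


known = parse_extension_list(ADDITIONAL_KNOWN_FILE_EXTENSIONS)
```

Here has_known_extension("testword.docx", known) is true, while a name
with no extension, such as "README", is not matched.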
Best regards,
Samuele
--
.O.
..O
OOO
<bibindex_engine.patch>