Many thanks, Samuele.  Your suggestion and patch worked as promised.

Devin

On Sep 9, 2008, at 1:28 PM, Samuele Kaplun wrote:

Dear Devin,

On Tuesday 09 September 2008 18:38:39 Devin Bougie wrote:
stemming_language=self.stemming_language) # ,self.separators
  File "/usr/lib/python2.4/site-packages/invenio/bibindex_engine.py",
line 300, in get_words_from_fulltext
    if ext[0] == '.':
IndexError: string index out of range

Thanks for reporting this. It is a bug, triggered when you try to index a document whose filename has no extension or has an unrecognized extension (in your case, testword.docx). To fix it, apply the patch I'm attaching to bibindex_engine.py. The patch solves the crash, and the bugfix will also be included in the next release.
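
To illustrate the failure mode: indexing ext[0] raises IndexError when ext is an empty string, which is what happens for a filename with no extension. The following is only a minimal sketch of one way to guard against that, not the content of the attached patch; the helper name normalize_extension is hypothetical and is not part of bibindex_engine.py.

```python
import os


def normalize_extension(filename):
    """Return the lowercased extension without its leading dot, or ''.

    Hypothetical helper for illustration; not Invenio code.
    """
    ext = os.path.splitext(filename)[1]
    # ext[0] would raise IndexError on an empty string, whereas
    # startswith() is safe to call on ''.
    if ext.startswith('.'):
        ext = ext[1:]
    return ext.lower()
```

With this guard, a name such as "README" simply yields an empty extension instead of crashing the indexing run.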

Moreover, to have special formats recognized, you should set the
following white-list variable in invenio(-local).conf (this is the default
configuration in the next release):

CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS =
hpg,link,lis,llb,mat,mpp,msg,docx,docm,xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm

This way, the new Microsoft Office 2007 document formats will be recognized.
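
As a rough sketch of how a comma-separated white-list like this could be consulted, the snippet below parses the value into a set and checks a filename against it. This is an assumption about the general idea only: the function names parse_extension_list and is_known_extension are hypothetical, and the way Invenio actually reads and uses the setting may differ.

```python
import os

# Value copied from the configuration variable quoted in this message.
ADDITIONAL_KNOWN_FILE_EXTENSIONS = (
    "hpg,link,lis,llb,mat,mpp,msg,docx,docm,"
    "xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm"
)


def parse_extension_list(value):
    """Split a comma-separated list, ignoring stray spaces and empty items."""
    return set(item.strip().lower() for item in value.split(",") if item.strip())


def is_known_extension(filename, known):
    """Return True if the filename's extension is in the white-list."""
    ext = os.path.splitext(filename)[1]
    # An empty ext means the file has no extension at all.
    return bool(ext) and ext[1:].lower() in known


known = parse_extension_list(ADDITIONAL_KNOWN_FILE_EXTENSIONS)
```

Stripping each item means a stray space around a comma in the configuration value does not prevent a match.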

Best regards,
        Samuele
--
.O.
..O
OOO
<bibindex_engine.patch>

