Dear Devin,
On Tuesday 09 September 2008 18:38:39 Devin Bougie wrote:
> stemming_language=self.stemming_language) # ,self.separators
> File "/usr/lib/python2.4/site-packages/invenio/bibindex_engine.py",
> line 300, in get_words_from_fulltext
> if ext[0] == '.':
> IndexError: string index out of range
Thanks for reporting this. It is a bug that is triggered when you try to
index a document whose filename has no extension, or has an extension that
is not recognized (in your case, testword.docx): the extension is then an
empty string, so ext[0] raises the IndexError. To fix it, just apply the
patch I'm attaching to bibindex_engine.py; this solves the crash. The bugfix
will also be included in the next release.
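To illustrate the difference between the two checks, here is a minimal
standalone sketch. The two function names are made up for this example and
do not exist in Invenio; only the `if` conditions are taken from the patch.

```python
def strip_leading_dot_unsafe(ext):
    # Pre-patch check: indexing an empty string raises IndexError.
    if ext[0] == '.':
        ext = ext[1:].lower()
    return ext


def strip_leading_dot_safe(ext):
    # Patched check: startswith() simply returns False for ''.
    if ext.startswith('.'):
        ext = ext[1:].lower()
    return ext


if __name__ == "__main__":
    print(strip_leading_dot_safe('.PDF'))
    print(repr(strip_leading_dot_safe('')))
    try:
        strip_leading_dot_unsafe('')
    except IndexError:
        print("IndexError, as in the traceback above")
```

With the patched check, an empty extension is passed through unchanged
instead of crashing the indexer.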
Moreover, to have such special formats recognized, you should set the
following white-list variable in invenio.conf (or invenio-local.conf); this
is the default configuration in the next release:
CFG_WEBSUBMIT_ADDITIONAL_KNOWN_FILE_EXTENSIONS =
hpg,link,lis,llb,mat,mpp,msg,docx,docm,xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm
This way the new Microsoft Office 2007 document formats will be recognized.
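Roughly speaking, a white-list like this can be used as in the sketch below.
This is only an illustration and not the actual Invenio code: the function
names parse_extension_whitelist and recognized_extension are hypothetical,
and the real lookup in Invenio may well work differently.

```python
import os

# Value copied from the setting above.
ADDITIONAL_KNOWN_FILE_EXTENSIONS = (
    "hpg,link,lis,llb,mat,mpp,msg,docx,docm,xlsx,xlsm,xlsb,pptx,pptm,ppsx,ppsm"
)


def parse_extension_whitelist(value):
    """Split a comma-separated setting into a set of lower-case extensions."""
    return set([item.strip().lower() for item in value.split(",") if item.strip()])


def recognized_extension(filename, whitelist):
    """Return '.ext' if the file's extension is white-listed, else ''."""
    ext = os.path.splitext(filename)[1]  # '.docx', or '' if there is none
    if ext[1:].lower() in whitelist:
        return ext.lower()
    return ""


if __name__ == "__main__":
    whitelist = parse_extension_whitelist(ADDITIONAL_KNOWN_FILE_EXTENSIONS)
    print(recognized_extension("testword.docx", whitelist))
    print(repr(recognized_extension("testword", whitelist)))
```

Note that an unknown or missing extension gives an empty string here, which
is exactly the case that the old ext[0] check could not handle.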
Best regards,
Samuele
--
.O.
..O
OOO
Index: bibindex_engine.py
===================================================================
RCS file: /log/cvsroot/cds-invenio/modules/bibindex/lib/bibindex_engine.py,v
retrieving revision 1.73
diff -u -r1.73 bibindex_engine.py
--- bibindex_engine.py 12 Aug 2008 10:03:29 -0000 1.73
+++ bibindex_engine.py 9 Sep 2008 17:24:29 -0000
@@ -289,7 +289,7 @@
     if bibdocfile_url_p(url_direct_or_indirect):
         write_message("... url %s is an internal url" % url_direct_or_indirect, verbose=9)
         ext = decompose_bibdocfile_url(url_direct_or_indirect)[2]
-        if ext[0] == '.':
+        if ext.startswith('.'):
             ext = ext[1:].lower()
         fulltext_urls = [(ext, url_direct_or_indirect)]
     else: