Text Extractor issue?

hsp_ Wed, 16 Jul 2008 06:14:56 -0700

Hi,

I have added a large number of files to the repository, some of them with the
".sxw" extension. Their type was detected (by sun.net.www.MimeTable and, if
that fails, by a table of MIME types in my application) as
"application/vnd.sun.xml.writer", and jcr:mimeType was set to this value.
Recently I tried to search for some .sxw documents by content and they were
not returned. Looking at the class OpenOfficeTextExtractor, I saw that only
the following MIME types are listed:

"application/vnd.oasis.opendocument.database",
"application/vnd.oasis.opendocument.formula",
"application/vnd.oasis.opendocument.graphics",
"application/vnd.oasis.opendocument.presentation",
"application/vnd.oasis.opendocument.spreadsheet",
"application/vnd.oasis.opendocument.text"

Is it true that only these would be recognized and indexed by the extractor?
Does this mean that my application must force the MIME type to one of those
in this list for extensions that map to a different MIME type? Is the class
able to index this kind of OpenOffice format at all?
What is the solution for my case?


(I am thinking about updating jcr:mimeType to the
"application/vnd.oasis.opendocument.text" value and rebuilding the indexes.
Would this resolve the problem for the moment?)
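
A minimal sketch of the mapping step behind that workaround, assuming the
extractor can actually parse .sxw content once the MIME type matches (not
verified here). The class MimeTypeNormalizer and its method are hypothetical
names, not part of Jackrabbit, and the only mapping included is the one
discussed above:

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Jackrabbit: rewrites a legacy
// OpenOffice.org 1.x MIME type to a type listed in OpenOfficeTextExtractor.
final class MimeTypeNormalizer {

    // Assumption: only the .sxw case from this message is mapped;
    // other legacy types would need their own entries.
    private static final Map<String, String> LEGACY_TO_ODF = Map.of(
            "application/vnd.sun.xml.writer",
            "application/vnd.oasis.opendocument.text");

    private MimeTypeNormalizer() {
    }

    // Returns the mapped type for a known legacy type, otherwise the
    // input unchanged. A null input yields null.
    static String normalize(String mimeType) {
        if (mimeType == null) {
            return null;
        }
        String key = mimeType.trim().toLowerCase(Locale.ROOT);
        return LEGACY_TO_ODF.getOrDefault(key, mimeType);
    }
}
```

The application would call MimeTypeNormalizer.normalize(detectedType) before
setting jcr:mimeType on new files. Existing nodes would still need their
property updated and their content re-indexed.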
-- 
View this message in context: 
http://www.nabble.com/Text-Extractor-issue--tp18487165p18487165.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
