It didn't work for me :( Thanks anyway. I have tried all the MIME types (msword, openoffice, ...) and none of my documents come back in the search results! I really don't know what is missing! If anyone has any idea...
Thanks

On Thu, Feb 28, 2008 at 2:26 PM, Sean Callan <[EMAIL PROTECTED]> wrote:

> Hi Katia,
>
> This was an issue I was wrestling with for some time and I hope that these
> emails will lead to changes on the Jackrabbit website. There are additional
> dependencies not listed for the text extractors.
>
> http://poi.apache.org/
> http://www.pdfbox.org/
> tm-extractors-0.4.jar <http://www.pdfbox.org/tm-extractors-0.4.jar> (lost
> the url)
>
> These are not directly mentioned on the site; it only says that you should
> look in one of the Maven jars. There is absolutely no reason these should
> not be listed as dependencies themselves.
>
> Hope this works for you.
>
> On Thu, Feb 28, 2008 at 8:05 AM, Katia Santos <[EMAIL PROTECTED]> wrote:
>
> > Hello,
> >
> > I'm trying to search binary content with the following query:
> > //*[jcr:contains(jcr:data,'myword')] but I don't get any results.
> >
> > I know that my node has to be of type nt:resource, and has to have the
> > properties jcr:data, jcr:mimeType and jcr:lastModified.
> > Can the parent node of this resource node be of any type, or does it
> > have to be of a specific type?
> >
> > I'm doing something like this:
> >
> >     Node parentNode = noActual.addNode(IConstantsEcm.MY_PARENT_NODE);
> >     Node childNode = parentNode.addNode(IConstantsEcm.MY_CHILD_NODE,
> >             "nt:resource");
> >     childNode.setProperty("jcr:data", binaryData);
> >     childNode.setProperty("jcr:mimeType", "application/pdf");
> >     Calendar c = new GregorianCalendar();
> >     childNode.setProperty("jcr:lastModified", c);
> >
> > When I create the parent node I'm not specifying the type, so it is
> > going to be an unstructured node. Is it possible to do a full-text
> > search on a resource node that is a child of an unstructured node?
> >
> > If it is, can someone please tell me what's missing?
> >
> > In my workspace configuration I have:
> >
> > <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> >   ....
> >   <param name="analyzer"
> >          value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>
> >   <param name="queryClass"
> >          value="org.apache.jackrabbit.core.query.QueryImpl"/>
> >   ...
> >   <param name="textFilterClasses"
> >          value="org.apache.jackrabbit.core.query.MsExcelTextExtractor,
> >                 org.apache.jackrabbit.core.query.MsPowerPointTextExtractor,
> >                 org.apache.jackrabbit.core.query.MsWordTextExtractor,
> >                 org.apache.jackrabbit.core.query.PdfTextExtractor,
> >                 org.apache.jackrabbit.core.query.HTMLTextExtractor,
> >                 org.apache.jackrabbit.core.query.XMLTextExtractor,
> >                 org.apache.jackrabbit.core.query.RTFTextExtractor,
> >                 org.apache.jackrabbit.core.query.OpenOfficeTextExtractor"/>
> >   ....
> > </SearchIndex>
> >
> > Maybe I'm missing something!!
> >
> > Thanks
