Sergio,

The ClassCastException and the NoSuchMethodException you posted on d...@ suggest a classpath problem. Please post the details of your deployment: which JARs you are using, app server details, etc.
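One way to gather those details is to print where the classes named in the stack trace are actually loaded from. The trace mixes the org.apache.pdfbox and org.pdfbox packages, which may mean that two PDFBox versions are visible to the same class loader. The following is only a rough sketch: the class name ClasspathCheck and the helper locate are made up for illustration, and the result is only meaningful when the code runs inside the same deployment (same class loader) as the repository.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Jackrabbit, Tika or PDFBox.
class ClasspathCheck {

    /** Returns the JAR or directory a class is loaded from, or a short note if that cannot be determined. */
    public static String locate(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // 'false' avoids running static initializers of the inspected class.
            Class<?> type = Class.forName(className, false, loader);
            CodeSource source = type.getProtectionDomain().getCodeSource();
            return source == null
                    ? "(bootstrap or unknown source)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        } catch (LinkageError e) {
            return "(found but could not be linked: " + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in the original message.
        String[] names = {
            "org.apache.pdfbox.util.PDFStreamEngine",
            "org.pdfbox.util.operator.ShowTextGlyph",
            "org.apache.tika.parser.pdf.PDFParser"
        };
        for (String name : names) {
            System.out.println(name + " -> " + locate(name));
        }
    }
}
```

If both the org.apache.pdfbox and the org.pdfbox classes resolve to a JAR, listing those JAR names in the reply would help narrow down the conflict.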
Justin

On Thu, Dec 16, 2010 at 9:31 AM, Rojas Buitrago, Sergio <[email protected]> wrote:

> Hello.
>
> I'm new to Jackrabbit.
>
> I'm trying to index the content of different types of documents (Word,
> PDF, XML, …).
>
> I've configured the SearchIndex in my workspace.xml in this way:
>
> <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
>   <param name="path" value="${wsp.home}/index"/>
>   <param name="supportHighlighting" value="true"/>
>   <param name="textFilterClasses"
>          value="org.apache.jackrabbit.extractor.MsWordTextExtractor,
>                 org.apache.jackrabbit.extractor.MsExcelTextExtractor,
>                 org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
>                 org.apache.jackrabbit.extractor.PdfTextExtractor,
>                 org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
>                 org.apache.jackrabbit.extractor.RTFTextExtractor,
>                 org.apache.jackrabbit.extractor.HTMLTextExtractor,
>                 org.apache.jackrabbit.extractor.XMLTextExtractor"/>
> </SearchIndex>
>
> When I create a document in the repository, I add the content in this way:
>
> contenido = nodo.addNode("jcr:content", "nt:resource");
> contenido.setProperty("jcr:data",
>     J_OperacionesSesion.getValueFactory().createBinary(is));
>
> MimetypesFileTypeMap mimetypes = new MimetypesFileTypeMap();
> String mime = mimetypes.getContentType(nodo.getName());
> contenido.setProperty("jcr:mimeType", "application/pdf");
>
> After creating the document, this warning is logged:
>
> 16.12.2010 13:03:32 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 180)
> org.apache.tika.exception.TikaException: Unable to extract PDF content
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:61)
>     at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
>     at org.apache.jackrabbit.core.query.lucene.JackrabbitParser.parse(JackrabbitParser.java:189)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:174)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>     at java.lang.Thread.run(Thread.java:595)
> Caused by: org.apache.pdfbox.exceptions.WrappedIOException: OperatorProcessor class org.pdfbox.util.operator.ShowTextGlyph could not be instantiated
>     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:152)
>     at org.apache.pdfbox.util.PDFTextStripper.<init>(PDFTextStripper.java:129)
>     at org.apache.tika.parser.pdf.PDF2XHTML.<init>(PDF2XHTML.java:69)
>     at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
>     ... 13 more
> Caused by: java.lang.ClassCastException: org.pdfbox.util.operator.ShowTextGlyph
>     at org.apache.pdfbox.util.PDFStreamEngine.<init>(PDFStreamEngine.java:146)
>     ... 16 more
>
> Later, when I search for the document, filtering by content, in this way:
>
> String consulta = "SELECT * FROM [arch:documento] AS documento WHERE CONTAINS ( documento.*, 'ubicacion')";
>
> (arch:documento extends nt:file)
>
> no documents are found.
>
> Can you help me, please?
>
> Thanks and regards,
>
> Sergio Rojas Buitrago
