This is what I have; I didn't alter it so I believe it's the default:

<!-- Solr Cell: http://wiki.apache.org/solr/ExtractingRequestHandler -->
<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler"
                startup="lazy">
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

-----Original Message-----
From: Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Monday, January 03, 2011 8:10 PM
To: solr-user@lucene.apache.org
Subject: Re: Setting up Solr for PDFs on JBoss

What's your solrconfig.xml look like for setting up the ExtractingReqHandler?

-Grant

On Jan 3, 2011, at 4:44 PM, Olson, Ron wrote:

> Hi all-
>
> After testing the PDF import functionality in my local copy of Solr 1.4.1
> with the included Jetty app server, I tried replicating it using my copy of
> Solr running in JBoss 5.10 (which uses Tomcat as its servlet container). When
> I try to add a PDF, I get an error buried in the stack trace:
>
> Caused by: org.apache.solr.common.SolrException: Error Instantiating Request
> Handler, org.apache.solr.handler.extraction.ExtractingRequestHandler is not a
> org.apache.solr.request.SolrRequestHandler
>
>
> I am using multiple cores, but they all use the common "lib" directory,
> instead of the "core/lib" directory. This lib directory is what is added to
> the classpath when JBoss starts ($JBOSS_HOME/server/solr_test/lib), so all
> the jars in this directory should be available to anything in the "deploy"
> directory (just mentioning in case people aren't familiar with JBoss). I've
> added all the jars from the contrib/extraction/lib directory, as well as the
> jars from dist.
>
> My lib directory is effectively:
>
> apache-solr-cell-1.4.1.jar            easymock.jar                          lucene-spellchecker-2.9.3.jar
> apache-solr-clustering-1.4.1.jar      fontbox-0.1.0.jar                     nekohtml-1.9.9.jar
> apache-solr-core-1.4.1.jar            geronimo-stax-api_1.0_spec-1.0.1.jar  ojdbc14.jar
> apache-solr-solrj-1.4.1.jar           geronimo-stax-api_1.0_spec-1.0.jar    ooxml-schemas-1.0.jar
> asm-3.1.jar                           icu4j-3.8.jar                         pdfbox-0.7.3.jar
> bcmail-jdk14-136.jar                  jcl-over-slf4j-1.5.5.jar              poi-3.5-beta6.jar
> bcprov-jdk14-136.jar                  jempbox-0.2.0.jar                     poi-ooxml-3.5-beta6.jar
> commons-codec-1.3.jar                 junit-4.3.jar                         poi-scratchpad-3.5-beta6.jar
> commons-compress-1.0.jar              log4j-1.2.14.jar                      slf4j-api-1.5.5.jar
> commons-csv-1.0-SNAPSHOT-r609327.jar  lucene-analyzers-2.9.3.jar            slf4j-jdk14-1.5.5.jar
> commons-fileupload-1.2.1.jar          lucene-core-2.9.3.jar                 tika-core-0.4.jar
> commons-httpclient-3.1.jar            lucene-highlighter-2.9.3.jar          tika-parsers-0.4.jar
> commons-io-1.4.jar                    lucene-memory-2.9.3.jar               wstx-asl-3.2.7.jar
> commons-lang-2.1.jar                  lucene-misc-2.9.3.jar                 xercesImpl-2.8.1.jar
> commons-logging-1.1.1.jar             lucene-queries-2.9.3.jar              xml-apis-1.0.b2.jar
> dom4j-1.6.1.jar                       lucene-snowball-2.9.3.jar             xmlbeans-2.3.0.jar
>
> I know several of these jars are already essentially present in JBoss (log4j,
> for example), but I'm at a loss as to what to remove/add to get it to work.
> Anyone have any ideas of configuring it under JBoss? The other cores are
> database-based (thus the use of ojdbc14.jar), and they work fine.
>
> Thanks for any help,
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or
> documents, is intended only for the addressee and may contain CONFIDENTIAL,
> PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended
> recipient, you are hereby notified that any use, disclosure, copying or
> distribution of this message or any of the information included in or with it
> is unauthorized and strictly prohibited.
> If you have received this message
> in error, please notify the sender immediately by reply e-mail and
> permanently delete and destroy this message and its attachments, along with
> any copies thereof. This message does not create any contractual obligation
> on behalf of the sender or Law Bulletin Publishing Company.
> Thank you.

--------------------------
Grant Ingersoll
http://www.lucidimagination.com