Hi Marcel,

You've answered all my questions in one morning, which is great! I figured out my issue with the text extraction dependencies the other day when I stumbled upon the pom.xml. I also found that PDF extraction needs some additional dependencies, which are listed on the PDFBox website.
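For anyone who hits the same problem, here is a rough sketch of how one might check which classes are missing from the classpath before blaming the index. The class name ExtractorClasspathCheck and the method missing are made up for illustration. The default class names are the extractors from the configuration quoted below. Note that finding an extractor class only shows that the extractor jar itself is present. The third-party libraries it relies on (POI, PDFBox and so on, as listed in the pom.xml) would have to be checked by passing some of their class names as arguments, and I have not filled those in here.

```java
import java.util.ArrayList;
import java.util.List;

public class ExtractorClasspathCheck {

    // Returns the class names that cannot be located on the current classpath.
    static List<String> missing(List<String> classNames) {
        List<String> notFound = new ArrayList<>();
        ClassLoader loader = ExtractorClasspathCheck.class.getClassLoader();
        for (String name : classNames) {
            try {
                // initialize=false: only locate the class, do not run static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                notFound.add(name);
            }
        }
        return notFound;
    }

    public static void main(String[] args) {
        // Defaults taken from the textFilterClasses parameter in the quoted configuration.
        List<String> names = args.length > 0 ? List.of(args) : List.of(
            "org.apache.jackrabbit.extractor.PlainTextExtractor",
            "org.apache.jackrabbit.extractor.MsWordTextExtractor",
            "org.apache.jackrabbit.extractor.MsExcelTextExtractor",
            "org.apache.jackrabbit.extractor.MsPowerPointTextExtractor",
            "org.apache.jackrabbit.extractor.PdfTextExtractor",
            "org.apache.jackrabbit.extractor.OpenOfficeTextExtractor",
            "org.apache.jackrabbit.extractor.RTFTextExtractor",
            "org.apache.jackrabbit.extractor.HTMLTextExtractor",
            "org.apache.jackrabbit.extractor.XMLTextExtractor");

        List<String> notFound = missing(names);
        if (notFound.isEmpty()) {
            System.out.println("All listed classes were found on the classpath.");
        } else {
            for (String name : notFound) {
                System.out.println("Not on classpath: " + name);
            }
        }
    }
}
```

It needs to run with the same classpath as the repository (same jars, same web application or launcher), otherwise the result says nothing about what Jackrabbit itself can see.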
Thanks so much for all your help. I owe you a beer, so let me know when you're in Washington, DC!

Thanks,
Sean

On Fri, Feb 29, 2008 at 5:04 AM, Marcel Reutegger <[EMAIL PROTECTED]> wrote:
> Hi Sean,
>
> did you include all required jar files in your classpath? e.g. text extraction
> from MS office documents requires apache poi. see dependencies here:
>
> http://svn.apache.org/repos/asf/jackrabbit/tags/1.4/jackrabbit-text-extractors/pom.xml
>
> regards
> marcel
>
> Sean Callan wrote:
> > Hi guys,
> >
> > Would anyone be so kind as to send me a functional repository configuration
> > that indexes a variety of nt:files types? I'm using the following
> > configuration and I am unable to search for a term within any of my binary
> > content (nt:files > jcr:content).
> >
> > At this point I'm out of ideas: the correct jars are in place, searching
> > works on all my plain text nodes, and I can even see that the index is
> > updated when I add new nt:files nodes. But a search returns nothing. At
> > this point search is the only thing holding back my development and my
> > client's acceptance of Jackrabbit as our repository.
> >
> > Any help would be greatly appreciated!
> >
> > <?xml version="1.0" encoding="ISO-8859-1"?>
> > <Repository>
> >   <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
> >     <param name="path" value="${rep.home}/repository"/>
> >   </FileSystem>
> >   <Security appName="Jackrabbit">
> >     <AccessManager class="org.apache.jackrabbit.core.security.SimpleAccessManager"/>
> >     <LoginModule class="org.apache.jackrabbit.core.security.SimpleLoginModule">
> >       <param name="userid" value="" />
> >     </LoginModule>
> >   </Security>
> >   <Workspaces rootPath="${rep.home}/workspaces" defaultWorkspace="default" />
> >   <Workspace name="${wsp.name}">
> >     <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
> >       <param name="path" value="${wsp.home}"/>
> >     </FileSystem>
> >     <PersistenceManager class="org.apache.jackrabbit.core.state.xml.XMLPersistenceManager" />
> >     <SearchIndex class="org.apache.jackrabbit.core.query.lucene.SearchIndex">
> >       <param name="path" value="${wsp.home}/index" />
> >       <param name="textFilterClasses" value="
> >         org.apache.jackrabbit.extractor.PlainTextExtractor,
> >         org.apache.jackrabbit.extractor.MsWordTextExtractor,
> >         org.apache.jackrabbit.extractor.MsExcelTextExtractor,
> >         org.apache.jackrabbit.extractor.MsPowerPointTextExtractor,
> >         org.apache.jackrabbit.extractor.PdfTextExtractor,
> >         org.apache.jackrabbit.extractor.OpenOfficeTextExtractor,
> >         org.apache.jackrabbit.extractor.RTFTextExtractor,
> >         org.apache.jackrabbit.extractor.HTMLTextExtractor,
> >         org.apache.jackrabbit.extractor.XMLTextExtractor"/>
> >     </SearchIndex>
> >   </Workspace>
> >   <Versioning rootPath="${rep.home}/versions">
> >     <FileSystem class="org.apache.jackrabbit.core.fs.local.LocalFileSystem">
> >       <param name="path" value="${rep.home}/versions"/>
> >     </FileSystem>
> >     <PersistenceManager class="org.apache.jackrabbit.core.state.xml.XMLPersistenceManager" />
> >   </Versioning>
> > </Repository>
> >
> > Thanks,
> > Sean
