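You could also use the analysis handler (the Analysis page in the Solr admin UI, or the field analysis request handler if your solrconfig.xml registers it) to see whether the analyzer chain in your field definition strips numeric input such as "1386".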
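
In case it helps, here is a rough Python sketch of both checks. The first function builds a request URL for the field analysis handler; the second sorts occurrences in the extract-only output into the three cases Jack lists in the quoted message below. The base URL, the `/analysis/field` path, and the field name `contents` are assumptions on my part, not something I know about your install, so adjust them to match your solrconfig.xml and schema.xml.

```python
import re
from urllib.parse import urlencode

# Assumed location of the Solr core; change this to match your install.
SOLR_BASE = "http://localhost:8983/solr"


def field_analysis_url(field_name, value, query, base=SOLR_BASE):
    """Build a URL for the field analysis handler (assumed at /analysis/field)."""
    params = {
        "analysis.fieldname": field_name,  # field whose analyzer chain is tested
        "analysis.fieldvalue": value,      # text run through the index analyzer
        "analysis.query": query,           # text run through the query analyzer
        "analysis.showmatch": "true",      # flag index tokens matching the query
    }
    return base + "/analysis/field?" + urlencode(params)


def classify_hits(text, term):
    """Describe how `term` shows up in text extracted from the PDF.

    Returns a list drawn from "standalone", "concatenated", and "split".
    An empty list means the term is not in the extracted text at all,
    which would be consistent with the text being a graphic image.
    """
    escaped = re.escape(term)
    findings = []
    # Whole term with no letter or digit touching it on either side.
    if re.search(r"(?<!\w)" + escaped + r"(?!\w)", text):
        findings.append("standalone")
    # Term glued to a neighbouring letter or digit.
    if re.search(r"\w" + escaped + "|" + escaped + r"\w", text):
        findings.append("concatenated")
    # Term whose characters are broken up by whitespace.
    loose = r"\s*".join(re.escape(ch) for ch in term)
    if any(re.search(r"\s", m.group()) for m in re.finditer(loose, text)):
        findings.append("split")
    return findings


if __name__ == "__main__":
    # "contents" is a hypothetical field name used only for illustration.
    print(field_analysis_url("contents", "Form 1386 description", "1386"))
    print(classify_hits("Form 13 86 description", "1386"))
```

Open the printed URL in a browser and look at the tokens left after each analysis stage; if "1386" disappears partway through the chain, the field definition is the culprit. If `classify_hits` reports anything other than "standalone" on the extracted text, the problem is in the PDF extraction rather than in the schema.
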

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Mar 12, 2013 at 4:14 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> Use the "extract only" option for Solr Cell to get the text stream that was
> extracted by Solr Cell/Tika/PDFBox, then manually search through the
> response for some text that is near the "1386", and see what text is output
> in the vicinity of the "1386".
>
> See:
> http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
>
> Three possibilities: 1) the "text" is actually a graphic image (e.g., screen
> capture), 2) the "1386" has an embedded space and is split into two or more
> terms, or 3) the "1386" is getting concatenated with an adjacent term.
>
> -- Jack Krupansky
>
> -----Original Message----- From: JDJ
> Sent: Tuesday, March 12, 2013 3:21 PM
> To: solr-user@lucene.apache.org
> Subject: PDF keyword searches not accurate
>
>
> Hello, everyone.
>
> I'm working (basically for the first time) on a project that requires PDFs
> to be indexed and searched via Solr under ColdFusion Server 9.
>
> I've completed the project, but the client is asking a question that I don't
> have the answer for.
>
> Basically, there is one PDF that has "1386" in it (part of a form
> description) that is not appearing when searching for 1386. The 1386 is in
> a relatively small PDF (2.7 Mb). Is there a way to troubleshoot this issue?
> I'm a Solr n00b.
>
> Thank you,
>
>
> -----
> JDJ
> "There are two kinds of people in the world;
> those who understand binary, and
> those who don't.
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/PDF-keyword-searches-not-accurate-tp4046741.html
> Sent from the Solr - User mailing list archive at Nabble.com.