You could also use the analysis handler to see if your field
definition strips numeric input.
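
In case it helps, here is a rough Python sketch of building such a request. It assumes the field analysis handler is registered at /analysis/field (as in the stock example solrconfig.xml), that Solr listens on localhost:8983, and that the indexed field is called "content"; all three are guesses about your setup, so adjust them to match what ColdFusion configured.

```python
import urllib.parse

# Assumed base URL; the Solr instance bundled with ColdFusion may differ.
SOLR_BASE = "http://localhost:8983/solr"


def build_field_analysis_url(field_name, index_text, query_text, solr_base=SOLR_BASE):
    """Build a URL for Solr's field analysis request handler.

    field_name is the schema field to analyze (hypothetical here),
    index_text is sample text as it would be indexed, and
    query_text is what the user types into the search box.
    """
    params = urllib.parse.urlencode({
        "analysis.fieldname": field_name,
        "analysis.fieldvalue": index_text,
        "analysis.query": query_text,
        "analysis.showmatch": "true",  # flag index tokens that match the query tokens
        "wt": "json",
    })
    return f"{solr_base}/analysis/field?{params}"


if __name__ == "__main__":
    # Open the printed URL in a browser and check whether "1386"
    # survives every tokenizer/filter stage of the field's analyzer.
    print(build_field_analysis_url("content", "Form 1386 description", "1386"))
```

If "1386" disappears at some stage of the index-time chain in the response, the field type is the likely culprit rather than the PDF extraction.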

Michael Della Bitta

------------------------------------------------
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Mar 12, 2013 at 4:14 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> Use the "extract only" option for Solr Cell to get the text stream that was
> extracted by Solr Cell/Tika/PDFBox, then manually search through the
> response for some text that is near the "1386", and see what text is output
> in the vicinity of the "1386".
>
> See:
> http://wiki.apache.org/solr/ExtractingRequestHandler#Extract_Only
>
> Three possibilities: 1) the "text" is actually a graphic image (e.g., screen
> capture), 2) the "1386" has an embedded space and is split into two or more
> terms, or 3) the "1386" is getting concatenated with an adjacent term.
>
> -- Jack Krupansky
>
> -----Original Message----- From: JDJ
> Sent: Tuesday, March 12, 2013 3:21 PM
> To: solr-user@lucene.apache.org
> Subject: PDF keyword searches not accurate
>
>
> Hello, everyone.
>
> I'm working (basically for the first time) on a project that requires PDFs
> to be indexed and searched via Solr under ColdFusion Server 9.
>
> I've completed the project, but the client is asking a question that I don't
> have the answer for.
>
> Basically, there is one PDF that has "1386" in it (part of a form
> description) that is not appearing when searching for 1386.  The 1386 is in
> a relatively small PDF (2.7 MB).  Is there a way to troubleshoot this issue?
> I'm a Solr n00b.
>
> Thank you,
>
>
>
>
> -----
> JDJ
>  "There are two kinds of people in the world;
>      those who understand binary, and
>      those who don't."
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/PDF-keyword-searches-not-accurate-tp4046741.html
> Sent from the Solr - User mailing list archive at Nabble.com.
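
For the "extract only" check quoted above, a rough Python sketch might look like the following. It assumes Solr Cell is mounted at /update/extract on localhost:8983 and accepts the raw PDF as the POST body; the helper names are made up for illustration. The snippet search tolerates whitespace between the digits, so it also surfaces the "embedded space" and "concatenated with an adjacent term" cases.

```python
import re
import urllib.parse
import urllib.request

# Assumed base URL; adjust to wherever your Solr instance runs.
SOLR_BASE = "http://localhost:8983/solr"


def build_extract_only_request(pdf_bytes, solr_base=SOLR_BASE):
    """Build a POST request asking Solr Cell to extract text without indexing it."""
    params = urllib.parse.urlencode({
        "extractOnly": "true",    # return the extracted content instead of indexing
        "extractFormat": "text",  # plain text rather than XHTML
        "wt": "json",
    })
    return urllib.request.Request(
        f"{solr_base}/update/extract?{params}",
        data=pdf_bytes,
        headers={"Content-Type": "application/pdf"},
    )


def snippets_around(text, term, width=30):
    """Return the text surrounding each occurrence of term.

    Whitespace is allowed between the characters of term, so "13 86"
    is reported when searching for "1386".
    """
    pattern = r"\s*".join(re.escape(ch) for ch in term)
    return [
        text[max(0, m.start() - width): m.end() + width]
        for m in re.finditer(pattern, text)
    ]


if __name__ == "__main__":
    # "form.pdf" is a placeholder file name.
    with open("form.pdf", "rb") as fh:
        request = build_extract_only_request(fh.read())
    with urllib.request.urlopen(request) as response:
        body = response.read().decode("utf-8", errors="replace")
    for snippet in snippets_around(body, "1386"):
        print(repr(snippet))
```

If nothing is printed at all, the "1386" is probably part of an image rather than extractable text.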
