Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread neetika
Hi Erick, I am able to get the result fine. The problem was, I forgot to close the writer and so the index file (.cfs) was not getting generated. Thanks a lot for the timely help. Regards, Neetika Erick Erickson wrote: > > You have NOT supplied an example of the text you extracted > from th
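
The point about closing the writer can be illustrated without Lucene. The sketch below is only an analogy using the JDK's BufferedWriter, not Lucene's IndexWriter: buffered output stays in memory until the writer is flushed or closed, so the file on disk looks empty until then. The class and method names (CloseWriterSketch, sizesBeforeAndAfterClose) are made up for this illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, JDK only. It stands in for the "forgot to close the
// writer" situation; it does not use or imitate the Lucene API.
final class CloseWriterSketch {

    // Returns { file size while the writer is still open, file size after close }.
    static long[] sizesBeforeAndAfterClose(String text) {
        try {
            Path file = Files.createTempFile("close-sketch", ".txt");
            try {
                long before;
                // try-with-resources closes the writer even if write() throws.
                try (BufferedWriter writer =
                         Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
                    writer.write(text);
                    // Short text is still sitting in the writer's buffer here.
                    before = Files.size(file);
                }
                // close() has flushed the buffer to disk by this point.
                long after = Files.size(file);
                return new long[] { before, after };
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The same habit applies to any writer-like object: close it in a finally block (or with try-with-resources where the class supports it) so the close cannot be skipped by accident.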

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread Erick Erickson
You have NOT supplied an example of the text you extracted from the document. But let's assume that the interesting string is exactly what you expect. Have you looked at your index with Luke to see if the data is there? I *strongly* suggest you get a copy of Luke (google lucene luke) to examine i

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread neetika
Hi Erick, Before indexing I have printed the doc, and I have given the output also. It is printing well. Kindly please check my post again following... " System.out.println(doc); //Following code is for making index" and the corresponding output is... Document > Offhand I'd ass

Re: getting problem while indexing pdf files with pdfbox

2007-07-17 Thread Erick Erickson
Offhand I'd assume that your problem is using PDFbox. Have you tried printing out the docText string you get back from docText = stripper.getText(new PDDocument(cosDoc))? I'd recommend you assure yourself that you get valid text back from the PDF document before worrying about indexing it. Bes
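
A minimal sketch of that sanity check, assuming docText is the string returned by stripper.getText(...) as in the post. It uses only the JDK, and the class and method names (ExtractedTextCheck, containsTerm, preview) are hypothetical, chosen for this illustration rather than taken from PDFBox or Lucene.

```java
import java.util.Locale;

// Hypothetical helper for eyeballing extracted text before it is indexed.
final class ExtractedTextCheck {

    // True if the extracted text is non-blank and contains the term,
    // ignoring case. A null or blank extraction always fails the check.
    static boolean containsTerm(String docText, String expectedTerm) {
        if (docText == null || docText.trim().isEmpty()) {
            return false;
        }
        return docText.toLowerCase(Locale.ROOT)
                      .contains(expectedTerm.toLowerCase(Locale.ROOT));
    }

    // Collapses runs of whitespace and cuts the text to maxChars so it can
    // be printed on one line; "..." marks a cut.
    static String preview(String docText, int maxChars) {
        if (docText == null) {
            return "<null>";
        }
        String collapsed = docText.trim().replaceAll("\\s+", " ");
        return collapsed.length() <= maxChars
                ? collapsed
                : collapsed.substring(0, maxChars) + "...";
    }
}
```

Printing ExtractedTextCheck.preview(docText, 200) and checking containsTerm(docText, someWordYouKnowIsInThePdf) before adding the document separates extraction problems from indexing problems.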