This may help:
http://www.pdfbox.org/userguide/text_extraction.html#Lucene+Integration
ashwin kumar wrote:
> hi all i am able to convert a pdf in to a text file using pdfbox. and this
> is the code that i used
>
> import org.pdfbox.pdfparser.PDFParser;
> import org.pdfbox.pdmodel.PDDocument;
> i
On 12 Mar 2007, at 07:54, ashwin kumar wrote:
> ya sorry got it but that link contains only a program to index text i have
> already successfully indexed .txt now want to index pdf
You cannot index the PDF itself. You need to index the text you have
extracted from it.
>> >content = strip.getText(doc);
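To illustrate the point: an indexer only ever sees a string, so the PDF has to be reduced to text first (as strip.getText(doc) does in the quoted code), and that string is what gets indexed. The sketch below is a toy inverted index in plain Java. It is not Lucene and not PDFBox, and the class ExtractedTextIndex with its add and search methods are hypothetical names made up for this example. With Lucene, the same extracted string is what you would put into a document field.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index (NOT Lucene): maps each lower-cased word to the
// names of the documents whose extracted text contains that word.
class ExtractedTextIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // 'text' is the plain string already extracted from the document,
    // e.g. the value of 'content' after strip.getText(doc) in the thread.
    void add(String docName, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split() can yield a leading empty token
            }
            postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
        }
    }

    // Returns the names of the documents containing 'word' (empty set if none).
    Set<String> search(String word) {
        return postings.getOrDefault(word.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}
```

Usage would look roughly like index.add("ashwin.pdf", content), where content is the extracted string, followed by index.search("someword"). The PDF file itself never reaches the index, only its text.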
Yes, sorry, I got it. But that link only contains a program to index text.
I have already successfully indexed .txt files; now I want to index PDFs.
On 3/12/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 12 Mar 2007, at 07:44, ashwin kumar wrote:
> it says that the requested URL is not found
Compare the URL in your browser with the URL in the mail. Perhaps
your mail client does not handle the line feed?
On 12 Mar 2007, at 07:44, ashwin kumar wrote:
> it says that the requested URL is not found
Compare the URL in your browser with the URL in the mail. Perhaps
your mail client does not handle the line feed?
On 3/12/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 12 Mar 2007, at 07:03, ashwin kumar wrote:
It says that the requested URL is not found.
On 3/12/07, karl wettin <[EMAIL PROTECTED]> wrote:
On 12 Mar 2007, at 07:03, ashwin kumar wrote:
> hi all i am able to convert a pdf in to a text file using pdfbox.
> and this
> is the code that i used
> {
>
>String pdfFile=new String ("D:\\ASHWIN\\
On 12 Mar 2007, at 07:03, ashwin kumar wrote:
> hi all i am able to convert a pdf in to a text file using pdfbox. and this
> is the code that i used
> {
> String pdfFile=new String ("D:\\ASHWIN\\res\\ashwin.pdf");
> PDDocument doc = PDDocument.load(pdfFile);
> PDFTextStripper strip = new PDFTextStr