Cleaning scanned images is beyond the scope of PDFBox.

Good scanners (e.g. Kodak Alaris) clean images before passing them to the PC.
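Since the cleanup has to happen before the image reaches PDFBox, here is a minimal sketch of one such preprocessing step in plain Java (java.awt.image only, no PDFBox): global-threshold binarization, which turns every pixel lighter than a cutoff into white paper and everything else into black. The class name `ScanCleaner` and the threshold value of 128 are hypothetical choices for illustration. A single fixed threshold is a simplification: grey marks darker than the cutoff survive, and faint text lighter than it is lost, so real scanner software likely uses more elaborate methods.

```java
import java.awt.image.BufferedImage;

public class ScanCleaner {

    /**
     * Returns a 1-bit black-and-white copy of src. Pixels whose luminance is at
     * or above the threshold (0-255) become white; all others become black.
     * The threshold is a hypothetical tuning parameter, not a PDFBox setting.
     */
    public static BufferedImage binarize(BufferedImage src, int threshold) {
        BufferedImage out = new BufferedImage(src.getWidth(), src.getHeight(),
                BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                int rgb = src.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // ITU-R BT.601 luma weights approximate perceived brightness
                int luma = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                out.setRGB(x, y, luma >= threshold ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // tiny synthetic "scan": white paper, a light grey smudge, dark text
        BufferedImage img = new BufferedImage(3, 1, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 0, 0xFFFFFF);
        img.setRGB(1, 0, 0xC0C0C0);
        img.setRGB(2, 0, 0x202020);
        BufferedImage bw = binarize(img, 128);
        for (int x = 0; x < bw.getWidth(); x++) {
            System.out.println(Integer.toHexString(bw.getRGB(x, 0)));
        }
    }
}
```

To use it with PDFBox 2.0, one could read the scan with `ImageIO.read(new File(imagePath))`, binarize it, and embed the result with `LosslessFactory.createFromImage(doc, bw)` instead of `PDImageXObject.createFromFile(imagePath, doc)`.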

Tilman

On 07.06.2018 at 19:49, Arthur Wang wrote:

Hi all,


I tried to convert a scanned image file (see attached: original_image.png) into
a PDF file (see attached: converted_pdf) using the ImageToPdf example code.
It actually works very well after some adjustment; however, the converted PDF
still contains some grey or dark marks. Is there any way to clean them up? I have
seen commercial software that can scan a Home Depot receipt into a very clean
PDF, and I am not sure whether PDFBox can do the same. Maybe I have to use an OCR
package to process it further?


I have also copied the code I used below. The PDFBox version is 2.0.9.


Thanks for any comments,


Arthur


*****************************

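import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;

// imagePath and pdfPath are String file paths defined elsewhere
try (PDDocument doc = new PDDocument())
{
    PDPage page = new PDPage();
    doc.addPage(page);

    PDImageXObject pdImage = PDImageXObject.createFromFile(imagePath, doc);

    // draw the image at half size, with its lower-left corner at (x=-20, y=-80)
    try (PDPageContentStream contents = new PDPageContentStream(doc, page))
    {
        contents.drawImage(pdImage, -20, -80,
                pdImage.getWidth() / 2, pdImage.getHeight() / 2);
    }
    doc.save(pdfPath);
}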



*****************************
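In the snippet above, the offsets (-20, -80) and the division by 2 are hand-tuned. `drawImage` takes its position and size in PDF points (1/72 inch), while `getWidth()`/`getHeight()` return pixels, so a scan made at more than 72 dpi comes out larger than its physical size when drawn 1:1. A minimal sketch, assuming you want the whole image scaled uniformly and centered on the page, computes the placement instead; the class `FitToPage` and method `fit` are hypothetical helpers, not PDFBox API.

```java
public class FitToPage {

    /**
     * Returns {x, y, width, height} in PDF points that scale an image of
     * imgW x imgH uniformly (no distortion) to fit inside the page minus a
     * margin on every side, centered on the page.
     */
    public static float[] fit(float imgW, float imgH, float pageW, float pageH, float margin) {
        if (imgW <= 0 || imgH <= 0) {
            throw new IllegalArgumentException("image dimensions must be positive");
        }
        float availW = pageW - 2 * margin;
        float availH = pageH - 2 * margin;
        // the smaller ratio is the one that makes both sides fit
        float scale = Math.min(availW / imgW, availH / imgH);
        float w = imgW * scale;
        float h = imgH * scale;
        // PDF's origin is the lower-left corner; centering uses the same formula on both axes
        return new float[] { (pageW - w) / 2, (pageH - h) / 2, w, h };
    }

    public static void main(String[] args) {
        // hypothetical 1000 x 500 pixel scan on a U.S. Letter page (612 x 792 points)
        float[] p = fit(1000, 500, 612, 792, 20);
        System.out.printf("x=%.1f y=%.1f w=%.1f h=%.1f%n", p[0], p[1], p[2], p[3]);
    }
}
```

In PDFBox 2.0, the page size could come from `page.getMediaBox().getWidth()` and `getHeight()` (a `new PDPage()` defaults to U.S. Letter), and the four values passed to `contents.drawImage(pdImage, p[0], p[1], p[2], p[3])`.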


