I am trying to convert PDF to text using Java. I have got as far as reading and decoding the stream objects. Some of them still look strange even after decoding. For example, this stream was FlateDecode-encoded and is shown here decoded:

"1 g\r/GS2 gs\r0 792 m\r0 792 l\rf\rq\r1 i \r-1 793 614 -794 re\r0 792 m\rW n\r0 792.36 612 -792 re\rW n\r0 0 0 0 k\r0 738 612 -402.1 re\r0 792 m\rf*\r/EmbeddedDocument /MC1 BDC\rQ\rq\r1 i \r0 738 612 -402.1 re\rW* n\r/GS1 gs\rq\r624.4401 0 0 412.72 0.31 331.6501 cm\r/Im1 Do\rQ\rEMC\rQ\rq\r1 i \r-1 793 614 -794 re\r0 792 m\rW n\r0 792.36 612 -792 re\rW n\r/Cs9 cs 1 scn\r/GS1 gs\r-0.16 334.41 611.25 -29.64 re\rf*\r0 0 0 0 K\r0 J 0 j 0.911 w 10 M []0 d\r-0.61 334.87 612.15 -30.56 re\rS\rBT\r/F1 1 Tf\r15.78 0 0 15.79 21.5978 313.7892 Tm\r/Cs10 cs 1 scn\r0.0713 Tc\r0 Tw\r[(\b)-34.3(\t\\012 \f\\015)-34.3(\f )]TJ\rET\r0 0 0 0 k\r444.86 161.15 167.14 -161.15 re\r346.693 313.789 m\rf*\r/EmbeddedDocument /MC2 BDC\rQ\rq\r1 i \r444.86 161.15 167.14 -161.15 re\rW* n\r-1 793 614 -794 re\r0 792 m\rW n\r0 792.36 612 -792 re\rW n\r/GS1 gs\rq\r94.3378 0 0 86.1608 446.88 70.83 cm\r/Im2 Do\rQ\rEMC\rQ\r"

Could it be a character encoding problem? Perhaps someone has run into this before.
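
For reference, the decoding step mentioned above can be sketched roughly as follows. This is only a minimal sketch, not the code actually used: it assumes the raw stream bytes have already been cut out of the file, that the stream has a single FlateDecode filter, and that no predictor is set in /DecodeParms. The class name FlateDecodeSketch and the helper deflate (used only to produce test input) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class FlateDecodeSketch {

    // Inflate zlib-wrapped data, which is what a FlateDecode stream holds
    // when no predictor is applied (an assumption of this sketch).
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated input or preset dictionary: stop instead of looping forever
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // Hypothetical helper: compresses bytes so the sketch has input to decode.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] sample = "BT /F1 1 Tf (Hi) Tj ET".getBytes(StandardCharsets.ISO_8859_1);
        byte[] decoded = inflate(deflate(sample));
        // ISO-8859-1 maps every byte to exactly one char, so the bytes inside
        // string operands are not altered when the stream is printed.
        System.out.println(new String(decoded, StandardCharsets.ISO_8859_1));
    }
}
```

The result is kept as a byte array and only converted to a String with ISO-8859-1 for display, so that no byte inside a string operand is changed by the conversion.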
