With RC2 the text is displayed incorrectly (0 in place of g, and ) in place of 1). With the latest build (pdfbox-app-2.0.0-20151221.122720-1872), however, everything is fine. Thanks a lot.

Kind regards
Hans

From: Tilman Hausherr <[email protected]>
To: [email protected]
Date: 21.12.2015 19:08
Subject: Re: Using PDFStreamParser

Could you retry with the current version? Either get -SNAPSHOT through maven, or from
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-app/2.0.0-SNAPSHOT/

I can't reproduce what you mean (I tested with the trunk), so either I missed it, or (what I suspect) it is a bug that I fixed a short time ago (PDFBOX-3107). However, I'm also unable to reproduce it with RC2 and RC1.

Tilman

On 21.12.2015 at 16:02, [email protected] wrote:
> Hi!
>
> I see very strange behaviour when copying a file with PDFBox's
> PDFStreamParser (RC2).
>
> I modified RemoveAllText so that it does not remove any text:
>
>     public static void main( String[] args ) throws IOException
>     {
>         ...
>         PDDocument document = null;
>         try
>         {
>             document = PDDocument.load( new File(args[0]) );
>             if( document.isEncrypted() )
>             {
>                 System.err.println( "Error: Encrypted documents are not supported for this example." );
>                 System.exit( 1 );
>             }
>             for( PDPage page : document.getPages() )
>             {
>                 PDFStreamParser parser = new PDFStreamParser(page);
>                 parser.parse();
>                 List<Object> tokens = parser.getTokens();
>                 List<Object> newTokens = new ArrayList<Object>();
>                 for (Object token : tokens)
>                 {
>                     newTokens.add( token );
>                 }
>                 PDStream newContents = new PDStream( document );
>                 OutputStream out = newContents.createOutputStream(COSName.FLATE_DECODE);
>                 ContentStreamWriter writer = new ContentStreamWriter( out );
>                 writer.writeTokens( newTokens );
>                 out.close();
>                 page.setContents( newContents );
>             }
>             document.save( args[1] );
>         }
>         finally
>         {
>             if( document != null )
>             {
>                 document.close();
>             }
>         }
>     }
>
> I opened both PDFs with PDFDebugger, and the Contents text view is identical
> for both files (see the second TJ!). In the hex view there are differences in
> space (20) and LF (0A) characters, where an EOL seems to be inserted/replaced.
>
>     BT
>     0 0 0 1 k
>     /T1_0 1 Tf
>     10 0 0 10 32.4181 265.8897 Tm
>     [ (\037\036\035\034\033\032\031\030\027) -28
>     (\026\025\035\024\023\022\025\031\031\030\035\021) ] TJ
>     /T1_1 1 Tf
>     9.8 0 0 10 32.4181 253.8897 Tm
>     [ (\037\036\035\034\033\032\031\030\027\026\025\024) -53 (\023\022\024)
>     -53 (\021\020\017\016\024) -53 (\015\023\014\013\012\011\024) -53
>     (\010\030\027\026\025\024) -53 (\015\007\020\017\016\024) -53
>     (\015\011\024) -53 (\006\025\033\005\025\004\026\003\025\002\026\024) -53
>     (\002\001\027\024) -53 (\177\004\025\024) -53 ... TJ
>
> Consequently, the preview in PDFDebugger (page two!) is the same for both files, too:
>
>     Übungskarte 49 (INT 1463), Karte 1/INT 1, Begleitheft für die
>     Kartenaufgaben im Fach Navigation für den SKS (Ausgabe 2013)
>
> But when I open the new PDF file with Adobe Reader 11.0.10.32, the text
> has changed!! 1 is now ), but not in 2013!
>
>     Übungskarte 49 (INT )463), Karte )/INT ), Begleitheft für die
>     Kartenaufgaben im Fach Navigation für den SKS (Ausgabe 2013)
>
> On page three, "Aufgabe" is now "Auf0abe".
>
> I have no idea how this can happen. Is there relevant information anywhere
> other than in the TJ block? The file size (old 960 K, new 1041 K) is slightly
> different for 81 pages.
>
> This is the PDF:
> https://www.elwis.de/Freizeitschifffahrt/fuehrerscheininformationen/Navigationsaufgaben-SKS.pdf
>
> Thanks
>
> Hans Stemmer
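
One mechanism that would fit the symptoms described above is sketched below. It is an assumption, not something the thread confirms: nothing here says that this is what PDFBOX-3107 fixed. The PDF specification says that an unescaped end-of-line marker inside a literal string (CR, LF, or CR LF) is read as a single LF (0x0A). The strings in the TJ arrays contain \015 (CR) and \012 (LF), so if a writer emitted such bytes raw instead of as octal escapes, a strict reader could map a CR character code to the glyph for LF, while a more lenient viewer still showed the original text. The class `LiteralStringEol` and its methods are hypothetical names for illustration only; they are not PDFBox API.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration, not PDFBox code.
class LiteralStringEol {

    // Decodes the bytes between the parentheses of a PDF literal string,
    // applying escape sequences and the end-of-line rule for raw CR / CR LF.
    public static byte[] readLiteral(byte[] body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < body.length) {
            int b = body[i++] & 0xFF;
            if (b == '\\' && i < body.length) {
                int e = body[i++] & 0xFF;
                if (e >= '0' && e <= '7') {
                    // Octal escape with one to three digits.
                    int value = e - '0';
                    int digits = 1;
                    while (digits < 3 && i < body.length
                            && body[i] >= '0' && body[i] <= '7') {
                        value = value * 8 + (body[i++] - '0');
                        digits++;
                    }
                    out.write(value & 0xFF);
                } else if (e == '\r') {
                    // Backslash + EOL is a line continuation: emit nothing.
                    if (i < body.length && body[i] == '\n') {
                        i++;
                    }
                } else if (e == '\n') {
                    // Line continuation as well.
                } else {
                    switch (e) {
                        case 'n': out.write('\n'); break;
                        case 'r': out.write('\r'); break;
                        case 't': out.write('\t'); break;
                        case 'b': out.write('\b'); break;
                        case 'f': out.write('\f'); break;
                        default:  out.write(e);    break; // \( \) \\ and others
                    }
                }
            } else if (b == '\r') {
                // Raw CR or CR LF counts as one LF (0x0A).
                if (i < body.length && body[i] == '\n') {
                    i++;
                }
                out.write('\n');
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    // Encodes bytes so that a reader following the rule above gets them back
    // unchanged: delimiters are backslash-escaped, everything outside
    // printable ASCII becomes a three-digit octal escape.
    public static byte[] writeEscaped(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte raw : data) {
            int b = raw & 0xFF;
            if (b == '(' || b == ')' || b == '\\') {
                out.write('\\');
                out.write(b);
            } else if (b < 0x20 || b > 0x7E) {
                byte[] octal = String.format("\\%03o", b)
                        .getBytes(StandardCharsets.US_ASCII);
                out.write(octal, 0, octal.length);
            } else {
                out.write(b);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Character codes taken from the start of (\015\023\014...) above.
        byte[] original = {0x0D, 0x13, 0x0C};
        // Written raw: the CR comes back as LF, so the first code changes.
        System.out.println(Arrays.toString(readLiteral(original)));
        // prints [10, 19, 12]
        // Written with octal escapes: the codes survive the round trip.
        System.out.println(Arrays.toString(readLiteral(writeEscaped(original))));
        // prints [13, 19, 12]
    }
}
```

If this is what happened, comparing the two content streams byte by byte around strings containing \015 would show it: the copy would hold a raw 0D (or 0A) where the original holds the four-character escape.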

