That's true. I had already tried changing the text rendering mode to other values (as listed in Table 5.3 of the PDF 1.5 specification) before removing the text, and that didn't work either. So how can I remove the graphics content, then?
Best Regards,

On Tue, Mar 24, 2015 at 10:06 AM, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:

> Hi,
>
> > On 24.03.2015 at 09:55, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >
> > You can download it from here:
> >
> > https://drive.google.com/file/d/0B5Kxacm1mej-MEZubTNYVVJYTFE/view?usp=sharing
>
> Looking more closely, you correctly replaced the text, but that text was
> only there for searching within the PDF, as it used text rendering mode 3
> (invisible). The 'text' you are still seeing is drawn using vector commands,
> so it's graphics content.
>
> BR
> Maruan
>
> > Best Regards,
> >
> > On Tue, Mar 24, 2015 at 9:48 AM, Maruan Sahyoun <sahy...@fileaffairs.de>
> > wrote:
> >
> >>> On 24.03.2015 at 09:40, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> In fact, PDFBox calls the operation of transforming "7R %H $SSURYHG" to
> >>> "To Be Approved" "encoding". Anyway, whether it's encoding or decoding, I
> >>> thought it would be easier to transform "7R %H $SSURYHG" to "To Be
> >>> Approved" than the opposite (or at least I don't know how to do the
> >>> opposite). I spent quite a long time trying to find out how to get the
> >>> character codes for the glyphs in the currently used font, and found that
> >>> it's not an easy task. By the way, if you know how to do that, I'd very
> >>> much appreciate it, because I need it for replacing text with other text,
> >>> and for that the new text must be encoded the same way as the original!
> >>>
> >>> Back to the text removal: I am able to find the text and also remove it by
> >>> calling reset. As I mentioned in my first email, when I print the output
> >>> content I don't find the text anymore, but I still see it when I open the
> >>> file. My first assumption was that there must be some other way to remove
> >>> the text than the one I am using, and that's what you've actually
> >>> confirmed in your reply, so could you please tell me what is still missing?
> >>>
> >>
> >> Could you upload the PDF with the reset text too?
> >>
> >> BR
> >> Maruan
> >>
> >>> Thanks and regards,
> >>> a7mad
> >>>
> >>> On Tue, Mar 24, 2015 at 9:22 AM, Maruan Sahyoun <sahy...@fileaffairs.de>
> >>> wrote:
> >>>
> >>>> Hi,
> >>>>
> >>>>> On 24.03.2015 at 08:14, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >>>>>
> >>>>> Hi,
> >>>>>
> >>>>> Here's how I do it:
> >>>>>
> >>>>> 1. I use the following method to encode the text:
> >>>>>
> >>>>> String encode(String text, PDFont font) throws Exception {
> >>>>>     StringBuilder builder = new StringBuilder();
> >>>>>     byte[] stringBytes = text.getBytes();
> >>>>>     int codeLength = 1;
> >>>>>     for(int i = 0; i < stringBytes.length; i += codeLength){
> >>>>>         String c = font.encode(stringBytes, i, codeLength);
> >>>>>         if(c == null && (i + 1 < stringBytes.length)){
> >>>>>             codeLength++;
> >>>>>             c = font.encode(stringBytes, i, codeLength);
> >>>>>         }
> >>>>>         builder.append(c);
> >>>>>     }
> >>>>>     return builder.toString();
> >>>>> }
> >>>>>
> >>>>> 2. Iterating through the tokens, I find the text, whether it's a
> >>>>> COSString ("Tj" operator) or a COSArray ("TJ" operator), then check
> >>>>> whether it's the text I want to remove, as follows:
> >>>>>
> >>>>> if (op.getOperation().equals("Tj")) {
> >>>>>     COSString previous = (COSString) tokens.get(j - 1);
> >>>>>     String string = previous.getString();
> >>>>>     String encodedString = encode(string, font);
> >>>>
> >>>> That string is already encoded. So you'd need to encode "To Be Approved"
> >>>> and compare whether that matches the string you are reading from the PDF.
> >>>>
> >>>>>     if(encodedString.contains("To Be Approved")){
> >>>>>         previous.reset();
> >>>>>     }
> >>>>> } else if (op.getOperation().equals("TJ")) {
> >>>>>     COSArray previous = (COSArray) tokens.get(j - 1);
> >>>>>     StringBuilder stringBuilder = new StringBuilder();
> >>>>>     for (int k = 0; k < previous.size(); k++) {
> >>>>>         Object arrElement = previous.getObject(k);
> >>>>>         if (arrElement instanceof COSString) {
> >>>>>             COSString cosString = (COSString) arrElement;
> >>>>>             stringBuilder.append(cosString.getString());
> >>>>>         }
> >>>>>     }
> >>>>>     String string = stringBuilder.toString();
> >>>>>     String encodedString = encode(string, font);
> >>>>>     if(encodedString.contains("To Be Approved")){
> >>>>>         previous.clear();
> >>>>>     }
> >>>>> }
> >>>>>
> >>>>> Note:
> >>>>> In the COSArray case, I first iterate through the whole array to collect
> >>>>> the whole string before encoding and comparing, and this works.
> >>>>>
> >>>>> Best Regards,
> >>>>> a7mad
> >>>>>
> >>>>> On Mon, Mar 23, 2015 at 10:48 PM, Maruan Sahyoun <sahy...@fileaffairs.de>
> >>>>> wrote:
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> your text is encoded, so within the show text operator Tj the string is
> >>>>>>
> >>>>>> 7R %H $SSURYHG
> >>>>>>
> >>>>>> You wrote that you encode your string to find it - what do you get?
> >>>>>>
> >>>>>> BR
> >>>>>> Maruan
> >>>>>>
> >>>>>>> On 23.03.2015 at 22:01, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Hi Maruan,
> >>>>>>>
> >>>>>>> Here's a link from where you can download the PDF.
> >>>>>>>
> >>>>>>> https://drive.google.com/file/d/0B5Kxacm1mej-bm82NzNvUXFPSmMtUjc0ZFVjVVlrODZnRzdn/view?usp=sharing
> >>>>>>>
> >>>>>>> Kind Regards,
> >>>>>>> a7mad
> >>>>>>>
> >>>>>>> On Mon, Mar 23, 2015 at 8:57 PM, Maruan Sahyoun <sahy...@fileaffairs.de>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> you need to upload it to a public location, as the mailing list doesn't
> >>>>>>>> support attachments.
> >>>>>>>>
> >>>>>>>> BR
> >>>>>>>> Maruan
> >>>>>>>>
> >>>>>>>>> On 23.03.2015 at 19:18, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>> Dear Maruan,
> >>>>>>>>>
> >>>>>>>>> Thank you very much for the information. Please find attached the PDF
> >>>>>>>>> to reproduce the problem.
> >>>>>>>>> The text to remove is: "To Be Approved". The text has a multi-byte
> >>>>>>>>> encoding, so I first encode it in order to find it, then remove it.
> >>>>>>>>>
> >>>>>>>>> Best Regards,
> >>>>>>>>> a7mad
> >>>>>>>>>
> >>>>>>>>>> On Mon, Mar 23, 2015 at 4:13 PM, Maruan Sahyoun <sahy...@fileaffairs.de> wrote:
> >>>>>>>>>> Dear a7mad,
> >>>>>>>>>>
> >>>>>>>>>> removing text from a PDF is not an easy task, as
> >>>>>>>>>> - text which visually appears as a single item might consist of
> >>>>>>>>>>   individual parts within the PDF itself, e.g. each character or group
> >>>>>>>>>>   of characters is placed individually in a different COSString
> >>>>>>>>>> - text might be drawn using graphics commands
> >>>>>>>>>> - text can appear within different parts of the PDF (e.g. the text
> >>>>>>>>>>   might be the content of a form field AND the annotation representing
> >>>>>>>>>>   the form field visually)
> >>>>>>>>>> - you need to look up the encoding information to get from the
> >>>>>>>>>>   characters in the PDF "string" to the ones you are looking for
> >>>>>>>>>> ….
> >>>>>>>>>>
> >>>>>>>>>> If you can post a specific PDF to a public location and describe in
> >>>>>>>>>> detail which string should have been replaced but wasn't, I will be
> >>>>>>>>>> able to tell you why that might have happened.
> >>>>>>>>>>
> >>>>>>>>>> Maruan
> >>>>>>>>>>
> >>>>>>>>>>> On 23.03.2015 at 15:03, a7med shre3y <a7med.shr...@gmail.com> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Hi all,
> >>>>>>>>>>>
> >>>>>>>>>>> Currently I am facing a strange problem removing text from some PDFs.
> >>>>>>>>>>> My program is able to find the text and "remove it" by calling the
> >>>>>>>>>>> COSString.reset() method.
> >>>>>>>>>>> The problem is that when I open the output PDF file, I still see the
> >>>>>>>>>>> text, but it is not selectable (when I try to highlight it with the
> >>>>>>>>>>> mouse to copy it, nothing gets selected!). When I print the content
> >>>>>>>>>>> (tokens) of the output file, I DO NOT find the text at all!!
> >>>>>>>>>>>
> >>>>>>>>>>> I am currently stuck in the PDF 1.5 specification and really running
> >>>>>>>>>>> out of time.
> >>>>>>>>>>>
> >>>>>>>>>>> I'd very much appreciate any help or any idea on what's going on.
> >>>>>>>>>>>
> >>>>>>>>>>> Notes:
> >>>>>>>>>>> 1. I use PDFBox 1.7.1
> >>>>>>>>>>> 2. This problem does not occur with all PDFs; only some PDFs cause
> >>>>>>>>>>>    it.
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you very much.
> >>>>>>>>>>> a7mad
> >>>>>>>>>>
> >>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>> To unsubscribe, e-mail: users-unsubscr...@pdfbox.apache.org
> >>>>>>>>>> For additional commands, e-mail: users-h...@pdfbox.apache.org
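
A note on the encoding seen in this thread: comparing the raw Tj string "7R %H $SSURYHG" with "To Be Approved" character by character, every code is exactly 29 lower than the ASCII value of the character it stands for ('T' is 84, '7' is 55). The sketch below is plain Java without PDFBox. It encodes the search term with that offset and compares the result against the raw string, which is the direction Maruan suggests above. The offset of 29 and the names SampleEncoding, OFFSET, encodeForSample and decodeForSample are assumptions derived from this one sample; for other fonts the mapping would have to come from the font's own encoding information. The check is done word by word, since under this offset a space would become code 3, a control character that the quoted email text cannot be relied on to show.

```java
public class SampleEncoding {
    // Assumption: offset observed only in this thread's sample ('T' 84 -> '7' 55).
    static final int OFFSET = 29;

    // Shift every character down by OFFSET to mimic the codes seen in the sample's Tj string.
    static String encodeForSample(String plain) {
        StringBuilder sb = new StringBuilder();
        for (char c : plain.toCharArray()) {
            sb.append((char) (c - OFFSET));
        }
        return sb.toString();
    }

    // Inverse direction: shift raw codes up by OFFSET to get readable text back.
    static String decodeForSample(String raw) {
        StringBuilder sb = new StringBuilder();
        for (char c : raw.toCharArray()) {
            sb.append((char) (c + OFFSET));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] plainWords = "To Be Approved".split(" ");
        String[] rawWords = "7R %H $SSURYHG".split(" ");
        for (int i = 0; i < plainWords.length; i++) {
            // Encode the search term and compare it with the raw string from the PDF.
            String encoded = encodeForSample(plainWords[i]);
            System.out.println(plainWords[i] + " -> " + encoded
                    + " (matches raw: " + encoded.equals(rawWords[i]) + ")");
        }
    }
}
```

The point of the sketch is the direction of the comparison: the search term is encoded once and matched against what the content stream actually contains, instead of running the already-encoded PDF string through an encoder a second time.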