RE: PDFTextStripper parsing problem IBM Linux -Dibm.stream.nio

Jerkins, Devan Tue, 09 Mar 2010 07:33:10 -0800

Below is a more complete answer from IBM about the -Dibm.stream.nio option:

-Dibm.stream.nio=[true | false]
This option addresses the ordering of IO and NIO converters. When this
option is set to true, the NIO converters are used instead of the IO
converters.


NIO stands for New I/O. The NIO package was introduced in Java 1.4.0 to
overcome some of the shortcomings of the original IO package.

By default, IBM Java uses the IO converters because they perform better.
Customers can use the NIO converters by setting the option
-Dibm.stream.nio=true.
USAGE:
java -Dibm.stream.nio=true <app-name>
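When the option is set indirectly (for example, through an application server's generic JVM arguments rather than on the command line), it can help to confirm that it actually reached the JVM. The class below is a minimal sketch; its name and method are hypothetical, and only the ibm.stream.nio property name comes from IBM's note. It only shows that the property was passed. It cannot tell whether the IBM runtime honored it, and non-IBM JVMs simply ignore the property.

```java
// Hypothetical diagnostic class; only the "ibm.stream.nio" name comes from IBM's note.
public class StreamNioCheck {

    /** True only if the JVM was started with -Dibm.stream.nio=true. */
    static boolean nioConvertersRequested() {
        // Boolean.getBoolean returns true only when the system property
        // exists and equals "true", ignoring case.
        return Boolean.getBoolean("ibm.stream.nio");
    }

    public static void main(String[] args) {
        // Print the vendor so the output shows which JVM actually ran.
        System.out.println("java.vendor    = " + System.getProperty("java.vendor"));
        // Raw value; prints null when the option was not passed at all.
        System.out.println("ibm.stream.nio = " + System.getProperty("ibm.stream.nio"));
        System.out.println("NIO converters requested: " + nioConvertersRequested());
    }
}
```

Run it the same way as the application, e.g. java -Dibm.stream.nio=true StreamNioCheck, and check that the last line reports true.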

The reason the Sun and IBM JDKs differ in their behavior is the converter
each JVM uses by default. IBM defaults to the IO converters, which throw
exceptions on errors, whereas Sun defaults to the NIO converters, which do
not throw exceptions. Sun made this change from 1.4.1 onwards; we chose not
to adopt it because the IO converters perform better.
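The difference IBM describes comes down to two error-handling policies: a converter that throws on bad input versus one that quietly substitutes a replacement character. IBM's internal converter classes are not public, so the sketch below illustrates the same distinction with the standard java.nio.charset API (CodingErrorAction.REPORT versus CodingErrorAction.REPLACE). It is an analogy only, not the code path either vendor's JVM uses internally; the class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical demo class contrasting "throw on error" with "substitute on error".
public class DecoderPolicyDemo {

    /** Decodes bytes as UTF-8, applying the given policy to bad input. */
    static String decode(byte[] bytes, CodingErrorAction onError)
            throws CharacterCodingException {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(onError)
                .onUnmappableCharacter(onError);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static void main(String[] args) {
        // 0x31 is '1' and 0x32 is '2'; 0xFF never appears in well-formed UTF-8.
        byte[] input = { 0x31, (byte) 0xFF, 0x32 };

        try {
            decode(input, CodingErrorAction.REPORT);
        } catch (CharacterCodingException e) {
            // REPORT: the malformed byte surfaces as an exception.
            System.out.println("REPORT  -> exception: " + e);
        }

        try {
            // REPLACE: the malformed byte becomes U+FFFD and decoding continues.
            System.out.println("REPLACE -> " + decode(input, CodingErrorAction.REPLACE));
        } catch (CharacterCodingException e) {
            throw new AssertionError("REPLACE should not throw", e);
        }
    }
}
```

The REPORT path corresponds to the "throws exceptions on errors" behavior IBM attributes to its IO converters; the REPLACE path corresponds to the more forgiving behavior attributed to Sun's NIO converters.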

The issue you are experiencing is actually an IBM VM limitation and the
result of a compromise between functionality and performance.

You can refer to sdkandruntimeguide.win32.en.htm in the IBM Java 5 SDK for
details on this JVM option.

Also note that many customers have reported a similar issue in the past,
and we suggested the same workaround.


-----Original Message-----
From: Jerkins, Devan
Sent: Thursday, February 25, 2010 7:56 AM
To: [email protected]
Subject: RE: PDFTextStripper parsing problem IBM Linux

It sounds like the known issue, but I do see a difference. The PDF I'm using is
read correctly on Linux when it isn't running in WAS, and it is read correctly
when running in WAS on Windows. The problem seems to be with the IBM JVM
environment on Linux. I'm planning on asking IBM about it; if I get an answer or
find a workaround, I'll post back. If anyone has any ideas on how to solve it,
let me know.

Many thanks,

Devan J

-----Original Message-----
From: Andreas Lehmkuehler [mailto:[email protected]]
Sent: Thursday, February 25, 2010 12:40 AM
To: [email protected]
Subject: Re: PDFTextStripper parsing problem IBM Linux

Hi,

Jerkins, Devan wrote:
> I'm having trouble getting the PDFTextStripper to correctly translate non-word
> characters. It reads "1" and passes back "one", and reads " " and passes back
> "space". Has anyone seen this before and does anyone know how to fix it? This
> only happens when I run my code in IBM WAS on Linux; if I run it in IBM WAS on
> Windows it works fine (i.e. "1" returns "1"). The only way I was able to get it
> to work on Linux was to try a PDF that had embedded fonts.
Sounds like an already known issue [1].

BR
Andreas Lehmkühler

[1] https://issues.apache.org/jira/browse/PDFBOX-595


