DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2012-04-19 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

Dominik Stadler dominik.stad...@gmx.at changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dominik.stad...@gmx.at
             Blocks|                            |49636, 41999

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2012-04-01 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

Glenn Adams gl...@skynav.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |CLOSED

--- Comment #6 from Glenn Adams gl...@skynav.com 2012-04-01 06:18:03 UTC ---
batch transition to closed; if someone wishes to restore one of these to
resolved in order to perform a verification step, then feel free to do so



DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2011-01-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

--- Comment #4 from Andreas L. Delmelle adelme...@apache.org 2011-01-07 07:31:03 EST ---
(In reply to comment #3)
> At the very least, there should be some configuration available to the end
> user to tell FOP to use some default line-break behavior in such special
> cases, since the right choice is specific to the customer who is using FOP.
> The entire PDF generation should not be put at stake just because of one
> special character. Giving the customer a set of options to choose from to get
> out of this situation is better than crashing.

Very right indeed. 
So, if no one objects, I will apply the patch as proposed. FOP will no longer
crash, but simply show a '#' for such unassigned codepoints in the output.
Treating them as regular alphabetic characters seems to be safe enough for the
time being.
Customization of and/or more refined configuration possibilities for the
Unicode line-breaking algorithm is something that is still on the wish-list for
the longer term.



DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2011-01-07 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

Andreas L. Delmelle adelme...@apache.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #5 from Andreas L. Delmelle adelme...@apache.org 2011-01-07 16:28:06 EST ---

Fixed in Trunk. See: http://svn.apache.org/viewvc?rev=1056518&view=rev



DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2011-01-06 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

--- Comment #2 from Chris Bowditch bowditch_ch...@hotmail.com 2011-01-06 06:48:25 EST ---
Indeed you raise a very good point Andreas. Even if you make the code change, I
would expect # to appear in the output, because no font is likely to have a
glyph for a reserved code point. So I am also interested to hear the business
reason for using such a code point.



DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2011-01-06 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

--- Comment #3 from tvsud...@rediffmail.com 2011-01-06 23:36:48 EST ---
Andreas,

Thanks a lot for your response. 

Actually, we came across some special characters that were never intended to be
present in our database. We can figure out the reasons for this corruption and
correct it, but I still expect FOP to display whatever content is available.
Whatever the character may be, as long as it is a valid code point (even if it
is reserved and does not have any representation of its own), I do not expect
FOP to crash because of it.

At the very least, there should be some configuration available to the end
user to tell FOP to use some default line-break behavior in such special
cases, since the right choice is specific to the customer who is using FOP.
The entire PDF generation should not be put at stake just because of one
special character. Giving the customer a set of options to choose from to get
out of this situation is better than crashing.

Frankly speaking, we had lost hope of getting a response on this issue from
Apache. We searched for this problem on Google and saw many other people
complaining about a similar issue (i.e. getting an
ArrayIndexOutOfBoundsException). I believe they may also have some reserved
character in their text. We have at least nailed down the cause of the problem.
A proper resolution to this issue would be a great help, not only to me but to
many others.

Thanks again for looking into it and discussing it in the forum.



DO NOT REPLY [Bug 50471] Greek Extended character throwing ArrayIndexOutOfBoundException.

2011-01-05 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=50471

--- Comment #1 from Andreas L. Delmelle adelme...@apache.org 2011-01-05 13:31:26 EST ---

Thanks for reporting, and apologies for the late reply...

At first glance, this seems like a minor oversight in the implementation of
Unicode line breaking in FOP: it does not take into account the possibility
that a given codepoint has not been assigned a 'class' in a line-breaking
context. (U+1F7E does not appear in the file
http://www.unicode.org/Public/UNIDATA/LineBreak.txt, which is used as the basis
for generating the arrays in LineBreakUtils.java.)

On the other hand, one could obviously raise the question why you so
desperately need to have an unassigned codepoint in your output. Are you
absolutely sure you need this? If yes, then can you elaborate on the exact
reason? (i.e. What exactly is this unassigned codepoint used for?)
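
To check whether input text contains such codepoints before it is handed to
FOP, something along the following lines could be used. This is only a sketch:
the class and method names are made up for illustration and are not part of
FOP, and the result depends on the Unicode version supported by the JVM in use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of FOP.
public class UnassignedCodePoints {

    /**
     * Returns the char offsets of all code points in the text that the
     * JVM's Unicode tables report as unassigned.
     */
    public static List<Integer> find(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.getType(cp) == Character.UNASSIGNED) {
                offsets.add(i);
            }
            // Advance by one or two chars, depending on the code point.
            i += Character.charCount(cp);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // U+1F7D and U+1F80 are assigned Greek Extended characters;
        // U+1F7E sits between them and is reserved.
        System.out.println(find("\u1F7D\u1F7E\u1F80"));
    }
}
```

Running such a check over the source data would show where the reserved
characters come from, independently of any change in FOP.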

The most straightforward 'fix' seems to be roughly as follows:

Index: src/java/org/apache/fop/text/linebreak/LineBreakStatus.java
===================================================================
--- src/java/org/apache/fop/text/linebreak/LineBreakStatus.java	(revision 1054383)
+++ src/java/org/apache/fop/text/linebreak/LineBreakStatus.java	(working copy)
@@ -87,6 +87,7 @@

         /* Initial conversions */
         switch (currentClass) {
+            case 0: // Unassigned codepoint: consider as AL?
             case LineBreakUtils.LINE_BREAK_PROPERTY_AI:
             case LineBreakUtils.LINE_BREAK_PROPERTY_SG:
             case LineBreakUtils.LINE_BREAK_PROPERTY_XX:

What this does is assign the class 'AL' or 'Alphabetic' to any codepoint that
has not been assigned a class by Unicode. This means it will be treated as a
regular letter.
Now, the reason why I am asking the question whether you are sure you know what
you're doing, is that this may turn out to be undesirable. Perhaps the
character in question needs to be treated as a space rather than a letter.
Unicode does not define U+1F7E other than as a 'reserved' character, so it
makes sense that Unicode cannot say what should happen with this character in
the context of linebreaking...
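
As a standalone illustration of the fall-through in the patch above, here is a
minimal sketch. The class ids are hypothetical values chosen for the example
and are NOT the constants defined in LineBreakUtils; only the idea of mapping
"no class" onto AL together with AI, SG and XX is taken from the patch. For
what it is worth, UAX #14 itself gives unlisted codepoints the default class
XX in most ranges, and rule LB1 resolves XX to AL, so the result is similar.

```java
// Hypothetical sketch, not FOP code.
public class LineBreakClassFallback {

    // Illustrative ids only; FOP's real constants differ.
    public static final int UNASSIGNED = 0; // no entry in the lookup arrays
    public static final int AL = 1;         // alphabetic
    public static final int AI = 2;         // ambiguous
    public static final int SG = 3;         // surrogate
    public static final int XX = 4;         // unknown
    public static final int SP = 5;         // space

    /**
     * Resolves the unassigned, ambiguous, surrogate and unknown classes
     * to AL and leaves every other class unchanged.
     */
    public static int resolve(int lineBreakClass) {
        switch (lineBreakClass) {
            case UNASSIGNED: // treated like a regular letter
            case AI:
            case SG:
            case XX:
                return AL;
            default:
                return lineBreakClass;
        }
    }
}
```

With this mapping, a lookup that yields no class is handled like any other
letter instead of being used as an out-of-range index later on.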

That said, it is also wrong of FOP to crash in this case, so the bug is
definitely genuine.
