Re: Merge Request - Temp_ComplexScripts into Trunk

2011-10-20 Thread Simon Pepping
Jonathan,

Obviously, FOP's strongest supporters over the past years do not
require this new functionality. FOP needs the additional support of
new stakeholders of this new functionality. Could your teams test it
on their documents and report their findings to the fop-user email
list?

Simon Pepping

On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote:
> We -- at InterSystems -- deploy an application that runs in upwards of 40 
> countries, using many of the languages for which complex script support is 
> required.
> 
> We definitely need complex script support.  It is a requirement for us.


Re: Merge Request - Temp_ComplexScripts into Trunk

2011-10-20 Thread Chris Bowditch

On 19/10/2011 19:32, Simon Pepping wrote:

Hi Simon,

I think you misunderstood my mail. I don't want to stop the merge. I 
simply thought it was an appropriate time to discuss some concerns that 
Vincent and Peter had identified. You are preaching to the converted 
about the need for supporting Complex scripts. It is an urgent 
requirement for us too.


If we don't discuss our concerns over the code now, then when do we 
discuss it?


Vincent and Peter will be replying to this thread shortly and will 
outline their primary concerns then.


Thanks,

Chris


Over the past ten years computing has pervaded life in all its facets,
and spread over the world. As a consequence computing is now used in
all languages and all scripts.

When I open my devanagari test file in emacs, it just works. When I
open it in firefox, it just works. The same when I open it in
LibreOffice Writer. I am sure that, if I would open it in *the* *Word*
processor, it would just work. When I process the file with FOP, it
does not work. With the complex scripts functionality, it works,
dependent on the use of supported or otherwise suitable fonts. (That
is true for all above applications, but apparently those come
configured with my system.)

So what does a person do who believes in the XML stack to maintain his
documentation, and wants to send his documents in Hindi to his
customers? See that XSL-FO with FOP is not a suitable solution for him
because Hindi uses a complex script?

FOP needs the complex scripts functionality to remain a player in the
global playing field.

This is for me the single overarching consideration to want this
functionality in FOP's trunk code, and in, say, half a year in a
release. All other considerations are minor, unless one wants to claim
that this code will block FOP's further development and maintenance in
the coming years.

Of course, not everybody needs this functionality, and there is a fear
of increased maintenance overhead. But the question is: For whom do we
develop FOP? Also for the large part of the world that uses complex
scripts?

With the development of the complex scripts functionality, Glenn Adams
and his sponsor Basis Technologies have created a new reality, which
is not going to go away. If this functionality does not end up in FOP,
it will probably live on elsewhere. If the FOP team is fine with that,
say no to the merge request, and feel comfortable with a trusted niche
application.

Simon Pepping

On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote:

On 18/10/2011 19:55, Simon Pepping wrote:

I merged the ComplexScripts branch into trunk. Result:

Hi Simon,

As well of the question of how to do the merge there is also the
question should we do the merge? Of course this is a valuable
feature to the community, and Glenn has invested a lot of time in
its development but is it truely production ready? I asked Vincent
to take a look at the branch earlier in the year as it's a feature
we also need, but he had several concerns that have not be
adequately answered. Take a look at comment #30;
https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30

I'm not sure why Vincent describes it as a "brief look" because he
spent several days on it. I also asked Peter to take a look and he
had similar concerns. 2 or 3 letter variable names are a barrier for
any committer wanting to maintain this code and I don't think it is
a sufficient argument to say that a pre-requisite to maintaining
this code is to be a domain expert. I would hope that any
experienced committer with a debugger should be able to solve some
bugs. Obviously certain problems will require domain expertise, but
the variables names are a key barrier to being able to maintain this
code.

I realise my comments might be a little controversial and I don't
mean any disrespect to Glenn or his work (which is largely
excellent), but we should at least discuss these topics before the
merge is completed.






DO NOT REPLY [Bug 49687] [PATCH] Complex Script Support

2011-10-20 Thread bugzilla
https://issues.apache.org/bugzilla/show_bug.cgi?id=49687

--- Comment #44 from Glenn Adams  2011-10-20 13:03:06 UTC ---
(In reply to comment #43)
> Created attachment 27822 [details]
> list of Gujarati words and sentences
> 
> As per my exchange with Glenn, I've attached a UTF-8 encoded file that 
> contains
> Gujarati words and sentences. Actually, it's an output of a multiple choice
> quiz from my application. If the data is expected in some other form, let me
> know. Also, let me know if you need more of such data.

Thanks. I'll let you know when I've got the Gujarati support working and have
tried out these word forms. By the way, for Arabic script, i have approximately
85,000 word forms which represents a significant cross section of a number of
corpuses. I would hope to have similar number of word forms for other scripts.

I prefer word forms only for Indic scripts rather than phrases or sentences,
since the latter do not typically use any whitespace between words. So I might
ask you to manually segment your data into word forms only.

G.

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.


Re: Merge Request - Temp_ComplexScripts into Trunk

2011-10-20 Thread Vincent Hennebert
The Complex Scripts feature is obviously a great enhancement and we
would all love to have it implemented in FOP. However, that should not
come at the expense of maintainability and the implementation of other
new features.

When I look at the code in the Temp_ComplexScripts branch, I have
serious concerns regarding the latter two points. I would oppose merging
the branch back to Trunk until those are resolved.

Here are the sizes of some new files:
1075 src/java/org/apache/fop/fonts/GlyphSequence.java
1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java
1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java
2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java
3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java

This latter one contains more than 50 field
declarations, and the Handler.startElement method alone is more than
1800 lines long.

Also, the o.a.f.fonts.truetype.TTFFile class has now grown to
5502 lines. That’s 3 times its original size which was already too big.
I regularly find myself looking at bits of this class, and I would be
unable to do so on a 5500 line class.

I don’t think it needs to be explained why big classes are undesirable?

Also, most files disable Checkstyle checks, the most important ones
being line length and white space. Many files have too long lines which
makes it a pain to read through, having to horizontally scroll all the
time. We agreed on a certain coding style in the project and it would be
good if new code could adhere to it.

Speaking of variable names, here is a method picked from
o.a.f.fonts.GlyphSequence:
/**
 * Merge overlapping and abutting sub-intervals.
 */
private static int[] mergeIntervals ( int[] ia ) {
int ni = ia.length;
int i, n, nm, is, ie;
// count merged sub-intervals
for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
int s = ia [ i + 0 ];
int e = ia [ i + 1 ];
if ( ( ie < 0 ) || ( s > ie ) ) {
is = s;
ie = e;
nm++;
} else if ( s >= is ) {
if ( e > ie ) {
ie = e;
}
}
}
int[] mi = new int [ nm * 2 ];
// populate merged sub-intervals
for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
int s = ia [ i + 0 ];
int e = ia [ i + 1 ];
int k = nm * 2;
if ( ( ie < 0 ) || ( s > ie ) ) {
is = s;
ie = e;
mi [ k + 0 ] = is;
mi [ k + 1 ] = ie;
nm++;
} else if ( s >= is ) {
if ( e > ie ) {
ie = e;
}
mi [ k - 1 ] = ie;
}
}
return mi;
}

Now I fully appreciate that one has to have some knowledge of an area to
understand code relating to that area, but no level of expertise,
however high, will help me to quickly understand this code. This is just
too easy to mistake one variable for another one when they differ by
only one letter.

Combined, these would make the code very hard to maintain by anyone
other than the original author, and this is why I’m opposed to merging
it to Trunk in its current form. When I commit code I do my very best to
make it as easy to read and understand by other people, and I would
really appreciate it I could have the same in return.


Thanks,
Vincent


On 19/10/11 19:32, Simon Pepping wrote:
> Over the past ten years computing has pervaded life in all its facets,
> and spread over the world. As a consequence computing is now used in
> all languages and all scripts.
> 
> When I open my devanagari test file in emacs, it just works. When I
> open it in firefox, it just works. The same when I open it in
> LibreOffice Writer. I am sure that, if I would open it in *the* *Word*
> processor, it would just work. When I process the file with FOP, it
> does not work. With the complex scripts functionality, it works,
> dependent on the use of supported or otherwise suitable fonts. (That
> is true for all above applications, but apparently those come
> configured with my system.)
> 
> So what does a person do who believes in the XML stack to maintain his
> documentation, and wants to send his documents in Hindi to his
> customers? See that XSL-FO with FOP is not a suitable solution for him
> because Hindi uses a complex script?
> 
> FOP needs the complex scripts functionality to remain a player in the
> global playing field.
> 
> This is for me the single overarching consideration to want this
> functionality in FOP's trunk code, and in, say, half a year in a
> release. All other considerations are minor, unless one wants to claim
> that this code will block FOP's further development and maintenance in
> the coming years.
> 
> Of course, not everybody needs this functionality, and there is a fear
> of increased maintenanc

Re: Merge Request - Temp_ComplexScripts into Trunk

2011-10-20 Thread Peter Hancock
This is a tough one.

The need for complex script support in FOP is likely high on the wish
list of any global supporter of FOP and it is certainly in the
interest of the project to support. The amount of work that Glenn has
done is considerable: the volume of code, the test coverage and the
obvious depth of domain knowledge.

>From the surface I would have been very much in favor of supporting a
merge in the near future, however I have had the chance to review some
areas of the complex script branch and I have some concerns.
The treatment of Unicode Bidi spans from the creation of the FO Tree
through to the construction of the Area Tree and I would have liked to
have seen the complex scripts solution integrate the Unicode Bidi
Algorithm more directly into the core process:  For example, the
implementation performs a post process on the FO Tree to resolve the
Bidi properties of FONodes relating to text. It would be preferable to
see the construction of the FO Tree embracing this Bidi aspect:
FONodes should be responsible for determining their own bidi state
from the fo node semantics in context to the position in the tree.
Such an implementation would immediately force the maintainer to
consider how a change would effect the Bidi process.

The layout and area tree construction phase interact a little more
directly  with the Bidi process, however most of the Bidi work is
delegated to static methods of a utility class (..layoutmgr.BidiUtil,
also used heavily in the Bidi post-process of the FO Tree) and
consequently require many instanceof expressions because of
differences in Bidi behavior between the Area classes.  Areas should
be responsible for the character re-ordering process.

Having this functionality in FOP is desirable and I would encourage
steps to address these concerns, and those outlined by Chris and
Vincent.

Peter

On Thu, Oct 20, 2011 at 2:53 PM, Vincent Hennebert  wrote:
> The Complex Scripts feature is obviously a great enhancement and we
> would all love to have it implemented in FOP. However, that should not
> come at the expense of maintainability and the implementation of other
> new features.
>
> When I look at the code in the Temp_ComplexScripts branch, I have
> serious concerns regarding the latter two points. I would oppose merging
> the branch back to Trunk until those are resolved.
>
> Here are the sizes of some new files:
> 1075 src/java/org/apache/fop/fonts/GlyphSequence.java
> 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java
> 1269 
> src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java
> 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java
> 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java
>
> This latter one contains more than 50 field
> declarations, and the Handler.startElement method alone is more than
> 1800 lines long.
>
> Also, the o.a.f.fonts.truetype.TTFFile class has now grown to
> 5502 lines. That’s 3 times its original size which was already too big.
> I regularly find myself looking at bits of this class, and I would be
> unable to do so on a 5500 line class.
>
> I don’t think it needs to be explained why big classes are undesirable?
>
> Also, most files disable Checkstyle checks, the most important ones
> being line length and white space. Many files have too long lines which
> makes it a pain to read through, having to horizontally scroll all the
> time. We agreed on a certain coding style in the project and it would be
> good if new code could adhere to it.
>
> Speaking of variable names, here is a method picked from
> o.a.f.fonts.GlyphSequence:
>    /**
>     * Merge overlapping and abutting sub-intervals.
>     */
>    private static int[] mergeIntervals ( int[] ia ) {
>        int ni = ia.length;
>        int i, n, nm, is, ie;
>        // count merged sub-intervals
>        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
>            int s = ia [ i + 0 ];
>            int e = ia [ i + 1 ];
>            if ( ( ie < 0 ) || ( s > ie ) ) {
>                is = s;
>                ie = e;
>                nm++;
>            } else if ( s >= is ) {
>                if ( e > ie ) {
>                    ie = e;
>                }
>            }
>        }
>        int[] mi = new int [ nm * 2 ];
>        // populate merged sub-intervals
>        for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) {
>            int s = ia [ i + 0 ];
>            int e = ia [ i + 1 ];
>            int k = nm * 2;
>            if ( ( ie < 0 ) || ( s > ie ) ) {
>                is = s;
>                ie = e;
>                mi [ k + 0 ] = is;
>                mi [ k + 1 ] = ie;
>                nm++;
>            } else if ( s >= is ) {
>                if ( e > ie ) {
>                    ie = e;
>                }
>                mi [ k - 1 ] = ie;
>            }
>        }
>        return mi;
>    }
>
> Now I fully appreciate that one has to have some knowledge of an area to
> understand code relating to that a

RE: Merge Request - Temp_ComplexScripts into Trunk

2011-10-20 Thread Jonathan Levinson
Hi Simon,

I've contacted my management and asked what our teams can do to help test.  I 
report to our development not to our quality departments, and I can't speak for 
our quality departments.  I've contacted our international teams about what 
they can do to help test.

The bottom-line to our testing is that when a new FOP is released, we test our 
software on the new FOP, and testing is done with every application that uses 
FOP.  With an international FOP, international applications would be tested.  
That much is guaranteed by our normal testing process.  It is a tribute to the 
quality of FOP that we have never had to report a FOP issue, even though our 
reports can be quite complicated.  But I take it you would like us to get 
involved with the testing effort before a new FOP is released.  When you are 
discussing our involvement, are you discussing our testing the FOP that results 
from the merger onto the trunk, once that is accomplished?   I understand you 
are ironing out the details of what that merger would look like. 

You said bug reports should go to fop-users, but isn't it the case that .fo 
attachments won't be accepted by fop-users?  Don't bug reports have to be 
created through Bugzilla?  We can discuss what we see in terms of bugs on 
fop-users, but if we can't provide .fo files, won't our discussion be less 
helpful?

Best Regards,
Jonathan Levinson
Senior Software Developer
Object Group
InterSystems
+1 617-621-0600
jonathan.levin...@intersystems.com


> -Original Message-
> From: Simon Pepping [mailto:spepp...@leverkruid.eu]
> Sent: Thursday, October 20, 2011 3:19 AM
> To: fop-dev@xmlgraphics.apache.org
> Subject: Re: Merge Request - Temp_ComplexScripts into Trunk
> 
> Jonathan,
> 
> Obviously, FOP's strongest supporters over the past years do not require this
> new functionality. FOP needs the additional support of new stakeholders of 
> this
> new functionality. Could your teams test it on their documents and report 
> their
> findings to the fop-user email list?
> 
> Simon Pepping
> 
> On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote:
> > We -- at InterSystems -- deploy an application that runs in upwards of 40
> countries, using many of the languages for which complex script support is
> required.
> >
> > We definitely need complex script support.  It is a requirement for us.