Re: Merge Request - Temp_ComplexScripts into Trunk
Jonathan, Obviously, FOP's strongest supporters over the past years do not require this new functionality. FOP needs the additional support of new stakeholders of this new functionality. Could your teams test it on their documents and report their findings to the fop-user email list? Simon Pepping On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote: > We -- at InterSystems -- deploy an application that runs in upwards of 40 > countries, using many of the languages for which complex script support is > required. > > We definitely need complex script support. It is a requirement for us.
Re: Merge Request - Temp_ComplexScripts into Trunk
On 19/10/2011 19:32, Simon Pepping wrote: Hi Simon, I think you misunderstood my mail. I don't want to stop the merge. I simply thought it was an appropriate time to discuss some concerns that Vincent and Peter had identified. You are preaching to the converted about the need for supporting Complex scripts. It is an urgent requirement for us too. If we don't discuss our concerns over the code now, then when do we discuss it? Vincent and Peter will be replying to this thread shortly and will outline their primary concerns then. Thanks, Chris Over the past ten years computing has pervaded life in all its facets, and spread over the world. As a consequence computing is now used in all languages and all scripts. When I open my devanagari test file in emacs, it just works. When I open it in firefox, it just works. The same when I open it in LibreOffice Writer. I am sure that, if I would open it in *the* *Word* processor, it would just work. When I process the file with FOP, it does not work. With the complex scripts functionality, it works, dependent on the use of supported or otherwise suitable fonts. (That is true for all above applications, but apparently those come configured with my system.) So what does a person do who believes in the XML stack to maintain his documentation, and wants to send his documents in Hindi to his customers? See that XSL-FO with FOP is not a suitable solution for him because Hindi uses a complex script? FOP needs the complex scripts functionality to remain a player in the global playing field. This is for me the single overarching consideration to want this functionality in FOP's trunk code, and in, say, half a year in a release. All other considerations are minor, unless one wants to claim that this code will block FOP's further development and maintenance in the coming years. Of course, not everybody needs this functionality, and there is a fear of increased maintenance overhead. But the question is: For whom do we develop FOP? Also for the large part of the world that uses complex scripts? With the development of the complex scripts functionality, Glenn Adams and his sponsor Basis Technologies have created a new reality, which is not going to go away. If this functionality does not end up in FOP, it will probably live on elsewhere. If the FOP team is fine with that, say no to the merge request, and feel comfortable with a trusted niche application. Simon Pepping On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote: On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well of the question of how to do the merge there is also the question should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development but is it truely production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not be adequately answered. Take a look at comment #30; https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a "brief look" because he spent several days on it. I also asked Peter to take a look and he had similar concerns. 2 or 3 letter variable names are a barrier for any committer wanting to maintain this code and I don't think it is a sufficient argument to say that a pre-requisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variables names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed.
DO NOT REPLY [Bug 49687] [PATCH] Complex Script Support
https://issues.apache.org/bugzilla/show_bug.cgi?id=49687 --- Comment #44 from Glenn Adams 2011-10-20 13:03:06 UTC --- (In reply to comment #43) > Created attachment 27822 [details] > list of Gujarati words and sentences > > As per my exchange with Glenn, I've attached a UTF-8 encoded file that > contains > Gujarati words and sentences. Actually, it's an output of a multiple choice > quiz from my application. If the data is expected in some other form, let me > know. Also, let me know if you need more of such data. Thanks. I'll let you know when I've got the Gujarati support working and have tried out these word forms. By the way, for Arabic script, i have approximately 85,000 word forms which represents a significant cross section of a number of corpuses. I would hope to have similar number of word forms for other scripts. I prefer word forms only for Indic scripts rather than phrases or sentences, since the latter do not typically use any whitespace between words. So I might ask you to manually segment your data into word forms only. G. -- Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email --- You are receiving this mail because: --- You are the assignee for the bug.
Re: Merge Request - Temp_ComplexScripts into Trunk
The Complex Scripts feature is obviously a great enhancement and we would all love to have it implemented in FOP. However, that should not come at the expense of maintainability and the implementation of other new features. When I look at the code in the Temp_ComplexScripts branch, I have serious concerns regarding the latter two points. I would oppose merging the branch back to Trunk until those are resolved. Here are the sizes of some new files: 1075 src/java/org/apache/fop/fonts/GlyphSequence.java 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java 1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java This latter one contains more than 50 field declarations, and the Handler.startElement method alone is more than 1800 lines long. Also, the o.a.f.fonts.truetype.TTFFile class has now grown to 5502 lines. That’s 3 times its original size which was already too big. I regularly find myself looking at bits of this class, and I would be unable to do so on a 5500 line class. I don’t think it needs to be explained why big classes are undesirable? Also, most files disable Checkstyle checks, the most important ones being line length and white space. Many files have too long lines which makes it a pain to read through, having to horizontally scroll all the time. We agreed on a certain coding style in the project and it would be good if new code could adhere to it. Speaking of variable names, here is a method picked from o.a.f.fonts.GlyphSequence: /** * Merge overlapping and abutting sub-intervals. */ private static int[] mergeIntervals ( int[] ia ) { int ni = ia.length; int i, n, nm, is, ie; // count merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; if ( ( ie < 0 ) || ( s > ie ) ) { is = s; ie = e; nm++; } else if ( s >= is ) { if ( e > ie ) { ie = e; } } } int[] mi = new int [ nm * 2 ]; // populate merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; int k = nm * 2; if ( ( ie < 0 ) || ( s > ie ) ) { is = s; ie = e; mi [ k + 0 ] = is; mi [ k + 1 ] = ie; nm++; } else if ( s >= is ) { if ( e > ie ) { ie = e; } mi [ k - 1 ] = ie; } } return mi; } Now I fully appreciate that one has to have some knowledge of an area to understand code relating to that area, but no level of expertise, however high, will help me to quickly understand this code. This is just too easy to mistake one variable for another one when they differ by only one letter. Combined, these would make the code very hard to maintain by anyone other than the original author, and this is why I’m opposed to merging it to Trunk in its current form. When I commit code I do my very best to make it as easy to read and understand by other people, and I would really appreciate it I could have the same in return. Thanks, Vincent On 19/10/11 19:32, Simon Pepping wrote: > Over the past ten years computing has pervaded life in all its facets, > and spread over the world. As a consequence computing is now used in > all languages and all scripts. > > When I open my devanagari test file in emacs, it just works. When I > open it in firefox, it just works. The same when I open it in > LibreOffice Writer. I am sure that, if I would open it in *the* *Word* > processor, it would just work. When I process the file with FOP, it > does not work. With the complex scripts functionality, it works, > dependent on the use of supported or otherwise suitable fonts. (That > is true for all above applications, but apparently those come > configured with my system.) > > So what does a person do who believes in the XML stack to maintain his > documentation, and wants to send his documents in Hindi to his > customers? See that XSL-FO with FOP is not a suitable solution for him > because Hindi uses a complex script? > > FOP needs the complex scripts functionality to remain a player in the > global playing field. > > This is for me the single overarching consideration to want this > functionality in FOP's trunk code, and in, say, half a year in a > release. All other considerations are minor, unless one wants to claim > that this code will block FOP's further development and maintenance in > the coming years. > > Of course, not everybody needs this functionality, and there is a fear > of increased maintenanc
Re: Merge Request - Temp_ComplexScripts into Trunk
This is a tough one. The need for complex script support in FOP is likely high on the wish list of any global supporter of FOP and it is certainly in the interest of the project to support. The amount of work that Glenn has done is considerable: the volume of code, the test coverage and the obvious depth of domain knowledge. >From the surface I would have been very much in favor of supporting a merge in the near future, however I have had the chance to review some areas of the complex script branch and I have some concerns. The treatment of Unicode Bidi spans from the creation of the FO Tree through to the construction of the Area Tree and I would have liked to have seen the complex scripts solution integrate the Unicode Bidi Algorithm more directly into the core process: For example, the implementation performs a post process on the FO Tree to resolve the Bidi properties of FONodes relating to text. It would be preferable to see the construction of the FO Tree embracing this Bidi aspect: FONodes should be responsible for determining their own bidi state from the fo node semantics in context to the position in the tree. Such an implementation would immediately force the maintainer to consider how a change would effect the Bidi process. The layout and area tree construction phase interact a little more directly with the Bidi process, however most of the Bidi work is delegated to static methods of a utility class (..layoutmgr.BidiUtil, also used heavily in the Bidi post-process of the FO Tree) and consequently require many instanceof expressions because of differences in Bidi behavior between the Area classes. Areas should be responsible for the character re-ordering process. Having this functionality in FOP is desirable and I would encourage steps to address these concerns, and those outlined by Chris and Vincent. Peter On Thu, Oct 20, 2011 at 2:53 PM, Vincent Hennebert wrote: > The Complex Scripts feature is obviously a great enhancement and we > would all love to have it implemented in FOP. However, that should not > come at the expense of maintainability and the implementation of other > new features. > > When I look at the code in the Temp_ComplexScripts branch, I have > serious concerns regarding the latter two points. I would oppose merging > the branch back to Trunk until those are resolved. > > Here are the sizes of some new files: > 1075 src/java/org/apache/fop/fonts/GlyphSequence.java > 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java > 1269 > src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java > 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java > 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java > > This latter one contains more than 50 field > declarations, and the Handler.startElement method alone is more than > 1800 lines long. > > Also, the o.a.f.fonts.truetype.TTFFile class has now grown to > 5502 lines. That’s 3 times its original size which was already too big. > I regularly find myself looking at bits of this class, and I would be > unable to do so on a 5500 line class. > > I don’t think it needs to be explained why big classes are undesirable? > > Also, most files disable Checkstyle checks, the most important ones > being line length and white space. Many files have too long lines which > makes it a pain to read through, having to horizontally scroll all the > time. We agreed on a certain coding style in the project and it would be > good if new code could adhere to it. > > Speaking of variable names, here is a method picked from > o.a.f.fonts.GlyphSequence: > /** > * Merge overlapping and abutting sub-intervals. > */ > private static int[] mergeIntervals ( int[] ia ) { > int ni = ia.length; > int i, n, nm, is, ie; > // count merged sub-intervals > for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) { > int s = ia [ i + 0 ]; > int e = ia [ i + 1 ]; > if ( ( ie < 0 ) || ( s > ie ) ) { > is = s; > ie = e; > nm++; > } else if ( s >= is ) { > if ( e > ie ) { > ie = e; > } > } > } > int[] mi = new int [ nm * 2 ]; > // populate merged sub-intervals > for ( i = 0, n = ni, nm = 0, is = ie = -1; i < n; i += 2 ) { > int s = ia [ i + 0 ]; > int e = ia [ i + 1 ]; > int k = nm * 2; > if ( ( ie < 0 ) || ( s > ie ) ) { > is = s; > ie = e; > mi [ k + 0 ] = is; > mi [ k + 1 ] = ie; > nm++; > } else if ( s >= is ) { > if ( e > ie ) { > ie = e; > } > mi [ k - 1 ] = ie; > } > } > return mi; > } > > Now I fully appreciate that one has to have some knowledge of an area to > understand code relating to that a
RE: Merge Request - Temp_ComplexScripts into Trunk
Hi Simon, I've contacted my management and asked what our teams can do to help test. I report to our development not to our quality departments, and I can't speak for our quality departments. I've contacted our international teams about what they can do to help test. The bottom-line to our testing is that when a new FOP is released, we test our software on the new FOP, and testing is done with every application that uses FOP. With an international FOP, international applications would be tested. That much is guaranteed by our normal testing process. It is a tribute to the quality of FOP that we have never had to report a FOP issue, even though our reports can be quite complicated. But I take it you would like us to get involved with the testing effort before a new FOP is released. When you are discussing our involvement, are you discussing our testing the FOP that results from the merger onto the trunk, once that is accomplished? I understand you are ironing out the details of what that merger would look like. You said bug reports should go to fop-users, but isn't it the case that .fo attachments won't be accepted by fop-users? Don't bug reports have to be created through Bugzilla? We can discuss what we see in terms of bugs on fop-users, but if we can't provide .fo files, won't our discussion be less helpful? Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com > -Original Message- > From: Simon Pepping [mailto:spepp...@leverkruid.eu] > Sent: Thursday, October 20, 2011 3:19 AM > To: fop-dev@xmlgraphics.apache.org > Subject: Re: Merge Request - Temp_ComplexScripts into Trunk > > Jonathan, > > Obviously, FOP's strongest supporters over the past years do not require this > new functionality. FOP needs the additional support of new stakeholders of > this > new functionality. Could your teams test it on their documents and report > their > findings to the fop-user email list? > > Simon Pepping > > On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote: > > We -- at InterSystems -- deploy an application that runs in upwards of 40 > countries, using many of the languages for which complex script support is > required. > > > > We definitely need complex script support. It is a requirement for us.