Re: Merge Request - Temp_ComplexScripts into Trunk
However, if someone actually renamed my variables after I have declared my position, then I would interpret that as doing bad things to the code. In fact, i would revert such a change. If folks aren't willing to respect my style of coding along with my promise to document short names, then I will withdraw my request for a merge and abrogate my ICLA. I would not wish my work to be used in a community that does not have sufficient respect for personal coding styles of contributors that do not in any way vary from documented conventions. This is a long thread and my hope is that we will be able to resolve it amicably, and that our community can grow from this experience. The ComplexScripts functionality being merged is a valued and important addition to the Apache FOP Project. More important than the code to me personally, is the continued community and its growth. FOP is a community built around a common desire to create a robust and open source tool for generating XSL Formatting Output in various common formats for print, screen, archival and transfer purposes. There is an expressed concern about code readability and reusability. What concerns me most is that this may cause division. I'm concerned about the level of compromise and intent. It may take more of a stretch on each or our parts to move forward. Clay My religion is simple. My religion is kindness. - HH The Dalai Lama of Tibet
Re: Merge Request - Temp_ComplexScripts into Trunk
Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. So someone else is going to have to waste all that time converting those names into more readable ones. That’s a bit unfair, isn’t it? I would advise against anyone wasting their time by changing my names. Indeed, I will likely react very negatively to such an attempt. What you want to do in your code is your business, but don't imagine you are going to start rewriting my code to meet your style. Or at least don't do so if you wish me to be a part of this team. I would take such an action as a direct affront. This is a big no. At the moment you hand in your code to FOP, it belongs to the community. Anyone can touch it. That is where team membership kicks in. Team members trust each other not to do bad things to the code. If in the indefinite future I am not working on this code, then feel free to change it as you like. In the mean time, I'd appreciate a little respect. Respect yes, but not touching it, no. Simon Pepping
Re: Merge Request - Temp_ComplexScripts into Trunk
On Thu, Oct 27, 2011 at 3:41 PM, Simon Pepping spepp...@leverkruid.euwrote: Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. So someone else is going to have to waste all that time converting those names into more readable ones. That’s a bit unfair, isn’t it? I would advise against anyone wasting their time by changing my names. Indeed, I will likely react very negatively to such an attempt. What you want to do in your code is your business, but don't imagine you are going to start rewriting my code to meet your style. Or at least don't do so if you wish me to be a part of this team. I would take such an action as a direct affront. This is a big no. At the moment you hand in your code to FOP, it belongs to the community. Anyone can touch it. That is where team membership kicks in. Team members trust each other not to do bad things to the code. If in the indefinite future I am not working on this code, then feel free to change it as you like. In the mean time, I'd appreciate a little respect. Respect yes, but not touching it, no. I did not say or imply hands off, so of course I presume that anyone can touch it. Why do you insist on reading me otherwise? However, if someone actually renamed my variables after I have declared my position, then I would interpret that as doing bad things to the code. In fact, i would revert such a change. If folks aren't willing to respect my style of coding along with my promise to document short names, then I will withdraw my request for a merge and abrogate my ICLA. I would not wish my work to be used in a community that does not have sufficient respect for personal coding styles of contributors that do not in any way vary from documented conventions. Simon Pepping
RE: Merge Request - Temp_ComplexScripts into Trunk
There is little room for individuality in a coding community. I'd say it doesn't matter what you call your variables as long as someone who's never seen the code can understand the purpose through the name and/or comments, and they conform to any predefined naming standard for the project. Ages ago all variables had short names because disk space and/or memory was at a premium. Today that shouldn't be an excuse. Try to avoid overly simplistic names like x1 unless they have overly simplistic purpose (create, use, destroy within a 5 line method), and try to avoid overly verbose names. As long as what you write makes sense I'd agree with you no one should change the code simply for the sake of personal preference. You should certainly change it if you don't want someone else to change it if someone unfamiliar with it can't understand it. If they're changing your names that should be fine as long as they're fixing a bug or enhancing something where the name change makes sense to go with the new logic. From: Glenn Adams [mailto:gl...@skynav.com] Sent: Thursday, October 27, 2011 3:56 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk On Thu, Oct 27, 2011 at 3:41 PM, Simon Pepping spepp...@leverkruid.eu wrote: Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. So someone else is going to have to waste all that time converting those names into more readable ones. That's a bit unfair, isn't it? I would advise against anyone wasting their time by changing my names. Indeed, I will likely react very negatively to such an attempt. What you want to do in your code is your business, but don't imagine you are going to start rewriting my code to meet your style. Or at least don't do so if you wish me to be a part of this team. I would take such an action as a direct affront. This is a big no. At the moment you hand in your code to FOP, it belongs to the community. Anyone can touch it. That is where team membership kicks in. Team members trust each other not to do bad things to the code. If in the indefinite future I am not working on this code, then feel free to change it as you like. In the mean time, I'd appreciate a little respect. Respect yes, but not touching it, no. I did not say or imply hands off, so of course I presume that anyone can touch it. Why do you insist on reading me otherwise? However, if someone actually renamed my variables after I have declared my position, then I would interpret that as doing bad things to the code. In fact, i would revert such a change. If folks aren't willing to respect my style of coding along with my promise to document short names, then I will withdraw my request for a merge and abrogate my ICLA. I would not wish my work to be used in a community that does not have sufficient respect for personal coding styles of contributors that do not in any way vary from documented conventions. Simon Pepping
Re: Merge Request - Temp_ComplexScripts into Trunk
On 21/10/11 08:29, Glenn Adams wrote: On Thu, Oct 20, 2011 at 10:31 PM, Peter Hancock peter.hanc...@gmail.comwrote: From the surface I would have been very much in favor of supporting a merge in the near future, however I have had the chance to review some areas of the complex script branch and I have some concerns. The treatment of Unicode Bidi spans from the creation of the FO Tree through to the construction of the Area Tree and I would have liked to have seen the complex scripts solution integrate the Unicode Bidi Algorithm more directly into the core process: For example, the implementation performs a post process on the FO Tree to resolve the Bidi properties of FONodes relating to text. It would be preferable to see the construction of the FO Tree embracing this Bidi aspect: FONodes should be responsible for determining their own bidi state from the fo node semantics in context to the position in the tree. Such an implementation would immediately force the maintainer to consider how a change would effect the Bidi process. Please review XSL-FO 1.1 Section 5.8, and, in particular: the algorithm is applied to a sequence of characters coming from the content of one or more formatting objects. The sequence of characters is created by processing a fragment of the formatting object tree. A *fragment* is any contiguous sequence of children of some formatting object in the tree. The sequence is created by doing a pre-order traversal of the fragment down to the fo:character level. the final, text reordering step is not done during refinement. Instead, the XSL equivalent of re-ordering is done during area tree generation The current implementation adheres to the XSL-FO specification in this regard, while your suggestion that this behavior be isolated to individual FONodes is contrary to the specification and does not permit correct implementation of the functionality required. Section 5 of the XSL-FO 1.1 Recommendation starts with the following note: “Although the refinement process is described in a series of steps, this is solely for the convenience of exposition and does not imply they must be implemented as separate steps in any conforming implementation. A conforming implementation must only achieve the same effect.” So we are free to implement the algorithm any way we see adequate. But even so, a fragment is “any contiguous sequence of children of some formatting object in the tree“. That formatting object doesn’t have to be a whole page-sequence and can as well be a single block or inline or anything else. Implementing Bidi in individual FONode sub-classes allows to keep the treatment encapsulated in each FO element, and adapt it to the specific semantics of that element. If this is done in a single BidiUtil class, all the behaviours that are specific to each element are mixed together. Implementation details that should be kept within individual classes are being exposed to the rest of them. Elements must be handled in lenghty sequences of if/else statement using ‘instanceof’ and casts to the concrete class. If a new FO element is being implemented this will be very likely that it will be forgotten to add the appropriate ‘if’ statement for that element in the BidiUtil class. If it’s not forgotten, it will be difficult to find out where to put that statement. Doing treatment specific to an object outside its implementation screams for trouble and regressions as soon as a change is made in one or the other place. Unless people are aware that they must keep an eye on that BidiUtil class, which is unlikely for newcomer who don’t know the code well. This BidiUtil class has clearly been written in a procedural style, and there are reasons why that style was abandoned years ago in favour of an object-oriented paradigm, that allows to write more flexible, maintainable programs, and easier to understand by people who are not the original authors. I’d like to know what’s your view on this? I realize this is a complex subject area that requires considerable domain knowledge, but you can take my word as a domain expert (having implemented this functionality multiple times in commercial products) that the approach I have taken is (1) the most consistent with the XSL-FO specification, (2) the behavior required to achieve the desired functionality, and (3) the minimum changes and points of dependency to existing code. In contrast, a more distributed approach such as you suggest would (1) diverge from XSL-FO specified behavior, (2) increase and distribute the number of points of interaction with existing code so as to make behavior harder to understand, test, and debug, and, most telling, (3) not provide any functional or performance advantage. Regards, Glenn Thanks, Vincent
Re: Merge Request - Temp_ComplexScripts into Trunk
On 24/10/11 14:05, Glenn Adams wrote: On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.dewrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Welcome to the Computer Science world, where longer variable names rule because they allow to make a program easier to understand and maintain. When I read the paper about the total-fit algorithm for breaking paragraphs into line, I found that the numerous one-letter variable names were an impediment to understanding it. It was difficult to remember what concept each variable was associated to. Third, I started programming in the 1960s with BAL 360, APL, then FORTRAN IV. We use short names there. Yes, it is very fortunate that the computer world has learnt from those old days, and moved on to embrace better programming practices. Fourth, I happen to have a good memory and I have no trouble remembering the meaning of variable names. Fifth, I find that short names prevents making lines too long and gives me more room for comments. By putting only one statement per line it is rare to bump into the 100 characters per line limit. Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. As Simon has already pointed out, this is not the way it should be in an open-source project. Seventh, that's just my style, and I assert it is as valid as doing it with long names. When I joined the FOP project, I adjusted my style to follow the project’s practices. It seemed obvious to me to do so, because a consistent style within a project avoids unnecessary distraction when wandering through its different parts. I would expect any contributor to do the same. Eighth, asking me to adhere to an undocumented convention that is not otherwise enforced, and for which there is no evidence or analysis of having been previously followed in FOP contributions is unwarranted. There is no documented convention simply because it has never occurred to anybody in the past that short variable names could be an option. This is this kind of unwritten convention that everybody takes for granted. Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. So someone else is going to have to waste all that time converting those names into more readable ones. That’s a bit unfair, isn’t it? I can probably throw in a few more random reasons, but this should be sufficient. I've offered to add comments, take it or leave it. Would you at least agree to use more readable variable names in new code? That would be of great help to people who will get involved in the Bidi stuff in the future. Thanks, Vincent
Re: Merge Request - Temp_ComplexScripts into Trunk
inline On Wed, Oct 26, 2011 at 6:49 PM, Vincent Hennebert vhenneb...@gmail.comwrote: On 21/10/11 08:29, Glenn Adams wrote: On Thu, Oct 20, 2011 at 10:31 PM, Peter Hancock peter.hanc...@gmail.com wrote: From the surface I would have been very much in favor of supporting a merge in the near future, however I have had the chance to review some areas of the complex script branch and I have some concerns. The treatment of Unicode Bidi spans from the creation of the FO Tree through to the construction of the Area Tree and I would have liked to have seen the complex scripts solution integrate the Unicode Bidi Algorithm more directly into the core process: For example, the implementation performs a post process on the FO Tree to resolve the Bidi properties of FONodes relating to text. It would be preferable to see the construction of the FO Tree embracing this Bidi aspect: FONodes should be responsible for determining their own bidi state from the fo node semantics in context to the position in the tree. Such an implementation would immediately force the maintainer to consider how a change would effect the Bidi process. Please review XSL-FO 1.1 Section 5.8, and, in particular: the algorithm is applied to a sequence of characters coming from the content of one or more formatting objects. The sequence of characters is created by processing a fragment of the formatting object tree. A *fragment* is any contiguous sequence of children of some formatting object in the tree. The sequence is created by doing a pre-order traversal of the fragment down to the fo:character level. the final, text reordering step is not done during refinement. Instead, the XSL equivalent of re-ordering is done during area tree generation The current implementation adheres to the XSL-FO specification in this regard, while your suggestion that this behavior be isolated to individual FONodes is contrary to the specification and does not permit correct implementation of the functionality required. Section 5 of the XSL-FO 1.1 Recommendation starts with the following note: “Although the refinement process is described in a series of steps, this is solely for the convenience of exposition and does not imply they must be implemented as separate steps in any conforming implementation. A conforming implementation must only achieve the same effect.” So we are free to implement the algorithm any way we see adequate. And I have done just that: implement the [Bidi] algorithm in a way I found adequate. But even so, a fragment is “any contiguous sequence of children of some formatting object in the tree“. That formatting object doesn’t have to be a whole page-sequence and can as well be a single block or inline or anything else. Implementing Bidi in individual FONode sub-classes allows to keep the treatment encapsulated in each FO element, and adapt it to the specific semantics of that element. I have not ruled this option out. I have stated previously that this may be possible; however, there are possible problems with confining this behavior to FONode classes such as the fact that what constitutes a fragment and a delimited text range (see XSL-FO 1.1 Section 5.8) is not confined to FO node boundaries. If this is done in a single BidiUtil class, all the behaviours that are specific to each element are mixed together. Implementation details that should be kept within individual classes are being exposed to the rest of them. Elements must be handled in lenghty sequences of if/else statement using ‘instanceof’ and casts to the concrete class. In adding support to Bidi to FOP, I was faced with the problem of (1) not knowing the implementation, and (2) desiring to minimize the point of contact of my changes with existing code in order to better isolate behavioral regressions. I made the determination that it would be most convenient and expedient to code the bidi level and order resolution functionality in a single set of utility functions (with other helper classes, such as UnicodeBidiAlgorithm). That approach has worked so far. However, that doesn't mean it has to remain that way. It is not a straightforward task to add Bidi support to a large implementation like FOP which failed to take the needs of internationalization into account in its design. As a consequence, adding this support must be done delicately, and in stages. If a new FO element is being implemented this will be very likely that it will be forgotten to add the appropriate ‘if’ statement for that element in the BidiUtil class. If it’s not forgotten, it will be difficult to find out where to put that statement. Although I have not added tests for right-to-left writing mode for all existing layoutengine standard tests, I am in the process of doing just that, and have already put a number of tests in place. Eventually, there will be RTL WM tests for all these standard tests.
Re: Merge Request - Temp_ComplexScripts into Trunk
On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: are you claiming my code is not maintainable by other developers? if so, then please prove it objectively; otherwise, let's stop talking about this, and move on with the merge vote How would one go about proving objectively that code is not maintainable? There are many aspects to writing maintainable code, spanning from the synax level through to the structuring of classes and modules (packages). Importantly we should encourage: Code reuse - (using trusted libraries, applying the DRY principle) hard to measure objectively A consistent style - this may be an emergent aspect of a project and choosing guidelines at the start or even retrospectively may be too difficult, but we can largely infer the style from the current state. An imperfect but consistent style is arguably favorable to inconsistency. Idiomatic language usage - applying common solutions that leverage the constructs of, and the philosophies behind a language (e.g applying OO design patterns in Java applications). I find that writing code that is in keeping with a the style of a project and using the language as recommended makes it easier to distill the intention of a piece of code from the implementation and can lead towards self-documenting code. The inner workings of FOP are complex and I think that all efforts to boost understandability are essential. Peter On Wed, Oct 26, 2011 at 12:55 PM, Glenn Adams gl...@skynav.com wrote: inline On Wed, Oct 26, 2011 at 7:17 PM, Vincent Hennebert vhenneb...@gmail.com wrote: On 24/10/11 14:05, Glenn Adams wrote: On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.dewrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Welcome to the Computer Science world, where longer variable names rule because they allow to make a program easier to understand and maintain. When I read the paper about the total-fit algorithm for breaking paragraphs into line, I found that the numerous one-letter variable names were an impediment to understanding it. It was difficult to remember what concept each variable was associated to. I had no trouble understanding it. In fact, I re-implemented it in Lisp (Scheme), and fixed a few issues in the process, which I reported to Don Knuth and for which he sent me a check for $2.56. See attached file. Note that I used long names for (structure) member names and dynamic variables, but often short names for local (lexical) variables in this code which I wrote 20 years ago. I haven't changed my style since then, and I don't intend to do so now. Third, I started programming in the 1960s with BAL 360, APL, then FORTRAN IV. We use short names there. Yes, it is very fortunate that the computer world has learnt from those old days, and moved on to embrace better programming practices. We are apparently in different generations, and this influences our thinking. I am not judging your style, but you seem to be quick to judge my style. Personally, I find ideology counterproductive. Fourth, I happen to have a good memory and I have no trouble remembering the meaning of variable names. Fifth, I find that short names prevents making lines too long and gives me more room for comments. By putting only one statement per line it is rare to bump into the 100 characters per line limit. Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. As Simon has already pointed out, this is not the way it should be in an open-source project. Seventh, that's just my style, and I assert it is as valid as doing it with long names. When I joined the FOP project, I adjusted my style to follow the project’s practices. It seemed obvious to me to do so, because a consistent style within a project avoids unnecessary distraction when wandering through its different parts. I would expect any contributor to do the same. Eighth, asking me to adhere to an undocumented convention that is not otherwise enforced, and for which there is no evidence or analysis of having been previously followed in FOP contributions is unwarranted. There is no documented convention simply
Re: Merge Request - Temp_ComplexScripts into Trunk
On Wed, Oct 26, 2011 at 8:36 PM, Peter Hancock peter.hanc...@gmail.comwrote: On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: are you claiming my code is not maintainable by other developers? if so, then please prove it objectively; otherwise, let's stop talking about this, and move on with the merge vote How would one go about proving objectively that code is not maintainable? My point is that this is a subjective exercise we are having here, and not particularly fruitful. It can be done objectively, or at least more objectively if one wants to take the time to do so. For example, by defining specific objective metrics and tools that measure those metrics against the existing code base and against new code. But instead of doing that, we are presently dealing with argument by innuendo. There are many aspects to writing maintainable code, spanning from the synax level through to the structuring of classes and modules (packages). Importantly we should encourage: Code reuse - (using trusted libraries, applying the DRY principle) hard to measure objectively A consistent style - this may be an emergent aspect of a project and choosing guidelines at the start or even retrospectively may be too difficult, but we can largely infer the style from the current state. An imperfect but consistent style is arguably favorable to inconsistency. Idiomatic language usage - applying common solutions that leverage the constructs of, and the philosophies behind a language (e.g applying OO design patterns in Java applications). I find that writing code that is in keeping with a the style of a project and using the language as recommended makes it easier to distill the intention of a piece of code from the implementation and can lead towards self-documenting code. The inner workings of FOP are complex and I think that all efforts to boost understandability are essential. It is rather ironic that I find myself being interpreted as somehow trying to decrease coding understandability, or being interpreted as promoting idiomatic or inconsistent usage. You should ask some of the hundred or so developers who have worked under me their opinion about my code. You would come to a different conclusion. I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! I also note the author of this file uses short names for (exposed, non-private) instance variables, such as: protected char[] lo; protected char[] hi; protected char[] eq; protected char[] sc; protected CharVector kv; At least in my case, I use long names for instance variables, even when they are private. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. That would allow objective comparisons to be made. G. Peter On Wed, Oct 26, 2011 at 12:55 PM, Glenn Adams gl...@skynav.com wrote: inline On Wed, Oct 26, 2011 at 7:17 PM, Vincent Hennebert vhenneb...@gmail.com wrote: On 24/10/11 14:05, Glenn Adams wrote: On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.dewrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Welcome to the Computer Science world, where longer variable names rule because they allow to make a program easier to understand and maintain. When I read the paper about the total-fit algorithm for breaking paragraphs into line, I found that the numerous one-letter variable names were an impediment to understanding it. It was difficult to remember what concept each variable was associated to. I had no trouble understanding it. In fact, I re-implemented it in Lisp (Scheme), and fixed a few issues in the process, which I reported to Don Knuth and for which he sent me a check for $2.56. See attached file. Note that I used long names for (structure) member names and dynamic variables, but often short names for local (lexical) variables in this code which I wrote 20 years ago. I haven't
Re: Merge Request - Temp_ComplexScripts into Trunk
BTW, sometimes I choose to use longer names for local variables: see my reimplementation of number to string conversion in o.a.f.util.NumberConverter, which is a new (and large) class I added in the CS branch. I use a few short names here, but not as many as longer names. So you can see that sometimes I find it useful to use longer names. This sort of decision (when to use long or short) should be based on an author's preferences, and not established by fiat. Notice also the considerable use of nested classes (and interfaces), which tends to make the file longer, but nevertheless encapsulates abstractions in smaller units. True, this file could be sub-divided into smaller files, and I may yet do that. However, I found it convenient to keep it in one file for the initial implementation. On Wed, Oct 26, 2011 at 8:54 PM, Glenn Adams gl...@skynav.com wrote: On Wed, Oct 26, 2011 at 8:36 PM, Peter Hancock peter.hanc...@gmail.comwrote: On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: are you claiming my code is not maintainable by other developers? if so, then please prove it objectively; otherwise, let's stop talking about this, and move on with the merge vote How would one go about proving objectively that code is not maintainable? My point is that this is a subjective exercise we are having here, and not particularly fruitful. It can be done objectively, or at least more objectively if one wants to take the time to do so. For example, by defining specific objective metrics and tools that measure those metrics against the existing code base and against new code. But instead of doing that, we are presently dealing with argument by innuendo. There are many aspects to writing maintainable code, spanning from the synax level through to the structuring of classes and modules (packages). Importantly we should encourage: Code reuse - (using trusted libraries, applying the DRY principle) hard to measure objectively A consistent style - this may be an emergent aspect of a project and choosing guidelines at the start or even retrospectively may be too difficult, but we can largely infer the style from the current state. An imperfect but consistent style is arguably favorable to inconsistency. Idiomatic language usage - applying common solutions that leverage the constructs of, and the philosophies behind a language (e.g applying OO design patterns in Java applications). I find that writing code that is in keeping with a the style of a project and using the language as recommended makes it easier to distill the intention of a piece of code from the implementation and can lead towards self-documenting code. The inner workings of FOP are complex and I think that all efforts to boost understandability are essential. It is rather ironic that I find myself being interpreted as somehow trying to decrease coding understandability, or being interpreted as promoting idiomatic or inconsistent usage. You should ask some of the hundred or so developers who have worked under me their opinion about my code. You would come to a different conclusion. I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! I also note the author of this file uses short names for (exposed, non-private) instance variables, such as: protected char[] lo; protected char[] hi; protected char[] eq; protected char[] sc; protected CharVector kv; At least in my case, I use long names for instance variables, even when they are private. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. That would allow objective comparisons to be made. G. Peter On Wed, Oct 26, 2011 at 12:55 PM, Glenn Adams gl...@skynav.com wrote: inline On Wed, Oct 26, 2011 at 7:17 PM, Vincent Hennebert vhenneb...@gmail.com wrote: On 24/10/11 14:05, Glenn Adams wrote: On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.dewrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in
Re: Merge Request - Temp_ComplexScripts into Trunk
While you are at it, Peter, you may also take note that I have made liberal use of *assert* in the file I reference below (NumberConverter). If we are going to improve not only understandability but also real quality, how about a campaign to maximize use of assertions to document code assumptions? I notice that nobody has pointed out my liberal use of assertions as a positive for understanding and quality, while instead focusing on the narrow issue of short symbol name length as a negative. Given that symbol length has no impact at the JVM layer (except for consuming more runtime memory resources than shorter symbol names), perhaps we should focus on something which does have an impact, such as runtime assertion testing. On Wed, Oct 26, 2011 at 9:13 PM, Glenn Adams gl...@skynav.com wrote: BTW, sometimes I choose to use longer names for local variables: see my reimplementation of number to string conversion in o.a.f.util.NumberConverter, which is a new (and large) class I added in the CS branch. I use a few short names here, but not as many as longer names. So you can see that sometimes I find it useful to use longer names. This sort of decision (when to use long or short) should be based on an author's preferences, and not established by fiat. Notice also the considerable use of nested classes (and interfaces), which tends to make the file longer, but nevertheless encapsulates abstractions in smaller units. True, this file could be sub-divided into smaller files, and I may yet do that. However, I found it convenient to keep it in one file for the initial implementation. On Wed, Oct 26, 2011 at 8:54 PM, Glenn Adams gl...@skynav.com wrote: On Wed, Oct 26, 2011 at 8:36 PM, Peter Hancock peter.hanc...@gmail.comwrote: On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: are you claiming my code is not maintainable by other developers? if so, then please prove it objectively; otherwise, let's stop talking about this, and move on with the merge vote How would one go about proving objectively that code is not maintainable? My point is that this is a subjective exercise we are having here, and not particularly fruitful. It can be done objectively, or at least more objectively if one wants to take the time to do so. For example, by defining specific objective metrics and tools that measure those metrics against the existing code base and against new code. But instead of doing that, we are presently dealing with argument by innuendo. There are many aspects to writing maintainable code, spanning from the synax level through to the structuring of classes and modules (packages). Importantly we should encourage: Code reuse - (using trusted libraries, applying the DRY principle) hard to measure objectively A consistent style - this may be an emergent aspect of a project and choosing guidelines at the start or even retrospectively may be too difficult, but we can largely infer the style from the current state. An imperfect but consistent style is arguably favorable to inconsistency. Idiomatic language usage - applying common solutions that leverage the constructs of, and the philosophies behind a language (e.g applying OO design patterns in Java applications). I find that writing code that is in keeping with a the style of a project and using the language as recommended makes it easier to distill the intention of a piece of code from the implementation and can lead towards self-documenting code. The inner workings of FOP are complex and I think that all efforts to boost understandability are essential. It is rather ironic that I find myself being interpreted as somehow trying to decrease coding understandability, or being interpreted as promoting idiomatic or inconsistent usage. You should ask some of the hundred or so developers who have worked under me their opinion about my code. You would come to a different conclusion. I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! I also note the author of this file uses short names for (exposed, non-private) instance variables, such as: protected char[] lo; protected char[] hi; protected char[] eq; protected char[] sc; protected CharVector kv; At least in my case, I use long names for instance variables, even when they are private. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. That would allow objective comparisons to be made. G. Peter On Wed, Oct 26, 2011 at 12:55 PM, Glenn Adams gl...@skynav.com wrote: inline On Wed, Oct 26, 2011 at 7:17 PM, Vincent Hennebert vhenneb...@gmail.com wrote: On
Re: Merge Request - Temp_ComplexScripts into Trunk
I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! My apprehension about certain areas of your code (and not the majority!) stems from such examples, and the headaches theycan bring. This is old code that I had no influence over at the time and I do not want it to have any bearing on where the project is heading. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. I do not think I have focused on the length of variable or member names have I? I did a PhD in mathematics and I have a soft spot for the aesthetic value of short names. It is always pleasing to distill a mathematical proof to the simplist form possible and using consise variable naming is often a part of that. That said, I do not think that working codebenefits from this approach: what can seem like an efficient and powerful piece of code when written can prove to be an overly difficult thing to read later. Unlike yourself, apparently, my memory ain't so good and I benefit from code that has clear intention. Peter
Re: Merge Request - Temp_ComplexScripts into Trunk
On Wed, Oct 26, 2011 at 2:13 PM, Glenn Adams gl...@skynav.com wrote: Notice also the considerable use of nested classes (and interfaces), which tends to make the file longer, but nevertheless encapsulates abstractions in smaller units. True, this file could be sub-divided into smaller files, and I may yet do that. However, I found it convenient to keep it in one file for the initial implementation. I appreciate that Java does not always help us when striving for well encapsulated code AND manageable file lengths! I really do not think you implementation is fundamentally that far off the mark and the amount of constructive attention it has received has naturally been proportional to the quantity - something that is very impressive! Peter
Re: Merge Request - Temp_ComplexScripts into Trunk
On Wed, Oct 26, 2011 at 9:34 PM, Peter Hancock peter.hanc...@gmail.comwrote: I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! My apprehension about certain areas of your code (and not the majority!) stems from such examples, and the headaches theycan bring. This is old code that I had no influence over at the time and I do not want it to have any bearing on where the project is heading. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. I do not think I have focused on the length of variable or member names have I? I did a PhD in mathematics and I have a soft spot for the aesthetic value of short names. It is always pleasing to distill a mathematical proof to the simplist form possible and using consise variable naming is often a part of that. That said, I do not think that working codebenefits from this approach: what can seem like an efficient and powerful piece of code when written can prove to be an overly difficult thing to read later. Unlike yourself, apparently, my memory ain't so good and I benefit from code that has clear intention. Yet you continue to imply that: short variable names != clear intention This I must disagree with. I could use long random names and obfuscate intention. I can uses short names and document intention (in comments). I have agreed to do the latter. Is that not enough? Peter
RE: Merge Request - Temp_ComplexScripts into Trunk
I haven't looked at the code in question on this particular discussion so this is not to criticize. Overly concise variables names should be acceptable within limited scope. Calling an ObjectOutputStream oos may be sufficient when it's created and destroyed within one little method. Using i or z may suffice as loop counters within a single simple method, while you may want a longer name simply to track the loop if it gets more complex nesting loops. A project should have defined standards for meaningful variable naming, particularly when they're declared at the class level or they're public, protected, or passed in to the method. The simplest readability standard is of course the layout. Eclipse has plenty of preferences and an option to export them. Line wraps, comment format, etc should be consistant within a project. Of course if code must be reused it helps if standard naming can be enforced by such as abstract methods and interfaces. -Original Message- From: Peter Hancock [mailto:peter.hanc...@gmail.com] Sent: Wednesday, October 26, 2011 9:34 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk I wonder what you think about the code in o.a.f.hyphenation.TernaryTree, where the author apparently did not know Java, and introduces the libc functions strcmp, strcpy, and strlen, and which uses the Java char type (within the String type) for coding tree pointers! My apprehension about certain areas of your code (and not the majority!) stems from such examples, and the headaches theycan bring. This is old code that I had no influence over at the time and I do not want it to have any bearing on where the project is heading. If you wanted to make a serious case against using short names, you would start first by analyzing existing FOP usage and using such an analysis to establish concrete metrics. I do not think I have focused on the length of variable or member names have I? I did a PhD in mathematics and I have a soft spot for the aesthetic value of short names. It is always pleasing to distill a mathematical proof to the simplist form possible and using consise variable naming is often a part of that. That said, I do not think that working codebenefits from this approach: what can seem like an efficient and powerful piece of code when written can prove to be an overly difficult thing to read later. Unlike yourself, apparently, my memory ain't so good and I benefit from code that has clear intention. Peter
Re: Merge Request - Temp_ComplexScripts into Trunk
On 22/10/11 01:22, Glenn Adams wrote: inline On Sat, Oct 22, 2011 at 12:04 AM, Chris Bowditch bowditch_ch...@hotmail.com wrote: Since Thunderhead also needs this feature we are willing to invest some time into it too. Currently my team are telling me it would take 9 person months to port this code into our branch of FOP, partly because of some merge conflicts, but also partly because we are not comfortable with some aspects of the code as it has already been pointed out. The estimate would include the time to refactor long files into small ones, deal with the variable names and restructuring the code. I would advise against this, since it would it is functionally unnecessary and since it will make future merges more difficult. I will be making additional changes as more features in this area are added. I don't see what refactoring long files into small ones buys you. However, if you make a reasoned argument for factoring specific long files (i.e., why such factoring improves architecture, modularity, etc), rather than simply say all long files must be refactored, then I will seriously discuss doing so post merge. When I read this I’m really puzzled because that should really be the other way around: what is the reason to keep those classes so big (and that must be a really good one)? Most likely the Single Responsibility Principle is being violated in those classes. BidiUtil is a good example of this: it computes Bidi levels on the FO tree, /and/ also re-orders areas on the area tree. Those two things should most probably be done in two separate classes. Especially since from the quick look I had they seem to be using two distinct sets or private methods. I appreciate your commitment to add comments to short identifiers declarations, so at least it will be easier for the team to translate the short variables to longer equivalents. Just so we are clear on what you propose, do you mean this: int gi = 0; // Glyph Index Yes. Note that I already do this in most cases, such as: private static void resolveExplicit ( int[] wca, int defaultLevel, int[] ea ) { int[] es = new int [ MAX_LEVELS ]; /* embeddings stack */ int ei = 0; /* embeddings stack index */ int ec = defaultLevel; /* current embedding level */ for ( int i = 0, n = wca.length; i n; i++ ) { int bc = wca [ i ]; /* bidi class of current char */ int el; /* embedding level to assign to current char */ switch ( bc ) { case LRE: // start left-to-right embedding case RLE: // start right-to-left embedding case LRO: // start left-to-right override case RLO: // start right-to-left override { int en; /* new embedding level */ if ( ( bc == RLE ) || ( bc == RLO ) ) { en = ( ( ec ~OVERRIDE ) + 1 ) | 1; } else { en = ( ( ec ~OVERRIDE ) + 2 ) ~1; } if ( en ( MAX_LEVELS + 1 ) ) { es [ ei++ ] = ec; if ( ( bc == LRO ) || ( bc == RLO ) ) { ec = en | OVERRIDE; } else { ec = en ~OVERRIDE; } } else { // max levels exceeded, so don't change level or override } el = ec; break; } ... What I'm agreeing to do in the relative near future (after merge, before new release) is to add such comments to those places where I have not already done so, which are probably a minority of such cases. This is good to hear, although it only marginally helps. Again, what’s the rationale behind having 2 or 3 letter variables when every course about programming will emphasise the importance of having reasonably long, self-describing variable names? What amount of typing is there to save when any modern IDE will auto-complete variable names? Explaining the purpose of a variable in a comment creates two problems: first, it forces the developer to constantly scroll from where the variable is being used to where it is declared, in order to remember what its purpose is. By doing so, they lose the context in which the variable is used and have to read the code again. This makes it extremely painful and difficult to understand the code. But more importantly, there is no guarantee that the comment is accurate. It’s notorious that comments tend to be left behind when changes are made to the code. Which means that they quickly become misleading or even plain wrong. Therefore people tend to not trust comments, or even not read them
Re: Merge Request - Temp_ComplexScripts into Trunk
Vincent, We apparently disagree on whether coding should be based on ideology or on practical results. You appear to favor the former, I favor the latter. I think we will have to leave it at that. I'm not going to alter my programming style in order to adhere to your notion of ideal programming practice. It would be one thing if there was an established consensus on these matters based on objective reasoning, but there is not: (1) the limit on file length appear to be arbitrary, and not a result of an explicit reasoned process; i had reasons for coding the way I did (including the use of nested classes), and I feel no need to alter or justify that process; if someone can make an objectively reasoned argument on a case by case basis, then I'm willing to consider refactoring at some point in the future when other more important tasks are completed; e.g., I would be willing to break out the nested class UnicodeBidiAlgorithm from BidiUtil.java into a separate file (which would in address your comment about separating bidi level determination from reordering); (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; Note that in both of these cases, I am offering to take concrete steps to address your comments, though not necessarily in the manner or to the full extent you would prefer. This is called compromise, an important aspect of working in a team. G. On Mon, Oct 24, 2011 at 6:54 PM, Vincent Hennebert vhenneb...@gmail.comwrote: On 22/10/11 01:22, Glenn Adams wrote: inline On Sat, Oct 22, 2011 at 12:04 AM, Chris Bowditch bowditch_ch...@hotmail.com wrote: Since Thunderhead also needs this feature we are willing to invest some time into it too. Currently my team are telling me it would take 9 person months to port this code into our branch of FOP, partly because of some merge conflicts, but also partly because we are not comfortable with some aspects of the code as it has already been pointed out. The estimate would include the time to refactor long files into small ones, deal with the variable names and restructuring the code. I would advise against this, since it would it is functionally unnecessary and since it will make future merges more difficult. I will be making additional changes as more features in this area are added. I don't see what refactoring long files into small ones buys you. However, if you make a reasoned argument for factoring specific long files (i.e., why such factoring improves architecture, modularity, etc), rather than simply say all long files must be refactored, then I will seriously discuss doing so post merge. When I read this I’m really puzzled because that should really be the other way around: what is the reason to keep those classes so big (and that must be a really good one)? Most likely the Single Responsibility Principle is being violated in those classes. BidiUtil is a good example of this: it computes Bidi levels on the FO tree, /and/ also re-orders areas on the area tree. Those two things should most probably be done in two separate classes. Especially since from the quick look I had they seem to be using two distinct sets or private methods. I appreciate your commitment to add comments to short identifiers declarations, so at least it will be easier for the team to translate the short variables to longer equivalents. Just so we are clear on what you propose, do you mean this: int gi = 0; // Glyph Index Yes. Note that I already do this in most cases, such as: private static void resolveExplicit ( int[] wca, int defaultLevel, int[] ea ) { int[] es = new int [ MAX_LEVELS ]; /* embeddings stack */ int ei = 0; /* embeddings stack index */ int ec = defaultLevel; /* current embedding level */ for ( int i = 0, n = wca.length; i n; i++ ) { int bc = wca [ i ]; /* bidi class of current char */ int el; /* embedding level to assign to current char */ switch ( bc ) { case LRE: // start left-to-right embedding case RLE: // start right-to-left embedding case LRO: // start left-to-right override case RLO: // start right-to-left override { int en; /* new embedding level */ if ( ( bc == RLE ) || ( bc == RLO ) ) { en = ( ( ec ~OVERRIDE ) + 1 ) | 1; } else { en
Re: Merge Request - Temp_ComplexScripts into Trunk
On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.dewrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Third, I started programming in the 1960s with BAL 360, APL, then FORTRAN IV. We use short names there. Fourth, I happen to have a good memory and I have no trouble remembering the meaning of variable names. Fifth, I find that short names prevents making lines too long and gives me more room for comments. Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. Seventh, that's just my style, and I assert it is as valid as doing it with long names. Eighth, asking me to adhere to an undocumented convention that is not otherwise enforced, and for which there is no evidence or analysis of having been previously followed in FOP contributions is unwarranted. Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. I can probably throw in a few more random reasons, but this should be sufficient. I've offered to add comments, take it or leave it.
RE: Merge Request - Temp_ComplexScripts into Trunk
Short variable names should use less memory, which is mostly irrelevant these days. In an open project where other people could be working on the same code (or other code in the same package) it helps if all names are consistant. Personally I could never figure out what variable naming conventions are. Each class I write seems to provide reason to use an entirely new convention. As long as someone who has never seen your code before can determine the purpose of each variable, I'd say you're good. If that requires comments, definitely add comments. When in doubt, comment. From: Glenn Adams [mailto:gl...@skynav.com] Sent: Monday, October 24, 2011 9:06 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.de wrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that's a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Third, I started programming in the 1960s with BAL 360, APL, then FORTRAN IV. We use short names there. Fourth, I happen to have a good memory and I have no trouble remembering the meaning of variable names. Fifth, I find that short names prevents making lines too long and gives me more room for comments. Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. Seventh, that's just my style, and I assert it is as valid as doing it with long names. Eighth, asking me to adhere to an undocumented convention that is not otherwise enforced, and for which there is no evidence or analysis of having been previously followed in FOP contributions is unwarranted. Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. I can probably throw in a few more random reasons, but this should be sufficient. I've offered to add comments, take it or leave it.
Re: Merge Request - Temp_ComplexScripts into Trunk
Hi, I'm not sure that short variables names affect readability when long mathematical formulas are used. sometimes, code concision can help in understanding what code does: depending on what you can read and understand at a glance. Readability should be in a place between concision and verbose, depending on the threated topic. that can be discussed, but this should not prevent from merging GA's works. +1 for merging it now. Le 24/10/2011 15:26, Eric Douglas a écrit : Short variable names should use less memory, which is mostly irrelevant these days. In an open project where other people could be working on the same code (or other code in the same package) it helps if all names are consistant. Personally I could never figure out what variable naming conventions are. Each class I write seems to provide reason to use an entirely new convention. As long as someone who has never seen your code before can determine the purpose of each variable, I'd say you're good. If that requires comments, definitely add comments. When in doubt, comment. *From:* Glenn Adams [mailto:gl...@skynav.com] *Sent:* Monday, October 24, 2011 9:06 AM *To:* fop-dev@xmlgraphics.apache.org *Subject:* Re: Merge Request - Temp_ComplexScripts into Trunk On Mon, Oct 24, 2011 at 8:26 PM, Georg Datterl georg.datt...@geneon.de mailto:georg.datt...@geneon.de wrote: Hello Glenn, (2) there is no standard for symbol length documented in FOP practice or enforced by checkstyle; I decline to exchange my choice of symbols with longer symbols simply because you prefer it that way; I have offered to add comments to my uses, and that is the most I'm willing to do to address this matter; You probably spent more years programming than I am alive, so please excuse me if that’s a stupid question: What is the reasoning/advantage behind those short variable names? First, I don't use short names everywhere. Mostly I just use in local variables, but generally not as class variables. Second, I was trained in Physics and Mathematics, which uses short variable names (E = M C ^ 2). Third, I started programming in the 1960s with BAL 360, APL, then FORTRAN IV. We use short names there. Fourth, I happen to have a good memory and I have no trouble remembering the meaning of variable names. Fifth, I find that short names prevents making lines too long and gives me more room for comments. Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. Seventh, that's just my style, and I assert it is as valid as doing it with long names. Eighth, asking me to adhere to an undocumented convention that is not otherwise enforced, and for which there is no evidence or analysis of having been previously followed in FOP contributions is unwarranted. Ninth, spending time changing variable names is a waste of time when I could be working on adding support for other scripts. I can probably throw in a few more random reasons, but this should be sufficient. I've offered to add comments, take it or leave it. -- Pascal
Re: Merge Request - Temp_ComplexScripts into Trunk
On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. That is a big no. There will always be a moment when someone else must or wants to work on this code. FOP code cannot depend on a single person, it must be maintainable by other developers. Simon
Re: Merge Request - Temp_ComplexScripts into Trunk
are you claiming my code is not maintainable by other developers? if so, then please prove it objectively; otherwise, let's stop talking about this, and move on with the merge vote On Tue, Oct 25, 2011 at 1:21 AM, Simon Pepping spepp...@leverkruid.euwrote: On Mon, Oct 24, 2011 at 09:05:34PM +0800, Glenn Adams wrote: Sixth, I am going to be maintaining this code. If anyone has a problem with specific code during a merge or regression, they merely need ask me. That is a big no. There will always be a moment when someone else must or wants to work on this code. FOP code cannot depend on a single person, it must be maintainable by other developers. Simon
Re: Merge Request - Temp_ComplexScripts into Trunk
On Thu, Oct 20, 2011 at 10:31 PM, Peter Hancock peter.hanc...@gmail.comwrote: From the surface I would have been very much in favor of supporting a merge in the near future, however I have had the chance to review some areas of the complex script branch and I have some concerns. The treatment of Unicode Bidi spans from the creation of the FO Tree through to the construction of the Area Tree and I would have liked to have seen the complex scripts solution integrate the Unicode Bidi Algorithm more directly into the core process: For example, the implementation performs a post process on the FO Tree to resolve the Bidi properties of FONodes relating to text. It would be preferable to see the construction of the FO Tree embracing this Bidi aspect: FONodes should be responsible for determining their own bidi state from the fo node semantics in context to the position in the tree. Such an implementation would immediately force the maintainer to consider how a change would effect the Bidi process. Please review XSL-FO 1.1 Section 5.8, and, in particular: the algorithm is applied to a sequence of characters coming from the content of one or more formatting objects. The sequence of characters is created by processing a fragment of the formatting object tree. A *fragment* is any contiguous sequence of children of some formatting object in the tree. The sequence is created by doing a pre-order traversal of the fragment down to the fo:character level. the final, text reordering step is not done during refinement. Instead, the XSL equivalent of re-ordering is done during area tree generation The current implementation adheres to the XSL-FO specification in this regard, while your suggestion that this behavior be isolated to individual FONodes is contrary to the specification and does not permit correct implementation of the functionality required. I realize this is a complex subject area that requires considerable domain knowledge, but you can take my word as a domain expert (having implemented this functionality multiple times in commercial products) that the approach I have taken is (1) the most consistent with the XSL-FO specification, (2) the behavior required to achieve the desired functionality, and (3) the minimum changes and points of dependency to existing code. In contrast, a more distributed approach such as you suggest would (1) diverge from XSL-FO specified behavior, (2) increase and distribute the number of points of interaction with existing code so as to make behavior harder to understand, test, and debug, and, most telling, (3) not provide any functional or performance advantage. Regards, Glenn
Re: Merge Request - Temp_ComplexScripts into Trunk
On Thu, Oct 20, 2011 at 02:53:54PM +0100, Vincent Hennebert wrote: Here are the sizes of some new files: 1075 src/java/org/apache/fop/fonts/GlyphSequence.java 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java 1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java This latter one contains more than 50 field declarations, and the Handler.startElement method alone is more than 1800 lines long. Also, the o.a.f.fonts.truetype.TTFFile class has now grown to 5502 lines. That’s 3 times its original size which was already too big. I regularly find myself looking at bits of this class, and I would be unable to do so on a 5500 line class. I agree that these are undesirably long. I don’t think it needs to be explained why big classes are undesirable? Also, most files disable Checkstyle checks, the most important ones being line length and white space. Many files have too long lines which makes it a pain to read through, having to horizontally scroll all the time. We agreed on a certain coding style in the project and it would be good if new code could adhere to it. I agree that this is not in compliance with the code style of the FOP project. Speaking of variable names, here is a method picked from o.a.f.fonts.GlyphSequence: /** * Merge overlapping and abutting sub-intervals. */ private static int[] mergeIntervals ( int[] ia ) { int ni = ia.length; int i, n, nm, is, ie; // count merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } } } int[] mi = new int [ nm * 2 ]; // populate merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; int k = nm * 2; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; mi [ k + 0 ] = is; mi [ k + 1 ] = ie; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } mi [ k - 1 ] = ie; } } return mi; } It could be refactored as follows: /** * Merge overlapping and abutting sub-intervals. * @param inputArray The array to be merged */ private static int[] mergeIntervals ( int[] inputArray ) { int numMerge = 0; // count merged sub-intervals for ( int i = 0, iCurrent = -1, iNext = -1; i inputArray.length; i += 2 ) { int current = inputArray [ i + 0 ]; int next = inputArray [ i + 1 ]; if ( ( iNext 0 ) || ( current iNext ) ) { iCurrent = current; iNext = next; numMerge++; } else if ( current = iCurrent ) { if ( next iNext ) { iNext = next; } } } int[] returnArray = new int [ numMerge * 2 ]; // populate merged sub-intervals for ( int i = 0, numMerge2 = 0, iCurrent = -1, iNext = -1; i inputArray.length; i += 2 ) { int current = inputArray [ i + 0 ]; int next = inputArray [ i + 1 ]; int numMerge2Square = numMerge2 * 2; if ( ( iNext 0 ) || ( current iNext ) ) { iCurrent = current; iNext = next; returnArray [ numMerge2Square + 0 ] = iCurrent; returnArray [ numMerge2Square + 1 ] = iNext; numMerge2++; } else if ( current = iCurrent ) { if ( next iNext ) { iNext = next; } returnArray [ numMerge2Square - 1 ] = iNext; } } return returnArray; } Many variables are now limited in scope to the loops. Only numMerge and returnArray are visible outside the loop. The names make it somewhat clearer what the code does (if I guessed the names right). BTW Is there a requirement that the length of the input array is even? If not, inputArray [ i + 1 ] will raise an exception if i == n-1. So, yes, I agree with your objections against the actual format of the code. Simon
Re: Merge Request - Temp_ComplexScripts into Trunk
I am pleased to learn that you are also in need of this new functionality. I share some of Vincent and Peter's concerns about technical points of the code. On the other hand, this is the only implementation of complex scripts we have, created by Glenn, in the style of Glenn. It is an initial implementation, and it is normal that it requires further work, maybe even design changes to make it more flexible. Does keeping it in a branch make that further work easier? Merging it into trunk will enhance its visibility, and make it available to more users. Simon On Thu, Oct 20, 2011 at 02:02:10PM +0100, Chris Bowditch wrote: On 19/10/2011 19:32, Simon Pepping wrote: Hi Simon, I think you misunderstood my mail. I don't want to stop the merge. I simply thought it was an appropriate time to discuss some concerns that Vincent and Peter had identified. You are preaching to the converted about the need for supporting Complex scripts. It is an urgent requirement for us too. If we don't discuss our concerns over the code now, then when do we discuss it? Vincent and Peter will be replying to this thread shortly and will outline their primary concerns then. Thanks, Chris
Re: Merge Request - Temp_ComplexScripts into Trunk
On 21/10/2011 09:36, Simon Pepping wrote: Hi Simon, I am pleased to learn that you are also in need of this new functionality. I share some of Vincent and Peter's concerns about technical points of the code. On the other hand, this is the only implementation of complex scripts we have, created by Glenn, in the style of Glenn. It is an initial implementation, and it is normal that it requires further work, maybe even design changes to make it more flexible. Does keeping it in a branch make that further work easier? Merging it into trunk will enhance its visibility, and make it available to more users. I'm not opposing the merge, I simply saw it as an appropriate milesone at which to open the debate on our concerns. It feels like we are making some progress here, so thanks for helping the debate along. I would really like to see an acknowledgement from Glenn that there are some imperfections that need addressing. If I saw that then I would give my full backing, but even without that I would vote +0 for the merge for the reasons you highlight above. Thanks, Chris Simon On Thu, Oct 20, 2011 at 02:02:10PM +0100, Chris Bowditch wrote: On 19/10/2011 19:32, Simon Pepping wrote: Hi Simon, I think you misunderstood my mail. I don't want to stop the merge. I simply thought it was an appropriate time to discuss some concerns that Vincent and Peter had identified. You are preaching to the converted about the need for supporting Complex scripts. It is an urgent requirement for us too. If we don't discuss our concerns over the code now, then when do we discuss it? Vincent and Peter will be replying to this thread shortly and will outline their primary concerns then. Thanks, Chris
Re: Merge Request - Temp_ComplexScripts into Trunk
Chris, I would really like to see an acknowledgement from Glenn that there are some imperfections that need addressing. I wasn't aware I had given anyone the impression of presenting a perfect submission. In fact, one of my favorite quotes is Voltaire's *le mieux est l'ennemi du bien* the best [perfect] is the enemy of the good, which a former colleague Charlie Sandbank (now deceased) used to love to cite at least once a day in ITU committee meetings. My coding philosophy is by and large based on a step-wise refinement process: (1) make it (some subset of possible features) work (2) make sure its correct (and regression testable) - or if following BDD, then do this first (3) make it (more) understandable/maintainable, i.e., re-factor, improve comments, documentation, etc (4) optionally, make it faster and smaller (5) optionally, add more features (6) go to (1) Right now, I've finished steps (1) and (2) to a certain degree for an initial set of features, a degree that I think is sufficient to merit moving it into trunk. I have not yet seriously addressed (3) through (6). It would seem strange to expect that all these points have been addressed at this point in the process. If this work goes into the trunk, it will provide greater exposure to help drive and prioritize the remaining steps. There are trade-offs in time and money about how to spend my effort on steps (3) to (6). I have a well defined set of issues already waiting for item (5) [1], so, post merge, I can spend my time on these features or work on (3) or (4), or could attempt to divide my time between them. The bottom line is that it is a process that started some time ago and will continue for an indefinite time into the future. The current request for merging is just one step in that process. I expect to be maintaining this code and feature set for the indefinite future, according to the desires of my sponsor, Basis Technologies, who has a definite interest in the production use of an internationalized FOP, as well as others who have expressed a similar interest. As a consequence, there is little chance that any other FOP dev is going to have to work on this code any time soon. Of course, if they want to do so, I would certainly welcome community contributions to additional features, testing, optimization, documentation, etc., in this area. I have no personal need to *own* this code if others wish to help, and I certainly encourage it; however, at the same time, I do have a sponsor who is willing to continue investing in improving FOP in this area, and that willingness should not be discounted. [1] http://skynav.trac.cvsdude.com/fop/report/1 Regarding the comments about line length, file length etc., I will note that, at least with line length, I have maintained the existing (but arbitrary) limit of 100 in existing files. In the case of new files, I have chosen to use a longer line length that works better for my development process. I use GNU EMACS on a wide-screen MacBook Pro that has an effective width of 200 columns, and which, when this limit is exceeded, wraps the line (on display only). I find that arbitrary line lengths like 100 curtail my coding style, and as I've been coding for 40+ years, it's a pretty well established style (though back in the days when I wrote Fortran IV using an IBM 026 card punchhttp://en.wikipedia.org/wiki/IBM_026#IBM_024.2C_026_Card_Punches, I had to stick with 80 columns). [If line length followed Moore's Law, we would be using lines of length 1760, starting from 1967 with 80 columns.] By the way, I would note that the FOP Coding Conventions [2] do not specify a maximum line length. In searching through the FOP DEV archives, I also don't see an explicit discussion about line length (though I may have missed it). What I do see is the initial introduction of the checkstyle target by Jeremias in 2002 [4], where it appears he basically adjusted the checkstyle rules to pass most of the extant code at that time. [2] http://xmlgraphics.apache.org/fop/dev/conventions.html [3] http://marc.info/?l=fop-devw=2r=1s=%2Bcheckstyle+%2Bline+%2Blengthq=b [4] http://marc.info/?l=fop-devm=103124169719869w=2 My opinion about the various length limits (parameter count, method length, file length, line length) as reported by checkstyle is that these should interpreted as guidelines, and not hard rules. In the case of existing files, it makes sense to attempt to adhere to them when possible (and I have done this), while recognizing that some exceptions are inevitable. In the case of new files, I agree they are reasonable targets, but I would not readily agree to slavish adherence. Some latitude should be afforded to individual coders, particularly when they are adding a substantial amount of code in new files. Regarding identifier length, neither the FOP coding conventions [2] nor checkstyle indicates or checks for any kind of limitation; so I will respectfully decline to change my code based on other author's
Re: Merge Request - Temp_ComplexScripts into Trunk
Quick question about this. Please forgive my naïveté but, does this code affect processing if you're not using ComplexScript support? Thanks, Clay
Re: Merge Request - Temp_ComplexScripts into Trunk
On 21/10/11 09:36, Simon Pepping wrote: I am pleased to learn that you are also in need of this new functionality. I share some of Vincent and Peter's concerns about technical points of the code. On the other hand, this is the only implementation of complex scripts we have, created by Glenn, in the style of Glenn. It is an initial implementation, and it is normal that it requires further work, maybe even design changes to make it more flexible. Does keeping it in a branch make that further work easier? Merging it into trunk will enhance its visibility, and make it available to more users. If it’s merged into Trunk, anyone who makes changes to the Trunk that break the Complex Scripts feature will have to fix what is breaking. And for the reasons I’ve given earlier, I believe that this would put too much of a burden on developers. Simon On Thu, Oct 20, 2011 at 02:02:10PM +0100, Chris Bowditch wrote: On 19/10/2011 19:32, Simon Pepping wrote: Hi Simon, I think you misunderstood my mail. I don't want to stop the merge. I simply thought it was an appropriate time to discuss some concerns that Vincent and Peter had identified. You are preaching to the converted about the need for supporting Complex scripts. It is an urgent requirement for us too. If we don't discuss our concerns over the code now, then when do we discuss it? Vincent and Peter will be replying to this thread shortly and will outline their primary concerns then. Thanks, Chris Vincent
Re: Merge Request - Temp_ComplexScripts into Trunk
On 21/10/2011 15:13, Glenn Adams wrote: Chris, Hi Glenn, I would really like to see an acknowledgement from Glenn that there are some imperfections that need addressing. I wasn't aware I had given anyone the impression of presenting a perfect submission. In fact, one of my favorite quotes is Voltaire's /le mieux est l'ennemi du bien/ the best [perfect] is the enemy of the good, which a former colleague Charlie Sandbank (now deceased) used to love to cite at least once a day in ITU committee meetings. Many thanks for taking the time to write this long e-mail. I do appreciate it. I reached this impression because I see Vincent and Peter giving feedback but I've yet to see any of the suggestions actioned. It could be that some of their suggestions aren't appropriate, but I do believe some of them are good points. My coding philosophy is by and large based on a step-wise refinement process: (1) make it (some subset of possible features) work (2) make sure its correct (and regression testable) - or if following BDD, then do this first (3) make it (more) understandable/maintainable, i.e., re-factor, improve comments, documentation, etc (4) optionally, make it faster and smaller (5) optionally, add more features (6) go to (1) Right now, I've finished steps (1) and (2) to a certain degree for an initial set of features, a degree that I think is sufficient to merit moving it into trunk. I have not yet seriously addressed (3) through (6). It would seem strange to expect that all these points have been addressed at this point in the process. Thanks for clarifying. Just to be clear, I didn't mean to say that reaching the end of development was a pre-requisite to merging to trunk. It just seemed like a major milestone and therefore seemed like a suitable prompt for the opening of a discussion about our concerns. If this work goes into the trunk, it will provide greater exposure to help drive and prioritize the remaining steps. There are trade-offs in time and money about how to spend my effort on steps (3) to (6). I have a well defined set of issues already waiting for item (5) [1], so, post merge, I can spend my time on these features or work on (3) or (4), or could attempt to divide my time between them. The bottom line is that it is a process that started some time ago and will continue for an indefinite time into the future. The current request for merging is just one step in that process. That makes sense to me. I expect to be maintaining this code and feature set for the indefinite future, according to the desires of my sponsor, Basis Technologies, who has a definite interest in the production use of an internationalized FOP, as well as others who have expressed a similar interest. As a consequence, there is little chance that any other FOP dev is going to have to work on this code any time soon. Of course, if they want to do so, I would certainly welcome community contributions to additional features, testing, optimization, documentation, etc., in this area. I have no personal need to *own* this code if others wish to help, and I certainly encourage it; however, at the same time, I do have a sponsor who is willing to continue investing in improving FOP in this area, and that willingness should not be discounted. Since Thunderhead also needs this feature we are willing to invest some time into it too. Currently my team are telling me it would take 9 person months to port this code into our branch of FOP, partly because of some merge conflicts, but also partly because we are not comfortable with some aspects of the code as it has already been pointed out. The estimate would include the time to refactor long files into small ones, deal with the variable names and restructuring the code. [1] http://skynav.trac.cvsdude.com/fop/report/1 Regarding the comments about line length, file length etc., I will note that, at least with line length, I have maintained the existing (but arbitrary) limit of 100 in existing files. In the case of new files, I have chosen to use a longer line length that works better for my development process. I use GNU EMACS on a wide-screen MacBook Pro that has an effective width of 200 columns, and which, when this limit is exceeded, wraps the line (on display only). I find that arbitrary line lengths like 100 curtail my coding style, and as I've been coding for 40+ years, it's a pretty well established style (though back in the days when I wrote Fortran IV using an IBM 026 card punch http://en.wikipedia.org/wiki/IBM_026#IBM_024.2C_026_Card_Punches, I had to stick with 80 columns). [If line length followed Moore's Law, we would be using lines of length 1760, starting from 1967 with 80 columns.] I personally don't have a problem with line length, but I think the File length is a small issue that we would like to address. I think the code would be easier to maintain if the larger classes were broken
Re: Merge Request - Temp_ComplexScripts into Trunk
inline On Sat, Oct 22, 2011 at 12:04 AM, Chris Bowditch bowditch_ch...@hotmail.com wrote: Since Thunderhead also needs this feature we are willing to invest some time into it too. Currently my team are telling me it would take 9 person months to port this code into our branch of FOP, partly because of some merge conflicts, but also partly because we are not comfortable with some aspects of the code as it has already been pointed out. The estimate would include the time to refactor long files into small ones, deal with the variable names and restructuring the code. I would advise against this, since it would it is functionally unnecessary and since it will make future merges more difficult. I will be making additional changes as more features in this area are added. I don't see what refactoring long files into small ones buys you. However, if you make a reasoned argument for factoring specific long files (i.e., why such factoring improves architecture, modularity, etc), rather than simply say all long files must be refactored, then I will seriously discuss doing so post merge. I appreciate your commitment to add comments to short identifiers declarations, so at least it will be easier for the team to translate the short variables to longer equivalents. Just so we are clear on what you propose, do you mean this: int gi = 0; // Glyph Index Yes. Note that I already do this in most cases, such as: private static void resolveExplicit ( int[] wca, int defaultLevel, int[] ea ) { int[] es = new int [ MAX_LEVELS ]; /* embeddings stack */ int ei = 0; /* embeddings stack index */ int ec = defaultLevel; /* current embedding level */ for ( int i = 0, n = wca.length; i n; i++ ) { int bc = wca [ i ]; /* bidi class of current char */ int el; /* embedding level to assign to current char */ switch ( bc ) { case LRE: // start left-to-right embedding case RLE: // start right-to-left embedding case LRO: // start left-to-right override case RLO: // start right-to-left override { int en; /* new embedding level */ if ( ( bc == RLE ) || ( bc == RLO ) ) { en = ( ( ec ~OVERRIDE ) + 1 ) | 1; } else { en = ( ( ec ~OVERRIDE ) + 2 ) ~1; } if ( en ( MAX_LEVELS + 1 ) ) { es [ ei++ ] = ec; if ( ( bc == LRO ) || ( bc == RLO ) ) { ec = en | OVERRIDE; } else { ec = en ~OVERRIDE; } } else { // max levels exceeded, so don't change level or override } el = ec; break; } ... What I'm agreeing to do in the relative near future (after merge, before new release) is to add such comments to those places where I have not already done so, which are probably a minority of such cases. G.
Re: Merge Request - Temp_ComplexScripts into Trunk
Jonathan, Obviously, FOP's strongest supporters over the past years do not require this new functionality. FOP needs the additional support of new stakeholders of this new functionality. Could your teams test it on their documents and report their findings to the fop-user email list? Simon Pepping On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote: We -- at InterSystems -- deploy an application that runs in upwards of 40 countries, using many of the languages for which complex script support is required. We definitely need complex script support. It is a requirement for us.
Re: Merge Request - Temp_ComplexScripts into Trunk
On 19/10/2011 19:32, Simon Pepping wrote: Hi Simon, I think you misunderstood my mail. I don't want to stop the merge. I simply thought it was an appropriate time to discuss some concerns that Vincent and Peter had identified. You are preaching to the converted about the need for supporting Complex scripts. It is an urgent requirement for us too. If we don't discuss our concerns over the code now, then when do we discuss it? Vincent and Peter will be replying to this thread shortly and will outline their primary concerns then. Thanks, Chris Over the past ten years computing has pervaded life in all its facets, and spread over the world. As a consequence computing is now used in all languages and all scripts. When I open my devanagari test file in emacs, it just works. When I open it in firefox, it just works. The same when I open it in LibreOffice Writer. I am sure that, if I would open it in *the* *Word* processor, it would just work. When I process the file with FOP, it does not work. With the complex scripts functionality, it works, dependent on the use of supported or otherwise suitable fonts. (That is true for all above applications, but apparently those come configured with my system.) So what does a person do who believes in the XML stack to maintain his documentation, and wants to send his documents in Hindi to his customers? See that XSL-FO with FOP is not a suitable solution for him because Hindi uses a complex script? FOP needs the complex scripts functionality to remain a player in the global playing field. This is for me the single overarching consideration to want this functionality in FOP's trunk code, and in, say, half a year in a release. All other considerations are minor, unless one wants to claim that this code will block FOP's further development and maintenance in the coming years. Of course, not everybody needs this functionality, and there is a fear of increased maintenance overhead. But the question is: For whom do we develop FOP? Also for the large part of the world that uses complex scripts? With the development of the complex scripts functionality, Glenn Adams and his sponsor Basis Technologies have created a new reality, which is not going to go away. If this functionality does not end up in FOP, it will probably live on elsewhere. If the FOP team is fine with that, say no to the merge request, and feel comfortable with a trusted niche application. Simon Pepping On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote: On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well of the question of how to do the merge there is also the question should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development but is it truely production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not be adequately answered. Take a look at comment #30; https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a brief look because he spent several days on it. I also asked Peter to take a look and he had similar concerns. 2 or 3 letter variable names are a barrier for any committer wanting to maintain this code and I don't think it is a sufficient argument to say that a pre-requisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variables names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed.
Re: Merge Request - Temp_ComplexScripts into Trunk
The Complex Scripts feature is obviously a great enhancement and we would all love to have it implemented in FOP. However, that should not come at the expense of maintainability and the implementation of other new features. When I look at the code in the Temp_ComplexScripts branch, I have serious concerns regarding the latter two points. I would oppose merging the branch back to Trunk until those are resolved. Here are the sizes of some new files: 1075 src/java/org/apache/fop/fonts/GlyphSequence.java 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java 1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java This latter one contains more than 50 field declarations, and the Handler.startElement method alone is more than 1800 lines long. Also, the o.a.f.fonts.truetype.TTFFile class has now grown to 5502 lines. That’s 3 times its original size which was already too big. I regularly find myself looking at bits of this class, and I would be unable to do so on a 5500 line class. I don’t think it needs to be explained why big classes are undesirable? Also, most files disable Checkstyle checks, the most important ones being line length and white space. Many files have too long lines which makes it a pain to read through, having to horizontally scroll all the time. We agreed on a certain coding style in the project and it would be good if new code could adhere to it. Speaking of variable names, here is a method picked from o.a.f.fonts.GlyphSequence: /** * Merge overlapping and abutting sub-intervals. */ private static int[] mergeIntervals ( int[] ia ) { int ni = ia.length; int i, n, nm, is, ie; // count merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } } } int[] mi = new int [ nm * 2 ]; // populate merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; int k = nm * 2; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; mi [ k + 0 ] = is; mi [ k + 1 ] = ie; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } mi [ k - 1 ] = ie; } } return mi; } Now I fully appreciate that one has to have some knowledge of an area to understand code relating to that area, but no level of expertise, however high, will help me to quickly understand this code. This is just too easy to mistake one variable for another one when they differ by only one letter. Combined, these would make the code very hard to maintain by anyone other than the original author, and this is why I’m opposed to merging it to Trunk in its current form. When I commit code I do my very best to make it as easy to read and understand by other people, and I would really appreciate it I could have the same in return. Thanks, Vincent On 19/10/11 19:32, Simon Pepping wrote: Over the past ten years computing has pervaded life in all its facets, and spread over the world. As a consequence computing is now used in all languages and all scripts. When I open my devanagari test file in emacs, it just works. When I open it in firefox, it just works. The same when I open it in LibreOffice Writer. I am sure that, if I would open it in *the* *Word* processor, it would just work. When I process the file with FOP, it does not work. With the complex scripts functionality, it works, dependent on the use of supported or otherwise suitable fonts. (That is true for all above applications, but apparently those come configured with my system.) So what does a person do who believes in the XML stack to maintain his documentation, and wants to send his documents in Hindi to his customers? See that XSL-FO with FOP is not a suitable solution for him because Hindi uses a complex script? FOP needs the complex scripts functionality to remain a player in the global playing field. This is for me the single overarching consideration to want this functionality in FOP's trunk code, and in, say, half a year in a release. All other considerations are minor, unless one wants to claim that this code will block FOP's further development and maintenance in the coming years. Of course, not everybody needs this functionality, and there is a fear of increased maintenance overhead. But the question is: For
Re: Merge Request - Temp_ComplexScripts into Trunk
This is a tough one. The need for complex script support in FOP is likely high on the wish list of any global supporter of FOP and it is certainly in the interest of the project to support. The amount of work that Glenn has done is considerable: the volume of code, the test coverage and the obvious depth of domain knowledge. From the surface I would have been very much in favor of supporting a merge in the near future, however I have had the chance to review some areas of the complex script branch and I have some concerns. The treatment of Unicode Bidi spans from the creation of the FO Tree through to the construction of the Area Tree and I would have liked to have seen the complex scripts solution integrate the Unicode Bidi Algorithm more directly into the core process: For example, the implementation performs a post process on the FO Tree to resolve the Bidi properties of FONodes relating to text. It would be preferable to see the construction of the FO Tree embracing this Bidi aspect: FONodes should be responsible for determining their own bidi state from the fo node semantics in context to the position in the tree. Such an implementation would immediately force the maintainer to consider how a change would effect the Bidi process. The layout and area tree construction phase interact a little more directly with the Bidi process, however most of the Bidi work is delegated to static methods of a utility class (..layoutmgr.BidiUtil, also used heavily in the Bidi post-process of the FO Tree) and consequently require many instanceof expressions because of differences in Bidi behavior between the Area classes. Areas should be responsible for the character re-ordering process. Having this functionality in FOP is desirable and I would encourage steps to address these concerns, and those outlined by Chris and Vincent. Peter On Thu, Oct 20, 2011 at 2:53 PM, Vincent Hennebert vhenneb...@gmail.com wrote: The Complex Scripts feature is obviously a great enhancement and we would all love to have it implemented in FOP. However, that should not come at the expense of maintainability and the implementation of other new features. When I look at the code in the Temp_ComplexScripts branch, I have serious concerns regarding the latter two points. I would oppose merging the branch back to Trunk until those are resolved. Here are the sizes of some new files: 1075 src/java/org/apache/fop/fonts/GlyphSequence.java 1089 src/java/org/apache/fop/fonts/GlyphProcessingState.java 1269 src/codegen/unicode/java/org/apache/fop/text/bidi/GenerateBidiTestData.java 2034 src/java/org/apache/fop/layoutmgr/BidiUtil.java 3449 test/java/org/apache/fop/complexscripts/util/TTXFile.java This latter one contains more than 50 field declarations, and the Handler.startElement method alone is more than 1800 lines long. Also, the o.a.f.fonts.truetype.TTFFile class has now grown to 5502 lines. That’s 3 times its original size which was already too big. I regularly find myself looking at bits of this class, and I would be unable to do so on a 5500 line class. I don’t think it needs to be explained why big classes are undesirable? Also, most files disable Checkstyle checks, the most important ones being line length and white space. Many files have too long lines which makes it a pain to read through, having to horizontally scroll all the time. We agreed on a certain coding style in the project and it would be good if new code could adhere to it. Speaking of variable names, here is a method picked from o.a.f.fonts.GlyphSequence: /** * Merge overlapping and abutting sub-intervals. */ private static int[] mergeIntervals ( int[] ia ) { int ni = ia.length; int i, n, nm, is, ie; // count merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } } } int[] mi = new int [ nm * 2 ]; // populate merged sub-intervals for ( i = 0, n = ni, nm = 0, is = ie = -1; i n; i += 2 ) { int s = ia [ i + 0 ]; int e = ia [ i + 1 ]; int k = nm * 2; if ( ( ie 0 ) || ( s ie ) ) { is = s; ie = e; mi [ k + 0 ] = is; mi [ k + 1 ] = ie; nm++; } else if ( s = is ) { if ( e ie ) { ie = e; } mi [ k - 1 ] = ie; } } return mi; } Now I fully appreciate that one has to have some knowledge of an area to understand code relating to that area, but no level of expertise, however high, will help me to quickly
RE: Merge Request - Temp_ComplexScripts into Trunk
Hi Simon, I've contacted my management and asked what our teams can do to help test. I report to our development not to our quality departments, and I can't speak for our quality departments. I've contacted our international teams about what they can do to help test. The bottom-line to our testing is that when a new FOP is released, we test our software on the new FOP, and testing is done with every application that uses FOP. With an international FOP, international applications would be tested. That much is guaranteed by our normal testing process. It is a tribute to the quality of FOP that we have never had to report a FOP issue, even though our reports can be quite complicated. But I take it you would like us to get involved with the testing effort before a new FOP is released. When you are discussing our involvement, are you discussing our testing the FOP that results from the merger onto the trunk, once that is accomplished? I understand you are ironing out the details of what that merger would look like. You said bug reports should go to fop-users, but isn't it the case that .fo attachments won't be accepted by fop-users? Don't bug reports have to be created through Bugzilla? We can discuss what we see in terms of bugs on fop-users, but if we can't provide .fo files, won't our discussion be less helpful? Best Regards, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com -Original Message- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Sent: Thursday, October 20, 2011 3:19 AM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk Jonathan, Obviously, FOP's strongest supporters over the past years do not require this new functionality. FOP needs the additional support of new stakeholders of this new functionality. Could your teams test it on their documents and report their findings to the fop-user email list? Simon Pepping On Wed, Oct 19, 2011 at 03:20:40PM -0400, Jonathan Levinson wrote: We -- at InterSystems -- deploy an application that runs in upwards of 40 countries, using many of the languages for which complex script support is required. We definitely need complex script support. It is a requirement for us.
Re: Merge Request - Temp_ComplexScripts into Trunk
On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well of the question of how to do the merge there is also the question should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development but is it truely production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not be adequately answered. Take a look at comment #30; https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a brief look because he spent several days on it. I also asked Peter to take a look and he had similar concerns. 2 or 3 letter variable names are a barrier for any committer wanting to maintain this code and I don't think it is a sufficient argument to say that a pre-requisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variables names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed. Thanks Chris --- Merging r981451 through r1185769 into '.': Summary of conflicts: Text conflicts: 58 Tree conflicts: 126 Most tree conflicts are probably an artifact of subversion. See svn info lib/xmlgraphics-commons-1.5svn.jar|tail -n 4 Tree conflict: local add, incoming add upon merge Source left: (file) https://svn.apache.org/repos/asf/xmlgraphics/fop/trunk/lib/xmlgraphics-commons-1.5svn.jar@981450 Source right: (file) https://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_ComplexScripts/lib/xmlgraphics-commons-1.5svn.jar@1185769 This will cause quite some work. I also merged trunk into ComplexScripts. Result: --- Merging r1177231 through r1185780 into '.': Summary of conflicts: Text conflicts: 2 Tree conflicts: 2 I resolved the text conflicts easily. Again the tree conflicts were not real conflicts. Both merges should result in the same code: trunk + ComplexScripts. I did not commit the merge of trunk into ComplexScripts to the repository. I do not think it would facilitate merging ComplexScripts into trunk. Simon On Sat, Oct 15, 2011 at 06:17:49PM +0800, Glenn Adams wrote: With this latest patch, I am satisfied that there is sufficient testing and stability in the CS branch to support its merger into trunk. Therefore, I request that such a merge be accomplished after applying patch 5 to the CS branch as described below.
Re: Merge Request - Temp_ComplexScripts into Trunk
I provided a detailed comment on Vincent's brief review at: https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c31 With the exception of the the following comment, the remaining comments are editorial in nature or have no actionable response. How feasible is it to run the BIDI algorithm on a subset of the FO tree? If I'm right, ATM it is launched on a whole page-sequence. When we refactor the code so that layout can be started before a whole page-sequence has been parsed, in order to avoid the infamous memory issue that a lot of users are running into, will that code allow to do it? Regarding the point(s) at which the bidi algorithm is invoked, I agree there are possible, alternative points of invocation. Some of which may distribute the cost of its performance as opposed to concentrating that work in one call. However, even if there are alternatives, it is not clear whether they are desirable, and whether it would have any impact on memory usage or time performance. Further, when we refactor the code is an unknown, future possibility, and making a substantial change at this time in order to perform certain optimizations that only come into play in the eventuality that such code refactoring occurs seems to be an example of premature optimization. As regards to symbol name length, I would merely cite that the FOP Coding Conventions http://xmlgraphics.apache.org/fop/dev/conventions.html do not specify length of an identifier (either short or long) as an established convention. Let's not use this valuable contribution as a *cause célèbre* to argue or establish new conventions that do not exist. Otherwise we will soon be arguing over whether i, j, and k are reasonable identifiers for enumeration variables. You ask about whether this merge is truly production ready? Is this a requirement for doing a merge from a temporary branch? How would you define truly production ready? Is FOP itself truly production ready? I've spend the last couple of months creating over 500,000 tests. I wonder if any other part of FOP is so thoroughly tested? Frankly speaking, I would like to put even more tests into place, and I will over time, but I'm not going to do that until the current work is merged. If there is a substantial bug or regression that would be caused by this merger, then I'd be happy to address that problem before merging. I look forward to any report of this nature. If folks wish to keep pushing the issue of symbol name length, then please enumerate, by source file and line number each case to which is objected. Also, please provide objective comparative data showing how the reported data diverges from statistically similar usage elsewhere in FOP. Please also define an objective measure of the level of domain knowledge needed for working with specific code in this merge for which a comment about symbol length is an issue. In particular, please show how a reader skilled in the arts of the code being considered would not be able to readily infer the meaning of a specific symbol in cases where I have not explicitly provided a comment at the place of definition. G. On Wed, Oct 19, 2011 at 4:50 PM, Chris Bowditch bowditch_ch...@hotmail.comwrote: On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well of the question of how to do the merge there is also the question should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development but is it truely production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not be adequately answered. Take a look at comment #30; https://issues.apache.org/**bugzilla/show_bug.cgi?id=**49687#c30https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a brief look because he spent several days on it. I also asked Peter to take a look and he had similar concerns. 2 or 3 letter variable names are a barrier for any committer wanting to maintain this code and I don't think it is a sufficient argument to say that a pre-requisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variables names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed. Thanks Chris --- Merging r981451 through r1185769 into '.': Summary of conflicts: Text conflicts: 58 Tree conflicts: 126 Most tree conflicts are probably an artifact of subversion. See svn info lib/xmlgraphics-commons-1.**5svn.jar|tail -n 4 Tree
Re: Merge Request - Temp_ComplexScripts into Trunk
HI, IMHO, Production ready should only cited before a FOP release, not for a merge of branch to trunk. At this stage, the only questions are about regression tests (and code readability, since open source). Merging the branch now should encourage more users to test these new features and give feedback. Le 19/10/2011 14:25, Glenn Adams a écrit : You ask about whether this merge is truly production ready? Is this a requirement for doing a merge from a temporary branch? How would you define truly production ready? Is FOP itself truly production ready? I've spend the last couple of months creating over 500,000 tests. I wonder if any other part of FOP is so thoroughly tested? Frankly speaking, I would like to put even more tests into place, and I will over time, but I'm not going to do that until the current work is merged. -- Pascal
RE: Merge Request - Temp_ComplexScripts into Trunk
We -- at InterSystems -- deploy an application that runs in upwards of 40 countries, using many of the languages for which complex script support is required. We definitely need complex script support. It is a requirement for us. Thanks, Jonathan Levinson Senior Software Developer Object Group InterSystems +1 617-621-0600 jonathan.levin...@intersystems.com -Original Message- From: Simon Pepping [mailto:spepp...@leverkruid.eu] Sent: Wednesday, October 19, 2011 2:32 PM To: fop-dev@xmlgraphics.apache.org Subject: Re: Merge Request - Temp_ComplexScripts into Trunk Over the past ten years computing has pervaded life in all its facets, and spread over the world. As a consequence computing is now used in all languages and all scripts. When I open my devanagari test file in emacs, it just works. When I open it in firefox, it just works. The same when I open it in LibreOffice Writer. I am sure that, if I would open it in *the* *Word* processor, it would just work. When I process the file with FOP, it does not work. With the complex scripts functionality, it works, dependent on the use of supported or otherwise suitable fonts. (That is true for all above applications, but apparently those come configured with my system.) So what does a person do who believes in the XML stack to maintain his documentation, and wants to send his documents in Hindi to his customers? See that XSL-FO with FOP is not a suitable solution for him because Hindi uses a complex script? FOP needs the complex scripts functionality to remain a player in the global playing field. This is for me the single overarching consideration to want this functionality in FOP's trunk code, and in, say, half a year in a release. All other considerations are minor, unless one wants to claim that this code will block FOP's further development and maintenance in the coming years. Of course, not everybody needs this functionality, and there is a fear of increased maintenance overhead. But the question is: For whom do we develop FOP? Also for the large part of the world that uses complex scripts? With the development of the complex scripts functionality, Glenn Adams and his sponsor Basis Technologies have created a new reality, which is not going to go away. If this functionality does not end up in FOP, it will probably live on elsewhere. If the FOP team is fine with that, say no to the merge request, and feel comfortable with a trusted niche application. Simon Pepping On Wed, Oct 19, 2011 at 09:50:24AM +0100, Chris Bowditch wrote: On 18/10/2011 19:55, Simon Pepping wrote: I merged the ComplexScripts branch into trunk. Result: Hi Simon, As well of the question of how to do the merge there is also the question should we do the merge? Of course this is a valuable feature to the community, and Glenn has invested a lot of time in its development but is it truely production ready? I asked Vincent to take a look at the branch earlier in the year as it's a feature we also need, but he had several concerns that have not be adequately answered. Take a look at comment #30; https://issues.apache.org/bugzilla/show_bug.cgi?id=49687#c30 I'm not sure why Vincent describes it as a brief look because he spent several days on it. I also asked Peter to take a look and he had similar concerns. 2 or 3 letter variable names are a barrier for any committer wanting to maintain this code and I don't think it is a sufficient argument to say that a pre-requisite to maintaining this code is to be a domain expert. I would hope that any experienced committer with a debugger should be able to solve some bugs. Obviously certain problems will require domain expertise, but the variables names are a key barrier to being able to maintain this code. I realise my comments might be a little controversial and I don't mean any disrespect to Glenn or his work (which is largely excellent), but we should at least discuss these topics before the merge is completed.
Re: Merge Request - Temp_ComplexScripts into Trunk
I merged the ComplexScripts branch into trunk. Result: --- Merging r981451 through r1185769 into '.': Summary of conflicts: Text conflicts: 58 Tree conflicts: 126 Most tree conflicts are probably an artifact of subversion. See svn info lib/xmlgraphics-commons-1.5svn.jar|tail -n 4 Tree conflict: local add, incoming add upon merge Source left: (file) https://svn.apache.org/repos/asf/xmlgraphics/fop/trunk/lib/xmlgraphics-commons-1.5svn.jar@981450 Source right: (file) https://svn.apache.org/repos/asf/xmlgraphics/fop/branches/Temp_ComplexScripts/lib/xmlgraphics-commons-1.5svn.jar@1185769 This will cause quite some work. I also merged trunk into ComplexScripts. Result: --- Merging r1177231 through r1185780 into '.': Summary of conflicts: Text conflicts: 2 Tree conflicts: 2 I resolved the text conflicts easily. Again the tree conflicts were not real conflicts. Both merges should result in the same code: trunk + ComplexScripts. I did not commit the merge of trunk into ComplexScripts to the repository. I do not think it would facilitate merging ComplexScripts into trunk. Simon On Sat, Oct 15, 2011 at 06:17:49PM +0800, Glenn Adams wrote: With this latest patch, I am satisfied that there is sufficient testing and stability in the CS branch to support its merger into trunk. Therefore, I request that such a merge be accomplished after applying patch 5 to the CS branch as described below.