Re: Fix for paragraph breaking
[EMAIL PROTECTED] wrote:
> I suggest we write a special language hyphenation file for URLs -- it is
> not a natural language, but it is one nevertheless, with its own lexical
> rules.

Interestingly, I just had the same thought.

> (Can someone provide me with a pointer to the pertinent spec?)

http://www.rfc-editor.org/rfc/rfc2396.txt

> Stylesheets like DocBook's can take advantage of this by specifying the new
> language code, something like x-url. This approach can also be used with
> programming languages or other similar stuff, and it has already been
> proven to work with languages that can produce very long words (Mr.
> Pietschmann and the xml:lang='de' people should agree with me ;-).
> However, the hyphen would not be a good choice as the character to use at
> the breaking point: a better choice would be to use ellipses (...) on the
> preceding AND on the following line. Can this be achieved?

There are a few problems here:
- There are quite a few places where the length of the language name is
  hardwired to 2 (or 5 if a location is given). This doesn't mean "x-url"
  won't work, but I'd rather check before making promises.
- The hyphenation character is, well, a *character*, and it is appended
  to the first part of the hyphenated word. This is hardwired. I don't
  recommend hacking around there; the code is already very brittle and
  will be rewritten in HEAD anyway.

Furthermore, "..." should be used with care because dots can occur in URLs and play a noticeable role, especially in the host part. I'd settle for a zero width space or, perhaps, a backslash.

The hyphenation character can be explicitly specified with the hyphenation-character property, and the spec mandates the hyphen character U+2010 as the default (I think FOP uses a dash, but so what). There is a field in the hyphenation XML file for a language-specific hyphenation character, but I think it's ignored.

> I can write such a hyphenation file if you people agree this is a sensible
> solution.
That would be interesting.

For everybody else interested: FOP uses the hyphenation algorithm from TeX, which is described in "The TeXbook", appendix H. The TeX source of this book can be downloaded from a variety of places; just type "texbook.tex" into Google.

J.Pietschmann

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]
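For readers who do not have appendix H at hand, here is a minimal sketch of the pattern-matching idea behind that algorithm. It is written in Python for brevity and is not taken from FOP's sources. Patterns interleave letters with digits; every pattern that matches contributes its digits to the gaps between letters, the highest digit wins at each gap, and a gap with an odd value is an allowed break. The function names, the min_before/min_after limits of 2, and the short pattern list (the handful usually quoted for the word "hyphenation") are assumptions made for illustration.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap weights."""
    letters = []
    weights = [0]  # weights[j] belongs to the gap before letters[j]
    for ch in pattern:
        if ch.isdigit():
            weights[-1] = int(ch)
        else:
            letters.append(ch)
            weights.append(0)
    return "".join(letters), weights


def break_points(word, patterns, min_before=2, min_after=2):
    """Return the indices k where word may be split into word[:k] and word[k:]."""
    padded = "." + word + "."  # '.' marks the word boundaries in patterns
    scores = [0] * (len(padded) + 1)  # scores[i] is the gap before padded[i]
    for pattern in patterns:
        letters, weights = parse_pattern(pattern)
        start = padded.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = padded.find(letters, start + 1)
    # word[k] is padded[k + 1], so the gap before word[k] is scores[k + 1].
    return [k for k in range(min_before, len(word) - min_after + 1)
            if scores[k + 1] % 2 == 1]


def hyphenate(word, patterns, separator="-"):
    """Join the pieces of word with separator at every allowed break."""
    pieces = []
    last = 0
    for k in break_points(word, patterns):
        pieces.append(word[last:k])
        last = k
    pieces.append(word[last:])
    return separator.join(pieces)


# Illustrative pattern subset, not a complete language file.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]

print(hyphenate("hyphenation", PATTERNS))
```

The separator argument mirrors the discussion above: passing "\u200b" (zero width space) or an empty string instead of "-" changes what is emitted at the break without touching the patterns. Note that a pattern file for URLs would have to deal with the fact that dots and digits already carry meaning in this notation.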
Re: Fix for paragraph breaking
The concept of correctness is highly subjective; what is 'correct' in one situation may not be in another. In this context 'correct' means _conforming_ behaviour. I can't remember any section in the FO standard that says the processor must guess where and how to hyphenate, perhaps because machines are unable to make subjective decisions. Its behaviour must be conservative in these cases, doing only what it is supposed to do. I prefer obedient software, not smart software like M$-Word that thinks it knows better. For example, M$-Word hyphenation in Portuguese is so bad I have to turn it off --- there is no way to control it.

Your primary keys, for example, need a different rule, perhaps breaking anywhere using no separation character at all. You can specify that in a very simple hyphenation file, and name it x-key or x-sgs (simple glyph sequence :-). Each case is different, and using hyphenation rule files you can specify the appropriate rules for each situation. I like that: to have control.

Correctness is not in question here -- there is no correct way to break a primary key; as for myself, I would work to guarantee that the key always fits in the cell, and turn hyphenation off. But they are your keys -- you must be able to have them broken (or not) at your discretion. FOP must obey.

Cheers
=
Marcelo Jaccoud Amaral
Petrobrás (http://www.petrobras.com.br)
mailto:[EMAIL PROTECTED]
voice: +55 21 2534-3485
fax: +55 21 2534-1809
=
Wisdom is only a comparative quality, it will not bear a single definition.
--Marquess of Halifax

Kevin Yeung <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
cc:
Subject: Re: Fix for paragraph breaking
18/09/2002 21:48
Please reply to fop-dev

Hi there

I don't see why this is the 'correct' behaviour. If a long string cannot be read, it is not correct, is it? The software is not serving its purpose.

And I'm concerned about writing a URL hyphenation file. What about long strings that are neither natural language nor URL?
I sometimes need to print a long primary key, which has hyphens in itself. How will the extra hyphens affect my PK?

I think we should just break the text at the margin and wrap the string to the next line.

Just my 2 cents
Kevin

On Wed, 18 Sep 2002 [EMAIL PROTECTED] wrote:
> Date: Wed, 18 Sep 2002 08:36:08 -0300
> From: [EMAIL PROTECTED]
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Fix for paragraph breaking
>
> Sorry, commit denied for a variety of reasons:
> 1. It is not clear whether the problem you attempt to fix is a problem
> at all. Actually, it can be argued FOP's behaviour is correct, annoying
> as it may be sometimes. This is the main showstopper.
>
> Although I concur that FOP should never break words while hyphenation is
> off, I sympathise with Mr. Baals. I had a similar problem with URLs, which
> can become quite long and do not fit the hyphenation rules of any
> language. If they grow beyond the line width there is no way of getting it
> right without inserting spaces manually. While using discretionary
> hyphens can solve the problem locally (I do not remember FOP taking them
> into account while hyphenating; it is most handy when a word has irregular
> hyphenation), it would be counterproductive.
>
> I suggest we write a special language hyphenation file for URLs -- it is
> not a natural language, but it is one nevertheless, with its own lexical
> rules. (Can someone provide me with a pointer to the pertinent spec?)
> Stylesheets like DocBook's can take advantage of this by specifying the new
> language code, something like x-url. This approach can also be used with
> programming languages or other similar stuff, and it has already been
> proven to work with languages that can produce very long words (Herr
> Pietsc
Re: Fix for paragraph breaking
Hi there

I don't see why this is the 'correct' behaviour. If a long string cannot be read, it is not correct, is it? The software is not serving its purpose.

And I'm concerned about writing a URL hyphenation file. What about long strings that are neither natural language nor URL? I sometimes need to print a long primary key, which has hyphens in itself. How will the extra hyphens affect my PK?

I think we should just break the text at the margin and wrap the string to the next line.

Just my 2 cents
Kevin

On Wed, 18 Sep 2002 [EMAIL PROTECTED] wrote:
> Date: Wed, 18 Sep 2002 08:36:08 -0300
> From: [EMAIL PROTECTED]
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Fix for paragraph breaking
>
> Sorry, commit denied for a variety of reasons:
> 1. It is not clear whether the problem you attempt to fix is a problem
> at all. Actually, it can be argued FOP's behaviour is correct, annoying
> as it may be sometimes. This is the main showstopper.
>
> Although I concur that FOP should never break words while hyphenation is
> off, I sympathise with Mr. Baals. I had a similar problem with URLs, which
> can become quite long and do not fit the hyphenation rules of any
> language. If they grow beyond the line width there is no way of getting it
> right without inserting spaces manually. While using discretionary
> hyphens can solve the problem locally (I do not remember FOP taking them
> into account while hyphenating; it is most handy when a word has irregular
> hyphenation), it would be counterproductive.
>
> I suggest we write a special language hyphenation file for URLs -- it is
> not a natural language, but it is one nevertheless, with its own lexical
> rules. (Can someone provide me with a pointer to the pertinent spec?)
> Stylesheets like DocBook's can take advantage of this by specifying the new
> language code, something like x-url.
> This approach can also be used with
> programming languages or other similar stuff, and it has already been
> proven to work with languages that can produce very long words (Mr.
> Pietschmann and the xml:lang='de' people should agree with me ;-).
> However, the hyphen would not be a good choice as the character to use at
> the breaking point: a better choice would be to use ellipses (...) on the
> preceding AND on the following line. Can this be achieved?
>
> I can write such a hyphenation file if you people agree this is a sensible
> solution.
>
> =
> Marcelo Jaccoud Amaral
> Petrobrás (http://www.petrobras.com.br)
> mailto:[EMAIL PROTECTED]
> voice: +55 21 2534-3485
> fax: +55 21 2534-1809
> =
> If brute force doesn't work, maybe you're not using enough brute force.

--
K
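The "break at the margin and wrap" proposal from this message can be sketched as follows. This is an illustration in Python, not FOP code, and it assumes the available width can be expressed as a fixed number of characters per line, which only holds for monospaced text; a real formatter would measure glyph widths. The function name and the sample key are made up.

```python
def wrap_at_margin(text, width):
    """Cut text into chunks of at most width characters, adding no separator."""
    if width < 1:
        raise ValueError("width must be at least 1")
    return [text[i:i + width] for i in range(0, len(text), width)]


# Hypothetical primary key that already contains hyphens of its own.
key = "A1B2-C3D4-E5F6-G7H8"
for line in wrap_at_margin(key, 8):
    print(line)
```

Because nothing is inserted at the break, joining the lines again yields the original key, so the hyphens that belong to the key stay the only hyphens on the page.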
Re: Fix for paragraph breaking
"J.Pietschmann" <[EMAIL PROTECTED]Para: [EMAIL PROTECTED], e>[EMAIL PROTECTED] cc: 17/09/2002 17:36 Assunto: Re: Fix for paragraph breaking. Favor responder a fop-dev [EMAIL PROTECTED] wrote: > I fixed a problem that FOP had with breaking loong words. If these > words were too long to fit on one line, and no hyphenation was selected, > the word would go beyond the right side of the page. Sorry, commit denied for a variety of reasons: 1. It is not clear whether the problem you attempt to fix is a problem at all. Actually, it can be argued FOPs behaviour is correct, annoying as it may be sometimes. This is the main showstopper. Although I concur that FOP should never break words while hyphenation is off, I sympathise with Mr. Baals. I had a similar problem with URLs, which can become quite long and do not fit in the hyphenation rules for any language. If they grow beyond the line width there is no way of getting it right without inserting spaces manually . While using discretionary hyphens can solve the problem localy (I do not remember FOP taking them into account while hyphenating; it is most handy when a word has irregular hyphenation), it would be counterproductive. I suggest we write a special language hyphenation file for URLs -- it is not a natural language, but it is one nevertheless, with its own lexical rules. (Can someone provide me with a pointer to the pertinent spec?) Stylesheets like DocBook's can take advantage of this by specifying the new language code, something like x-url. This approach can also be used with programming languages or other similar stuff, and it has already been proven to work with languages that can produce very long words (Herr Pietschmann und die xml:lang='de' Leute soll mit mir einstimmig sein ;-). However, the hyphen would not be a good choice as the character to use in the breaking point: a better choice would be to use ellipses (...) in the preceeding AND in the following line. Can this be achieved? 
I can write such a hyphenation file if you people agree this is a sensible solution.

=
Marcelo Jaccoud Amaral
Petrobrás (http://www.petrobras.com.br)
mailto:[EMAIL PROTECTED]
voice: +55 21 2534-3485
fax: +55 21 2534-1809
=
If brute force doesn't work, maybe you're not using enough brute force.
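As an aside on "inserting spaces manually": the insertion can be automated before the text reaches the formatter. The sketch below, in Python and not part of FOP, puts a zero width space (U+200B) after each run of delimiter characters in a URL. The delimiter set and the function name are assumptions chosen for illustration, and whether a given FOP version treats U+200B as a break opportunity would have to be checked.

```python
import re

ZWSP = "\u200b"  # zero width space

# Hypothetical delimiter set: allow a break after runs of '/', '?', '&', '=' or '.'.
DELIMITERS = re.compile(r"([/?&=.]+)")


def add_break_opportunities(url):
    """Insert a zero width space after each delimiter run, except at the very end."""
    marked = DELIMITERS.sub(lambda m: m.group(1) + ZWSP, url)
    return marked.rstrip(ZWSP)


marked = add_break_opportunities("http://www.rfc-editor.org/rfc/rfc2396.txt")
print(marked.replace(ZWSP, "|"))  # show the break opportunities as '|'
```

Since the inserted character has no width, the URL still reads the same on the page, and removing every U+200B gives back the original string.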
Re: Fix for paragraph breaking.
[EMAIL PROTECTED] wrote:
> I fixed a problem that FOP had with breaking loong words. If these
> words were too long to fit on one line, and no hyphenation was selected,
> the word would go beyond the right side of the page.

Sorry, commit denied for a variety of reasons:
1. It is not clear whether the problem you attempt to fix is a problem at all. Actually, it can be argued FOP's behaviour is correct, annoying as it may be sometimes. This is the main showstopper.

As for the formalities:
2. Please use a recent CVS checkout as the original when generating the diff.
3. Please make sure that your files are *not* using tabs for indentation.
4. Please submit patches as unified diffs if possible (use -u), or context diffs otherwise (use -c).
5. Please do not submit patches which seem to be reversed.
6. Please check whether your patched code still adheres to the Apache Java coding style.

J.Pietschmann
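Two of the formalities above (no tabs for indentation, unified diffs with the original first) can be checked before sending a patch. The following is a rough sketch in Python using the standard difflib module; it stands in for the -u option mentioned above rather than replacing it, and the function names and the a/ and b/ path prefixes are assumptions for illustration.

```python
import difflib


def tab_indented_lines(source):
    """Return the 1-based numbers of lines whose leading whitespace contains a tab."""
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[:len(line) - len(line.lstrip())]
        if "\t" in indent:
            bad.append(number)
    return bad


def unified_patch(old, new, name):
    """Build a unified diff from old to new: original first, patched version second."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile="a/" + name,
        tofile="b/" + name,
    ))


old = "int a;\n"
new = "int a;\nint b;\n"
print(unified_patch(old, new, "Foo.java"))
print(tab_indented_lines("\tx = 1\n    y = 2\n"))
```

Swapping the old and new arguments produces the kind of reversed patch that item 5 warns about: the added line then shows up with a leading "-" instead of a "+".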