Re: Fix for paragraph breaking

2002-09-19 Thread J.Pietschmann

[EMAIL PROTECTED] wrote:
> I suggest we write a special language hyphenation file for URLs -- it is
> not a natural language, but it is one nevertheless, with its own lexical
> rules.

Interestingly, I just had the same thought.

> (Can someone provide me with a pointer to the pertinent spec?)
  http://www.rfc-editor.org/rfc/rfc2396.txt

> Stylesheets like DocBook's can take advantage of this by specifying the new
> language code, something like x-url. This approach can also be used with
> programming languages or other similar stuff, and it has already been
> proven to work with languages that can produce very long words (Herr
> Pietschmann and the xml:lang='de' folks should agree with me ;-).
> However, the hyphen would not be a good choice as the character to use at
> the breaking point: a better choice would be to use ellipses (...) in the
> preceding AND in the following line. Can this be achieved?

Certain problems here:
- There are quite a few places where the length of the language
   name is hardwired to 2 (or 5 if a country code is included). This
   doesn't mean "x-url" won't work, but I'd rather check before making
   promises.
- The hyphenation character is, well, a *character*, and it is
   appended to the first part of the hyphenated word. This is
   hardwired. I don't recommend hacking around there; the code is
   already very brittle and will be rewritten in HEAD anyway.
   Furthermore, "..." should be used with care because dots can
   occur in URLs and play a noticeable role, especially in the host
   part. I'd settle for a zero width space or, perhaps, a backslash.
   The hyphenation character can be explicitly specified with the
   hyphenation-character property, and the spec mandates the hyphen
   char U+2010 as default (I think FOP uses a dash, but so what).
   There is a field in the hyphenation XML file for a language-specific
   hyphenation character, but I think it's ignored.
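
To make the zero width space idea concrete, here is a minimal sketch that
marks break opportunities in a URL before it reaches the formatter. It is
not FOP code: the function name add_url_breaks and the choice of delimiters
are assumptions made for illustration, and the thread does not settle on
any particular rule set.

```python
import re

ZWSP = "\u200b"  # zero width space: an invisible break opportunity

# Assumption: a break is acceptable after "://" and after each of these
# delimiters. The list is illustrative only, not taken from any spec.
_BREAK_AFTER = re.compile(r"(://|[/?&=.\-_])(?=.)")


def add_url_breaks(url: str, mark: str = ZWSP) -> str:
    """Return url with mark inserted after each delimiter.

    The lookahead keeps a mark from being added after a trailing
    delimiter, so the string never ends with a dangling break.
    """
    return _BREAK_AFTER.sub(lambda m: m.group(1) + mark, url)


if __name__ == "__main__":
    # Show the result with a visible marker instead of the invisible one.
    print(add_url_breaks("http://www.rfc-editor.org/rfc/rfc2396.txt", mark="|"))
```

Passing mark="\\" gives the backslash variant instead. Because the default
mark is invisible, the dots in the host part stay unambiguous.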

> I can write such a hyphenation file if you people agree this is a sensible
> solution.
That would be interesting.

For everybody else interested: FOP uses the hyphenation
algorithm from TeX, which is described in "The TeXbook",
appendix H.
The TeX source of this book can be downloaded from a variety
of places; just type "texbook.tex" into Google.
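
For readers who have not met the pattern-based algorithm before, the
following is a rough sketch of its core step. It is not FOP's
implementation: the function names are made up, and the pattern list is a
handful of patterns in the style of the worked example in that appendix,
to be treated as illustrative rather than as the contents of a real
pattern file. The idea is that each pattern carries digits between its
letters, the highest digit wins at every position in the word, and odd
values permit a break.

```python
import re


def parse_pattern(pat):
    """Split a pattern such as 'hy3ph' into letters and inter-letter weights."""
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    pos = 0
    for ch in pat:
        if ch.isdigit():
            weights[pos] = int(ch)  # weight sits before letters[pos]
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, left_min=2, right_min=3):
    """Return the pieces of word, split at the break points the patterns allow."""
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[i] is the slot before dotted[i]
    for letters, weights in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for k, w in enumerate(weights):
                scores[start + k] = max(scores[start + k], w)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min after any break.
    for j in range(left_min, len(word) - right_min + 1):
        if scores[j + 1] % 2 == 1:     # odd weight: a break before word[j] is allowed
            pieces.append(word[last:j])
            last = j
    pieces.append(word[last:])
    return pieces


# Illustrative patterns only (assumed for this sketch).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io"]

if __name__ == "__main__":
    print("-".join(hyphenate("hyphenation", PATTERNS)))
```

A pattern file for something like x-url would feed the same machinery with
patterns built around URL delimiters instead of syllables.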

J.Pietschmann



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]




Re: Fix for paragraph breaking

2002-09-19 Thread jaccoud


The concept of correctness is highly subjective; what is 'correct' in one
situation may not be in another. In this context 'correct' means
_conforming_ behaviour. I can't remember any section in the FO standard
that says the processor must guess where and how to hyphenate, perhaps
because machines are unable to make subjective decisions. Its behaviour must
be conservative in these cases, doing only what it is supposed to do. I
prefer obedient software, not the smart kind like M$-Word that thinks it
knows better. For example, M$-Word hyphenation in Portuguese is so bad I
have to turn it off --- there is no way to control it.

Your primary keys, for example, need a different rule, perhaps breaking
anywhere using no separation character at all. You can specify that in a
very simple hyphenation file, and name it x-key or x-sgs (simple glyph
sequence :-). Each case is different, and using hyphenation rule files you
can specify the appropriate rules for each situation. I like that: to have
control. Correctness is not in question here -- there is no correct way to
break a primary key; as for myself, I would work to guarantee that the key
always fits in the cell, and turn hyphenation off. But they are your keys
-- you must be able to have them broken (or not) at your discretion. FOP
must obey.
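
As a small illustration of the "break anywhere, no separation character"
rule for keys, here is a sketch of what such a rule amounts to. The
function name and the sample key are invented for the example; an x-key
hyphenation file would express the same rule declaratively.

```python
def break_anywhere(text, width):
    """Split text into chunks of at most width characters, adding nothing."""
    if width < 1:
        raise ValueError("width must be at least 1")
    # No hyphen or other mark is inserted, so joining the chunks
    # always gives back the original key unchanged.
    return [text[i:i + width] for i in range(0, len(text), width)] or [""]


if __name__ == "__main__":
    for line in break_anywhere("AB-1234-XYZ-99", 5):
        print(line)
```

Since no character is added at the break, hyphens that belong to the key
cannot be confused with hyphens introduced by the formatter.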

Cheers

=
Marcelo Jaccoud Amaral
Petrobrás (http://www.petrobras.com.br)
mailto:[EMAIL PROTECTED]
voice: +55 21 2534-3485
fax: +55 21 2534-1809
=
Wisdom is only a comparative quality, it will not bear a single definition.
--Marquess of Halifax



   
 
Kevin Yeung <[EMAIL PROTECTED]> wrote on 18/09/2002 21:48:
To: [EMAIL PROTECTED]
Subject: Re: Fix for paragraph breaking
Please respond to fop-dev




Hi there

I don't see why this is the 'correct' behaviour. If a long string cannot
be read, it is not correct, is it? The software is not serving its
purpose.

And I'm concerned about writing a URL hyphenation file. What about long
strings that are neither natural language nor URLs? I sometimes need to
print long primary keys, which contain hyphens themselves. How will the
extra hyphens affect my PK?

I think we should just break the text at margin and wrap the string to the
next line.

Just my 2 cents

Kevin


Re: Fix for paragraph breaking

2002-09-18 Thread Kevin Yeung

Hi there

I don't see why this is the 'correct' behaviour. If a long string cannot
be read, it is not correct, is it? The software is not serving its
purpose.

And I'm concerned about writing a URL hyphenation file. What about long
strings that are neither natural language nor URLs? I sometimes need to
print long primary keys, which contain hyphens themselves. How will the
extra hyphens affect my PK?

I think we should just break the text at margin and wrap the string to the
next line.

Just my 2 cents

Kevin

On Wed, 18 Sep 2002 [EMAIL PROTECTED] wrote:

> Date: Wed, 18 Sep 2002 08:36:08 -0300
> From: [EMAIL PROTECTED]
> Reply-To: [EMAIL PROTECTED]
> To: [EMAIL PROTECTED]
> Subject: Re: Fix for paragraph breaking
> 
> Sorry, commit denied for a variety of reasons:
> 1. It is not clear whether the problem you attempt to fix is a problem
> at all. Actually, it can be argued FOP's behaviour is correct, annoying
> as it may be sometimes. This is the main showstopper.
>
>
> Although I concur that FOP should never break words while hyphenation is
> off, I sympathise with Mr. Baals. I had a similar problem with URLs, which
> can become quite long and do not fit the hyphenation rules for any
> language. If they grow beyond the line width there is no way of getting it
> right without inserting spaces manually. While using discretionary
> hyphens can solve the problem locally (I do not remember FOP taking them
> into account while hyphenating; it is most handy when a word has irregular
> hyphenation), it would be counterproductive.
>
> I suggest we write a special language hyphenation file for URLs -- it is
> not a natural language, but it is one nevertheless, with its own lexical
> rules. (Can someone provide me with a pointer to the pertinent spec?)
> Stylesheets like DocBook's can take advantage of this by specifying the new
> language code, something like x-url. This approach can also be used with
> programming languages or other similar stuff, and it has already been
> proven to work with languages that can produce very long words (Herr
> Pietschmann and the xml:lang='de' folks should agree with me ;-).
> However, the hyphen would not be a good choice as the character to use at
> the breaking point: a better choice would be to use ellipses (...) in the
> preceding AND in the following line. Can this be achieved?
>
> I can write such a hyphenation file if you people agree this is a sensible
> solution.
> 
> 
> =
> Marcelo Jaccoud Amaral
> Petrobrás (http://www.petrobras.com.br)
> mailto:[EMAIL PROTECTED]
> voice: +55 21 2534-3485
> fax: +55 21 2534-1809
> =
> If brute force doesn't work, maybe you're not using enough brute force.

--
K






Re: Fix for paragraph breaking

2002-09-18 Thread jaccoud

"J.Pietschmann" <[EMAIL PROTECTED]> wrote on 17/09/2002 17:36:
To: [EMAIL PROTECTED], [EMAIL PROTECTED]
Subject: Re: Fix for paragraph breaking.
Please respond to fop-dev

[EMAIL PROTECTED] wrote:
> I fixed a problem that FOP had with breaking loong words. If these
> words were too long to fit on one line, and no hyphenation was selected,
> the word would go beyond the right side of the page.

Sorry, commit denied for a variety of reasons:
1. It is not clear whether the problem you attempt to fix is a problem
at all. Actually, it can be argued FOP's behaviour is correct, annoying
as it may be sometimes. This is the main showstopper.


Although I concur that FOP should never break words while hyphenation is
off, I sympathise with Mr. Baals. I had a similar problem with URLs, which
can become quite long and do not fit the hyphenation rules for any
language. If they grow beyond the line width there is no way of getting it
right without inserting spaces manually. While using discretionary
hyphens can solve the problem locally (I do not remember FOP taking them
into account while hyphenating; it is most handy when a word has irregular
hyphenation), it would be counterproductive.

I suggest we write a special language hyphenation file for URLs -- it is
not a natural language, but it is one nevertheless, with its own lexical
rules. (Can someone provide me with a pointer to the pertinent spec?)
Stylesheets like DocBook's can take advantage of this by specifying the new
language code, something like x-url. This approach can also be used with
programming languages or other similar stuff, and it has already been
proven to work with languages that can produce very long words (Herr
Pietschmann and the xml:lang='de' folks should agree with me ;-).
However, the hyphen would not be a good choice as the character to use at
the breaking point: a better choice would be to use ellipses (...) in the
preceding AND in the following line. Can this be achieved?

I can write such a hyphenation file if you people agree this is a sensible
solution.


=
Marcelo Jaccoud Amaral
Petrobrás (http://www.petrobras.com.br)
mailto:[EMAIL PROTECTED]
voice: +55 21 2534-3485
fax: +55 21 2534-1809
=
If brute force doesn't work, maybe you're not using enough brute force.








Re: Fix for paragraph breaking.

2002-09-17 Thread J.Pietschmann

[EMAIL PROTECTED] wrote:
> I fixed a problem that FOP had with breaking loong words. If these
> words were too long to fit on one line, and no hyphenation was selected,
> the word would go beyond the right side of the page.

Sorry, commit denied for a variety of reasons:
1. It is not clear whether the problem you attempt to fix is a problem
at all. Actually, it can be argued FOP's behaviour is correct, annoying
as it may be sometimes. This is the main showstopper.
As for the formalities:
2. Please use a recent CVS checkout as the original when generating the diff.
3. Please make sure that your files are *not* using tabs for indentation.
4. Please submit patches as unified diffs if possible (use -u), or context
diffs otherwise (use -c).
5. Please do not submit patches that seem to be reversed.
6. Please check whether your patched code still adheres to the Apache
Java coding style.

J.Pietschmann

