RE: Forced Line Breaking

2020-07-20 Thread Moser,Nicholas
Hi Chris,

I've made a number of changes since my original proposal.

Basically, I've modified FOP to treat every position within a word as a valid 
hyphenation point; the hyphenation pattern files are now ignored entirely. 
On its own, this makes the algorithm a bit too "hyphenation happy": it 
hyphenates more often than I'd like. To compensate, I've also made the 
hyphenation penalty much harsher and raised the max adjustment, so that 
hyphenation is only used as a last resort.
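
To illustrate the intended "last resort" outcome, here is a minimal Java sketch. 
It is not FOP code and not the Knuth-Plass total-fit algorithm FOP uses: it is a 
greedy stand-in that assumes every character is one unit wide, keeps words whole 
whenever possible, and only breaks inside a word (at any character, with a 
trailing hyphen) when that word is wider than an entire line. The class and 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names exist in FOP.
final class LastResortBreaker {

    /** Breaks text into lines of at most `width` characters. */
    static List<String> breakLines(String text, int width) {
        if (width < 2) {
            throw new IllegalArgumentException("width must be at least 2");
        }
        List<String> lines = new ArrayList<>();
        StringBuilder line = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            int needed = line.length() == 0
                    ? word.length()
                    : line.length() + 1 + word.length();
            if (needed <= width) {
                // Preferred case: the whole word fits on the current line.
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word);
                continue;
            }
            // The word does not fit: start a new line.
            if (line.length() > 0) {
                lines.add(line.toString());
                line.setLength(0);
            }
            // Last resort: the word is wider than a full line, so break it
            // at an arbitrary character and mark the break with a hyphen.
            while (word.length() > width) {
                lines.add(word.substring(0, width - 1) + "-");
                word = word.substring(width - 1);
            }
            line.append(word);
        }
        if (line.length() > 0) {
            lines.add(line.toString());
        }
        return lines;
    }
}
```

With a width of 6, `LastResortBreaker.breakLines("id 123456789012 ok", 6)` leaves 
"id" and "ok" intact and splits only the long digit string.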

I'm happy to upload these changes to a JIRA improvement. My issue is that I 
have hard-coded my changes into FOP; these changes would certainly be more 
appropriate as toggleable functionality. What's the most appropriate place in 
FOP for a setting like this? I'm thinking a setting in the FOP configuration 
file (fop.xconf), but I'm not even sure what I would call this setting.

Thanks!


Re: Forced Line Breaking

2020-07-02 Thread Chris Bowditch
Thanks for the suggestion. Can you attach your proposal as a patch to a 
JIRA improvement so it's not lost?

I think in the past this approach was not favoured as strict adherence 
to the spec was preferred. However, I know you are not alone in not 
liking this behaviour, so others may benefit from such an improvement.

Thanks,

Chris


Forced Line Breaking

2020-06-12 Thread Moser,Nicholas
Hello all,

I have a question about line breaking. I'm working in an environment where text 
going off the page should be avoided at all costs. Even when using hyphenation 
in FOP, it seems like some strings (e.g. only numbers) cannot be broken and 
therefore can go off the page[1]. This seems especially prevalent when using 
table cells. One of the common recommendations seems to be introducing zero 
width spaces into the string. However, I'm more interested in seeing if it's 
possible for FOP to detect this issue automatically and force a line break.
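
For reference, the zero-width-space workaround mentioned above can be applied 
as a preprocessing step before the text reaches the formatter. The following is 
a minimal Java sketch, not part of FOP: the class name, method name, and the 
`maxRun` threshold are all hypothetical. It inserts U+200B between the 
characters of any run of non-whitespace characters longer than `maxRun`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical preprocessing helper; not an FOP API.
final class ZeroWidthBreaks {

    private static final char ZWSP = '\u200B'; // ZERO WIDTH SPACE

    /** Adds a zero width space between the characters of over-long runs. */
    static String addBreakOpportunities(String text, int maxRun) {
        // Match runs of more than maxRun consecutive non-whitespace characters.
        Pattern longRun = Pattern.compile("\\S{" + (maxRun + 1) + ",}");
        Matcher m = longRun.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String run = m.group();
            StringBuilder broken = new StringBuilder();
            int i = 0;
            while (i < run.length()) {
                // Step by code point so surrogate pairs are never split.
                int cp = run.codePointAt(i);
                if (i > 0) {
                    broken.append(ZWSP);
                }
                broken.appendCodePoint(cp);
                i += Character.charCount(cp);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(broken.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Shorter words are left untouched, so ordinary text keeps its normal 
hyphenation behaviour.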

My current understanding of the line breaking algorithm is that each "word" 
will be a single KnuthInlineBox. If hyphenation is enabled and hyphenation 
points are found, it will break this KnuthInlineBox into multiple 
KnuthInlineBoxes that are groups of characters; penalties and glues are then 
placed between each new KnuthInlineBox (representing each hyphenation point). 
I'm wondering how realistic it would be to further break these new 
KnuthInlineBoxes into one KnuthInlineBox per character, then add glue and 
even higher penalties between these character KnuthInlineBoxes. The goal would 
be to prefer the standard hyphenation points, but keep higher-penalty break 
points available as a fallback. If no hyphenation points are found, the 
original KnuthInlineBox would simply be broken into one KnuthInlineBox per 
character.
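
The proposed element sequence can be sketched as follows. This is a simplified 
Java model, not FOP's actual classes: `Element` is a hypothetical stand-in for 
a Knuth box or penalty, glue is omitted for brevity, and the penalty values are 
arbitrary. Each character becomes its own box; the penalty between two 
characters is the normal hyphenation penalty at a standard hyphenation point 
and a higher fallback penalty everywhere else.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposal; these are not FOP's Knuth classes.
final class CharacterBreakSketch {

    static final class Element {
        final String kind;  // "box" or "penalty"
        final String text;  // the character for a box, empty for a penalty
        final int penalty;  // cost of breaking here, 0 for a box

        Element(String kind, String text, int penalty) {
            this.kind = kind;
            this.text = text;
            this.penalty = penalty;
        }

        @Override
        public String toString() {
            return kind.equals("box")
                    ? "box(" + text + ")"
                    : "penalty(" + penalty + ")";
        }
    }

    /**
     * hyphenationPoints holds the indexes before which a standard hyphenation
     * break is allowed; it may be empty (for example, for a string of digits).
     */
    static List<Element> split(String word, int[] hyphenationPoints,
                               int hyphenPenalty, int fallbackPenalty) {
        Set<Integer> preferred = new HashSet<>();
        for (int p : hyphenationPoints) {
            preferred.add(p);
        }
        List<Element> out = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) {
            if (i > 0) {
                // Cheaper penalty at standard points, harsher one elsewhere.
                int cost = preferred.contains(i) ? hyphenPenalty : fallbackPenalty;
                out.add(new Element("penalty", "", cost));
            }
            out.add(new Element("box", String.valueOf(word.charAt(i)), 0));
        }
        return out;
    }
}
```

With no hyphenation points at all, every gap receives the fallback penalty, 
which corresponds to the digits-only case in the example [1].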

I have created a simple proof of concept and it "seems" to work.
My question is: how reasonable does this strategy sound? Is there a better way 
to accomplish this that someone could recommend? I suppose it's less efficient 
since multiple objects will be created for each character. Additionally, since 
a hyphen could be added anywhere in a string, it could land somewhere that 
changes the meaning of the string. I consider that a reasonable alternative to 
text going off the screen, though.

Thanks!

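[1] Here is an example where the string of characters will go off the page 
since it is only numbers:
[XSL-FO example; most of the markup was lost in archiving. Surviving fragments: 
xmlns:fo="http://www.w3.org/1999/XSL/Format" language="en", page-width="8.5in" 
page-height="11in", 1in margins on all sides, flow-name="xsl-region-body", 
hyphenate="true", followed by a digits-only string.]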


CONFIDENTIALITY NOTICE This message and any included attachments are from 
Cerner Corporation and are intended only for the addressee. The information 
contained in this message is confidential and may constitute inside or 
non-public information under international, federal, or state securities laws. 
Unauthorized forwarding, printing, copying, distribution, or use of such 
information is strictly prohibited and may be unlawful. If you are not the 
addressee, please promptly delete this message and notify the sender of the 
delivery error by e-mail or you may call Cerner's corporate offices in Kansas 
City, Missouri, U.S.A at (+1) (816)221-1024.