[jira] [Commented] (FOP-2880) [PATCH] Hyphenated words are not searchable in readers (when accessibilty is turned on)

2020-05-26 Thread Chris Bowditch (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17116615#comment-17116615
 ] 

Chris Bowditch commented on FOP-2880:
-

Submit a patch to correct glyphlist.xml and encoding.xml, and we'll review it

> [PATCH] Hyphenated words are not searchable in readers (when accessibilty is 
> turned on)
> ---
>
> Key: FOP-2880
> URL: https://issues.apache.org/jira/browse/FOP-2880
> Project: FOP
>  Issue Type: Bug
>  Components: unqualified
>Affects Versions: 2.3
>Reporter: Dan Caprioara
>Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The hyphenated words are rendered by FOP using the hard hyphen character. 
> This contradicts the PDF specification, where in section:
> 14.8.2.2.3 Incidental Artifacts
> clearly states that the SHY  soft hyphen U+00AD character should be used. 
> The effect is that the hyphenated words are not searchable, and the 
> copy/paste feature includes also the hard hyphens, instead of removing them 
> and joining the words pieces together.
> Here is a small patch that can be applied on the FOP core project in order to 
> fix this - this is more like a proof of concept, the real fix would be to 
> change the default hyphenation character * in the FOProppertyMapping and to 
> change the font mappings: to remove the replacement of the SHY with the  
> HYPHEN, see the org.apache.fop.fonts.CodePointMapping.encStandardEncoding, 
> and org.apache.fop.fonts.CodePointMapping.encISOLatin1Encoding
> {code}
>0xad, 0x002D, // hyphen
>0xad, 0x00AD, // hyphen
> {code}
> The patch:
> {code}
> Index: src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java
> ===
> --- src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (revision 191037)
> +++ src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (working copy)
> @@ -15,7 +15,7 @@
>   * limitations under the License.
>   */
>  
> -/* $Id$ */
> +/* $Id: CommonHyphenation.java 1610839 2014-07-15 20:25:58Z vhennebert $ */
>  
>  package org.apache.fop.fo.properties;
>  
> @@ -184,6 +184,16 @@
>   */
>  public int getHyphIPD(org.apache.fop.fonts.Font font) {
>  char hyphChar = getHyphChar(font);
> +
> +if (hyphChar == '\u00ad') {
> +  // Bizarre fix, defining the SHY as default hyphenation character 
> in the FOPropertyMapping, leads 
> +  // to hard hyphens not selectable in the PDF reader.
> +  //
> +  // Mapping also the hard hyphen makes the character selectable!
> +  font.mapChar('\u002d');  
> +}  
> +
> +
>  return font.getCharWidth(hyphChar);
>  }
>  
> Index: src/main/java/org/apache/fop/fo/FOPropertyMapping.java
> ===
> --- src/main/java/org/apache/fop/fo/FOPropertyMapping.java(revision 
> 190759)
> +++ src/main/java/org/apache/fop/fo/FOPropertyMapping.java(working copy)
> @@ -1106,7 +1106,10 @@
>  // hyphenation-character
>  m  = new CharacterProperty.Maker(PR_HYPHENATION_CHARACTER);
>  m.setInherited(true);
> -m.setDefault("-");
> +//
> +// m.setDefault("-");
> +m.setDefault("\u00ad");
> +
>  addPropertyMaker("hyphenation-character", m);
>  
>  // hyphenation-push-character-count
> Index: src/main/java/org/apache/fop/render/pdf/PDFPainter.java
> ===
> --- src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (revision 
> 190759)
> +++ src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (working copy)
> @@ -420,7 +420,8 @@
>  PDFStructElem structElem = (PDFStructElem) 
> getContext().getStructureTreeElement();
>  languageAvailabilityChecker.checkLanguageAvailability(text);
>  MarkedContentInfo mci = 
> logicalStructureHandler.addTextContentItem(structElem);
> -String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +//String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +String actualText = null;
>  generator.endTextObject();
>  generator.updateColor(state.getTextColor(), true, null);
>  generator.beginTextObject(mci.tag, mci.mcid, actualText);
> @@ -490,6 +491,15 @@
>  float glyphAdjust = 0;
>  if (font.hasCodePoint(orgChar)) {
>  ch = font.mapCodePoint(orgChar);
> + if (orgChar == 
> '\u00ad' && ch == '\u002d'){
> +   

[jira] [Commented] (FOP-2880) [PATCH] Hyphenated words are not searchable in readers (when accessibilty is turned on)

2020-05-18 Thread Dan Caprioara (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110016#comment-17110016
 ] 

Dan Caprioara commented on FOP-2880:


Yes, it is a hack. How can we proceed further to get the glyphlist.xml and 
encoding.xml updated? 

> [PATCH] Hyphenated words are not searchable in readers (when accessibilty is 
> turned on)
> ---
>
> Key: FOP-2880
> URL: https://issues.apache.org/jira/browse/FOP-2880
> Project: FOP
>  Issue Type: Bug
>  Components: unqualified
>Affects Versions: 2.3
>Reporter: Dan Caprioara
>Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The hyphenated words are rendered by FOP using the hard hyphen character. 
> This contradicts the PDF specification, where in section:
> 14.8.2.2.3 Incidental Artifacts
> clearly states that the SHY  soft hyphen U+00AD character should be used. 
> The effect is that the hyphenated words are not searchable, and the 
> copy/paste feature includes also the hard hyphens, instead of removing them 
> and joining the words pieces together.
> Here is a small patch that can be applied on the FOP core project in order to 
> fix this - this is more like a proof of concept, the real fix would be to 
> change the default hyphenation character * in the FOProppertyMapping and to 
> change the font mappings: to remove the replacement of the SHY with the  
> HYPHEN, see the org.apache.fop.fonts.CodePointMapping.encStandardEncoding, 
> and org.apache.fop.fonts.CodePointMapping.encISOLatin1Encoding
> {code}
>0xad, 0x002D, // hyphen
>0xad, 0x00AD, // hyphen
> {code}
> The patch:
> {code}
> Index: src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java
> ===
> --- src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (revision 191037)
> +++ src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (working copy)
> @@ -15,7 +15,7 @@
>   * limitations under the License.
>   */
>  
> -/* $Id$ */
> +/* $Id: CommonHyphenation.java 1610839 2014-07-15 20:25:58Z vhennebert $ */
>  
>  package org.apache.fop.fo.properties;
>  
> @@ -184,6 +184,16 @@
>   */
>  public int getHyphIPD(org.apache.fop.fonts.Font font) {
>  char hyphChar = getHyphChar(font);
> +
> +if (hyphChar == '\u00ad') {
> +  // Bizarre fix, defining the SHY as default hyphenation character 
> in the FOPropertyMapping, leads 
> +  // to hard hyphens not selectable in the PDF reader.
> +  //
> +  // Mapping also the hard hyphen makes the character selectable!
> +  font.mapChar('\u002d');  
> +}  
> +
> +
>  return font.getCharWidth(hyphChar);
>  }
>  
> Index: src/main/java/org/apache/fop/fo/FOPropertyMapping.java
> ===
> --- src/main/java/org/apache/fop/fo/FOPropertyMapping.java(revision 
> 190759)
> +++ src/main/java/org/apache/fop/fo/FOPropertyMapping.java(working copy)
> @@ -1106,7 +1106,10 @@
>  // hyphenation-character
>  m  = new CharacterProperty.Maker(PR_HYPHENATION_CHARACTER);
>  m.setInherited(true);
> -m.setDefault("-");
> +//
> +// m.setDefault("-");
> +m.setDefault("\u00ad");
> +
>  addPropertyMaker("hyphenation-character", m);
>  
>  // hyphenation-push-character-count
> Index: src/main/java/org/apache/fop/render/pdf/PDFPainter.java
> ===
> --- src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (revision 
> 190759)
> +++ src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (working copy)
> @@ -420,7 +420,8 @@
>  PDFStructElem structElem = (PDFStructElem) 
> getContext().getStructureTreeElement();
>  languageAvailabilityChecker.checkLanguageAvailability(text);
>  MarkedContentInfo mci = 
> logicalStructureHandler.addTextContentItem(structElem);
> -String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +//String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +String actualText = null;
>  generator.endTextObject();
>  generator.updateColor(state.getTextColor(), true, null);
>  generator.beginTextObject(mci.tag, mci.mcid, actualText);
> @@ -490,6 +491,15 @@
>  float glyphAdjust = 0;
>  if (font.hasCodePoint(orgChar)) {
>  ch = font.mapCodePoint(orgChar);
> + if (orgChar == 
> '\u00ad' && ch 

[jira] [Commented] (FOP-2880) [PATCH] Hyphenated words are not searchable in readers (when accessibilty is turned on)

2019-10-22 Thread Chris Bowditch (Jira)


[ 
https://issues.apache.org/jira/browse/FOP-2880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16956890#comment-16956890
 ] 

Chris Bowditch commented on FOP-2880:
-

The patch seems like a hack to me. I know you describe it as a prototype, but 
why not just fix CodePointMapping? This is generated code from glyphlist.xml 
file. This is based on a very old Adobe Glyphlist.txt file, and SoftHyphen 
appear absent from glyphlist.xml, with 0x00AD being mapped instead to hyphen. 
But softhyphen is correctly defined in the latest Adobe Glyphlist.txt. So 
updating glyphlist.xml and possibly encoding.xml to resolve this seems like the 
way to resolve this properly

> [PATCH] Hyphenated words are not searchable in readers (when accessibilty is 
> turned on)
> ---
>
> Key: FOP-2880
> URL: https://issues.apache.org/jira/browse/FOP-2880
> Project: FOP
>  Issue Type: Bug
>  Components: unqualified
>Affects Versions: 2.3
>Reporter: Dan Caprioara
>Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The hyphenated words are rendered by FOP using the hard hyphen character. 
> This contradicts the PDF specification, where in section:
> 14.8.2.2.3 Incidental Artifacts
> clearly states that the SHY  soft hyphen U+00AD character should be used. 
> The effect is that the hyphenated words are not searchable, and the 
> copy/paste feature includes also the hard hyphens, instead of removing them 
> and joining the words pieces together.
> Here is a small patch that can be applied on the FOP core project in order to 
> fix this - this is more like a proof of concept, the real fix would be to 
> change the default hyphenation character * in the FOProppertyMapping and to 
> change the font mappings: to remove the replacement of the SHY with the  
> HYPHEN, see the org.apache.fop.fonts.CodePointMapping.encStandardEncoding, 
> and org.apache.fop.fonts.CodePointMapping.encISOLatin1Encoding
> {code}
>0xad, 0x002D, // hyphen
>0xad, 0x00AD, // hyphen
> {code}
> The patch:
> {code}
> Index: src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java
> ===
> --- src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (revision 191037)
> +++ src/main/java/org/apache/fop/fo/properties/CommonHyphenation.java 
> (working copy)
> @@ -15,7 +15,7 @@
>   * limitations under the License.
>   */
>  
> -/* $Id$ */
> +/* $Id: CommonHyphenation.java 1610839 2014-07-15 20:25:58Z vhennebert $ */
>  
>  package org.apache.fop.fo.properties;
>  
> @@ -184,6 +184,16 @@
>   */
>  public int getHyphIPD(org.apache.fop.fonts.Font font) {
>  char hyphChar = getHyphChar(font);
> +
> +if (hyphChar == '\u00ad') {
> +  // Bizarre fix, defining the SHY as default hyphenation character 
> in the FOPropertyMapping, leads 
> +  // to hard hyphens not selectable in the PDF reader.
> +  //
> +  // Mapping also the hard hyphen makes the character selectable!
> +  font.mapChar('\u002d');  
> +}  
> +
> +
>  return font.getCharWidth(hyphChar);
>  }
>  
> Index: src/main/java/org/apache/fop/fo/FOPropertyMapping.java
> ===
> --- src/main/java/org/apache/fop/fo/FOPropertyMapping.java(revision 
> 190759)
> +++ src/main/java/org/apache/fop/fo/FOPropertyMapping.java(working copy)
> @@ -1106,7 +1106,10 @@
>  // hyphenation-character
>  m  = new CharacterProperty.Maker(PR_HYPHENATION_CHARACTER);
>  m.setInherited(true);
> -m.setDefault("-");
> +//
> +// m.setDefault("-");
> +m.setDefault("\u00ad");
> +
>  addPropertyMaker("hyphenation-character", m);
>  
>  // hyphenation-push-character-count
> Index: src/main/java/org/apache/fop/render/pdf/PDFPainter.java
> ===
> --- src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (revision 
> 190759)
> +++ src/main/java/org/apache/fop/render/pdf/PDFPainter.java   (working copy)
> @@ -420,7 +420,8 @@
>  PDFStructElem structElem = (PDFStructElem) 
> getContext().getStructureTreeElement();
>  languageAvailabilityChecker.checkLanguageAvailability(text);
>  MarkedContentInfo mci = 
> logicalStructureHandler.addTextContentItem(structElem);
> -String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +//String actualText = getContext().isHyphenated() ? 
> text.substring(0, text.length() - 1) : null;
> +String actualText = null;
>  generator.endTextObject();
>