[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-08 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: out-1.pdf

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, Screenshot 
> 2022-06-07 092013.png, Screenshot 2022-06-08 074532.png, fop-1.xconf, 
> fop-2.xconf, fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, image-2022-05-31-15-52-01-435.png, 
> image-2022-06-07-15-31-01-526.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out-1.pdf, out.pdf, test-1.fo, test-2.fo, test.fo
>
>
> This problem occurs with the Calibri font, which ships with the MS Office 
> suite and Windows 10.
> I tested with the following text: {{file settings}}. 
> The resulting PDF text contains ligatures: {{(fi)le se(tti)ngs}}
> Searching for {{file}} in Acrobat Reader selects the first word, which is 
> correct. But searching for {{set}} or {{settings}} gives no results. 
> The same example, run with Antenna House, works fine: searching for 
> {{settings}} returns results.
> Here is the complete FO file:
> {code:xml}
> 
> http://www.w3.org/1999/XSL/Format;>
> 
> 
> 
> 
> 
> 
> 
> file 
> settings
> 
> 
> 
> {code}
> Some considerations:
> # A workaround would be to reject all the substitutions that are not part of 
> org.apache.fop.fonts.type1.AdobeStandardEncoding. This would keep the (fi) 
> ligature but reject the (tti) one. However, this seems to work only for 
> Calibri and not for Roboto.
> # I think there might be an issue with the font embedding, where some 
> substitution mapping data is lost. This is just a guess; I am not sure how 
> PDF deals with substitutions.
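
The workaround in the first consideration can be illustrated with a minimal sketch. This is not FOP code: the function filter_substitutions and the glyph name t_t_i are hypothetical, and only the two ligature names from Adobe StandardEncoding that matter here (fi and fl) are listed.

```python
# Ligature glyph names that exist in Adobe StandardEncoding.
# Only the subset relevant to this report is listed (assumption: the
# real check would consult the full encoding table).
STANDARD_LIGATURE_NAMES = {"fi", "fl"}


def filter_substitutions(substitutions):
    """Keep only ligature substitutions whose glyph name is standard.

    substitutions: list of (input_characters, ligature_glyph_name) pairs.
    """
    return [
        (chars, name)
        for chars, name in substitutions
        if name in STANDARD_LIGATURE_NAMES
    ]


# Hypothetical candidates for the test text "file settings".
candidates = [("fi", "fi"), ("tti", "t_t_i")]
print(filter_substitutions(candidates))
```

With these inputs only the (fi) substitution survives, so "settings" would be written with separate t, t and i glyphs.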
> I know that setting xml:lang to "en" in the FO disables the ligatures, but 
> that is not a solution for my project. I would appreciate any suggestions.
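
On the question of how PDF deals with substitutions: text search and copy in a PDF normally rely on the font's ToUnicode CMap, in which a single character code may map to a sequence of several Unicode characters. The sketch below is illustrative Python, not what FOP emits. It builds the bfchar entries that would let a (tti) glyph be found as "tti". The function name and the character codes are made up for the example; real codes depend on the embedded font subset.

```python
def to_unicode_bfchar(code_to_text):
    """Build one bfchar section of a ToUnicode CMap.

    code_to_text maps a 16-bit character code, as used in the page
    content stream, to the Unicode text that code represents.
    A ligature maps to more than one character.
    """
    lines = [f"{len(code_to_text)} beginbfchar"]
    for code, text in sorted(code_to_text.items()):
        src = f"{code:04X}"
        # Destination strings are written as UTF-16BE hex digits.
        dst = text.encode("utf-16-be").hex().upper()
        lines.append(f"<{src}> <{dst}>")
    lines.append("endbfchar")
    return "\n".join(lines)


# Hypothetical character codes for some glyphs of "file settings".
mapping = {
    0x0001: "fi",   # (fi) ligature glyph
    0x0002: "l",
    0x0003: "e",
    0x0004: "tti",  # (tti) ligature glyph
}
print(to_unicode_bfchar(mapping))
```

If the entry for the (tti) glyph were missing or pointed at a single private-use code point, a viewer would have no way to match the typed string "settings" against it, which would be consistent with the behaviour described in the report.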



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-08 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: fop-1.xconf
test-1.fo

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, Screenshot 
> 2022-06-07 092013.png, Screenshot 2022-06-08 074532.png, fop-1.xconf, 
> fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, image-2022-05-31-15-52-01-435.png, 
> image-2022-06-07-15-31-01-526.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test-1.fo, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Hönings updated FOP-2701:

Attachment: Screenshot 2022-06-08 074532.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, Screenshot 
> 2022-06-07 092013.png, Screenshot 2022-06-08 074532.png, fop.xconf, 
> image-2022-05-31-15-50-26-058.png, image-2022-05-31-15-50-39-029.png, 
> image-2022-05-31-15-52-01-435.png, image-2022-06-07-15-31-01-526.png, 
> latn-ligatures-Antenna-House.pdf, latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-07 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Hönings updated FOP-2701:

Attachment: Screenshot 2022-06-07 092013.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Assignee: J Frank
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, Screenshot 
> 2022-06-07 092013.png, fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, image-2022-05-31-15-52-01-435.png, 
> latn-ligatures-Antenna-House.pdf, latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Hönings updated FOP-2701:

Attachment: 3-fonts-latn-ligatures-FOP.fo
3-fonts-latn-ligatures-FOP.pdf
3-fonts-copy-paste-result.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, fop.xconf, 
> image-2022-05-31-15-50-26-058.png, image-2022-05-31-15-50-39-029.png, 
> image-2022-05-31-15-52-01-435.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-06-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martin Hönings updated FOP-2701:

Attachment: 3-fonts-fop.xconf

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: 3-fonts-copy-paste-result.png, 3-fonts-fop.xconf, 
> 3-fonts-latn-ligatures-FOP.fo, 3-fonts-latn-ligatures-FOP.pdf, fop.xconf, 
> image-2022-05-31-15-50-26-058.png, image-2022-05-31-15-50-39-029.png, 
> image-2022-05-31-15-52-01-435.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-31 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: image-2022-05-31-15-52-01-435.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, image-2022-05-31-15-52-01-435.png, 
> latn-ligatures-Antenna-House.pdf, latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-31 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: image-2022-05-31-15-50-39-029.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-31 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: image-2022-05-31-15-50-26-058.png

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, image-2022-05-31-15-50-26-058.png, 
> image-2022-05-31-15-50-39-029.png, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-10 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: fop.xconf

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-10 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: (was: fop.xconf)

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo


[jira] [Updated] (FOP-2701) Some of the latin ligatures make text not searchable in PDF

2022-05-10 Thread J Frank (Jira)


 [ 
https://issues.apache.org/jira/browse/FOP-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J Frank updated FOP-2701:
-
Attachment: test.fo
fop.xconf
out.pdf

> Some of the latin ligatures make text not searchable in PDF
> ---
>
> Key: FOP-2701
> URL: https://issues.apache.org/jira/browse/FOP-2701
> Project: FOP
> Issue Type: Bug
> Components: font/opentype
> Affects Versions: 2.1
> Environment: Windows 10, Calibri font.
> Reporter: Dan Caprioara
> Priority: Major
> Attachments: fop.xconf, latn-ligatures-Antenna-House.pdf, 
> latn-ligatures-FOP.pdf, out.pdf, test.fo