Re: Japanese Hyphenation was: Re: hyphenation patterns
Konnichiwa.

On Fri, 07 Mar 2003 20:22:36 +0100, J.Pietschmann wrote:
> Hm. I don't read japanese :-/

JIS X 4051 illustrates line breaking, justification, writing mode, letter spacing, ruby, etc. for Japanese text processing. The CSS3 Text module is useful for understanding these features in English:

http://www.w3.org/TR/css3-text/

This document is probably much the same as JIS X 4051. The following sections are especially useful for line breaking:

  6. Line breaking
  11.2. Hanging punctuation: the 'hanging-punctuation' property

Another useful document is the following book:

http://www.oreilly.com/catalog/cjkvinfo/index.html
CJKV Information Processing: Chinese, Japanese, Korean & Vietnamese Computing
By Ken Lunde
1st Edition, December 1998
ISBN 1-56592-224-7, Order Number: 2247
1125 pages

> > Certainly, many japanese people wish that FOP will implement it,
> > but the Japanese TeX hyphenation file does not work with current FOP.
> What's the reason for this? I got the impression both the Japanese and
> the Chinese TeX versions patched also the TeX source in order to adapt
> to their respective line breaking rules. I'm not sure how relevant this
> is to hyphenation.

Current FOP cannot control any line breaking restrictions. The line breaking strategy for the Asian languages has different controls from those of Western text. In Japanese, this restriction is called 'kinsoku'. The set of kinsoku characters consists of the Open Punctuation, Close Punctuation and Ambiguous Quotation classes defined in UAX#14. For example, you must not lay out U+300C (LEFT CORNER BRACKET), categorized as Open Punctuation, at the end of a line, or U+3002 (IDEOGRAPHIC FULL STOP), categorized as Close Punctuation, at the head of a line. These restrictions are evaluated at each potential end of line, which is the same point as the Western soft-hyphen evaluation (i.e. break opportunity estimation).

Can FOP currently control these restrictions without any modification? If it can, then it was my misunderstanding, and the Japanese TeX hyphenation file can use it. But if it cannot, FOP must implement this feature before the Japanese TeX hyphenation file can be used.
I think that the cost of implementing the JIS X 4051 line breaking algorithm is almost equivalent to that of implementing TR14, so I suggested implementing TR14.

> This is planned for HEAD. The TR14 rules for CJK hyphenation seem to
> be easy: in absence of any more complicated requirements, hyphenate
> after every full character. Does the above mentioned standard add such
> more complicated rules which TR14 does not care too much about?

There are no more complicated rules for line breaking. The CSS3 Text module says the following :-)

http://www.w3.org/TR/css3-text/#line-break-prop
| The rules described by JIS X-4051 have been superseded by
| the Unicode Technical Report #14.

JIS X 4051 line breaking and TR14 are almost equivalent. In addition, TR14 can be used for CJKV and any other language with a single Unicode Line Breaking Properties file!

---
Satoshi Ishigami   VIC TOKAI CORPORATION

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]
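The kinsoku restrictions described above can be sketched in a few lines of Java. This is an illustrative toy with hand-picked character sets and hypothetical names, not FOP code or a full UAX#14 implementation:

```java
// A toy kinsoku check, assuming only the two prohibitions described above:
// no opening bracket at the end of a line, no closing punctuation at the
// head of a line. The character sets are hand-picked examples; a real
// implementation would use the full UAX#14 line breaking classes.
public class KinsokuSketch {
    // Must not appear at the head of a line (e.g. U+3002 IDEOGRAPHIC FULL STOP).
    private static final String NO_LINE_START = "\u3001\u3002\u300D\u300F\uFF09";
    // Must not appear at the end of a line (e.g. U+300C LEFT CORNER BRACKET).
    private static final String NO_LINE_END = "\u300C\u300E\uFF08";

    /** True if a line break is permitted between prev and next. */
    public static boolean canBreakBetween(char prev, char next) {
        if (NO_LINE_END.indexOf(prev) >= 0) {
            return false; // opening punctuation would be stranded at line end
        }
        if (NO_LINE_START.indexOf(next) >= 0) {
            return false; // closing punctuation would start the next line
        }
        return true; // in CJK text every other position is a break opportunity
    }

    public static void main(String[] args) {
        System.out.println(canBreakBetween('\u300C', '\u3042')); // false
        System.out.println(canBreakBetween('\u3042', '\u3002')); // false
        System.out.println(canBreakBetween('\u3042', '\u3044')); // true
    }
}
```

The point of the sketch is that kinsoku is a pairwise test at each break opportunity, exactly where Western soft-hyphen evaluation happens.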
Re: Japanese Hyphenation was: Re: hyphenation patterns
Konnichiwa.

On Thu, 06 Mar 2003 22:32:10 +0100, J.Pietschmann wrote:
> On a related matter: some time ago someone mentioned the japanese
> hyphenation standard. I was not able to find the document, probably
> all web sites dealing with this are in japanese. Is there anybody
> listening who can help out?

I wrote about it in the past:

http://marc.theaimsgroup.com/?l=fop-dev&m=102992807207069&w=2

The JIS X 4051 spec is written in Japanese. I don't know whether an English version of the spec exists or not. Certainly, many Japanese people wish that FOP would implement it, but the Japanese TeX hyphenation file does not work with current FOP. I think that FOP should implement UAX#14 (TR14) if possible. For example, AntennaHouse's XSL Formatter implements UAX#14.

UAX#14, Line Breaking Properties
http://www.unicode.org/reports/tr14/

An old discussion related to TR14 on fop-dev:

http://marc.theaimsgroup.com/?l=fop-dev&w=2&r=1&s=tr14&q=b

---
Satoshi Ishigami   VIC TOKAI CORPORATION
Re: FOP - about the Hyphenation!
On Fri, 16 Aug 2002 14:40:09 +0800, stoneson wrote:
> 1: Question: how can i get the TeX hyphenation pattern file? and how
> to turn it into an xml file??
> 2: I am a Chinese. how can I get the Chinese version hyphenation file
> for FOP??

It may be difficult to control Asian line breaking with current FOP. Even if you create a Chinese hyphenation pattern file, FOP will NOT behave as you expect. The reason is that the line breaking strategy for the Asian languages has different controls from those of Western text. (For example, the JIS X 4051 spec defines the Japanese controls.)

The XSL requirements for internationalization are written in the following document (a little old :-):

XSL Requirements Summary (W3C Working Draft 11-May-1998)
http://www.w3.org/TR/WD-XSLReq

I think the existence of this document means that the current XSL spec does not cover Asian line breaking and justification. By the way,

CSS3 module: text (W3C Working Draft 17 May 2001)
http://www.w3.org/TR/css3-text/

supports them and illustrates them in detail.

There is a Japanese line breaking algorithm named 'kinsoku'. Kinsoku consists of 'head of line' and 'end of line' Japanese hyphenation restrictions. The Japanese version of TeX implements kinsoku. This implementation can be obtained from

ftp://ftp.ascii.co.jp/pub/TeX/ascii-ptex/tetex/ptex-texmf-2.0.tar.gz

The kinsoku.tex file in this tarball is the Japanese TeX hyphenation pattern file for kinsoku. If you want better line breaking handling for CJKV, you will probably have to implement it in FOP yourself, as the Japanese version of TeX did. If you do not implement it, these controls depend on FOP as the user agent. For example, when text-align="justify" is specified in an fo document whose root element has language="zh"/"ja"/"ko"/"vi", FOP currently behaves like CSS3's text-justify="inter-ideograph".

Regards,

---
Satoshi Ishigami   VIC TOKAI CORPORATION
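The inter-ideograph justification behavior mentioned at the end can be illustrated with a toy calculation: the leftover line width is distributed evenly into the gaps between adjacent glyphs rather than only at word spaces. The class and method names here are hypothetical, not FOP's:

```java
// Illustrative sketch, not FOP code: inter-ideograph justification spreads
// the difference between the target line width and the natural width of a
// run of CJK glyphs evenly across the gaps between those glyphs.
public class InterIdeograph {
    /** Extra space per gap when justifying n glyphs of naturalWidth to target. */
    public static double gapAdjust(double naturalWidth, double target, int glyphs) {
        if (glyphs < 2) {
            return 0.0; // no gaps to adjust
        }
        return (target - naturalWidth) / (glyphs - 1);
    }

    public static void main(String[] args) {
        // 5 ideographs, natural width 50pt, line width 58pt: +2pt per gap
        System.out.println(gapAdjust(50, 58, 5)); // 2.0
    }
}
```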
Re: [ANNOUNCEMENT] FOP 0.20.4 released
Konnichiwa Christian-san.

On Mon, 08 Jul 2002 10:04:19 +0200, Christian Geisert wrote:
> Because the documentation generation is broken in the maintenance
> branch (stylebook needs xerces1). It is mentioned in the release
> notes ;-)
[snip]
> > I want to get xml-docs for every release. Can I get it anywhere else?
> From the CVS trunk (tag fop-0_20_4-doc). AFAIK older source releases
> include xml-docs.

OK. I will try to check it out with that tag.

Thanks,

---
Satoshi Ishigami   VIC TOKAI CORPORATION
Re: [ANNOUNCEMENT] FOP 0.20.4 released
Konnichiwa.

On Sun, 07 Jul 2002 21:39:21 +0200, Christian Geisert wrote:
> the FOP team is pleased to announce the release of FOP 0.20.4
> Binary and source distributions are available at:
> http://xml.apache.org/dist/fop

Why are the xml-docs not included in the src distribution? I thought the elimination of the xml-docs applied only to the bin distribution.

http://marc.theaimsgroup.com/?t=10250420581&r=1&w=2
http://marc.theaimsgroup.com/?t=10239848503&r=1&w=2

I am now translating all of the FOP documents into Japanese. I want to get the xml-docs for every release. Can I get them anywhere else?

BTW, the ReleaseNotes.html file in the root directory points wrongly to html-docs/relnotes.html, and its information is still for 0.20.3.

Thanks.

---
Satoshi Ishigami   VIC TOKAI CORPORATION
Re: [ANNOUNCEMENT] FOP 0.20.3 Release Candidate 2 available
Hi, Christian.

On Thu, 21 Feb 2002 17:30:32 +0100, Christian Geisert wrote:
> the second Release Candidate for 0.20.3 (Maintenance release) is
> finally available at http://xml.apache.org/dist/fop for downloading
> and testing.

Great!!

> - Improved i18n support for AWT viewer (Japanese dialogs)
>   Submitted by: Satoshi Ishigami ([EMAIL PROTECTED])

I checked my patch. The resources.ja file which was included in my posted patch has NOT been included in 0.20.3rc2. I am reposting only the org/apache/fop/viewer/resources/resources.ja file. Please commit this file.

Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION

resources.ja
Description: Binary data
Re: JDK 1.4 and fonts
See the site below:

Endorsed Standards Override Mechanism
http://java.sun.com/j2se/1.4/docs/guide/standards/index.html

JDK 1.4 includes classes such as Xalan-J 2.2.D11 in rt.jar by default, so the current sh/bat scripts or build.xml may not be able to build FOP when you use JDK 1.4.

Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION

On Fri, 22 Feb 2002 00:09:18 -0500, Christopher Burkey wrote:
> Hi,
>
> Has anyone been able to compile FOP with JDK 1.4? It gets this far in
> the build process:
>
>   in: ./build/src/codegen/extproperties.xml
>   style: ./build/src/codegen/enumgen.xsl
>   out: ./build/src/org/apache/fop/fo/properties/extenums_ignore_this.java
>   [style] Transforming into C:\src-workspaces\per\xml-fop\build\src\org\apache\fop\render\pdf
>   [style] Loading stylesheet C:\src-workspaces\per\xml-fop\.\build\src\codegen\code-point-mapping.xsl
>   [style] Processing C:\src-workspaces\per\xml-fop\build\src\codegen\encodings.xml to C:\src-workspaces\per\xml-fop\build\src\org\apache\fop\render\pdf\CodePointMapping.java
>   [style] Failed to process C:\src-workspaces\per\xml-fop\build\src\codegen\encodings.xml
>
> My end goal is to improve the appearance of font spacing within the
> AWT renderer, using JDK 1.4's improved font handling. I have compiled
> with 1.3 and then tested the AWT renderer under 1.4. I think there are
> improvements in font spacing over JDK 1.3, but it's still not perfect.
>
> I have tried to comment out this line:
>
>   // space is rendered larger than given by
>   // the FontMetrics object
>   // if (i <= 32)
>   //     w = (int)(1.4 * fmt.charWidth(i) * FONT_FACTOR);
>   // else
>   w = (int)(fmt.charWidth(i) * FONT_FACTOR);
>
> But that did nothing. Attached is a fo file that displays the problems
> with font spacing. In my attached file you can change the font size
> and it looks much better. If anyone knows what it could be, please let
> me know.
>
> BTW: This is all using 0.20.3 under Windows 2000, JDK 1.4 final.
>
> Thanks!
>
> _________________________________
> Christopher Burkey    [EMAIL PROTECTED]
> President             513-542-3401
> eInnovation Inc.
> http://einnovation.com
Re: cvs commit: xml-fop/docs/xml-docs/fop fonts.xml
On 18 Feb 2002 09:01:50 -0000, [EMAIL PROTECTED] wrote:
> keiron 02/02/18 01:01:50
>
> Modified: docs/xml-docs/fop fonts.xml
> Log: some more font embedding info
[snip]
> +<note><p>
> +If you do not want the font embedded in the PDF then remove the
> +embed-file attribute. The PDF will then contain text using
> +the font with the font metrics and to view it properly the
> +font will need to be installed where it is being viewed.
> +</p></note>

This feature does not work correctly when using any CIDFont, because the specified CIDFont's CMap is wrong. When no font is embedded in the PDF, the PDF viewer, such as Adobe Acrobat Reader, tries to use a font installed in the host operating system. The character codes are then mapped to glyph data via CMaps in several steps. For more details, see

Adobe CMap and CIDFont Files Specification Version 1.0
http://partners.adobe.com/asn/developer/pdfs/tn/5014.CMap_CIDFont_Spec.pdf

In the following I explain under the assumption that CIDFontType2 is used. A simple figure illustrates the mapping:

                   1st mapping         2nd mapping
  character code --------------> CID --------------> glyph data
                  CMap resource      CIDFont resource

In fact, a somewhat more complicated mapping may occur. There are two kinds of mapping: one maps the character code in the PDF to a CID, and the other maps the CID to glyph data in the font used.

First, the 1st mapping is done. When generating PDF without embedding any font, FOP currently uses the TrueType font's cmap glyph id as the character code. This information is contained in the font metrics file. The CMap used for the 1st mapping is specified in the Encoding entry of the Type 0 font dictionary (see the PDF 1.4 spec, section 5.6.5 "Type 0 Font Dictionaries", p.353). There are many possible Encoding values, as described in the PDF 1.4 spec (see section 5.6.4 "CMaps", p.342). FOP always uses the Identity-H encoding. This is implemented in the org.apache.fop.render.pdf.fonts.MultiByteFont class. The Identity-H encoding does not convert the character code at all, so the mapped CID is identical to the character code (which is the TrueType cmap glyph id).

Next, the 2nd mapping is performed. This mapping is based on the CIDSystemInfo dictionary (see the PDF 1.4 spec, section 5.6.2 "CIDSystemInfo Dictionaries", p.336). For example, if the CIDs come from the Adobe-Japan1-2 character collection, the CIDSystemInfo must specify:

  /CIDSystemInfo << /Registry (Adobe) /Ordering (Japan1) /Supplement 2 >>

As mentioned above, the CIDs handled by FOP depend on each font. Currently FOP specifies the CIDSystemInfo dictionary as follows:

  /CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >>

This CIDSystemInfo is not a pre-registered one (for more details about ToUnicode, see the PDF 1.4 spec, section 5.9 "ToUnicode CMaps", p.368). If I remember correctly, this CIDSystemInfo was used together with ToUnicode CMaps around FOP 0.18.0. In current FOP, however, the ToUnicode feature is commented out. So the 2nd mapping cannot work correctly (the generated PDF is not readable and not viewable). Thus this CIDSystemInfo is WRONG!!!

In my experimental investigation, the following CIDSystemInfo works correctly when no font is embedded:

  /CIDSystemInfo << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >>

I have known about this problem for a few months, but I did not report it here because my solution is an experimental one. I looked for a document that proved my solution. The CIDToGIDMap entry in the CIDFont dictionary (PDF 1.4 spec, section 5.6.3 "CIDFonts", p.339) is the nearest, but the association with CIDSystemInfo is not written there...

The value of the Ordering entry is also fixed; the getOrdering() method is implemented in the org.apache.fop.render.pdf.fonts.MultiByteFont class. If FOP supports generating PDF with no font embedding, I suggest using the CIDSystemInfo I presented above within the current FOP font handling architecture.

Please check this in your environment and point it out if you notice any misunderstanding or mistakes on my part :-) Sorry if my English is bad. Thanks.
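The three CIDSystemInfo variants above differ only in their Registry/Ordering/Supplement values. As a side note, serializing such a dictionary can be sketched as below; the class and method names are hypothetical illustrations, not FOP's actual PDF object model:

```java
// Hypothetical sketch of emitting a CIDSystemInfo dictionary as it would
// appear inside a PDF CIDFont object. Not FOP code; just the PDF dictionary
// syntax from the discussion above, built from the three entries.
public class CidSystemInfo {
    private final String registry;
    private final String ordering;
    private final int supplement;

    public CidSystemInfo(String registry, String ordering, int supplement) {
        this.registry = registry;
        this.ordering = ordering;
        this.supplement = supplement;
    }

    /** Serialize as a PDF dictionary fragment. */
    public String toPdf() {
        return "/CIDSystemInfo << /Registry (" + registry + ")"
             + " /Ordering (" + ordering + ")"
             + " /Supplement " + supplement + " >>";
    }

    public static void main(String[] args) {
        // The variant that worked in the experiments described above:
        System.out.println(new CidSystemInfo("Adobe", "Identity", 0).toPdf());
    }
}
```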
---
Satoshi Ishigami   VIC TOKAI CORPORATION
i18n in awt viewer [PATCH]
Hi all!

For AWT viewer i18n, I hacked the AWT viewer for the FOP-0.23.0rc maintenance release, and I have some questions.

I tried to show the menus as Japanese text for i18n as a trial. I wrote the resource files in UTF-8 encoding because it is not possible to represent multi-byte characters in iso-8859-1 encoding. I also converted all existing resource files for the AWT viewer from iso-8859-1 to UTF-8 automatically. I modified some source files associated with the AWT viewer, too. This modification is for message i18n and for additional font support in the AWT viewer. The additional fonts are specified in a userconfig.xml file, and you may start FOP with the -c and -awt options.

NOTE: Currently the only additional fonts you can specify are TrueType fonts (.ttf and .ttc). This restriction comes from Sun's JDK (see the java.awt.Font javadoc).

1) If you specify the embed-file attribute in userconfig.xml and use JDK 1.3 or higher, the specified TrueType font is loaded and used in the AWT viewer.
2) Otherwise, if you don't specify the embed-file attribute or use JDK 1.2, the additional font is mapped to Java's "Dialog" logical font name for each locale.

I attempted to show a Japanese fo file. My test environment is Sun's JDK 1.2.2 and JDK 1.3, with the LANG environment variable set to EN and JA. With LANG=JA, both 1) and 2) work fine (the menus are Japanese text and the rendered document is readable). LANG=EN with 1) also works fine (the menus are English text and the document is readable). But the combination of LANG=EN and 2) does not work (the menus look fine in English but the document is unreadable). However, I think that this behavior is correct.

Below are my questions.

Currently the menu text for the AWT viewer is loaded from the org.apache.fop.viewer.resources package. The AWT viewer needs two kinds of resource files (messages.<lang> and resources.<lang>). I cannot find a messages.<lang> file for some languages (fi, fr, it, pl, ru). Therefore the following command cannot start the AWT viewer (an NPE is thrown):

  fop -awt -l fi foo.fo

Q1. Isn't a messages.<lang> file necessary alongside the resources.<lang> file?

The language to use in the AWT viewer is decided by the system default or by the -l option. The chosen language is used as the suffix of the resource files. However, Java has a standard mechanism for accessing a per-language resource file (the java.util.ResourceBundle class, with files such as resources_ja.properties). Currently the AWT viewer treats resource files as written in iso-8859-1 encoding. This is not good for i18n. I converted them to UTF-8 encoding, but I think the ResourceBundle framework would be better than the current approach, because the AWT viewer could then start even when there is no resource file for a given language.

Q2. Why doesn't the AWT viewer use ResourceBundle?

Best Regards.

---
石神 覚司 (Satoshi Ishigami)   VIC TOKAI CORPORATION

patch.tar.gz
Description: Binary data
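The fallback behavior Q2 alludes to can be demonstrated with a small self-contained program: ResourceBundle.getBundle() falls back to the base bundle when no bundle exists for the requested locale, instead of failing, which would avoid the NPE for fi/fr/it/pl/ru. The bundle name, key, and temp-directory setup below are purely illustrative:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Locale;
import java.util.ResourceBundle;

// Demonstration of ResourceBundle's locale fallback. Two properties files
// are written to a temp directory and loaded through a URLClassLoader;
// a request for a missing locale (fi) falls back to the base bundle.
public class BundleFallback {
    /** Write the demo .properties files and return a loader that sees them. */
    public static ClassLoader prepare(File dir) throws Exception {
        write(new File(dir, "messages.properties"), "menu.file=File");
        write(new File(dir, "messages_ja.properties"), "menu.file=Fairu");
        return new URLClassLoader(new URL[] { dir.toURI().toURL() });
    }

    private static void write(File f, String line) throws Exception {
        FileOutputStream out = new FileOutputStream(f);
        out.write(line.getBytes("ISO-8859-1")); // .properties default encoding
        out.close();
    }

    public static String lookup(ClassLoader cl, Locale loc) {
        return ResourceBundle.getBundle("messages", loc, cl).getString("menu.file");
    }

    public static void main(String[] args) throws Exception {
        Locale.setDefault(Locale.ENGLISH); // keep the fallback chain predictable
        File dir = new File(System.getProperty("java.io.tmpdir"), "bundledemo");
        dir.mkdirs();
        ClassLoader cl = prepare(dir);
        // No Finnish bundle exists: falls back to the base bundle, no error.
        System.out.println(lookup(cl, new Locale("fi"))); // File
        System.out.println(lookup(cl, Locale.JAPANESE));  // Fairu
    }
}
```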
i18n in TXTRenderer
Hi.

I hacked the TXTRenderer for i18n. Currently the org.apache.fop.render.pcl.PCLStream class is used as the OutputStream in TXTRenderer. The add method in the PCLStream class is as follows:

    public void add(String str) {
        if (!doOutput)
            return;
        byte buff[] = new byte[str.length()];
        int countr;
        int len = str.length();
        for (countr = 0; countr < len; countr++)
            buff[countr] = (byte)str.charAt(countr);
        try {
            out.write(buff);
        } catch (IOException e) {
            // e.printStackTrace();
            // e.printStackTrace(System.out);
            throw new RuntimeException(e.toString());
        }
    }

I think that this algorithm is wrong for characters above 127, because a char is 2 bytes while a byte is 1 byte, so the cast truncates the character. To avoid this problem, I think that the following algorithm is better:

    public void add(String str) {
        if (!doOutput)
            return;
        try {
            byte buff[] = str.getBytes("UTF-8");
            out.write(buff);
        } catch (IOException e) {
            throw new RuntimeException(e.toString());
        }
    }

This algorithm may not be good for the PCLRenderer, because I don't know whether PCL printers support the UTF-8 encoding or not. However, I think the TXTRenderer should use a multilingual-capable encoding, because a single fo file can contain several languages. Therefore I consider that the TXTRenderer should not use PCLStream and had better use its own OutputStream wrapper (such as a TXTStream). Is my thinking wrong?

Best Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION
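The truncation claim above is easy to verify: casting a char above 127 to a byte keeps only the low 8 bits, so U+3042 (HIRAGANA LETTER A) collapses to the single byte 0x42 ('B'), while UTF-8 encodes it as three bytes. A minimal check, illustrative code rather than FOP's:

```java
import java.io.UnsupportedEncodingException;

// Shows why the byte-cast loop in PCLStream corrupts non-ASCII text:
// a 16-bit char is truncated to its low 8 bits, whereas getBytes("UTF-8")
// produces a correct multi-byte sequence.
public class EncodingCheck {
    /** The PCLStream approach: cast each char to a byte (truncates to 8 bits). */
    public static byte[] naive(String s) {
        byte[] buff = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            buff[i] = (byte) s.charAt(i);
        }
        return buff;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String s = "\u3042";                // HIRAGANA LETTER A
        byte[] bad = naive(s);              // one truncated byte: 0x42, i.e. 'B'
        byte[] good = s.getBytes("UTF-8");  // three bytes: E3 81 82
        System.out.println(bad.length + " vs " + good.length); // 1 vs 3
    }
}
```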
[PATCH] Japanese line breaking
Hi, All!

I found that the current line breaking algorithm is not good in some situations when language="ja" is specified, such as when using fo:basic-link, <>, and spacing. I am attaching a sample fo file (test.fo) and the resulting pdf (test.pdf) for the current version in this mail. The attached patch works fine (it may be slightly redundant). The algorithm separates words as follows:

  AIUEO Hello World  ->  |A|I|U|E|O| Hello| World|

I suppose A, I, U, E and O to be Japanese characters here.

Best Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION

test.tar.gz
Description: GNU Zip compressed data
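The word separation rule the patch describes, where each Japanese character becomes its own "word" while Western text stays space-delimited, can be sketched as follows. The character ranges, class name, and the dropping of leading spaces are simplifications chosen for illustration, not the actual patch:

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the separation rule: each CJK character is emitted as its
// own word; runs of non-CJK, non-space characters are kept together.
public class CjkWordSplit {
    /** Crude CJK test covering kana and the main ideograph block only. */
    static boolean isCjk(char c) {
        return (c >= 0x3040 && c <= 0x30FF)   // Hiragana + Katakana
            || (c >= 0x4E00 && c <= 0x9FFF);  // CJK Unified Ideographs
    }

    public static List<String> split(String text) {
        List<String> words = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (isCjk(c)) {
                if (cur.length() > 0) { words.add(cur.toString()); cur.setLength(0); }
                words.add(String.valueOf(c)); // each CJK char is its own word
            } else if (c == ' ') {
                if (cur.length() > 0) { words.add(cur.toString()); cur.setLength(0); }
            } else {
                cur.append(c);
            }
        }
        if (cur.length() > 0) {
            words.add(cur.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        // Two hiragana characters followed by Western words
        System.out.println(split("\u3042\u3044Hello World"));
    }
}
```

Each single-character "word" then gives the line breaker a break opportunity after every CJK character, which is the effect the patch aims for.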