Re: Japanese Hyphenation was: Re: hyphenation patterns

2003-03-10 Thread Satoshi Ishigami

Konnichiwa.

On Fri, 07 Mar 2003 20:22:36 +0100 , J.Pietschmann wrote:

 Hm. I don't read japanese :-/

JIS X 4051 illustrates line-breaking, justification, writing-mode,
letter-spacing, ruby, etc. for Japanese text processing.

CSS3 module:text is useful to understand these features in English.
http://www.w3.org/TR/css3-text/

This document probably covers much the same ground as JIS X 4051. The
following sections are especially useful for line-breaking.

6. Line breaking
11.2.  Hanging punctuation: the 'hanging-punctuation' property

Another useful document is the following book.

http://www.oreilly.com/catalog/cjkvinfo/index.html
CJKV Information Processing
Chinese, Japanese, Korean & Vietnamese Computing
By Ken Lunde
1st Edition December 1998
1-56592-224-7, Order Number: 2247
1125 pages



  Certainly, many Japanese people wish that FOP would implement it,
  but the Japanese TeX hyphenation file does not work with current
  FOP.

 What's the reason for this? I got the impression both the Japanese
 and the Chinese TeX versions patched also the TeX source in order
 to adapt to their respective line breaking rules. I'm not sure
 how relevant this is to hyphenation.

Current FOP cannot enforce any line-breaking restrictions.

The line-breaking strategy for Asian languages uses different
controls from those of western text. In Japanese, these
restrictions are called 'kinsoku'.

The kinsoku characters are those in the Open Punctuation (OP), Close
Punctuation (CL) and Ambiguous Quotation (QU) classes defined in UAX#14.

For example, you must not place U+300C (LEFT CORNER BRACKET), which
is in class OP, at the end of a line, or U+3002 (IDEOGRAPHIC FULL
STOP), which is in class CL, at the head of a line.

These restrictions are evaluated at each candidate end of line, which
is the same point where western soft-hyphen evaluation happens
(i.e. break opportunity evaluation).
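The two prohibitions above can be sketched as a check on the pair of characters around a candidate break. This is only an illustration: the class name KinsokuSketch and the tiny hand-picked character sets are assumptions made for the example, and a real implementation would read the classes from the UAX#14 line-break property data.

```java
import java.util.Set;

/** Hypothetical kinsoku check; the sets are a small sample, not UAX#14 data. */
final class KinsokuSketch {
    // Opening punctuation (class OP): must not be left at the end of a line.
    private static final Set<Character> NOT_AT_LINE_END =
            Set.of('\u300C', '\u300E', '\uFF08');
    // Closing punctuation (class CL): must not be pushed to the head of a line.
    private static final Set<Character> NOT_AT_LINE_START =
            Set.of('\u300D', '\u300F', '\u3001', '\u3002');

    /** Returns true if a line break is allowed between the two adjacent characters. */
    static boolean canBreakBetween(char before, char after) {
        return !NOT_AT_LINE_END.contains(before)
                && !NOT_AT_LINE_START.contains(after);
    }

    public static void main(String[] args) {
        System.out.println(canBreakBetween('\u65E5', '\u672C')); // between two ideographs
        System.out.println(canBreakBetween('\u300C', '\u65E5')); // after U+300C
        System.out.println(canBreakBetween('\u65E5', '\u3002')); // before U+3002
    }
}
```

A line breaker would call such a check at every break opportunity and move the break to the nearest position where it returns true.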

Can FOP currently enforce these restrictions without any modification?
If it can, I have misunderstood, and the Japanese TeX hyphenation file
can be used. If it cannot, FOP must implement this feature before the
Japanese TeX hyphenation file is of any use.

I think the cost of implementing the JIS X 4051 line-breaking
algorithm is almost the same as that of implementing TR14. That is
why I suggested implementing TR14.


 This is planned for HEAD. The TR14 rules for CJK hyphenation seems
 to be easy: in absence of any more complicated requirements,
 hyphenate after every full character. Does the above mentioned
 standard add such more complicated rules which TR14 does not
 care too much about?

There are no more complicated rules for line-breaking.

CSS3 module:text says the following :-)

http://www.w3.org/TR/css3-text/#line-break-prop
| The rules described by JIS X-4051 have been superseded by
| the Unicode Technical Report #14.

JIS X 4051 line-breaking and TR14 are almost equivalent.
In addition, TR14 can be used for CJKV and any other language with a
single Unicode Line-Break-Properties file!

---
Satoshi Ishigami   VIC TOKAI CORPORATION

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, email: [EMAIL PROTECTED]



Re: Japanese Hyphenation was: Re: hyphenation patterns

2003-03-06 Thread Satoshi Ishigami

Konnichiwa.

On Thu, 06 Mar 2003 22:32:10 +0100 , J.Pietschmann wrote:

 On a related matter: some time ago someone mentioned the japanese
 hyphenation standard. I was not able to find the document, probably
 all web sites dealing with this are in japanese. Is there anybody
 listening who can help out?

I wrote about it in the past.
http://marc.theaimsgroup.com/?l=fop-dev&m=102992807207069&w=2

The JIS X 4051 spec is written in Japanese. I don't know whether
an English version of the spec exists.

Certainly, many Japanese people wish that FOP would implement it,
but the Japanese TeX hyphenation file does not work with current
FOP.

I think that FOP should implement UAX#14 (TR14) if possible.
For example, AntennaHouse's XSLFormatter implements UAX#14.

UAX#14, Line Breaking Properties
http://www.unicode.org/reports/tr14/
An older discussion about TR14 on fop-dev is at
http://marc.theaimsgroup.com/?l=fop-dev&w=2&r=1&s=tr14&q=b

---
Satoshi Ishigami   VIC TOKAI CORPORATION




Re: FOP - about the Hyphenation!

2002-08-21 Thread Satoshi Ishigami


On Fri, 16 Aug 2002 14:40:9 +0800 , stoneson wrote:

 1: Question: how can i get the TeX hyphenation pattern file ? and how to
 turn it into an xml file??

 2: I am a Chinese . how can I to get the Chinese version Hyphenation file for FOP??

It may be difficult to control Asian line-breaking simply with
current FOP. Even if you manage to create a Chinese hyphenation
pattern file, FOP will NOT behave as you expect.

The reason is that the line-breaking strategy for Asian languages
uses different controls from those of western text.
(For example, the JIS X 4051 spec defines the Japanese controls.)


The XSL requirements for internationalization are written in
the following document (a little old :-).

XSL Requirements Summary (W3C Working Draft 11-May-1998)
http://www.w3.org/TR/WD-XSLReq

I think the existence of this document means that the current XSL spec
does not cover line-breaking and justification for Asian languages.
By the way,

CSS3 module: text (W3C Working Draft 17 May 2001)
http://www.w3.org/TR/css3-text/

supports them and illustrates them in detail.


There is a Japanese line-breaking algorithm named 'Kinsoku'.
Kinsoku consists of 'Head of line Japanese hyphenation' and
'End of line Japanese hyphenation'. The Japanese version of
TeX implements Kinsoku. This implementation can be obtained from

ftp://ftp.ascii.co.jp/pub/TeX/ascii-ptex/tetex/ptex-texmf-2.0.tar.gz

The kinsoku.tex file in this tarball is the equivalent of a Japanese
TeX hyphenation pattern file for Kinsoku. If you want better
line-breaking handling for CJKV, you will probably have to implement
it in FOP yourself, in the same way as the Japanese version of TeX.


I think that if you do not implement it, these controls are probably
left to FOP as the user agent.

For example, when text-align="justify" is specified in an
fo document whose root element has language=zh/ja/ko/vi,
FOP currently behaves like CSS3's text-justify=inter-ideograph.
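As a rough illustration of what inter-ideograph style justification amounts to, the slack on a line can be spread evenly over the gaps between adjacent characters. This is a simplified sketch, not FOP's code and not the full CSS3 definition; the class and method names are made up for the example.

```java
/** Hypothetical helper: spreads the leftover width evenly between characters. */
final class InterIdeographSketch {
    /**
     * Extra space to add to each gap between adjacent characters so that a line
     * of naturalWidth fills lineWidth. Returns 0 when there is nothing to spread.
     */
    static double extraPerGap(double lineWidth, double naturalWidth, int charCount) {
        if (charCount < 2 || naturalWidth >= lineWidth) {
            return 0.0; // a single character has no gaps; a full line needs no slack
        }
        return (lineWidth - naturalWidth) / (charCount - 1);
    }

    public static void main(String[] args) {
        // 11 characters occupying 90 units on a 100 unit line.
        System.out.println(extraPerGap(100.0, 90.0, 11));
    }
}
```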

Regards,

---
Satoshi Ishigami   VIC TOKAI CORPORATION





Re: [ANNOUNCEMENT] FOP 0.20.4 released

2002-07-08 Thread Satoshi Ishigami


Konnichiwa Christian-san.

On Mon, 08 Jul 2002 10:04:19 +0200 , Christian Geisert wrote:

 Because the documentation generation is broken in the maintenance
 branch (stylebook needs xerces1)
 It is mentioned in the release notes ;-)
...snip...x8...snip...x8...
  I want to get xml-docs for every release. Can I get it anywhere 
  else?
 
  From CVS trunk (tag fop-0_20_4-doc)
 AFAIK older source release include xml-docs.

OK. I will try to check it out using that tag.

Thanks,

---
Satoshi Ishigami   VIC TOKAI CORPORATION





Re: [ANNOUNCEMENT] FOP 0.20.4 released

2002-07-07 Thread Satoshi Ishigami


Konnichiwa.

On Sun, 07 Jul 2002 21:39:21 +0200 , Christian Geisert wrote:

 the FOP team is pleased to announce the release of FOP 0.20.4
 
 Binary and source distributions are available at: 
 http://xml.apache.org/dist/fop

Why is xml-docs not included in the src distribution?
I thought xml-docs was to be removed only from the bin
distribution.

http://marc.theaimsgroup.com/?t=10250420581&r=1&w=2
http://marc.theaimsgroup.com/?t=10239848503&r=1&w=2

I am now translating all of the FOP documents into Japanese.
I want to get xml-docs for every release. Can I get it anywhere
else?

BTW the ReleaseNotes.html file in the root directory wrongly
points to html-docs/relnotes.html, and its content is still
that of 0.20.3.

Thanks.

---
Satoshi Ishigami   VIC TOKAI CORPORATION





Re: [ANNOUNCEMENT] FOP 0.20.3 Release Candidate 2 available

2002-02-21 Thread Satoshi Ishigami


Hi, Christian.

On Thu, 21 Feb 2002 17:30:32 +0100 , Christian Geisert wrote:

 the second Release Candidate for 0.20.3 (Maintenance release) is
 finally available at http://xml.apache.org/dist/fop for downloading
 and testing.

Great !!

 - Improved i18n support for AWT viewer (Japanese dialogs)
 Submitted by: Satoshi Ishigami ([EMAIL PROTECTED])

I checked for my patch. The resources.ja file that was part of the
patch I posted has NOT been included in 0.20.3rc2.

I am reposting only the org/apache/fop/viewer/resources/resources.ja
file. Please commit this file.

Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION



resources.ja
Description: Binary data



Re: JDK 1.4 and fonts

2002-02-21 Thread Satoshi Ishigami


See the site below:

Endorsed Standards Override Mechanism
http://java.sun.com/j2se/1.4/docs/guide/standards/index.html

JDK1.4 includes XML libraries such as Xalan-J 2.2.D11 in rt.jar by
default, so the current sh/bat scripts or build.xml may not be able
to build FOP when used with JDK1.4.
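A quick way to see which XSLT implementation a given JVM actually resolves is to ask JAXP directly. The class name WhichXslt is made up for this sketch; TransformerFactory.newInstance() is the standard JAXP lookup, and the printed class name will differ between JDKs and classpaths.

```java
import javax.xml.transform.TransformerFactory;

/** Hypothetical diagnostic: reports the XSLT implementation picked up by JAXP. */
final class WhichXslt {
    /** Name of the TransformerFactory implementation class that JAXP resolved. */
    static String transformerFactoryClass() {
        return TransformerFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("TransformerFactory: " + transformerFactoryClass());
        System.out.println("java.version: " + System.getProperty("java.version"));
    }
}
```

If the reported class comes from the JDK rather than from the Xalan jar shipped with FOP, the bundled version is the one the build is using.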

Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION



On Fri, 22 Feb 2002 00:09:18 -0500 , Christopher Burkey wrote:

 Hi,
 
   Has anyone been able to compile FOP with JDK 1.4? It gets this far in
 the build process:
 
 in: ./build/src/codegen/extproperties.xml
 style: ./build/src/codegen/enumgen.xsl
 out: ./build/src/org/apache/fop/fo/properties/extenums_ignore_this.java
 
  [style] Transforming into C:\src-workspaces\per\xml-fop\build\src\org\apache\fop\render\pdf
  [style] Loading stylesheet C:\src-workspaces\per\xml-fop\.\build\src\codegen\code-point-mapping.xsl
  [style] Processing C:\src-workspaces\per\xml-fop\build\src\codegen\encodings.xml to C:\src-workspaces\per\xml-fop\build\src\org\apache\fop\render\pdf\CodePointMapping.java
  [style] Failed to process C:\src-workspaces\per\xml-fop\build\src\codegen\encodings.xml
 
   My end goal is to improve the appearance of font spacing, within the
 AWT Renderer, using JDK 1.4's improved font handling.
 
   I have compiled with 1.3 then tested the AWT renderer under 1.4. I
 think there are improvements with fonts spacing over JDK 1.3 but its
 still not perfect.
 
   I have tried to comment out this line:
 
   // space is rendered larger than given by
   // the FontMetrics object
   // if (i == 32)
   //     w = (int)(1.4 * fmt.charWidth(i) * FONT_FACTOR);
   // else
   w = (int)(fmt.charWidth(i) * FONT_FACTOR);
 
 
   But that did nothing. Attached is a fo file that displays the problems
 with font space. In my attached file you can change the font size and it
 looks much better. If anyone knows what it could be please let me know.
 
 
 BTW: This is all using 0.20.3 Under Windows 2000, JDK 1.4 final.
 
 Thanks!
 
 _
 Christopher Burkey    [EMAIL PROTECTED]
 President             513-542-3401
 eInnovation Inc.  http://einnovation.com
 
 
 
 




Re: cvs commit: xml-fop/docs/xml-docs/fop fonts.xml

2002-02-18 Thread Satoshi Ishigami


On 18 Feb 2002 09:01:50 - , [EMAIL PROTECTED] wrote:

 keiron  02/02/18 01:01:50
 
   Modified:docs/xml-docs/fop fonts.xml
   Log:
   some more font embedding info
...snip...x8...snip...x8...
   +<note><p>
   +If you do not want the font embedded in the PDF then remove the
   +embed-file attribute. The PDF will then contain text using
   +the font with the font metrics and to view it properly the
   +font will need to be installed where it is being viewed.
   +</p></note>

This feature does not work correctly with any CIDFont, because the
CMap specified for the CIDFont is wrong.

When no font is embedded in the PDF, a PDF viewer such as Adobe
Acrobat Reader tries to use a font installed on the host operating
system. The character codes are then mapped to glyph data through
CMaps in several steps. For more details, see

Adobe CMap and CID Font Files Specification Version 1.0
http://partners.adobe.com/asn/developer/pdfs/tn/5014.CMap_CIDFont_Spec.pdf

I will assume that a CIDFontType2 is used. A simple figure of the
mapping:

                 1st mapping         2nd mapping
character code --------------> CID --------------> glyph data
                    CMap               CIDFont
                  resource             resource

In practice, slightly more complicated mappings may occur.

There are two kinds of mapping. One maps the character code in the
PDF to a CID, and the other maps the CID to glyph data in the font
being used.

The 1st mapping is done first. When generating PDF without
embedding any font, FOP currently uses the glyph id from the
TrueType font's cmap as the character code. This information is
contained in the font metrics file.

The CMap used for the 1st mapping is specified by the Encoding
entry in the Type0 Font Dictionary (see PDF1.4 spec, 5.6.5 Type0
Font Dictionaries section, p.353).

There are many possible Encoding values, as described in the PDF1.4
spec (see 5.6.4 CMaps section, p.342). FOP uses the Identity-H
encoding as a fixed value. This is implemented in the
org.apache.fop.render.pdf.fonts.MultiByteFont class.

The Identity-H encoding does not change the character code on its way
to a CID. The mapped CID is therefore equal to the character code
(which here is the TrueType cmap glyph id).
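The 1st mapping under Identity-H can be sketched as follows: the string bytes are read as 2-byte codes, high-order byte first, and each code is used as the CID unchanged. The class and method names are assumptions for this example and are not taken from FOP.

```java
/** Hypothetical sketch of the Identity-H mapping from character codes to CIDs. */
final class IdentityHSketch {
    /** Reads 2-byte big-endian codes; under Identity-H each code is its own CID. */
    static int[] toCids(byte[] codes) {
        if (codes.length % 2 != 0) {
            throw new IllegalArgumentException("Identity-H expects 2-byte codes");
        }
        int[] cids = new int[codes.length / 2];
        for (int i = 0; i < cids.length; i++) {
            int hi = codes[2 * i] & 0xFF;     // mask to avoid sign extension
            int lo = codes[2 * i + 1] & 0xFF;
            cids[i] = (hi << 8) | lo;         // identity: CID == character code
        }
        return cids;
    }

    public static void main(String[] args) {
        int[] cids = toCids(new byte[] {0x00, 0x41, 0x30, 0x42});
        for (int cid : cids) {
            System.out.println(Integer.toHexString(cid));
        }
    }
}
```

Since this stage changes nothing, any error in the final glyph selection has to come from how the CID is interpreted afterwards.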

Next, the 2nd mapping is performed. This mapping is based on the
CIDSystemInfo Dictionary (see PDF1.4 spec, 5.6.2 CIDSystemInfo
Dictionaries section, p.336).

For example, if the CIDs belong to the Adobe-Japan1-2 character
collection, the CIDSystemInfo must specify:
/CIDSystemInfo << /Registry (Adobe) /Ordering (Japan1) /Supplement 2 >>

As mentioned above, the CIDs handled by FOP depend on each font.
Currently FOP specifies the CIDSystemInfo dictionary as follows:
/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >>

This CIDSystemInfo is not one of the pre-registered ones (for more
details about ToUnicode, see PDF1.4 spec, 5.9 ToUnicode CMaps
section, p.368).

If I remember correctly, this CIDSystemInfo was used together with
ToUnicode CMaps in about FOP-0.18.0. In current FOP, however, the
ToUnicode feature is commented out, so the 2nd mapping cannot work
correctly (the generated PDF is neither readable nor viewable).
Thus this CIDSystemInfo is WRONG !!!

In my experiments, the following CIDSystemInfo works correctly
when no font is embedded.

/CIDSystemInfo << /Registry (Adobe) /Ordering (Identity) /Supplement 0 >>
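For reference, here is a minimal sketch of writing such an entry in PDF dictionary syntax, with the ordering passed in rather than hard-coded. The class and method names are hypothetical and are not FOP's API; the sketch also assumes the registry and ordering strings contain no parentheses or backslashes that would need escaping.

```java
/** Hypothetical formatter for a CIDSystemInfo entry in PDF dictionary syntax. */
final class CidSystemInfoSketch {
    /** Builds the entry; the strings are assumed to need no PDF escaping. */
    static String cidSystemInfo(String registry, String ordering, int supplement) {
        return "/CIDSystemInfo << /Registry (" + registry + ")"
                + " /Ordering (" + ordering + ")"
                + " /Supplement " + supplement + " >>";
    }

    public static void main(String[] args) {
        // The combination reported to work without font embedding.
        System.out.println(cidSystemInfo("Adobe", "Identity", 0));
    }
}
```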

I have known about this problem for a few months. However, I did not
report it here because my solution is experimental.

I looked for a document that would back up my solution. The
CIDToGIDMap entry in the CIDFont dictionary (PDF1.4 spec, 5.6.3
CIDFonts section, p.339) comes closest, but its relationship to
CIDSystemInfo is not described there...

The value of the Ordering entry is also fixed; the getOrdering()
method is implemented in the
org.apache.fop.render.pdf.fonts.MultiByteFont class.

If FOP is to support generating PDF without font embedding, I suggest
using the CIDSystemInfo I described above within FOP's current font
handling architecture.

Please check in your environment and point out any misunderstanding
or mistakes on my part :-)

Sorry if my english is bad.

Thanks.


---
Satoshi Ishigami   VIC TOKAI CORPORATION





i18n in awt viewer [PATCH]

2002-01-27 Thread Satoshi Ishigami

Hi all !

For AWTViewer i18n, I hacked the AWTViewer in the FOP-0.20.3rc
maintenance release, and I have some questions.

As a trial, I tried to show the menu in Japanese.
I wrote the resource files in UTF-8 encoding because multi-byte
characters cannot be represented in iso-8859-1.
I also converted all existing AWTViewer resource files from
iso-8859-1 to UTF-8 automatically.

I also modified some source files associated with AWTViewer.
These modifications are for message i18n and for additional
font support in AWTViewer.

The additional fonts are specified in a userconfig.xml file,
and you then start FOP with the -c and -awt options. NOTE: for
now, only TrueType fonts (.ttf and .ttc) can be specified as
additional fonts. This restriction comes from SUN's JDK
(see the java.awt.Font javadoc).

1) If you specify the embed-file attribute in userconfig.xml
and use JDK1.3 or higher, the specified TrueType font is loaded
and used in AWTViewer.

2) Otherwise, if you don't specify the embed-file attribute or
you use JDK1.2, the additional font is treated as Java's "Dialog"
logical font for the current Locale.
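The two cases above can be written as a small decision function. This is an illustrative sketch, not the code from the patch: the class name, the enum and the font file name "msgothic.ttc" are assumptions. In case 1 the actual loading would go through java.awt.Font.createFont(Font.TRUETYPE_FONT, stream), which was added in JDK1.3.

```java
/** Hypothetical sketch of how the viewer could pick a font source. */
final class ViewerFontChoice {
    enum Source { EMBEDDED_TRUETYPE, DIALOG_LOGICAL_FONT }

    /** True for spec versions 1.3 and later (including "9", "11", "17", ...). */
    static boolean supportsCreateFont(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int major = Integer.parseInt(parts[0]);
        int minor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return major > 1 || minor >= 3;
    }

    /** Case 1 needs both an embed-file and a JDK that can load TrueType fonts. */
    static Source choose(String embedFile, String specVersion) {
        boolean hasEmbedFile = embedFile != null && !embedFile.isEmpty();
        return hasEmbedFile && supportsCreateFont(specVersion)
                ? Source.EMBEDDED_TRUETYPE
                : Source.DIALOG_LOGICAL_FONT;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.specification.version");
        System.out.println(choose("msgothic.ttc", version));
    }
}
```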

I tried to display a Japanese fo file. My test environment was
SUN's JDK1.2.2 and JDK1.3, with the LANG environment variable set
to EN and to JA.

With LANG=JA, both 1) and 2) work fine (the menu is in Japanese
and the displayed document is readable).

LANG=EN with 1) also works fine (the menu is in English and the
document is readable). But the combination of LANG=EN and 2) does
not work (the menu looks fine in English but the document is
unreadable). However, I think that this behavior is right.


Below are my questions.

Currently the text of menu for the AWTViewer is loaded from
org.apache.fop.viewer.resources package. The AWTViewer needs
two kinds of resource file (messages.lang and resources.lang).

I cannot find a messages.lang file for some languages (fi, fr,
it, pl, ru). Therefore the following command cannot start the
AWTViewer (an NPE is thrown).

fop -awt -l fi foo.fo

Q1. Isn't a messages.lang file required alongside each resources.lang file?


The language used by AWTViewer is determined from the system
default or from the -l option. The chosen language is used as
the suffix of the resource files. However, Java already has
a mechanism for accessing a resource file per language
(i.e. the java.util.ResourceBundle class and, for example,
resources_ja.properties).

Currently AWTViewer assumes that resource files are written in
iso-8859-1 encoding. This is not good for i18n. I converted
them to UTF-8, but I think the ResourceBundle framework would be
better than the current approach, because AWTViewer could then
start even when there are no resource files for a given language.
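To illustrate the fallback behaviour, here is a minimal sketch using java.util.PropertyResourceBundle with a parent bundle. The class names and the sample keys are made up for the example, and the bundles are read from in-memory strings; a real viewer would wrap the resource stream in an InputStreamReader with UTF-8 instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

/** Hypothetical sketch: a language bundle that falls back to a default bundle. */
final class ViewerBundleSketch {
    /** A properties bundle read from characters, chained to an optional fallback. */
    static final class ChainedBundle extends PropertyResourceBundle {
        ChainedBundle(Reader reader, ResourceBundle fallback) throws IOException {
            super(reader);
            if (fallback != null) {
                setParent(fallback); // keys missing here are looked up in the fallback
            }
        }
    }

    /** Builds an English default and a deliberately incomplete Japanese bundle. */
    static ResourceBundle demo() {
        try {
            ResourceBundle english =
                    new ChainedBundle(new StringReader("File=File\nPrint=Print\n"), null);
            // Only "File" is translated; "Print" must come from the fallback.
            return new ChainedBundle(
                    new StringReader("File=\u30D5\u30A1\u30A4\u30EB\n"), english);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle bundle = demo();
        System.out.println(bundle.getString("File"));
        System.out.println(bundle.getString("Print"));
    }
}
```

With this kind of chaining, a missing translation yields the default text instead of a failure at startup.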

Q2. Why does AWTViewer not use ResourceBundle?

Best Regards.

---
石神 覚司(Satoshi Ishigami)   VIC TOKAI CORPORATION


patch.tar.gz
Description: Binary data



i18n in TXTRenderer

2002-01-27 Thread Satoshi Ishigami


Hi .

I hacked the TXTRenderer for i18n.

Currently the org.apache.fop.render.pcl.PCLStream class is
used as the OutputStream in TXTRenderer. The add method in the
PCLStream class is as below:

public void add(String str) {
    if (!doOutput)
        return;

    byte buff[] = new byte[str.length()];
    int countr;
    int len = str.length();
    for (countr = 0; countr < len; countr++)
        buff[countr] = (byte)str.charAt(countr);
    try {
        out.write(buff);
    } catch (IOException e) {
        // e.printStackTrace();
        // e.printStackTrace(System.out);
        throw new RuntimeException(e.toString());
    }
}

I think this algorithm is wrong for characters > 127.
The reason is that a char is 2 bytes long while a byte is 1 byte
long, so the cast discards part of each such character. To avoid
this problem, I think the following algorithm would be better.

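public void add(String str) {
    if (!doOutput) return;
    try {
        // getBytes(String) throws UnsupportedEncodingException,
        // which is an IOException and is caught below.
        byte buff[] = str.getBytes("UTF-8");
        out.write(buff);
    } catch (IOException e) {
        throw new RuntimeException(e.toString());
    }
}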
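The difference between the two approaches can be seen on a single Japanese character. This is a small stand-alone sketch with made-up class and method names; it only mirrors the cast-per-char loop and the UTF-8 encoding step, not the surrounding stream code.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical demo of the data loss caused by casting each char to a byte. */
final class EncodingLossSketch {
    /** Mirrors the cast-per-char loop: keeps only the low 8 bits of each char. */
    static byte[] castEachChar(String s) {
        byte[] out = new byte[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = (byte) s.charAt(i);
        }
        return out;
    }

    /** Encodes the whole string as UTF-8, so no information is lost. */
    static byte[] encodeUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hiraganaA = "\u3042"; // HIRAGANA LETTER A
        System.out.println(castEachChar(hiraganaA).length); // 1 byte, high byte dropped
        System.out.println(encodeUtf8(hiraganaA).length);   // 3 bytes in UTF-8
    }
}
```

For U+3042 the cast leaves only 0x42, which a reader of the output would see as the letter "B".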

This algorithm may not be suitable for PCLRenderer, because
I don't know whether PCL printers support the UTF-8
encoding.

However, I think TXTRenderer should be able to use a
multilingual encoding, because a single fo file can contain
several languages.

Therefore I think that TXTRenderer should not use
PCLStream and had better use its own OutputStream (such as
a TXTStream).
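A separate stream class along these lines might look like the sketch below. TXTStream does not exist in FOP; the name comes from the suggestion above, and the constructor, the setDoOutput method and the choice to pass the encoding in as a Charset are assumptions made for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

/** Hypothetical TXTStream: writes strings in a configurable encoding. */
final class TXTStream {
    private final OutputStream out;
    private final Charset charset;
    private boolean doOutput = true;

    TXTStream(OutputStream out, Charset charset) {
        this.out = out;
        this.charset = charset;
    }

    void setDoOutput(boolean doOutput) {
        this.doOutput = doOutput;
    }

    /** Encodes the string with the configured charset and writes the bytes. */
    void add(String str) {
        if (!doOutput) {
            return;
        }
        try {
            out.write(str.getBytes(charset));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keeping the encoding a constructor argument leaves PCLStream untouched and lets the text renderer choose UTF-8 or a legacy encoding per output.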

Is my thinking wrong?

Best Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION





[PATCH] Japanese line breaking

2001-10-14 Thread Satoshi Ishigami

Hi, All!

I found that the current line-breaking algorithm does not work
well in some situations when language=ja is specified, such as with
fo:basic-link, &lt;&gt;, and spacing.

I am attaching a sample fo file (test.fo) and the resulting pdf
(test.pdf) for the current version to this mail.

The attached patch works fine (it may be slightly redundant).
This algorithm separates words as follows:

  AIUEO Hello World -> |A|I|U|E|O| Hello| World|

Here A, I, U, E and O stand for Japanese characters.
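The separation shown above can be sketched as follows. This is not the attached patch; it is a minimal illustration with made-up names, and the test for "Japanese" covers only three Unicode blocks, which is an assumption made to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: one unit per Japanese character, space-led western words. */
final class JaWordSplitSketch {
    /** Simplified test covering only hiragana, katakana and unified ideographs. */
    static boolean isJapanese(char c) {
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        return block == Character.UnicodeBlock.HIRAGANA
                || block == Character.UnicodeBlock.KATAKANA
                || block == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS;
    }

    /** Splits text into breakable units, keeping a leading space with its word. */
    static List<String> split(String text) {
        List<String> units = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isJapanese(c)) {
                flush(current, units);
                units.add(String.valueOf(c)); // each Japanese character stands alone
            } else {
                if (c == ' ') {
                    flush(current, units);    // a space starts a new unit
                }
                current.append(c);
            }
        }
        flush(current, units);
        return units;
    }

    private static void flush(StringBuilder current, List<String> units) {
        if (current.length() > 0) {
            units.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(split("\u3042\u3044\u3046\u3048\u304A Hello World"));
    }
}
```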

Best Regards.

---
Satoshi Ishigami   VIC TOKAI CORPORATION



test.tar.gz
Description: GNU Zip compressed data
