RE: [docbook-apps] Japanese index

2018-04-25 Thread Jan Tosovsky
On 2018-04-25 Jirka Kosek wrote:
> On 25.4.2018 11:57, Tony Graham wrote:
> > Sorry, but I've never used it, so all that I know is what's on the website.
> 
> I see. After digging through some old emails, I was able to find this link:
> 
> https://www.antennahouse.com/i18n-support-library-2/
> 
> It contains the open-source part of the library, which should work with the
> "kimber" method in the stylesheets. This could provide Jan with correct
> Japanese indexing.

Oops, I'd forgotten there are other indexing methods. The 'kimber' method looks
very promising.

Thanks a lot for the link to the Saxon extension. The original link at 
http://www.sagehill.net/docbookxsl/IndexIntl.html is broken now.

If I understand correctly, comparing the open-source version 1 with version
2, the latter brings enhancements in Chinese sorting and support for additional
languages. Neither is directly related to Japanese, so I'll start with the
open-source version.

Thanks,

Jan







-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



Re: [docbook-apps] Japanese index

2018-04-25 Thread Bob Stayton
I forgot to mention that Eliot pointed me to a new package of i18n 
support that he created for DITA and it is packaged for the DITA-OT, but 
it can be adapted for use outside of DITA. It is written for Saxon 9, 
however.  It is available on GitHub at:


https://github.com/dita-community/org.dita-community.i18n

Bob Stayton
Sagehill Enterprises
b...@sagehill.net






Re: [docbook-apps] Japanese index

2018-04-25 Thread Tony Graham

On 25/04/2018 11:10, Jirka Kosek wrote:
> On 25.4.2018 11:57, Tony Graham wrote:
>> Sorry, but I've never used it, so all that I know is what's on the website.
>
> I see. After digging through some old emails, I was able to find this link:
>
> https://www.antennahouse.com/i18n-support-library-2/
>
> It contains the open-source part of the library, which should work with the
> "kimber" method in the stylesheets. This could provide Jan with correct
> Japanese indexing.


As relayed to me:

Eliot Kimber originally developed the i18n Library for one of his 
customers and made it open source.  Antenna House made some minor 
corrections and improvements and made those available under the open
source license as the Support Library, with no formal support.  At the same
time, Antenna House added Chinese sorting, both Traditional and 
Simplified, enhanced the library for DocBook, and offered official 
support.  Over the years Antenna House has further enhanced the sorting 
module, greatly improved it, added additional languages, and created 
stylesheets (and developed the PDF5-ML DITA plugin).


Regards,


Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.

Skerries, Ireland
tgra...@antenna.co.jp




Re: [docbook-apps] Japanese index

2018-04-25 Thread Bob Stayton

Hello all,

Thanks for tracking down that package, Jirka.  I haven't tested it, but 
that should work with Japanese using the kimber indexing method as 
described in my book:


http://www.sagehill.net/docbookxsl/IndexIntl.html#KimberIndexMethod

I contacted Eliot Kimber, the author of the i18n_support library for 
whom the "kimber" method was named.  He informed me that the original 
library was under the GNU Lesser GPL license, and Antenna House took it 
in 2008, enhanced it by paying for the building of a complete 
Traditional Chinese dictionary, and made it their commercial product. 
After that date, he made further enhancements, including locating an 
open source Chinese dictionary.  I have a copy of a later version, but I 
don't think it has that dictionary.  He says it is still under the GNU 
license and can be distributed.  I will compare the two versions and 
will eventually put a package up on the DocBook Wiki for others to use.  
But for now, the Antenna House version should work.


Bob Stayton
Sagehill Enterprises
b...@sagehill.net






Re: [docbook-apps] Japanese index

2018-04-25 Thread Jirka Kosek
On 25.4.2018 11:57, Tony Graham wrote:
> Sorry, but I've never used it, so all that I know is what's on the website.

I see. After digging through some old emails, I was able to find this link:

https://www.antennahouse.com/i18n-support-library-2/

It contains the open-source part of the library, which should work with the
"kimber" method in the stylesheets. This could provide Jan with correct
Japanese indexing.

-- 
--
  Jirka Kosek  e-mail: ji...@kosek.cz  http://xmlguru.cz
--
 Professional XML and Web consulting and training services
DocBook/DITA customization, custom XSLT/XSL-FO document processing
--
Bringing you XML Prague conference  http://xmlprague.cz
--





Re: [docbook-apps] Japanese index

2018-04-25 Thread Tony Graham

On 25/04/2018 10:50, Jirka Kosek wrote:
...
>> It's probably not what you want to hear, but Antenna House does have a
>> commercial product for doing DocBook indexes:
>>
>> https://www.antennahouse.com/antenna1/i18n-index-library/
>
> Isn't this a newer version of the library that is needed for the "kimber"
> indexing method? I thought that Eliot intended to convince AH to make this
> library open source, but it seems that my memory is wrong.

Sorry, but I've never used it, so all that I know is what's on the website.

Regards,


Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.

Skerries, Ireland
tgra...@antenna.co.jp




Re: [docbook-apps] Japanese index

2018-04-25 Thread Jirka Kosek
On 24.4.2018 21:53, Tony Graham wrote:
>> But it is still unclear how to tweak the index code to generate groups
>> from
>> non-latin characters.
> 
> I don't know, either.

DocBook stylesheets support three methods of indexing, see:

http://www.sagehill.net/docbookxsl/IndexIntl.html

With the "kosek" method you can easily define groups based on the first or
first two characters of indexed words. Unfortunately, there is currently
no suitable definition for Japanese, and my knowledge of Japanese is not
sufficient to create one.

But the internals of this method are described in the following paper:

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.2069&rep=rep1&type=pdf

This might give you enough of a clue to adapt it to Japanese. If you
succeed, it would be great if you could contribute the definitions back to
the stylesheets. Feel free to contact me if you need more info.
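
To make the idea of grouping on the first character concrete, here is a
minimal Python sketch. It is not the stylesheets' group-definition format,
only an illustration of the mapping such a definition would have to encode.
The row table, the function name index_group and the fallback to "Symbols"
are assumptions made for this example and would need review by someone who
knows Japanese indexing conventions. It expects the kana reading of an entry
(for example the 'sortas' value), not the kanji form.

```python
import unicodedata

# Assumed group table: one index group per gojuon row, labelled by its
# first kana. This is an illustration, not a reviewed definition.
ROWS = {
    "あ": "あいうえお",
    "か": "かきくけこ",
    "さ": "さしすせそ",
    "た": "たちつてと",
    "な": "なにぬねの",
    "は": "はひふへほ",
    "ま": "まみむめも",
    "や": "やゆよ",
    "ら": "らりるれろ",
    "わ": "わゐゑをん",
}
KANA_TO_ROW = {kana: label for label, kanas in ROWS.items() for kana in kanas}

# Small kana are folded to their full-size counterparts before lookup.
SMALL_TO_FULL = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def index_group(reading: str) -> str:
    """Return the group label for a kana reading, or "Symbols"."""
    if not reading:
        return "Symbols"
    first = reading[0]
    # Katakana letters sit exactly 0x60 above their hiragana equivalents.
    if "\u30a1" <= first <= "\u30f6":
        first = chr(ord(first) - 0x60)
    # NFD splits a voiced kana into base kana plus combining mark;
    # keeping only the first code point drops the (han)dakuten.
    base = unicodedata.normalize("NFD", first)[0].translate(SMALL_TO_FULL)
    return KANA_TO_ROW.get(base, "Symbols")


for reading in ["がくしゅう", "パソコン", "index"]:
    print(reading, "->", index_group(reading))
```

Anything that does not start with kana still lands in "Symbols", which is
the behaviour reported for Japanese entries without a reading.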

> It's probably not what you want to hear, but Antenna House does have a
> commercial product for doing DocBook indexes:
> 
> https://www.antennahouse.com/antenna1/i18n-index-library/

Isn't this a newer version of the library that is needed for the "kimber"
indexing method? I thought that Eliot intended to convince AH to make this
library open source, but it seems that my memory is wrong.

-- 
--
  Jirka Kosek  e-mail: ji...@kosek.cz  http://xmlguru.cz
--
 Professional XML and Web consulting and training services
DocBook/DITA customization, custom XSLT/XSL-FO document processing
--
Bringing you XML Prague conference  http://xmlprague.cz
--





Re: [docbook-apps] Japanese index

2018-04-24 Thread Tony Graham

On 24/04/2018 19:39, Jan Tosovsky wrote:

> Has anybody any experience with generating a Japanese back-of-the-book index
> from DocBook source?


More than 20 years ago.


> I am facing the same issues discussed in this old thread (all entries end up
> in the Symbols section):
> https://lists.oasis-open.org/archives/docbook-apps/200605/msg00063.html
>
> If I understand correctly, indices in Japanese should be grouped
> phonetically:
> https://www.slideshare.net/k16shikano/imybp-light
>
> I've found the promising Kuromoji library https://github.com/atilika/kuromoji
> I can imagine it could somehow pre-process all index entries and generate
> values for the 'sortas' attribute.
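
A pre-processing pass of that kind could look roughly like the Python sketch
below. It is only an outline under stated assumptions: the READINGS table is
a hypothetical stand-in for whatever a morphological analyzer would return,
add_sortas is a name made up for this example, and the sample document is
not taken from a real book. Element names are matched by local name so the
same code would handle namespaced DocBook 5 sources.

```python
import xml.etree.ElementTree as ET

# Hypothetical readings table; in practice an analyzer would supply these.
READINGS = {"索引": "さくいん", "文書": "ぶんしょ"}

INDEX_LEVELS = ("primary", "secondary", "tertiary")


def add_sortas(root, readings):
    """Set 'sortas' on index entries that lack it; return how many were set."""
    count = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        local = elem.tag.rsplit("}", 1)[-1]  # drop any namespace prefix
        if local not in INDEX_LEVELS or "sortas" in elem.attrib:
            continue
        text = "".join(elem.itertext()).strip()
        if text in readings:
            elem.set("sortas", readings[text])
            count += 1
    return count


doc = ET.fromstring(
    "<para>text<indexterm><primary>索引</primary>"
    "<secondary>文書</secondary></indexterm></para>"
)
add_sortas(doc, READINGS)
print(ET.tostring(doc, encoding="unicode"))
```

Entries that already carry a hand-written 'sortas' are left alone, so manual
corrections for readings the analyzer gets wrong would survive a re-run.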


Slide 35 of those slides shows a corner case that a morphological
analyzer could get wrong. (I'm not able to test it, myself.)

If you were using 'kuromoji', you could concatenate the values of the
'Reading' feature for all of the parts of speech of an index entry and
use that as the 'sortas' value.
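
As a rough sketch of that concatenation in Python (untested against
Kuromoji itself): the (surface, reading) pairs below are a hypothetical
stand-in for analyzer output, and sortas_from_tokens is a name made up for
this example. It assumes readings arrive in katakana and folds them to
hiragana so that every entry sorts in one script.

```python
from typing import Iterable, Tuple


def katakana_to_hiragana(text: str) -> str:
    """Shift katakana letters (U+30A1..U+30F6) down to hiragana."""
    return "".join(
        chr(ord(ch) - 0x60) if "\u30a1" <= ch <= "\u30f6" else ch
        for ch in text
    )


def sortas_from_tokens(tokens: Iterable[Tuple[str, str]]) -> str:
    """Join the readings of all tokens into one hiragana sort key.

    A token without a reading (for example Latin text) contributes its
    surface form unchanged.
    """
    joined = "".join(reading or surface for surface, reading in tokens)
    return katakana_to_hiragana(joined)


# Hypothetical analyzer output for the entry 索引の作成.
tokens = [("索引", "サクイン"), ("の", "ノ"), ("作成", "サクセイ")]
print(sortas_from_tokens(tokens))
```

The prolonged sound mark and any non-kana characters pass through as they
are; whether they need special handling for sorting is left open here.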


> But it is still unclear how to tweak the index code to generate groups from
> non-latin characters.


I don't know, either.


> Or are there better ways?


It's probably not what you want to hear, but Antenna House does have a
commercial product for doing DocBook indexes:

https://www.antennahouse.com/antenna1/i18n-index-library/

Regards,


Tony Graham.
--
Senior Architect
XML Division
Antenna House, Inc.

Skerries, Ireland
tgra...@antenna.co.jp
