Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?

2012-10-15 Thread Cedric Blancher
On 12 September 2012 20:04, Glenn Fowler g...@research.att.com wrote:

 On Wed, 12 Sep 2012 19:36:50 +0200 Cedric Blancher wrote:
 On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:
 
  I changed the posting to just ast-developers
  it would be good for all to not cross-post identical messages to { 
  ast-users ast-developers }

 Do the lists work again? I've been posting to ast-users and
 ast-developers because it's both user and developer concern, and the
 damn f*king mailman was broken for ast-developers.

 there has been some tweaking on the mailman and spamassassin fronts
 plans are to shortly move the ast-* and uwin-* lists to an external server
 which will insulate the lists from internal spam warfare/fallout

  On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
  Here's another (likely contentious) issue:
  Can we document the extra EUC regex character classes, which are for
  example present in almost all Unix Japanese locales (e.g. jspace,
  jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in
  ksh93(1)?
 
  ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all
  functions supported by the current locale, and this winds its way
  to all ast regex and ksh pattern matching and globbing

 I know. The shame is: Not documented.

 aha
 well I plan to update (or in some cases instantiate) documentation
 along with the ongoing no-globals/thread-safe/sized-buffer sweep
 of libast and related libraries

 
  rather than document all possible NAMEs for all possible locales,
  possibly in multiple places (regex(ast), typeset(ksh), etc.),
  can we reference documentation that describes the concept and how
  a user could determine the available functions for a specific locale?

 AFAIK there is no standard function in libc to get the extra character
 classes. Typically you learn them when you learn information
 processing in the CJKV locales.

 this inability of the standard to view itself as a programmable
 object that can be queried is an unfortunate shortcoming

 I noticed that some of the Japanese translations of ksh(1) and
 ksh93(1) in AIX and Solaris list these extra classes - but only for
 Japanese, the Chinese translation then only lists the Chinese extra
 classes - and so on.

 IMO someone has to dig out the EUC documentation and just provide the
 list of all possible combinations (~15 or so, not much). That will be
 better than hiding this feature in some goddamn places none will ever
 find (as a general point of criticism: AST is great and does an even
 greater job with standards conformance - but it's full of places where
 the light of documentation has yet to appear. Which is IMO a shame).

 fixing the ast doc holes is a high priority this fall

What's the status on this one?

Ced
-- 
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur
___
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers


Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?

2012-09-12 Thread Glenn Fowler

I changed the posting to just ast-developers
it would be good for all to not cross-post identical messages to { ast-users 
ast-developers }

On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
 Here's another (likely contentious) issue:
 Can we document the extra EUC regex character classes, which are for
 example present in almost all Unix Japanese locales (e.g. jspace,
 jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in
 ksh93(1)?

ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all
functions supported by the current locale, and this winds its way
to all ast regex and ksh pattern matching and globbing
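
As a rough illustration of the mechanism described above, here is a minimal C sketch of resolving a class name through wctype(3) and testing a character with iswctype(3). The helper name in_named_class is made up for this example and is not part of libast; how ast regex actually wires this in is not shown in this thread.

```c
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not taken from libast.
 * Returns  1 if wc belongs to the class called name,
 *          0 if it does not,
 *         -1 if the current LC_CTYPE locale defines no such class.
 */
int in_named_class(const char *name, wint_t wc)
{
    /* wctype() yields 0 when the name is unknown in this locale */
    wctype_t desc = wctype(name);

    if (desc == (wctype_t)0)
        return -1;
    return iswctype(wc, desc) ? 1 : 0;
}
```

A locale-specific name such as jhira would only resolve after setlocale(LC_CTYPE, ...) has selected a locale that defines it; in the default "C" locale only the standard names (alpha, digit, space, ...) are expected to resolve.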

rather than document all possible NAMEs for all possible locales,
possibly in multiple places (regex(ast), typeset(ksh), etc.),
can we reference documentation that describes the concept and how
a user could determine the available functions for a specific locale?



Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?

2012-09-12 Thread Cedric Blancher
On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:

 I changed the posting to just ast-developers
 it would be good for all to not cross-post identical messages to { ast-users 
 ast-developers }

Do the lists work again? I've been posting to ast-users and
ast-developers because it's both user and developer concern, and the
damn f*king mailman was broken for ast-developers.


 On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
 Here's another (likely contentious) issue:
 Can we document the extra EUC regex character classes, which are for
 example present in almost all Unix Japanese locales (e.g. jspace,
 jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in
 ksh93(1)?

 ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all
 functions supported by the current locale, and this winds its way
 to all ast regex and ksh pattern matching and globbing

I know. The shame is: Not documented.


 rather than document all possible NAMEs for all possible locales,
 possibly in multiple places (regex(ast), typeset(ksh), etc.),
 can we reference documentation that describes the concept and how
 a user could determine the available functions for a specific locale?

AFAIK there is no standard function in libc to get the extra character
classes. Typically you learn them when you learn information
processing in the CJKV locales.
I noticed that some of the Japanese translations of ksh(1) and
ksh93(1) in AIX and Solaris list these extra classes - but only for
Japanese, the Chinese translation then only lists the Chinese extra
classes - and so on.
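
Since there is no standard call that enumerates the classes, one workaround is to probe a list of candidate names with wctype(3) and keep the ones that resolve. The sketch below assumes the caller supplies the candidates; the function name probe_classes is hypothetical, and the candidate list still has to come from documentation or prior knowledge of the locale.

```c
#include <stddef.h>
#include <wctype.h>

/*
 * Hypothetical helper: stores in found[] every candidate name that the
 * current LC_CTYPE locale defines and returns how many were stored.
 * found[] must have room for n pointers.
 */
size_t probe_classes(const char *const *candidates, size_t n,
                     const char **found)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        /* a zero descriptor means the locale does not know this name */
        if (wctype(candidates[i]) != (wctype_t)0)
            found[count++] = candidates[i];
    }
    return count;
}
```

This only confirms or rejects names that are already known, so it does not replace a documented list of the extra classes.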

IMO someone has to dig out the EUC documentation and just provide the
list of all possible combinations (~15 or so, not much). That will be
better than hiding this feature in some goddamn places none will ever
find (as a general point of criticism: AST is great and does an even
greater job with standards conformance - but it's full of places where
the light of documentation has yet to appear. Which is IMO a shame).

Ced
-- 
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur


Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?

2012-09-12 Thread Glenn Fowler

On Wed, 12 Sep 2012 19:36:50 +0200 Cedric Blancher wrote:
 On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:
 
  I changed the posting to just ast-developers
  it would be good for all to not cross-post identical messages to { 
  ast-users ast-developers }

 Do the lists work again? I've been posting to ast-users and
 ast-developers because it's both user and developer concern, and the
 damn f*king mailman was broken for ast-developers.

there has been some tweaking on the mailman and spamassassin fronts
plans are to shortly move the ast-* and uwin-* lists to an external server
which will insulate the lists from internal spam warfare/fallout

  On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
  Here's another (likely contentious) issue:
  Can we document the extra EUC regex character classes, which are for
  example present in almost all Unix Japanese locales (e.g. jspace,
  jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in
  ksh93(1)?
 
  ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all
  functions supported by the current locale, and this winds its way
  to all ast regex and ksh pattern matching and globbing

 I know. The shame is: Not documented.

aha
well I plan to update (or in some cases instantiate) documentation
along with the ongoing no-globals/thread-safe/sized-buffer sweep
of libast and related libraries

 
  rather than document all possible NAMEs for all possible locales,
  possibly in multiple places (regex(ast), typeset(ksh), etc.),
  can we reference documentation that describes the concept and how
  a user could determine the available functions for a specific locale?

 AFAIK there is no standard function in libc to get the extra character
 classes. Typically you learn them when you learn information
 processing in the CJKV locales.

this inability of the standard to view itself as a programmable
object that can be queried is an unfortunate shortcoming

 I noticed that some of the Japanese translations of ksh(1) and
 ksh93(1) in AIX and Solaris list these extra classes - but only for
 Japanese, the Chinese translation then only lists the Chinese extra
 classes - and so on.

 IMO someone has to dig out the EUC documentation and just provide the
 list of all possible combinations (~15 or so, not much). That will be
 better than hiding this feature in some goddamn places none will ever
 find (as a general point of criticism: AST is great and does an even
 greater job with standards conformance - but it's full of places where
 the light of documentation has yet to appear. Which is IMO a shame).

fixing the ast doc holes is a high priority this fall
