Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?
On 12 September 2012 20:04, Glenn Fowler g...@research.att.com wrote:
> On Wed, 12 Sep 2012 19:36:50 +0200 Cedric Blancher wrote:
>> On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:
>>> I changed the posting to just ast-developers
>>> it would be good for all to not cross-post identical messages to { ast-users ast-developers }
>>
>> Do the lists work again? I've been posting to ast-users and ast-developers because it's both a user and a developer concern, and the damn f*king mailman was broken for ast-developers.
>
> there has been some tweaking on the mailman and spamassassin fronts
> plans are to shortly move the ast-* and uwin-* lists to an external server
> which will insulate the lists from internal spam warfare/fallout
>
>>> On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
>>>> Here's another (likely contentious) issue: Can we document the extra EUC regex character classes, which are for example present in almost all Unix Japanese locales (e.g. jspace, jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in ksh93(1)?
>>>
>>> ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all functions supported by the current locale, and this winds its way to all ast regex and ksh pattern matching and globbing
>>
>> I know. The shame is: Not documented.
>
> aha
> well I plan to update (or in some cases instantiate) documentation along with the ongoing
> no-globals/thread-safe/sized-buffer sweep of libast and related libraries
>
>>> rather than document all possible NAMEs for all possible locales, possibly in multiple places (regex(ast), typeset(ksh), etc.),
>>> can we reference documentation that describes the concept and how a user could determine the available functions for a specific locale?
>>
>> AFAIK there is no standard function in libc to get the extra character classes. Typically you learn them when you learn information processing in the CJKV locales.
>
> this inability of the standard to view itself as a programmable object that can be queried is an unfortunate shortcoming
>
>> I noticed that some of the Japanese translations of ksh(1) and ksh93(1) in AIX and Solaris list these extra classes - but only for Japanese, the Chinese translation then only lists the Chinese extra classes - and so on. IMO someone has to dig out the EUC documentation and just provide the list of all possible combinations (~15 or so, not much). That will be better than hiding this feature in some goddamn places no one will ever find (as a general point of criticism: AST is great and does an even greater job with standards conformance - but it's full of places where the light of documentation has yet to appear. Which is IMO a shame).
>
> fixing the ast doc holes is a high priority this fall

What's the status on this one?

Ced
--
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur
___
ast-developers mailing list
ast-developers@research.att.com
https://mailman.research.att.com/mailman/listinfo/ast-developers
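Because libc offers no call that lists the character classes a locale defines, one workaround is to probe candidate names with wctype(3), which returns 0 for any name the current LC_CTYPE locale does not define. The following is a minimal sketch under that assumption, not anything libast or ksh actually ships: probe_classes and candidates are hypothetical names, and the j* entries are simply the Japanese names mentioned in this thread, not a complete or authoritative list.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical candidate list.  The first twelve are the POSIX classes
 * every locale must define; the j* names are the Japanese extras named
 * in this thread and may or may not exist in a given locale. */
static const char *const candidates[] = {
    "alnum", "alpha", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "xdigit",
    "jspace", "jhira", "jkata", "jkanji", "jdigit",
};

/* Hypothetical helper: ask wctype(3) about each name in turn.
 * supported[i] is set to 1 if names[i] is a class of the current
 * LC_CTYPE locale and to 0 otherwise.  Returns how many are supported. */
size_t probe_classes(const char *const names[], size_t n, int supported[])
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        supported[i] = wctype(names[i]) != 0;  /* 0 means "not defined here" */
        count += (size_t)supported[i];
    }
    return count;
}
```

In the C locale only the twelve POSIX names report 1; after setlocale(LC_ALL, "") under a Japanese EUC locale some j* entries may report 1 as well. Probing can only confirm names you already know to ask about, which is the gap discussed in the thread.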
Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?
I changed the posting to just ast-developers
it would be good for all to not cross-post identical messages to { ast-users ast-developers }

On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
> Here's another (likely contentious) issue: Can we document the extra EUC regex character classes, which are for example present in almost all Unix Japanese locales (e.g. jspace, jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in ksh93(1)?

ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all functions supported by the current locale, and this winds its way to all ast regex and ksh pattern matching and globbing

rather than document all possible NAMEs for all possible locales, possibly in multiple places (regex(ast), typeset(ksh), etc.),
can we reference documentation that describes the concept and how a user could determine the available functions for a specific locale?
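The lookup path Glenn describes can be sketched in a few lines: the NAME inside [[:NAME:]] is resolved with wctype(3), and each candidate character is then tested with iswctype(3). This is a minimal illustration under that reading, not libast's regex code; class_match is a hypothetical name.

```c
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: resolve the NAME of a [[:NAME:]] bracket
 * expression with wctype(3) and test one wide character with iswctype(3).
 * Returns 1 if wc is in the class, 0 if it is not, and -1 if the
 * current LC_CTYPE locale defines no class called NAME. */
int class_match(const char *name, wint_t wc)
{
    wctype_t desc = wctype(name);  /* 0: no such class in this locale */

    if (desc == 0)
        return -1;
    return iswctype(wc, desc) != 0;
}
```

Because the name is looked up at run time, any extra class a locale defines (jhira, jkata and so on) works without changes to the matcher, which is why the set of valid NAMEs varies from locale to locale.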
Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?
On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:
> I changed the posting to just ast-developers
> it would be good for all to not cross-post identical messages to { ast-users ast-developers }

Do the lists work again? I've been posting to ast-users and ast-developers because it's both a user and a developer concern, and the damn f*king mailman was broken for ast-developers.

> On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
>> Here's another (likely contentious) issue: Can we document the extra EUC regex character classes, which are for example present in almost all Unix Japanese locales (e.g. jspace, jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in ksh93(1)?
>
> ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all functions supported by the current locale, and this winds its way to all ast regex and ksh pattern matching and globbing

I know. The shame is: Not documented.

> rather than document all possible NAMEs for all possible locales, possibly in multiple places (regex(ast), typeset(ksh), etc.),
> can we reference documentation that describes the concept and how a user could determine the available functions for a specific locale?

AFAIK there is no standard function in libc to get the extra character classes. Typically you learn them when you learn information processing in the CJKV locales.

I noticed that some of the Japanese translations of ksh(1) and ksh93(1) in AIX and Solaris list these extra classes - but only for Japanese, the Chinese translation then only lists the Chinese extra classes - and so on. IMO someone has to dig out the EUC documentation and just provide the list of all possible combinations (~15 or so, not much). That will be better than hiding this feature in some goddamn places no one will ever find (as a general point of criticism: AST is great and does an even greater job with standards conformance - but it's full of places where the light of documentation has yet to appear. Which is IMO a shame).

Ced
--
Cedric Blancher cedric.blanc...@googlemail.com
Institute Pasteur
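To show how a class lookup reaches pattern matching on multibyte (for example EUC) text, the sketch below decodes a string with mbrtowc(3) and tests each decoded character with iswctype(3). It is an illustration under stated assumptions, not libast's implementation: count_in_class is a hypothetical name, and whether a name such as jhira resolves depends entirely on the locale installed on the system.

```c
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: count the characters of the multibyte string s,
 * encoded in the current LC_CTYPE locale's charset (e.g. EUC-JP under a
 * Japanese EUC locale), that belong to the character class NAME.
 * Returns -1 if the locale has no such class or s is not valid in
 * the locale's encoding. */
long count_in_class(const char *s, const char *name)
{
    wctype_t desc = wctype(name);
    mbstate_t st;
    size_t left = strlen(s);
    long count = 0;

    if (desc == 0)
        return -1;                  /* class not defined in this locale */
    memset(&st, 0, sizeof st);      /* initial shift state */
    while (left > 0) {
        wchar_t wc;
        size_t len = mbrtowc(&wc, s, left, &st);

        if (len == (size_t)-1 || len == (size_t)-2)
            return -1;              /* invalid or truncated sequence */
        if (len == 0)
            break;                  /* defensive: decoded a NUL byte */
        if (iswctype((wint_t)wc, desc))
            count++;
        s += len;                   /* advance past one whole character */
        left -= len;
    }
    return count;
}
```

In the C locale, count_in_class("abc123", "digit") counts three characters and any j* name yields -1; under a Japanese EUC locale that defines jhira, the same call with "jhira" would count hiragana characters.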
Re: [ast-developers] Documenting EUC regex character classes in ksh93(1)?
On Wed, 12 Sep 2012 19:36:50 +0200 Cedric Blancher wrote:
> On 12 September 2012 19:25, Glenn Fowler g...@research.att.com wrote:
>> I changed the posting to just ast-developers
>> it would be good for all to not cross-post identical messages to { ast-users ast-developers }
>
> Do the lists work again? I've been posting to ast-users and ast-developers because it's both a user and a developer concern, and the damn f*king mailman was broken for ast-developers.

there has been some tweaking on the mailman and spamassassin fronts
plans are to shortly move the ast-* and uwin-* lists to an external server
which will insulate the lists from internal spam warfare/fallout

>> On Wed, 12 Sep 2012 19:14:00 +0200 Cedric Blancher wrote:
>>> Here's another (likely contentious) issue: Can we document the extra EUC regex character classes, which are for example present in almost all Unix Japanese locales (e.g. jspace, jhira, jkata, jkanji, jdigit, while EUC AFAIK defines more), in ksh93(1)?
>>
>> ast regex [[:NAME:]] uses wctype(3) and iswctype(3) to support all functions supported by the current locale, and this winds its way to all ast regex and ksh pattern matching and globbing
>
> I know. The shame is: Not documented.

aha
well I plan to update (or in some cases instantiate) documentation along with the ongoing
no-globals/thread-safe/sized-buffer sweep of libast and related libraries

>> rather than document all possible NAMEs for all possible locales, possibly in multiple places (regex(ast), typeset(ksh), etc.),
>> can we reference documentation that describes the concept and how a user could determine the available functions for a specific locale?
>
> AFAIK there is no standard function in libc to get the extra character classes. Typically you learn them when you learn information processing in the CJKV locales.

this inability of the standard to view itself as a programmable object that can be queried is an unfortunate shortcoming

> I noticed that some of the Japanese translations of ksh(1) and ksh93(1) in AIX and Solaris list these extra classes - but only for Japanese, the Chinese translation then only lists the Chinese extra classes - and so on. IMO someone has to dig out the EUC documentation and just provide the list of all possible combinations (~15 or so, not much). That will be better than hiding this feature in some goddamn places no one will ever find (as a general point of criticism: AST is great and does an even greater job with standards conformance - but it's full of places where the light of documentation has yet to appear. Which is IMO a shame).

fixing the ast doc holes is a high priority this fall