Re: [NTG-context] polish sorting

2010-08-19 Thread Philipp Gesang
Hi Hans,

1.  Changing the English sorting rules as you suggested had no effect,
and neither did “add_uppercase_mappings('pl',1)”.

2.  I think I stated my original question imprecisely, so let me
spell out what I'm after:

Suppose you've got three strings: aaa, Aaa and aab. They are first
compared _as if_ they had the same case, i.e. “aaa == Aaa” (the sorter
returns 0). Then, and only if this case-insensitive test returned equal,
another check is done on the _first_ character only: if the two
strings differ in the case of their first character, the one that
starts with a lowercase letter takes precedence. The correct order is:

[1] = aaa, [2] = Aaa, [3] = aab

whereas with uppercase after lowercase (as I understand it) you'd
get:

[1] = aaa, [2] = aab, [3] = Aaa
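One way to picture this “uppercase after lowercase” reading is a single-level sort vector in which every uppercase letter ranks directly after its lowercase form. The Lua sketch below is purely illustrative: the `rank` table and `before` function are hypothetical, not the sort-lan tables, and only ASCII letters are handled. Because strings are compared position by position, the case of the first letter outranks every later letter, which yields the second order above:

```lua
-- Hypothetical sketch: one-level sort vector where each uppercase
-- letter ranks right after its lowercase form (a=1, A=2, b=3, B=4, ...).
-- ASCII letters only; any other character has no rank and would fail.
local rank = {}
for i = 1, 26 do
    local lower = string.char(96 + i)   -- "a" .. "z"
    rank[lower]         = 2 * i - 1
    rank[lower:upper()] = 2 * i
end

-- Compare two strings character by character using the vector;
-- if one is a prefix of the other, the shorter one sorts first.
local function before(a, b)
    for i = 1, math.min(#a, #b) do
        local ra, rb = rank[a:sub(i, i)], rank[b:sub(i, i)]
        if ra ~= rb then return ra < rb end
    end
    return #a < #b
end

local words = { "Aaa", "aab", "Aab", "aaa" }
table.sort(words, before)
print(table.concat(words, ", "))  -- aaa, aab, Aaa, Aab
```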

That is why I extended the splitter (a) to record the case of the
first character as a boolean and (b) to return lowercase sort strings,
and extended the comparer to check that flag whenever basicsort
returns 0.
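A minimal Lua sketch of the splitter/comparer pair described above, assuming plain ASCII input (string.lower and the %l pattern class do not handle UTF-8 letters such as ą). The names `split`, `compare` and `sortwords` are hypothetical; this is not the comparers.basic or splitter code from sort-ini.lua:

```lua
-- Hypothetical splitter: a lowercase sort key plus a flag telling
-- whether the first character is a lowercase letter. ASCII only.
local function split(s)
    return {
        key        = s:lower(),
        firstlower = s:sub(1, 1):match("%l") ~= nil,
        original   = s,
    }
end

-- Hypothetical three-way comparer: -1 if a sorts first, 1 if b does,
-- 0 if they are indistinguishable.
local function compare(a, b)
    if a.key < b.key then return -1 end
    if a.key > b.key then return  1 end
    -- Case-insensitive tie: only now look at the first character,
    -- and let the lowercase-initial string win.
    if a.firstlower and not b.firstlower then return -1 end
    if b.firstlower and not a.firstlower then return  1 end
    return 0
end

local function sortwords(words)
    local items = {}
    for i, w in ipairs(words) do items[i] = split(w) end
    -- table.sort expects a strict "less than" predicate.
    table.sort(items, function(a, b) return compare(a, b) < 0 end)
    local result = {}
    for i, item in ipairs(items) do result[i] = item.original end
    return result
end

print(table.concat(sortwords({ "Aaa", "aab", "aaa" }), ", "))  -- aaa, Aaa, aab
```

The extra check only runs when the lowercase keys are equal, so it never changes the order of strings that already differ in their letters.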

I really don't expect you to change the sorter, far from it. Perhaps you
could keep an extra comparer around for this job; after all, the table
is called “comparers” but so far contains only a single one. The same
goes for splitters. And as this rule seems to be quite common around the
world, it might well become useful someday. If you decide against it,
I'll just put it on the wiki, which will be fine, I guess.

Philipp


On 2010-08-19 00:48:14, Hans Hagen wrote:
 i wonder if this works out ok (needs a test index):
 
 sorters.replacements["pl"] = {
 -- no replacements
 }
 
 sorters.entries["pl"] = {
 ["a"] = "a", ["ą"] = "ą", ["b"] = "b", ["c"] = "c", ["ć"] = "ć",
 ["d"] = "d", ["e"] = "e", ["ę"] = "ę", ["f"] = "f", ["g"] = "g",
 ["h"] = "h", ["i"] = "i", ["j"] = "j", ["k"] = "k", ["l"] = "l",
 ["ł"] = "ł", ["m"] = "m", ["n"] = "n", ["ń"] = "ń", ["o"] = "o",
 ["ó"] = "ó", ["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s",
 ["ś"] = "ś", ["t"] = "t", ["u"] = "u", ["v"] = "v", ["w"] = "w",
 ["x"] = "x", ["y"] = "y", ["z"] = "z", ["ź"] = "ź", ["ż"] = "ż",
 }
 
 sorters.mappings["pl"] = {
 ["a"] =  1, ["ą"] =  2, ["b"] =  3, ["c"] =  4, ["ć"] =  5,
 ["d"] =  6, ["e"] =  7, ["ę"] =  8, ["f"] =  9, ["g"] = 10,
 ["h"] = 11, ["i"] = 12, ["j"] = 13, ["k"] = 14, ["l"] = 15,
 ["ł"] = 16, ["m"] = 17, ["n"] = 18, ["ń"] = 19, ["o"] = 20,
 ["ó"] = 21, ["p"] = 22, ["q"] = 23, ["r"] = 24, ["s"] = 25,
 ["ś"] = 26, ["t"] = 27, ["u"] = 28, ["v"] = 29, ["w"] = 30,
 ["x"] = 31, ["y"] = 32, ["z"] = 33, ["ź"] = 34, ["ż"] = 35,
 }
 
 add_uppercase_entries ('pl')
 add_uppercase_mappings('pl',1)
 
 
 
 
 -
   Hans Hagen | PRAGMA ADE
   Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
 tel: 038 477 53 69 | voip: 087 875 68 74 | www.pragma-ade.com
  | www.pragma-pod.nl
 -
 ___
 If your question is of interest to others as well, please add an entry to the 
 Wiki!
 
 maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
 webpage  : http://www.pragma-ade.nl / http://tex.aanhet.net
 archive  : http://foundry.supelec.fr/projects/contextrev/
 wiki : http://contextgarden.net
 ___

-- 
()  ascii ribbon campaign - against html e-mail
/\  www.asciiribbon.org   - against proprietary attachments




Re: [NTG-context] polish sorting

2010-08-19 Thread Hans Hagen

Hi,

\mainlanguage[pl]

\starttext

\placeregister[index][criterium=text,n=1]

\startlines
aaa\index{aaa}
Aaa\index{Aaa}
aab\index{aab}
Aab\index{Aab}
\stoplines

\stoptext

typesets

a
aaa 1
Aaa 1
aab 1
Aab 1

aaa
aab
Aaa
Aab

so, what order do you expect here?



Re: [NTG-context] polish sorting

2010-08-19 Thread Philipp Gesang
On 2010-08-19 11:41:43, Hans Hagen wrote:
 Hi,
 
 a
 aaa 1
 Aaa 1
 aab 1
 Aab 1
 
 aaa
 aab
 Aaa
 Aab
 
 so, what order do you expect here?
Exactly this! Could you please post an example of how to achieve this in
Lua using comparers.basic (the register implementation gives me
headaches)?

And how can I invert this behaviour so that [1] = Aaa, [2] = aaa,
[3] = Aab, etc.? Language settings don't seem to have any effect on
the order.
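For the inverted order, one hedged sketch in plain Lua (not the ConTeXt sorter) is a two-level key: compare the lowercased strings first and break ties by raw byte order, in which ASCII uppercase letters (65 to 90) come before lowercase ones (97 to 122). `upperfirst` is a hypothetical name; the sketch assumes ASCII input and the default C locale, since Lua's string comparison follows strcoll. Unlike the rule in the first message, this tie-break looks at case in every position, not just the first character:

```lua
-- Hypothetical comparator: primary key is the lowercased string;
-- ties are broken by byte order, where "A" (65) < "a" (97).
-- Assumes ASCII strings and the C locale.
local function upperfirst(a, b)
    local la, lb = a:lower(), b:lower()
    if la ~= lb then return la < lb end
    return a < b        -- "Aaa" < "aaa" because "A" < "a"
end

local words = { "aab", "Aab", "aaa", "Aaa" }
table.sort(words, upperfirst)
print(table.concat(words, ", "))  -- Aaa, aaa, Aab, aab
```

Swapping `a < b` for `a > b` in the tie-break reverses it, so among strings that differ only in case the lowercase variant comes first.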

Thanks, Philipp

PS: Did you just include the table from my example in the beta? Wow,
that's fast! There's more to come …
 


Re: [NTG-context] polish sorting

2010-08-19 Thread Hans Hagen

On 19-8-2010 12:35, Philipp Gesang wrote:

PS: Did you just include the table from my example in the beta? Wow,
that's fast! There's more to come …


yes, it's in the beta .. just the table

Hans



Re: [NTG-context] polish sorting

2010-08-18 Thread Hans Hagen

On 18-8-2010 6:08, Philipp Gesang wrote:

Hi,

I'm creating some sorting tables. While researching this topic I
stumbled on the Polish dictionary sorting rules: if two strings are
equal except for case, the one that begins with a lowercase letter
takes precedence.[1] (This seems to apply to the Swedish order as well, but I
have no means to verify that. Apparently, my German dictionary (from
1991) follows the same rule without explicitly stating so.)

Context seems to prefer it the other way round, so I modified two
functions from sort-ini.lua to handle that; but I'm not happy with
this solution.

So my question: is there already, or could we have some mechanism
to influence the details of sorting in context?


adapting the sorter is not an option

grep for "-- uppercase after lowercase" in sort-lan ... you can define a
sort vector that deals with it


Hans




Re: [NTG-context] polish sorting

2010-08-18 Thread Hans Hagen

i wonder if this works out ok (needs a test index):

sorters.replacements["pl"] = {
-- no replacements
}

sorters.entries["pl"] = {
["a"] = "a", ["ą"] = "ą", ["b"] = "b", ["c"] = "c", ["ć"] = "ć",
["d"] = "d", ["e"] = "e", ["ę"] = "ę", ["f"] = "f", ["g"] = "g",
["h"] = "h", ["i"] = "i", ["j"] = "j", ["k"] = "k", ["l"] = "l",
["ł"] = "ł", ["m"] = "m", ["n"] = "n", ["ń"] = "ń", ["o"] = "o",
["ó"] = "ó", ["p"] = "p", ["q"] = "q", ["r"] = "r", ["s"] = "s",
["ś"] = "ś", ["t"] = "t", ["u"] = "u", ["v"] = "v", ["w"] = "w",
["x"] = "x", ["y"] = "y", ["z"] = "z", ["ź"] = "ź", ["ż"] = "ż",
}

-- sort rank of each character, following the Polish alphabet
sorters.mappings["pl"] = {
["a"] =  1, ["ą"] =  2, ["b"] =  3, ["c"] =  4, ["ć"] =  5,
["d"] =  6, ["e"] =  7, ["ę"] =  8, ["f"] =  9, ["g"] = 10,
["h"] = 11, ["i"] = 12, ["j"] = 13, ["k"] = 14, ["l"] = 15,
["ł"] = 16, ["m"] = 17, ["n"] = 18, ["ń"] = 19, ["o"] = 20,
["ó"] = 21, ["p"] = 22, ["q"] = 23, ["r"] = 24, ["s"] = 25,
["ś"] = 26, ["t"] = 27, ["u"] = 28, ["v"] = 29, ["w"] = 30,
["x"] = 31, ["y"] = 32, ["z"] = 33, ["ź"] = 34, ["ż"] = 35,
}

add_uppercase_entries ('pl')
add_uppercase_mappings('pl',1)
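The mappings table gives each character a rank in the Polish alphabet. As a rough illustration of why such a vector matters (a standalone Lua 5.3+ sketch, not how sort-ini.lua actually consumes the table), plain byte order would put ą (UTF-8 bytes C4 85) after every ASCII letter, whereas comparing rank vectors places it between a and b. The `alphabet`, `keys` and `before` names are hypothetical, and only lowercase letters are ranked:

```lua
-- Hypothetical sketch (Lua 5.3+ for the utf8 library): rank characters
-- by their position in the Polish alphabet, mirroring the numbers in
-- sorters.mappings["pl"], then compare the resulting rank vectors.
local alphabet = {
    "a","ą","b","c","ć","d","e","ę","f","g","h","i","j","k","l","ł","m",
    "n","ń","o","ó","p","q","r","s","ś","t","u","v","w","x","y","z","ź","ż",
}

local rank = {}
for i, ch in ipairs(alphabet) do rank[ch] = i end  -- "a" -> 1, "ż" -> 35

-- Turn a UTF-8 string into a list of numeric sort keys.
local function keys(s)
    local out = {}
    for ch in s:gmatch(utf8.charpattern) do
        -- Characters outside the table sort after all Polish letters.
        out[#out + 1] = rank[ch] or (1000 + utf8.codepoint(ch))
    end
    return out
end

-- Lexicographic comparison of rank vectors; a shorter prefix sorts first.
local function before(a, b)
    local ka, kb = keys(a), keys(b)
    for i = 1, math.min(#ka, #kb) do
        if ka[i] ~= kb[i] then return ka[i] < kb[i] end
    end
    return #ka < #kb
end

local words = { "bz", "ąb", "ala" }
table.sort(words, before)
print(table.concat(words, " "))  -- ala ąb bz
```

For comparison, a plain `table.sort(words)` in the C locale would give ala bz ąb.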



