Gabriel Genellina wrote:
> At Wednesday 18/10/2006 03:42, Ron Adam wrote:
>
>> I put together the following module today and would like some feedback
>> on any obvious problems.  Or even opinions of whether or not it is a
>> good approach.
>
>>         if self.flag & CAPS_FIRST:
>>             s = s.swapcase()
>
> This is just coincidental; it relies on (lowercase)<(uppercase) on the
> locale collating sequence, and I don't see why it should be always so.

The LC_COLLATE structure (in the python.exe C code I think) controls the
order of upper and lower case during collating.  I don't know if there is
any way to examine it, unfortunately.

If there was a way to change the LC_COLLATE structure, I wouldn't need to
resort to tricks like s.swapcase().  But without that info, I don't know
of another way.

Maybe changing CAPS_FIRST to REVERSE_CAPS_ORDER would do?

>>         if self.flag & IGNORE_LEADING_WS:
>>             s = s.strip()
>
> This ignores trailing ws too. (lstrip?)

I'm not sure if this would make any visible difference.  It might
determine the order of two strings that are otherwise the same, where one
has white space at the end and the other doesn't.

They run at the same speed either way, so I'll go ahead and change it.
Thanks.

>>         if self.flag & NUMERICAL:
>>             if self.flag & COMMA_IN_NUMERALS:
>>                 rex = re.compile('^(\d*\,?\d*\.?\d*)(\D*)(\d*\,?\d*\.?\d*)',
>>                                  re.LOCALE)
>>             else:
>>                 rex = re.compile('^(\d*\.?\d*)(\D*)(\d*\.?\d*)',
>>                                  re.LOCALE)
>>             slist = rex.split(s)
>>             for i, x in enumerate(slist):
>>                 if self.flag & COMMA_IN_NUMERALS:
>>                     x = x.replace(',', '')
>>                 try:
>>                     slist[i] = float(x)
>>                 except:
>>                     slist[i] = locale.strxfrm(x)
>>             return slist
>>         return locale.strxfrm(s)
>
> You should try to make this part a bit more generic. If you are
> concerned about locales, do not use "comma" explicitly. In other
> countries 10*100=1.000 - and 1,234 is a fraction between 1 and 2.

See the most recent version of this I posted.  It is a bit more generic.

    news://news.cox.net:119/[EMAIL PROTECTED]

Maybe a 'comma_is_decimal' option?

Options are cheap, so it's no problem to add them as long as they make
sense. ;-)

These options are what I refer to as mid-level options.  The programmer
does still need to know something about the data they are collating.
They may still need to do some preprocessing even with this, but maybe
not as much.

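
On the comma question, one possible direction is to take the separators
from the locale instead of hard-coding them.  What follows is only a
rough sketch, not code from the module: the helper name
parse_localized_number and its two optional parameters are made up for
illustration.  The only library call it relies on is
locale.localeconv(), which reports the current locale's 'decimal_point'
and 'thousands_sep'.

```python
import locale


def parse_localized_number(text, decimal_point=None, thousands_sep=None):
    """Convert a numeral written with locale separators to a float.

    Hypothetical helper, not part of the posted module.  Separators
    that are not passed in are read from the current locale.
    """
    if decimal_point is None or thousands_sep is None:
        conv = locale.localeconv()
        if decimal_point is None:
            decimal_point = conv['decimal_point']
        if thousands_sep is None:
            thousands_sep = conv['thousands_sep']
    if thousands_sep:
        # Drop the grouping characters before touching the decimal point,
        # so a '.' used for grouping is never mistaken for a decimal point.
        text = text.replace(thousands_sep, '')
    if decimal_point != '.':
        # float() only understands '.' as the decimal point.
        text = text.replace(decimal_point, '.')
    return float(text)
```

With ',' as the decimal point and '.' as the grouping character, '1.000'
comes out as one thousand and '1,234' as a value between 1 and 2, which
is the case Gabriel describes.  For the current locale, the standard
library's locale.atof() does much the same job, so a helper like this is
mostly useful when the separators need to be passed in explicitly.
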
In a higher level collation routine, I think you would just need to
specify a named sort type, such as 'dictionary', 'directory', or
'inventory', and it would set the options accordingly.  The problem with
that approach is that the higher level definitions may be different
depending on the locale or even the field they are used in.

>> The NUMERICAL option orders leading and trailing digits as numerals.
>>
>>     >>> t = ['a5', 'a40', '4abc', '20abc', 'a10.2', '13.5b', 'b2']
>>     >>> collated(t, NUMERICAL)
>>     ['4abc', '13.5b', '20abc', 'a5', 'a10.2', 'a40', 'b2']
>
> From the name "NUMERICAL" I would expect this sorting: b2, 4abc, a5,
> a10.2, 13.5b, 20abc, a40 (that is, sorting as numbers only).
> Maybe GROUP_NUMBERS... but I don't like that too much either...

How about 'VALUE_ORDERING'?

The term I've seen before is natural ordering, but that is more general
and can include dates and Roman numerals, as well as other types.

Cheers,
    Ron

--
http://mail.python.org/mailman/listinfo/python-list