On 25 October 2016 at 04:37, Steven D'Aprano <st...@pearwood.info> wrote:
>> I would be happy to see a somewhat more general and user friendly
>> version of string.translate function.
>> It could work this way:
>> string.newtranslate(file_with_table, Drop=True, Dec=True)

> Mikhail, I appreciate that you have many ideas and want to share them,
> but try to think about how those ideas would work. The Python standard
> library is full of really well-designed programming interfaces. You can
> learn a lot by thinking "what existing function is this like? how does
> that existing function work?".

Hi Steven,

Thank you for the reply. I agree that the idea with the file is not good; I already agreed with that, and it was pointed out by others too. Of course it is up to me how I store the table. I will try to be more precise with my ideas ;)

The new str.translate() interface is indeed much more versatile and provides good ways to define the table.

> Or it can take a mapping (usually a dict) that maps either characters or
> ordinal numbers to a new string (not just a single character, but an
> arbitrary string) or ordinal numbers.
>
>     str.maketrans({'a': 'A', 98: 66, 0x63: 0x43})

> (or None, to delete them). Note the flexibility: you don't need to

Good. But of course, if I do it with big tables, I would need to parse them from some table file anyway. Typing all the values directly in the code is not a comfortable way. It should then also be clear how I get the None value after parsing the table from a plain format like 97:[nothing here] (another point for my research).

> Could it be better? Perhaps. I've suggested that maybe translate could
> automatically call maketrans if given more than one argument. Maybe
> there's an easier way to just delete unwanted characters. Perhaps there
> could be a way to say "any character not in the translation table should
> be dropped". These are interesting questions.
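Coming back to the table-file point for a moment: below is a minimal sketch of how a plain table could be parsed into a dict that str.translate() accepts, with an empty right-hand side turned into None. The line format ("source:target" as decimal ordinals, one pair per line) and the name parse_table are only assumptions made for illustration; nothing like this exists in the standard library.

```python
def parse_table(lines):
    """Build a dict for str.translate() from lines like '97:65' or '97:'.

    Hypothetical format: decimal ordinals, one 'source:target' pair per line.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(":")
        # An empty right-hand side means "delete this char", i.e. None.
        table[int(key)] = int(value) if value.strip() else None
    return table


# 'a' (97) becomes 'A' (65), 'b' (98) is deleted, 'c' (99) becomes 'C' (67).
table = parse_table(["97:65", "98:", "99:67"])
print("abcabc!".translate(table))  # prints ACAC!
```

The same function would work on an open file object, since iterating over a file yields its lines.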
On those questions, my previous thought was that there could be a set of such functions:

str.translate_keep(table) - this is the current translate, namely it keeps non-defined chars untouched
str.translate_drop(table) - the same, but it drops non-defined chars

Probably also a pair of functions without translation:

str.remove(chars) - removes the given chars
str.keep(chars) - removes everything except the given chars

The motivation is that those can be optimised for speed, and I suppose they can work faster than re.sub(). The question is how common these tasks are; I don't have any statistics regarding this.

> There are no 16-bit strings.

> Unicode is a 21-bit encoding, usually encoded as either fixed-width
> sequence of 4-byte code units (UTF-32) or a variable-width sequence of
> 2-byte (UTF-16) or 1-byte (UTF-8) code units. But it absolutely is not a
> "16-bit string".

So in the general case they should expand to 32-bit unsigned integers, if I understand correctly? IIRC, Windows uses UTF-16 for filenames. Anyway, I will not pretend I can give any ideas regarding optimising things there. It is just that I tend to treat those translate/filter functions as purely numeric, so I should be able to use them on any data chunk without thinking about whether it is text or not. This implies, of course, that I must be sure the units are expanded to a fixed byte size.

>> but as said I don't like very much the idea and would be OK for me to
>> use numeric values only.

> I think you are very possibly the only Python programmer in the world
> who thinks that writing decimal ordinal values is more user-friendly
> than writing the actual character itself. I know I would much rather
> see $, π or ╔ than 36, 960 or 9556.

Yeah, I am strange. This, however, gives you a guarantee that in any environment you can see and input them and save the work in ASCII.
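Coming back to the translate_drop / remove / keep idea above: a minimal sketch of how those could behave, written on top of the existing str.translate(). The function names are hypothetical (they are not str methods), and this says nothing about speed; it only pins down the semantics I have in mind.

```python
from collections import defaultdict


def translate_drop(text, table):
    """Translate chars found in table; drop every char that is not in it."""
    # str.translate() deletes a char when the lookup returns None, so a
    # defaultdict whose default is None drops all non-defined chars.
    full = defaultdict(lambda: None, str.maketrans(table))
    return text.translate(full)


def remove_chars(text, chars):
    """Remove the given chars, keep everything else."""
    # dict.fromkeys() maps every ordinal to None, which means "delete".
    return text.translate(dict.fromkeys(map(ord, chars)))


def keep_chars(text, chars):
    """Remove everything except the given chars."""
    return translate_drop(text, {c: c for c in chars})


print(translate_drop("hello world", {"l": "L", "o": "0"}))  # prints LL00L
print(remove_chars("hello world", "lo"))                    # prints he wrd
print(keep_chars("hello world", "lo"))                      # prints llool
```

The "keep" variant of translate needs no sketch, since it is what str.translate() already does today.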
Mikhail
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/