Re: RE Module Performance

2013-07-31 Thread Chris Angelico
On Wed, Jul 31, 2013 at 6:45 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: if you care about minimizing every possible byte, you should use a low-level language like C. Then you can give every character 21 bits, and be happy that you don't waste even one bit. Could go better!

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon
On 31-07-13 05:30, Michael Torrie wrote: On 07/30/2013 12:19 PM, Antoon Pardon wrote: So? Why are you making this a point of discussion? I was not aware that the pros and cons of various editor buffer implementations were relevant to the point I was trying to make. I for one found it very

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon
On 30-07-13 21:09, wxjmfa...@gmail.com wrote: Mutable, immutable, copying + xxx, buffering, O(n). Yes, but conceptually the reencoding happens sometime, somewhere. Which is a far cry from your previous claim that it happened every time you enter a char. This of course makes your case harder

Re: RE Module Performance

2013-07-31 Thread wxjmfauth
FSR: === The 'a' in 'a€' and 'a\U0001d11e': ['{:#010b}'.format(c) for c in 'a€'.encode('utf-16-be')] ['0b00000000', '0b01100001', '0b00100000', '0b10101100'] ['{:#010b}'.format(c) for c in 'a\U0001d11e'.encode('utf-32-be')] ['0b00000000', '0b00000000', '0b00000000', '0b01100001', '0b00000000',

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon
On 31-07-13 10:32, wxjmfa...@gmail.com wrote: Unicode/utf* i) (primary key) Create and use a unique set of encoded code points. FSR does this. st1 = 'a€' st2 = 'aa' ord(st1[0]) 97 ord(st2[0]) 97 ii) (secondary key) Depending on the wish, memory/performance:

Re: RE Module Performance

2013-07-31 Thread Michael Torrie
On 07/31/2013 01:23 AM, Antoon Pardon wrote: On 31-07-13 05:30, Michael Torrie wrote: On 07/30/2013 12:19 PM, Antoon Pardon wrote: So? Why are you making this a point of discussion? I was not aware that the pros and cons of various editor buffer implementations were relevant to the point I

Re: RE Module Performance

2013-07-31 Thread Michael Torrie
On 07/31/2013 02:32 AM, wxjmfa...@gmail.com wrote: Unicode/utf* Why do you keep using the terms utf and Unicode interchangeably?

Re: RE Module Performance

2013-07-31 Thread wxjmfauth
On Wednesday, 31 July 2013 07:45:18 UTC+2, Steven D'Aprano wrote: On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote: And do not forget, in a pure utf coding scheme, your char or a char will *never* be larger than 4 bytes. sys.getsizeof('a') 26

Re: RE Module Performance

2013-07-31 Thread Chris Angelico
On Wed, Jul 31, 2013 at 9:15 PM, wxjmfa...@gmail.com wrote: ... char never consumes or requires more than 4 bytes ... The integer 5 should be able to be stored in 3 bits. sys.getsizeof(5) 14 Clearly Python is doing something really horribly wrong here. In fact, sys.getsizeof needs to be

Re: RE Module Performance

2013-07-30 Thread wxjmfauth
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote: Back to utf. utfs are not only elements of a unique set of encoded code points. They have an interesting feature. Each utf chunk holds intrinsically the character (in

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon
On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon as you enter an €, the 125505th char. Sorry, I wanted to say z instead of

Re: RE Module Performance

2013-07-30 Thread Chris Angelico
On Tue, Jul 30, 2013 at 3:01 PM, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon as you enter an €, the 125505th char. Sorry, I wanted to say z

Re: RE Module Performance

2013-07-30 Thread MRAB
On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon as you enter an €, the 125505th

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon
On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon

Re: RE Module Performance

2013-07-30 Thread MRAB
On 30/07/2013 17:39, Antoon Pardon wrote: On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not

Re: RE Module Performance

2013-07-30 Thread Tim Delaney
On 31 July 2013 00:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon as you enter an €, the 125505th char. Sorry, I wanted to say z instead

Re: RE Module Performance

2013-07-30 Thread Joshua Landau
On 30 July 2013 17:39, Antoon Pardon antoon.par...@rece.vub.ac.be wrote: On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon
On 30-07-13 19:14, MRAB wrote: On 30/07/2013 17:39, Antoon Pardon wrote: On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy

Re: RE Module Performance

2013-07-30 Thread wxjmfauth
Mutable, immutable, copying + xxx, buffering, O(n). Yes, but conceptually the reencoding happens sometime, somewhere. The internal ucs-2 will never automagically be transformed into ucs-4 (eg). timeit.timeit('a'*1 +'€') 7.087220684719967 timeit.timeit('a'*1 +'z') 1.5685214234430873

Re: RE Module Performance

2013-07-30 Thread Chris Angelico
On Tue, Jul 30, 2013 at 8:09 PM, wxjmfa...@gmail.com wrote: Mutable, immutable, copying + xxx, buffering, O(n). Yes, but conceptually the reencoding happens sometime, somewhere. The internal ucs-2 will never automagically be transformed into ucs-4 (eg). But probably not on the entire

Re: RE Module Performance

2013-07-30 Thread Terry Reedy
On 7/30/2013 1:40 PM, Joshua Landau wrote: Additionally, who says a language couldn't use, say, B-Trees for all of its list-like types, including strings? Tk apparently uses a B-tree in its text widget. -- Terry Jan Reedy

Re: RE Module Performance

2013-07-30 Thread Neil Hodgson
MRAB: The disadvantage there is that when you move the cursor you must move characters around. For example, what if the cursor was at the start and you wanted to move it to the end? Also, when the gap has been filled, you need to make a new one. The normal technique is to only move the gap
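
The gap-buffer technique discussed in this message can be illustrated with a minimal sketch. It is not code from the thread: the class name `GapBuffer`, its methods, and the default gap size are all hypothetical, and it assumes the gap is moved only when an edit happens rather than on every cursor movement.

```python
class GapBuffer:
    """Toy gap buffer: characters live in a list with a movable run of free slots."""

    def __init__(self, text="", gap_size=16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)      # index of the first free slot
        self._gap_end = len(self._buf)   # one past the last free slot

    def __len__(self):
        # Number of stored characters, excluding the free slots.
        return len(self._buf) - (self._gap_end - self._gap_start)

    def _move_gap(self, pos):
        """Slide the gap so that it starts at text position pos."""
        while self._gap_start > pos:     # gap moves left, characters shift right
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:     # gap moves right, characters shift left
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def _grow(self, extra=16):
        """Open a fresh gap once the old one has been filled."""
        self._buf[self._gap_start:self._gap_start] = [None] * extra
        self._gap_end += extra

    def insert(self, pos, ch):
        """Insert one character at text position pos."""
        if not 0 <= pos <= len(self):
            raise IndexError("insert position out of range")
        self._move_gap(pos)              # characters move only when an edit happens
        if self._gap_start == self._gap_end:
            self._grow()
        self._buf[self._gap_start] = ch
        self._gap_start += 1

    def text(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])
```

Consecutive insertions at the same position touch only the gap, which is why typing is cheap; the cost of shifting characters is paid once, when the edit position changes.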

Re: RE Module Performance

2013-07-30 Thread Michael Torrie
On 07/30/2013 12:19 PM, Antoon Pardon wrote: So? Why are you making this a point of discussion? I was not aware that the pros and cons of various editor buffer implementations were relevant to the point I was trying to make. I for one found it very interesting. In fact this thread caused me to

Re: RE Module Performance

2013-07-30 Thread Michael Torrie
On 07/30/2013 01:09 PM, wxjmfa...@gmail.com wrote: Mutable, immutable, copying + xxx, buffering, O(n). Yes, but conceptually the reencoding happens sometime, somewhere. The internal ucs-2 will never automagically be transformed into ucs-4 (eg). So what major python project are you working

Re: RE Module Performance

2013-07-30 Thread Steven D'Aprano
On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote: And do not forget, in a pure utf coding scheme, your char or a char will *never* be larger than 4 bytes. sys.getsizeof('a') 26 sys.getsizeof('\U000101000') 48 Neither character above is larger than 4 bytes. You forgot to deduct the
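
The point about deducting the fixed object overhead can be checked with a small sketch. It is not from the thread: the helper name `per_char_bytes` is hypothetical, and the numbers assume CPython 3.3 or later, where PEP 393 strings are in use. Differencing the sizes of two strings that differ by one character cancels the per-object header and leaves only the marginal storage per character.

```python
import sys

def per_char_bytes(ch, n=1000):
    """Marginal storage of one more copy of ch, with the str header cancelled out."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)

# CPython 3.3+ picks one width for the whole string:
# 1 byte per character up to U+00FF, 2 bytes up to U+FFFF, 4 bytes beyond.
ascii_cost = per_char_bytes("a")
bmp_cost = per_char_bytes("\u20ac")          # EURO SIGN
astral_cost = per_char_bytes("\U0001d11e")   # MUSICAL SYMBOL G CLEF
```

None of the three per-character costs exceeds 4 bytes; the larger figures reported by `sys.getsizeof` on one-character strings are dominated by the header.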

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon
On 26-07-13 15:21, wxjmfa...@gmail.com wrote: Hint: To understand Unicode (and every coding scheme), you should understand utf. The how and the *why*. No you don't. You are mixing the information with how the information is coded. utf is like base64, a way of coding the information that is

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon
On 28-07-13 20:19, Joshua Landau wrote: On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote: On 27-07-13 20:21, wxjmfa...@gmail.com wrote: utf-8 or any (utf) never need and never spend their

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon
On 28-07-13 21:30, wxjmfa...@gmail.com wrote: To be short, this is *never* the FSR, always something else. Suggestion. Start by solving all these micro-benchmarks. all the memory cases. It's a good start, no? There is nothing to solve. Unicode doesn't force implementations to use the same

Re: RE Module Performance

2013-07-29 Thread Chris Angelico
On Sun, Jul 28, 2013 at 11:14 PM, Joshua Landau jos...@landau.ws wrote: GC does have sometimes severe impact in memory-constrained environments, though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/, about half-way down, specifically

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote: Do not forget that à la FSR mechanism for a non-ascii user is *irrelevant*. You have been told repeatedly, Python's internals are *full* of ASCII-only

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Chris Angelico
On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r = dir(list)") 27.13981129541519 3.2: len(dir(list)) 42 3.3: len(dir(list)) 45
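
Since `dir(list)` returns a different number of names in the two versions, the raw timings above do not measure the same amount of work. A minimal sketch of one way to normalise, not taken from the thread (the helper name `seconds_per_name` and the `number` default are hypothetical):

```python
import timeit

def seconds_per_name(number=1000):
    """Time dir(list) and divide by how many names each call has to produce."""
    total = timeit.timeit("dir(list)", number=number)
    return total / (number * len(dir(list)))

cost = seconds_per_name()
```

Comparing this per-name figure across interpreters is still only a rough guide, but it removes the difference in result length from the comparison.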

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Heiko Wundram
On 29.07.2013 13:43, wxjmfa...@gmail.com wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r = dir(list)") 27.13981129541519 For the record, I do not put your example to contradict you. I was expecting such a result even before testing. Now, if you do not

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Devyn Collier Johnson
On 07/29/2013 08:06 AM, Heiko Wundram wrote: On 29.07.2013 13:43, wxjmfa...@gmail.com wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r = dir(list)") 27.13981129541519 For the record, I do not put your example to contradict you. I was expecting such a result

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote: On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Sunday, 28 July 2013 19:36:00 UTC+2, Terry Reedy wrote: On 7/28/2013 11:52 AM, Michael Torrie wrote: 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that slicing a string would be very very slow, Not necessarily so. See below. and that's

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote: On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Chris Angelico
On Mon, Jul 29, 2013 at 3:20 PM, wxjmfa...@gmail.com wrote: c:\python32\pythonw -u timitmod.py 15.258061416225663 Exit code: 0 c:\Python33\pythonw -u timitmod.py 17.052203122286194 Exit code: 0 len(dir(C)) Did you even think to check that before you posted timings? ChrisA --

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth
On Monday, 29 July 2013 16:49:34 UTC+2, Chris Angelico wrote: On Mon, Jul 29, 2013 at 3:20 PM, wxjmfa...@gmail.com wrote: c:\python32\pythonw -u timitmod.py 15.258061416225663 Exit code: 0 c:\Python33\pythonw -u timitmod.py 17.052203122286194 Exit code: 0

Re: RE Module Performance

2013-07-28 Thread Antoon Pardon
On 27-07-13 20:21, wxjmfa...@gmail.com wrote: Quickly. sys.getsizeof() at the light of what I explained. 1) As this FSR works with multiple encoding, it has to keep track of the encoding. it puts it in the overhead of str class (overhead = real overhead + encoding). In such an absurd way,

FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Michael Torrie
On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: Good point. FSR, nice tool for those who wish to teach Unicode. It is not every day, one has such an opportunity. I had a long e-mail composed, but decided to chop it down, but still too long. so I ditched a lot of the context, which jmf also

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 4:52 PM, Michael Torrie torr...@gmail.com wrote: Is my understanding of these things wrong? No, your understanding of those matters is fine. There's just one area you seem to be misunderstanding; you appear to think that jmf actually cares about logical argument. I gave

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Terry Reedy
On 7/28/2013 11:52 AM, Michael Torrie wrote: 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that slicing a string would be very very slow, Not necessarily so. See below. and that's unacceptable for the use cases of python strings. I'm assuming you understand big O

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 6:36 PM, Terry Reedy tjre...@udel.edu wrote: I posted about a week ago, in response to Chris A., a method by which lookup for UTF-16 can be made O(log2 k), or perhaps more accurately, O(1+log2(k+1)), where k is the number of non-BMP chars in the string. Which is an
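
Terry Reedy's original post is not shown on this page, so the following is only a guess at how an O(log k) lookup of this kind could work, not his method. The sketch keeps a sorted list of the character positions of non-BMP code points and uses binary search to turn a character index into a UTF-16 code-unit offset; the function names `build_index` and `code_unit_offset` are hypothetical.

```python
import bisect

def build_index(s):
    """Sorted character positions of the non-BMP code points in s (k entries)."""
    return [i for i, ch in enumerate(s) if ord(ch) > 0xFFFF]

def code_unit_offset(astral_positions, char_index):
    """UTF-16 code-unit offset of the character at char_index, in O(log k)."""
    # Every non-BMP character before char_index takes one extra code unit
    # (a surrogate pair), so the offset is the index plus their count.
    return char_index + bisect.bisect_left(astral_positions, char_index)

sample = "a\U0001d11eb\U0001d11ec"
index = build_index(sample)
offsets = [code_unit_offset(index, i) for i in range(len(sample))]
```

For a string with no non-BMP characters the index is empty and the lookup reduces to plain indexing.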

Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote: Back to utf. utfs are not only elements of a unique set of encoded code points. They have an interesting feature. Each utf chunk holds intrinsically the character (in

Re: RE Module Performance

2013-07-28 Thread Joshua Landau
On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote: On 27-07-13 20:21, wxjmfa...@gmail.com wrote: utf-8 or any (utf) never need and never spend their time in reencoding. So? That python sometimes needs to do some kind of background processing is not a problem,

Re: RE Module Performance

2013-07-28 Thread Chris Angelico
On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote: On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote: On 27-07-13 20:21, wxjmfa...@gmail.com wrote: utf-8 or any (utf) never need and never spend their time in reencoding. So? That python sometimes

Re: RE Module Performance

2013-07-28 Thread MRAB
On 28/07/2013 19:13, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote: Back to utf. utfs are not only elements of a unique set of encoded code points. They have an interesting feature. Each utf

Re: RE Module Performance

2013-07-28 Thread Terry Reedy
On 7/28/2013 2:29 PM, Chris Angelico wrote: On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote: Somewhat off topic, but befitting of the triviality of this thread, do I understand correctly that you are saying garbage collection never causes any noticeable slowdown in

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote: On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: Good point. FSR, nice tool for those who wish to teach Unicode. It is not every day, one has such an opportunity. I had a long e-mail composed, but decided to

Re: RE Module Performance

2013-07-28 Thread wxjmfauth
On Sunday, 28 July 2013 21:04:56 UTC+2, MRAB wrote: On 28/07/2013 19:13, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote: Back to utf. utfs are not only elements of a unique

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread MRAB
On 28/07/2013 20:23, wxjmfa...@gmail.com wrote: [snip] Compare these (a BDFL example, where I'm using a non-ascii char) Py 3.2 (narrow build) Why are you using a narrow build of Python 3.2? It doesn't treat all codepoints equally (those outside the BMP can't be stored in one code unit) and,

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Antoon Pardon
On 28-07-13 21:23, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote: On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: Good point. FSR, nice tool for those who wish to teach Unicode. It is not every day, one has such an opportunity. I

Re: RE Module Performance

2013-07-28 Thread Lele Gaifax
wxjmfa...@gmail.com writes: Suggestion. Start by solving all these micro-benchmarks. all the memory cases. It's a good start, no? Since you seem the only one who has this dramatic problem with such micro-benchmarks, that BTW have nothing to do with unicode compliance, I'd suggest *you* should

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Steven D'Aprano
On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote: Do not forget that à la FSR mechanism for a non-ascii user is *irrelevant*. You have been told repeatedly, Python's internals are *full* of ASCII-only strings. py> dir(list) ['__add__', '__class__', '__contains__', '__delattr__',

Re: RE Module Performance

2013-07-28 Thread Joshua Landau
On 28 July 2013 19:29, Chris Angelico ros...@gmail.com wrote: On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote: On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote: On 27-07-13 20:21, wxjmfa...@gmail.com wrote: utf-8 or any (utf) never need

Re: RE Module Performance

2013-07-27 Thread Steven D'Aprano
On Fri, 26 Jul 2013 08:46:58 -0700, wxjmfauth wrote: BTW, I'm pleased to read sequence of bits and not bytes. Again, utf transformers are producing sequence of bits, call Unicode Transformation Units, with lengths of 8/16/32 *bits*, from there the names utf8/16/32. UCS transformers are (were)

Re: RE Module Performance

2013-07-27 Thread wxjmfauth
On Saturday, 27 July 2013 04:05:03 UTC+2, Michael Torrie wrote: On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote: sys.getsizeof('––') - sys.getsizeof('–') I have already explained / commented this. Maybe it got lost in translation, but I don't understand your point with

Re: RE Module Performance

2013-07-27 Thread Ian Kelly
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote: Back to utf. utfs are not only elements of a unique set of encoded code points. They have an interesting feature. Each utf chunk holds intrinsically the character (in fact the code point) it is supposed to represent. In utf-32, the

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Thursday, 25 July 2013 22:45:38 UTC+2, Ian wrote: On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, 26 July 2013 05:09:34 UTC+2, Michael Torrie wrote: On 07/25/2013 11:18 AM, Steven D'Aprano wrote: JMF has explained that it is impossible, impossible I say!, to write an editor using a flexible string representation. Since Emacs uses such a flexible string

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote: On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: UTF-8 uses a flexible representation on a character-by-character basis. When parsing UTF-8, one needs to look at EVERY character to

Re: RE Module Performance

2013-07-26 Thread wxjmfauth
On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote: On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: UTF-8 uses a flexible representation on a character-by-character basis. When parsing UTF-8, one needs to look at EVERY character to

Re: RE Module Performance

2013-07-26 Thread Michael Torrie
On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote: sys.getsizeof('––') - sys.getsizeof('–') I have already explained / commented this. Maybe it got lost in translation, but I don't understand your point with that. Hint: To understand Unicode (and every coding scheme), you should understand

Re: RE Module Performance

2013-07-26 Thread Steven D'Aprano
On Thu, 25 Jul 2013 21:20:45 -0600, Ian Kelly wrote: On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: UTF-8 uses a flexible representation on a character-by-character basis. When parsing UTF-8, one needs to look at EVERY character to decide how

Re: RE Module Performance

2013-07-26 Thread Ian Kelly
On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: See the similarity now? Both flexibly change the width used by code-points, UTF-8 based on the code-point itself regardless of the rest of the string, Python based on the largest code-point in the

Re: RE Module Performance

2013-07-26 Thread Steven D'Aprano
On Fri, 26 Jul 2013 22:12:36 -0600, Ian Kelly wrote: On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: See the similarity now? Both flexibly change the width used by code-points, UTF-8 based on the code-point itself regardless of the rest of the

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Wed, 24 Jul 2013 09:00:39 -0600, Michael Torrie wrote about JMF: His most recent argument that Python should use UTF as a representation is very strange to be honest. He's not arguing for anything, he is just hating on anything that gives even the tiniest benefit to ASCII users. This isn't

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Thu, Jul 25, 2013 at 3:49 PM, Serhiy Storchaka storch...@gmail.com wrote: 24.07.13 21:15, Chris Angelico wrote: To my mind, exposing UTF-16 surrogates to the application is a bug to be fixed, not a feature to be maintained. Python 3 uses code points from U+DC80 to U+DCFF (which

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote: But mainly, I'm just wondering how many people here have any basis from which to argue the point he's trying to make. I doubt most of us have (a) implemented an editor widget, or (b) tested multiple different internal representations

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: If nobody had ever thought of doing a multi-format string representation, I could well imagine the Python core devs debating whether the cost of UTF-32 strings is worth the correctness and consistency improvements... and most likely

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Thu, Jul 25, 2013 at 5:02 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote: But mainly, I'm just wondering how many people here have any basis from which to argue the point he's trying to make. I doubt most of us have

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: If nobody had ever thought of doing a multi-format string representation, I could well imagine the Python core devs debating whether the cost

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Thu, 25 Jul 2013 17:58:10 +1000, Chris Angelico wrote: On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: If nobody had ever thought of doing a multi-format string representation, I could

Re: RE Module Performance

2013-07-25 Thread wxjmfauth
On Wednesday, 24 July 2013 16:47:36 UTC+2, Michael Torrie wrote: On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: Sorry, you are not understanding Unicode. What is a Unicode Transformation Format (UTF), what is the goal of a UTF and why it is important for an implementation to

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Thu, Jul 25, 2013 at 7:22 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: What I'm trying to say is that it is possible to use UTF-16 internally, but *not* assume that every code point (character) is represented by a single 2-byte unit. For example, the len() of a UTF-16

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Thu, Jul 25, 2013 at 7:27 PM, wxjmfa...@gmail.com wrote: A coding scheme works with a unique set of characters (the repertoire), and the implementation (the programming) works with a unique set of encoded code points. The critical step is the path {unique set of characters} -- {unique set

Re: RE Module Performance

2013-07-25 Thread Jeremy Sanders
wxjmfa...@gmail.com wrote: Short example. Writing an editor with something like the FSR is simply impossible (properly). http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations To conserve memory, Emacs does not hold fixed-length 22-bit numbers

Re: RE Module Performance

2013-07-25 Thread Devyn Collier Johnson
On 07/25/2013 09:36 AM, Jeremy Sanders wrote: wxjmfa...@gmail.com wrote: Short example. Writing an editor with something like the FSR is simply impossible (properly). http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations To conserve memory,

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: wxjmfa...@gmail.com wrote: Short example. Writing an editor with something like the FSR is simply impossible (properly). http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations To

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: To conserve memory, Emacs does not hold fixed-length 22-bit numbers that are codepoints of text characters within buffers and strings. Rather,

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: To conserve memory, Emacs does not hold fixed-length 22-bit numbers that are

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Fri, Jul 26, 2013 at 3:18 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy

Re: RE Module Performance

2013-07-25 Thread wxjmfauth
On Thursday, 25 July 2013 12:14:46 UTC+2, Chris Angelico wrote: On Thu, Jul 25, 2013 at 7:27 PM, wxjmfa...@gmail.com wrote: A coding scheme works with a unique set of characters (the repertoire), and the implementation (the programming) works with a unique set of encoded code

Re: RE Module Performance

2013-07-25 Thread Chris Angelico
On Fri, Jul 26, 2013 at 5:07 AM, wxjmfa...@gmail.com wrote: Let start with a simple string \textemdash or \texttendash sys.getsizeof('–') 40 sys.getsizeof('a') 26 Most of the cost is in those two apostrophes, look: sys.getsizeof('a') 26 sys.getsizeof(a) 8 Okay, that's slightly unfair

RE: RE Module Performance

2013-07-25 Thread Prasad, Ramit
Chris Angelico wrote: On Fri, Jul 26, 2013 at 5:07 AM, wxjmfa...@gmail.com wrote: Let start with a simple string \textemdash or \texttendash sys.getsizeof('-') 40 sys.getsizeof('a') 26 Most of the cost is in those two apostrophes, look: sys.getsizeof('a') 26

Re: RE Module Performance

2013-07-25 Thread Ian Kelly
On Wed, Jul 24, 2013 at 9:34 AM, Chris Angelico ros...@gmail.com wrote: On Thu, Jul 25, 2013 at 12:17 AM, David Hutto dwightdhu...@gmail.com wrote: I've screwed up plenty of times in python, but can write code like a pro when I'm feeling better(on SSI and medicaid). An editor can be built

Re: RE Module Performance

2013-07-25 Thread Ian Kelly
On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano
On Thu, 25 Jul 2013 15:45:38 -0500, Ian Kelly wrote: On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info

Re: RE Module Performance

2013-07-25 Thread Michael Torrie
On 07/25/2013 01:07 PM, wxjmfa...@gmail.com wrote: Let start with a simple string \textemdash or \texttendash sys.getsizeof('–') 40 sys.getsizeof('a') 26 That's meaningless. You're comparing the overhead of a string object itself (a one-time cost anyway), not the overhead of storing the

Re: RE Module Performance

2013-07-25 Thread Michael Torrie
On 07/25/2013 11:18 AM, Steven D'Aprano wrote: JMF has explained that it is impossible, impossible I say!, to write an editor using a flexible string representation. Since Emacs uses such a flexible string representation, Emacs is impossible, and therefore Emacs doesn't exist. Now I'm even

Re: RE Module Performance

2013-07-25 Thread Ian Kelly
On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano steve+comp.lang.pyt...@pearwood.info wrote: UTF-8 uses a flexible representation on a character-by-character basis. When parsing UTF-8, one needs to look at EVERY character to decide how many bytes you need to read. In Python 3, the flexible
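
The contrast drawn here, per-character width in UTF-8 versus one width per string in Python 3.3's flexible string representation, can be sketched as follows. This is a model of the rule described in PEP 393, not CPython's actual implementation, and the function names `fsr_width` and `utf8_widths` are hypothetical.

```python
def fsr_width(s):
    """Bytes per character a PEP 393 string would use, chosen once per string."""
    largest = max(map(ord, s), default=0)   # only the largest code point matters
    if largest <= 0xFF:
        return 1
    if largest <= 0xFFFF:
        return 2
    return 4

def utf8_widths(s):
    """Bytes used by each character in UTF-8, decided character by character."""
    return [len(ch.encode("utf-8")) for ch in s]

widths_fsr = [fsr_width(t) for t in ("abc", "a\u20ac", "a\U0001d11e")]
widths_utf8 = utf8_widths("a\u20ac\U0001d11e")
```

With a single width per string, the character at index i sits at byte offset i * width, so indexing stays O(1); with UTF-8 the offset depends on every preceding character.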

Re: RE Module Performance

2013-07-24 Thread wxjmfauth
On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote: On 07/12/2013 09:59 AM, Joshua Landau wrote: If you're interested, the basic of it is that strings now use a variable number of bytes to encode their values depending on whether values outside of the ASCII range and

Re: RE Module Performance

2013-07-24 Thread Chris Angelico
On Wed, Jul 24, 2013 at 11:40 PM, wxjmfa...@gmail.com wrote: Short example. Writing an editor with something like the FSR is simply impossible (properly). jmf, have you ever written an editor with *any* string representation? Are you speaking from any level of experience at all? ChrisA --

Re: RE Module Performance

2013-07-24 Thread David Hutto
I've screwed up plenty of times in python, but can write code like a pro when I'm feeling better(on SSI and medicaid). An editor can be built simply, but it's preference that makes the difference. Some might have used tkinter, gtk. wxpython or other methods for the task. I think the main issue in

Re: RE Module Performance

2013-07-24 Thread David Hutto
I've screwed up plenty of times in python, but can write code like a pro when I'm feeling better(on SSI and medicaid). An editor can be built simply, but it's preference that makes the difference. Some might have used tkinter, gtk. wxpython or other methods for the task. I think the main issue in

Re: RE Module Performance

2013-07-24 Thread Chris Angelico
On Thu, Jul 25, 2013 at 12:17 AM, David Hutto dwightdhu...@gmail.com wrote: I've screwed up plenty of times in python, but can write code like a pro when I'm feeling better(on SSI and medicaid). An editor can be built simply, but it's preference that makes the difference. Some might have used

Re: RE Module Performance

2013-07-24 Thread Michael Torrie
On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: Sorry, you are not understanding Unicode. What is a Unicode Transformation Format (UTF), what is the goal of a UTF and why it is important for an implementation to work with a UTF. Really? Enlighten me. Personally, I would never use UTF as a

Re: RE Module Performance

2013-07-24 Thread Michael Torrie
On 07/24/2013 08:34 AM, Chris Angelico wrote: Frankly, Python's strings are a *terrible* internal representation for an editor widget - not because of PEP 393, but simply because they are immutable, and every keypress would result in a rebuilding of the string. On the flip side, I could quite

Re: RE Module Performance

2013-07-24 Thread Chris Angelico
On Thu, Jul 25, 2013 at 12:47 AM, Michael Torrie torr...@gmail.com wrote: On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: Sorry, you are not understanding Unicode. What is a Unicode Transformation Format (UTF), what is the goal of a UTF and why it is important for an implementation to work

Re: RE Module Performance

2013-07-24 Thread Terry Reedy
On 7/24/2013 11:00 AM, Michael Torrie wrote: On 07/24/2013 08:34 AM, Chris Angelico wrote: Frankly, Python's strings are a *terrible* internal representation for an editor widget - not because of PEP 393, but simply because they are immutable, and every keypress would result in a rebuilding of
