Re: RE Module Performance

2013-07-31 Thread Chris Angelico

On Wed, Jul 31, 2013 at 9:15 PM, wrote: > ... char never consumes or requires more than 4 bytes ... > The integer 5 should be able to be stored in 3 bits. >>> sys.getsizeof(5) 14 Clearly Python is doing something really horribly wrong here. In fact, sys.getsizeof needs to be changed to return

Re: RE Module Performance

2013-07-31 Thread wxjmfauth

On Wednesday, 31 July 2013 07:45:18 UTC+2, Steven D'Aprano wrote: > On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote: > > > > > And do not forget, in a pure utf coding scheme, your char or a char will > > > *never* be larger than 4 bytes. > > > > > sys.getsizeof('a') > > > 26 >

Re: RE Module Performance

2013-07-31 Thread Michael Torrie

On 07/31/2013 02:32 AM, wxjmfa...@gmail.com wrote: > Unicode/utf* Why do you keep using the terms "utf" and "Unicode" interchangeably? -- http://mail.python.org/mailman/listinfo/python-list

Re: RE Module Performance

2013-07-31 Thread Michael Torrie

On 07/31/2013 01:23 AM, Antoon Pardon wrote: > On 31-07-13 05:30, Michael Torrie wrote: >> On 07/30/2013 12:19 PM, Antoon Pardon wrote: >>> So? Why are you making this a point of discussion? I was not aware that >>> the pros and cons of various editor buffer implementations were relevant >>> to the

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon

On 31-07-13 10:32, wxjmfa...@gmail.com wrote: > Unicode/utf* > > > i) ("primary key") Create and use a unique set of encoded > code points. FSR does this. >>> st1 = 'a€' >>> st2 = 'aa' >>> ord(st1[0]) 97 >>> ord(st2[0]) 97 >>> > ii) ("secondary key") Depending on the wish, > memor

Re: RE Module Performance

2013-07-31 Thread wxjmfauth

FSR: === The 'a' in 'a€' and 'a\U0001d11e': >>> ['{:#010b}'.format(c) for c in 'a€'.encode('utf-16-be')] ['0b00000000', '0b01100001', '0b00100000', '0b10101100'] >>> ['{:#010b}'.format(c) for c in 'a\U0001d11e'.encode('utf-32-be')] ['0b00000000', '0b00000000', '0b00000000', '0b01100001', '0b00

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon

On 30-07-13 21:09, wxjmfa...@gmail.com wrote: > Mutable, immutable, copying + xxx, buffering, O(n) > Yes, but conceptually the reencoding happens sometime, somewhere. Which is a far cry from your previous claim that it happened every time you enter a char. This of course makes your case harde

Re: RE Module Performance

2013-07-31 Thread Antoon Pardon

On 31-07-13 05:30, Michael Torrie wrote: > On 07/30/2013 12:19 PM, Antoon Pardon wrote: >> So? Why are you making this a point of discussion? I was not aware that >> the pros and cons of various editor buffer implementations were relevant >> to the point I was trying to make. > > I for one found i

Re: RE Module Performance

2013-07-31 Thread Chris Angelico

On Wed, Jul 31, 2013 at 6:45 AM, Steven D'Aprano wrote: > if you care about minimizing every possible byte, you should > use a low-level language like C. Then you can give every character 21 > bits, and be happy that you don't waste even one bit. Could go better! Since not every character has bee

Re: RE Module Performance

2013-07-30 Thread Steven D'Aprano

On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote: > And do not forget, in a pure utf coding scheme, your char or a char will > *never* be larger than 4 bytes. > sys.getsizeof('a') > 26 sys.getsizeof('\U000101000') > 48 Neither character above is larger than 4 bytes. You forgot to de

Re: RE Module Performance

2013-07-30 Thread Michael Torrie

On 07/30/2013 01:09 PM, wxjmfa...@gmail.com wrote: > Mutable, immutable, copying + xxx, buffering, O(n) > Yes, but conceptually the reencoding happens sometime, somewhere. > The internal "ucs-2" will never automagically be transformed > into "ucs-4" (eg). So what major python project are you wo

Re: RE Module Performance

2013-07-30 Thread Michael Torrie

On 07/30/2013 12:19 PM, Antoon Pardon wrote: > So? Why are you making this a point of discussion? I was not aware that > the pros and cons of various editor buffer implementations were relevant > to the point I was trying to make. I for one found it very interesting. In fact this thread caused me t

Re: RE Module Performance

2013-07-30 Thread Neil Hodgson

MRAB: The disadvantage there is that when you move the cursor you must move characters around. For example, what if the cursor was at the start and you wanted to move it to the end? Also, when the gap has been filled, you need to make a new one. The normal technique is to only move the gap
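For readers unfamiliar with the structure being discussed, here is a minimal sketch of a gap buffer. It is not code from the thread: the class name GapBuffer and its methods are hypothetical, and a real editor would add deletion, line tracking and so on. It only shows the two costs mentioned above: moving the cursor shifts characters across the gap, and a filled gap has to be recreated.

```python
class GapBuffer:
    """Toy gap buffer: text before the cursor, a gap, then text after it."""

    def __init__(self, text="", gap_size=16):
        self._buf = list(text) + [None] * gap_size
        self._gap_start = len(text)       # cursor position
        self._gap_end = len(self._buf)    # one past the last gap slot

    def move_cursor(self, pos):
        """Move the gap to pos; the cost is proportional to the distance moved."""
        text_len = len(self._buf) - (self._gap_end - self._gap_start)
        pos = max(0, min(pos, text_len))
        while self._gap_start > pos:      # shift characters right, across the gap
            self._gap_start -= 1
            self._gap_end -= 1
            self._buf[self._gap_end] = self._buf[self._gap_start]
        while self._gap_start < pos:      # shift characters left, across the gap
            self._buf[self._gap_start] = self._buf[self._gap_end]
            self._gap_start += 1
            self._gap_end += 1

    def insert(self, ch):
        """Insert one character at the cursor."""
        if self._gap_start == self._gap_end:   # gap is full: make a new one
            grow = max(16, len(self._buf))
            self._buf[self._gap_start:self._gap_start] = [None] * grow
            self._gap_end += grow
        self._buf[self._gap_start] = ch
        self._gap_start += 1

    def text(self):
        return "".join(self._buf[:self._gap_start] + self._buf[self._gap_end:])
```

Typing at one spot is cheap because each insert just fills a gap slot; only a cursor jump or an exhausted gap touches more than one element.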

Re: RE Module Performance

2013-07-30 Thread Terry Reedy

On 7/30/2013 1:40 PM, Joshua Landau wrote: Additionally, who says a language couldn't use, say, B-Trees for all of its list-like types, including strings? Tk apparently uses a B-tree in its text widget. -- Terry Jan Reedy

Re: RE Module Performance

2013-07-30 Thread Chris Angelico

On Tue, Jul 30, 2013 at 8:09 PM, wrote: > Mutable, immutable, copying + xxx, buffering, O(n) > Yes, but conceptually the reencoding happens sometime, somewhere. > The internal "ucs-2" will never automagically be transformed > into "ucs-4" (eg). But probably not on the entire document. With ev

Re: RE Module Performance

2013-07-30 Thread wxjmfauth

Mutable, immutable, copying + xxx, buffering, O(n) Yes, but conceptually the reencoding happens sometime, somewhere. The internal "ucs-2" will never automagically be transformed into "ucs-4" (eg). >>> timeit.timeit("'a'*1 +'€'") 7.087220684719967 >>> timeit.timeit("'a'*1 +'z'") 1.568521

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon

On 30-07-13 19:14, MRAB wrote: On 30/07/2013 17:39, Antoon Pardon wrote: On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the

Re: RE Module Performance

2013-07-30 Thread Joshua Landau

On 30 July 2013 17:39, Antoon Pardon wrote: > On 30-07-13 18:13, MRAB wrote: > > On 30/07/2013 15:38, Antoon Pardon wrote: >> >>> On 30-07-13 16:01, wxjmfa...@gmail.com wrote: >>> I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the bu

Re: RE Module Performance

2013-07-30 Thread Tim Delaney

On 31 July 2013 00:01, wrote: > > I am pretty sure that once you have typed your 127504 > ascii characters, you are very happy the buffer of your > editor does not waste time in reencoding the buffer as > soon as you enter an €, the 125505th char. Sorry, I wanted > to say z instead of euro, just

Re: RE Module Performance

2013-07-30 Thread MRAB

On 30/07/2013 17:39, Antoon Pardon wrote: On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not wa

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon

On 30-07-13 18:13, MRAB wrote: On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon a

Re: RE Module Performance

2013-07-30 Thread MRAB

On 30/07/2013 15:38, Antoon Pardon wrote: On 30-07-13 16:01, wxjmfa...@gmail.com wrote: I am pretty sure that once you have typed your 127504 ascii characters, you are very happy the buffer of your editor does not waste time in reencoding the buffer as soon as you enter an €, the 125505th cha

Re: RE Module Performance

2013-07-30 Thread Chris Angelico

On Tue, Jul 30, 2013 at 3:01 PM, wrote: > I am pretty sure that once you have typed your 127504 > ascii characters, you are very happy the buffer of your > editor does not waste time in reencoding the buffer as > soon as you enter an €, the 125505th char. Sorry, I wanted > to say z instead of eur

Re: RE Module Performance

2013-07-30 Thread Antoon Pardon

On 30-07-13 16:01, wxjmfa...@gmail.com wrote: > > I am pretty sure that once you have typed your 127504 > ascii characters, you are very happy the buffer of your > editor does not waste time in reencoding the buffer as > soon as you enter an €, the 125505th char. Sorry, I wanted > to say z inste

Re: RE Module Performance

2013-07-30 Thread wxjmfauth

On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: > On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > > Back to utf. utfs are not only elements of a unique set of encoded > > > code points. They have an interesting feature. Each "utf chunk" > > > holds intrinsically the character (in fact th

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth

On Monday, 29 July 2013 16:49:34 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 3:20 PM, wrote: > > >>c:\python32\pythonw -u "timitmod.py" > > > 15.258061416225663 > > >>Exit code: 0 > > >>c:\Python33\pythonw -u "timitmod.py" > > > 17.052203122286194 > > >>Exit code: 0 > > >

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Chris Angelico

On Mon, Jul 29, 2013 at 3:20 PM, wrote: >>c:\python32\pythonw -u "timitmod.py" > 15.258061416225663 >>Exit code: 0 >>c:\Python33\pythonw -u "timitmod.py" > 17.052203122286194 >>Exit code: 0 >>> len(dir(C)) Did you even think to check that before you posted timings? ChrisA

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth

On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 12:43 PM, wrote: > > > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > > > 3.2 > > timeit.timeit("r = dir(list)") > > > 22.300465007102908 > > > > > > 3.3 > > timei

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth

On Sunday, 28 July 2013 19:36:00 UTC+2, Terry Reedy wrote: > On 7/28/2013 11:52 AM, Michael Torrie wrote: > > > > > > 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that > > > slicing a string would be very very slow, > > > > Not necessarily so. See below. > > >

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth

On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote: > On Mon, Jul 29, 2013 at 12:43 PM, wrote: > > > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > > > 3.2 > > timeit.timeit("r = dir(list)") > > > 22.300465007102908 > > > > > > 3.3 > > timei

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Devyn Collier Johnson

On 07/29/2013 08:06 AM, Heiko Wundram wrote: On 29.07.2013 13:43, wxjmfa...@gmail.com wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r = dir(list)") 27.13981129541519 For the record, I do not put your example to contradict you. I was expecting such a res

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Heiko Wundram

On 29.07.2013 13:43, wxjmfa...@gmail.com wrote: 3.2 timeit.timeit("r = dir(list)") 22.300465007102908 3.3 timeit.timeit("r = dir(list)") 27.13981129541519 For the record, I do not put your example to contradict you. I was expecting such a result even before testing. Now, if you do not un

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread Chris Angelico

On Mon, Jul 29, 2013 at 12:43 PM, wrote: > On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > 3.2 timeit.timeit("r = dir(list)") > 22.300465007102908 > > 3.3 timeit.timeit("r = dir(list)") > 27.13981129541519 3.2: >>> len(dir(list)) 42 3.3: >>> len(dir(list)) 45

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-29 Thread wxjmfauth

On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote: > On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote: > > > > > Do not forget that à la "FSR" mechanism for a non-ascii user is > > > *irrelevant*. > > > > You have been told repeatedly, Python's internals are *full* of A

Re: RE Module Performance

2013-07-29 Thread Chris Angelico

On Sun, Jul 28, 2013 at 11:14 PM, Joshua Landau wrote: > GC does have sometimes severe impact in memory-constrained environments, > though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/, > about half-way down, specifically > http://sealedabstract.com/wp-content/uploads/2013/05/

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon

On 28-07-13 21:30, wxjmfa...@gmail.com wrote: To be short, this is *never* the FSR, always something else. Suggestion. Start by solving all these "micro-benchmarks". all the memory cases. It's a good start, no? There is nothing to solve. Unicode doesn't force implementations to use the same

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon

On 28-07-13 20:19, Joshua Landau wrote: On 28 July 2013 09:45, Antoon Pardon wrote: On 27-07-13 20:21, wxjmfa...@gmail.com wrote: utf-8 or any (utf) never need and never spend their time in reencoding

Re: RE Module Performance

2013-07-29 Thread Antoon Pardon

On 26-07-13 15:21, wxjmfa...@gmail.com wrote: Hint: To understand Unicode (and every coding scheme), you should understand "utf". The how and the *why*. No you don't. You are mixing the information with how the information is coded. utf is like base64, a way of coding the information that is

Re: RE Module Performance

2013-07-28 Thread Joshua Landau

On 28 July 2013 19:29, Chris Angelico wrote: > On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau wrote: > > On 28 July 2013 09:45, Antoon Pardon > wrote: > >> > >> On 27-07-13 20:21, wxjmfa...@gmail.com wrote: > >>> > >>> utf-8 or any (utf) never need and never spend their time > >>> in reencodi

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Steven D'Aprano

On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote: > Do not forget that à la "FSR" mechanism for a non-ascii user is > *irrelevant*. You have been told repeatedly, Python's internals are *full* of ASCII-only strings. py> dir(list) ['__add__', '__class__', '__contains__', '__delattr__', '__del

Re: RE Module Performance

2013-07-28 Thread Lele Gaifax

wxjmfa...@gmail.com writes: > Suggestion. Start by solving all these "micro-benchmarks". > all the memory cases. It a good start, no? Since you seem the only one who has this dramatic problem with such micro-benchmarks, that BTW have nothing to do with "unicode compliance", I'd suggest *you* shou

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Antoon Pardon

On 28-07-13 21:23, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote: On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: Good point. FSR, nice tool for those who wish to teach Unicode. It is not every day, one has such an opportunity. I had

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 20:23, wxjmfa...@gmail.com wrote: [snip] Compare these (a BDFL example, where I'm using a non-ascii char) Py 3.2 (narrow build) Why are you using a narrow build of Python 3.2? It doesn't treat all codepoints equally (those outside the BMP can't be stored in one code unit) and, the

Re: RE Module Performance

2013-07-28 Thread wxjmfauth

On Sunday, 28 July 2013 21:04:56 UTC+2, MRAB wrote: > On 28/07/2013 19:13, wxjmfa...@gmail.com wrote: > > > On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: > > >> On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > >> > > >> > Back to utf. utfs are not only elements of a unique set

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread wxjmfauth

On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote: > On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: > > > Good point. FSR, nice tool for those who wish to teach > > > Unicode. It is not every day, one has such an opportunity. > > > > I had a long e-mail composed, but deci

Re: RE Module Performance

2013-07-28 Thread Terry Reedy

On 7/28/2013 2:29 PM, Chris Angelico wrote: On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau wrote: Somewhat off topic, but befitting of the triviality of this thread, do I understand correctly that you are saying garbage collection never causes any noticeable slowdown in real-world circumstanc

Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 19:13, wxjmfa...@gmail.com wrote: On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: On Sat, Jul 27, 2013 at 12:21 PM, wrote: > Back to utf. utfs are not only elements of a unique set of encoded > code points. They have an interesting feature. Each "utf chunk" > holds i

Re: RE Module Performance

2013-07-28 Thread Chris Angelico

On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau wrote: > On 28 July 2013 09:45, Antoon Pardon wrote: >> >> On 27-07-13 20:21, wxjmfa...@gmail.com wrote: >>> >>> utf-8 or any (utf) never need and never spend their time >>> in reencoding. >> >> >> So? That python sometimes needs to do some kind of

Re: RE Module Performance

2013-07-28 Thread Joshua Landau

On 28 July 2013 09:45, Antoon Pardon wrote: > On 27-07-13 20:21, wxjmfa...@gmail.com wrote: > >> utf-8 or any (utf) never need and never spend their time >> in reencoding. >> > > So? That python sometimes needs to do some kind of background > processing is not a problem, whether it is garbage c

Re: RE Module Performance

2013-07-28 Thread wxjmfauth

On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote: > On Sat, Jul 27, 2013 at 12:21 PM, wrote: > > > Back to utf. utfs are not only elements of a unique set of encoded > > > code points. They have an interesting feature. Each "utf chunk" > > > holds intrinsically the character (in fact th

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico

On Sun, Jul 28, 2013 at 6:36 PM, Terry Reedy wrote: > I posted about a week ago, in response to Chris A., a method by which lookup > for UTF-16 can be made O(log2 k), or perhaps more accurately, > O(1+log2(k+1)), where k is the number of non-BMP chars in the string. > Which is an optimization cho
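The O(log k) lookup mentioned here can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not Terry Reedy's actual method: the class name Utf16String is hypothetical, the input is assumed to contain no lone surrogates, and the auxiliary index is simply a sorted list of the code-point positions of the non-BMP characters, searched with the standard bisect module.

```python
import bisect


class Utf16String:
    """UTF-16 code units plus a sorted index of non-BMP character positions."""

    def __init__(self, text):
        self.units = []    # 16-bit code units
        self.astral = []   # code-point indexes of characters above U+FFFF
        for i, ch in enumerate(text):
            cp = ord(ch)
            if cp > 0xFFFF:
                self.astral.append(i)
                cp -= 0x10000
                self.units.append(0xD800 + (cp >> 10))    # high surrogate
                self.units.append(0xDC00 + (cp & 0x3FF))  # low surrogate
            else:
                self.units.append(cp)

    def __len__(self):
        # Each non-BMP character takes two units but counts once.
        return len(self.units) - len(self.astral)

    def __getitem__(self, i):
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Every non-BMP character before position i shifts the offset by one
        # code unit; bisect finds how many there are in O(log k).
        pos = i + bisect.bisect_left(self.astral, i)
        unit = self.units[pos]
        if 0xD800 <= unit <= 0xDBFF:
            low = self.units[pos + 1]
            return chr(0x10000 + ((unit - 0xD800) << 10) + (low - 0xDC00))
        return chr(unit)
```

When k is zero the index is empty and the lookup degenerates to plain array indexing, which is the common case for BMP-only text.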

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Terry Reedy

On 7/28/2013 11:52 AM, Michael Torrie wrote: 3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that slicing a string would be very very slow, Not necessarily so. See below. and that's unacceptable for the use cases of python strings. I'm assuming you understand big O notat

Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Chris Angelico

On Sun, Jul 28, 2013 at 4:52 PM, Michael Torrie wrote: > Is my understanding of these things wrong? No, your understanding of those matters is fine. There's just one area you seem to be misunderstanding; you appear to think that jmf actually cares about logical argument. I gave up on that theory

FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread Michael Torrie

On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote: > Good point. FSR, nice tool for those who wish to teach > Unicode. It is not every day, one has such an opportunity. I had a long e-mail composed, but decided to chop it down, but still too long. so I ditched a lot of the context, which jmf also

Re: RE Module Performance

2013-07-28 Thread Antoon Pardon

On 27-07-13 20:21, wxjmfa...@gmail.com wrote: Quickly. sys.getsizeof() in the light of what I explained. 1) As this FSR works with multiple encodings, it has to keep track of the encoding. It puts it in the overhead of the str class (overhead = real overhead + encoding). In such an absurd way, that

Re: RE Module Performance

2013-07-27 Thread Ian Kelly

On Sat, Jul 27, 2013 at 12:21 PM, wrote: > Back to utf. utfs are not only elements of a unique set of encoded > code points. They have an interesting feature. Each "utf chunk" > holds intrinsically the character (in fact the code point) it is > supposed to represent. In utf-32, the obvious case, i

Re: RE Module Performance

2013-07-27 Thread wxjmfauth

On Saturday, 27 July 2013 04:05:03 UTC+2, Michael Torrie wrote: > On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote: > > sys.getsizeof('––') - sys.getsizeof('–') > > > > > > I have already explained / commented this. > > > > Maybe it got lost in translation, but I don't understand yo

Re: RE Module Performance

2013-07-26 Thread Steven D'Aprano

On Fri, 26 Jul 2013 08:46:58 -0700, wxjmfauth wrote: > BTW, I'm pleased to read "sequence of bits" and not bytes. Again, utf > transformers are producing sequence of bits, call Unicode Transformation > Units, with lengths of 8/16/32 *bits*, from there the names utf8/16/32. > UCS transformers are (

Re: RE Module Performance

2013-07-26 Thread Steven D'Aprano

On Fri, 26 Jul 2013 22:12:36 -0600, Ian Kelly wrote: > On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano > wrote: >> See the similarity now? Both flexibly change the width used by code- >> points, UTF-8 based on the code-point itself regardless of the rest of >> the string, Python based on the lar

Re: RE Module Performance

2013-07-26 Thread Ian Kelly

On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano wrote: > See the similarity now? Both flexibly change the width used by code- > points, UTF-8 based on the code-point itself regardless of the rest of > the string, Python based on the largest code-point in the string. No, I think we're just using

Re: RE Module Performance

2013-07-26 Thread Steven D'Aprano

On Thu, 25 Jul 2013 21:20:45 -0600, Ian Kelly wrote: > On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano > wrote: >> UTF-8 uses a flexible representation on a character-by-character basis. >> When parsing UTF-8, one needs to look at EVERY character to decide how >> many bytes you need to read. In

Re: RE Module Performance

2013-07-26 Thread Michael Torrie

On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote: sys.getsizeof('––') - sys.getsizeof('–') > > I have already explained / commented this. Maybe it got lost in translation, but I don't understand your point with that. > Hint: To understand Unicode (and every coding scheme), you should > und

Re: RE Module Performance

2013-07-26 Thread wxjmfauth

On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano > > wrote: > > > UTF-8 uses a flexible representation on a character-by-character basis. > > > When parsing UTF-8, one needs to look at EVERY character to decide how > > > many bytes yo

Re: RE Module Performance

2013-07-26 Thread wxjmfauth

On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano > > wrote: > > > UTF-8 uses a flexible representation on a character-by-character basis. > > > When parsing UTF-8, one needs to look at EVERY character to decide how > > > many bytes yo

Re: RE Module Performance

2013-07-26 Thread wxjmfauth

On Friday, 26 July 2013 05:09:34 UTC+2, Michael Torrie wrote: > On 07/25/2013 11:18 AM, Steven D'Aprano wrote: > > > JMF has explained that it is impossible, impossible I say!, to write an > > > editor using a flexible string representation. Since Emacs uses such a > > > flexible string

Re: RE Module Performance

2013-07-26 Thread wxjmfauth

On Thursday, 25 July 2013 22:45:38 UTC+2, Ian wrote: > On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano > > wrote: > > > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: > > > > > >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano > > >> wrote: > > >>> On Thu, 25 Jul 2013 14:36

Re: RE Module Performance

2013-07-25 Thread Ian Kelly

On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano wrote: > UTF-8 uses a flexible representation on a character-by-character basis. > When parsing UTF-8, one needs to look at EVERY character to decide how > many bytes you need to read. In Python 3, the flexible representation is > on a string-by-str
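To make the "look at EVERY character" point concrete, here is a small sketch of indexing into raw UTF-8 bytes. The helper names utf8_width and nth_code_point are hypothetical and the input is assumed to be well-formed UTF-8; the point is only that reaching the n-th code point requires inspecting the lead byte of every code point before it.

```python
def utf8_width(lead_byte):
    """Number of bytes in the UTF-8 sequence that starts with lead_byte."""
    if lead_byte < 0x80:
        return 1
    if lead_byte >> 5 == 0b110:
        return 2
    if lead_byte >> 4 == 0b1110:
        return 3
    if lead_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte: %#x" % lead_byte)


def nth_code_point(data, n):
    """Return the n-th code point of UTF-8 bytes by scanning from the start."""
    pos = 0
    for _ in range(n):            # O(n): one step per preceding code point
        pos += utf8_width(data[pos])
    width = utf8_width(data[pos])
    return data[pos:pos + width].decode('utf-8')
```

With a per-string fixed width, as in Python 3.3+, the same lookup is a single multiplication and an array access.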

Re: RE Module Performance

2013-07-25 Thread Michael Torrie

On 07/25/2013 11:18 AM, Steven D'Aprano wrote: > JMF has explained that it is impossible, impossible I say!, to write an > editor using a flexible string representation. Since Emacs uses such a > flexible string representation, Emacs is impossible, and therefore Emacs > doesn't exist. Now I'm e

Re: RE Module Performance

2013-07-25 Thread Michael Torrie

On 07/25/2013 01:07 PM, wxjmfa...@gmail.com wrote: > Let's start with a simple string \textemdash or \textendash > sys.getsizeof('–') > 40 sys.getsizeof('a') > 26 That's meaningless. You're comparing the overhead of a string object itself (a one-time cost anyway), not the overhead of st
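One way to separate the fixed per-object overhead from the per-character cost is to compare two strings of different lengths built from the same character. This is a minimal sketch, assuming CPython 3.3 or later (where PEP 393 applies); the helper name bytes_per_char is hypothetical, and the absolute sys.getsizeof values differ between versions and platforms, so only the difference is used.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Storage cost of each additional copy of ch, with the header cancelled out."""
    short = ch * 2
    long_ = ch * (n + 2)
    return (sys.getsizeof(long_) - sys.getsizeof(short)) // n


for ch in ('a', '\xe9', '\u2013', '\U0001d11e'):
    print('U+%04X: %d byte(s) per character' % (ord(ch), bytes_per_char(ch)))
```

The per-character figure depends only on the widest code point in the string, which is the property the thread is arguing about.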

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Thu, 25 Jul 2013 15:45:38 -0500, Ian Kelly wrote: > On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano > wrote: >> On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: >> >>> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano >>> wrote: On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sander

Re: RE Module Performance

2013-07-25 Thread Ian Kelly

On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano wrote: > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: > >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano >> wrote: >>> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: "To conserve memory, Emacs does not hold fixed-le

Re: RE Module Performance

2013-07-25 Thread Ian Kelly

On Wed, Jul 24, 2013 at 9:34 AM, Chris Angelico wrote: > On Thu, Jul 25, 2013 at 12:17 AM, David Hutto wrote: >> I've screwed up plenty of times in python, but can write code like a pro >> when I'm feeling better(on SSI and medicaid). An editor can be built simply, >> but it's preference that mak

RE: RE Module Performance

2013-07-25 Thread Prasad, Ramit

Chris Angelico wrote: > On Fri, Jul 26, 2013 at 5:07 AM, wrote: > > Let's start with a simple string \textemdash or \textendash > > > sys.getsizeof('-') > > 40 > sys.getsizeof('a') > > 26 > > Most of the cost is in those two apostrophes, look: > > >>> sys.getsizeof('a') > 26 > >>> sys.

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Fri, Jul 26, 2013 at 5:07 AM, wrote: > Let's start with a simple string \textemdash or \textendash > sys.getsizeof('–') > 40 sys.getsizeof('a') > 26 Most of the cost is in those two apostrophes, look: >>> sys.getsizeof('a') 26 >>> sys.getsizeof(a) 8 Okay, that's slightly unfair (bo

Re: RE Module Performance

2013-07-25 Thread wxjmfauth

On Thursday, 25 July 2013 12:14:46 UTC+2, Chris Angelico wrote: > On Thu, Jul 25, 2013 at 7:27 PM, wrote: > > > A coding scheme works with a unique set of characters (the repertoire), > > > and the implementation (the programming) works with a unique set > > > of encoded code points. The cri

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Fri, Jul 26, 2013 at 3:18 AM, Steven D'Aprano wrote: > On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: > >> On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano >> wrote: >>> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: "To conserve memory, Emacs does not hold fixed-len

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote: > On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano > wrote: >> On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: >>> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers >>> that are codepoints of text characters wi

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano wrote: > On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: >> "To conserve memory, Emacs does not hold fixed-length 22-bit numbers >> that are codepoints of text characters within buffers and strings. >> Rather, Emacs uses a variable-length

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote: > wxjmfa...@gmail.com wrote: > >> Short example. Writing an editor with something like the FSR is simply >> impossible (properly). > > http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representation

Re: RE Module Performance

2013-07-25 Thread Devyn Collier Johnson

On 07/25/2013 09:36 AM, Jeremy Sanders wrote: wxjmfa...@gmail.com wrote: Short example. Writing an editor with something like the FSR is simply impossible (properly). http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations "To conserve memory,

Re: RE Module Performance

2013-07-25 Thread Jeremy Sanders

wxjmfa...@gmail.com wrote: > Short example. Writing an editor with something like the > FSR is simply impossible (properly). http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations "To conserve memory, Emacs does not hold fixed-length 22-bit number

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Thu, Jul 25, 2013 at 7:27 PM, wrote: > A coding scheme works with a unique set of characters (the repertoire), > and the implementation (the programming) works with a unique set > of encoded code points. The critical step is the path > {unique set of characters} <--> {unique set of encoded cod

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Thu, Jul 25, 2013 at 7:22 PM, Steven D'Aprano wrote: > What I'm trying to say is that it is possible to use UTF-16 internally, > but *not* assume that every code point (character) is represented by a > single 2-byte unit. For example, the len() of a UTF-16 string should not > be calculated by c
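The distinction being drawn can be shown with a short sketch: counting 16-bit units over-counts any character outside the BMP, whereas counting code points means skipping the second half of each surrogate pair. The function names below are hypothetical, and the input is assumed to be well-formed little-endian UTF-16.

```python
def utf16_unit_count(data):
    """Number of 16-bit code units: what a naive len() would report."""
    return len(data) // 2


def utf16_code_point_count(data):
    """Number of code points: every unit except low (trailing) surrogates."""
    count = 0
    for i in range(0, len(data), 2):
        unit = int.from_bytes(data[i:i + 2], 'little')
        if not 0xDC00 <= unit <= 0xDFFF:
            count += 1
    return count


encoded = 'a\U0001d11e'.encode('utf-16-le')
print(utf16_unit_count(encoded), utf16_code_point_count(encoded))
```

The correct count needs a pass over the data (or a cached length), which is part of why a fixed-width internal representation is attractive.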

Re: RE Module Performance

2013-07-25 Thread wxjmfauth

On Wednesday, 24 July 2013 16:47:36 UTC+2, Michael Torrie wrote: > On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: > > > Sorry, you are not understanding Unicode. What is a Unicode > > > Transformation Format (UTF), what is the goal of a UTF and > > > why it is important for an implementa

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Thu, 25 Jul 2013 17:58:10 +1000, Chris Angelico wrote: > On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano > wrote: >> On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: >> >>> If nobody had ever thought of doing a multi-format string >>> representation, I could well imagine the Python c

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano wrote: > On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: > >> If nobody had ever thought of doing a multi-format string >> representation, I could well imagine the Python core devs debating >> whether the cost of UTF-32 strings is worth th

Re: RE Module Performance

2013-07-25 Thread Chris Angelico

On Thu, Jul 25, 2013 at 5:02 PM, Steven D'Aprano wrote: > On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote: > >> But mainly, I'm just wondering how many people here have any basis from >> which to argue the point he's trying to make. I doubt most of us have >> (a) implemented an editor wid

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote: > If nobody had ever thought of doing a multi-format string > representation, I could well imagine the Python core devs debating > whether the cost of UTF-32 strings is worth the correctness and > consistency improvements... and most likely

Re: RE Module Performance

2013-07-25 Thread Steven D'Aprano

On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote: > But mainly, I'm just wondering how many people here have any basis from > which to argue the point he's trying to make. I doubt most of us have > (a) implemented an editor widget, or (b) tested multiple different > internal representation

Re: RE Module Performance

2013-07-24 Thread Chris Angelico

On Thu, Jul 25, 2013 at 3:49 PM, Serhiy Storchaka wrote: > 24.07.13 21:15, Chris Angelico wrote: > >> To my mind, exposing UTF-16 >> surrogates to the application is a bug to be fixed, not a feature to >> be maintained. > > > Python 3 uses code points from U+DC80 to U+DCFF (which are in surr

Re: RE Module Performance

2013-07-24 Thread Steven D'Aprano

On Wed, 24 Jul 2013 09:00:39 -0600, Michael Torrie wrote about JMF: > His most recent argument that Python should use UTF as a representation > is very strange to be honest. He's not arguing for anything, he is just hating on anything that gives even the tiniest benefit to ASCII users. This isn'

Re: RE Module Performance

2013-07-24 Thread Serhiy Storchaka

24.07.13 21:15, Chris Angelico wrote: To my mind, exposing UTF-16 surrogates to the application is a bug to be fixed, not a feature to be maintained. Python 3 uses code points from U+DC80 to U+DCFF (which are in surrogates area) to represent undecodable bytes with surrogateescape error h
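The surrogateescape behaviour described here can be checked directly with the standard codec machinery. In this small sketch the wrapper names decode_lossless and encode_lossless are hypothetical; the errors='surrogateescape' argument itself is the documented Python 3 error handler.

```python
def decode_lossless(raw):
    """Decode as UTF-8, mapping each undecodable byte 0xNN to U+DCNN."""
    return raw.decode('utf-8', errors='surrogateescape')


def encode_lossless(text):
    """Inverse of decode_lossless: lone U+DCNN surrogates become byte 0xNN again."""
    return text.encode('utf-8', errors='surrogateescape')


raw = b'abc\xff'                      # 0xFF can never appear in valid UTF-8
text = decode_lossless(raw)
print(ascii(text))                    # the bad byte shows up as '\udcff'
print(encode_lossless(text) == raw)   # the original bytes round-trip
```

So a str holding a code point in the range U+DC80 to U+DCFF is not necessarily a leaked UTF-16 surrogate; it may be a smuggled byte.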

Re: RE Module Performance

2013-07-24 Thread Chris Angelico

On Thu, Jul 25, 2013 at 8:59 AM, Michael Torrie wrote: > I don't fully understand > why making strings simply "unicode" in javascript breaks compatibility > with older scripts. What operations are performed on strings that > making unicode an abstract type would break? Imagine this in JavaScrip

Re: RE Module Performance

2013-07-24 Thread Michael Torrie

On 07/24/2013 04:19 PM, Chris Angelico wrote: > I'm referring here to objections like jmf's, and also to threads like this: > > http://mozilla.6506.n7.nabble.com/Flexible-String-Representation-full-Unicode-for-ES6-td267585.html > > According to the ECMAScript people, UTF-16 and exposing surrogate

Re: RE Module Performance

2013-07-24 Thread Chris Angelico

On Thu, Jul 25, 2013 at 8:09 AM, Terry Reedy wrote: > On 7/24/2013 2:15 PM, Chris Angelico wrote: >> To my mind, exposing UTF-16 surrogates to the application is a bug >> to be fixed, not a feature to be maintained. > > It is definitely not a feature, but a proper UTF-16 implementation would not >

Re: RE Module Performance

2013-07-24 Thread Terry Reedy

On 7/24/2013 2:15 PM, Chris Angelico wrote: On Thu, Jul 25, 2013 at 3:52 AM, Terry Reedy wrote: For my purpose, the mock Text works the same in 2.7 and 3.3+. Thanks for that report! And yes, it's going to behave exactly the same way, because its underlying structure is an ordered list of or

Re: RE Module Performance

2013-07-24 Thread Chris Angelico

On Thu, Jul 25, 2013 at 3:52 AM, Terry Reedy wrote: > On 7/24/2013 11:00 AM, Michael Torrie wrote: >> >> On 07/24/2013 08:34 AM, Chris Angelico wrote: >>> >>> Frankly, Python's strings are a *terrible* internal representation >>> for an editor widget - not because of PEP 393, but simply because >>

Re: RE Module Performance

2013-07-24 Thread Terry Reedy

On 7/24/2013 11:00 AM, Michael Torrie wrote: On 07/24/2013 08:34 AM, Chris Angelico wrote: Frankly, Python's strings are a *terrible* internal representation for an editor widget - not because of PEP 393, but simply because they are immutable, and every keypress would result in a rebuilding of t

Re: RE Module Performance

2013-07-24 Thread Chris Angelico

On Thu, Jul 25, 2013 at 12:47 AM, Michael Torrie wrote: > On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote: >> Sorry, you are not understanding Unicode. What is a Unicode >> Transformation Format (UTF), what is the goal of a UTF and >> why it is important for an implementation to work with a UTF.

Re: RE Module Performance

2013-07-24 Thread Michael Torrie

On 07/24/2013 08:34 AM, Chris Angelico wrote: > Frankly, Python's strings are a *terrible* internal representation > for an editor widget - not because of PEP 393, but simply because > they are immutable, and every keypress would result in a rebuilding > of the string. On the flip side, I could qui
