On Wed, Jul 31, 2013 at 6:45 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
if you care about minimizing every possible byte, you should
use a low-level language like C. Then you can give every character 21
bits, and be happy that you don't waste even one bit.
Could go better!
On 31-07-13 05:30, Michael Torrie wrote:
On 07/30/2013 12:19 PM, Antoon Pardon wrote:
So? Why are you making this a point of discussion? I was not aware that
the pros and cons of various editor buffer implementations were relevant
to the point I was trying to make.
I for one found it very
On 30-07-13 21:09, wxjmfa...@gmail.com wrote:
Mutable, immutable, copying + xxx, buffering, O(n)
Yes, but conceptually the reencoding happens sometime, somewhere.
Which is a far cry from your previous claim that it happened
every time you enter a char.
This of course makes your case harder
FSR:
===
The 'a' in 'a€' and 'a\U0001d11e':
['{:#010b}'.format(c) for c in 'a€'.encode('utf-16-be')]
['0b00000000', '0b01100001', '0b00100000', '0b10101100']
['{:#010b}'.format(c) for c in 'a\U0001d11e'.encode('utf-32-be')]
['0b00000000', '0b00000000', '0b00000000', '0b01100001',
'0b00000000',
On 31-07-13 10:32, wxjmfa...@gmail.com wrote:
Unicode/utf*
i) (primary key) Create and use a unique set of encoded
code points.
FSR does this.
st1 = 'a€'
st2 = 'aa'
ord(st1[0])
97
ord(st2[0])
97
ii) (secondary key) Depending on the wish,
memory/performance:
On 07/31/2013 01:23 AM, Antoon Pardon wrote:
On 31-07-13 05:30, Michael Torrie wrote:
On 07/30/2013 12:19 PM, Antoon Pardon wrote:
So? Why are you making this a point of discussion? I was not aware that
the pros and cons of various editor buffer implementations were relevant
to the point I
On 07/31/2013 02:32 AM, wxjmfa...@gmail.com wrote:
Unicode/utf*
Why do you keep using the terms utf and Unicode interchangeably?
--
http://mail.python.org/mailman/listinfo/python-list
On Wednesday, 31 July 2013 07:45:18 UTC+2, Steven D'Aprano wrote:
On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote:
And do not forget, in a pure utf coding scheme, your char or a char will
*never* be larger than 4 bytes.
sys.getsizeof('a')
26
On Wed, Jul 31, 2013 at 9:15 PM, wxjmfa...@gmail.com wrote:
... char never consumes or requires more than 4 bytes ...
The integer 5 should be able to be stored in 3 bits.
sys.getsizeof(5)
14
Clearly Python is doing something really horribly wrong here. In fact,
sys.getsizeof needs to be
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote:
Back to utf. utfs are not only elements of a unique set of encoded
code points. They have an interesting feature. Each utf chunk
holds intrinsically the character (in
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504
ascii characters, you are very happy the buffer of your
editor does not waste time in reencoding the buffer as
soon as you enter an €, the 125505th char. Sorry, I wanted
to say z instead of
On Tue, Jul 30, 2013 at 3:01 PM, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504
ascii characters, you are very happy the buffer of your
editor does not waste time in reencoding the buffer as
soon as you enter an €, the 125505th char. Sorry, I wanted
to say z
On 30/07/2013 15:38, Antoon Pardon wrote:
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy the buffer of your editor does not
waste time in reencoding the buffer as soon as you enter an €, the
125505th
On 30-07-13 18:13, MRAB wrote:
On 30/07/2013 15:38, Antoon Pardon wrote:
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy the buffer of your editor does not
waste time in reencoding the buffer as soon
On 30/07/2013 17:39, Antoon Pardon wrote:
On 30-07-13 18:13, MRAB wrote:
On 30/07/2013 15:38, Antoon Pardon wrote:
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy the buffer of your editor does not
On 31 July 2013 00:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504
ascii characters, you are very happy the buffer of your
editor does not waste time in reencoding the buffer as
soon as you enter an €, the 125505th char. Sorry, I wanted
to say z instead
On 30 July 2013 17:39, Antoon Pardon antoon.par...@rece.vub.ac.be wrote:
On 30-07-13 18:13, MRAB wrote:
On 30/07/2013 15:38, Antoon Pardon wrote:
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy
On 30-07-13 19:14, MRAB wrote:
On 30/07/2013 17:39, Antoon Pardon wrote:
On 30-07-13 18:13, MRAB wrote:
On 30/07/2013 15:38, Antoon Pardon wrote:
On 30-07-13 16:01, wxjmfa...@gmail.com wrote:
I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy
Mutable, immutable, copying + xxx, buffering, O(n)
Yes, but conceptually the reencoding happens sometime, somewhere.
The internal ucs-2 will never automagically be transformed
into ucs-4 (eg).
timeit.timeit('a'*1 +'€')
7.087220684719967
timeit.timeit('a'*1 +'z')
1.5685214234430873
On Tue, Jul 30, 2013 at 8:09 PM, wxjmfa...@gmail.com wrote:
Mutable, immutable, copying + xxx, buffering, O(n)
Yes, but conceptually the reencoding happens sometime, somewhere.
The internal ucs-2 will never automagically be transformed
into ucs-4 (eg).
But probably not on the entire
On 7/30/2013 1:40 PM, Joshua Landau wrote:
Additionally, who says a language couldn't use, say, B-Trees for all of
its list-like types, including strings?
Tk apparently uses a B-tree in its text widget.
--
Terry Jan Reedy
MRAB:
The disadvantage there is that when you move the cursor you must move
characters around. For example, what if the cursor was at the start and
you wanted to move it to the end? Also, when the gap has been filled,
you need to make a new one.
The normal technique is to only move the gap
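For readers who have not met the gap-buffer technique MRAB is describing, here is a minimal sketch in Python. It is an illustration only, not code from this thread or from any real editor; the class name `GapBuffer` and its methods are hypothetical. In this sketch the gap is moved lazily, only when an insertion actually happens at a new position, so merely moving the cursor does not shuffle any characters.

```python
class GapBuffer:
    """Toy gap buffer: text lives in a list with a movable hole (the gap)."""

    def __init__(self, text="", gap_size=16):
        self.gap_size = gap_size
        self.buf = list(text) + [None] * gap_size
        self.gap_start = len(text)       # first free slot
        self.gap_end = len(self.buf)     # one past the last free slot
        self.cursor = len(text)          # logical cursor position

    def __len__(self):
        return len(self.buf) - (self.gap_end - self.gap_start)

    def move_cursor(self, pos):
        # O(1): only the logical position changes, the gap stays put.
        self.cursor = max(0, min(pos, len(self)))

    def _grow(self):
        # The gap has been filled, so open a new one at the same place.
        self.buf[self.gap_start:self.gap_start] = [None] * self.gap_size
        self.gap_end += self.gap_size

    def _move_gap(self, pos):
        # Shift characters across the gap until it starts at `pos`.
        while self.gap_start > pos:
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        if self.gap_start == self.gap_end:
            self._grow()
        self._move_gap(self.cursor)
        self.buf[self.gap_start] = ch
        self.gap_start += 1
        self.cursor += 1

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])


gb = GapBuffer("hello", gap_size=2)
gb.insert("!")
gb.move_cursor(0)
gb.insert(">")
print(gb.text())
```

Typing at one spot costs O(1) per character; the O(n) shuffle is paid only when an edit lands far from the previous one.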
On 07/30/2013 12:19 PM, Antoon Pardon wrote:
So? Why are you making this a point of discussion? I was not aware that
the pros and cons of various editor buffer implementations were relevant
to the point I was trying to make.
I for one found it very interesting. In fact this thread caused me to
On 07/30/2013 01:09 PM, wxjmfa...@gmail.com wrote:
Mutable, immutable, copying + xxx, buffering, O(n)
Yes, but conceptually the reencoding happens sometime, somewhere.
The internal ucs-2 will never automagically be transformed
into ucs-4 (eg).
So what major python project are you working
On Tue, 30 Jul 2013 12:09:11 -0700, wxjmfauth wrote:
And do not forget, in a pure utf coding scheme, your char or a char will
*never* be larger than 4 bytes.
sys.getsizeof('a')
26
sys.getsizeof('\U000101000')
48
Neither character above is larger than 4 bytes. You forgot to deduct the
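The point about deducting the fixed object overhead can be checked with a small sketch. It assumes CPython 3.3 or later (PEP 393); `sys.getsizeof` figures are implementation details that vary between versions and platforms, and the helper name `bytes_per_char` is hypothetical. Differencing the sizes of two strings built from the same character cancels the per-object header, leaving only the storage cost per character.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate per-character storage by differencing two string sizes."""
    small = ch * n
    large = ch * (2 * n)
    # The fixed header is the same for both strings, so it cancels out.
    return (sys.getsizeof(large) - sys.getsizeof(small)) // n


for ch in ("a", "\u20ac", "\U0001d11e"):
    print("U+%04X: %d byte(s) per character" % (ord(ch), bytes_per_char(ch)))
```

On a PEP 393 build this should report 1, 2 and 4 bytes respectively, none of them larger than 4.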
On 26-07-13 15:21, wxjmfa...@gmail.com wrote:
Hint: To understand Unicode (and every coding scheme), you should
understand utf. The how and the *why*.
No you don't. You are mixing the information with how the information
is coded. utf is like base64, a way of coding the information that is
On 28-07-13 20:19, Joshua Landau wrote:
On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote:
On 27-07-13 20:21, wxjmfa...@gmail.com wrote:
utf-8 or any (utf) never need and never spend their
On 28-07-13 21:30, wxjmfa...@gmail.com wrote:
To be short, this is *never* the FSR, always something
else.
Suggestion. Start by solving all these micro-benchmarks.
all the memory cases. It's a good start, no?
There is nothing to solve. Unicode doesn't force implementations
to use the same
On Sun, Jul 28, 2013 at 11:14 PM, Joshua Landau jos...@landau.ws wrote:
GC does have sometimes severe impact in memory-constrained environments,
though. See http://sealedabstract.com/rants/why-mobile-web-apps-are-slow/,
about half-way down, specifically
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:
Do not forget that à la FSR mechanism for a non-ascii user is
*irrelevant*.
You have been told repeatedly, Python's internals are *full* of ASCII-
only
On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
3.2
timeit.timeit("r = dir(list)")
22.300465007102908
3.3
timeit.timeit("r = dir(list)")
27.13981129541519
3.2:
len(dir(list))
42
3.3:
len(dir(list))
45
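The two runs above are not measuring the same amount of work: `dir(list)` returns 42 names on 3.2 and 45 on 3.3. A rough way to account for that, sketched below, is to divide the measured time by the number of names returned. The helper name `time_per_name` and the value of `number` are assumptions for illustration, and a single micro-benchmark like this says little either way.

```python
import timeit


def time_per_name(number=100000):
    """Time `dir(list)` and normalise by how many names it returns."""
    total = timeit.timeit("r = dir(list)", number=number)
    names = len(dir(list))
    return total, names, total / (number * names)


total, names, per_name = time_per_name()
print("%.3f s total, %d names, %.3e s per name per call"
      % (total, names, per_name))
```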
On 29.07.2013 13:43, wxjmfa...@gmail.com wrote:
3.2
timeit.timeit("r = dir(list)")
22.300465007102908
3.3
timeit.timeit("r = dir(list)")
27.13981129541519
For the record, I do not put your example to contradict
you. I was expecting such a result even before testing.
Now, if you do not
On 07/29/2013 08:06 AM, Heiko Wundram wrote:
On 29.07.2013 13:43, wxjmfa...@gmail.com wrote:
3.2
timeit.timeit("r = dir(list)")
22.300465007102908
3.3
timeit.timeit("r = dir(list)")
27.13981129541519
For the record, I do not put your example to contradict
you. I was expecting such a result
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote:
On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
3.2
timeit.timeit("r = dir(list)")
22.300465007102908
3.3
timeit.timeit(r
On Sunday, 28 July 2013 19:36:00 UTC+2, Terry Reedy wrote:
On 7/28/2013 11:52 AM, Michael Torrie wrote:
3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
slicing a string would be very very slow,
Not necessarily so. See below.
and that's
On Monday, 29 July 2013 13:57:47 UTC+2, Chris Angelico wrote:
On Mon, Jul 29, 2013 at 12:43 PM, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 22:52:16 UTC+2, Steven D'Aprano wrote:
3.2
timeit.timeit("r = dir(list)")
22.300465007102908
3.3
timeit.timeit(r
On Mon, Jul 29, 2013 at 3:20 PM, wxjmfa...@gmail.com wrote:
c:\python32\pythonw -u timitmod.py
15.258061416225663
Exit code: 0
c:\Python33\pythonw -u timitmod.py
17.052203122286194
Exit code: 0
len(dir(C))
Did you even think to check that before you posted timings?
ChrisA
On Monday, 29 July 2013 16:49:34 UTC+2, Chris Angelico wrote:
On Mon, Jul 29, 2013 at 3:20 PM, wxjmfa...@gmail.com wrote:
c:\python32\pythonw -u timitmod.py
15.258061416225663
Exit code: 0
c:\Python33\pythonw -u timitmod.py
17.052203122286194
Exit code: 0
On 27-07-13 20:21, wxjmfa...@gmail.com wrote:
Quickly. sys.getsizeof() in the light of what I explained.
1) As this FSR works with multiple encodings, it has to keep
track of the encoding. It puts it in the overhead of the str
class (overhead = real overhead + encoding). In such
an absurd way,
On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote:
Good point. FSR, nice tool for those who wish to teach
Unicode. It is not every day, one has such an opportunity.
I had a long e-mail composed, but decided to chop it down, but it was still too
long, so I ditched a lot of the context, which jmf also
On Sun, Jul 28, 2013 at 4:52 PM, Michael Torrie torr...@gmail.com wrote:
Is my understanding of these things wrong?
No, your understanding of those matters is fine. There's just one area
you seem to be misunderstanding; you appear to think that jmf actually
cares about logical argument. I gave
On 7/28/2013 11:52 AM, Michael Torrie wrote:
3. UTF-8 and UTF-16 encodings, being variable width encodings, mean that
slicing a string would be very very slow,
Not necessarily so. See below.
and that's unacceptable for
the use cases of python strings. I'm assuming you understand big O
On Sun, Jul 28, 2013 at 6:36 PM, Terry Reedy tjre...@udel.edu wrote:
I posted about a week ago, in response to Chris A., a method by which lookup
for UTF-16 can be made O(log2 k), or perhaps more accurately,
O(1+log2(k+1)), where k is the number of non-BMP chars in the string.
Which is an
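One way to get the O(log k) lookup described above is to keep a sorted list of the character positions of the non-BMP characters and binary-search it. The sketch below is a guess at the idea from the description quoted here, not the method as originally posted; the class name `IndexedUTF16` is hypothetical, and lone surrogates are not handled.

```python
import bisect


class IndexedUTF16:
    """UTF-16 code units plus a sorted index of non-BMP character positions."""

    def __init__(self, s):
        data = s.encode("utf-16-le")
        # One integer per 16-bit code unit.
        self.units = [int.from_bytes(data[i:i + 2], "little")
                      for i in range(0, len(data), 2)]
        # Character indices (not unit offsets) of characters above U+FFFF.
        self.astral = [i for i, ch in enumerate(s) if ord(ch) > 0xFFFF]
        self.length = len(s)

    def __len__(self):
        return self.length

    def __getitem__(self, i):
        if not 0 <= i < self.length:
            raise IndexError(i)
        # k = number of non-BMP characters before position i; each one
        # occupies an extra code unit, so the unit offset is i + k.
        k = bisect.bisect_left(self.astral, i)
        hi = self.units[i + k]
        if k < len(self.astral) and self.astral[k] == i:
            lo = self.units[i + k + 1]
            return chr(0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
        return chr(hi)


s = "a\U0001d11eb\u20ac\U00010100z"
t = IndexedUTF16(s)
print([ascii(t[i]) for i in range(len(t))])
```

When the string has no non-BMP characters the index is empty and the lookup reduces to a plain array access.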
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote:
Back to utf. utfs are not only elements of a unique set of encoded
code points. They have an interesting feature. Each utf chunk
holds intrinsically the character (in
On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote:
On 27-07-13 20:21, wxjmfa...@gmail.com wrote:
utf-8 or any (utf) never need and never spend their time
in reencoding.
So? That python sometimes needs to do some kind of background
processing is not a problem,
On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote:
On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be wrote:
On 27-07-13 20:21, wxjmfa...@gmail.com wrote:
utf-8 or any (utf) never need and never spend their time
in reencoding.
So? That python sometimes
On 28/07/2013 19:13, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote:
Back to utf. utfs are not only elements of a unique set of encoded
code points. They have an interesting feature. Each utf
On 7/28/2013 2:29 PM, Chris Angelico wrote:
On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote:
Somewhat off topic, but befitting of the triviality of this thread, do I
understand correctly that you are saying garbage collection never causes any
noticeable slowdown in
On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote:
Good point. FSR, nice tool for those who wish to teach
Unicode. It is not every day, one has such an opportunity.
I had a long e-mail composed, but decided to
On Sunday, 28 July 2013 21:04:56 UTC+2, MRAB wrote:
On 28/07/2013 19:13, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 05:53:22 UTC+2, Ian wrote:
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote:
Back to utf. utfs are not only elements of a unique
On 28/07/2013 20:23, wxjmfa...@gmail.com wrote:
[snip]
Compare these (a BDFL example, where I'm using a non-ASCII char)
Py 3.2 (narrow build)
Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and,
On 28-07-13 21:23, wxjmfa...@gmail.com wrote:
On Sunday, 28 July 2013 17:52:47 UTC+2, Michael Torrie wrote:
On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote:
Good point. FSR, nice tool for those who wish to teach
Unicode. It is not every day, one has such an opportunity.
I
wxjmfa...@gmail.com writes:
Suggestion. Start by solving all these micro-benchmarks.
all the memory cases. It's a good start, no?
Since you seem the only one who has this dramatic problem with such
micro-benchmarks, that BTW have nothing to do with unicode compliance,
I'd suggest *you* should
On Sun, 28 Jul 2013 12:23:04 -0700, wxjmfauth wrote:
Do not forget that à la FSR mechanism for a non-ascii user is
*irrelevant*.
You have been told repeatedly, Python's internals are *full* of ASCII-
only strings.
py> dir(list)
['__add__', '__class__', '__contains__', '__delattr__',
On 28 July 2013 19:29, Chris Angelico ros...@gmail.com wrote:
On Sun, Jul 28, 2013 at 7:19 PM, Joshua Landau jos...@landau.ws wrote:
On 28 July 2013 09:45, Antoon Pardon antoon.par...@rece.vub.ac.be
wrote:
On 27-07-13 20:21, wxjmfa...@gmail.com wrote:
utf-8 or any (utf) never need
On Fri, 26 Jul 2013 08:46:58 -0700, wxjmfauth wrote:
BTW, I'm pleased to read "sequence of bits" and not "bytes". Again, utf
transformers produce sequences of bits, called Unicode Transformation
Units, with lengths of 8/16/32 *bits*, hence the names utf8/16/32.
UCS transformers are (were)
On Saturday, 27 July 2013 04:05:03 UTC+2, Michael Torrie wrote:
On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote:
sys.getsizeof('––') - sys.getsizeof('–')
I have already explained / commented this.
Maybe it got lost in translation, but I don't understand your point with
On Sat, Jul 27, 2013 at 12:21 PM, wxjmfa...@gmail.com wrote:
Back to utf. utfs are not only elements of a unique set of encoded
code points. They have an interesting feature. Each utf chunk
holds intrinsically the character (in fact the code point) it is
supposed to represent. In utf-32, the
On Thursday, 25 July 2013 22:45:38 UTC+2, Ian wrote:
On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
On Friday, 26 July 2013 05:09:34 UTC+2, Michael Torrie wrote:
On 07/25/2013 11:18 AM, Steven D'Aprano wrote:
JMF has explained that it is impossible, impossible I say!, to write an
editor using a flexible string representation. Since Emacs uses such a
flexible string
On Friday, 26 July 2013 05:20:45 UTC+2, Ian wrote:
On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
UTF-8 uses a flexible representation on a character-by-character basis.
When parsing UTF-8, one needs to look at EVERY character to
On 07/26/2013 07:21 AM, wxjmfa...@gmail.com wrote:
sys.getsizeof('––') - sys.getsizeof('–')
I have already explained / commented this.
Maybe it got lost in translation, but I don't understand your point with
that.
Hint: To understand Unicode (and every coding scheme), you should
understand
On Thu, 25 Jul 2013 21:20:45 -0600, Ian Kelly wrote:
On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
UTF-8 uses a flexible representation on a character-by-character basis.
When parsing UTF-8, one needs to look at EVERY character to decide how
On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
See the similarity now? Both flexibly change the width used by code-
points, UTF-8 based on the code-point itself regardless of the rest of
the string, Python based on the largest code-point in the
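The two width rules being compared can be put side by side in a short sketch. It only models the rules as stated here (UTF-8 picks a width per code point, PEP 393 picks one width for the whole string from its largest code point); the function names `utf8_widths` and `fsr_width` are hypothetical, and the real CPython implementation has more detail than this.

```python
def utf8_widths(s):
    """Bytes used by each character under UTF-8, decided per character."""
    return [len(ch.encode("utf-8")) for ch in s]


def fsr_width(s):
    """Bytes per character under PEP 393, decided once per string."""
    largest = max(map(ord, s), default=0)
    if largest < 0x100:
        return 1
    if largest < 0x10000:
        return 2
    return 4


for s in ("aa", "a\u20ac", "a\U0001d11e"):
    print(ascii(s), "UTF-8:", utf8_widths(s), "FSR:", fsr_width(s))
```

In both schemes the width is variable; the difference is only whether the decision is made per character or per string.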
On Fri, 26 Jul 2013 22:12:36 -0600, Ian Kelly wrote:
On Fri, Jul 26, 2013 at 9:37 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
See the similarity now? Both flexibly change the width used by code-
points, UTF-8 based on the code-point itself regardless of the rest of
the
On Wed, 24 Jul 2013 09:00:39 -0600, Michael Torrie wrote about JMF:
His most recent argument that Python should use UTF as a representation
is very strange to be honest.
He's not arguing for anything, he is just hating on anything that gives
even the tiniest benefit to ASCII users. This isn't
On Thu, Jul 25, 2013 at 3:49 PM, Serhiy Storchaka storch...@gmail.com wrote:
24.07.13 21:15, Chris Angelico wrote:
To my mind, exposing UTF-16
surrogates to the application is a bug to be fixed, not a feature to
be maintained.
Python 3 uses code points from U+DC80 to U+DCFF (which
On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote:
But mainly, I'm just wondering how many people here have any basis from
which to argue the point he's trying to make. I doubt most of us have
(a) implemented an editor widget, or (b) tested multiple different
internal representations
On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote:
If nobody had ever thought of doing a multi-format string
representation, I could well imagine the Python core devs debating
whether the cost of UTF-32 strings is worth the correctness and
consistency improvements... and most likely
On Thu, Jul 25, 2013 at 5:02 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 00:34:24 +1000, Chris Angelico wrote:
But mainly, I'm just wondering how many people here have any basis from
which to argue the point he's trying to make. I doubt most of us have
On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote:
If nobody had ever thought of doing a multi-format string
representation, I could well imagine the Python core devs debating
whether the cost
On Thu, 25 Jul 2013 17:58:10 +1000, Chris Angelico wrote:
On Thu, Jul 25, 2013 at 5:15 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 04:15:42 +1000, Chris Angelico wrote:
If nobody had ever thought of doing a multi-format string
representation, I could
On Wednesday, 24 July 2013 16:47:36 UTC+2, Michael Torrie wrote:
On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote:
Sorry, you are not understanding Unicode. What is a Unicode
Transformation Format (UTF), what is the goal of a UTF and
why it is important for an implementation to
On Thu, Jul 25, 2013 at 7:22 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
What I'm trying to say is that it is possible to use UTF-16 internally,
but *not* assume that every code point (character) is represented by a
single 2-byte unit. For example, the len() of a UTF-16
On Thu, Jul 25, 2013 at 7:27 PM, wxjmfa...@gmail.com wrote:
A coding scheme works with a unique set of characters (the repertoire),
and the implementation (the programming) works with a unique set
of encoded code points. The critical step is the path
{unique set of characters} -- {unique set
wxjmfa...@gmail.com wrote:
Short example. Writing an editor with something like the
FSR is simply impossible (properly).
http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations
To conserve memory, Emacs does not hold fixed-length 22-bit numbers
On 07/25/2013 09:36 AM, Jeremy Sanders wrote:
wxjmfa...@gmail.com wrote:
Short example. Writing an editor with something like the
FSR is simply impossible (properly).
http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Representations.html#Text-Representations
To conserve memory,
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
wxjmfa...@gmail.com wrote:
Short example. Writing an editor with something like the FSR is simply
impossible (properly).
http://www.gnu.org/software/emacs/manual/html_node/elisp/Text-
Representations.html#Text-Representations
To
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
To conserve memory, Emacs does not hold fixed-length 22-bit numbers
that are codepoints of text characters within buffers and strings.
Rather,
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy Sanders wrote:
To conserve memory, Emacs does not hold fixed-length 22-bit numbers
that are
On Fri, Jul 26, 2013 at 3:18 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy
On Thursday, 25 July 2013 12:14:46 UTC+2, Chris Angelico wrote:
On Thu, Jul 25, 2013 at 7:27 PM, wxjmfa...@gmail.com wrote:
A coding scheme works with a unique set of characters (the repertoire),
and the implementation (the programming) works with a unique set
of encoded code
On Fri, Jul 26, 2013 at 5:07 AM, wxjmfa...@gmail.com wrote:
Let's start with a simple string, \textemdash or \textendash
sys.getsizeof('–')
40
sys.getsizeof('a')
26
Most of the cost is in those two apostrophes, look:
sys.getsizeof('a')
26
sys.getsizeof(a)
8
Okay, that's slightly unfair
Chris Angelico wrote:
On Fri, Jul 26, 2013 at 5:07 AM, wxjmfa...@gmail.com wrote:
Let's start with a simple string, \textemdash or \textendash
sys.getsizeof('–')
40
sys.getsizeof('a')
26
Most of the cost is in those two apostrophes, look:
sys.getsizeof('a')
26
On Wed, Jul 24, 2013 at 9:34 AM, Chris Angelico ros...@gmail.com wrote:
On Thu, Jul 25, 2013 at 12:17 AM, David Hutto dwightdhu...@gmail.com wrote:
I've screwed up plenty of times in python, but can write code like a pro
when I'm feeling better (on SSI and Medicaid). An editor can be built
On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Thu, 25 Jul 2013 14:36:25 +0100, Jeremy
On Thu, 25 Jul 2013 15:45:38 -0500, Ian Kelly wrote:
On Thu, Jul 25, 2013 at 12:18 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
On Fri, 26 Jul 2013 01:36:07 +1000, Chris Angelico wrote:
On Fri, Jul 26, 2013 at 1:26 AM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info
On 07/25/2013 01:07 PM, wxjmfa...@gmail.com wrote:
Let's start with a simple string, \textemdash or \textendash
sys.getsizeof('–')
40
sys.getsizeof('a')
26
That's meaningless. You're comparing the overhead of a string object
itself (a one-time cost anyway), not the overhead of storing the
On 07/25/2013 11:18 AM, Steven D'Aprano wrote:
JMF has explained that it is impossible, impossible I say!, to write an
editor using a flexible string representation. Since Emacs uses such a
flexible string representation, Emacs is impossible, and therefore Emacs
doesn't exist.
Now I'm even
On Thu, Jul 25, 2013 at 8:48 PM, Steven D'Aprano
steve+comp.lang.pyt...@pearwood.info wrote:
UTF-8 uses a flexible representation on a character-by-character basis.
When parsing UTF-8, one needs to look at EVERY character to decide how
many bytes you need to read. In Python 3, the flexible
On Saturday, 13 July 2013 01:13:47 UTC+2, Michael Torrie wrote:
On 07/12/2013 09:59 AM, Joshua Landau wrote:
If you're interested, the basic of it is that strings now use a
variable number of bytes to encode their values depending on whether
values outside of the ASCII range and
On Wed, Jul 24, 2013 at 11:40 PM, wxjmfa...@gmail.com wrote:
Short example. Writing an editor with something like the
FSR is simply impossible (properly).
jmf, have you ever written an editor with *any* string representation?
Are you speaking from any level of experience at all?
ChrisA
I've screwed up plenty of times in python, but can write code like a pro
when I'm feeling better (on SSI and Medicaid). An editor can be built
simply, but it's preference that makes the difference. Some might have used
tkinter, gtk, wxpython or other methods for the task.
I think the main issue in
On Thu, Jul 25, 2013 at 12:17 AM, David Hutto dwightdhu...@gmail.com wrote:
I've screwed up plenty of times in python, but can write code like a pro
when I'm feeling better (on SSI and Medicaid). An editor can be built simply,
but it's preference that makes the difference. Some might have used
On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote:
Sorry, you are not understanding Unicode. What is a Unicode
Transformation Format (UTF), what is the goal of a UTF and
why it is important for an implementation to work with a UTF.
Really? Enlighten me.
Personally, I would never use UTF as a
On 07/24/2013 08:34 AM, Chris Angelico wrote:
Frankly, Python's strings are a *terrible* internal representation
for an editor widget - not because of PEP 393, but simply because
they are immutable, and every keypress would result in a rebuilding
of the string. On the flip side, I could quite
On Thu, Jul 25, 2013 at 12:47 AM, Michael Torrie torr...@gmail.com wrote:
On 07/24/2013 07:40 AM, wxjmfa...@gmail.com wrote:
Sorry, you are not understanding Unicode. What is a Unicode
Transformation Format (UTF), what is the goal of a UTF and
why it is important for an implementation to work
On 7/24/2013 11:00 AM, Michael Torrie wrote:
On 07/24/2013 08:34 AM, Chris Angelico wrote:
Frankly, Python's strings are a *terrible* internal representation
for an editor widget - not because of PEP 393, but simply because
they are immutable, and every keypress would result in a rebuilding
of