Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-28, jak
On 27/05/2021 05:54, Cameron Simpson wrote: On 26May2021 12:11, Jon Ribbens wrote: On 2021-05-26, Alan Gauld wrote: I confess I had just assumed the unicode strings were stored in native unicode UTF8 format. If you do that then indexing and slicing strings becomes very slow. True, bu

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-27, Chris Angelico
On Thu, May 27, 2021 at 1:56 PM Cameron Simpson wrote: > > On 26May2021 12:11, Jon Ribbens wrote: > >On 2021-05-26, Alan Gauld wrote: > >> I confess I had just assumed the unicode strings were stored > >> in native unicode UTF8 format. > > > >If you do that then indexing and slicing strings beco

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Cameron Simpson
On 26May2021 12:11, Jon Ribbens wrote: >On 2021-05-26, Alan Gauld wrote: >> I confess I had just assumed the unicode strings were stored >> in native unicode UTF8 format. > >If you do that then indexing and slicing strings becomes very slow. True, but that isn't necessarily a show stopper. My im
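Cameron's message is cut off in this preview, so what follows is not his proposal. It is a sketch of one common mitigation, assuming strings are kept as UTF-8: record the byte offset of every `step`-th code point, so indexing jumps to the nearest checkpoint and then scans at most `step - 1` code points. The class name `CheckpointedUtf8` and the default `step=64` are hypothetical.

```python
class CheckpointedUtf8:
    """Hypothetical UTF-8 backed string with a sparse index of byte offsets."""

    def __init__(self, text: str, step: int = 64) -> None:
        self.data = text.encode("utf-8")
        self.step = step
        self.length = len(text)
        # Byte offsets of code points 0, step, 2*step, ...
        self.checkpoints = []
        offset = 0
        for i, ch in enumerate(text):
            if i % step == 0:
                self.checkpoints.append(offset)
            offset += len(ch.encode("utf-8"))

    def __len__(self) -> int:
        return self.length

    def __getitem__(self, index: int) -> str:
        # Non-negative integer indices only; slicing is not supported.
        if not 0 <= index < self.length:
            raise IndexError("index out of range")
        pos = self.checkpoints[index // self.step]
        # Walk forward from the checkpoint one code point at a time.
        for _ in range(index % self.step):
            pos += 1
            while (self.data[pos] & 0xC0) == 0x80:  # skip continuation bytes
                pos += 1
        # Extend to the end of this code point's byte sequence.
        end = pos + 1
        while end < len(self.data) and (self.data[end] & 0xC0) == 0x80:
            end += 1
        return self.data[pos:end].decode("utf-8")


text = "h\u00e9llo\U0001F600x" * 50
s = CheckpointedUtf8(text, step=7)
print(len(s), s[0], s[1], s[5], s[len(s) - 1])
```

The trade-off is one stored offset per `step` characters: a larger `step` saves memory but lengthens the worst-case scan.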

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Alan Gauld via Python-list
On 26/05/2021 22:15, Tim Chase wrote: > If you don't decode it upon reading it in, it should still be 100MB > because it's a stream of encoded bytes. I usually convert them to utf8. > You don't specify what you then do with this humongous string, Mainly I search for regex patterns which can
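For Alan's use case (regex searches over a large file), the decode step can sometimes be skipped altogether, because Python's `re` module accepts `bytes` patterns over `bytes` data. A minimal sketch, assuming UTF-8 input; the helper name `search_encoded` and the sample log are hypothetical:

```python
import re


def search_encoded(data: bytes, pattern: bytes) -> list[str]:
    """Run a bytes regex over UTF-8 data without decoding the whole buffer;
    only the matched slices are decoded."""
    return [m.group(0).decode("utf-8") for m in re.finditer(pattern, data)]


log = "na\u00efve caf\u00e9 \U0001F600 ERROR: disk full\nok\nERROR: \u00fcber-quota\n".encode("utf-8")
print(search_encoded(log, rb"ERROR: [^\n]*"))
```

The catch is that a bytes pattern matches bytes, not characters: `.` consumes a single byte of a multi-byte character, and `\w` only matches ASCII word characters, so this works best for patterns anchored on ASCII text.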

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Tim Chase
On 2021-05-26 18:43, Alan Gauld via Python-list wrote: > On 26/05/2021 14:09, Tim Chase wrote: >>> If so, doesn't that introduce a pretty big storage overhead for >>> large strings? >> >> Yes. Though such large strings tend to be more rare, largely >> because they become unwieldy for other reas

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Alan Gauld via Python-list
On 26/05/2021 14:09, Tim Chase wrote: >> If so, doesn't that introduce a pretty big storage overhead for >> large strings? > > Yes. Though such large strings tend to be more rare, largely because > they become unwieldy for other reasons. I do have some scripts that work on large strings - mainl
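To put a rough number on the overhead Alan and Tim discuss: the width is decided per `str` object, so keeping a large text as a list of lines confines the 4-byte cost to the lines that actually need it. A comparison in CPython, with hypothetical sample data (10,000 ASCII lines of 80 characters plus one line containing an emoji):

```python
import sys

# Hypothetical input: 10,000 ASCII lines of 80 characters, plus one wide line.
lines = [f"{i:08d} " + "x" * 70 + "\n" for i in range(10_000)]
lines.append("one astral character: \U0001F600\n")

# One big string: the single emoji forces 4 bytes for every character.
one_string = sys.getsizeof("".join(lines))

# Per-line strings: each str object plus the list's own pointer array.
per_line = sys.getsizeof(lines) + sum(sys.getsizeof(line) for line in lines)

print(f"one string: {one_string:,} bytes")
print(f"per line:   {per_line:,} bytes")
```

`sys.getsizeof` is shallow, which is why the list's pointer array is added explicitly. Each per-line object still pays a fixed header of a few dozen bytes, so the saving shrinks for very short lines.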

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Terry Reedy
On 5/26/2021 12:07 PM, Chris Angelico wrote: On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list wrote: On 2021-05-26, Alan Gauld wrote: On 25/05/2021 23:23, Terry Reedy wrote: In CPython's Flexible String Representation all characters in a string are stored with the same number of

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Chris Angelico
On Thu, May 27, 2021 at 1:59 AM Jon Ribbens via Python-list wrote: > > On 2021-05-26, Alan Gauld wrote: > > On 25/05/2021 23:23, Terry Reedy wrote: > >> In CPython's Flexible String Representation all characters in a string > >> are stored with the same number of bytes, depending on the largest >

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Jon Ribbens via Python-list
On 2021-05-26, Alan Gauld wrote: > On 25/05/2021 23:23, Terry Reedy wrote: >> In CPython's Flexible String Representation all characters in a string >> are stored with the same number of bytes, depending on the largest >> codepoint. > > I'm learning lots of new things in this thread! > > Does th
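Jon's point about indexing can be made concrete. In UTF-8 a character takes 1 to 4 bytes, so finding the n-th code point means scanning from the start and counting lead bytes (continuation bytes have the bit pattern `10xxxxxx`). A minimal O(n) sketch; the function name `utf8_char_at` is hypothetical:

```python
def utf8_char_at(data: bytes, index: int) -> str:
    """Return the index-th code point of UTF-8 encoded data by linear scan."""
    count = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:  # not a continuation byte: a code point starts here
            count += 1
            if count == index:
                # Extend over this code point's continuation bytes.
                end = pos + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return data[pos:end].decode("utf-8")
    raise IndexError("code point index out of range")


text = "h\u00e9llo\U0001F600x"
data = text.encode("utf-8")
print(len(text), len(data), utf8_char_at(data, 6))
```

CPython's fixed-width storage, by contrast, locates `s[i]` by offset arithmetic in O(1).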

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Tim Chase
On 2021-05-26 08:18, Alan Gauld via Python-list wrote: > Does that mean that if I give Python a UTF8 string that is mostly > single byte characters but contains one 4-byte character that > Python will store the string as all 4-byte characters? As best I understand it, yes: the cost of each "chara
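Tim's answer can be checked directly. Under PEP 393 the width is chosen from the largest code point in the string: 1 byte up to U+00FF, 2 bytes up to U+FFFF, 4 bytes beyond that. The sketch below states that rule and compares it with a measurement from `sys.getsizeof`; both helper names are hypothetical, and the measurement only holds for CPython:

```python
import sys


def expected_width(s: str) -> int:
    """Bytes per character PEP 393 prescribes, from the largest code point."""
    widest = max(map(ord, s), default=0)
    if widest <= 0xFF:
        return 1  # ASCII / Latin-1 range
    if widest <= 0xFFFF:
        return 2  # rest of the Basic Multilingual Plane
    return 4      # astral planes (emoji, etc.)


def measured_width(s: str) -> int:
    """Grow s by one more copy of its widest character, which cannot change
    the width, so the size grows by exactly one character's storage."""
    if not s:
        raise ValueError("need a non-empty string")
    widest = max(s)  # comparing single characters compares code points
    longer = s + widest
    return sys.getsizeof(longer + widest) - sys.getsizeof(longer)


samples = ["plain", "caf\u00e9", "price: 20\u20ac", "smile \U0001F600"]
for sample in samples:
    print(repr(sample), expected_width(sample), measured_width(sample))
```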

Re: string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Chris Angelico
On Wed, May 26, 2021 at 10:04 PM Alan Gauld via Python-list wrote: > > On 25/05/2021 23:23, Terry Reedy wrote: > > > In CPython's Flexible String Representation all characters in a string > > are stored with the same number of bytes, depending on the largest > > codepoint. > > I'm learning lots of
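One consequence worth knowing: the width belongs to each string object, not to the data it was cut from. In CPython, slicing an ASCII-only piece out of a 4-byte-wide string produces a narrow 1-byte string again. A quick check (CPython-specific):

```python
import sys

wide = "a" * 1000 + "\U0001F600"  # stored at 4 bytes per character
head = wide[:1000]                # slice containing only ASCII

# The slice is a new object stored at the narrowest width its contents allow,
# so it is the same size as a freshly built ASCII string of that length.
print(sys.getsizeof(head), sys.getsizeof("a" * 1000), sys.getsizeof(wide))
```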

string storage [was: Re: imaplib: is this really so unwieldy?]

2021-05-26, Alan Gauld via Python-list
On 25/05/2021 23:23, Terry Reedy wrote: > In CPython's Flexible String Representation all characters in a string > are stored with the same number of bytes, depending on the largest > codepoint. I'm learning lots of new things in this thread! Does that mean that if I give Python a UTF8 string
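Alan's question can be tested with `sys.getsizeof`, which in CPython reports a string's size including its fixed header. A sketch comparing two strings of equal length, one all ASCII and one with a single character outside the Basic Multilingual Plane (U+1F600). The figures are CPython-specific; PyPy and other implementations store strings differently.

```python
import sys

n = 100_000
narrow = "a" * n                     # all ASCII: 1 byte per character
wide = "a" * (n - 1) + "\U0001F600"  # same length, one astral character

print(len(narrow), sys.getsizeof(narrow))
print(len(wide), sys.getsizeof(wide))
print(f"ratio: {sys.getsizeof(wide) / sys.getsizeof(narrow):.2f}")
```

For long strings the ratio approaches 4: one astral character makes CPython store every character of that string in 4 bytes.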