Re: Why is array.array('u') deprecated?

2015-05-11 Thread Mark Lawrence

On 08/05/2015 15:40, jonathan.slend...@gmail.com wrote:
> On Friday, 8 May 2015 at 15:11:56 UTC+2, Peter Otten wrote:
>>> So, this works perfectly fine and fast. But it scares me that it's
>>> deprecated and Python 4 will not support it anymore.
>>
>> Hm, this doesn't even work with Python 3:
>
> My mistake. I should have tested better.
>
>> >>> data = array.array("u", u"x"*1000)
>> >>> data[100] = "y"
>> >>> re.search("y", data)
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/python3.4/re.py", line 166, in search
>>     return _compile(pattern, flags).search(string)
>> TypeError: can't use a string pattern on a bytes-like object
>>
>> You can search for bytes
>>
>> >>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>> >>> data[101] = "z"
>> >>> re.search(b"y", data)
>> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
>> >>> re.search(b"yz", data)
>> >>> re.search(b"y\0\0\0z", data)
>> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
>>
>> but if that is good enough you can use a bytearray in the first place.
>
> Maybe I'll try that. Thanks for the suggestions!
>
> Jonathan



http://sourceforge.net/projects/pyropes/ of any use to you?

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Why is array.array('u') deprecated?

2015-05-08 Thread jonathan . slenders
Why is array.array('u') deprecated?

Will we get an alternative for a character array or mutable unicode string?

Thanks!
Jonathan


Re: Why is array.array('u') deprecated?

2015-05-08 Thread jonathan . slenders
On Friday, 8 May 2015 at 12:29:15 UTC+2, Steven D'Aprano wrote:
> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>
> > Why is array.array('u') deprecated?
> >
> > Will we get an alternative for a character array or mutable unicode
> > string?
>
>
> Good question.
>
> Of the three main encodings for Unicode, two are variable-width:
>
> * UTF-8 uses 1-4 bytes per character
> * UTF-16 uses 2 or 4 bytes per character
>
> while UTF-32 is fixed-width (4 bytes per character). So you could try faking
> it with a 32-bit array and filling it with string.encode('utf-32').


I guess that doesn't work. I need to have something that I can pass to the re 
module for searching through it. Creating new strings all the time is no 
option. (Think about gigabyte strings.)




Re: Why is array.array('u') deprecated?

2015-05-08 Thread Steven D'Aprano
On Fri, 8 May 2015 07:14 pm, jonathan.slend...@gmail.com wrote:

> Why is array.array('u') deprecated?
>
> Will we get an alternative for a character array or mutable unicode
> string?


Good question.

Of the three main encodings for Unicode, two are variable-width: 

* UTF-8 uses 1-4 bytes per character 
* UTF-16 uses 2 or 4 bytes per character

while UTF-32 is fixed-width (4 bytes per character). So you could try faking
it with a 32-bit array and filling it with string.encode('utf-32').
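
A minimal sketch of that idea, assuming Python 3 and a platform where the 'I' typecode is 4 bytes wide (common, but not guaranteed). The names CODEC, to_code_points and to_text are made up for illustration. Note that the plain 'utf-32' codec prepends a byte-order mark, so the sketch uses the explicit native-endian variant to keep one array item per character:

```python
import array
import sys

# Native-endian UTF-32 without a byte-order mark, so each code point
# occupies exactly one array item. (Assumption: plain 'utf-32' would add
# a BOM as an extra leading item, which we do not want here.)
CODEC = "utf-32-le" if sys.byteorder == "little" else "utf-32-be"


def to_code_points(text):
    """Return a mutable array holding one 32-bit item per code point."""
    code_points = array.array("I")
    if code_points.itemsize != 4:
        raise RuntimeError("this sketch assumes a 4-byte 'I' typecode")
    code_points.frombytes(text.encode(CODEC))
    return code_points


def to_text(code_points):
    """Decode the array back into an ordinary (immutable) str."""
    return code_points.tobytes().decode(CODEC)


data = to_code_points("x" * 10)
data[5] = ord("y")      # in-place change; items are integers, not characters
result = to_text(data)
```

Items are read and written as integer code points (hence ord), and getting a str back still means decoding the whole buffer, so this only helps if most of the work can stay on the array itself.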



-- 
Steven



Re: Why is array.array('u') deprecated?

2015-05-08 Thread jonathan . slenders
On Friday, 8 May 2015 at 15:11:56 UTC+2, Peter Otten wrote:
> > So, this works perfectly fine and fast. But it scares me that it's
> > deprecated and Python 4 will not support it anymore.
>
> Hm, this doesn't even work with Python 3:

My mistake. I should have tested better.

> >>> data = array.array("u", u"x"*1000)
> >>> data[100] = "y"
> >>> re.search("y", data)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/usr/lib/python3.4/re.py", line 166, in search
>     return _compile(pattern, flags).search(string)
> TypeError: can't use a string pattern on a bytes-like object
>
> You can search for bytes
>
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> data[101] = "z"
> >>> re.search(b"y", data)
> <_sre.SRE_Match object; span=(400, 401), match=b'y'>
> >>> re.search(b"yz", data)
> >>> re.search(b"y\0\0\0z", data)
> <_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>
>
> but if that is good enough you can use a bytearray in the first place.

Maybe I'll try that. Thanks for the suggestions!

Jonathan


Re: Why is array.array('u') deprecated?

2015-05-08 Thread Peter Otten
jonathan.slend...@gmail.com wrote:

>> Can you expand a bit on how array("u") helps here? Are the matches in the
>> gigabyte range?
>
> I have a string of unicode characters, e.g.:
>
> data = array.array('u', u'x' * 100)
>
> Then I need to change some data in the middle of this string, for
> instance:
>
> data[50] = 'y'
>
> Then I want to use re to search in this text:
>
> re.search('y', data)
>
> This has to be fast. I really don't want to split and concatenate strings.
> Re should be able to process it and the expressions can be much more
> complex than this. (I think it should be anything that implements the
> buffer protocol).
>
> So, this works perfectly fine and fast. But it scares me that it's
> deprecated and Python 4 will not support it anymore.

Hm, this doesn't even work with Python 3:

>>> data = array.array("u", u"x"*1000)
>>> data[100] = "y"
>>> re.search("y", data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/re.py", line 166, in search
    return _compile(pattern, flags).search(string)
TypeError: can't use a string pattern on a bytes-like object

You can search for bytes

>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> data[101] = "z"
>>> re.search(b"y", data)
<_sre.SRE_Match object; span=(400, 401), match=b'y'>
>>> re.search(b"yz", data)
>>> re.search(b"y\0\0\0z", data)
<_sre.SRE_Match object; span=(400, 405), match=b'y\x00\x00\x00z'>

but if that is good enough you can use a bytearray in the first place.
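
A rough sketch of the bytearray route, assuming the text is stored as BOM-less little-endian UTF-32 so that every character takes exactly 4 bytes. CODEC, WIDTH and set_char are hypothetical names used only for this illustration:

```python
import re

CODEC = "utf-32-le"   # fixed width, no byte-order mark (an assumption)
WIDTH = 4             # bytes per character in UTF-32


def set_char(buf, index, char):
    """Overwrite one character in place; the buffer length is unchanged."""
    buf[index * WIDTH:(index + 1) * WIDTH] = char.encode(CODEC)


data = bytearray(("x" * 1000).encode(CODEC))
set_char(data, 100, "y")
set_char(data, 101, "z")

# Encode the needle the same way, then escape it so every byte is literal.
pattern = re.compile(re.escape("yz".encode(CODEC)))
match = pattern.search(data)

byte_span = match.span()
# Convert byte offsets back to character positions.
char_span = (match.start() // WIDTH, match.end() // WIDTH)
```

One caveat: a bytes pattern knows nothing about character boundaries, so in general a match can start at an offset that is not a multiple of WIDTH. Checking match.start() % WIDTH == 0 before trusting a hit is probably wise.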



Re: Why is array.array('u') deprecated?

2015-05-08 Thread Peter Otten
jonathan.slend...@gmail.com wrote:

> On Friday, 8 May 2015 at 12:29:15 UTC+2, Steven D'Aprano wrote:
>> On Fri, 8 May 2015 07:14 pm, jonathan.slenders wrote:
>>
>> > Why is array.array('u') deprecated?
>> >
>> > Will we get an alternative for a character array or mutable unicode
>> > string?
>>
>>
>> Good question.
>>
>> Of the three main encodings for Unicode, two are variable-width:
>>
>> * UTF-8 uses 1-4 bytes per character
>> * UTF-16 uses 2 or 4 bytes per character
>>
>> while UTF-32 is fixed-width (4 bytes per character). So you could try
>> faking it with a 32-bit array and filling it with
>> string.encode('utf-32').
>
>
> I guess that doesn't work. I need to have something that I can pass to the
> re module for searching through it. Creating new strings all the time is
> no option. (Think about gigabyte strings.)

Can you expand a bit on how array("u") helps here? Are the matches in the
gigabyte range?



Re: Why is array.array('u') deprecated?

2015-05-08 Thread jonathan . slenders
> Can you expand a bit on how array("u") helps here? Are the matches in the
> gigabyte range?

I have a string of unicode characters, e.g.:

data = array.array('u', u'x' * 100)

Then I need to change some data in the middle of this string, for instance:

data[50] = 'y'

Then I want to use re to search in this text:

re.search('y', data)

This has to be fast. I really don't want to split and concatenate strings. Re 
should be able to process it and the expressions can be much more complex than 
this. (I think it should be anything that implements the buffer protocol).

So, this works perfectly fine and fast. But it scares me that it's deprecated 
and Python 4 will not support it anymore.
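
On the buffer-protocol point, a small sketch (Python 3 assumed) of what re does accept: any bytes-like object can be searched, as long as the pattern is bytes rather than str. The variable names are only for illustration:

```python
import array
import re

pattern = re.compile(b"y")   # a bytes pattern, not a str pattern

# Three different objects that expose the same bytes via the buffer protocol.
buffers = [
    bytearray(b"xxyxx"),
    memoryview(b"xxyxx"),
    array.array("B", b"xxyxx"),
]

# Each search runs directly on the underlying buffer; offsets are in bytes.
spans = [pattern.search(buf).span() for buf in buffers]
```

So the buffer protocol covers the "no copying" requirement, but the offsets and the pattern are in terms of bytes, not unicode characters.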