numpy 00 character bug?

Nathaniel Rook Fri, 05 Jun 2009 09:16:22 -0700

Hello, all!

I've recently encountered a bug in NumPy's string arrays, where the 00 ASCII character ('\x00') is not stored properly when put at the end of a string.


For example:

Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> print numpy.version.version
1.3.0
>>> arr = numpy.empty(1, 'S2')
>>> arr[0] = 'ab'
>>> arr
array(['ab'],
      dtype='|S2')
>>> arr[0] = 'c\x00'
>>> arr
array(['c'],
      dtype='|S2')

It seems that the string array is using the 00 character to pad strings smaller than the maximum size, and thus is treating any 00 characters at the end of a string as padding. Obviously, as long as I don't use smaller strings, there is no information lost here, but I don't want to have to re-add my 00s each time I ask the array what it is holding.
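To illustrate the padding behaviour described above, here is a minimal sketch written for current Python 3 and NumPy rather than the 2.5 session shown earlier. It suggests that the trailing 00 byte is still present in the array's underlying buffer and is only dropped when an element is read back. The helper `raw_item` is a hypothetical name, not part of NumPy, and it assumes a contiguous one-dimensional 'S' array.

```python
import numpy as np

# Two fixed-width 2-byte string slots, zero-initialised.
arr = np.zeros(2, dtype='S2')
arr[0] = b'ab'
arr[1] = b'c\x00'

# Reading an element back drops trailing 00 bytes.
print(arr[1])          # b'c'

# The raw buffer still holds all four bytes, including the trailing 00.
print(arr.tobytes())   # b'abc\x00'


def raw_item(a, i):
    """Hypothetical helper: return the full fixed-width bytes of element i.

    Assumes `a` is a contiguous 1-D array with an 'S' dtype.
    """
    width = a.dtype.itemsize
    # Reinterpret the buffer as single bytes, one row per element.
    return a.view(np.uint8).reshape(-1, width)[i].tobytes()


print(raw_item(arr, 1))  # b'c\x00'
```

Note that this cannot tell a stored trailing 00 apart from padding: a shorter string such as 'c' written into an 'S2' slot would come back from `raw_item` as the same two bytes.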

Is this a well-known bug already? I couldn't find it on the NumPy bug tracker, but I could have easily missed it, or it could be triaged, deemed acceptable because there's no better way to deal with arbitrary-length strings. Is there an easy way to avoid this problem? Pretty much any performance-intensive part of my program is going to be dealing with these arrays, so I don't want to just replace them with a slower dictionary instead.

I can't imagine this issue hasn't come up before; I encountered it by using NumPy arrays to store Python structs, something I can imagine is done fairly often. As such, I apologize for bringing it up again!


Nathaniel
--
http://mail.python.org/mailman/listinfo/python-list
