Re: Best way to deal with different data types in a list comprehension

2014-09-24 Thread Steven D'Aprano
Larry Martell wrote:

> I have some code that I inherited:
>
> ' '.join([self.get_abbrev()] +
>            [str(f['value')
>             for f in self.filters
>             if f.has_key('value')]).strip()
>
>
> This broke today when it encountered some non-ascii data.

It's already broken. It gives a Syntax Error part way through:

py> ' '.join([self.get_abbrev()] +
...            [str(f['value')
  File "<stdin>", line 2
    [str(f['value')
                  ^
SyntaxError: invalid syntax

Please copy and paste the actual code, don't retype it.

This is my guess at what you actually have, reformatted to make it clearer
(at least to me):

' '.join(
    [self.get_abbrev()] +
    [str(f['value']) for f in self.filters if f.has_key('value')]
    ).strip()

I *think* that the call to strip() is redundant. Hmmm... perhaps not, if the
self.get_abbrev() begins with whitespace, or the last f['value'] ends with
whitespace. You should consider removing that call to .strip(), but for now
I'll assume it actually is useful and leave it in.
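
A small sketch of when the .strip() call matters. The values below are made
up, standing in for self.get_abbrev() and the f['value'] items:

```python
# Hypothetical pieces: the first begins with whitespace, the last ends with it.
parts = [' abbrev', 'x', 'y ']

joined = ' '.join(parts)
stripped = joined.strip()

print(repr(joined))    # ' abbrev x y '
print(repr(stripped))  # 'abbrev x y'
```

join() only adds separators between items, so any leading or trailing
whitespace can only come from the first or last item itself.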

First change: assuming the filters are dicts, do the test this way:

' '.join(
    [self.get_abbrev()] +
    [str(f['value']) for f in self.filters if 'value' in f]
    ).strip()
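
As a rough illustration of the membership test, assuming the filters really
are plain dicts (the sample data is invented):

```python
# Hypothetical filters; only some of them carry a 'value' key.
filters = [{'value': 42}, {'name': 'no value here'}, {'value': u'\u03a9\u03c0'}]

# "key in dict" works on Python 2 and 3; dict.has_key() was removed in Python 3.
values = [f['value'] for f in filters if 'value' in f]

print(len(values))  # 2
```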


Now, the *right* way to fix your problem is to convert the whole application
to use unicode strings everywhere instead of byte strings. I'm guessing you
are using Python 2.6 or 2.7.

You say it broke when given some non-ascii data, but that's extremely
ambiguous. {23: 42} is non-ascii data. What exactly do you have, and where
did it come from?

My *guess* is that you had a Unicode string, containing characters which
cannot be converted to ASCII.

py> str(u'Ωπ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)


> I changed the str(f['value']) line to f['value'].encode('utf-8'),

Hmmm, I'm not sure that's a good plan:

py> u'Ωπ'.encode('utf-8')
'\xce\xa9\xcf\x80'

Do you really want to find arbitrary bytes floating through your strings? A
better strategy is to convert the program to use unicode strings
internally, and only convert to byte strings when you read and write to
files.
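
A minimal sketch of that strategy, assuming the data ends up in a UTF-8 text
file. The temporary path is invented for the demonstration; io.open() behaves
the same on Python 2.6+ and Python 3:

```python
import io
import os
import tempfile

text = u'\u03a9\u03c0'  # u'Ωπ', kept as a unicode string inside the program

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Encode only at the boundary: io.open() encodes the text on write...
with io.open(path, 'w', encoding='utf-8') as fh:
    fh.write(text)

# ...and decodes on read, so the rest of the code never handles raw bytes.
with io.open(path, 'r', encoding='utf-8') as fh:
    round_tripped = fh.read()

# Opening in binary mode shows the bytes that actually landed on disk.
with io.open(path, 'rb') as fh:
    raw = fh.read()
```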

But assuming you don't have the time or budget for that sort of re-write,
here's a minimal change which might do the job:

u' '.join(
    [self.get_abbrev()] +
    [unicode(f['value']) for f in self.filters if 'value' in f]
    ).strip()


That works correctly for random objects and ASCII byte strings:

py> unicode([1, 2, 3])
u'[1, 2, 3]'
py> unicode('bytes')
u'bytes'


Alas, it will fail for non-ASCII byte strings:

py> unicode('bytes \xFF')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: ordinal not in range(128)
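
One possible way around that failure, sketched under the assumption that you
know (or are willing to guess) the encoding of the incoming bytes; nothing in
the thread says what that encoding actually is:

```python
raw = b'bytes \xff'

# The 'replace' error handler swaps each undecodable byte for U+FFFD
# instead of raising UnicodeDecodeError.
lenient = raw.decode('utf-8', 'replace')

# Latin-1 maps every byte to a code point, so decoding never fails, but the
# result is only meaningful if the data really is Latin-1.
as_latin1 = raw.decode('latin-1')

print(repr(lenient))
print(repr(as_latin1))
```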


Here's a version which prefers byte-strings, but should be able to handle
everything you throw at it:

' '.join(
    [self.get_abbrev()] +
    [
     (x.encode('utf-8') if isinstance(x, unicode) else str(x))
     for x in (f['value'] for f in self.filters if 'value' in f)
    ]
    ).strip()



Note the use of a generator expression inside the list comp.
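
If the conditional gets hard to read inline, the same idea can be moved into
a small function while keeping the list comprehension. This is only a sketch:
to_bytes and text_type are invented names, not part of the original code, and
the sample values stand in for the f['value'] items:

```python
text_type = type(u'')  # unicode on Python 2, str on Python 3


def to_bytes(value, encoding='utf-8'):
    """Return value as a byte string (hypothetical helper)."""
    if isinstance(value, text_type):
        return value.encode(encoding)   # text: encode it
    if isinstance(value, bytes):
        return value                    # already bytes: leave untouched
    text = str(value)                   # anything else: use its str() form
    return text if isinstance(text, bytes) else text.encode(encoding)


# Invented sample values: text, non-ASCII bytes, an int and a list.
values = [u'\u03a9\u03c0', b'bytes \xff', 42, [1, 2]]
joined = b' '.join([to_bytes(v) for v in values]).strip()
print(repr(joined))
```

Non-ASCII byte strings pass through unchanged, so the result can still mix
encodings if the inputs do.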


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list


Best way to deal with different data types in a list comprehension

2014-09-23 Thread Larry Martell
I have some code that I inherited:

' '.join([self.get_abbrev()] +
   [str(f['value')
for f in self.filters
if f.has_key('value')]).strip()


This broke today when it encountered some non-ascii data.

I changed the str(f['value']) line to f['value'].encode('utf-8'),
which works fine, except when f['value'] is not a string (it could be
anything).

Without rewriting this to drop the list comprehension, how can I write
this to deal with both strings and non-strings?


Re: Best way to deal with different data types in a list comprehension

2014-09-23 Thread Rock Neurotiko
Maybe there is a different way, but you can do this:

' '.join([self.get_abbrev()] +
         [str(f['value']).encode('utf-8') if type(f['value']) is str else
          str(f['value'])
          for f in self.filters
          if f.has_key('value')]).strip()

2014-09-24 0:01 GMT+02:00 Larry Martell <larry.mart...@gmail.com>:

> I have some code that I inherited:
>
> ' '.join([self.get_abbrev()] +
>            [str(f['value')
>             for f in self.filters
>             if f.has_key('value')]).strip()
>
>
> This broke today when it encountered some non-ascii data.
>
> I changed the str(f['value']) line to f['value'].encode('utf-8'),
> which works fine, except when f['value'] is not a string (it could be
> anything).
>
> Without rewriting this to drop the list comprehension, how can I write
> this to deal with both strings and non-strings?




-- 
Miguel García Lafuente - Rock Neurotiko


Re: Best way to deal with different data types in a list comprehension

2014-09-23 Thread Chris Kaynor
On Tue, Sep 23, 2014 at 3:01 PM, Larry Martell <larry.mart...@gmail.com>
wrote:

> I have some code that I inherited:
>
> ' '.join([self.get_abbrev()] +
>            [str(f['value')
>             for f in self.filters
>             if f.has_key('value')]).strip()
>
>
> This broke today when it encountered some non-ascii data.


One option would be to do the processing in unicode, and convert to utf-8
only when needed:

u' '.join([self.get_abbrev()] +
          [unicode(f['value'])
           for f in self.filters
           if f.has_key('value')]).strip()

If needed, add a .encode('utf-8') to the end.



> I changed the str(f['value']) line to f['value'].encode('utf-8'),
> which works fine, except when f['value'] is not a string (it could be
> anything).
>
> Without rewriting this to drop the list comprehension, how can I write
> this to deal with both strings and non-strings?


Re: Best way to deal with different data types in a list comprehension

2014-09-23 Thread Larry Martell
On Tue, Sep 23, 2014 at 6:05 PM, Rock Neurotiko
<miguelglafue...@gmail.com> wrote:
> 2014-09-24 0:01 GMT+02:00 Larry Martell <larry.mart...@gmail.com>:
>>
>> I have some code that I inherited:
>>
>> ' '.join([self.get_abbrev()] +
>>            [str(f['value')
>>             for f in self.filters
>>             if f.has_key('value')]).strip()
>>
>>
>> This broke today when it encountered some non-ascii data.
>>
>> I changed the str(f['value']) line to f['value'].encode('utf-8'),
>> which works fine, except when f['value'] is not a string (it could be
>> anything).
>>
>> Without rewriting this to drop the list comprehension, how can I write
>> this to deal with both strings and non-strings?
>
> Maybe there is a different way, but you can do this:
>
> ' '.join([self.get_abbrev()] +
>          [str(f['value']).encode('utf-8') if type(f['value']) is str else
>           str(f['value'])
>           for f in self.filters
>           if f.has_key('value')]).strip()

Thanks for the reply, but please don't top post.

This worked for me:

' '.join([self.get_abbrev()] +
         [f['value'].encode('utf-8') if type(f['value']) is unicode
          else str(f['value'])
          for f in self.filters
          if f.has_key('value')]).strip()