Re: Best way to deal with different data types in a list comprehension
Larry Martell wrote:

> I have some code that I inherited:
>
>     ' '.join([self.get_abbrev()] + [str(f['value') for f in self.filters if f.has_key('value')]).strip()
>
> This broke today when it encountered some non-ascii data.

It's already broken. It gives a SyntaxError partway through:

py> ' '.join([self.get_abbrev()] +
... [str(f['value')
  File "<stdin>", line 2
    [str(f['value')
                  ^
SyntaxError: invalid syntax

Please copy and paste the actual code; don't retype it. This is my guess at what you actually have, reformatted to make it clearer (at least to me):

    ' '.join(
        [self.get_abbrev()] +
        [str(f['value']) for f in self.filters if f.has_key('value')]
        ).strip()

I *think* that the call to strip() is redundant. Hmmm... perhaps not, if self.get_abbrev() begins with whitespace, or the last f['value'] ends with whitespace. You should consider removing that call to .strip(), but for now I'll assume it actually is useful and leave it in.

First change: assuming the filters are dicts, do the test this way:

    ' '.join(
        [self.get_abbrev()] +
        [str(f['value']) for f in self.filters if 'value' in f]
        ).strip()

Now, the *right* way to fix your problem is to convert the whole application to use unicode strings everywhere instead of byte strings. I'm guessing you are using Python 2.6 or 2.7.

You say it broke when given some non-ascii data, but that's extremely ambiguous. {23: 42} is non-ascii data. What exactly do you have, and where did it come from?

My *guess* is that you had a Unicode string containing characters which cannot be converted to ASCII:

py> str(u'Ωπ')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)

> I changed the str(f['value']) line to f['value'].encode('utf-8'),

Hmmm, I'm not sure that's a good plan:

py> u'Ωπ'.encode('utf-8')
'\xce\xa9\xcf\x80'

Do you really want to find arbitrary bytes floating through your strings? A better strategy is to convert the program to use unicode strings internally, and only convert to byte strings when you read from and write to files.

But assuming you don't have the time or budget for that sort of rewrite, here's a minimal change which might do the job:

    u' '.join(
        [self.get_abbrev()] +
        [unicode(f['value']) for f in self.filters if 'value' in f]
        ).strip()

That works correctly for random objects and ASCII byte strings:

py> unicode([1, 2, 3])
u'[1, 2, 3]'
py> unicode('bytes')
u'bytes'

Alas, it will fail for non-ASCII byte strings:

py> unicode('bytes \xFF')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 6: ordinal not in range(128)

Here's a version which prefers byte strings, but should be able to handle everything you throw at it:

    ' '.join(
        [self.get_abbrev()] +
        [
          (x.encode('utf-8') if isinstance(x, unicode) else x)
          for x in (f['value'] for f in self.filters if 'value' in f)
        ]
        ).strip()

Note the use of a generator expression inside the list comp.

-- 
Steven
-- 
https://mail.python.org/mailman/listinfo/python-list
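For readers on Python 3, where str is already Unicode text and bytes is a separate type, the same "handle whatever comes in" idea might be sketched as below. This is only an illustration under stated assumptions: the `to_text` and `describe` names and the sample `filters` data are hypothetical stand-ins for the code in the thread, and treating stray byte strings as UTF-8 (with undecodable bytes replaced) is a guess about the data, not something the thread confirms.

```python
def to_text(value, encoding="utf-8"):
    """Return value as text: decode byte strings, str() everything else."""
    if isinstance(value, bytes):
        # Assumption: byte strings are UTF-8; bad bytes become U+FFFD
        # instead of raising UnicodeDecodeError.
        return value.decode(encoding, errors="replace")
    return str(value)


def describe(abbrev, filters):
    # Hypothetical stand-in for the method discussed in the thread.
    # A generator expression feeds the list comprehension, as in the
    # byte-string version above.
    values = (f["value"] for f in filters if "value" in f)
    return " ".join([to_text(abbrev)] + [to_text(v) for v in values]).strip()


filters = [
    {"value": "Ωπ"},            # text
    {"value": 42},              # not a string at all
    {"name": "no value key"},   # skipped by the 'value' in f test
    {"value": b"caf\xc3\xa9"},  # UTF-8 encoded bytes
]
result = describe("abbr", filters)
print(result)
```

Every item is turned into a string before it reaches join, which matters because join rejects non-string items such as integers.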
Best way to deal with different data types in a list comprehension
I have some code that I inherited:

    ' '.join([self.get_abbrev()] + [str(f['value') for f in self.filters if f.has_key('value')]).strip()

This broke today when it encountered some non-ascii data.

I changed the str(f['value']) line to f['value'].encode('utf-8'), which works fine, except when f['value'] is not a string (it could be anything). Without giving up the list comprehension, how can I write this to deal with both strings and non-strings?
Re: Best way to deal with different data types in a list comprehension
Maybe there is a different way, but you can do this:

    ' '.join([self.get_abbrev()] + [str(f['value']).encode('utf-8') if type(f['value']) is str else str(f['value']) for f in self.filters if f.has_key('value')]).strip()

2014-09-24 0:01 GMT+02:00 Larry Martell larry.mart...@gmail.com:

> I have some code that I inherited:
>
>     ' '.join([self.get_abbrev()] + [str(f['value') for f in self.filters if f.has_key('value')]).strip()
>
> This broke today when it encountered some non-ascii data.
>
> I changed the str(f['value']) line to f['value'].encode('utf-8'), which works fine, except when f['value'] is not a string (it could be anything). Without giving up the list comprehension, how can I write this to deal with both strings and non-strings?

-- 
Miguel García Lafuente - Rock Neurotiko
Re: Best way to deal with different data types in a list comprehension
On Tue, Sep 23, 2014 at 3:01 PM, Larry Martell larry.mart...@gmail.com wrote:

> I have some code that I inherited:
>
>     ' '.join([self.get_abbrev()] + [str(f['value') for f in self.filters if f.has_key('value')]).strip()
>
> This broke today when it encountered some non-ascii data.

One option would be to do the processing in unicode, and convert to utf-8 only when needed:

    u' '.join([self.get_abbrev()] + [unicode(f['value']) for f in self.filters if f.has_key('value')]).strip()

If needed, add a .encode('utf-8') to the end.

> I changed the str(f['value']) line to f['value'].encode('utf-8'), which works fine, except when f['value'] is not a string (it could be anything). Without giving up the list comprehension, how can I write this to deal with both strings and non-strings?
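The "process in unicode, encode only when needed" suggestion could look roughly like this on Python 3, where str already is the Unicode type. This is a sketch, not the original application code: `build_label`, the sample data, and the in-memory `BytesIO` stream (standing in for a file or socket) are all assumptions made for illustration.

```python
import io


def build_label(abbrev, filters):
    # Everything inside the program stays as text (str).
    parts = [abbrev] + [str(f["value"]) for f in filters if "value" in f]
    return " ".join(parts).strip()


label = build_label("abbr", [{"value": "Ωπ"}, {"value": 3.5}])

# Encode exactly once, at the boundary where bytes are required.
# BytesIO is a stand-in for a real binary file or network connection.
stream = io.BytesIO()
stream.write(label.encode("utf-8"))
encoded = stream.getvalue()
print(repr(encoded))
```

Keeping the encode call at the edge means the joining code never has to care which encoding the output uses.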
Re: Best way to deal with different data types in a list comprehension
On Tue, Sep 23, 2014 at 6:05 PM, Rock Neurotiko miguelglafue...@gmail.com wrote:

> 2014-09-24 0:01 GMT+02:00 Larry Martell larry.mart...@gmail.com:
>
>> I have some code that I inherited:
>>
>>     ' '.join([self.get_abbrev()] + [str(f['value') for f in self.filters if f.has_key('value')]).strip()
>>
>> This broke today when it encountered some non-ascii data.
>>
>> I changed the str(f['value']) line to f['value'].encode('utf-8'), which works fine, except when f['value'] is not a string (it could be anything). Without giving up the list comprehension, how can I write this to deal with both strings and non-strings?
>
> Maybe there is a different way, but you can do this:
>
>     ' '.join([self.get_abbrev()] + [str(f['value']).encode('utf-8') if type(f['value']) is str else str(f['value']) for f in self.filters if f.has_key('value')]).strip()

Thanks for the reply, but please don't top post.

This worked for me:

    ' '.join([self.get_abbrev()] + [f['value'].encode('utf-8') if type(f['value']) is unicode else str(f['value']) for f in self.filters if f.has_key('value')]).strip()
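One caveat with an exact type comparison such as type(x) is unicode: it does not match subclasses, whereas an isinstance() check does. The small Python 3 sketch below illustrates the difference, with str playing the role that unicode has in Python 2. The `Label` subclass and both helper functions are hypothetical, and whether the values in this application are ever subclass instances is not known from the thread.

```python
class Label(str):
    """Hypothetical str subclass, e.g. a string type from a third-party library."""


def is_text_exact(value):
    # Exact comparison, in the style of `type(x) is unicode`.
    return type(value) is str


def is_text(value):
    # isinstance() also accepts subclasses of str.
    return isinstance(value, str)


plain = "Ωπ"
wrapped = Label("Ωπ")

print(is_text_exact(plain), is_text(plain))
print(is_text_exact(wrapped), is_text(wrapped))
```

With the exact check, a subclass instance would fall through to the non-string branch, so isinstance() is usually the more forgiving choice here.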