Re: Pound sign problem

2017-04-11 Thread Mikhail V
On 10 April 2017 at 15:17, David Shi via Python-list wrote:
> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message 
> persists as follows:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 
> 0: ordinal not in range(128)
>

The error clearly indicates that you have a character that is not part
of the standard ASCII range, hence the message: "ordinal not in range(128)".
To understand it better, think of characters as numbers; basic ASCII
defines only the characters numbered 0 to 127.
See http://www.ascii-code.com/
So the pound character is outside this range: your program reads its
ordinal as 0xa3 in hex (163 in decimal). So *probably* your data was
originally in Latin-1 encoding.
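
As a quick check of the numbers above, here is a small sketch (written to
run on Python 3; the variable names are only illustrative) that shows the
pound sign's ordinal and why the ASCII codec rejects it:

```python
# The pound sign as a one-character text string.
pound = u"\xa3"

# Its code point is 0xa3, i.e. 163 in decimal.
code_point = ord(pound)

# ASCII only covers code points 0..127, so 163 is out of range.
fits_in_ascii = code_point < 128

# Encoding to ASCII therefore raises UnicodeEncodeError.
try:
    pound.encode("ascii")
    error_message = None
except UnicodeEncodeError as exc:
    error_message = str(exc)

print(code_point, hex(code_point), fits_in_ascii)
print(error_message)
```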

First, you should find out where the data comes from: is it a text
file or some other input, and in which application and encoding was
it created?

To get rid of the errors, I'd say there are two common strategies:
ensure that all source data is saved in Unicode (save as UTF-8),
or replace the pound sign with something that is
representable in standard ASCII, e.g. replace the
pound sign with "GBP" in the sources.

Otherwise, you must find out which encoding is
used in the source data and re-encode it
according to the input/output format specification.
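
The three options above might look roughly like this. It is only a
sketch: the sample price string comes from the original question, the
Latin-1 source bytes are an assumption made for illustration, and the
variable names are made up.

```python
price = u"\xa3525,000"

# Strategy 1: store the text as UTF-8.
utf8_bytes = price.encode("utf-8")

# Strategy 2: swap the pound sign for an ASCII stand-in before encoding.
ascii_bytes = price.replace(u"\xa3", u"GBP").encode("ascii")

# Otherwise: decode with the (assumed) source encoding, then re-encode
# in whatever the output format requires.
latin1_bytes = b"\xa3525,000"
reencoded = latin1_bytes.decode("latin-1").encode("utf-8")

print(utf8_bytes, ascii_bytes, reencoded)
```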
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Pound sign problem

2017-04-11 Thread Tim Chase
On 2017-04-12 02:29, Steve D'Aprano wrote:
> >> In 2017, unless you are reading from old legacy files created
> >> using a non-Unicode encoding, you should just use UTF-8.  
> > 
> > Thanks for your opinion. My opinion differs.  
> 
> What would you suggest then, if not UTF-8?
> 
> My personal favourite legacy encoding is MacRoman, but I wouldn't
> recommend anyone use it except to interoperate with legacy Mac
> applications and/or data from the 80s and 90s.
> 
> What's your recommendation? "Anything but ASCII"?

Heh, how about "Unicode as ASCII-compatible-Python-strings"? ;-)

Got this from Peter Otten a while back in response to my request for
functionality something like this.

http://www.mail-archive.com/python-list@python.org/msg420100.html

-tkc



$ cat codecs_mynamereplace.py
# -*- coding: utf-8 -*-
import codecs
import unicodedata

try:
    codecs.namereplace_errors
except AttributeError:
    print("using mynamereplace")
    def mynamereplace(exc):
        return u"".join(
            "\\N{%s}" % unicodedata.name(c)
            for c in exc.object[exc.start:exc.end]
        ), exc.end
    codecs.register_error("namereplace", mynamereplace)


print(u"mañana".encode("ascii", "namereplace").decode())
$ python3.5 codecs_mynamereplace.py
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python3.4 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
$ python2.7 codecs_mynamereplace.py
using mynamereplace
ma\N{LATIN SMALL LETTER N WITH TILDE}ana
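
As the python3.5 run above suggests, the "namereplace" error handler is
built in from Python 3.5 on, so on a recent interpreter the fallback is
not needed. A minimal sketch applied to the price string from the
original question (the variable names are illustrative, and the round
trip through "unicode_escape" assumes the text contains no literal
backslashes of its own):

```python
# Python 3.5+ only: "namereplace" is a built-in error handler there.
price = "\xa3525,000"

# Characters that ASCII cannot represent become \N{...} escapes.
ascii_text = price.encode("ascii", "namereplace").decode("ascii")
print(ascii_text)  # \N{POUND SIGN}525,000

# The escapes can be turned back into the original characters.
round_trip = ascii_text.encode("ascii").decode("unicode_escape")
```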


Re: Pound sign problem

2017-04-11 Thread Steve D'Aprano
On Wed, 12 Apr 2017 02:23 am, Lew Pitcher wrote:

> I recommend whatever encoding is appropriate for the output. 

There are multiple encodings that are appropriate for ASCII + pound sign.
How should the OP choose between them without guidance? If he understood
the issue well enough to make an informed decision, he wouldn't have needed
to ask for help.


> That's not up 
> to you or me to decide; that's a question that only the OP can answer.

Nobody is asking you to *decide*. But you can make a recommendation. Do you
really think that the OP is capable of making an informed decision about
this issue on his own? If he was, he wouldn't have needed to ask for help
solving this problem in the first place.

If you're going to help, actually *help*, and don't just pretend to help:

"Hi, I'm a stranger in town and I'm trying to get to the post office. What's
the best way for me to get there please?"

"Well, that depends on whether you're flying the Space Shuttle, travelling
by sailing ship, dog sled, or advanced alien hyperdrive. You should take
whatever route is most appropriate for your transportation. You're
welcome."

I'm sorry to be so negative when you're only trying to be helpful, but I too
have been on the receiving end of poor-quality "advice" that leaves me just
as much in the dark as before I asked the question, so I'm quite sensitive
to it.

"What should I do here?"

"Do whatever you see fit."

(I'm not specifically referring to this community, just making a general
observation.)



> (Imagine, python on an IBM Zseries running ZOS; 

I can imagine many unlikely things that have come to pass, but that's not
one of them.

The OP is using Pandas, which requires Python 2.7 or better.

https://pypi.python.org/pypi/pandas

There is an unofficial, unmaintained(?), third-party port of Python 2.4 to
Z/OS, which appears to have had no attention for more than a decade.

http://www.teaser.fr/~jymengant/mvspython/mvsPythonPort.html


I suppose it is just barely within the realm of possibility that the OP has
hacked together his own port of Python 2.7 and Pandas to Z/OS. If so, he'd
have already had to deal with some much bigger problems relating to ASCII
versus EBCDIC, and if he managed to solve that, it's unlikely that he'd be
puzzled by a pound sign in his data.

But... even if I grant you your scenario that he's running on Big Iron, that
is irrelevant! Using Unicode for his data files is still the better idea.


> the "native" characterset 
> is one of the EBCDIC variants. Would UTF-8 be a better choice there? )

Yes it would.

The OP is using Unicode strings so regardless of the OS's native character
set, it is better to use Unicode rather than some 8-bit encoding. Today the
OP needs a pound sign. Tomorrow he may need a Greek Σ, yen sign, CJK
ideograph, or Arabic character. Possibly all in the same document. Using
legacy encodings, whether based on EBCDIC or ASCII, should be avoided.
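
To make the "all in the same document" point concrete, here is a small
sketch. The sample characters and the list of legacy codecs (Latin-1,
the EBCDIC code page cp037, MacRoman) are chosen for illustration only.

```python
# Pound, Greek capital sigma, yen, a CJK ideograph and an Arabic letter.
text = u"\xa3 \u03a3 \xa5 \u4e2d \u0639"

# UTF-8 round-trips the whole string.
utf8_ok = text.encode("utf-8").decode("utf-8") == text

# Each of these 8-bit encodings fails on at least one character.
failures = {}
for name in ("ascii", "latin-1", "cp037", "mac_roman"):
    try:
        text.encode(name)
    except UnicodeEncodeError as exc:
        # Record the first character the codec could not represent.
        failures[name] = exc.object[exc.start]

print(utf8_ok, sorted(failures))
```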



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Pound sign problem

2017-04-11 Thread Chris Angelico
On Wed, Apr 12, 2017 at 2:23 AM, Lew Pitcher wrote:
> Chris Angelico wrote:
>
>> On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher wrote:
>>>
>>> What in "Try changing your target encoding to something other than ASCII"
>>> is encouragement to use "old legacy encodings"?
>>>
>>>> In 2017, unless you are reading from old legacy files created using a
>>>> non-Unicode encoding, you should just use UTF-8.
>>>
>>> Thanks for your opinion. My opinion differs.
>>
>> So what encoding *do* you recommend, and why is it better than UTF-8?
>
> I recommend whatever encoding is appropriate for the output. That's not up
> to you or me to decide; that's a question that only the OP can answer.
>
> (Imagine, python on an IBM Zseries running ZOS; the "native" characterset is
> one of the EBCDIC variants. Would UTF-8 be a better choice there? )

So if the OP needed to print out a number, would you take a similarly
spineless approach and say that only the OP can decide what numeric
base to use? Does every fledgeling programmer need to understand about
archaic systems where you needed to use BCD for your numbers? EBCDIC
derives from BCD, where a single decimal digit was encoded in four
bits... and I'm sure you could name systems even less popular, used on
important systems back in the 1960s or so. Does a modern Python
programmer need to look through all of those possible ways to
represent numbers? NO. Today's programmer should need to know about
very few ways to represent numbers, in priority order:

1) Decimal digits represented in ASCII
2) Packed binary, network byte order
3) Packed binary, little-endian.

A new programmer shouldn't need to worry about anything other than
decimal digits, in fact. Of course other systems do exist, like the
MIDI "variable length integer" that packs seven bits into a byte and
then uses the high bit as a continuation marker; or IEEE 80-bit
floating point, or a multi-limb format like GMP uses, but until you
actually need to work with it, you don't need to know about it.
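
For the curious, the number representations listed above can be
sketched with the standard library. The value 525000 and the helper
name midi_vlq are illustrative choices, not anything from the thread.

```python
import struct

n = 525000

# 1) Decimal digits represented in ASCII.
decimal_ascii = str(n).encode("ascii")

# 2) Packed binary, network (big-endian) byte order, 32-bit unsigned.
big_endian = struct.pack(">I", n)

# 3) Packed binary, little-endian, 32-bit unsigned.
little_endian = struct.pack("<I", n)


def midi_vlq(value):
    """Encode a non-negative int as a MIDI variable-length quantity."""
    groups = [value & 0x7F]          # last byte: high bit clear
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)  # earlier bytes: high bit set
        value >>= 7
    return bytes(bytearray(reversed(groups)))


print(decimal_ascii, big_endian, little_endian, midi_vlq(n))
```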

Just use the one most obvious encoding. UTF-8 for all text.

ChrisA


Re: Pound sign problem

2017-04-11 Thread Steve D'Aprano
On Wed, 12 Apr 2017 01:24 am, Lew Pitcher wrote:

[...]
>>> There is no "pound sign" in ASCII[1]. Try changing your target encoding
>>> to something other than ASCII.
>> 
>> Please don't encourage the use of old legacy encodings.
> 
> I wonder if you actually read my reply.

Of course I did.
 

> What in "Try changing your target encoding to something other than ASCII"
> is encouragement to use "old legacy encodings"?

The fact that "something other than ASCII" includes dozens of old legacy
encodings, including the most obvious one for Western Europeans coming from
a Windows environment: Latin-1.

There are only three practical choices for text: ASCII, Unicode, and legacy
encodings (or "code pages", as many people know them). TRON is effectively
only available in Japan, and even there hardly anyone uses it. (And
besides, Python doesn't support TRON.)

You've (rightly) eliminated ASCII, as the pound sign isn't available. Python
doesn't support TRON, so your instruction to the OP is logically equivalent
to "use Unicode or a legacy encoding". It's the second half of that which I
am objecting to.
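
A small illustration of why "something other than ASCII" is ambiguous:
the same pound sign comes out as different bytes depending on which
encoding is picked. The selection of codecs here is only an example.

```python
pound = u"\xa3"

as_latin1 = pound.encode("latin-1")  # one byte, 0xa3
as_cp437 = pound.encode("cp437")     # one byte, but a different one
as_utf8 = pound.encode("utf-8")      # two bytes

print(as_latin1, as_cp437, as_utf8)
```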



>> In 2017, unless you are reading from old legacy files created using a
>> non-Unicode encoding, you should just use UTF-8.
> 
> Thanks for your opinion. My opinion differs.

What would you suggest then, if not UTF-8?

My personal favourite legacy encoding is MacRoman, but I wouldn't recommend
anyone use it except to interoperate with legacy Mac applications and/or
data from the 80s and 90s.

What's your recommendation? "Anything but ASCII"?




-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Pound sign problem

2017-04-11 Thread Lew Pitcher
Chris Angelico wrote:

> On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher wrote:
>>
>> What in "Try changing your target encoding to something other than ASCII"
>> is encouragement to use "old legacy encodings"?
>>
>>> In 2017, unless you are reading from old legacy files created using a
>>> non-Unicode encoding, you should just use UTF-8.
>>
>> Thanks for your opinion. My opinion differs.
> 
> So what encoding *do* you recommend, and why is it better than UTF-8?

I recommend whatever encoding is appropriate for the output. That's not up 
to you or me to decide; that's a question that only the OP can answer.

(Imagine, python on an IBM Zseries running ZOS; the "native" characterset is 
one of the EBCDIC variants. Would UTF-8 be a better choice there? )


-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request




Re: Pound sign problem

2017-04-11 Thread Chris Angelico
On Wed, Apr 12, 2017 at 1:24 AM, Lew Pitcher wrote:
>
> What in "Try changing your target encoding to something other than ASCII" is
> encouragement to use "old legacy encodings"?
>
>> In 2017, unless you are reading from old legacy files created using a
>> non-Unicode encoding, you should just use UTF-8.
>
> Thanks for your opinion. My opinion differs.

So what encoding *do* you recommend, and why is it better than UTF-8?

ChrisA


Re: Pound sign problem

2017-04-11 Thread Lew Pitcher
Steve D'Aprano wrote:

> On Tue, 11 Apr 2017 12:50 am, Lew Pitcher wrote:
> 
>> David Shi wrote:
>> 
>>> In the data set, pound sign escape appears:
>>> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> 
> That looks like David is using Python 2.
> 
>>> When using table.to_csv after importing pandas as pd, an error message
>>> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
>>> character u'\xa3' in position 0: ordinal not in range(128)
>> 
>> There is no "pound sign" in ASCII[1]. Try changing your target encoding
>> to something other than ASCII.
> 
> Please don't encourage the use of old legacy encodings.

I wonder if you actually read my reply.

What in "Try changing your target encoding to something other than ASCII" is 
encouragement to use "old legacy encodings"?

> In 2017, unless you are reading from old legacy files created using a
> non-Unicode encoding, you should just use UTF-8.

Thanks for your opinion. My opinion differs.


-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request




Re: Pound sign problem

2017-04-10 Thread Steve D'Aprano
On Tue, 11 Apr 2017 12:50 am, Lew Pitcher wrote:

> David Shi wrote:
> 
>> In the data set, pound sign escape appears:
>> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',

That looks like David is using Python 2.

>> When using table.to_csv after importing pandas as pd, an error message
>> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
>> character u'\xa3' in position 0: ordinal not in range(128)
> 
> There is no "pound sign" in ASCII[1]. Try changing your target encoding to
> something other than ASCII.

Please don't encourage the use of old legacy encodings.

In 2017, unless you are reading from old legacy files created using a
non-Unicode encoding, you should just use UTF-8.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.



Re: Pound sign problem

2017-04-10 Thread Lew Pitcher
David Shi wrote:

> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message
> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
> character u'\xa3' in position 0: ordinal not in range(128)

There is no "pound sign" in ASCII[1]. Try changing your target encoding to 
something other than ASCII.

[1]: See http://std.dkuug.dk/i18n/charmaps/ascii for a list of valid ASCII 
values.

-- 
Lew Pitcher
"In Skills, We Trust"
PGP public key available upon request




Re: Pound sign problem

2017-04-10 Thread Peter Otten
David Shi via Python-list wrote:

> In the data set, pound sign escape appears:
> u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
> When using table.to_csv after importing pandas as pd, an error message
> persists as follows: UnicodeEncodeError: 'ascii' codec can't encode
> character u'\xa3' in position 0: ordinal not in range(128)

The default encoding in Python 2 is ascii, and the pound sign is not part of 
that.

> Can anyone help?

Specify an alternative encoding, preferably UTF-8:

>>> import pandas
>>> df = pandas.DataFrame([[u"\xa3123"], [u"\xa3321"]], columns=["Price"])
>>> df
  Price
0  £123
1  £321

[2 rows x 1 columns]
>>> df.to_csv("tmp.csv", encoding="utf-8")
>>> 
$ cat tmp.csv
,Price
0,£123
1,£321
$ 
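
For comparison, a similar file can be written without pandas by naming
the encoding when the file is opened. This is a Python 3 sketch only
(the Python 2 csv module works on bytes and would need different
handling); the temporary directory and variable names are illustrative.

```python
import csv
import io
import os
import tempfile

rows = [[u"Price"], [u"\xa3123"], [u"\xa3321"]]
path = os.path.join(tempfile.mkdtemp(), "tmp.csv")

# Name the encoding explicitly; newline="" is what the csv module expects.
with io.open(path, "w", encoding="utf-8", newline="") as handle:
    csv.writer(handle).writerows(rows)

# The pound sign is stored as the two UTF-8 bytes 0xc2 0xa3.
with io.open(path, "rb") as handle:
    raw = handle.read()

# Reading back with the same encoding recovers the original text.
with io.open(path, "r", encoding="utf-8", newline="") as handle:
    read_back = list(csv.reader(handle))

print(raw)
print(read_back == rows)
```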




Re: Pound sign problem

2017-04-10 Thread Ben Finney
David Shi via Python-list writes:

> When using table.to_csv after importing pandas as pd

I don't know much about that library. What does its documentation say
for the ‘table.to_csv’ function?

Can you write a *very short* complete example, that we can run to
demonstrate the same behaviour you are seeing?

> an error message persists as follows:
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 
> 0: ordinal not in range(128)

This means the function has been told (or is assuming, in the absence of
better information) that the input data is in the ‘ascii’ text encoding.

That assumption turns out to be incorrect, for the actual data you have.
So that error occurs.

You will need to:

* Find out exactly what text encoding was used to write the file. Don't
  guess, because there are many ways to be wrong.

* Specify that encoding to the ‘table.to_csv’ function, or to whatever
  function opens the file. (This might be the Python built-in ‘open’
  function, but we'd need to see your short example to know.)
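
A short sketch of what guessing wrong can look like, assuming for
illustration that the same price string was saved once as UTF-8 and
once as Latin-1:

```python
price = u"\xa3525,000"

# UTF-8 bytes read as Latin-1: no error, but the text is silently wrong.
utf8_bytes = price.encode("utf-8")
wrong_guess = utf8_bytes.decode("latin-1")

# Latin-1 bytes read as UTF-8: this time the mistake raises an error.
latin1_bytes = price.encode("latin-1")
try:
    latin1_bytes.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(repr(wrong_guess), decode_failed)
```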

-- 
 \“Most people, I think, don't even know what a rootkit is, so |
  `\ why should they care about it?” —Thomas Hesse, Sony BMG, 2006 |
_o__)  |
Ben Finney



Pound sign problem

2017-04-10 Thread David Shi via Python-list
In the data set, pound sign escape appears:
u'price_currency': u'\xa3', u'price_formatted': u'\xa3525,000',
When using table.to_csv after importing pandas as pd, an error message persists 
as follows:
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa3' in position 0: 
ordinal not in range(128)

Can anyone help?
Regards.
David