Re: python math problem

2013-02-15 Thread John Machin
On Feb 16, 6:39 am, Kene Meniru  wrote:

> x = (math.sin(math.radians(angle)) * length)
> y = (math.cos(math.radians(angle)) * length)

A suggestion about coding style:

from math import sin, cos, radians # etc etc
x = sin(radians(angle)) * length
y = cos(radians(angle)) * length

... easier to write, easier to read.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-12 Thread John Machin
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:

>
> So, the UTF-16 UTF-32 is INTERNAL only, for Python

NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
encodings for the EXTERNAL representation of Unicode characters in byte
streams.

> I also was not aware that UTF-8 chars could be up to six(6) byes long
> from left to right.

It could be, once upon a time in ISO faerieland, when it was thought that
the code space could grow to 2**31 codepoints. However ISO and the Unicode
consortium have agreed that 17 planes is the utter max, and accordingly a
valid UTF-8 byte sequence can be no longer than 4 bytes ... see below

>>> chr(17 * 65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>> chr(17 * 65536 - 1)
'\U0010ffff'
>>> _.encode('utf8')
b'\xf4\x8f\xbf\xbf'
>>> b'\xf5\x8f\xbf\xbf'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python32\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0:
invalid start byte


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
>
> If the file you're writing to doesn't specify an encoding, Python will
> default to locale.getdefaultencoding(),

No such attribute. Perhaps you mean locale.getpreferredencoding()



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote:
> By
> default it looks like Python3 is writing output with UTF-8 as default...
> and I thought that by default Python3 was using either UTF-16 or UTF-32.
> So, I'm confused here...  also, I used the character sequence \u00A3
> which I thought was UTF-16... but Python3 changed my intent to  'c2a3'
> which is the normal UTF-8...

Python uses either a 16-bit or a 32-bit INTERNAL representation of Unicode
code points. Those NN bits have nothing to do with the UTF-NN encodings,
which can be used to encode the codepoints as byte sequences for EXTERNAL
purposes. In your case, UTF-8 has been used as it is the default encoding
on your platform.
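
For example (Python 3; the byte values are what CPython produces):

    ch = "\u00A3"              # one code point, POUND SIGN
    ch.encode("utf-8")         # b'\xc2\xa3'            -- 2 bytes
    ch.encode("utf-16-le")     # b'\xa3\x00'            -- 2 bytes, no BOM
    ch.encode("utf-32-le")     # b'\xa3\x00\x00\x00'    -- 4 bytes, no BOM

Same code point, three different external byte sequences; none of them is
"the" character.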

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 11:22 am, harrismh777 wrote:
> John Machin wrote:
>> (1) You cannot work without using bytes sequences. Files are byte
>> sequences. Web communication is in bytes. You need to (know / assume /
>> be
>> able to extract / guess) the input encoding. You need to encode your
>> output using an encoding that is expected by the consumer (or use an
>> output method that will do it for you).
>>
>> (2) You don't need to use bytes to specify a Unicode code point. Just
>> use
>> an escape sequence e.g. "\u0404" is a Cyrillic character.
>>
>
> Thanks John.  In reverse order, I understand point (2). I'm less clear
> on point (1).
>
> If I generate a string of characters that I presume to be ascii/utf-8
> (no \u0404 type characters)
> and write them to a file (stdout) how does
> default encoding affect that file.by default..?   I'm not seeing that
> there is anything unusual going on...

About """characters that I presume to be ascii/utf-8 (no \u0404 type
characters)""": All Unicode characters (including U+0404) are encodable in
bytes using UTF-8.

The result of sys.stdout.write(unicode_characters) to a TERMINAL depends
mostly on sys.stdout.encoding. This is likely to be UTF-8 on a Linux or
OS X platform. On a typical American / Western European / [former]
colonies Windows box, this is likely to be cp850 on a Command Prompt
window, and cp1252 in IDLE.

UTF-8: All Unicode characters are encodable in UTF-8. Only problem arises
if the terminal can't render the character -- you'll get spaces or blobs
or boxes with hex digits in them or nothing.

Windows (Command Prompt window): only a small subset of characters can be
encoded in e.g. cp850; anything else causes an exception.

Windows (IDLE): ignores sys.stdout.encoding and renders the characters
itself. Same outcome as *x/UTF-8 above.

If you write directly (or sys.stdout is redirected) to a FILE, the default
encoding is obtained by sys.getdefaultencoding() and is AFAIK ascii unless
the machine's site.py has been fiddled with to make it UTF-8 or something
else.
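
A quick way to see which of those encodings is in play (Python 2, as in
your session):

    import sys
    print sys.stdout.encoding        # terminal/IDLE encoding; None if redirected
    print sys.getdefaultencoding()   # 'ascii' unless site.py has been fiddled with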

>   If I open the file with vi?  If
> I open the file with gedit?  emacs?

Any editor will have a default encoding; if that doesn't match the file
encoding, you have a (hopefully obvious) problem if the editor doesn't
detect the mismatch. Consult your editor's docs or HTFF1K.

> Another question... in mail I'm receiving many small blocks that look
> like sprites with four small hex codes, scattered about the mail...
> mostly punctuation, maybe?   ... guessing, are these unicode code
> points,

yes

> and if so what is the best way to 'guess' the encoding?

google("chardet") or rummage through the mail headers (but 4 hex digits in
a box are a symptom of inability to render, not necessarily caused by an
incorrect decoding)

 ... is
> it coded in the stream somewhere...protocol?

Should be.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: urllib2 request with binary file as payload

2011-05-11 Thread John Machin
On Thu, May 12, 2011 10:20 am, Michiel Sikma wrote:
> Hi there,
> I made a small script implementing a part of Youtube's API that allows
> you to upload videos. It's pretty straightforward and uses urllib2.
> The script was written for Python 2.6, but the server I'm going to use
> it on only has 2.5 (and I can't update it right now, unfortunately).
> It seems that one vital thing doesn't work in 2.5's urllib2:
>
> --
>
> data = open(video['filename'], 'rb')
>
> opener = urllib2.build_opener(urllib2.HTTPHandler)
> req = urllib2.Request(settings['upload_location'], data, {
>   'Host': 'uploads.gdata.youtube.com',
>   'Content-Type': video['type'],
>   'Content-Length': '%d' % os.path.getsize(video['filename'])
> })
> req.get_method = lambda: 'PUT'
> url = opener.open(req)
>
> --
>
> This works just fine on 2.6:
> send: 
> sendIng a read()able
>
> However, on 2.5 it refuses:
> Traceback (most recent call last):
[snip]
> TypeError: sendall() argument 1 must be string or read-only buffer, not
> file

I don't use this stuff, just curious. But I can read docs. Quoting from
the 2.6.6 docs:

"""
class urllib2.Request(url[, data][, headers][, origin_req_host][,
unverifiable])
This class is an abstraction of a URL request.

url should be a string containing a valid URL.

data may be a string specifying additional data to send to the server, or
None if no such data is needed. Currently HTTP requests are the only ones
that use data; the HTTP request will be a POST instead of a GET when the
data parameter is provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The urllib.urlencode() function
takes a mapping or sequence of 2-tuples and returns a string in this
format.
"""

2.6 is expecting a string, according to the above. No mention of file.
Moreover it expects the data to be urlencoded. 2.7.1 docs say the same
thing. Are you sure you have shown the code that worked with 2.6?
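
In any case, a possible workaround for 2.5 (an untested sketch, and only
sensible if the video file fits comfortably in memory) is to pass the
bytes instead of the file object:

    data = open(video['filename'], 'rb').read()    # a str, not a file
    req = urllib2.Request(settings['upload_location'], data, headers)
    # where "headers" is the same dict as in your code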


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am
> working without using bytes sequences that I will not need to care about
> the encoding anyway, unless of course I need to specify a unicode code
> point?

Quite the contrary.

(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.



-- 
http://mail.python.org/mailman/listinfo/python-list


codecs.open() doesn't handle platform-specific line terminator

2011-05-09 Thread John Machin
According to the 3.2 docs
(http://docs.python.org/py3k/library/codecs.html#codecs.open),

"""Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done on
reading and writing."""

The first point is that one would NOT expect "conversion of b'\n'" anyway.
One expects '\n' -> os.linesep.encode(the_encoding) on writing and vice versa
on reading.

The second point is that there is no such restriction with the built-in
open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE)
'\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not
striking out when thrown curve balls like '\u0a0a'.

Why is codecs.open() different? What does "encodings using 8-bit values"
mean? What data loss?
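
To make the difference concrete (a sketch; 'utf-16-le' chosen so the
bytes are easy to see, file names made up):

    import codecs
    f1 = open('builtin.txt', 'w', encoding='utf-16-le')
    f1.write('x\n')    # on Windows, built-in open() writes b'x\x00\r\x00\n\x00'
    f1.close()
    f2 = codecs.open('codecs.txt', 'w', encoding='utf-16-le')
    f2.write('x\n')    # codecs.open() writes b'x\x00\n\x00' on any platform
    f2.close()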



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codec for UTF-8 with BOM

2011-05-02 Thread John Machin
On Monday, 2 May 2011 19:47:45 UTC+10, Chris Rebert  wrote:
> On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt
>  wrote:

> The correct name, as you found below and as is corroborated by the
> webpage, seems to be "utf_8_sig":
> >>> u"FOøbar".encode('utf_8_sig')
> '\xef\xbb\xbfFO\xc3\xb8bar'

To complete the picture, decoding swallows the BOM:

 >>> '\xef\xbb\xbfFO\xc3\xb8bar'.decode('utf_8_sig')
 u'FO\xf8bar'

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Snowball to Python compiler

2011-04-21 Thread John Machin

On Friday, April 22, 2011 8:05:37 AM UTC+10, Matt Chaput wrote:

> I'm looking for some code that will take a Snowball program and compile 
> it into a Python script. Or, less ideally, a Snowball interpreter 
> written in Python.
> 
> (http://snowball.tartarus.org/)

If anyone has done such things they are not advertising them in the usual 
places.

A third (more-than-) possible solution: google("python snowball"); the first 
page of results has at least 3 hits referring to Python wrappers for Snowball.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting text out of an xml string

2011-03-05 Thread John Machin
On Mar 5, 8:57 am, JT  wrote:
> On Mar 4, 9:30 pm, John Machin  wrote:
>
> > Your data has been FUABARred (the first A being for Almost) -- the
> > "\u3c00" and "\u3e00" were once "<" and ">" respectively. You will
>
> Hi John,
>
>    I realized that a few minutes after posting.  I then realized that
> I could just extract the text between the stuff with \u3c00 xml
> preserve etc, which I did; it was good enough since it was a one-off
> affair, I had to convert a to-do list from one program to another.
> Thanks for replying and sorry for the noise :-)

Next time you need to extract some data from an xml file, please (for
your own good) don't do whatever you did in that code -- note that the
unicode equivalent of "<" is u"\u003c", NOT u"\u3c00"; I wasn't joking
when I said it had been FU.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting text out of an xml string

2011-03-04 Thread John Machin
On Mar 5, 6:53 am, JT  wrote:
> Yo,
>
>  So I have almost convinced a small program to do what I want it to
> do.  One thing remains (at least, one thing I know of at the moment):
> I am converting xml to some other format, and there are strings in the
> xml like this.
>
> The python:
>
> elif v == "content":
>                 print "content", a.childNodes[0].nodeValue
>
> what gets printed:
>
> content \u3c00note xml:space="preserve"\u3e00see forms in red inbox
> \u3c00/note\u3e00
>
> what this should say is "see forms in red inbox" because that is what
> the the program whose xml file i am trying to convert, properly
> displays, because that is what I typed in oh so long ago.  So my
> question to you is, how can I convert this "enhanced" version to a
> normal string?  Esp. since there is this "xml:space="preserve"" thing
> in there ... I suspect the rest is just some unicode issue.  Thanks
> for any help.
>
>        J "long time no post" T

Your data has been FUABARred (the first A being for Almost) -- the
"\u3c00" and "\u3e00" were once "<" and ">" respectively. You will
need to show (a) a snippet of the xml file including the data that has
the problem (b) the code that you have written, cut down to a small
script that is runnable and displays the problem. Tell us what version
of Python you are running, on what OS.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py3k: converting int to bytes

2011-02-24 Thread John Machin
On Feb 25, 4:39 am, Terry Reedy wrote:

> Note: an as yet undocumented feature of bytes (at least in Py3) is that
> bytes(count) == bytes()*count == b'\x00'*count.

Python 3.1.3 docs for bytes() say same constructor args as for
bytearray(); this says about the source parameter: """If it is an
integer, the array will have that size and will be initialized with
null bytes"""
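
For example:

  >>> bytes(3)
  b'\x00\x00\x00'
  >>> bytearray(3)
  bytearray(b'\x00\x00\x00')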
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 2to3 chokes on bad character

2011-02-24 Thread John Machin
On Feb 25, 12:00 am, Peter Otten <__pete...@web.de> wrote:
> John Machin wrote:

> > Your Python 2.x code should be TESTED before you poke 2to3 at it. In
> > this case just trying to run or import the offending code file would
> > have given an informative syntax error (you have declared the .py file
> > to be encoded in UTF-8 but it's not).
>
> The problem is that Python 2.x accepts arbitrary bytes in string constants.

Ummm ... isn't that a bug? According to section 2.1.4 of the Python
2.7.1 Language Reference Manual: """The encoding is used for all
lexical analysis, in particular to find the end of a string, and to
interpret the contents of Unicode literals. String literals are
converted to Unicode for syntactical analysis, then converted back to
their original encoding before interpretation starts ..."""

How do you reconcile "used for all lexical analysis" and "String
literals are converted to Unicode for syntactical analysis" with the
actual (astonishing to me) behaviour?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 2to3 chokes on bad character

2011-02-24 Thread John Machin
On Feb 23, 7:47 pm, "Frank Millman"  wrote:
> Hi all
>
> I don't know if this counts as a bug in 2to3.py, but when I ran it on my
> program directory it crashed, with a traceback but without any indication of
> which file caused the problem.
>
[traceback snipped]

> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055:
> invalid start byte
>
> On investigation, I found some funny characters in docstrings that I
> copy/pasted from a pdf file.
>
> Here are the details if they are of any use. Oddly, I found two instances
> where characters 'look like' apostrophes when viewed in my text editor, but
> one of them was accepted by 2to3 and the other caused the crash.
>
> The one that was accepted consists of three bytes - 226, 128, 153 (as
> reported by python 2.6)

How did you incite it to report like that? Just use repr(the_3_bytes).
It'll show up as '\xe2\x80\x99'.

 >>> from unicodedata import name as ucname
 >>> ''.join(chr(i) for i in (226, 128, 153)).decode('utf8')
 u'\u2019'
 >>> ucname(_)
 'RIGHT SINGLE QUOTATION MARK'

What you have there is the UTF-8 representation of U+2019 RIGHT SINGLE
QUOTATION MARK. That's OK.

> or 226, 8364, 8482 (as reported by python3.2).

Sorry, but you have instructed Python 3.2 to commit a nonsense:

 >>> [ord(chr(i).decode('cp1252')) for i in (226, 128, 153)]
 [226, 8364, 8482]

In other words, you have taken that 3-byte sequence, decoded each byte
separately using cp1252 (aka "the usual suspect") into a meaningless
Unicode character and printed its ordinal.

In Python 3, don't use repr(); it has undergone the MHTP
transformation and become ascii().

>
> The one that crashed consists of a single byte - 146 (python 2.6) or 8217
> (python 3.2).

 >>> chr(146).decode('cp1252')
 u'\u2019'
 >>> hex(8217)
 '0x2019'


> The issue is not that 2to3 should handle this correctly, but that it should
> give a more informative error message to the unsuspecting user.

Your Python 2.x code should be TESTED before you poke 2to3 at it. In
this case just trying to run or import the offending code file would
have given an informative syntax error (you have declared the .py file
to be encoded in UTF-8 but it's not).

> BTW I have always waited for 'final releases' before upgrading in the past,
> but this makes me realise the importance of checking out the beta versions -
> I will do so in future.

I'm willing to bet that the same would happen with Python 3.1, if a
3.1 to 3.2 upgrade is what you are talking about



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python crash problem

2011-02-05 Thread John Machin
On Feb 3, 8:21 am, Terry Reedy  wrote:
> On 2/2/2011 2:19 PM, Yelena wrote:
>
.
>
> When having a problem with a 3rd party module, not part of the stdlib,
> you should give a source.
>    http://sourceforge.net/projects/dbfpy/
> This appears to be a compiled extension. Nearly always, when Python
> crashes running such, it is a problem with the extension. So you
> probably need to direct your question to the author or a project mailing
> list if there is one.

It has always appeared to me to be a pure-Python package. There are
no .c or .pyx files in the latest source (.tgz) distribution. The
Windows installer installs only files whose extensions match
"py[co]?".
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Interesting bug

2011-01-03 Thread John Machin
On Jan 2, 12:22 am, Daniel Fetchinson 
wrote:

> An AI bot is playing a trick on us.

Yes, it appears that the mystery is solved: Mark V. Shaney is alive
and well and living in Bangalore :-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Modifying an existing excel spreadsheet

2010-12-22 Thread John Machin
On Dec 21, 8:56 am, Ed Keith  wrote:
> I have a user supplied 'template' Excel spreadsheet. I need to create a new 
> excel spreadsheet based on the supplied template, with data filled in.
>
> I found the tools 
> here http://www.python-excel.org/, and http://sourceforge.net/projects/pyexcelerator/.
>  I have been trying to use the former, since the latter seems to be devoid of 
> documentation (not even any docstrings).

pyExcelerator is abandonware. Use xlwt instead; it's a bug-fixed/
maintained/enhanced fork of pyExcelerator

Read the tutorial that you'll find mentioned on http://www.python-excel.org

Join the google group that's also mentioned there; look at past
questions, ask some more, ...
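
The usual "fill in a template" recipe looks something like this (a sketch,
assuming xlrd, xlwt and xlutils are all installed; file names are made up):

    from xlrd import open_workbook
    from xlutils.copy import copy

    rb = open_workbook('template.xls', formatting_info=True)  # keep formatting
    wb = copy(rb)                    # xlrd Book -> writable xlwt Workbook
    wb.get_sheet(0).write(0, 0, 'filled-in value')
    wb.save('output.xls')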
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ensuring symmetry in difflib.SequenceMatcher

2010-11-24 Thread John Machin
On Nov 24, 8:43 pm, Peter Otten <__pete...@web.de> wrote:
> John Yeung wrote:
> > I'm generally pleased with difflib.SequenceMatcher:  It's probably not
> > the best available string matcher out there, but it's in the standard
> > library and I've seen worse in the wild.  One thing that kind of
> > bothers me is that it's sensitive to which argument you pick as "seq1"
> > and which you pick as "seq2":
>
> > Python 2.6.1 (r261:67517, Dec  4 2008, 16:51:00) [MSC v.1500 32 bit
> > (Intel)] on
> > win32
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import difflib
> > >>> difflib.SequenceMatcher(None, 'BYRD', 'BRADY').ratio()
> > 0.2
> > >>> difflib.SequenceMatcher(None, 'BRADY', 'BYRD').ratio()
> > 0.3
>
> > Is this a bug?  I am guessing the algorithm is implemented correctly,
> > and that it's just an inherent property of the algorithm used.  It's
> > certainly not what I'd call a desirably property.  Are there any
> > simple adjustments that can be made without sacrificing (too much)
> > performance?
>
> def symmetric_ratio(a, b, S=difflib.SequenceMatcher):
>     return (S(None, a, b).ratio() + S(None, b, a).ratio())/2.0
>
> I'm expecting 50% performance loss ;)
>
> Seriously, have you tried to calculate the ratio with realistic data?
> Without looking into the source I would expect the two ratios to get more
> similar.
>
> Peter

Surnames are extremely realistic data. The OP should consider using
Levenshtein distance, which is "symmetric". A good (non-naive)
implementation should be much faster than difflib.

ratio = 1.0 - levenshtein(a, b) / float(max(len(a), len(b)))
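
For illustration only, a naive pure-Python levenshtein() (any decent C
implementation, e.g. the python-Levenshtein package, will be much faster):

    def levenshtein(a, b):
        prev = range(len(b) + 1)
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
            prev = curr
        return prev[-1]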
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Raw Unicode docstring

2010-11-16 Thread John Machin
On Nov 17, 9:34 am, Alexander Kapps  wrote:

>  >>> ur"Scheißt\nderBär\nim Wald?"

Not without a permit from the Department of Environmental Conservation.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: A bug for raw string literals in Py3k?

2010-10-31 Thread John Machin
On Oct 31, 11:23 pm, Yingjie Lan  wrote:
> > > So I suppose this is a bug?
>
> > It's not, see
>
> >http://docs.python.org/py3k/reference/lexical_analysis.html#literals
>
> > # Specifically, a raw string cannot end in a single backslash
>
> Thanks! That looks weird to me ... doesn't this contradict with:
>
> All backslashes in raw string literals are interpreted literally.
> (seehttp://docs.python.org/release/3.0.1/whatsnew/3.0.html):

All backslashes in syntactically-correct raw string literals are
interpreted literally.
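
E.g. (3.x; traceback trimmed):

  >>> r"ok\n"       # backslash not at the end: taken literally
  'ok\\n'
  >>> r"oops\"
  SyntaxError: EOL while scanning string literal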
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Runtime error

2010-10-29 Thread John Machin
On Oct 29, 3:26 am, Sebastian  wrote:
> Hi all,
>
> I am new to python and I don't know how to fix this error. I only try to
> execute python (or a cgi script) and I get an ouptu like
>
> [...]
> 'import site' failed; traceback:
> Traceback (most recent call last):
> File "/usr/lib/python2.6/site.py", line 513, in 
> main()
> File "/usr/lib/python2.6/site.py", line 496, in main
> known_paths = addsitepackages(known_paths)
> File "/usr/lib/python2.6/site.py", line 288, in addsitepackages
> addsitedir(sitedir, known_paths)
> File "/usr/lib/python2.6/site.py", line 185, in addsitedir
> addpackage(sitedir, name, known_paths)
> File "/usr/lib/python2.6/site.py", line 155, in addpackage
> exec line
> File "", line 1, in 
> File "/usr/lib/python2.6/site.py", line 185, in addsitedir
> addpackage(sitedir, name, known_paths)
> File "/usr/lib/python2.6/site.py", line 155, in addpackage
> exec line
> File "", line 1, in 
> File "/usr/lib/python2.6/site.py", line 185, in addsitedir
> addpackage(sitedir, name, known_paths)
> File "/usr/lib/python2.6/site.py", line 155, in addpackage
> exec line
> [...]
> File "/usr/lib/python2.6/site.py", line 185, in addsitedir
> addpackage(sitedir, name, known_paths)
> File "/usr/lib/python2.6/site.py", line 155, in addpackage
> exec line
> File "", line 1, in 
> File "/usr/lib/python2.6/site.py", line 175, in addsitedir
> sitedir, sitedircase = makepath(sitedir)
> File "/usr/lib/python2.6/site.py", line 76, in makepath
> dir = os.path.abspath(os.path.join(*paths))
> RuntimeError: maximum recursion depth exceeded
>
> What is going wrong with my python install? What do I have to change?

Reading the code for site.py, it looks like you may have a .pth file
that is self-referential (or a chain or 2 or more .pth files!) that
are sending you around in a loop. If you are having trouble
determining what files are involved, you could put some print
statements in your site.py at about lines 155 and 185 (which appear to
be in the loop, according to the traceback) or step through it with a
debugger.
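
A quick way to eyeball what the .pth files are doing (a sketch; adjust the
site-packages path to suit your install):

    import glob
    for pth in glob.glob('/usr/lib/python2.6/site-packages/*.pth'):
        print pth
        print open(pth).read()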

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Get alternative char name with unicodedata.name() if no formal one defined

2010-10-14 Thread John Machin
On Oct 14, 7:25 pm, Dirk Wallenstein  wrote:
> Hi,
> I'd like to get control char names for the first 32 codepoints, but they
> apparently only have an alias and no official name. Is there a way to
> get the alternative character name (alias) in Python?
>

AFAIK there is no programatically-available list of those names. Try
something like:

name = (unicodedata.name(x, some_default) if x > u"\x1f"
        else ("NULL", etc etc, "UNIT SEPARATOR")[ord(x)])

or similarly with a prepared dict:

C0_CONTROL_NAMES = {
    u"\x00": "NULL",
    # etc
    u"\x1f": "UNIT SEPARATOR",
    }

name = (unicodedata.name(x, some_default) if x > u"\x1f"
        else C0_CONTROL_NAMES[x])
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Wrong default endianess in utf-16 and utf-32 !?

2010-10-12 Thread John Machin
jmfauth  gmail.com> writes:

> When an endianess is not specified, (BE, LE, unmarked forms),
> the Unicode Consortium specifies, the default byte serialization
> should be big-endian.
> 
> See http://www.unicode.org/faq//utf_bom.html
> Q: Which of the UTFs do I need to support?
> and
> Q: Why do some of the UTFs have a BE or LE in their label,
> such as UTF-16LE?

Sometimes it is necessary to read right to the end of an answer:

Q: Why do some of the UTFs have a BE or LE in their label, such as UTF-16LE?

A: [snip] the unmarked form uses big-endian byte serialization by default, but
may include a byte order mark at the beginning to indicate the actual byte
serialization used.

-- 
http://mail.python.org/mailman/listinfo/python-list


cp936 uses gbk codec, doesn't decode `\x80` as U+20AC EURO SIGN

2010-10-10 Thread John Machin

|>>> '\x80'.decode('cp936')
Traceback (most recent call last):
  File "", line 1, in 
UnicodeDecodeError: 'gbk' codec can't decode byte 0x80
 in position 0: incomplete multibyte sequence

However:

Retrieved 2010-10-10 from
http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP936.TXT

#Name: cp936 to Unicode table
#Unicode version: 2.0
#Table version: 2.01
#Table format:  Format A
#Date:  1/7/2000
#
#Contact:   shawn.ste...@microsoft.com
...
0x7F    0x007F  #DELETE
0x80    0x20AC  #EURO SIGN
0x81            #DBCS LEAD BYTE

Retrieved 2010-10-10 from
http://msdn.microsoft.com/en-us/goglobal/cc305153.aspx

Windows Codepage 936
[pictorial mapping; shows 80 mapping to 20AC]

Retrieved 2010-10-10 from
http://demo.icu-project.org/icu-bin/convexp?conv=windows-936-2000&s=ALL

[pictorial mapping for converter
"windows-936-2000" with
aliases including GBK, CP936, MS936;
shows 80 mapping to 20AC]

So Microsoft appears to think that
cp936 includes the euro,
and the ICU project seem to think that GBK and cp936
both include the euro.

A couple of questions:

Is this a bug or a shrug?

Where can one find the mapping tables
from which the various CJK codecs are derived?




-- 
http://mail.python.org/mailman/listinfo/python-list


strange results from sys.version

2010-09-27 Thread John Machin
I am trying to help a user of my xlrd package who says he is getting 
anomalous results on his "work computer" but not on his "home computer".


Attempts to reproduce his alleged problem in a verifiable manner on his 
"work computer" have failed, so far ... the only meaning difference in 
script output is in sys.version


User (work): sys.version: 2.7 (r27:82500, Aug 23 2010, 17:18:21) etc
Me : sys.version: 2.7 (r27:82525, Jul  4 2010, 09:01:59) etc

I have just now downloaded the Windows x86 msi from www.python.org and 
reinstalled it on another computer. It gives the same result as on my 
primary computer (above).


User result looks whacked: lower patch number, later date. 
www.python.org says "Python 2.7 was released on July 3rd, 2010."


Is it possible that the "work computer" is using an unofficial release? 
What other possibilities are there?


Thanks in advance ...
--
http://mail.python.org/mailman/listinfo/python-list


Re: Detect string has non-ASCII chars without checking each char?

2010-08-22 Thread John Machin
On Aug 23, 1:10 am, "Michel Claveau -
MVP" wrote:
> Re !
>
> > Try your code with u"abcd\xa1" ... it says it's ASCII.
>
> Ah?  in my computer, it say "False"

Perhaps your computer has a problem. Mine does this with both Python
2.7 and Python 2.3 (which introduced the unicodedata.normalize
function):

  >>> import unicodedata
  >>> t1 = u"abcd\xa1"
  >>> t2 = unicodedata.normalize('NFD', t1)
  >>> t3 = t2.encode('ascii', 'replace')
  >>> [t1, t2, t3]
  [u'abcd\xa1', u'abcd\xa1', 'abcd?']
  >>> map(len, _)
  [5, 5, 5]
  >>>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Detect string has non-ASCII chars without checking each char?

2010-08-22 Thread John Machin
On Aug 22, 5:07 pm, "Michel Claveau -
MVP" wrote:
> Hi!
>
> Another way :
>
>   # -*- coding: utf-8 -*-
>
>   import unicodedata
>
>   def test_ascii(struni):
>       strasc=unicodedata.normalize('NFD', struni).encode('ascii','replace')
>       if len(struni)==len(strasc):
>          return True
>       else:
>          return False
>
>   print test_ascii(u"abcde")
>   print test_ascii(u"abcdê")

-1

Try your code with u"abcd\xa1" ... it says it's ASCII.

Suggestions:
   test_ascii = lambda s: len(s.encode('ascii', 'ignore')) == len(s)
or
   test_ascii = lambda s: all(c < u'\x80' for c in s)
or
   use try/except

Also:
if a == b:
return True
else:
return False
is a horribly bloated way of writing
return a == b


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: re.sub and variables

2010-08-12 Thread John Machin
On Aug 13, 7:33 am, fuglyducky  wrote:
> On Aug 12, 2:06 pm, fuglyducky  wrote:
>
>
>
> > I have a function that I am attempting to call from another file. I am
> > attempting to replace a string using re.sub with another string. The
> > problem is that the second string is a variable. When I get the
> > output, it shows the variable name rather than the value. Is there any
> > way to pass a variable into a regex?
>
> > If not, is there any other way to do this? I need to be able to dump
> > the variable value into the replacement string.
>
> > For what it's worth this is an XML file so I'm not afraid to use some
> > sort of XML library but they look fairly complicated for a newbie like
> > me.
>
> > Also, this is py3.1.2 is that makes any difference.
>
> > Thanks!!!
>
> > #
>
> > import random
> > import re
> > import datetime
>
> > def pop_time(some_string, start_time):
> >     global that_string
>
> >     rand_time = random.randint(0, 30)
> >     delta_time = datetime.timedelta(seconds=rand_time)
>
> >     for line in some_string:
> >         end_time = delta_time + start_time
> >         new_string = re.sub("thisstring", "thisstring\\end_time",
> > some_string)
> >         start_time = end_time
>
> >     return new_string
>
> Disregard...I finally figured out how to use string.replace. That
> appears to work perfectly. Still...if anyone happens to know about
> passing a variable into a regex that would be great.

Instead of

new_string = re.sub(
    "thisstring", "thisstring\\end_time", some_string)

you probably meant to use something like

new_string = re.sub(
    "thisstring", "thisstring" + "\\" + str(end_time), some_string)

string.replace is antique and deprecated. You should be using methods
of str objects, not functions in the string module.

  >>> s1 = "foobarzot"
  >>> s2 = s1.replace("bar", "-")
  >>> s2
  'foo-zot'
  >>>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ascii to Unicode.

2010-07-30 Thread John Machin
On Jul 30, 4:18 am, Carey Tilden  wrote:
> In this case, you've been able to determine the
> correct encoding (latin-1) for those errant bytes, so the file itself
> is thus known to be in that encoding.

The most probably "correct" encoding is, as already stated, and agreed
by the OP to be, cp1252.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Ascii to Unicode.

2010-07-28 Thread John Machin
On Jul 29, 4:32 am, "Joe Goldthwaite"  wrote:
> Hi,
>
> I've got an Ascii file with some latin characters. Specifically \xe1 and
> \xfc.  I'm trying to import it into a Postgresql database that's running in
> Unicode mode. The Unicode converter chokes on those two characters.
>
> I could just manually replace those to characters with something valid but
> if any other invalid characters show up in later versions of the file, I'd
> like to handle them correctly.
>
> I've been playing with the Unicode stuff and I found out that I could
> convert both those characters correctly using the latin1 encoder like this;
>
>         import unicodedata
>
>         s = '\xe1\xfc'
>         print unicode(s,'latin1')
>
> The above works.  When I try to convert my file however, I still get an
> error;
>
>         import unicodedata
>
>         input = file('ascii.csv', 'r')
>         output = file('unicode.csv','w')
>
>         for line in input.xreadlines():
>                 output.write(unicode(line,'latin1'))
>
>         input.close()
>         output.close()
>
> Traceback (most recent call last):
>   File "C:\Users\jgold\CloudmartFiles\UnicodeTest.py", line 10, in __main__
>     output.write(unicode(line,'latin1'))
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position
> 295: ordinal not in range(128)
>
> I'm stuck using Python 2.4.4 which may be handling the strings differently
> depending on if they're in the program or coming from the file.  I just
> haven't been able to figure out how to get the Unicode conversion working
> from the file data.
>
> Can anyone explain what is going on?

Hello hello ... you are running on Windows; the likelihood that you
actually have data encoded in latin1 is very very small. Follow MRAB's
answer but replace "latin1" by "cp1252".
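
A minimal transcoding sketch along those lines (file names as in your
post): decode each line as cp1252 and write it out as UTF-8, which is what
a Unicode-mode PostgreSQL load will be expecting:

    import codecs

    input = codecs.open('ascii.csv', 'r', encoding='cp1252')
    output = codecs.open('unicode.csv', 'w', encoding='utf-8')
    for line in input:
        output.write(line)
    input.close()
    output.close()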
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Where is the help page for re.MatchObject?

2010-07-28 Thread John Machin
On Jul 28, 1:26 pm, Peng Yu  wrote:
> I know the library reference webpage for re.MatchObject is 
> at http://docs.python.org/library/re.html#re.MatchObject
>
> But I don't find such a help page in python help(). Does anybody know
> how to get it in help()?

Yes, but it doesn't tell you very much:

| >>> import re
| >>> help(re.match('x', 'x'))
| Help on SRE_Match object:
|
| class SRE_Match(__builtin__.object)
|
| >>>
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: newb

2010-07-27 Thread John Machin
On Jul 27, 9:07 pm, whitey  wrote:
> hi all. am totally new to python and was wondering if there are any
> newsgroups that are there specifically for beginners. i have bought a
> book for $2 called "learn to program using python" by alan gauld.
> starting to read it but it was written in 2001. presuming that the
> commands and info would still be valid? any websites or books that are a
> must for beginners? any input would be much appreciated...cheers

2001 is rather old. Most of what you'll want is on the web. See
http://wiki.python.org/moin/BeginnersGuide
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unicode error

2010-07-24 Thread John Machin
dirknbr  gmail.com> writes:

> I have kind of developped this but obviously it's not nice, any better
> ideas?
> 
> try:
>     text=texts[i]
>     text=text.encode('latin-1')
>     text=text.encode('utf-8')
> except:
>     text=' '

As Steven has pointed out, if the .encode('latin-1') works, the result is thrown
away. This would be very fortunate. 

It appears that your goal was to encode the text in latin1 if possible,
otherwise in UTF-8, with no indication of which encoding was used. Your second
posting confirmed that you were doing this in a loop, ending up with the
possibility that your output file would have records with mixed encodings.

Did you consider what a programmer writing code to READ your output file would
need to do, e.g. attempt to decode each record as UTF-8 with a fall-back to
latin1??? Did you consider what would be the result of sending a stream of
mixed-encoding text to a display device?

As already advised, the short answer to avoid all of that hassle is: just encode in
UTF-8.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 2.7 released

2010-07-04 Thread John Machin
On Jul 5, 12:27 pm, Martineau  wrote:
> On Jul 4, 8:34 am, Benjamin Peterson  wrote:
>
>
>
> > On behalf of the Python development team, I'm jocund to announce the second
> > release candidate of Python 2.7.
>
> > Python 2.7 will be the last major version in the 2.x series. However, it 
> > will
> > also have an extended period of bugfix maintenance.
>
> > 2.7 includes many features that were first released in Python 3.1. The 
> > faster io
> > module, the new nested with statement syntax, improved float repr, set 
> > literals,
> > dictionary views, and the memoryview object have been backported from 3.1. 
> > Other
> > features include an ordered dictionary implementation, unittests 
> > improvements, a
> > new sysconfig module, auto-numbering of fields in the str/unicode format 
> > method,
> > and support for ttk Tile in Tkinter.  For a more extensive list of changes 
> > in
> > 2.7, see http://doc.python.org/dev/whatsnew/2.7.html or Misc/NEWS in the Python
> > distribution.
>
> > To download Python 2.7 visit:
>
> >      http://www.python.org/download/releases/2.7/
>
> > 2.7 documentation can be found at:
>
> >      http://docs.python.org/2.7/
>
> > This is a production release and should be suitable for all libraries and
> > applications.  Please report any bugs you find, so they can be fixed in the 
> > next
> > maintenance releases.  The bug tracker is at:
>
> >      http://bugs.python.org/
>
> > Enjoy!
>
> > --
> > Benjamin Peterson
> > Release Manager
> > benjamin at python.org
> > (on behalf of the entire python-dev team and 2.7's contributors)
>
> Benjamin (or anyone else), do you know where I can get the Compiled
> Windows Help file -- python27.chm -- for this release? In the past
> I've been able to download it from the Python web site, but have been
> unable to locate it anywhere for this new release. I can't build it
> myself because I don't have the Microsoft HTML help file compiler.
>
> Thanks in advance.

If you have a Windows box, download the .msi installer for Python 2.7
and install it. The chm file will be in C:\Python27\Doc (if you choose
the default installation directory). Otherwise ask a friendly local
Windows user for a copy.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SyntaxError not honoured in list comprehension?

2010-07-04 Thread John Machin
On Jul 5, 1:08 am, Thomas Jollans  wrote:
> On 07/04/2010 03:49 PM, jmfauth wrote:
> >   File "", line 1
> >     print9.0
> >            ^
> > SyntaxError: invalid syntax
>
> somewhat strange, yes.

There are two tokens, "print9" (a name) and ".0" (a float constant) --
looks like SyntaxError to me.
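
E.g. something like this (Python 2, using the tokenize module):

  >>> from StringIO import StringIO
  >>> import tokenize
  >>> [t[1] for t in tokenize.generate_tokens(StringIO("print9.0\n").readline)]
  ['print9', '.0', '\n', '']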

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: escape character / csv module

2010-07-02 Thread John Machin
On Jul 2, 6:04 am, MRAB  wrote:


> The csv module imports from _csv, which suggests to me that there's code
> written in C which thinks that the "\x00" is a NUL terminator, so it's a
> bug, although it's very unusual to want to write characters like "\x00"
> to a CSV file, and I wouldn't be surprised if this is the first time
> it's been noticed! :-)

Don't be surprised, read the documentation
(http://docs.python.org/library/csv.html#module-csv):

"""Note

This version of the csv module doesn’t support Unicode input. Also,
there are currently some issues regarding ASCII NUL characters.
Accordingly, all input should be UTF-8 or printable ASCII to be safe;
see the examples in section Examples. These restrictions will be
removed in the future."""

The NUL/printable part of the note has been there since the module was
introduced in Python 2.3.0.




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Handling text lines from files with some (few) starnge chars

2010-06-05 Thread John Machin
On Jun 6, 12:14 pm, MRAB  wrote:
> Paulo da Silva wrote:
> > Em 06-06-2010 00:41, Chris Rebert escreveu:
> >> On Sat, Jun 5, 2010 at 4:03 PM, Paulo da Silva
> >>  wrote:
> > ...
>
> >> Specify the encoding of the text when opening the file using the
> >> `encoding` parameter. For Windows-1252 for example:
>
> >> your_file = open("path/to/file.ext", 'r', encoding='cp1252')
>
> > OK! This fixes my current problem. I used encoding="iso-8859-15". This
> > is how my text files are encoded.
> > But what about a more general case where the encoding of the text file
> > is unknown? Is there anything like "autodetect"?
>
>  >
> An encoding like 'cp1252' uses 1 byte/character, but so does 'cp1250'.
> How could you tell which was the correct encoding?
>
> Well, if the file contained words in a certain language and some of the
> characters were wrong, then you'd know that the encoding was wrong. This
> does imply, though, that you'd need to know what the language should
> look like!
>
> You could try different encodings, and for each one try to identify what
> could be words, then look them up in dictionaries for various languages
> to see whether they are real words...

This has been automated (semi-successfully, with caveats) by the
chardet package ... see http://chardet.feedparser.org/
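
Usage is one call (a sketch, assuming the third-party chardet package is
installed; the file name is made up):

    import chardet
    raw = open('some_file.txt', 'rb').read()
    print chardet.detect(raw)   # e.g. {'encoding': 'ISO-8859-2', 'confidence': 0.87}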
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: signed vs unsigned int

2010-06-02 Thread John Machin
On Jun 2, 4:43 pm, johnty  wrote:
> i'm reading bytes from a serial port, and storing it into an array.
>
> each byte represents a signed 8-bit int.
>
> currently, the code i'm looking at converts them to an unsigned int by
> doing ord(array[i]). however, what i'd like is to get the _signed_
> integer value. whats the easiest way to do this?

signed = unsigned if unsigned <= 127 else unsigned - 256
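
or, using the struct module (array[i] being the single-byte string from
your code):

    import struct
    signed = struct.unpack('b', array[i])[0]    # 'b' == signed 8-bit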
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: expat parsing error

2010-06-01 Thread John Machin
On Jun 2, 1:57 am, "kak...@gmail.com"  wrote:
> On Jun 1, 11:12 am, "kak...@gmail.com"  wrote:
>
>
>
> > On Jun 1, 11:09 am, John Bokma  wrote:
>
> > > "kak...@gmail.com"  writes:
> > > > On Jun 1, 10:34 am, Stefan Behnel  wrote:
> > > >> kak...@gmail.com, 01.06.2010 16:00:
>
> > > >> > how can i fix it, how to "ignore" the headers and parse only
> > > >> > the XML?
>
> > > >> Consider reading the answers you got in the last thread that you opened
> > > >> with exactly this question.
>
> > > >> Stefan
>
> > > > That's exactly, what i did but something seems to not working with the
> > > > solutions i had, when i changed my implementation from pure Python's
> > > > sockets to twisted library!
> > > > That's the reason i have created a new post!
> > > > Any ideas why this happened?
>
> > > As I already explained: if you send your headers as well to any XML
> > > parser it will choke on those, because the headers are /not/ valid /
> > > well-formed XML. The solution is to remove the headers from your
> > > data. As I explained before: headers are followed by one empty
> > > line. Just remove lines up and until including the empty line, and pass
> > > the data to any XML parser.
>
> > > --
> > > John Bokma                                                               
> > > j3b
>
> > > Hacking & Hiking in Mexico -  
> > > http://johnbokma.com/ http://castleamber.com/ - Perl & Python Development
>
> > Thank you so much i'll try it!
> > Antonis
>
> Dear John can you provide me a simple working solution?
> I don't seem to get it

You're not wrong. Try something like this:

rubbish1, rubbish2, xml = your_guff.partition('\n\n')
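
If the data is coming straight off an HTTP-style response, the blank line
may well be '\r\n\r\n' rather than '\n\n' (a guess about your input):

rubbish1, rubbish2, xml = your_guff.partition('\r\n\r\n')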
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help with Regexp, \b

2010-05-31 Thread John Machin
On May 30, 1:30 am, andrew cooke  wrote:

>
> That's what I thought it did...  Then I read the docs and confused
> "empty string" with "space"(!) and convinced myself otherwise.  I
> think I am going senile.

Not necessarily. Conflating concepts like "string containing
whitespace", "string containing space(s)", "empty aka 0-length
string", None, (ASCII) NUL, and (SQL) NULL appears to be an age-
independent problem :-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError having fetch web page

2010-05-26 Thread John Machin
Rob Williscroft  rtw.me.uk> writes:

> 
> Barry wrote in news:83dc485a-5a20-403b-99ee-c8c627bdbab3
> @m21g2000vbr.googlegroups.com in gmane.comp.python.general:
> 

> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> > unexpected code byte
> 
> It may not be you, en.wiktionary.org is sending gzip 
> encoded content back,

It sure is; here's where the offending 0x8b comes from:

"""ID1 (IDentification 1)
   ID2 (IDentification 2)
These have the fixed values ID1 = 31 (0x1f, \037), ID2 = 139
(0x8b, \213), to identify the file as being in gzip format."""

(from http://www.faqs.org/rfcs/rfc1952.html)


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: help need to write a python spell checker

2010-05-18 Thread John Machin
On May 19, 1:37 pm, Steven D'Aprano  wrote:
> On Wed, 19 May 2010 13:01:10 +1000, Nigel Rowe wrote:
> > I'm happy to do you homework for you, cost is us$1000 per hour.  Email
> > to your professor automatically on receipt.
>
> I'll do it for $700 an hour!

he could save the money if he oogledgay orvignay ellspay eckerchay
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzled by code pages

2010-05-15 Thread John Machin
Adam Tauno Williams  whitemice.org> writes:

> On Fri, 2010-05-14 at 20:27 -0400, Adam Tauno Williams wrote:
> > I'm trying to process OpenStep plist files in Python.  I have a parser
> > which works, but only for strict ASCII.  However plist files may contain
> > accented characters - equivalent to ISO-8859-2 (I believe).  For example
> > I read in the line:

> > '"skyp4_filelist_10201/localit\xc3\xa0 termali_sortfield" =
> > NSFileName;\n'
> > What is the correct way to re-encode this data into UTF-8 so I can use
> > unicode strings, and then write the output back to ISO8859-?

> Buried in the parser is a str(...) call.  Replacing that with
> unicode(...) and now the OpenSTEP plist parser is working with Italian
> plists.

Some observations:

Italian text is much more likely to be encoded in ISO-8859-1 than ISO-8859-2.
The latter covers eastern European languages (e.g. Polish, Czech, Hungarian)
that use the Latin alphabet with many "decorations" not found in western 
alphabets.

Let's look at the 'localit\xc3\xa0' example. Using ISO-8859-2, that decodes to
u'localit\u0102\xa0'. The second-last character is LATIN CAPITAL LETTER A WITH
BREVE (according to unicodedata.name()). The last character is NO-BREAK SPACE.
Doesn't look like an Italian word to me.

However, using UTF-8, that decodes to u'localit\xe0'. The last character is
LATIN SMALL LETTER A WITH GRAVE. Looks like a plausible Italian word to me. Also
to Wikipedia: "A località (literally "locality"; plural località) is the name
given in Italian administrative law to a type of territorial subdivision of a
comune ..."

Conclusions:

It's worth closely scrutinising "accented characters - equivalent to ISO-8859-2
(I believe)". Which variety of "OpenStep plist files" are you looking at:
NeXTSTEP, GNUstep, or MAC OS X? If the latter, it's evidently an XML document,
and you should be letting the XML parser decode it for you and in any case as an
XML document it's most likely UTF-8, not ISO-8859-2.

It's worth examining your definition of "working".


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fastest way to calculate leading whitespace

2010-05-09 Thread John Machin
dasacc22  gmail.com> writes:

> 
> U presume entirely to much. I have a preprocessor that normalizes
> documents while performing other more complex operations.  Theres
> nothing buggy about what im doing

Are you sure?

Your "solution" calculates (the number of leading whitespace characters) + (the
number of TRAILING whitespace characters).

Problem 1: including TRAILING whitespace.
Example: "content" + 3 * " " + "\n" has 4 leading spaces according to your
reckoning; should be 0.
Fix: use lstrip() instead of strip()
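For example: leading = len(line) - len(line.lstrip()) counts only the
leading whitespace characters.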

Problem 2: assuming all whitespace characters have *effective* width the same as
" ".
Examples: TAB has width 4 or 8 or whatever you want it to be. There are quite a
number of whitespace characters, even when you stick to ASCII. When you look at
Unicode, there are heaps more. Here's a list of BMP characters such that
character.isspace() is True, showing the Unicode codepoint, the Python repr(),
and the name of the character (other than for control characters):

U+0009 u'\t' ?
U+000A u'\n' ?
U+000B u'\x0b' ?
U+000C u'\x0c' ?
U+000D u'\r' ?
U+001C u'\x1c' ?
U+001D u'\x1d' ?
U+001E u'\x1e' ?
U+001F u'\x1f' ?
U+0020 u' ' SPACE
U+0085 u'\x85' ?
U+00A0 u'\xa0' NO-BREAK SPACE
U+1680 u'\u1680' OGHAM SPACE MARK
U+2000 u'\u2000' EN QUAD
U+2001 u'\u2001' EM QUAD
U+2002 u'\u2002' EN SPACE
U+2003 u'\u2003' EM SPACE
U+2004 u'\u2004' THREE-PER-EM SPACE
U+2005 u'\u2005' FOUR-PER-EM SPACE
U+2006 u'\u2006' SIX-PER-EM SPACE
U+2007 u'\u2007' FIGURE SPACE
U+2008 u'\u2008' PUNCTUATION SPACE
U+2009 u'\u2009' THIN SPACE
U+200A u'\u200a' HAIR SPACE
U+200B u'\u200b' ZERO WIDTH SPACE
U+2028 u'\u2028' LINE SEPARATOR
U+2029 u'\u2029' PARAGRAPH SEPARATOR
U+202F u'\u202f' NARROW NO-BREAK SPACE
U+205F u'\u205f' MEDIUM MATHEMATICAL SPACE
U+3000 u'\u3000' IDEOGRAPHIC SPACE

Hmmm, looks like all kinds of widths, from zero upwards.




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?

2010-05-04 Thread John Machin
On May 5, 3:43 am, Terry Reedy  wrote:
> On 5/4/2010 11:37 AM, Stefan Behnel wrote:
>
> > Barak, Ron, 04.05.2010 16:11:
> >> The XML file seems to be valid XML (all XML viewers I tried were able
> >> to read it).
>
>  From Internet Explorer:
>
> The XML page cannot be displayed
> Cannot view XML input using XSL style sheet. Please correct the error
> and then click the Refresh button, or try again later.
>
> 
>
> An invalid character was found in text content. Error processing
> resource 'file:///C:/Documents and Settings...
>
>       "BROLB21
>
>
>
> > This is what xmllint gives me:
>
> > ---
> > $ xmllint /home/sbehnel/tmp.xml
> > tmp.xml:6: parser error : Char 0x0 out of allowed range
> > "MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag m_sanApiName1 line 6
> > "MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag DbHbaGroup line 5
> > "MainStorage_snap
> > ^
> > tmp.xml:6: parser error : Premature end of data in tag database line 4
> > "MainStorage_snap
> > ^
> > ---
>
> > The file contains 0-bytes - clearly not XML.
>
> IE agrees.

Look closer. IE *DOESN'T* agree. It has ignored the problem on line 6
and lurched on to the next problem (in line 11). If you edit that file
to remove the line noise in line 11, leaving the 3 cases of multiple
\x00 bytes, IE doesn't complain at all about the (invalid) \x00 bytes.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get xml.etree.ElementTree not bomb on invalid characters in XML file ?

2010-05-04 Thread John Machin
On May 5, 12:11 am, "Barak, Ron"  wrote:
> > -Original Message-
> > From: Stefan Behnel [mailto:stefan...@behnel.de]
> > Sent: Tuesday, May 04, 2010 10:24 AM
> > To: python-l...@python.org
> > Subject: Re: How to get xml.etree.ElementTree not bomb on
> > invalid characters in XML file ?
>
> > Barak, Ron, 04.05.2010 09:01:
> > >  I'm parsing XML files using ElementTree from xml.etree (see code
> > > below (and attached xml_parse_example.py)).
>
> > > However, I'm coming across input XML files (attached an example:
> > > tmp.xml) which include invalid characters, that produce the
> > following
> > > traceback:
>
> > > $ python xml_parse_example.py
> > > Traceback (most recent call last):
> > > xml.parsers.expat.ExpatError: not well-formed (invalid
> > token): line 6,
> > > column 34
>
> > I hope you are aware that this means that the input you are
> > parsing is not XML. It's best to reject the file and tell the
> > producers that they are writing broken output files. You
> > should always fix the source, instead of trying to make sense
> > out of broken input in fragile ways.
>
> > > I read the documentation for xml.etree.ElementTree and see
> > that it may
> > > take an optional parser parameter, but I don't know what
> > this parser
> > > should be - to ignore the invalid characters.
>
> > > Could you suggest a way to call ElementTree, so it won't
> > bomb on these
> > > invalid characters ?
>
> > No. The parser in lxml.etree has a 'recover' option that lets
> > it try to recover from input errors, but in general, XML
> > parsers are required to reject non well-formed input.
>
> > Stefan
>
> Hi Stefan,
> The XML file seems to be valid XML (all XML viewers I tried were able to read 
> it).
> You can verify this by trying to read the XML example I attached to the 
> original message (attached again here).
> Actually, when trying to view the file with an XML viewer, these offensive 
> characters are not shown.
> It's just that some of the fields include characters that the parser used by 
> ElementTree seems to chock on.
> Bye,
> Ron.
>
>  [attachment: tmp_small.xml, <1K]

Have a look at your file with e.g. a hex editor or just Python repr()
-- see below. You will see that there are four cases of
good_data\x00garbage
where "garbage" is repeated \x00 or just random line noise or
uninitialised memory.

"MainStorage_snap\x00\x00*SNIP*\x00\x00"

"BROLB21\x00\xee"\x00\x00\x00\x90,\x02G\xdc\xfb\x04P\xdc
\xfb\x04\x01a\xfc>(\xe8\xfb\x04"

It's a toss-up whether the > in there is accidental or a deliberate
attempt to sanitise the garbage !-)

"Alstom\x00\x00o\x00m\x00\x00*SNIP*\x00\x00"

"V5R1.28.1 [R - LA]\x00\x00*SNIP*\x00\x00"

The garbage in the 2nd case is such as to make the initial
declaration
encoding="UTF-8"
an outright lie and I'm curious as to how the XML parser managed to
get as far as it did -- it must decode a line at a time.

As already advised: it's much better to reject that rubbish outright
than to attempt to repair it. Repair should be contemplated only if
it's a one-off exercise AND you can't get a fixed copy from the
source.

And while we're on the subject of rubbish: """The XML file seems to be
valid XML (all XML viewers I tried were able to read it).""" The
conclusion from that is that all XML viewers that you tried are
rubbish.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: condition and True or False

2010-05-02 Thread John Machin
On May 3, 9:14 am, Steven D'Aprano  wrote:

> If it is any arbitrary object, then "x and True or False" is just an
> obfuscated way of writing "bool(x)". Perhaps their code predates the
> introduction of bools, and they have defined global constants True and
> False but not bool. Then they removed the True and False bindings as no
> longer necessary, but neglected to replace the obfuscated conversion.

Or perhaps they are maintaining code that must run on any 2.X. True
and False would be set up conditional on Python version. Writing
"expression and True or False" avoids a function call.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: csv.py sucks for Decimal

2010-04-25 Thread John Machin
On Apr 23, 9:23 am, Phlip  wrote:

> When I use the CSV library, with QUOTE_NONNUMERIC, and when I pass in
> a Decimal() object, I must convert it to a string.

Why must you? What unwanted effect do you observe when you don't
convert it?

> the search for an alternate CSV module, without
> this bug, will indeed begin very soon!

What bug?

> I'm pointing out that QUOTE_NONNUMERIC would work better with an
> option to detect numeric-as-string, and absolve it. That would allow
> Decimal() to do its job, unimpeded.

Decimal()'s job is to create an instance of the decimal.Decimal class;
how is that being impeded by anything in the csv module?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: subtraction is giving me a syntax error

2010-03-15 Thread John Machin
On Mar 16, 5:43 am, Baptiste Carvello  wrote:
> Joel Pendery a écrit :

> > So I am trying to write a bit of code and a simple numerical
> > subtraction
>
> > y_diff = y_diff-H
>
> > is giving me the error
>
> > Syntaxerror: Non-ASCII character '\x96' in file on line 70, but no
> > encoding declared.
>
>
> I would say that when you press the minus key, your operating system doesn't
> encode the standard (ASCII) minus character, but some fancy character, which
> Python cannot interpret.

The likelihood that any operating system however brain-damaged and in
whatever locale would provide by default a "keyboard" or "input
method" that generated EN DASH when the '-' key is struck is somewhere
between zero and epsilon.

Already advanced theories like "used a word processor instead of a
programmer's editor" and "scraped it off the web" are much more
plausible.

> More precisely, I suspect you are unsing Windows with codepage 1252 (latin 1).

Codepage 1252 is not "latin1" in the generally accepted meaning of
"latin1" i.e. ISO-8859-1. It is a superset. MS in their wisdom or
otherwise chose to use most of the otherwise absolutely wasted slots
assigned to "C1 control characters" in latin1.

> With this encoding, you have 2 kinds of minus signs: the standard (45th
> character, in hex '\x2d') and the non-standard (150th character, in hex 
> '\x96').
>
> cf:http://msdn.microsoft.com/en-us/library/cc195054.aspx

The above link quite correctly says that '\x96` maps to U+2013 EN
DASH. EN DASH is not any kind of minus sign.
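
A quick check of that mapping at a 2.x prompt:

>>> '\x96'.decode('cp1252')
u'\u2013'
>>> import unicodedata
>>> unicodedata.name(u'\u2013')
'EN DASH'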

Aside: the syndrome causing the problem is apparent with cp125x for x
in range(9)



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: datelib pythonification

2010-02-21 Thread John Machin
On Feb 21, 12:37 pm, alex goretoy  wrote:
> hello all,
>     since I posted this last time, I've added a new function dates_diff and

[SNIP]

I'm rather unsure of the context of this posting ... I'm assuming that
the subject "datelib pythonification" refers to trying to make
"datelib" more "pythonic", with which you appear to need help.

Looking just at the new "function" (looks like a method to me)
dates_diff, problems include:

1. Mostly ignores PEP-8 about spaces after commas, around operators
2. Checks types
3. Checks types using type(x) == type(y)
4. Inconsistent type checking: checks types in case of
dates_diff(date1, date2) but not in case of dates_diff([date1, date2])
5. Doesn't check for 3 or more args.
6. The 0-arg case is for what purpose?
7. The one-arg case is overkill -- if the caller has the two values in
alist, all you are saving them from is the * in dates_diff(*alist)
8. Calling type(date.today()) once per 2-arg call would be a gross
extravagance; calling it twice per 2-arg call is mind-boggling.
9. start,end=(targs[0][0],targs[0][1]) ... multiple constant
subscripts is a code smell; this one is pongier than usual because it
could easily be replaced by start, end = targs[0]

Untested fix of problems 1, 3, 4, 5, 8, 9:

DATE_TYPE = type(date.today())

def dates_diff(self, *targs):
nargs = len(targs)
if nargs == 0:
return self.enddate - self.startdate
if nargs == 1:
arg = targs[0]
if not isinstance(arg, (list, tuple)) or len(arg) != 2:
raise Exception(
"single arg must be list or tuple of length 2")
start, end = arg
elif nargs == 2:
start, end = targs
else:
raise Exception("expected 0,1, or 2 args; found %d" % nargs)
if isinstance(start, DATE_TYPE) and isinstance(end, DATE_TYPE):
return end - start
raise Exception("both values must be of type DATE_TYPE")

HTH,

John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-14 Thread John Machin
On Jan 15, 3:41 pm, Paul McGuire  wrote:
> I never represented that this parser would handle any and all Excel
> formulas!
>  But I should hope the basic structure of a pyparsing
> solution might help the OP add some of the other features you cited,
> if necessary. It's actually pretty common to take an incremental
> approach in making such a parser, and so here are some of the changes
> that you would need to make based on the deficiencies you pointed out:
>
> functions can have a variable number of arguments, of any kind of
> expression
> - statFunc = lambda name : CaselessKeyword(name) + LPAR + delimitedList
> (expr) + RPAR
>
> sheet name could also be a quoted string
> - sheetRef = Word(alphas, alphanums) | QuotedString("'",escQuote="''")
>
> add boolean literal support
> - boolLiteral = oneOf("TRUE FALSE")
> - operand = numericLiteral | funcCall | boolLiteral | cellRange |
> cellRef

or a string literal ... you seem to have ignored the significant point
that the binary operators don't have narrow type requirements of their
args ("""2.3 & 4.5 produces text "2.34.5", while "2.3" + "4.5"
produces number 6.8"""); your attempt to enforce particular types for
args at compile-time is erroneous OVER-engineering.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-14 Thread John Machin
On Jan 14, 2:05 pm, "Gabriel Genellina" 
wrote:
> En Wed, 13 Jan 2010 05:15:52 -0300, Paul McGuire   
> escribió:
>
> >> vsoler wrote:
> >> > Hence, I need to parse Excel formulas. Can I do it by means only of re
> >> > (regular expressions)?
>
> > This might give the OP a running start:
>
> > from pyparsing import (CaselessKeyword, Suppress, ...
>
> Did you build those parsing rules just by common sense, or following some  
> actual specification?

Leave your common sense with the barkeep when you enter the Excel
saloon; it is likely to be a hindrance. The specification is what
Excel does.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-14 Thread John Machin
On Jan 13, 7:15 pm, Paul McGuire  wrote:
> On Jan 5, 1:49 pm, Tim Chase  wrote:
>
>
>
> > vsoler wrote:
> > > Hence, I need to parse Excel formulas. Can I do it by means only of re
> > > (regular expressions)?
>
> > > I know that for simple formulas such as "=3*A7+5" it is indeed
> > > possible. What about complex for formulas that include functions,
> > > sheet names and possibly other *.xls files?
>
> > Where things start getting ugly is when you have nested function
> > calls, such as
>
> >    =if(Sum(A1:A25)>42,Min(B1:B25), if(Sum(C1:C25)>3.14,
> > (Min(C1:C25)+3)*18,Max(B1:B25)))
>
> > Regular expressions don't do well with nested parens (especially
> > arbitrarily-nesting-depth such as are possible), so I'd suggest
> > going for a full-blown parsing solution like pyparsing.
>
> > If you have fair control over what can be contained in the
> > formulas and you know they won't contain nested parens/functions,
> > you might be able to formulate some sort of "kinda, sorta, maybe
> > parses some forms of formulas" regexp.
>
> > -tkc
>
> This might give the OP a running start:

Unfortunately "this" will blow up after only a few paces; see
below ...

>
> from pyparsing import (CaselessKeyword, Suppress, Word, alphas,
>     alphanums, nums, Optional, Group, oneOf, Forward, Regex,
>     operatorPrecedence, opAssoc, dblQuotedString)
>
> test1 = "=3*A7+5"
> test2 = "=3*Sheet1!$A$7+5"

test2a ="=3*'Sheet 1'!$A$7+5"
test2b ="=3*'O''Reilly''s sheet'!$A$7+5"


> test3 = "=if(Sum(A1:A25)>42,Min(B1:B25), " \
>      "if(Sum(C1:C25)>3.14, (Min(C1:C25)+3)*18,Max(B1:B25)))"

Many functions can take a variable number of args and they are not
restricted to cell references e.g.

test3a = "=sum(a1:a25,10,min(b1,c2,d3))"

The arg separator is comma or semicolon depending on the locale ... a
parser should accept either.


> EQ,EXCL,LPAR,RPAR,COLON,COMMA,DOLLAR = map(Suppress, '=!():,$')
> sheetRef = Word(alphas, alphanums)
> colRef = Optional(DOLLAR) + Word(alphas,max=2)
> rowRef = Optional(DOLLAR) + Word(nums)
> cellRef = Group(Optional(sheetRef + EXCL)("sheet") + colRef("col") +
>                     rowRef("row"))
>
> cellRange = (Group(cellRef("start") + COLON + cellRef("end"))
> ("range")
>                 | cellRef )
>
> expr = Forward()
>
> COMPARISON_OP = oneOf("< = > >= <= != <>")
> condExpr = expr + COMPARISON_OP + expr
>
> ifFunc = (CaselessKeyword("if") +
>           LPAR +
>           Group(condExpr)("condition") +

that should be any expression; at run-time it expects a boolean (TRUE
or FALSE) or a number (0 means false, non-0 means true). Text causes a
#VALUE! error. Trying to subdivide expressions into conditional /
numeric /text just won't work.


>           COMMA + expr("if_true") +
>           COMMA + expr("if_false") + RPAR)
> statFunc = lambda name : CaselessKeyword(name) + LPAR + cellRange +
> RPAR
> sumFunc = statFunc("sum")
> minFunc = statFunc("min")
> maxFunc = statFunc("max")
> aveFunc = statFunc("ave")
> funcCall = ifFunc | sumFunc | minFunc | maxFunc | aveFunc
>
> multOp = oneOf("* /")
> addOp = oneOf("+ -")

needs power op "^"

> numericLiteral = Regex(r"\-?\d+(\.\d+)?")

Sorry, that "-" in there is a unary minus operator. What about 1e23 ?

> operand = numericLiteral | funcCall | cellRange | cellRef
> arithExpr = operatorPrecedence(operand,
>     [
>     (multOp, 2, opAssoc.LEFT),
>     (addOp, 2, opAssoc.LEFT),
>     ])
>
> textOperand = dblQuotedString | cellRef
> textExpr = operatorPrecedence(textOperand,
>     [
>     ('&', 2, opAssoc.LEFT),
>     ])

Excel evaluates excessively permissively, and the punters are
definitely not known for self-restraint. The above just won't work:
2.3 & 4.5 produces text "2.34.5", while "2.3" + "4.5" produces number
6.8.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-12 Thread John Machin

On 12/01/2010 6:26 PM, Chris Withers wrote:

John Machin wrote:

The xlwt package (of which I am the maintainer) has a lexer and parser
for a largish subset of the syntax ... see  
http://pypi.python.org/pypi/xlwt


xlrd, no?


A facility in xlrd to decompile Excel formula bytecode into a text 
formula is currently *under discussion*.


The OP was planning to dig the formula text out using COM then parse the 
formula text looking for cell references and appeared to have a rather 
simplistic view of the ease of parsing Excel formula text -- that's why 
I pointed him at those facilities (existing, released, proven in the 
field) in xlwt.




--
http://mail.python.org/mailman/listinfo/python-list


Re: What is built-in method sub

2010-01-11 Thread John Machin
On Jan 12, 7:30 am, Jeremy  wrote:
> On Jan 11, 1:15 pm, "Diez B. Roggisch"  wrote:
>
>
>
> > Jeremy schrieb:
>
> > > On Jan 11, 12:54 pm, Carl Banks  wrote:
> > >> On Jan 11, 11:20 am, Jeremy  wrote:
>
> > >>> I just profiled one of my Python scripts and discovered that >99% of
> > >>> the time was spent in
> > >>> {built-in method sub}
> > >>> What is this function and is there a way to optimize it?
> > >> I'm guessing this is re.sub (or, more likely, a method sub of an
> > >> internal object that is called by re.sub).
>
> > >> If all your script does is to make a bunch of regexp substitutions,
> > >> then spending 99% of the time in this function might be reasonable.
> > >> Optimize your regexps to improve performance.  (We can help you if you
> > >> care to share any.)
>
> > >> If my guess is wrong, you'll have to be more specific about what your
> > >> sctipt does, and maybe share the profile printout or something.
>
> > >> Carl Banks
>
> > > Your guess is correct.  I had forgotten that I was using that
> > > function.
>
> > > I am using the re.sub command to remove trailing whitespace from lines
> > > in a text file.  The commands I use are copied below.  If you have any
> > > suggestions on how they could be improved, I would love to know.
>
> > > Thanks,
> > > Jeremy
>
> > > lines = self._outfile.readlines()
> > > self._outfile.close()
>
> > > line = string.join(lines)
>
> > > if self.removeWS:
> > >     # Remove trailing white space on each line
> > >     trailingPattern = '(\S*)\ +?\n'
> > >     line = re.sub(trailingPattern, '\\1\n', line)
>
> > line = line.rstrip()?
>
> > Diez
>
> Yep.  I was trying to reinvent the wheel.  I just remove the trailing
> whitespace before joining the lines.

Actually you don't do that. Your regex has three components:

(1) (\S*) zero or more occurrences of not-whitespace
(2) \ +? one or more (non-greedy) occurrences of SPACE
(3) \n a newline

Component (2) should be \s+?

In any case this is a round-about way of doing it. Try writing a regex
that does it simply: replace trailing whitespace by an empty string.

Another problem with your approach: it doesn't work if the line is not
terminated by \n -- this is quite possible if the lines are being read
from a file.
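
A single pass over the whole string handles both problems (a sketch,
with an illustrative file name):

import re

text = open('the_file.txt').read()
# (?m): $ matches just before each newline and at end of string
text = re.sub(r'(?m)[ \t]+$', '', text)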

A wise person once said: Re-inventing the wheel is often accompanied
by forgetting to re-invent the axle.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Porblem with xlutils/xlrd/xlwt

2010-01-10 Thread John Machin
On Jan 10, 8:51 pm, pp  wrote:
> On Jan 9, 8:23 am, John Machin  wrote:
>
>
>
> > On Jan 9, 9:56 pm, pp  wrote:
>
> > > On Jan 9, 3:52 am, Jon Clements  wrote:
>
> > > > On Jan 9, 10:44 am, pp  wrote:
>
> > > > > On Jan 9, 3:42 am, Jon Clements  wrote:
>
> > > > > > On Jan 9, 10:24 am, pp  wrote:
> > > > > yeah all my versions are latest from http://www.python-excel.org.
> > > > > just checked!!
>
> > How did you check?

You didn't answer this question.

>
> > > > > what could be the problem?
>
> > > > Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by
> > > > itself?
>
> > > Yes it does. The problem is with line: wb =  copy(rb)
> > > here I am getting the error: AttributeError: 'Book' object has no
> > > attribute 'on_demand'
>
> > Please replace the first 4 lines of your script by these 6 lines:
>
> > import xlrd
> > assert xlrd.__VERSION__ == "0.7.1"
> > from xlwt import easyxf
> > from xlutils.copy import copy
> > rb = xlrd.open_workbook(
> >     'source.xls',formatting_info=True, on_demand=False)
>
> > and run it again. Please copy all the output and paste it into your
> > response.
>
> This time when I ran the code sent by you I got the following
> results:I am using ipython for running the code.
>
> AssertionError                            Traceback (most recent call
> last)
>
> /home/parul/CODES/copy_1.py in ()
>       1
> > 2 import xlrd
>       3 assert xlrd.__VERSION__ == "0.7.1"
>       4 from xlwt import easyxf
>       5 from xlutils.copy import copy
>       6 rb = xlrd.open_workbook('source.xls',formatting_info=True,
> on_demand=False)
>
> AssertionError:
> WARNING: Failure executing file: 
>

Your traceback appears to show an AssertionError from an import
statement. We could do without an extra layer of noise in the channel;
please consider giving ipython the flick (for debug purposes, at
least) and use Python to run your script from the shell prompt.

Change the second line to read:

print xlrd.__VERSION__

> I used www.python-excel.org to get xlrd and xlwt .. so they are latest
> versions.

Let's concentrate on xlrd. I presume that means that you clicked
on the xlrd Download link which took you to http://pypi.python.org/pypi/xlrd
from which you can download the latest version of the package. That
page has "xlrd 0.7.1" in a relatively large font at the top. You would
have been presented with options to download one of these

xlrd-0.7.1.tar.gz
xlrd-0.7.1.win32.exe
xlrd-0.7.1.zip

(each uploaded on 2009-06-01).

Which one did you download, and then what did you do with it?

Or perhaps you ignored those and read further down to "Download link"
which took you to an out-of-date page but you didn't notice the
"0.6.1" in large bold type at the top nor the "Page last updated on 11
June 2007" at the bottom nor the "0.6.1" in the name of the file that
you downloaded ... sorry about that; I've smacked the webmaster about
the chops :-)

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Porblem with xlutils/xlrd/xlwt

2010-01-09 Thread John Machin
On Jan 9, 9:56 pm, pp  wrote:
> On Jan 9, 3:52 am, Jon Clements  wrote:
>
>
>
> > On Jan 9, 10:44 am, pp  wrote:
>
> > > On Jan 9, 3:42 am, Jon Clements  wrote:
>
> > > > On Jan 9, 10:24 am, pp  wrote:

> > > yeah all my versions are latest from http://www.python-excel.org.
> > > just checked!!

How did you check?

> > > what could be the problem?
>
> > Does rb = xlrd.open_workbook('somesheet.xls', on_demand=True) work by
> > itself?
>
> Yes it does. The problem is with line: wb =  copy(rb)
> here I am getting the error: AttributeError: 'Book' object has no
> attribute 'on_demand'

Please replace the first 4 lines of your script by these 6 lines:

import xlrd
assert xlrd.__VERSION__ == "0.7.1"
from xlwt import easyxf
from xlutils.copy import copy
rb = xlrd.open_workbook(
'source.xls',formatting_info=True, on_demand=False)

and run it again. Please copy all the output and paste it into your
response.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to get many places of pi from Machin's Equation?

2010-01-09 Thread John Machin
On Jan 9, 10:31 pm, "Richard D. Moores"  wrote:
> Machin's Equation is
>
> 4 arctan (1/5) - arctan(1/239) = pi/4
>
> Using Python 3.1 and the math module:
>
>
>
> >>> from math import atan, pi
> >>> pi
> 3.141592653589793
> >>> (4*atan(.2) - atan(1/239))*4
> 3.1415926535897936
> >>> (4*atan(.2) - atan(1/239))*4 == pi
> False
> >>> abs((4*atan(.2) - atan(1/239))*4) - pi < .01
> False
> >>> abs((4*atan(.2) - atan(1/239))*4) - pi < .0001
> False
> >>> abs((4*atan(.2) - atan(1/239))*4) - pi < .001
> True
>
> Is there a way in Python 3.1 to calculate pi to greater accuracy using
> Machin's Equation? Even to an arbitrary number of places?

Considering that my namesake calculated pi to 100 decimal places with
the computational equipment available in 1706 (i.e. not much), I'd bet
you London to a brick that Python (any version from 0.1 onwards) could
be used to simulate his calculations to any reasonable number of
places. So my answers to your questions are yes and yes.

Suggestion: search_the_fantastic_web("machin pi python")
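
For the record, a bare-bones sketch of the classic integer-arithmetic
approach (not a polished implementation):

def arctan_recip(x, unity):
    # atan(1/x) scaled by unity, via the Taylor series, integers only
    total = term = unity // x
    n, sign, xsq = 1, 1, x * x
    while term:
        term //= xsq
        n += 2
        sign = -sign
        total += sign * (term // n)
    return total

def machin_pi(digits):
    # pi * 10**digits as an integer, from pi = 4*(4*atan(1/5) - atan(1/239))
    unity = 10 ** (digits + 10)              # 10 guard digits
    pi = 4 * (4 * arctan_recip(5, unity) - arctan_recip(239, unity))
    return pi // 10 ** 10

print(machin_pi(30))   # 3141592653589793... (31 digits)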
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How do I access what's in this module?

2010-01-07 Thread John Machin
On Jan 8, 2:45 pm, Fencer 
wrote:
> On 2010-01-08 04:40, John Machin wrote:
>
>
>
> >> For example:
> >>   >>>  from lxml.etree import ElementTree
> >>   >>>  ElementTree.dump(None)
> >> Traceback (most recent call last):
> >>     File "", line 1, in
>
> > lxml.etree is a module. ElementTree is effectively a class. The error
> > message that you omitted to show us might have given you a clue.
>
> But I did show the error message? It's just above what you just wrote. I
> try to include all relevant information in my posts.


Traceback (most recent call last):
   File "", line 1, in 

Also, can I access those items ...


Error message should appear after line starting with "File". Above
excerpt taken from google groups; identical to what shows in
http://news.gmane.org/gmane.comp.python.general ... what are you
looking at?

With Windows XP and Python 2.5.4 I get:

Traceback (most recent call last):
  File "", line 1, in 
AttributeError: 'builtin_function_or_method' object has no attribute
'dump'

> It turns out I no longer want to access anything in there but I thank
> you for your information nontheless.

You're welcome -- the advice on _methods is portable :-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How do I access what's in this module?

2010-01-07 Thread John Machin
On Jan 8, 12:21 pm, Fencer 
wrote:
> Hello, look at this lxml documentation 
> page:http://codespeak.net/lxml/api/index.html

That's for getting details about an object once you know what object
you need to use to do what. In the meantime, consider reading the
tutorial and executing some of the examples:
http://codespeak.net/lxml/tutorial.html

> How do I access the functions and variables listed?
>
> I tried from lxml.etree import ElementTree and the import itself seems
> to pass without complaint by the python interpreter but I can't seem to
> access anything in ElementTree, not the functions or variables. What is
> the proper way to import that module?
>
> For example:
>  >>> from lxml.etree import ElementTree
>  >>> ElementTree.dump(None)
> Traceback (most recent call last):
>    File "", line 1, in 

lxml.etree is a module. ElementTree is effectively a class. The error
message that you omitted to show us might have given you a clue.

To save keystrokes you may like to try
from lxml import etree as ET
and thereafter refer to the module as "ET"

| >>> from lxml import etree as ET
| >>> type(ET)
| <type 'module'>
| >>> type(ET.ElementTree)
| <type 'builtin_function_or_method'>
| >>> help(ET.ElementTree)
| Help on built-in function ElementTree in module lxml.etree:
|
| ElementTree(...)
| ElementTree(element=None, file=None, parser=None)
|
| ElementTree wrapper class.

> Also, can I access those items that begin with an underscore if I get
> the import sorted?

Using pommy slang like "sorted" in an IT context has the potential to
confuse your transatlantic correspondents :-)

Can access? Yes. Should access? The usual Python convention is that an
object whose name begins with an underscore should be accessed only
via a documented interface (or, at your own risk, if you think you
know what you are doing).

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Astronomy--Programs to Compute Siderial Time?

2010-01-07 Thread John Machin
On Jan 7, 2:40 pm, "W. eWatson"  wrote:
> John Machin wrote:

>
> > What you have been reading is the "Internal maintenance
> > specification" (large font, near the top of the page) for the module.
> > The xml file is the source of the docs, not meant to be user-legible.
>
> What is it used for?

The maintainer of the module processes the xml file with some script
or other to create the user-legible docs.

> Do I need it?

No.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: TypeError

2010-01-06 Thread John Machin
On Jan 7, 1:38 pm, Steve Holden  wrote:
> John Machin wrote:
>
> [...]> I note that in the code shown there are examples of building an SQL
> > query where the table name is concocted at runtime via the %
> > operator ... key phrases: "bad database design" (one table per
> > store!), "SQL injection attack"
>
> I'm not trying to defend the code overall, but most databases won't let
> you parameterize the table or column names, just the data values.

That's correct, and that's presumably why the OP is constructing whole
SQL statements on the fly e.g.

cursor.execute('select max(ID) from %sCustomerData;' % store)

What is the reason for "but" in "but most databases won't ..."? What
are you rebutting?

Let me try again: One table per store is bad design. The
implementation of that bad design may use:

cursor.execute('select max(ID) from %sCustomerData;' % store)
or (if available)
cursor.execute('select max(ID) from ?CustomerData;', (store, ))
but the implementation means is irrelevant.
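
With a single CustomerData table carrying a store column, the query
parameterizes normally (a sketch; the placeholder style depends on the
DB-API driver in use):

# cursor: a DB-API cursor from the existing connection
cursor.execute(
    'select max(ID) from CustomerData where store = %s', (store,))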
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Astronomy--Programs to Compute Siderial Time?

2010-01-06 Thread John Machin
On Jan 7, 11:40 am, "W. eWatson"  wrote:
> W. eWatson wrote:
> > Is there a smallish Python library of basic astronomical functions?
> > There are a number of large such libraries that are crammed with
> > excessive functions not needed for common calculations.
>
> It looks like I've entered a new era in my knowledge of Python.

Mild curiosity: this would be a wonderful outcome, but what makes it
look so?

> I found
> a module somewhat like I want, siderial.py. You can see an intro to it
> at .
> It appears that I can get the code for it through section 1.2, near the
> bottom. I scooped it siderial.py up, and placed it in a corresponding
> file of the same name and type via NotePad. However, there is a xml file
> below it. I know little about it. I thought maybe I could do the same,
> but Notepad didn't like some characters in it. As I understand Python
> doc files are useful. So how do I get this done, and where do I put the
> files?

The file you need is sidereal.py, not your twice-mentioned siderial.py
(the existence of which on the referenced website is doubtful).

What you have been reading is the "Internal maintenance
specification" (large font, near the top of the page) for the module.
The xml file is the source of the docs, not meant to be user-legible.
A very tiny amount of googling "sidereal.py" (quotes included) leads
to the user documentation at 
http://infohost.nmt.edu/tcc/help/lang/python/examples/sidereal/

Where do you put the files? Well, we're now down to only one file,
sidereal.py, and you put it wherever you'd put any other module that
you'd like to call ... if there's only going to be one caller, put it
in the same directory as that caller's code. More generally, drop it
in <python install directory>/Lib/site-packages
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: TypeError

2010-01-06 Thread John Machin
On Jan 7, 11:14 am, John Machin  wrote:
> On Jan 7, 3:29 am, MRAB  wrote:
>
> > Victor Subervi wrote:
> > > ValueError: unsupported format character '(' (0x28) at index 54
> > >       args = ("unsupported format character '(' (0x28) at index 54",)
>
> > > Apparently that character is a "file separator", which I presume is an
> > > invisible character. I tried retyping the area in question, but with no
> > > avail (threw same error). Please advise. Complete code follows.
>
> OP is barking up the wrong tree. "file separator" has ordinal 28
> DECIMAL. Correct tree contains '(' (left parenthesis, ordinal 0x28
> (HEX)) as the error message says.

It took a bit of mucking about to get an example of that error message
(without reading the Python source code):

|>>> anything = object()

|>>> "foo%(" % anything
Traceback (most recent call last):
  File "", line 1, in 
TypeError: format requires a mapping

|>>> "foo%(" % {}
Traceback (most recent call last):
  File "", line 1, in 
ValueError: incomplete format key

|>>> "foo%2(" % anything
Traceback (most recent call last):
  File "", line 1, in 
ValueError: unsupported format character '(' (0x28) at index 5

FWIW, the OP's message subject is "TypeError" but the reported message
contains ValueError ... possibly indicative of code that first builds
a format string (incorrectly) and then uses it with error messages
that can vary from run to run depending on exactly what was stuffed
into the format string.

I note that in the code shown there are examples of building an SQL
query where the table name is concocted at runtime via the %
operator ... key phrases: "bad database design" (one table per
store!), "SQL injection attack"

A proper traceback would be very nice ... at this stage it's not
certain what was the line of source that triggers the exception.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parsing an Excel formula with the re module

2010-01-06 Thread John Machin
On Jan 6, 6:54 am, vsoler  wrote:
> On 5 ene, 20:21, vsoler  wrote:
>
>
>
> > On 5 ene, 20:05, Mensanator  wrote:
>
> > > On Jan 5, 12:35 pm, MRAB  wrote:
>
> > > > vsoler wrote:
> > > > > Hello,
>
> > > > > I am acessing an Excel file by means of Win 32 COM technology.
> > > > > For a given cell, I am able to read its formula. I want to make a map
> > > > > of how cells reference one another, how different sheets reference one
> > > > > another, how workbooks reference one another, etc.
>
> > > > > Hence, I need to parse Excel formulas. Can I do it by means only of re
> > > > > (regular expressions)?
>
> > > > > I know that for simple formulas such as "=3*A7+5" it is indeed
> > > > > possible. What about complex for formulas that include functions,
> > > > > sheet names and possibly other *.xls files?
>
> > > > > For example    "=Book1!A5+8" should be parsed into ["=","Book1", "!",
> > > > > "A5","+","8"]
>
> > > > > Can anybody help? Any suggestions?
>
> > > > Do you mean "how" or do you really mean "whether", ie, get a list of the
> > > > other cells that are referred to by a certain cell, for example,
> > > > "=3*A7+5" should give ["A7"] and "=Book1!A5+8" should give ["Book1!A5]
>
> > > Ok, although "Book1" would be the default name of a workbook, with
> > > default
> > > worksheets labeled "Sheet1". "Sheet2", etc.
>
> > > If I had a worksheet named "Sheety" that wanted to reference a cell on
> > > "Sheetx"
> > > OF THE SAME WORKBOOK, it would be =Sheet2!A7. If the reference was to
> > > a completely
> > > different workbook (say Book1 with worksheets labeled "Sheet1",
> > > "Sheet2") then
> > > the cell might have =[Book1]Sheet1!A7.
>
> > > And don't forget the $'s! You may see =[Book1]Sheet1!$A$7.
>
> > Yes, Mensanator, but...  what re should I use? I'm looking for the re
> > statement. No doubt you can help!
>
> > Thank you.
>
> Let me give you an example:
>
> >>> import re
> >>> re.split("([^0-9])", "123+456*/")
>
> ['123', '+', '456', '*', '', '/', '']
>
> I find it excellent that one single statement is able to do a lexical
> analysis of an expression!

That is NOT lexical analysis.
>
> If the expression contains variables, such as A12 or B9, I can try
> another re expression. Which one should I use?
>
> And if my expression contains parenthesis?   And the sin() function?

 You need a proper lexical analysis, followed by a parser. What you
are trying to do can NOT be accomplished in any generality with a
single regex. The Excel formula syntax has several tricky bits. E.g.
IIRC whether TAX09 is a (macro) name or a cell reference depends on
what version of Excel you are targetting but if it appears like TAX09!
A1:B2 then it's a sheet name.

The xlwt package (of which I am the maintainer) has a lexer and parser
for a largish subset of the syntax ... see  http://pypi.python.org/pypi/xlwt

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 3 byte network ordered int, How To ?

2010-01-06 Thread John Machin
On Jan 7, 5:33 am, Matthew Barnett 
wrote:
> mudit tuli wrote:
> > For a single byte, struct.pack('<B', ...)
> > For two bytes, struct.pack(')
> > what if I want three bytes ?
>
> Four bytes and then discard the most-significant byte:
>
> struct.pack('<I', ...)[ : -1]

AARRGGHH! network ordering is BIGendian, struct.pack('<...') is
LITTLEendian
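
A big-endian (network order) 3-byte value can be built on the 4-byte
code (a sketch; pack3/unpack3 are made-up names):

import struct

def pack3(n):
    # keep the low 3 bytes of a 4-byte big-endian pack
    return struct.pack('>I', n)[1:]

def unpack3(data):
    # prepend a zero high byte and unpack as a 4-byte big-endian int
    return struct.unpack('>I', b'\x00' + data)[0]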
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: TypeError

2010-01-06 Thread John Machin
On Jan 7, 3:29 am, MRAB  wrote:
> Victor Subervi wrote:

> > ValueError: unsupported format character '(' (0x28) at index 54
> >       args = ("unsupported format character '(' (0x28) at index 54",)
>
> > Apparently that character is a "file separator", which I presume is an
> > invisible character. I tried retyping the area in question, but with no
> > avail (threw same error). Please advise. Complete code follows.
>

OP is barking up the wrong tree. "file separator" has ordinal 28
DECIMAL. Correct tree contains '(' (left parenthesis, ordinal 0x28
(HEX)) as the error message says.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Significant whitespace

2010-01-03 Thread John Machin
On Jan 2, 10:29 am, Roy Smith  wrote:

>
> To address your question more directly, here's a couple of ways Fortran
> treated whitespace which would surprise the current crop of
> Java/PHP/Python/Ruby programmers:
>
> 1) Line numbers (i.e. the things you could GOTO to) were in column 2-7
> (column 1 was reserved for a comment indicator).  This is not quite
> significant whitespace, it's more like significant indentation.

That would also surprise former FORTRAN programmers (who rarely
referred to the language as "Fortran"). A comment was signified by a C
in col 1. Otherwise cols 1-5 were used for statement labels (the
things you could GOTO), col 6 for a statement continuation indicator,
cols 7-72 for statement text, and cols 73-80 for card sequence numbers.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: creating ZIP files on the cheap

2009-12-23 Thread John Machin
On Dec 24, 7:34 am, samwyse  wrote:
> I've got an app that's creating Open Office docs; if you don't know,
> these are actually ZIP files with a different extension.  In my case,
> like many other people, I generating from boilerplate, so only one
> component (content.xml) of my ZIP file will ever change.  Instead of
> creating the entire ZIP file each time, what is the cheapest way to
> accomplish my goal?  I'd kind-of like to just write the first part of
> the file as a binary blob, then write my bit, then write most of the
> table of contents as another blob, and finally write a TOC entry for
> my bit.  Has anyone ever done anything like this?  Thanks.

Option 1: set up a file that contains everything except the
content.xml. Then for each new file: copy the "empty" file, open the
copy with zipfile (mode 'a') and write your content.xml. This at least
is understandable and maintainable.
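
A sketch of Option 1 (file names are illustrative; content_xml stands
for the freshly generated XML text):

import shutil
import zipfile

content_xml = '...'                          # the generated content.xml text
shutil.copyfile('template_no_content.odt', 'result.odt')
zf = zipfile.ZipFile('result.odt', 'a', zipfile.ZIP_DEFLATED)
zf.writestr('content.xml', content_xml)      # append the one changing member
zf.close()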

Option 2 (recommended): insert some timing apparatus into your script.
How much time is taken by the template stuff? Is it worth chancing
your arm on getting the "binary blob" stuff correct? Is it
maintainable? I.e. pretend that the next person to maintain your code
knows where you live and owns a chainsaw.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: dictionary with tuple keys

2009-12-15 Thread John Machin
Ben Finney  benfinney.id.au> writes:
 
> In this case, I'll use ‘itertools.groupby’ to make a new sequence of
> keys and values, and then extract the keys and values actually wanted.

Ah, yes, Zawinski revisited ... itertools.groupby is the new regex :-)
 
> Certainly it might be clearer if written as one or more loops, instead
> of iterators. But I find the above relatively clear, and using the
> built-in iterator objects will likely make for a less buggy
> implementation.

Relative clarity like relative beauty is in the eye of the beholder,
and few parents have ugly children :-)

The problem with itertools.groupby is that unlike SQL's "GROUP BY"
it needs sorted input. The OP's requirement (however interpreted)
can be met without sorting.

Your interpretation can be implemented simply:

from collections import defaultdict
result = defaultdict(list)
for key, value in foo.iteritems():
result[key[:2]].append(value)
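
For instance, with made-up keys and values:

foo = {(1, 2, 'a'): 'x', (1, 2, 'b'): 'y', (3, 4, 'c'): 'z'}
# result -> {(1, 2): ['x', 'y'], (3, 4): ['z']}  (list order follows dict iteration order)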

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: parse a string of parameters and values

2009-12-12 Thread John Machin
Steven D'Aprano  REMOVE-THIS-cybersource.com.au> writes:

> 
> On Sat, 12 Dec 2009 16:16:32 -0800, bsneddon wrote:
> 

> 
> > I am going to read a text file that is an export from a control system.
> > It has lines with information like
> > 
> > base=1 name="first one" color=blue
> > 
> > I would like to put this info into a dictionary for processing. 
> 
> Have you looked at the ConfigParser module?
> 
> Assuming that ConfigParser isn't suitable, you can do this if each 
> key=value pair is on its own line:
> [snip]
> If you have multiple keys per line, you need a more sophisticated way of 
> splitting them. Something like this should work:
> 
> d = {}
> for line in open(filename, 'r'):
> if not line.strip():
> continue
> terms = line.split('=')
> keys = terms[0::2]  # every second item starting from the first
> values = terms[1::2]  # every second item starting from the second
> for key, value in zip(keys, values):
> d[key.strip()] = value.strip()
> 

There appears to be a problem with the above snippet, or you have a strange
interpretation of "put this info into a dictionary":

| >>> line = 'a=1 b=2 c=3 d=4'
| >>> d = {}
| >>> terms = line.split('=')
| >>> print terms
| ['a', '1 b', '2 c', '3 d', '4']
| >>> keys = terms[0::2]  # every second item starting from the first
| >>> values = terms[1::2]  # every second item starting from the second
| >>> for key, value in zip(keys, values):
| ... d[key.strip()] = value.strip()
| ...
| >>> print d
| {'a': '1 b', '2 c': '3 d'}
| >>>

Perhaps you meant

terms = re.split(r'[= ]', line)

which is an improvement, but this fails on cosmetic spaces e.g. a = 1  b = 2 ...

Try terms = filter(None, re.split(r'[= ]', line))

Now we get to the really hard part: handling the name="first one" in the OP's
example. The splitting approach has run out of steam.

The OP will need to divulge what is the protocol for escaping the " character if
it is present in the input. If nobody knows of a packaged solution to his
particular scheme, then he'll need to use something like pyparsing.
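
If the quoting happens to follow shell-like rules (a big "if"), shlex
can do the splitting (illustrative only):

import shlex

line = 'base=1 name="first one" color=blue'
d = dict(item.split('=', 1) for item in shlex.split(line))
# d -> {'base': '1', 'name': 'first one', 'color': 'blue'}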




-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Request for py program to insert space between two characters and saved as text?

2009-12-08 Thread John Machin
On Dec 8, 9:42 pm, steve  wrote:
> On 12/08/2009 02:19 PM, John Machin wrote:
>
> >> [...snip...]
> > Perhaps there are some subtleties of which we are unaware ...
>
> > I would be very surprised if the OP could not find on a forum much
> > closer to home more people who know more about using Indic scripts on
> > computers than here.
>
> That's true. I'd recommend that the original poster, posts the query at the
> bangalore python user group mailing list:
>
> http://mail.python.org/mailman/listinfo/bangpypers
>
> ...alongwith some additional details of the requirements. I am sure they
> wouldn't mind reading and replying to the question in kannada itself.
>
> After all Kannada is the language of the sate of Karnataka, of which Bangalore
> (or Bengaluru as it is known these days) is the capital city.

Off-list, I've already solicited assistance for the OP from a
prominent bangpyper.

Cheers,
John


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Request for py program to insert space between two characters and saved as text?

2009-12-08 Thread John Machin
On Dec 8, 6:56 pm, Dennis Lee Bieber  wrote:
> On Tue, 8 Dec 2009 08:26:58 +0530, 74yrs old 
> declaimed the following in gmane.comp.python.general:
>
> > For Kannada project .txt(not .doc) is used, my requirement is to have one
> > space between two characters in Notepad file.  In MSword there is provision
> > to make space between two characters under "Font" and  can be saved as *.doc
> > *  But when tried to save as* .txt*  all formatting will disappear. I could
> > not understand how to do in notepad. Even tried copy and paste from doc to
> > notepad but failed.
>
> > In this context, I request you kindly for small python program - to make or
>
>         Excuse me -- you want one of US to supply you with a program that
> will be used for YOUR entry to some job site? (At least, that's what I
> seem to be finding for "Kannada project")

http://en.wikipedia.org/wiki/Kannada_script

I think "project" means any piece of software ...

>
> > insert space between two characters in the text file.
>
>         How difficult is it to read a file character by character, and write
> a file containing that character and a space?

Perhaps there are some subtleties of which we are unaware ...

I would be very surprised if the OP could not find on a forum much
closer to home more people who know more about using Indic scripts on
computers than here.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why Can't I Delete a File I Created with Win XP?

2009-12-07 Thread John Machin
On Dec 5, 9:57 pm, "W. eWatson"  wrote:
[snip]
>          s = self.current_path
s referred to something ...
>          s = "Analysis"
but now s refers to "Analysis" ... at best, there is redundant &
confusing code; at worst, the source of your problem.

>          s = os.path.join("Analysis",s)
and now s refers to r"Analysis\Analysis" (on your platform)
>          print "s joined ",s    <- debug print

[snip]

> There is no file created, just the folders Analysis\Analysis. One too
> many. The second Analysis shows as an icon for a file of size 0KB.
>
> I printed with the debug print above:
>    Path for Histogram Events\v20070206_055012.06.dat
>    s joined  
> Analysis\Analysis should only be Analysis.

Huh?? s = os.path.join("fubar", "fubar") should produce r"fubar
\fubar" (as documented) ... If you don't want s to refer to r"Analysis
\Analysis", then quite simply don't do that!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why Can't I Delete a File I Created with Win XP?

2009-12-05 Thread John Machin
On Dec 6, 2:46 am, "W. eWatson"  wrote:

[snip]

>          f = file( s, "wb" )
>          if not f:
>              self.LogError( "File creation error 1" )
>              return False

Either you are shadowing the built-in function file() or you haven't
tested this code ... file() aka open() returns a file object (which
cannot have a "false" value) or it raises an exception; the
self.LogError() call can never be executed.


> I caused the redundancy by just changing the meaning of s with a new
> statement below it. I should have commented the first one out. Yeah, I
> probably screwed up here and should have and,for purposes of debugging,
> just used another file name like s="ADATAFILE2009_mmdd.txt", which does
> not exist at this point of the coding stage. So I should have produced
> Analysis\ADATAFILE2009_mmdd.txt. If I had done that then I would have
> ended up with an empty file in the Analysis folder.

Ever heard the phrase "need to know"?

> However, even at
> that, why can't I delete this empty file called Analysis?

Are you trying to delete the file from another command window while
Python is paused at the interactive prompt? In any case describe how
you are trying to delete the file. Fire up another command window, cd
to whatever is the current working directory for your script, do a dir
command, and copy/paste the RELEVANT parts (including all directory
names). Please don't top-post, and before you hit the send button,
please delete any text that is more appropriate to your memoirs than a
problem description.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Filling in a tuple from unknown size list

2009-11-30 Thread John Machin
On Nov 27, 11:18 pm, boblatest  wrote:
> Hello all,
>
> (sorry for posting from Google. I currently don't have access to my
> normal nntp account.)
>
> Here's my question: Given a list of onknown length, I'd like to be
> able to do the following:
>
> (a, b, c, d, e, f) = list
>
> If the list has fewer items than the tuple, I'd like the remaining
> tuple elements to be set to "None". If the list is longer, I'd like
> the excess elements to be ignored.

WRONG -- sweeping excess input under the carpet is a nasty perlish
trick.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Feature request: String-inferred names

2009-11-26 Thread John Machin
On Nov 27, 10:43 am, The Music Guy  wrote:
[snip]

> Nonetheless, the fact remains that the feature I'm proposing closely
> resembles one that has already been rejected... Well, it's been a few
> years since then. Maybe its about time for another PEP to be proposed?

Judging by the response you've got from about half-a-dozen sensible
people, I say maybe it's NOT about time.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Newsgroup for beginners

2009-11-24 Thread John Machin
On Nov 17, 2:56 pm, Grant Edwards  wrote:
> On 2009-11-17, Paul Rubin  wrote:
>
> > mrholtsr  writes:
> >> Is there a Python newsgroup for those who are strictly beginners at
> >> programming and python?
>
> > This group has its grouchy moments
>
> You've really got to try pretty hard to create one.  But if you
> want to, here's how to do it:
[snip]
>  2) Python programs are portable, so don't reveal what OS or
>     Python version you're using.  People will ask. Just ignore
>     them.

Don't supply a traceback, lest you inadvertently divulge such
information (and more!) e.g.

  File "C:\python26\lib\encodings\cp1252.py", line 15, in decode

A good safeguard against accidental disclosure of your Python version
is to avoid using the default installation folder:

File "C:\snake_teehee_ima_comedian\lib\etc_etc"

This technique, used in a question like "How can I have multiple
versions of Python installed" gives a good chance of getting a grumpy
response.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My own accounting python euler problem

2009-11-11 Thread John Machin
On Nov 8, 8:39 am, vsoler  wrote:
> In the accounting department I am working for we are from time to time
> confronted to the following problem:
>
> A customer sends us a check for a given amount, but without specifying
> what invoices it cancels. It is up to us to find out which ones the
> payment corresponds to.
>
> For example, say that the customer has the following outstanding
> invoices:  $300, $200, $50; and say that the check is for $250. This
> time it is clear, the customer is paying bills $200 and $50.
>
> However, let's now say that the outstanding invoices are $300, $200,
> $100 and that the check is for $300. In this case there are already
> two possibilities. The customer is paying the $300 invoice or the $200
> and $100. In other words, there is more than one solution to the
> problem.

The problems that you mention are only a SUBSET of the total problem.
Example: oustanding invoices are for 300, 200, and 100 and the cheque
is for 450 -- in general the total of the cheque amounts does not
equal the total of any possible selection of outstanding invoice
amounts.

I would be very surprised if a real accounting department did not
already have a set of business rules for dealing with a problem that
has existed since invoices and cheques were invented.

I would be extremely surprised if a real accounting department could
be persuaded to imagine a subset of their unpaid/underpaid/overpaid
invoice problem as being an instance of the (extended) knapsack
problem :-)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My own accounting python euler problem

2009-11-10 Thread John Machin
On Nov 8, 8:39 am, vsoler  wrote:
> In the accounting department I am working for we are from time to time
> confronted to the following problem:
[snip]

> My second question is:
> 2. this time there are also credit notes outstanding, that is,
> invoices with negative amounts. For example,  I=[500, 400, -100, 450,
> 200, 600, -200, 700] and a check Ch=600

How can a credit note be "outstanding"? The accounting department
issues a credit note without recording what invoice it relates to?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: elementtree XML() unicode

2009-11-05 Thread John Machin
On Nov 5, 12:35 am, Stefan Behnel  wrote:
> John Machin, 04.11.2009 02:56:
>
> > On Nov 4, 12:14 pm, Kee Nethery wrote:
> >> The reason I am confused is that getResponse2 is classified as an  
> >> "str" in the Komodo IDE. I want to make sure I don't lose the non-
> >> ASCII characters coming from the URL.
>
> > str is all about 8-bit bytes.
>
> True in Py2.x, false in Py3.

And the context was 2.x.

> What you mean is the "bytes" type, which, sadly, was named "str" in Python 
> 2.x.

What you mean is the "bytes" concept.

> The problem the OP ran into was due to the fact that Python 2.x handled
> "ASCII characters in a unicode string" <-> "ASCII encoded byte string"
> conversion behind the scenes, which lead to all sorts of trouble to loads
> of people, and was finally discarded in Python 3.0.

What you describe is the symptom. The problems are (1) 2.X ET expects
a str object but the OP supplied a unicode object, and (2) 2.X ET
didn't check that, so it accidentally "worked" provided the contents
were ASCII-only, and otherwise gave a novice-mystifying error message.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: elementtree XML() unicode

2009-11-03 Thread John Machin
On Nov 4, 1:06 pm, Kee Nethery  wrote:
> On Nov 3, 2009, at 5:27 PM, John Machin wrote:
>
>
>
> > On Nov 4, 11:01 am, Kee Nethery  wrote:

> >> Why is this not working and what do I need to do to use Elementtree
> >> with unicode?
>
> > What you need to do is NOT feed it unicode. You feed it a str object
> > and it gets decoded according to the encoding declaration found in the
> > first line.
>
> That it uses "the encoding declaration found in the first line" is the  
> nugget of data that is not in the documentation that has stymied me  
> for days. Thank you!

And under the "don't repeat" principle, it shouldn't be in the
Elementtree docs; it's nothing special about ET -- it's part of the
definition of an XML document (which for universal loss-free
transportability naturally must be encoded somehow, and the document
must state what its own encoding is (if it's not the default
(UTF-8))).

> The other thing that has been confusing is that I've been using "dump"  
> to view what is in the elementtree instance and the non-ASCII  
> characters have been displayed as "numbered  
> entities" (柏市) and I know that is not the  
> representation I want the data to be in. A co-worker suggested that  
> instead of "dump" that I use "et.tostring(theResponseXml,  
> encoding='utf-8')" and then print that to see the characters. That  
> process causes the non-ASCII characters to display as the glyphs I  
> know them to be.
>
> If there was a place in the official docs for me to append these  
> nuggets of information to the sections for  
> "xml.etree.ElementTree.XML(text)" and  
> "xml.etree.ElementTree.dump(elem)" I would absolutely do so.

I don't understand ... tostring() is in the same section as dump(),
about two screen-heights away. You want to include the tostring() docs
in the dump() docs? The usual idea is not to get bogged down in the
first function that looks at first glance like it might do what you
want ("look at the glyphs") but doesn't (it writes a (transportable)
XML stream) but press on to the next plausible candidate.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: elementtree XML() unicode

2009-11-03 Thread John Machin
On Nov 4, 11:01 am, Kee Nethery  wrote:
> Having an issue with elementtree XML() in python 2.6.4.
>
> This code works fine:
>
>       from xml.etree import ElementTree as et
>       getResponse = u'''  
> bobblehead city>city'''
>       theResponseXml = et.XML(getResponse)
>
> This code errors out when it tries to do the et.XML()
>
>       from xml.etree import ElementTree as et
>       getResponse = u'''  
> \ue58d83\ue89189\ue79c8C
> \ue69f8f\ue5b882\ue9ab98\ue58d97\ue58fb03 shipping>'''
>       theResponseXml = et.XML(getResponse)
>
> In my real code, I'm pulling the getResponse data from a web page that  
> returns as XML and when I display it in the browser you can see the  
> Japanese characters in the data. I've removed all the stuff in my code  
> and tried to distill it down to just what is failing. Hopefully I have  
> not removed something essential.
>
> Why is this not working and what do I need to do to use Elementtree  
> with unicode?

On Nov 4, 11:01 am, Kee Nethery  wrote:
> Having an issue with elementtree XML() in python 2.6.4.
>
> This code works fine:
>
>   from xml.etree import ElementTree as et
>   getResponse = u'''
> bobblehead city>city'''
>   theResponseXml = et.XML(getResponse)
>
> This code errors out when it tries to do the et.XML()
>
>   from xml.etree import ElementTree as et
>   getResponse = u'''
> \ue58d83\ue89189\ue79c8C
> \ue69f8f\ue5b882\ue9ab98\ue58d97\ue58fb03 shipping>'''
>   theResponseXml = et.XML(getResponse)
>
> In my real code, I'm pulling the getResponse data from a web page that
> returns as XML and when I display it in the browser you can see the
> Japanese characters in the data. I've removed all the stuff in my code
> and tried to distill it down to just what is failing. Hopefully I have
> not removed something essential.
>
> Why is this not working and what do I need to do to use Elementtree
> with unicode?

What you need to do is NOT feed it unicode. You feed it a str object
and it gets decoded according to the encoding declaration found in the
first line. So take the str object that you get from the web (should
be UTF8-encoded already unless the header is lying), and throw that at
ET ... like this:

| Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)] on win32
| Type "help", "copyright", "credits" or "license" for more
information.
| >>> from xml.etree import ElementTree as et
| >>> ucode = u'''
| ... 
| ... \ue58d83\ue89189\ue79c8C
| ... \ue69f8f\ue5b882
| ... \ue9ab98\ue58d97\ue58fb03
| ... '''
| >>> xml= et.XML(ucode)
| Traceback (most recent call last):
|   File "", line 1, in 
|   File "C:\python26\lib\xml\etree\ElementTree.py", line 963, in XML
| parser.feed(text)
|   File "C:\python26\lib\xml\etree\ElementTree.py", line 1245, in
feed
| self._parser.Parse(data, 0)
| UnicodeEncodeError: 'ascii' codec can't encode character u'\ue58d'
in position 69: ordinal not in range(128)
| # as expected
| >>> strg = ucode.encode('utf8')
| # encoding as utf8 is for DEMO purposes.
| # i.e. use the original web str object, don't convert it to unicode
| # and back to utf8.
| >>> xml2 = et.XML(strg)
| >>> xml2.tag
| 'customer'
| >>> for c in xml2.getchildren():
| ...print c.tag, repr(c.text)
| ...
| shipping '\n'
| >>> for c in xml2[0].getchildren():
| ...print c.tag, repr(c.text)
| ...
| state u'\ue58d83\ue89189\ue79c8C'
| city u'\ue69f8f\ue5b882'
| street u'\ue9ab98\ue58d97\ue58fb03'
| >>>

By the way: (1) it usually helps to be more explicit than "errors
out", preferably the exact copied/pasted output as shown above; this
is one of the rare cases where the error message is predictable (2)
PLEASE don't start a new topic in a reply in somebody else's thread.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: elementtree XML() unicode

2009-11-03 Thread John Machin
On Nov 4, 12:14 pm, Kee Nethery  wrote:
> On Nov 3, 2009, at 4:44 PM, Gabriel Genellina wrote:
>
> > En Tue, 03 Nov 2009 21:01:46 -0300, Kee Nethery   
> > escribió:
>
> >> I've removed all the stuff in my code and tried to distill it down  
> >> to just what is failing. Hopefully I have not removed something  
> >> essential.
>
> Sounds like I did remove something essential.

No, you added something that was not only inessential but caused
trouble.

> > et expects bytes as input, not unicode. You're decoding too early  
> > (decoding early is good, but not in this case, because et does the  
> > work for you). Either feed et.XML with the bytes before decoding, or  
> > reencode the received xml text in UTF-8 (since this is the declared  
> > encoding).
>
> Here is the code that hits the URL:
>          getResponse1 = urllib2.urlopen(theUrl)
>          getResponse2 = getResponse1.read()
>          getResponse3 = unicode(getResponse2,'UTF-8')
>         theResponseXml = et.XML(getResponse3)
>
> So are you saying I want to do:
>          getResponse1 = urllib2.urlopen(theUrl)
>          getResponse4 = getResponse1.read()
>         theResponseXml = et.XML(getResponse4)

You got the essence. Note: that in no way implies any approval of your
naming convention :-)

> The reason I am confused is that getResponse2 is classified as an  
> "str" in the Komodo IDE. I want to make sure I don't lose the non-
> ASCII characters coming from the URL.

str is all about 8-bit bytes. Your data comes from the web in 8-bit
bytes. No problem. Just don't palpate it unnecessarily.

> If I do the second set of code,  
> does elementtree auto convert the str into unicode?

Yes. See the example I gave in my earlier posting:

| ...print c.tag, repr(c.text)
| state u'\ue58d83\ue89189\ue79c8C'

That first u means the type is unicode.

> How do I deal with  
> the XML as unicode when I put it into elementtree as a string?

That's unfortunately rather ambiguous: (1) put past/present? (2)
string unicode/str? (3) what is referent of "it"?

All text in what et returns is unicode [*] so you read it out as
unicode (see above example) or written as unicode if you want to
change it:

your_element.text = u'a unicode object'

[*] As an "optimisation", et stores strings as str objects if they
contain only ASCII bytes (and are thus losslessly convertible to
unicode). In preparation for running your code under Python 3.X, it's
best to ignore this and use unicode constants u'foo' (if you need text
constants at all) even if et would let you get away with 'foo'.

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: '11' + '1' is '111'?

2009-10-29 Thread John Machin
On Oct 30, 11:52 am, Benjamin Kaplan  wrote:
> On Thu, Oct 29, 2009 at 8:43 PM, metal  wrote:
> > '11' + '1' == '111' is well known.
>
> > but it suprises me '11'+'1' IS '111'.
>
> > Why? Obviously they are two differnt object.
>
> > Is this special feature of imutable object?
>
> It's an implementation detail of small strings without spaces and
> small numbers. You're more likely to reuse those values, so Python
> caches them. You shouldn't rely on it. It's not guaranteed to stay the
> same between different implementations, or even different versions of
> CPython.

It also relies on the implementation detail that the CPython bytecode
has peephole optimisation applied to it:

| Python 2.6.4 (r264:75708, Oct 26 2009, 08:23:19) [MSC v.1500 32 bit
(Intel)] on win32
| Type "help", "copyright", "credits" or "license" for more
information.
| >>> def foo():
| ...return '11' + '1' is '111'
| ...
| >>> import dis
| >>> dis.dis(foo)
|   2   0 LOAD_CONST   4 ('111')
|   3 LOAD_CONST   3 ('111')
|   6 COMPARE_OP   8 (is)
|   9 RETURN_VALUE
| >>> def bar():
| ...a = '11'
| ...b = '1'
| ...return a + b is '111'
| ...
| >>> dis.dis(bar)
|   2   0 LOAD_CONST   1 ('11')
|   3 STORE_FAST   0 (a)
|
|   3   6 LOAD_CONST   2 ('1')
|   9 STORE_FAST   1 (b)
|
|   4  12 LOAD_FAST0 (a)
|  15 LOAD_FAST1 (b)
|  18 BINARY_ADD
|  19 LOAD_CONST   3 ('111')
|  22 COMPARE_OP   8 (is)
|  25 RETURN_VALUE
| >>> foo()
| True
| >>> bar()
| False
| >>>

In general, whether (expression1 is expression2) is true or false is
not useful knowledge when the expressions result in "scalars" like
str, int, float.
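
If the question behind the question is "how should I compare string
values?", the answer is ==; a minimal sketch:

>>> a = '11'
>>> b = '1'
>>> a + b == '111'   # compares values: reliable
True
>>> a + b is '111'   # compares identities: an implementation accident
False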
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python2.6 + win32com crashes with unicode bug

2009-10-29 Thread John Machin
On Oct 30, 11:11 am, Terry Reedy  wrote:
> GerritM wrote:
[snip]
> >   File "C:\Python26\lib\site-packages\win32com\client\build.py", line
> > 542, in 
> >     return filter( lambda char: char in valid_identifier_chars, className)
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in position 52:
> > ordinal not in range(128)
>
> I suspect that 2.6 fixed the bug of allowing non-ascii chars when using
> the ascii codec.  I would check to see if there is an 0x83 in
> D:/temp/test.vsd

  Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
(Intel)] on win32
  Type "help", "copyright", "credits" or "license" for more
information.
  >>> '\x83'.decode('ascii')
  Traceback (most recent call last):
File "", line 1, in 
  UnicodeDecodeError: 'ascii' codec can't decode byte 0x83 in
position 0: ordinal not in range(128)
  >>>

What bug??
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Another (simple) unicode question

2009-10-29 Thread John Machin
On Oct 29, 10:02 pm, Rustom Mody  wrote:
> Constructhttp://construct.wikispaces.com/is a kick-ass binary file
> structurer (written by a 21 year old!)
> I thought of trying to port it to python3 but it barfs on some unicode
> related stuff (after running 2to3) which I am unable to wrap my head
> around.
>
> Can anyone direct me to what I should read to try to understand this?

"unicode related stuff" is rather vague. Have you read the Python
Unicode HOWTO? Joel Spolsky's article?

http://www.amk.ca/python/howto/unicode
http://www.joelonsoftware.com/articles/Unicode.html

In any case, it's a debugging problem, isn't it? Could you possibly
consider telling us the error message, the traceback, a few lines of
the 3.x code around where the problem is, and the corresponding 2.x
lines? Are you using 3.1.1 and 2.6.4? Does your test work in 2.6?
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Bug(s) in Python 3.1.1 Windows installation

2009-10-28 Thread John Machin
On Oct 29, 11:06 am, "Alf P. Steinbach"  wrote:
> (3) Tkinter not bundled, misleading & incomplete documentation.
>
> With the file associations in place (the installer managed to do that) running
> console programs works fine.
>
> However, running Tkinter based programs does *not* work:
>
> 
> import Tkinter

What documentation are you reading? As you are running Python 3.1, you
might like to consider reading 3.1 documentation. Many things have
been changed from 2.X, including renaming unconventionally named
modules and packages. In particular see 
http://docs.python.org/3.1/library/tkinter.html
... in general, see the whatsnew docs that I pointed you at.

If you are trying to run 2.X code under 3.1, don't expect it to work
straight away. Find "2to3" in the docs.
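
If you need the same source to run under both 2.X and 3.X while you
convert, a minimal sketch (the alias name tk is just a choice):

    try:
        import tkinter as tk     # 3.X name
    except ImportError:
        import Tkinter as tk     # 2.X name
    root = tk.Tk()
    root.mainloop()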


[snip]
>
> Checking I find that while there is a Tkinter folder there is no file
> [Tkinter.py] in this installation, i.e. the Tkinter module is not bundled with
> this distribution.

but the tkinter module is bundled

>
> That's bad news for any novice wanting to start learning the language: a main
> "battery" is missing! The documentation gives the impression that Tkinter can

Which documentation?


> just be used, and it could just be used with ActivePython. Here the novice has

ActivePython 2.X or 3.X?

> to figure out not only that it isn't there, but also how to get it!
>
> Checking http://pypi.python.org/pypi/>, the package index, nope, no
> Tkinter there.

PyPI is for third-party modules and packages.


> Typing "tkinter" in the Firefox address bar leads to 
> http://wiki.python.org/moin/TkInter>, and it has a recipe for checking 
> for
> Tkinter support/installation:
>
> 
> Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit (Intel)] 
> on
> win32
> Type "help", "copyright", "credits" or "license" for more information.
>  >>> import _tkinter
>  >>> import Tkinter
> Traceback (most recent call last):
>    File "", line 1, in 
> ImportError: No module named Tkinter
>  >>>
> 
>
> The recipe now calls for adding path to directory with [Tkinter.py], but as
> mentioned no such file in this installation...
>
> Cheers,
>
> - Alf
>
> PS: This was not unexpected. It was exactly why I earlier didn't even look at
> CPython (umpteen bad experiences with *nix ports) but used ActivePython. I'm
> hopeful that I will find where Tkinter resides on the net, but hey, it should 
> at
> least be documented, up front I mean (it's possibly mentioned somewhere, 
> yes?).

Yes. You'll find that tkinter resides also on your hard disk (unless
you chose not to install it).

Python 3.1.1 (r311:74483, Aug 17 2009, 17:02:12) [MSC v.1500 32 bit
(Intel)] on
win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import tkinter
>>> tkinter.__file__
'C:\\python31\\lib\\tkinter\\__init__.py'

Didn't you do a search of your hard disk for "Tkinter"?


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Bug(s) in Python 3.1.1 Windows installation

2009-10-28 Thread John Machin
On Oct 29, 11:56 am, "Alf P. Steinbach"  wrote:
> Summarizing the main differences 2.6 -> 3.1.1 that I know of so far: print is
> now a function (nice), "/" now always produces float result (unsure about 
> that,
> it must surely break a lot or even most of existing code?), xrange() has been
> removed and range() now works like old xrange().

http://www.python.org/doc/3.0/whatsnew/3.0.html
http://www.python.org/doc/3.1/whatsnew/3.1.html
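
Re the "/" worry, a quick sketch of the 3.X behaviour (the whatsnew
pages above cover the details):

    >>> 7 / 2     # true division: always a float in 3.X
    3.5
    >>> 7 // 2    # floor division: use this where 2.X code relied on int /
    3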

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-27 Thread John Machin
On Oct 28, 2:51 am, Ethan Furman  wrote:
> John Machin wrote:
> > On Oct 27, 7:15 am, Ethan Furman  wrote:
>
> >>Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
> >>to a cp437, and the file came from a german oem machine... could that
> >>file have upper-ascii codes that will not map to anything reasonable on
> >>my \x01 cp437 machine?  If so, is there anything I can do about it?
>
> > ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
> > meaningless. As for the rest of your question, if the file's encoded
> > in cpXXX, it's encoded in cpXXX. If either the creator or the reader
> > or both are lying, then all bets are off.
>
> My confusion is this -- is there a difference between any of the various
> cp437s?

What various cp437s???

>  Going down the list at ESRI: 0x01, 0x09, 0x0b, 0x0d, 0x0f,
> 0x11, 0x15, 0x18, 0x19, and 0x1b all map to cp437,

Yes, this is called a "many-to-*one*" relationship.

> and they have names

"they" being the Language Drivers, not the codepages.

> such as US, Dutch, Finnish, French, German, Italian, Swedish, Spanish,
> English (Britain & US)... are these all the same?

When you read the Wikipedia page on cp437, did you see any reference
to different versions for French, German, Finnish, etc? I saw only one
mapping table; how many did you see? If there are multiple language
versions of a codepage, how do you expect to handle this given Python
has only one codec per codepage?

Trying again: *ONE* attribute of a Language Driver ID (LDID) is the
character set (codepage) that it uses. Other attributes may be things
like the collating (sorting) sequence, whether they use a dot or a
comma as the decimal point, etc. Many different languages in Western
Europe can use the same codepage. Initially the common one was cp 437,
then 850, then 1252.

There may possibly different interpretations of a codepage out there
somewhere, but they are all *intended* to be the same, and I advise
you to cross the different-cp437s bridge *if* it exists and you ever
come to it.

Have you got access to files with LDID not in (0, 1) that you can try
out?

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-26 Thread John Machin
On Oct 27, 7:15 am, Ethan Furman  wrote:
> John Machin wrote:
> > On Oct 27, 3:22 am, Ethan Furman  wrote:
>
> >>John Machin wrote:
>
> >>>Try this:
> >>>http://webhelp.esri.com/arcpad/8.0/referenceguide/
>
> >>Wow.  Question, though:  all those codepages mapping to 437 and 850 --
> >>are they really all the same?
>
> > 437 and 850 *are* codepages. You mean "all those language driver IDs
> > mapping to codepages 437 and 850". A codepage merely gives an
> > encoding. An LDID is like a locale; it includes other things besides
> > the encoding. That's why many Western European languages map to the
> > same codepage, first 437 then later 850 then 1252 when Windows came
> > along.
>
> Let me rephrase -- say I get a dbf file with an LDID of \x0f that maps
> to a cp437, and the file came from a german oem machine... could that
> file have upper-ascii codes that will not map to anything reasonable on
> my \x01 cp437 machine?  If so, is there anything I can do about it?

ASCII is defined over the first 128 codepoints; "upper-ascii codes" is
meaningless. As for the rest of your question, if the file's encoded
in cpXXX, it's encoded in cpXXX. If either the creator or the reader
or both are lying, then all bets are off.

> > BTW, what are you planning to do with an LDID of 0x00?
>
> Hmmm.  Well, logical choices seem to be either treating it as plain
> ascii, and barfing when high-ascii shows up; defaulting to \x01; or
> forcing the user to choose one on initial access.

It would be more useful to allow the user to specify an encoding than
an LDID.

You need to be able to read files created not only by software like
VFP or dBase but also scripts using third-party libraries. It would be
useful to allow an encoding to override an LDID that is incorrect e.g.
the LDID implies cp1251 but the data is actually encoded in koi8[ru]

Read this: http://en.wikipedia.org/wiki/Code_page_437
With no LDID in the file and no encoding supplied, I'd be inclined to
make it barf if any codepoint not in range(32, 128) showed up.
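
A minimal sketch of that policy (function and argument names are made
up; raw is a str field value):

    def decode_dbf_text(raw, encoding=None):
        # a user-supplied encoding overrides (or supplies) the codepage
        if encoding:
            return raw.decode(encoding)
        if any(not (32 <= ord(ch) < 128) for ch in raw):
            raise ValueError("no encoding known and non-ASCII bytes present")
        return raw.decode('ascii')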

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-26 Thread John Machin
On Oct 27, 3:22 am, Ethan Furman  wrote:
> John Machin wrote:
> > On Oct 24, 4:14 am, Ethan Furman  wrote:
>
> >>John Machin wrote:
>
> >>>On Oct 23, 3:03 pm, Ethan Furman  wrote:
>
> >>>>John Machin wrote:
>
> >>>>>On Oct 23, 7:28 am, Ethan Furman  wrote:
>
> > Try this:
> >http://webhelp.esri.com/arcpad/8.0/referenceguide/
>
> Wow.  Question, though:  all those codepages mapping to 437 and 850 --
> are they really all the same?

437 and 850 *are* codepages. You mean "all those language driver IDs
mapping to codepages 437 and 850". A codepage merely gives an
encoding. An LDID is like a locale; it includes other things besides
the encoding. That's why many Western European languages map to the
same codepage, first 437 then later 850 then 1252 when Windows came
along.

> >>     '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy
>
> > Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
> > not alone. I suggest that you omit Kamenicky until someone actually
> > wants it.
>
> Yeah, I noticed that.  Tentative plan was to implement it myself (more
> for practice than anything else), and also to be able to raise a more
> specific error ("Kamenicky not currently supported" or some such).

The error idea is fine, but I don't get the "implement it yourself for
practice" bit ... practice what? You plan a long and fruitful career
implementing codecs for YAGNI codepages?
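
The specific-error part is only a few lines; a minimal sketch
(code_pages is your draft dict, ldid_byte is a made-up name for the
header byte):

    import codecs

    encoding, description = code_pages[ldid_byte]
    try:
        codecs.lookup(encoding)
    except LookupError:
        raise ValueError("%s: codec %r not supported by this Python"
                         % (description, encoding))
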
>
> >>     '\x7b' : ('iso2022_jp', 'Japanese Windows'),        # wag
>
> > Try cp936.
>
> You mean 932?

Yes.

> Very helpful indeed.  Many thanks for reviewing and correcting.

You're welcome.

> Learning to deal with unicode is proving more difficult for me than
> learning Python was to begin with!  ;D

?? As far as I can tell, the topic has been about mapping from
something like a locale to the name of an encoding, i.e. all about the
pre-Unicode mishmash and nothing to do with dealing with unicode ...

BTW, what are you planning to do with an LDID of 0x00?

Cheers,

John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: python3 Unicode is slow

2009-10-25 Thread John Machin
On Oct 25, 11:12 pm, Dale Gerdemann 
wrote:
> I've written simple code in 2.6 and 3.0 to read every charcter of a
> set of files and print out some information for each of these
> characters. I tested each program on a large Cyrillic/Latin text. The
> result was that the 2.6 version was about 5x faster.

3.0? Nowadays nobody wants to know about benchmarks of 3.0. Much of
the new 3.X file I/O stuff was written in Python. It has since been
rewritten in C. In general AFAICT there is no good reason to be using
3.0. Consider updating to 3.1.1.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-24 Thread John Machin
On Oct 24, 4:14 am, Ethan Furman  wrote:
> John Machin wrote:
> > On Oct 23, 3:03 pm, Ethan Furman  wrote:
>
> >>John Machin wrote:
>
> >>>On Oct 23, 7:28 am, Ethan Furman  wrote:
>
> >>>>Greetings, all!
>
> >>>>I would like to add unicode support to my dbf project.  The dbf header
> >>>>has a one-byte field to hold the encoding of the file.  For example,
> >>>>\x03 is code-page 437 MS-DOS.
>
> >>>>My google-fu is apparently not up to the task of locating a complete
> >>>>resource that has a list of the 256 possible values and their
> >>>>corresponding code pages.
>
> >>>What makes you imagine that all 256 possible values are mapped to code
> >>>pages?
>
> >>I'm just wanting to make sure I have whatever is available, and
> >>preferably standard.  :D
>
> >>>>So far I have found this, plus 
> >>>>variations:http://support.microsoft.com/kb/129631
>
> >>>>Does anyone know of anything more complete?
>
> >>>That is for VFP3. Try the VFP9 equivalent.
>
> >>>dBase 5,5,6,7 use others which are not defined in publicly available
> >>>dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
> >>>source: ESRI support site.
>
> >>Well, a couple hours later and still not more than I started with.
> >>Thanks for trying, though!
>
> > Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
> > keywords and you couldn't come up with anything??
>
> Perhaps "nothing new" would have been a better description.  I'd already
> seen the clicketyclick site (good info there)

Do you think so? My take is that it leaves out most of the codepage
numbers, and these two lines are wrong:
65h Nordic MS-DOS   code page 865
66h Russian MS-DOS  code page 866


> and all I found at ESRI
> were folks trying to figure it out, plus one link to a list that was no
> different from the vfp3 list (or was it that the list did not give the
> hex values?  Either way, of no use to me.)

Try this:
http://webhelp.esri.com/arcpad/8.0/referenceguide/


>
> I looked at dbase.com, but came up empty-handed there (not surprising,
> since they are a commercial company).

MS and ESRI have docs ... does that mean that they are non-commercial
companies?

> I searched some more on Microsoft's site in the VFP9 section, and was
> able to find the code page section this time.  Sadly, it only added
> about seven codes.
>
> At any rate, here is what I have come up with so far.  Any corrections
> and/or additions greatly appreciated.
>
> code_pages = {
>      '\x01' : ('ascii', 'U.S. MS-DOS'),

All of the sources say codepage 437, so why ascii instead of cp437?
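
The difference bites as soon as a byte >= 0x80 turns up; a tiny sketch
with a made-up value:

    >>> 'M\x81ller'.decode('cp437')   # 0x81 is u-umlaut in cp437
    u'M\xfcller'
    >>> 'M\x81ller'.decode('ascii')
    Traceback (most recent call last):
      ...
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x81 in position
    1: ordinal not in range(128)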

>      '\x02' : ('cp850', 'International MS-DOS'),
>      '\x03' : ('cp1252', 'Windows ANSI'),
>      '\x04' : ('mac_roman', 'Standard Macintosh'),
>      '\x64' : ('cp852', 'Eastern European MS-DOS'),
>      '\x65' : ('cp866', 'Russian MS-DOS'),
>      '\x66' : ('cp865', 'Nordic MS-DOS'),
>      '\x67' : ('cp861', 'Icelandic MS-DOS'),
>      '\x68' : ('cp895', 'Kamenicky (Czech) MS-DOS'),     # iffy

Indeed iffy. Python doesn't have a cp895 encoding, and it's probably
not alone. I suggest that you omit Kamenicky until someone actually
wants it.

>      '\x69' : ('cp852', 'Mazovia (Polish) MS-DOS'),      # iffy

Look 5 lines back. cp852 is 'Eastern European MS-DOS'. Mazovia
predates and is not the same as cp852. In any case, I suggest that you
omit Mazovia until someone wants it. Interesting reading:

http://www.jastra.com.pl/klub/ogonki.htm

>      '\x6a' : ('cp737', 'Greek MS-DOS (437G)'),
>      '\x6b' : ('cp857', 'Turkish MS-DOS'),
>      '\x78' : ('big5', 'Traditional Chinese (Hong Kong SAR, Taiwan)\

big5 is *not* the same as cp950. The products that create DBF files
were designed for Windows. So when your source says that LDID 0xXX
maps to Windows codepage YYY, I would suggest that all you should do
is translate that without thinking to python encoding cpYYY.

>                 Windows'),       # wag

What does "wag" mean?

>      '\x79' : ('iso2022_kr', 'Korean Windows'),          # wag

Try cp949.


>      '\x7a' : ('iso2022_jp_2&

Re: unicode and dbf files

2009-10-23 Thread John Machin
On Oct 23, 3:03 pm, Ethan Furman  wrote:
> John Machin wrote:
> > On Oct 23, 7:28 am, Ethan Furman  wrote:
>
> >>Greetings, all!
>
> >>I would like to add unicode support to my dbf project.  The dbf header
> >>has a one-byte field to hold the encoding of the file.  For example,
> >>\x03 is code-page 437 MS-DOS.
>
> >>My google-fu is apparently not up to the task of locating a complete
> >>resource that has a list of the 256 possible values and their
> >>corresponding code pages.
>
> > What makes you imagine that all 256 possible values are mapped to code
> > pages?
>
> I'm just wanting to make sure I have whatever is available, and
> preferably standard.  :D
>
> >>So far I have found this, plus 
> >>variations:http://support.microsoft.com/kb/129631
>
> >>Does anyone know of anything more complete?
>
> > That is for VFP3. Try the VFP9 equivalent.
>
> > dBase 5,5,6,7 use others which are not defined in publicly available
> > dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
> > source: ESRI support site.
>
> Well, a couple hours later and still not more than I started with.
> Thanks for trying, though!

Huh? You got tips to (1) the VFP9 docs (2) the ESRI site (3) search
keywords and you couldn't come up with anything??
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode and dbf files

2009-10-22 Thread John Machin
On Oct 23, 7:28 am, Ethan Furman  wrote:
> Greetings, all!
>
> I would like to add unicode support to my dbf project.  The dbf header
> has a one-byte field to hold the encoding of the file.  For example,
> \x03 is code-page 437 MS-DOS.
>
> My google-fu is apparently not up to the task of locating a complete
> resource that has a list of the 256 possible values and their
> corresponding code pages.

What makes you imagine that all 256 possible values are mapped to code
pages?

> So far I have found this, plus 
> variations:http://support.microsoft.com/kb/129631
>
> Does anyone know of anything more complete?

That is for VFP3. Try the VFP9 equivalent.

dBase 5,5,6,7 use others which are not defined in publicly available
dBase docs AFAICT. Look for "language driver ID" and "LDID". Secondary
source: ESRI support site.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ask help for a proble with invalid syntax

2009-10-13 Thread John Machin
On Oct 14, 9:09 am, leo zhao  wrote:
> I  try to a run a python numpy programe, however the python can't run
> this program.
> my python version is 2.6.2 , numpy  version is 1.3.0, however, the
> program can run in previous numpy version(1.2.0), who can help me to
> solve the problem, I will deeply appreciate!
> the program is below:
>
> import sys
> import os
> from datetime import *
> from random import *
> from numpy import *

Possibly nothing to do with your immediate problem, but 'from amodule
import *' is not recommended ... just import the objects that you
need.


> import py4cs.multipleloop as mp
>
> class ConsProd(object):
>     total_production =[0.0,0.0,0.0]
>     tech = 1.0
>     goods =['z','x','y']

Lists as class attributes? Are you sure that that's what you want?
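
The usual gotcha, in case it isn't what you intended: class attributes
are shared by every instance, e.g.

    >>> class C(object):
    ...     total = [0.0, 0.0, 0.0]
    ...
    >>> a = C(); b = C()
    >>> a.total[0] += 1.0
    >>> b.total   # b sees a's change
    [1.0, 0.0, 0.0]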

>     def __init__(seld,identifier):

Typo above: seld instead of self. This and other problems mentioned
below make it hard to believe this code runs with any version of
anything.

>         self.identifier = identifier
>         self.demand_veector = array([0.0,0.0]

veector or vector?

You are missing a ")" from the end of the above line. That should
cause a syntax error when it hits the ":" in the next line.
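
With the typo and the bracket fixed, the line would read something like
(attribute name guessed):

    self.demand_vector = array([0.0, 0.0])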

>         if len(G.cps1) > number_of_1individuals:

G is not defined.
number_of_1individuals is not defined.

>                self.make = ConsProd.goods[0]
>                self.tech = ConsProd.tech
>                self.gross_production = (self. tech*G.L,0.0,0.0)
>                ConsProd.total_production[0] += self.gross_production
> [0]
>                G.cps1[self] = self.gross_production[0]

self as a dictionary key?? Have you supplied a __hash__ method for the
class?

>         elif number_of_1individuals >= len(G.cps1) and len(G.cps2) <
> number_of_2indibiduals:

number_of_2indibiduals or number_of_2individuals?? In any case,
neither of these is defined.

>                self.make = ConsProd.goods[1]
>                self.tech = ConsProd.tech
>                self.gross_production = (0.0,self. tech*G.L,0.0)
>                ConsProd.total_production[1] += self.gross_production
> [1]
>                G.cps2[self] = self.gross_production[0]
>         else:
>                self.make = ConsProd.goods[2]
>                self.tech = ConsProd.tech
>                self.gross_production = (0.0,0.0,self. tech*G.L)
>                ConsProd.total_production[2] += self.gross_production
> [2]
>                G.cps3[self] = self.gross_production[2]
>
> the hint is the small window at python:

What is "the small window at python"??

>
> syntax error:
> There' an error in your program: invalid syntax.

I doubt that that is an exact copy of the error message. You haven't
included a copy of the line where the syntax error is alleged to
occur. If you provide an exact copy of the error message etc that you
see on your screen (use copy/paste), we should be able to help you.

Note that if you have a Python syntax error, it is very unlikely that
it would "work" with one version of Numpy and not "work" with another
version -- unless of course you have changed the source code between
Numpy versions.

HTH,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why ELIF?

2009-10-11 Thread John Machin
MRAB  writes:

> 
> Simon Forman wrote:
> [snip]
> > 
> > I'll often do that this way:
> > 
> > args = re.split('\s', line)
> 
> This has the same result, but is shorter and quicker:
> 
> args = line.split()
> 

HUH?

Shorter and quicker, yes, and it provides much better functionality;
but it's NOT the same result:

 >>> line = '   aaa   bbb   ccc   '
 >>> import re
 >>> re.split('\s', line)
 ['', '', '', 'aaa', '', '', 'bbb', '', '', 'ccc', '', '', '']
 >>> line.split()
 ['aaa', 'bbb', 'ccc']
 >>>
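
If a regex version really is wanted, something closer to str.split
(though still not identical for empty or all-whitespace lines) is:

 >>> re.split(r'\s+', line.strip())
 ['aaa', 'bbb', 'ccc']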


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Poll: Do you use csv.Sniffer?

2009-09-25 Thread John Machin

On 25/09/2009 7:04 PM, Tim Chase wrote:

>> Why do you need the sniffer? If your client can't do "save as" the
>> same way twice, just read the spreadsheets directly!
>
> If I only had one contact and one client, it would be this easy...If you
> can get multiple points of contact at multiple client sites to reliably
> & competently agree on a format, what are you doing here on c.l.py
> instead of making your billions as a business-integration consultant? ;-)


Because like everyone else, I can't get the same contact at the same 
site to do the same thing twice in a row :-(


My point is that "save as CSV" is (a) a potentially lossy process and 
(b) an unnecessary step when you can read straight from the XLS file.
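
For the record, the direct route is only a few lines with xlrd
(third-party; the filename is made up):

    import xlrd
    book = xlrd.open_workbook('from_client.xls')
    sheet = book.sheet_by_index(0)
    for rowx in xrange(sheet.nrows):
        print sheet.row_values(rowx)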

--
http://mail.python.org/mailman/listinfo/python-list

