[issue21074] Too aggressive constant folding

2017-05-23 Thread Andrew Dalke

Andrew Dalke added the comment:

Again, I do not propose any changes to the existing optimizer. I do not need 
anything changed for my code to work.

My goal is to counter-balance comments which suggest that perfectly normal code 
is somehow folly and arcane. These caused me some bewilderment and self-doubt 
as I tried to establish that my test suite was not, in fact, poorly written. 
Others with the same issue should not face the same confusion. 

I especially do not want to see the years of experience with the current 
optimizer used to justify repeating the same decisions in some future AST-based 
optimizer. http://bugs.python.org/issue2506#msg64764 gives an example of how 
the lack of complaints over several years is used to argue against changing 
compiler behavior.

Terms like "folly" and "arcane" also suggest an outright rejection of 
considering to support in the future what seems like totally reasonable code. 

I realize now that there is a more immediately actionable item. I have just 
added #30440 as a request to document these effects. I have removed my name 
from its nosy list in hopes of reducing Raymond Hettinger's concerns about 
comfort and safety, and thus perhaps increase the likelihood that this will be 
documented.

"I apologize if you were offended", which I will take as being sincere, happens 
to also be one of the most common examples of an insincere apology. Bowing out 
when there is a reference to the CoC gives undue power to others, and hinders 
the ability to apply its spirit to all but the most egregious situations.

Even if I accept the idea that "sane" and "insane" have technical meanings, 
that does not exempt their use from questions about being considerate and 
respectful. Django and others replaced their use of the technical terms 
"master" and "slave", following a trend which is at least 13 years old; see 
http://edition.cnn.com/2003/TECH/ptech/11/26/master.term.reut/ . Note that I am 
not proposing to avoid using the terms "sane" and "insane", only asserting that 
there is no clean exception for words which also have a technical sense or 
meaning, even when used for that technical sense.

--




[issue30440] document peephole optimizer effects

2017-05-23 Thread Andrew Dalke

New submission from Andrew Dalke:

The peephole optimizer is an overall benefit to Python but it has some 
side-effects that occasionally cause problems. These are well-known in the 
issue tracker, but there is no other documentation which will help a Python 
programmer figure out which constructs may be a problem.

1) it will compute large integer constants and save them in the .pyc file. The 
following results in a 19M .pyc file. 

def compute_largest_known_prime():
  return 2**74207281 - 1

As an example of someone affected by the compile-time evaluation, see 
https://stackoverflow.com/questions/34113609/why-does-python-preemptively-hang-when-trying-to-calculate-a-very-large-number/
 . Note the many people who struggled to find definitive documentation.
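
A quick way to see the folding in action without a multi-minute compile (a 
sketch, using a smaller exponent chosen so it runs instantly; on CPython 
releases with the peephole optimizer the folded value lands in co_consts):

    code = compile("def f():\n    return 2**100 - 1\n", "<demo>", "exec")
    func_code = code.co_consts[0]   # the code object compiled for f
    print(func_code.co_consts)      # includes the folded value
                                    # 1267650600228229401496703205375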

2) it will create and discard large strings. Consider this variant of the code 
in test_zlib.py, where I have replaced the imported module variable "_1G" with 
its value:

@bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
def test_big_buffer(self, size):
    data = b"nyan" * (2**30 + 1)  # the original uses "_1G"
    self.assertEqual(zlib.crc32(data), 1044521549)
    self.assertEqual(zlib.adler32(data), 2256789997)

The byte compiler will create the ~4GB string then discard it, even though the 
function will not be called on machines with insufficient RAM.

As an example of how I was affected by this, see #21074 .
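
A sketch of the workaround the stdlib test relies on: routing the size through 
a name defeats the folding, since the optimizer folds only literal expressions 
and does not propagate names (observed CPython behavior, not a guarantee):

    ONE_G = 2**30                    # small folded constant; harmless

    def make_big_data():
        # Built when called, not when compiled; calling it still
        # allocates ~4GB, but only at run time.
        return b"nyan" * (ONE_G + 1)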

3) The optimizer affects control flow such that coverage.py gives false 
positives about unreached code.

As examples of how people are affected, see #2506 and 
https://bitbucket.org/ned/coveragepy/issues/198/continue-marked-as-not-covered 
. The last item on the coverage.py tracker asks "Is this limitation documented 
anywhere?"
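
A minimal illustration of the control-flow effect (a sketch; the exact 
behavior depends on the CPython and coverage.py versions):

    def first_odd(items):
        for x in items:
            if x % 2 == 0:
                continue   # executed, yet may be reported as unreached
                           # because the optimizer retargets the jump
            return x
        return None

    first_odd([2, 4, 5])   # runs the "continue" line twice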

I do not believe that the current peephole optimizer should be changed to 
support these use cases, only that there should be documentation about how the 
optimizer may negatively affect real-world code.

The domain expert on this topic is Raymond Hettinger. He does not feel safe in 
issues where I am involved. As I believe my continued presence on this issue 
will inhibit the documentation changes which I think are needed, I will remove 
my name from this issue and not be further involved.

--
assignee: docs@python
components: Documentation
messages: 294248
nosy: dalke, docs@python
priority: normal
severity: normal
status: open
title: document peephole optimizer effects
versions: Python 3.7




[issue30440] document peephole optimizer effects

2017-05-23 Thread Andrew Dalke

Changes by Andrew Dalke <da...@dalkescientific.com>:


--
nosy:  -dalke




[issue21074] Too aggressive constant folding

2017-05-22 Thread Andrew Dalke

Andrew Dalke added the comment:

I do not think quoting the Zen of Python helps anything. As I wrote, "it gives 
different answers depending on where one draws the line." This includes 
"practicality beats purity".

>From my viewpoint, the peephole optimizer isn't going to change because the 
>core developers prioritize the purity of not adding special cases over the 
>practicality of supporting reasonable real-world code. Or the purity of the 
>long-awaited AST optimizer over the practicality of changing the existing, 
>fragile peephole optimizer.

I also appreciate the viewpoint that the practicality of a maintainable 
peephole optimizer beats the impossible purity of trying to support all use 
cases gracefully. My line, of course, only wants it to handle my use case, 
which is the issue reported here.

My goal from this is not to re-open the topic. It is to provide a 
counter-balance to opinions expressed here that place all blame onto the 
programmer whose 'folly' led to 'arcane' and 'insane' code. (The 'insane' is 
used at http://bugs.python.org/issue30293#msg293172 as "Burdening the optimizer 
with insanity checks just slows down the compilation of normal, sane code.")

The use case pulled from my project, which is very near to the original report 
by INADA Naoki, seems entirely sane and not at all arcane. How else might one 
test 64-bit addressing than by constructing values which are over 4GB in 
length? Indeed, Python itself has similar test code. Quoting 
Lib/test/test_zlib.py:


# Issue #10276 - check that inputs >=4GB are handled correctly.
class ChecksumBigBufferTestCase(unittest.TestCase):

    @bigmemtest(size=_4G + 4, memuse=1, dry_run=False)
    def test_big_buffer(self, size):
        data = b"nyan" * (_1G + 1)
        self.assertEqual(zlib.crc32(data), 1044521549)
        self.assertEqual(zlib.adler32(data), 2256789997)


Is the difference between happiness and "folly" really the difference between 
writing "_1G" and "2**30"? If so, how are people supposed to learn the true 
path? Is that not exactly the definition of 'arcane'?

The Code of Conduct which governs comments here requests that we be considerate 
and respectful. Terms like 'folly' and 'arcane', at least for what I think is 
an entirely reasonable use case, seem to run counter to that spirit.

--




[issue30416] constant folding opens compiler to quadratic time hashing

2017-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

A complex solution is to stop constant folding when there are more than a few 
levels of tuples. I suspect there aren't that many cases where there are more 
than 5 levels of tuples and where constant creation can't simply be assigned 
and used as a module variable.

This solution would become even more complex should constant propagation be 
supported.

Another option is to check the value about to be added to co_consts. If it is a 
container, then check if it would require more than a few levels of hash calls. 
If so, then simply add it without ensuring uniqueness.

This could be implemented because the compiler could be told how to carry out 
that check for the handful of supported container types.
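
A sketch of that check in Python (hypothetical helper; the real compiler would 
do this in C over the handful of container types it knows how to fold):

    def nesting_depth(obj, limit=5):
        """Lower bound on container nesting depth, capped at `limit`."""
        if limit == 0 or not isinstance(obj, (tuple, frozenset)):
            return 0
        return 1 + max((nesting_depth(x, limit - 1) for x in obj), default=0)

    # When adding `value` to co_consts: if nesting_depth(value) >= 5, append
    # it without the usual duplicate check, skipping the expensive hash().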

--




[issue21074] Too aggressive constant folding

2017-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

I know this issue was closed many years ago, and I don't propose re-opening it. 
I write this comment because some of the earlier comments here make it sound 
like only a foolish or perverse programmer might be affected by this 'too 
aggressive constant folding'. I'll provide a real-world example of how it 
affected me. It took me several hours to track it down, and even longer to 
decide that the fault shouldn't be solely attributed to poor coding practices 
on my side.

I recently updated a code base from Python 2.7 to Python 3.5+. It contains a C 
extension which handles 64-bit indexing. One of the test files, 
"test_large.py", exercises the API with multi-gigabyte strings. It typically 
takes about 10 minutes to run so it isn't part of the normal test suite. 
Instead, it's decorated with a @unittest.skipUnless(), and only enabled if the 
file is executed directly or if an environment variable is set.

The file creates about 25 multi-GB string constants, like 's = b"\xfe" * 
(2**32+1)'. Those alone require a minute to create, but that's acceptable 
overhead because these tests are usually skipped, and when not skipped are only 
10% of the total run-time. Here is an example extracted from my code; this 
tests the population count on a byte string:

RUN_ALL = "LARGE_TESTS" in os.environ
if __name__ == "__main__":
    RUN_ALL = True

@unittest.skipUnless(RUN_ALL, "large tests not enabled; set LARGE_TESTS")
class LargeTests(unittest.TestCase):  # TestCase, not TestSuite
    def test_popcount(self):
        s = b"\xfe\xff" * (2**31 + 1)
        self.assertEqual(bitops.byte_popcount(s), 15*(2**31 + 1))

if __name__ == "__main__":
    unittest.main()

As part of the update I did a 'move function' refactoring across the code base 
and re-ran the tests. Unit test discovery seemed to hang and ^C didn't work. 
Investigation showed it was in __import__("test_large"), which was needed 
because I had changed code in test_large.py. I finally figured out it was due 
to constant folding, which created the string, found it was too large, and 
discarded it. Test discovery took a minute, even though all of the tests were 
marked as 'skip' and would not be called.
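
One quick way to confirm where the time goes (a sketch; assumes test_large.py 
is in the current directory) is to time the byte-compilation directly, 
bypassing the import system:

    import time

    with open("test_large.py") as f:
        src =

    t0 = time.time()
    compile(src, "test_large.py", "exec")   # constant folding happens here
    print("compile took %.1f seconds" % (time.time() - t0))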

Once done, the compiler generated a .pyc file. I hadn't noticed the problem 
earlier because the .py file rarely changed, so rarely needed to be recompiled. 
It would have been a bigger issue if I ran test_large.py directly, as that will 
always trigger the one-minute compilation, even if I specified a test which 
didn't use those strings. (There were no bugs in 64-bit handling during the 
update so I didn't need to do that.)

I was going to report the issue, then found that INADA Naoki had reported 
almost exactly the same issue here, back in 2014.

I was bewildered by some of the comments here, because they seemed to suggest I 
was at fault for writing such poor quality code in the first place. Others may 
be in the same boat as me, so I'll make a few points to add some 
counter-balance.

"Are we really supposed to protect programmers from their own folly by 
second-guessing when constant folding might be required and when it might not?"

If there is a 'folly', it is shared with the developers of Python's 
constant-folding implementation who thought there wouldn't be a problem, and 
provide no mechanisms (like #2506 proposed in 2008 to disable optimization; 
also available in #26145) which might help people like me diagnose a problem. 
But I don't think there was any folly. There was an engineering decision that 
the benefits of constant folding outweighed the negatives. Just like in my case 
there was an engineering decision that constant expressions which worked in 
Python 2.5-2.7 didn't need to be made future-proof against improved 
constant-folding.

"How is the interpreter supposed to know the function isn't called?"

Perhaps a standard-library decorator which says that a function will be skipped 
when run as a unit test? But really, the question should be "how is the 
*byte-code compiler* supposed to know". This highlights a shift between the 
Python I started with, which left everything up to the run-time virtual machine 
interpreter, and the smarter compile-time optimizer we have now. As it gets 
smarter, we developers seem to need to know more about how the optimizer works 
in order to avoid unwanted side-effects. Currently this knowledge is 'arcane'.

"simply declare a manifest constant and use that instead"

The fundamental problem is there's no way for a programmer to create large 
constant value which is safe from a sufficiently smart compiler, and nothing 
which outlines how smart the compiler will get. Instead, people figure out what 
works operationally, but that's specific to a given CPython version.
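
For illustration, a sketch of the idiom that works today, relying on the 
optimizer folding only literal expressions and never propagating names 
(observed CPython behavior, not a documented guarantee):

    SIZE = 2**32 + 1            # a small folded int; cheap either way

    def make_test_string():
        # Not folded: SIZE is a run-time name lookup, so the multi-GB
        # string is only built when the function is actually called.
        return b"\xfe" * SIZE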

My code ran into problems because Python's constant folding improved from under 
me. Even if I follow that advice, how do I avo

[issue30416] constant folding opens compiler to quadratic time hashing

2017-05-20 Thread Andrew Dalke

New submission from Andrew Dalke:

Others have reported issues like #21074 where the peephole compiler generates 
and discards large strings, and #30293 where it generates multi-MB integers and 
stores them in the .pyc.

This is a different issue. The code:

  def tuple20():
    return ((((((((1,)*20,)*20,)*20,)*20,)*20,)*20,)*20,)*20

takes over four minutes (257 seconds) to compile on my machine. The seemingly 
larger:

  def tuple30():
    return ((((((((1,)*30,)*30,)*30,)*30,)*30,)*30,)*30,)*30

takes a small fraction of a second to compile, and is equally fast to run 
(apparently because a 30-element result is above the optimizer's size cutoff 
for keeping a folded constant, so the nesting never builds up).

Neither code generates a large data structure. In fact, they only need about 
1K.

A sampling profiler of the first case, taken around 30 seconds into the run, 
shows that nearly all of the CPU time is spent in computing the hash of the 
highly-nested tuple20, which must visit a very large number of elements even 
though there are only a small number of unique elements. The call chain is:

Py_Main -> PyRun_SimpleFileExFlags -> PyAST_CompileObject -> compiler_body -> 
compiler_function -> compiler_make_closure -> compiler_add_o -> PyDict_GetItem 
and then into the tuple hash code.

It appears that the peephole optimizer converts the highly-nested tuple20 into 
a constant value. The compiler then creates the code object with its co_consts. 
Specifically, compiler_make_closure uses a dictionary to ensure that the 
elements in co_consts are unique, and mapped to the integer used by LOAD_CONST.

It takes about 115 seconds to compute hash(tuple20). I believe the hash is 
computed twice, once to check if the object is present, and the second to 
insert it. I suspect most of the other 26 seconds went to computing the 
intermediate constants in the tuple.
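
The effect is easy to reproduce at a smaller scale (a sketch; depth 6 instead 
of 8 keeps the run to seconds rather than minutes):

    import time

    t = 1
    for _ in range(6):
        t = (t,) * 20    # a tiny object graph; each level reuses one tuple

    t0 = time.time()
    hash(t)              # visits ~20**6 = 64 million leaves; no memoization
    print("hash took %.2f seconds" % (time.time() - t0))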

Based on the previous issues I highlighted in my first paragraph, I believe 
this report will be filed under "Doctor, doctor, it hurts when I do this"/"Then 
don't do it." I see no easy fix, and cannot think of how it would come about in 
real-world use.

I point it out because in reading the various issues related to the peephole 
optimizer there's a subset of people who propose a look-before-you-leap 
technical solution of avoiding an optimization where the estimated result is 
too large. While it does help, it does not avoid all of the negatives of the 
peephole optimizer, or any AST-based optimizer with similar capabilities. I 
suspect even most core developers aren't aware of this specific negative.

--
components: Interpreter Core
messages: 294050
nosy: dalke
priority: normal
severity: normal
status: open
title: constant folding opens compiler to quadratic time hashing
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7




[issue29211] assertRaises with exceptions re-raised from a generator kills generator

2017-01-08 Thread Andrew Dalke

New submission from Andrew Dalke:

The unittest assertRaises/assertRaisesRegex implementation calls 
traceback.clear_frames() because of issue9815 ("assertRaises as a context 
manager keeps tracebacks and frames alive").

However, if the traceback is from an exception created in a generator, caught, 
and re-raised outside of the generator, then the clear_frames() will cause the 
generator to raise a StopIteration exception the next time it is used.

Here is a reproducible where I create a generator and wrap it inside of an 
object API:

def simple_gen():
    yield 1, None
    try:
        1/0
    except ZeroDivisionError as err:
        yield None, err
    yield 3, None

class Spam:
    def __init__(self):
        self.gen = simple_gen()
    def get_next(self):
        value, err = next(self.gen)
        if err is not None:
            raise err
        return value

I can test this without unittest using the following:

def simple_test():
    spam = Spam()
    assert spam.get_next() == 1
    try:
        spam.get_next()
    except ZeroDivisionError:
        pass
    else:
        raise AssertionError
    assert spam.get_next() == 3
    print("simple test passed")

simple_test()


This prints "simple test passed", as expected.

The unittest implementation is simpler:

import unittest

class TestGen(unittest.TestCase):
    def test_gen(self):
        spam = Spam()
        self.assertEqual(spam.get_next(), 1)
        with self.assertRaises(ZeroDivisionError):
            spam.get_next()
        self.assertEqual(spam.get_next(), 3)

unittest.main()

but it reports an unexpected error:

======================================================================
ERROR: test_gen (__main__.TestGen)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "clear.py", line 40, in test_gen
    self.assertEqual(spam.get_next(), 3)
  File "clear.py", line 13, in get_next
    value, err = next(self.gen)
StopIteration

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)

I have tracked it down to the call to traceback.clear_frames(tb) in 
unittest/case.py. The following ClearFrames context manager will call 
traceback.clear_frames() if requested. The test code uses ClearFrames to 
demonstrate that the call to clear_frames() is what causes the unexpected 
StopIteration exception:


import traceback

class ClearFrames:
    def __init__(self, clear_frames):
        self.clear_frames = clear_frames

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        assert exc_type is ZeroDivisionError, exc_type
        if self.clear_frames:
            # This is the only difference between the tests.
            traceback.clear_frames(tb)
        return True

# This is essentially the same test case as before, but structured using
# a context manager that either does or does not clear the traceback frames.
def clear_test(clear_frames):
    spam = Spam()
    assert spam.get_next() == 1
    with ClearFrames(clear_frames):
        spam.get_next()
    try:
        assert spam.get_next() == 3
    except StopIteration:
        print(" ... got StopIteration")
        return
    print(" ... clear_test passed")

print("\nDo not clear frames")
clear_test(False)
print("\nClear frames")
clear_test(True)


The output from this test is:

Do not clear frames
 ... clear_test passed

Clear frames
 ... got StopIteration

There are only a dozen or so tests in my code which are affected by this. 
(These are from a test suite which I am porting from 2.7 to 3.5.) I can easily 
re-write them to avoid using assertRaisesRegex.

I have no suggestion for a longer-term solution.

--
components: Library (Lib)
messages: 285006
nosy: dalke
priority: normal
severity: normal
status: open
title: assertRaises with exceptions re-raised from a generator kills generator
type: behavior
versions: Python 3.5




[issue23455] file iterator deemed broken; can resume after StopIteration

2015-02-12 Thread Andrew Dalke

New submission from Andrew Dalke:

The file iterator is deemed broken. As I don't think it should be made 
non-broken, I suggest the documentation should be changed to point out when 
file iteration is broken. I also think the term 'broken' is a label with 
needlessly harsh connotations and should be softened.

The iterator documentation uses the term 'broken' like this (quoting here from 
https://docs.python.org/3.4/library/stdtypes.html):

  Once an iterator’s __next__() method raises StopIteration,
  it must continue to do so on subsequent calls. Implementations
  that do not obey this property are deemed broken.

(Older versions comment "This constraint was added in Python 2.3; in Python 
2.2, various iterators are broken according to this rule.")

An IOBase is supposed to support the iterator protocol (says 
https://docs.python.org/3.4/library/io.html#io.IOBase ). However, it does not, 
nor does the documentation say that it's broken in the face of a changing file 
(eg, when another process appends to a log file).

  % ./python.exe 
  Python 3.5.0a1+ (default:4883f9046b10, Feb 11 2015, 04:30:46) 
  [GCC 4.8.4] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> f = open("empty")
  >>> next(f)
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
  StopIteration
  >>> 
  >>> ^Z
  Suspended
  % echo "Hello!" >> empty
  % fg
  ./python.exe

  >>> next(f)
  'Hello!\n'

This is apparently well-known behavior, as I've come across several references 
to it on various Python-related lists, including this one from Miles in 2008: 
https://mail.python.org/pipermail/python-list/2008-September/491920.html .

  "Strictly speaking, file objects are broken iterators"

Fredrik Lundh in the same thread ( 
https://mail.python.org/pipermail/python-list/2008-September/521090.html ) says:

  "it's a design guideline, not an absolute rule"

The 7+ years of 'broken' behavior in Python suggests that /F is correct. But 
while 'broken' could be considered a meaningless label, it carries with it some 
rather negative connotations. It sounds like developers are supposed to make 
every effort to avoid broken code, when that's not something Python itself 
does. It also means that my code can be called broken solely because it 
assumed Python file iterators are non-broken. I am not happy when people say my 
code is broken.

It is entirely reasonable that a seek(0) would reset the state and cause 
next(it) to not continue to raise a StopIteration exception. However, errors 
can arise when using Python file objects, as an iterator, to parse a log file 
or any other files which are appended to by another process.

Here's an example of code that can break. It extracts the first and last 
elements of an iterator; more specifically, the first and last lines of a file. 
If there are no lines it returns None for both values; and if there's only one 
line then it returns the same line as both values.

  def get_first_and_last_elements(it):
      first = last = next(it, None)
      for last in it:
          pass
      return first, last

This code expects a non-broken iterator. If passed a file, and the file were 1) 
initially empty when the next() was called, and 2) appended to by the time 
Python reaches the for loop, then it's possible for the first value to be None 
while last is a string.

This is unexpected, undocumented, and may lead to subtle errors.

There are work-arounds, like ensuring that the StopIteration only occurs once:

  def get_first_and_last_elements(it):
      first = last = next(it, None)
      if last is not None:
          for last in it:
              pass
      return first, last

but much existing code expects non-broken iterators, such as the Python example 
implementation at 
https://docs.python.org/2/library/itertools.html#itertools.dropwhile . (I have 
a reproducible failure using it, a fork(), and a file iterator with a sleep() 
if that would prove useful.)

Another option is to have a wrapper around file object iterators to keep 
raising StopIteration, like:

  def safe_iter(it):
      yield from it

  # -or-  (line for line in file_iter)

but people need to know to do this with file iterators or other potentially 
broken iterators. The current documentation does not say when file iterators 
are broken, and I don't know which other iterators are also broken.
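
A usage sketch (hypothetical file name): once the wrapping generator is 
exhausted it stays exhausted, so the helper above behaves as expected even if 
another process appends to the file mid-iteration:

    with open("server.log") as f:
        first, last = get_first_and_last_elements(safe_iter(f))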

I realize this is a tricky issue.

I don't think it's possible now to change the file's StopIteration behavior. I 
expect that there is code which depends on the current brokenness, the ability 
to seek() and re-iterate is useful, and the idea that next() returns text if 
and only if readline() is not empty is useful and well-entrenched. Pypy has the 
same behavior as CPython so any change will take some time to propagate to the 
other implementations.

Instead, I'm fine with a documentation change in io.html . It currently says:

  IOBase (and its subclasses) support the iterator protocol,
  meaning that an IOBase object can be iterated over yielding
  the lines in a stream.

[issue21523] quadratic-time compilation in the number of 'and' or 'or' expressions

2014-05-21 Thread Andrew Dalke

Andrew Dalke added the comment:

Live and learn. I did my first bisect today.

The first bad revision is:
changeset:   51920:ef8fe9088696
branch:      legacy-trunk
parent:      51916:4e1556012584
user:        Jeffrey Yasskin <jyass...@gmail.com>
date:        Sat Feb 28 19:03:21 2009 +
summary:     Backport r69961 to trunk, replacing JUMP_IF_{TRUE,FALSE} with

I confirmed that the parent did not have the problem.

If you want me to diagnose this further, then I'll need some hints on what to 
do next.

--




[issue21523] quadratic-time compilation in the number of 'and' or 'or' expressions

2014-05-18 Thread Andrew Dalke

New submission from Andrew Dalke:

Python's compiler has quadratic-time behavior based on the number of "and" 
or "or" expressions. A profile shows that stackdepth_walk is calling itself in 
a stack at least 512 levels deep. (My profiler doesn't go higher than that.)

I've reduced it to a simple test case. Compiling functions of the form

def f(x):
    x * x  # Repeat N times

takes linear time in the number of lines N, while functions of the form

def g(x):
    x and x  # Repeat N times

takes quadratic time in N. Here's an example of running the attached 
demonstration code on a fresh build of Python from version control:

Results from 3.5.0a0 (default:de01f7c37b53, May 18 2014, 13:18:43)  

  num    using   using
 tests    '*'    'and'
   100   0.002  0.002
   200   0.003  0.004
   400   0.005  0.010
   800   0.012  0.040
  1600   0.023  0.133
  3200   0.042  0.906
  6400   0.089  5.871
 12800   0.188  27.581
 25600   0.404  120.800

The same behavior occurs when I replace 'and' with 'or'.

The same behavior also occurs under Python 2.7.2, 3.3.5, 3.4.0. (I don't have 
builds of 3.1 or 3.2 for testing.)

However, the demonstration code shows linear time under Python 2.6.6:

Results from 2.6.6 (r266:84374, Aug 31 2010, 11:00:51)  

  num    using   using
 tests    '*'    'and'
   100   0.003  0.001
   200   0.002  0.002
   400   0.006  0.008
   800   0.010  0.010
  1600   0.019  0.022
  3200   0.039  0.045
  6400   0.085  0.098
 12800   0.176  0.203
 25600   0.359  0.423
 51200   0.726  0.839
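
For readers without the attached file, here is a sketch of such a measurement 
(my reconstruction, not the actual quadratic_shortcircuit_compilation.py):

    import time

    def time_compile(n, op):
        # "def f(x):" followed by n copies of the expression statement.
        src = "def f(x):\n" + "".join("    x %s x\n" % op for _ in range(n))
        t0 = time.time()
        compile(src, "<generated>", "exec")
        return time.time() - t0

    for n in (100, 200, 400, 800, 1600, 3200):
        print("%6d %8.3f %8.3f" %
              (n, time_compile(n, "*"), time_compile(n, "and")))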

I came across this problem because my code imports a large machine-generated 
module. It was originally written for Python 2.6, where it worked just fine. 
When I tried to import it under new Pythons, the import appeared to hang, and I 
killed it after a minute or so.

As a work-around, I have re-written the code generator to use chained 
if-statements instead of the short-circuit "and" operator.

--
components: Interpreter Core
files: quadratic_shortcircuit_compilation.py
messages: 218742
nosy: dalke
priority: normal
severity: normal
status: open
title: quadratic-time compilation in the number of 'and' or 'or' expressions
type: performance
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5
Added file: 
http://bugs.python.org/file35279/quadratic_shortcircuit_compilation.py




[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2012-02-04 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

It does look like #8104 resolved it. I tested on 2.7.2 and verified that it's 
no longer a problem, so I moved this to closed/duplicate.

--
resolution:  -> duplicate
status: open -> closed




[issue13653] reorder set.intersection parameters for better performance

2011-12-23 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

My belief is that the people who use set.intersection with more than two terms 
are 1) going to pass in a list of sets, and 2) don't care about the specific 
order.

To check the validity of my belief, I did a Google Code Search to find cases of 
people using set intersection in Python. I searched for "set\.intersection\(\*" 
and "\.intersection\(.*\," with "lang:^python$", among others.

I am sad to report that the most common way to compute set.intersection(*list) 
is by using reduce, like:

possible = (set(index[c]) for c in set(otp))
possible = reduce(lambda a, b: a.intersection(b), possible)


That comes from:
  git://github.com/Kami/python-yubico-client.git /yubico/modhex.py
and similar uses are in:
  git://github.com/sburns/PyCap.git /redcap/rc.py
  http://hltdi-l3.googlecode.com/hg//xdg/languages/morpho/fst.py
  http://dsniff.googlecode.com/svn/trunk/dsniff/lib/fcap.py


As well as in the Rosetta Code example for a simple inverted index, at:
  http://rosettacode.org/wiki/Inverted_index#Python

This was also implemented more verbosely in:

http://eats.googlecode.com/svn/trunk/server/eats/views/main.py
    intersected_set = sets[0]
    for i in range(1, len(sets)):
        intersected_set = intersected_set.intersection(sets[i])

and 

http://iocbio.googlecode.com/svn/trunk/iocbio/microscope/cluster_tools.py
    s = set (range (len (data[0])))
    for d in zip(*data):
        s = s.intersection(set(find_outliers(d, zoffset=zoffset)))
    return sorted(s)

In other words, 7 codebases use manual pairwise reduction rather than use the 
equivalent code in Python. (I have not checked for which are due to backwards 
compatibility requirements.)

On the other hand, if someone really wants to have a specific intersection 
order, this shows that it's very easy to write.


I found 4 other code bases where set intersection was used for something other 
than binary intersection, and used the built-in intersection().



git://github.com/valda/wryebash.git/experimental/bait/bait/presenter/impl/filters.py
    def get_visible_node_ids(self, filterId):
        if filterId in self.idMask:
            visibleNodeIdSets = [f.get_visible_node_ids(filterId)
                                 for f in self._filters]
            return set.intersection(*[v for v in visibleNodeIdSets
                                      if v is not None])
        return None



http://wallproxy.googlecode.com/svn/trunk/local/proxy.py
    if threads[ct].intersection(*threads.itervalues()):
        raise ValueError('All threads failed')

(here, threads' values contain sets)



git://github.com/argriffing/xgcode.git/20100623a.py
    header_sets = [set(x) for x in header_list]
    header_intersection = set.intersection(*header_sets)




http://pyvenn.googlecode.com/hg//venn.py
    to_exclude = set()
    for ii in xrange(0, len(self.sets)):
        if (i & (2**ii)):
            sets_to_intersect.append(sets_by_power_of_two[i & (2**ii)])
        else:
            to_exclude = to_exclude.union(sets_by_power_of_two[(2**ii)])
    final = set.intersection(*sets_to_intersect) - to_exclude



These all find the intersection of sets (not iterators), and the order of 
evaluation does not appear like it will affect the result.

I do not know though if there will be a performance advantage in these cases to 
reordering. I do know that in my code, and any inverted index, there is an 
advantage.

And I do know that the current CPython implementation has bad worst-case 
performance.

--




[issue13653] reorder set.intersection parameters for better performance

2011-12-22 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

In Issue3069, Arnaud Delobelle proposed support for multiple values to 
set.intersection() and set.union(), writing "Intersection is optimized by 
sorting all sets/frozensets/dicts in increasing order of size and only 
iterating over elements in the smallest."

Raymond Hettinger commented therein that he had just added support for multiple 
parameters. However, he did not pick up the proposed change in the attached 
patch which attempts to improve the intersection performance.

Consider the attached benchmark, which constructs an inverted index mapping a 
letter to the set of words which contain that letter. (Rather, to the word's 
index.) 
Here's the output:

## Example output:
# a has 144900 words
# j has 3035 words
# m has 62626 words
# amj takes 5.902/1000 (verify: 289)
# ajm takes 0.292/1000 (verify: 289)
# jma takes 0.132/1000 (verify: 289)


Searching set.intersection(inverted_index["j"], inverted_index["m"], 
inverted_index["a"]) is fully 44 times faster than searching "a", "m", "j"!

Of course, set.intersection() supports any iterable, so this would only be an 
optimization for when all of the inputs are set types.

BTW, my own experiments suggest that sorting isn't critical. It's more 
important to find the most anti-correlated set to the smallest set, and the 
following does that dynamically by preferentially choosing sets which are 
likely to not match elements of the smallest set:

def set_intersection(*input_sets):
    N = len(input_sets)
    min_index = min(range(len(input_sets)), key=lambda x: len(input_sets[x]))
    best_mismatch = (min_index+1)%N

    new_set = set()
    for element in input_sets[min_index]:
        # This failed to match last time; perhaps it's a mismatch this time?
        if element not in input_sets[best_mismatch]:
            continue

        # Scan through the other sets
        for i in range(best_mismatch+1, best_mismatch+N):
            j = i % N
            if j == min_index:
                continue
            # If the element isn't in the set then perhaps this
            # set is a better rejection test for the next input element
            if element not in input_sets[j]:
                best_mismatch = j
                break
        else:
            # The element is in all of the other sets
            new_set.add(element)
    return new_set


Using this in the benchmark gives

amj takes 0.972/1000 (verify: 289)
ajm takes 0.972/1000 (verify: 289)
jma takes 0.892/1000 (verify: 289)

which clearly shows that this Python algorithm is still 6 times faster (for the 
worst case) than the CPython code.

However, the simple sort solution:


def set_intersection_sorted(*input_sets):
    input_sets = sorted(input_sets, key=len)
    new_set = set()
    for element in input_sets[0]:
        if element in input_sets[1]:
            if element in input_sets[2]:
                new_set.add(element)
    return new_set

gives times of 

amj takes 0.492/1000 (verify: 289)
ajm takes 0.492/1000 (verify: 289)
jma takes 0.422/1000 (verify: 289)

no doubt because there's much less Python overhead than my experimental 
algorithm.

--
components: Interpreter Core
files: set_intersection_benchmark.py
messages: 150124
nosy: dalke
priority: normal
severity: normal
status: open
title: reorder set.intersection parameters for better performance
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file24081/set_intersection_benchmark.py




[issue1602133] non-framework built python fails to define environ properly

2011-07-17 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

I confirm that under Python 2.7.2 while trying to build a 3rd-party package 
(from rdkit.org) I get the error


Linking CXX shared library ../../lib/libRDBoost.dylib
ld: warning: path '/usr/local/lib/libpython2.7.a' following -L not a directory
Undefined symbols:
  _environ, referenced from:
  _initposix in libpython2.7.a(posixmodule.o)
 (maybe you meant: cstring=ignore_environment)
ld: symbol(s) not found
collect2: ld returned 1 exit status

My Python-2.7 was configured with ./configure and is not a framework install. 
I applied the patch to my local 2.7 copy and the third party package builds 
without a problem.

--
nosy: +dalke




[issue10809] complex() comments wrongly say it supports NaN and inf

2011-01-02 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

complex("nan") raises "ValueError: complex() arg is a malformed string" while 
complex(float("nan")) returns (nan+0j). This was reported in 
http://bugs.python.org/issue2121 with the conclusion "won't fix".

complex("inf") has the same behaviors.

The implementation in complexobject.c says 


/* a valid complex string usually takes one of the three forms:

     <float>                  - real part only
     <float>j                 - imaginary part only
     <float><signed-float>j   - real and imaginary parts

   where <float> represents any numeric string that's accepted by the
   float constructor (including 'nan', 'inf', 'infinity', etc.), and
   <signed-float> is any string of the form <float> whose first
   character is '+' or '-'.

This comment is wrong and it distracted me for a while as I tried to figure out 
why complex("nan") wasn't working. It should be fixed, with the word 
"including" replaced by "excluding".

I don't have a real need for complex("nan") support - this was of intellectual 
interest only. Also of intellectual interest, PyPy 1.4 does accept 
complex("nan") but converts complex("nan+nanj") to (nannanj), so it suffers 
from the strange corner cases which Raymond points out when advocating for 
"won't fix".

Because

--
assignee: d...@python
components: Documentation
messages: 125104
nosy: dalke, d...@python
priority: normal
severity: normal
status: open
title: complex() comments wrongly say it supports NaN and inf
versions: Python 2.7




[issue10809] complex() comments wrongly say it supports NaN and inf

2011-01-02 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Well that's ... interesting. While I compiled 2.7 and was looking at the 2.7 
code, my tests were against 2.6. 


Python 2.7 (trunk:74969:87651M, Jan  2 2011, 21:58:12) 
[GCC 4.2.1 (Apple Inc. build 5664)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> complex("nan-nanj")
(nan+nanj)
>>> 

This means that the comments are correct and the error was in my understanding, 
as influenced by issue2121.

I therefore closed this.

--
resolution:  -> out of date
status: open -> closed




[issue10698] doctest load_tests() typo

2010-12-13 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

doctest.html Section 24.2.5 Unittest API says:


def load_tests(loader, tests, ignore):
tests.addTests(doctest.DocTestSuite(my_module_with_doctests))
return test

That last line should be "return tests".

--
assignee: d...@python
components: Documentation
messages: 123904
nosy: dalke, d...@python
priority: normal
severity: normal
status: open
title: doctest load_tests() typo
versions: Python 3.2




GothPyCon - Gothenburg Python Conference

2010-04-12 Thread Andrew Dalke
The Gothenburg, Sweden Python User's Group (GothPy) will host our
first ever GothPyCon on 29 May 2010.

For details see http://www.meetup.com/GothPy/calendar/13107391/ .

The meeting will have normal length talks, lightning talks, and
breakout groups in the afternoon for sprinting, code katas, demos, or
whatever you can come up with.

If you wish to present something or have questions, please email the
organizers at gothpy...@dalkescientific.com . Deadline for full-
lengths talks is 14 May. Everything else can be arranged while at the
conference.

There will be a fee of about 150 kronor to cover lunch and fika and
similar costs.

See you there! (Vi ses där!)

Andrew Dalke
da...@dalkescientific.com


Re: recv_into(bytearray) complains about a pinned buffer

2010-02-01 Thread Andrew Dalke
On Feb 2, 12:12 am, Martin v. Loewis wrote:
> My recommendation would be to not use recv_into in 2.x, but only in 3.x.
>
> I don't think that's the full solution. The array module should also
> implement the new buffer API, so that it would also fail with the old
> recv_into.

Okay. But recv_into was added in 2.5 and the test case in
2.6's test_socket.py clearly allows an array there:


def testRecvInto(self):
    buf = array.array('c', ' '*1024)
    nbytes = self.cli_conn.recv_into(buf)
    self.assertEqual(nbytes, len(MSG))
    msg = buf.tostring()[:len(MSG)]
    self.assertEqual(msg, MSG)

Checking koders and Google Code search engines, I found one project
which used recv_into, with the filename bmpreceiver.py . It
uses an array.array("B", [0] * length) .

Clearly it was added to work with an array, and it's
being used with an array. Why shouldn't people use it
with Python 2.x?

Andrew
da...@dalkescientific.com


[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2010-02-01 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Since I can see the change needed to the test, I've attached a diff against 
Python 2.6's test_socket.py. I would have generated one against the 2.7 
version in subversion but that test doesn't exist.

--
keywords: +patch
Added file: http://bugs.python.org/file16082/test_socket.py.diff




recv_into(bytearray) complains about a pinned buffer

2010-01-31 Thread Andrew Dalke
In Python 2.6 I can't socket.recv_into(a bytearray instance). I get a
TypeError which complains about a "pinned buffer". I have only an
inkling of what that means. Since an array.array("b") works there, and
since it works in Python 3.1.1, and since I thought the point of a
bytearray was to make things like recv_into easier, I think this
exception is a bug in Python 2.6.

I want to double check before posting it to the tracker.

Here's my reproducibles:

Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray


I expected a bytearray to work there. In fact, I thought the point of
bytearray was to allow this to work.

By comparison, an array of bytes does work:

>>> import array
>>> arr = array.array("b")
>>> arr.extend(map(ord, "This is a test"))
>>> len(arr)
14
>>> sock.recv_into(arr)
14
>>> arr
array('b', [72, 84, 84, 80, 47, 49, 46, 49, 32, 51, 48, 50, 32, 70])
>>> "".join(map(chr, arr))
'HTTP/1.1 302 F'

I don't even know what a "pinned buffer" means, and searching
python.org isn't helpful.

Using a bytearray in Python 3.1.1 *does* work:

Python 3.1.1 (r311:74480, Jan 31 2010, 23:07:16)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)
10
>>> buf
bytearray(b'HTTP/1.1 3')


Is this a bug in Python 2.6 or a deliberate choice regarding
implementation concerns I don't know about?

If it's a bug, I'll add it to the tracker.

Andrew Dalke
da...@dalkescientific.com


Re: recv_into(bytearray) complains about a pinned buffer

2010-01-31 Thread Andrew Dalke
On Feb 1, 1:04 am, Antoine Pitrou solip...@pitrou.net wrote:
> The problem is that socket.recv_into() in 2.6 doesn't recognize the new
> buffer API which is needed to accept bytearray objects.
> (it does in 3.1, because the old buffer API doesn't exist anymore there)

That's about what I thought it was, but I don't know if this was a
deliberate choice or accidental.

BTW, 2.7 (freshly built from version control) also has the same
exception.

> You could open an issue on the bug tracker for this.

I've done that. It's http://bugs.python.org/issue7827 .

Cheers!
Andrew
da...@dalkescientific.com



[issue7827] recv_into() argument 1 must be pinned buffer, not bytearray

2010-01-31 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

In Python 2.6 and Python 2.7a2+, I can't socket.recv_into(a bytearray 
instance).

I get a TypeError which complains about a "pinned buffer". I have only an 
inkling of what that means. Since an array.array("b") works there, and since 
it works in Python 3.1.1, and since I thought the point of a bytearray was to 
make things like recv_into easier, I think this exception is a bug in Python 
2.6 and 2.7.

Here's my reproducibles: 

Python 2.6.1 (r261:67515, Jul  7 2009, 23:51:51)
[GCC 4.2.1 (Apple Inc. build 5646)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray

I expected a bytearray to work there. In fact, I thought the point of 
bytearray was to allow this to work.

By comparison, an array of bytes does work:

>>> import array
>>> arr = array.array("b")
>>> arr.extend(map(ord, "This is a test"))
>>> len(arr)
14
>>> sock.recv_into(arr)
14
>>> arr
array('b', [72, 84, 84, 80, 47, 49, 46, 49, 32, 51, 48, 50, 32, 70])
>>> "".join(map(chr, arr))
'HTTP/1.1 302 F'

I don't even know what a "pinned buffer" means, and searching 
python.org isn't helpful.

Using a bytearray in Python 3.1.1 *does* work:

Python 3.1.1 (r311:74480, Jan 31 2010, 23:07:16)
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> sock.send(b"GET / HTTP/1.0\r\n\r\n")
18
>>> buf = bytearray(b" " * 10)
>>> sock.recv_into(buf)
10
>>> buf
bytearray(b'HTTP/1.1 3')

For reference, here's an example with 2.7a2+ (freshly built out of version 
control) showing that it does not work there.

Python 2.7a2+ (trunk:74969:77901M, Feb  1 2010, 02:44:24) 
[GCC 4.2.1 (Apple Inc. build 5646) (dot 1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> sock = socket.socket()
>>> sock.connect( ("python.org", 80) )
>>> b = bytearray(b" " * 10)
>>> sock.recv_into(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recv_into() argument 1 must be pinned buffer, not bytearray


--
components: IO
messages: 98644
nosy: dalke
severity: normal
status: open
title: recv_into() argument 1 must be pinned buffer, not bytearray
type: behavior
versions: Python 2.6, Python 2.7




[issue7192] webbrowser.get("firefox") does not work on Mac with installed Firefox

2009-10-23 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

I have Firefox and Safari installed on my Mac. Safari is the default.

I wanted to try out Crunchy (http://code.google.com/p/crunchy/). It's 
developed under Firefox and does not work under Safari. I tried. ;)

It starts the web browser with the following.

try:
    client = webbrowser.get("firefox")
    client.open(url)
    return
except:
    try:
        client = webbrowser.get()
        client.open(url)
        return
    except:
        print('Please open %s in Firefox.' % url)

On my Mac, webbrowser.get("firefox") fails, so this ends up opening in 
Safari, which does not work to view the code.

Thing is, I have Firefox installed, so it should work. But the Mac code in 
webbrowser appears to only open in the default browser.

The following bit of code works well enough to get Crunchy to work:

class MacOSXFirefox(BaseBrowser):
    def open(self, url, new=0, autoraise=True):
        subprocess.check_call(["/usr/bin/open", "-b",
                               "org.mozilla.firefox", url])

register("firefox", None, MacOSXFirefox('firefox'), -1)

but I don't know enough about the Mac nor about webbrowser to know if I'm on 
the right path. For example, I don't know if there are ways to support 
'new' and 'autoraise' through /usr/bin/open or if there's a better 
solution.

Attached is the full diff.

--
components: Library (Lib)
files: webbrowser.py.diff
keywords: patch
messages: 94387
nosy: dalke
severity: normal
status: open
title: webbrowser.get("firefox") does not work on Mac with installed Firefox
type: feature request
Added file: http://bugs.python.org/file15188/webbrowser.py.diff




[issue7172] BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake

2009-10-19 Thread Andrew Dalke

New submission from Andrew Dalke da...@dalkescientific.com:

BaseHTTPServer.BaseHTTPRequestHandler.responses contains a mapping from 
HTTP status codes to the 2-tuple (shortmessage, longmessage), based on RFC 
2616.

The 2-tuple for 405 is ('Method Not Allowed', 'Specified method is invalid 
for this server.').

RFC 405 says "An origin server SHOULD return the status code 405 (Method 
Not Allowed) if the method is known by the origin server but not allowed 
for the requested resource."

I think the message should be 'Specified method is invalid for this 
resource.' That is, change 'server' to 'resource'.

--
components: Library (Lib)
messages: 94262
nosy: dalke
severity: normal
status: open
title: BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake
type: feature request




[issue7172] BaseHTTPServer.BaseHTTPRequestHandler.responses[405] has a small mistake

2009-10-19 Thread Andrew Dalke

Andrew Dalke da...@dalkescientific.com added the comment:

Wasn't thinking. I'm not quoting from RFC 405, I'm quoting the 405 
section from RFC 2616.

--




Python training for cheminformatics, Leipzig, 27-29 April

2009-03-12 Thread Andrew Dalke

My next training course on Python for cheminformatics will
be in Leipzig, Germany on 27-29 April. For full details see

  http://dalkescientific.com/training/

The schedule for the three day course is

Day 1: overview of Python and OEChem,
Day 2: plotting with matplotlib, communicating with Excel,
   XML processing, calling command-line programs, numeric
   computing with NumPy and R.
Day 3: SQL databases and web development with Django.

The course is designed for working computational chemists
who know how to do some programming and want more training
in how to use Python effectively for their research.

The examples and hands-on exercises are all drawn
from cheminformatics.

If you have any questions or to register, please contact me.


Andrew
da...@dalkescientific.com




Python training in cheminformatics

2008-11-10 Thread Andrew Dalke
I will be teaching several training courses in
Python programming, designed for cheminformatics
researchers who want to be more effective at the
software side of their work. These courses will
be in San Francisco in December, in Leipzig in
March and in Boston in April.

The next course is in San Francisco on December
4-5, 2008 and space is still available. Topics I
will cover include:

  - an overview of the Python language
  - the IPython interactive shell
  - plotting with matplotlib
  - OpenEye's OEChem
  - parsing CSV, SMILES and SD files with Python
   and OEChem
  - substructure matching with SMARTS, using OEChem
  - calling other programs
  - working with a SQL database


I am planning a course for Leipzig on 2-4 March 2009.
This three-day course will cover a few additional topics,
like working with Excel, and include more time for
hands-on and self-directed work.

Please contact me if you are interested and I'll notify
you of the details when they are finalized.


I have started planning a course for Boston in April,
2009 and am working on finding a location and time.
Please contact me if you are interested in this class
or have a suggestion for a location.


All courses are limited to 8 people. Registration
includes all teaching materials, coffee breaks, and
lunch. For full details including course topics and
prerequisite experience, see

  http://dalkescientific.com/training/ .



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-09-11 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I'm still undecided on whether this is a bug or not.  The problem occurs even 
when I'm not reading data from a file of an unknown size.  My example 
causes a MemoryError on my machine even though the file I'm reading 
contains 0 bytes.

The problem is Python's implementation is "alloc the requested bytes and 
truncate if needed" vs what I expected, "read chunks at a time up to the 
requested number of bytes".  There's nothing in the documentation which 
states the implementation, although "Note that this method may call the 
underlying C function fread more than once in an effort to acquire as 
close to size bytes as possible." leans slightly towards my 
interpretation.

I looked a little for real-world cases that could cause a denial-of-
service attack but didn't find one.

If there is a problem, it will occur very rarely.  Go ahead and mark it 
as "will not fix" or something similar.  I don't think the change in the 
code is justifiable.
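
For anyone else who hits this, a minimal sketch (the helper name is 
mine) of the "read chunks at a time" behavior I expected; it never 
preallocates more than one bounded chunk, no matter how large the 
requested size is:

def read_up_to(f, size, chunksize=64*1024):
    # Read at most 'size' bytes from 'f' in bounded chunks so that a
    # short or empty file never triggers a huge preallocation.
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = f.read(min(chunksize, remaining))
        if not chunk:
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return "".join(chunks)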

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3531
___



Python training in Cheminformatics

2008-09-10 Thread Andrew Dalke
Python training in Cheminformatics

Andrew Dalke is offering a course in Python programming
for cheminformatics in Leipzig, Germany on 6-7 October
and in the San Francisco Bay Area in early December. Early
registration for the Leipzig course ends 12 September.

For full details see http://dalkescientific.com/training/
or contact Andrew directly at [EMAIL PROTECTED] .

The courses are designed for working computational
chemists with some programming experience who want to
be more effective at the software aspect of the field.
The course is hands-on, with examples directly drawn
from common needs in cheminformatics research.

Some of the topics covered are:
  - an overview of the Python language
  - plotting with matplotlib
  - OpenEye's OEChem
  - parsing CSV, SMILES and SD files
  - substructure matching with SMARTS
  - generating and searching fingerprints
  - scripting PyMol
  - calling command-line programs like InChI
  - web scraping servers like PubChem
  - working with Excel

Andrew Dalke
[EMAIL PROTECTED]



[issue2271] msi installs to the incorrect location (C drive)

2008-08-18 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

Yes, that installed Python 2.6 into the correct location (C:\Python26 
instead of into the root directory).

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2271
___



[issue2271] msi installs to the incorrect location (C drive)

2008-08-15 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I also have this problem.  (2.5 msi installer under Win2K with a non-
admin account granted admin privs).  Python installs just fine under 
C:\ (instead of C:\Python25) but then I run into problems installing 
the win32 extensions.

Searching the web I found this posting from 2005
  http://mail.python.org/pipermail/python-list/2005-September/341874.html

That poster created an SF bug report which is now issue1298962.  He 
linked to http://tinyurl.com/82dt2 which states:

"Windows Installer has no recognition of power
users, so these users fall into the category of 
non admin when running an install."

That describes exactly my situation.  The solution is, apparently:

"To mark certain properties as safe for configuration,
you can add them to the SecureCustomProperties list
in the property table of the MSI file."

which Martin reported here.  Martin suggested using orca, but I have no 
idea of what that is (unix/mac dweeb that I am), and it doesn't exist 
on this machine.

I know this is pretty much a "me too" report.  I'm doing so to say that 
it has been an ongoing problem here at my client's site.  They are not 
software developers here, and rather than trying to track down the 
right person with full admin rights to come to each person's desktop, 
they've been installing an old pre-msi version of Python.

I would like to see this fixed before 2.6 is released.  All I can do to 
help though is to test an installer, which I will do gladly.

--
nosy: +dalke

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2271
___



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

I tested it with Python 2.5 on a Mac, Python 2.5 on FreeBSD, and Python 
2.6b2+ (from SVN as of this morning) on a Mac.

Perhaps the memory allocator on your machine is making a promise it can't 
keep?

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3531
___



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

You're right.  I mistook the string implementation for the list one 
which does keep a preallocated section in case of growth.  Strings of 
course don't grow so there's no need for that.

I tracked the memory allocation all the way down to 
obmalloc.c:PyObject_Realloc .  The call goes to realloc(p, nbytes) which 
is a C lib call.  It appears that the memory space is not reallocated.

That was enough to be able to find the python-dev thread "Darwin's 
realloc(...) implementation never shrinks allocations" from Jan. 2005, 
Bob Ippolito's post "realloc.. doesn't?" 
(http://bob.pythonmac.org/archives/2005/01/01/realloc-doesnt/ ) and 
Issue1092502.

Mind you, I also get the problem on FreeBSD 2.6 so it isn't Darwin 
specific.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3531
___



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-09 Thread Andrew Dalke

Andrew Dalke [EMAIL PROTECTED] added the comment:

FreeBSD is what my hosting provider uses.  Freebsd.org calls 2.6 "legacy" 
but the latest update was earlier this year.

There is shared history with Macs.  I don't know the details though.  I 
just point out that the problem isn't only on Darwin.

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3531
___



[issue3531] file read preallocs 'size' bytes which can cause memory problems

2008-08-08 Thread Andrew Dalke

New submission from Andrew Dalke [EMAIL PROTECTED]:

I wrote a buggy PNG parser which ended up doing several file.read(large 
value).  It causes a MemoryError, which was strange because the file was 
only a few KB long.

I tracked it down to the implementation of read().  When given a size 
hint it preallocates the return string with that size.  If the hint is 
for 10MB then the string returned will be preallocated for 10MB, even if 
the actual read is empty.

Here's a reproducible:

BLOCKSIZE = 10*1024*1024

f=open("empty.txt", "w")
f.close()

f=open("empty.txt")
data = []
for i in range(1):
    s = f.read(BLOCKSIZE)
    assert len(s) == 0
    data.append(s)


I wasn't sure if this is properly a bug, but since the MemoryError 
exception I got was quite unexpected and required digging into the 
source code to figure out, I'll say that it is.

--
components: Interpreter Core
messages: 70924
nosy: dalke
severity: normal
status: open
title: file read preallocs 'size' bytes which can cause memory problems
type: resource usage

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3531
___



[issue2009] Grammar change to prevent shift/reduce problem with varargslist

2008-02-05 Thread Andrew Dalke

Andrew Dalke added the comment:

I've been working from the Grammar file from CVS for 2.6 ... I thought.  
For example, I see "# except_clause: 'except' [test [('as' | ',') 
test]]" which is a 2.6-ism.

svn log says it hasn't changed since 2007-05-19, when except/as was 
added.

What did I miss?

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2009
__



[issue2009] Grammar change to prevent shift/reduce problem with varargslist

2008-02-04 Thread Andrew Dalke

New submission from Andrew Dalke:

I wrote a translator from the CFG used in the Grammar file into a form 
for PLY.  I found one problem with

varargslist: ((fpdef ['=' test] ',')*
  ('*' NAME [',' '**' NAME] | '**' NAME) |
  fpdef ['=' test] (',' fpdef ['=' test])* [','])

This grammar definition is ambiguous until the presence/lack of a *.
PLY complains:

state 469

    (28) varargslist -> fpdef EQUAL test COMMA .
    (32) varargslist_star -> fpdef EQUAL test COMMA .
    (35) varargslist_star3 -> COMMA . fpdef
    (36) varargslist_star3 -> COMMA . fpdef EQUAL test
    (39) fpdef -> . NAME
    (40) fpdef -> . LPAR fplist RPAR

  ! shift/reduce conflict for NAME resolved as shift.
  ! shift/reduce conflict for LPAR resolved as shift.

    RPAR        reduce using rule 28 (varargslist -> fpdef EQUAL test COMMA .)
    COLON       reduce using rule 28 (varargslist -> fpdef EQUAL test COMMA .)
    STAR        reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .)
    DOUBLESTAR  reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .)
    NAME        shift and go to state 165
    LPAR        shift and go to state 163

  ! NAME        [ reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .) ]
  ! LPAR        [ reduce using rule 32 (varargslist_star -> fpdef EQUAL test COMMA .) ]

    fpdef       shift and go to state 515



My fix was to use this definition when I did the translation.

varargslist: ((fpdef ['=' test] (',' fpdef ['=' test])* 
   (',' '*' NAME [',' '**' NAME] | ',' '**' NAME | [','])) |
  ('*' NAME [',' '**' NAME]) |
  ('**' NAME))


So far I've not found a functional difference between these two 
definitions, and the only change to ast.c is to update the comment based 
on this section.

By making this change it would be easier for the handful of people who 
write parsers for Python based on a yacc-like look-ahead(1) parser to 
use that file more directly.

--
components: None
messages: 62055
nosy: dalke
severity: minor
status: open
title: Grammar change to prevent shift/reduce problem with varargslist
type: rfe

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2009
__



[issue2011] compiler.parse("1; ") adds unexpected extra Discard(Const(None)) to parse tree

2008-02-04 Thread Andrew Dalke

New submission from Andrew Dalke:

Python 2.6a0 (trunk:60565M, Feb  4 2008, 01:21:28) 
[GCC 4.0.1 (Apple Computer, Inc. build 5367)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from compiler import parse
>>> parse("1; ")
Module(None, Stmt([Discard(Const(1)), Discard(Const(None))]))

I did not expect the Discard(Const(None)).  Instead, I expected

Module(None, Stmt([Discard(Const(1))]))

--
components: Library (Lib)
messages: 62057
nosy: dalke
severity: minor
status: open
title: compiler.parse("1; ") adds unexpected extra Discard(Const(None)) to parse 
tree
type: behavior
versions: Python 2.6

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2011
__



[issue2011] compiler.parse("1; ") adds unexpected extra Discard(Const(None)) to parse tree

2008-02-04 Thread Andrew Dalke

Andrew Dalke added the comment:

This really is a minor point.  I don't track the 3K list and I see now that the 
compiler module won't be in Python 3k - good riddance - so feel free to discard 
this as well as the other open compiler module bugs.

I want to experiment with adding instrumentation for branch coverage.  
To do that I want to get the character ranges of each term in the AST.  
The Python compiler module doesn't keep track of that so I'm developing 
a new parser based on PLY.

I've developed it and I'm now cross-checking the generated ASTs to 
verify they are identical.  In this case the compiler module generates 
an extra node in the AST so I had to add backwards compatibility support.
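
For the record, a sketch of the sort of compatibility shim I mean, using 
the compiler.ast node classes (the helper name is mine):

from compiler import ast

def strip_trailing_none_discard(module_node):
    # Drop the trailing Discard(Const(None)) that the compiler package
    # emits for input like "1; ", so the two ASTs compare equal.
    stmts = module_node.node.nodes
    if (stmts and isinstance(stmts[-1], ast.Discard)
              and isinstance(stmts[-1].expr, ast.Const)
              and stmts[-1].expr.value is None):
        del stmts[-1]
    return module_node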

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue2011
__



[issue1889] string literal documentation differs from implementation

2008-01-21 Thread Andrew Dalke

New submission from Andrew Dalke:

The reference manual documentation for raw string literals says

"Note also that a single backslash followed by a newline is
interpreted as those two characters as part of the string, *not* as a line
continuation."

This is not the observed behavior.

>>> s = "ABC\
... 123"
>>> s
'ABC123'
>>> 

Line continuations are ignored by triple quoted strings.



In addition, the reference manual documentation for \x escapes says

| ``\xhh``| Character with hex value *hh*   | (4,5) |

where footnote (4) says

   "Unlike in Standard C, at most two hex digits are accepted."

However, the implementation requires exactly two hex digits:

>>> "\x41"
'A'
>>> "\x4."
ValueError: invalid \x escape
>>> "\x4"
ValueError: invalid \x escape


--
components: Documentation
messages: 61484
nosy: dalke
severity: minor
status: open
title: string literal documentation differs from implementation
versions: Python 2.5

__
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1889
__



[issue1367711] Remove usage of UserDict from os.py

2008-01-13 Thread Andrew Dalke

Andrew Dalke added the comment:

Ahh, so the bug here is that the environ dict should use neither UserDict 
nor dict; it should implement the core {get,set,del}item and keys and use 
DictMixin.

Martin mentioned that the patch doesn't support setdefault.  He didn't note 
though that the current code also doesn't support the dictionary interface 
consistently.  This shows a problem with popitem.

>>> import os
>>> os.environ["USER"]
'dalke'
>>> os.environ["USER"] = "nobody"
>>> os.system("echo $USER")
nobody
0
>>> del os.environ["USER"]
>>> os.system("echo $USER")

0
>>> os.environ["USER"] = "dalke"
>>> while os.environ: print os.environ.popitem()
... 
('GROUP', 'staff')
('XDG_DATA_HOME', '/Users/dalke/.local/share')
('TERM_PROGRAM_VERSION', '133')
('CVS_RSH', 'ssh')
('LOGNAME', 'dalke')
('USER', 'dalke')
... removed for conciseness ...
('QTDIR', '/usr/local/qt')
>>> os.system("echo $USER")
dalke
0
>>> 

Not enough people know about DictMixin.
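
For comparison, a minimal sketch of what I mean (the class name is mine, 
and this is not the actual os.py code):

from UserDict import DictMixin

class _Environ(DictMixin):
    # Implement only the core methods; DictMixin derives get,
    # setdefault, popitem, __contains__, etc. from these, so every
    # entry point goes through the same hooks.
    def __init__(self, data):
        self.data = data
    def __getitem__(self, key):
        return self.data[key]
    def __setitem__(self, key, value):
        import os
        os.putenv(key, value)   # push the change to the real environment
        self.data[key] = value
    def __delitem__(self, key):
        import os
        os.unsetenv(key)        # not available on every platform
        del self.data[key]
    def keys(self):
        return self.data.keys()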

_
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1367711
_



[issue1367711] Remove usage of UserDict from os.py

2008-01-12 Thread Andrew Dalke

Andrew Dalke added the comment:

I was optimization tuning and wondered why UserDict was imported by os.  
Replacing 
UserDict with dict passes all existing regression tests.

I see the concerns that doing that replacement is not future proof.  Strange 
then that Cookie.py is acceptable.  There are three places in Lib which 
derive from dict; two are in Cookie.py, and in both cases it's broken 
because setdefault does not go through the same checks that __setitem__ 
goes through.

(The other place is an internal class in _strptime.)
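
A toy demonstration of that problem (mine, not the Cookie.py code):

class LowerDict(dict):
    # Means to normalize keys ... but only __setitem__ does it.
    def __setitem__(self, key, value):
        dict.__setitem__(self, key.lower(), value)

d = LowerDict()
d["A"] = 1            # goes through __setitem__; stored as 'a'
d.setdefault("B", 2)  # bypasses __setitem__; stored as 'B'
print d               # {'a': 1, 'B': 2}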

In looking over existing third-party code, I see this nuance of when to use 
UserDict vs. dict isn't that well known.  The documentation says "The need 
for this class has been largely supplanted by the ability to subclass 
directly from dict", but that isn't true if anyone is worried about 
future-proofing and where the subclass changes one of the standard methods.

--
nosy: +dalke

_
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1367711
_



[issue1367711] Remove usage of UserDict from os.py

2008-01-12 Thread Andrew Dalke

Andrew Dalke added the comment:

I should have added my preference.  I would like to see UserDict replaced with 
dict.  I didn't like seeing the extra import when I was doing my performance 
testing, though truthfully it's not a big overhead.

As for future-proofing, of course when there's a change in a base class then 
there can be problems with derived classes.  When that happens, change all of 
the affected classes in the code base, and make sure to publish the change so 
third parties know about it.

Yes, there's a subtlety here that most people don't know about.  But it's not 
going to go away.

As for the evil that is 'exec':

  exec "locals().data['MACHTYPE']=1; print MACHTYPE" in {}, os.environ

gives me another way to mess things up.

A point of unit tests is to allow changes like this without worry about code 
breakage.  And it's not like other non-buggy code wasn't updated over time to 
reflect changing style and best practices.

If it's not compatible with Jython or IronPython or PyPy then ignore what I 
said, but fix Cookie and update the docs to make that clear as people do think 
that it's better to derive from dict for things like this than to derive from 
UserDict or UserDictMixin.

I can give a lightning talk about this at PyCon.  :)

_
Tracker [EMAIL PROTECTED]
http://bugs.python.org/issue1367711
_



ANN: LOLPython 1.0

2007-06-04 Thread Andrew Dalke
Following along with the current lolcat fad, and taking inspiration
from lolcode,
I've implemented LOLPython.  For details and downloads see

  
http://www.dalkescientific.com/writings/diary/archive/2007/06/01/lolpython.html

Here's an example implementation of a Fibonacci number generator

SO IM LIKE FIBBING WIT N OK?
    LOL ITERATE FIBONACCI TERMS LESS THAN N /LOL
    SO GOOD N BIG LIKE EASTERBUNNY
    BTW, FIBONACCI LIKE BUNNIES! LOL
    U BORROW CHEEZBURGER
    U BORROW CHEEZBURGER
    I CAN HAZ CHEEZBURGER
    HE CAN HAZ CHEEZBURGER
    WHILE I CUTE?
        I AND HE CAN HAZ HE AND I ALONG WITH HE
        IZ HE BIG LIKE N?
            KTHXBYE
        U BORROW HE

The lolpython.py runtime converts LOLPython to Python.

def FIBBING ( N ) :
    'ITERATE FIBONACCI TERMS LESS THAN N'
    assert N >= 0
    # BTW, FIBONACCI LIKE BUNNIES! LOL
    yield 1
    yield 1
    I = 1
    HE = 1
    while 1:
        I , HE = HE , I + HE
        if HE >= N :
            break
        yield HE

and by default exec's the translated code.

You might also be interested in looking at the code because I
use PLY for tokenization and translate the token stream into
Python code which is then exec'ed.  The neatest part was
making the exec'ed code act like it was in __main__ using

  module_name = "__main__"
  python_s = to_python(lolpython_s)
  m = types.ModuleType(module_name)
  sys.modules[module_name] = m
  exec python_s in m.__dict__

which is a trick others might use when implementing
interesting import hooks.

LOLPython, at
  
http://www.dalkescientific.com/writings/diary/archive/2007/06/01/lolpython.html

Please note that LOLPython does not implement the lolcode standard
language.  While I was influenced by some of the language I wanted
something which was semantically equivalent to Python, including
support for classes, exceptions and the yield statement.

For an implementation of lolcode in Python (and also using PLY)
see sjlol at:
  http://lolcode.com/implementations/sjlol

and a full list of implementations at
  http://lolcode.com/implementations/implementations
including IDE support in Visual Studio.

Andrew Dalke
[EMAIL PROTECTED]



Update the sorting mini-howto

2005-11-29 Thread Andrew Dalke
Years ago I wrote the Sorting mini-howto, currently at

   http://www.amk.ca/python/howto/sorting/sorting.html

I've had various people thank me for that, in person and
through email.

It's rather out of date now given the decorate-sort-undecorate
option and 'sorted' functions in Python 2.4.  Hmmm, and perhaps
also some mention of rich comparisons.
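
For example, what took a decorate-sort-undecorate dance now fits on one 
line:

names = ["banana", "Apple", "cherry"]

# the old decorate-sort-undecorate idiom
decorated = [(name.lower(), name) for name in names]
decorated.sort()
result = [name for (key, name) in decorated]

# the Python 2.4 spelling of the same sort
assert sorted(names, key=str.lower) == result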

I don't particularly want to update it myself so I'm tossing it
to the winds.  Anyone here want to take care of it?  I'll
provide feedback if you want it.

Email me if you're interested.

Andrew
[EMAIL PROTECTED]



Re: Update the sorting mini-howto

2005-11-29 Thread Andrew Dalke
I wrote:
 Years ago I wrote the Sorting mini-howto, currently at

   http://www.amk.ca/python/howto/sorting/sorting.html

Thanks to amk it's now on the Wiki at
   http://wiki.python.org/moin/HowTo/Sorting

so feel free to update it directly.

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-31 Thread Andrew Dalke
Peter Hansen wrote:
 A scattered assortment of module-level global function names, and 
 builtins such as open(), make it extraordinarily difficult to do 
 effective and efficient automated testing with mock objects.

I have been able to do this by inserting my own module-scope function
that intercepts the lookup before it gets to builtins.  A problem
though is that a future (Python 3K?) Python may not allow that.

For example,

module.open = mock_open
try:
  ...
finally:
  module.open = open

By looking at the call stack it is possible to replace the built-in
open to have new behavior only when called from specific modules or
functions, but that gets to be rather hairy.
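
Fleshed out, the module-scope intercept looks something like this 
(self-contained sketch; the module and its count_lines function are made 
up for illustration):

import types
from StringIO import StringIO

# a stand-in for the module under test
mymodule = types.ModuleType("mymodule")
exec """
def count_lines(filename):
    return len(open(filename).readlines())
""" in mymodule.__dict__

def mock_open(filename, mode="r"):
    # ignore the filename and hand back an in-memory file
    return StringIO("line 1\nline 2\n")

mymodule.open = mock_open        # intercepts before builtins
try:
    assert mymodule.count_lines("anything.txt") == 2
finally:
    del mymodule.open            # lookups reach the builtin again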

 Object-oriented solutions like Path make it near trivial to substitute a 
 mock or other specialized object which (duck typing) acts like a Path 
 except perhaps for actually writing the file to disk, or whatever other 
 difference you like.

By analogy to the other builtins, another solution is to have a
protocol by which open() dispatches to an instance defined method.

 So, for the PEP, another justification for Path is that its use can 
 encourage better use of automated testing techniques and thereby improve 
 the quality of Python software, including in the standard library.

But then what does the constructor for the file object take?

I've also heard mention that a future (Py3K era) 'open' may allow
URLs and not just a path string.

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Christopher Subich wrote:
 My  naive solution:
  ...
for i in ilist:
   try:
      g = i.next()
      count += 1
   except StopIteration: # End of iter
      g = None
   ...

What I didn't like about this was the extra overhead of all
the StopIteration exceptions.  Eg, 

zipfill("a", range(1000))

will raise 1000 exceptions (999 for "a" and 1 for the end of the range).

But without doing timing tests I'm not sure which approach is
fastest, and it may depend on the data set.

Since this is code best not widely used, I don't think it's something
anyone should look into either.  :)

Andrew
[EMAIL PROTECTED]



Re: os._exit vs. sys.exit

2005-07-29 Thread Andrew Dalke
Bryan wrote:
 Why does os._exit called from a Python Timer kill the whole process while 
 sys.exit does not?  On Suse.

os._exit calls the C function _exit() which does an immediate program
termination.  See for example
  
http://developer.apple.com/documentation/Darwin/Reference/ManPages/man2/_exit.2.html
and note the statement "can never return".

sys.exit() is identical to raise SystemExit().  It raises a Python
exception which may be caught at a higher level in the program stack.
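
A quick way to see the difference (sketch):

import sys, os

try:
    sys.exit(17)
except SystemExit, err:
    print "caught SystemExit with code", err.code  # still running

os._exit(17)
print "never reached - _exit terminates without raising"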

Andrew
[EMAIL PROTECTED]




Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Peter Otten wrote:
 Combining your clever and your elegant approach to something fast
 (though I'm not entirely confident it's correct):
 
 def fillzip(*seqs):
 def done_iter(done=[len(seqs)]):
 done[0] -= 1
 if not done[0]:
 return
 while 1:
 yield None
 seqs = [chain(seq, done_iter()) for seq in seqs]
 return izip(*seqs)

Ohh, that's pretty neat passing in 'done' via a mutable default argument.

It took me a bit to even realize why it does work.  :)

Could make it one line shorter with

from itertools import chain, izip, repeat
def fillzip(*seqs):
def done_iter(done=[len(seqs)]):
done[0] -= 1
if not done[0]:
return []
return repeat(None)
seqs = [chain(seq, done_iter()) for seq in seqs]
return izip(*seqs)

Go too far on that path and the code starts looking like

from itertools import chain, izip, repeat
forever, table = repeat(None), {0: []}.get
def fillzip(*seqs):
def done_iter(done=[len(seqs)]):
done[0] -= 1
return table(done[0], forever)
return izip(*[chain(seq, done_iter()) for seq in seqs])

Now add the performance tweak

  def done_iter(done=[len(seqs)], forever=forever, table=table)

Okay, I'm over it.  :)

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Me:
 Could make it one line shorter with
  
 from itertools import chain, izip, repeat
 def fillzip(*seqs):
 def done_iter(done=[len(seqs)]):
 done[0] -= 1
 if not done[0]:
 return []
 return repeat(None)
 seqs = [chain(seq, done_iter()) for seq in seqs]
 return izip(*seqs)

Peter Otten:
 that won't work because done_iter() is now no longer a generator.
 In effect you just say
 
 seqs = [chain(seq, repeat(None)) for seq in seqs[:-1]] + [chain(seq[-1],
 [])]

It does work - I tested it.  The trick is that izip takes iter()
of the terms passed into it.  iter([]) -> an empty iterator and
iter(repeat(None)) -> the repeat(None) itself.

'Course then the name should be changed.

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Scott David Daniels wrote:
 Can I play too? How about:

Sweet!


Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-29 Thread Andrew Dalke
Peter Otten wrote:
 Seems my description didn't convince you. So here's an example:

Got it.  In my test case the longest element happened to be the last
one, which is why it didn't catch the problem.

Thanks.

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-28 Thread Andrew Dalke
Steven Bethard wrote:
 Here's one possible solution:
 
 py> import itertools as it
 py> def zipfill(*lists):
 ...   max_len = max(len(lst) for lst in lists)

A limitation to this is the need to iterate over the
lists twice, which might not be possible if one of them
is a file iterator.

Here's a clever, though not (in my opinion) elegant solution

import itertools

def zipfill(*seqs):
    count = [len(seqs)]
    def _forever(seq):
        for item in seq: yield item
        count[0] -= 1
        while 1: yield None
    seqs = [_forever(seq) for seq in seqs]
    while 1:
        x = [seq.next() for seq in seqs]
        if count == [0]:
            break
        yield x

for x in zipfill("This", "is", "only", "a", "test."):
    print x

This generates

['T', 'i', 'o', 'a', 't']
['h', 's', 'n', None, 'e']
['i', None, 'l', None, 's']
['s', None, 'y', None, 't']
[None, None, None, None, '.']

This seems a bit more elegant, though the replace dictionary is
still a bit of a hack

from itertools import repeat, chain, izip

sentinel = object()
end_of_stream = repeat(sentinel)

def zipfill(*seqs):
    replace = {sentinel: None}.get
    seqs = [chain(seq, end_of_stream) for seq in seqs]
    for term in izip(*seqs):
        for element in term:
            if element is not sentinel:
                break
        else:
            # All sentinels
            break

        yield [replace(element, element) for element in term]


(I originally had an element == tuple([sentinel]*len(seqs)) check
but didn't like all the == tests incurred.)
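
A quick sanity check against the output shown above:

expected = [['T', 'i', 'o', 'a', 't'],
            ['h', 's', 'n', None, 'e'],
            ['i', None, 'l', None, 's'],
            ['s', None, 'y', None, 't'],
            [None, None, None, None, '.']]
assert list(zipfill("This", "is", "only", "a", "test.")) == expected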

Andrew
[EMAIL PROTECTED]



Re: can list comprehensions replace map?

2005-07-27 Thread Andrew Dalke
David Isaac wrote:
 I have been generally open to the proposal that list comprehensions
 should replace 'map', but I ran into a need for something like
 map(None,x,y)
 when len(x) > len(y).  I cannot it seems use 'zip' because I'll lose
 info from x.  How do I do this as a list comprehension? (Or,
 more generally, what is the best way to do this without 'map'?)

If you know that len(x)=len(y) and you want the same behavior as
map() you can use itertools to synthesize a longer iterator


>>> x = [1,2,3,4,5,6]
>>> y = "Hi!"
>>> from itertools import repeat, chain
>>> zip(x, chain(y, repeat(None)))
[(1, 'H'), (2, 'i'), (3, '!'), (4, None), (5, None), (6, None)]
>>> 

This doesn't work if you want the result to be max(len(x), len(y))
in length - the result has length len(x).

As others suggested, if you want to use map, go ahead.  It won't
disappear for a long time and even if it does it's easy to
retrofit if needed.

Andrew
[EMAIL PROTECTED]



Re: how to write a line in a text file

2005-07-26 Thread Andrew Dalke
 [EMAIL PROTECTED] wrote:
 Well, it's what (R)DBMS are for, but plain files are not.

Steven D'Aprano wrote:
 This isn't 1970, users expect more from professional 
 programs than "keep your fingers crossed that nothing 
 bad will happen". That's why applications have multiple 
 levels of undo (and some of them even save the undo 
 history in the file) and change-tracking, and auto-save 
 and auto-backup. 

This isn't 1970.  Why does your app code work directly with
files?  Use an in-process database library (ZODB, SQLite,
BerkeleyDB, etc.) to maintain your system state and let the
library handle transactions for you.

Andrew
[EMAIL PROTECTED]



Re: [path-PEP] Path inherits from basestring again

2005-07-25 Thread Andrew Dalke
 Reinhold Birkenfeld wrote:
 Current change:
 
 * Add base() method for converting to str/unicode.

Now that [:] slicing works, and returns a string,
another way to convert from path.Path to str/unicode
is path[:]

Andrew
[EMAIL PROTECTED]



Re: [path-PEP] Path inherits from basestring again

2005-07-24 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
 Okay. While a path has its clear use cases and those don't need above methods,
 it may be that some brain-dead functions needs them.

brain-dead?

Consider this code, which I think is not atypical.

import sys

def _read_file(filename):
  if filename == "-":
    # Can use '-' to mean stdin
    return sys.stdin
  else:
    return open(filename, "rU")


def file_sum(filename):
  total = 0
  for line in _read_file(filename):
    total += int(line)
  return total

(Actually, I would probably write it

def _read_file(file):
  if isinstance(file, basestring):
    if file == "-":
      # Can use '-' to mean stdin
      return sys.stdin
    else:
      return open(file, "rU")
  return file

)

Because the current sandbox Path doesn't support
the is-equal test with strings, the above function
won't work with a filename = path.Path("-").  It
will instead raise an exception saying
  IOError: [Errno 2] No such file or directory: '-'

(Yes, the code as-is can't handle a file named '-'.
The usual workaround (and there are many programs
which support '-' as an alias for stdin) is to use ./-

% cat > './-'
This is a file
% cat ./-
This is a file
% cat -
I'm typing directly into stdin.
^D
I'm typing directly into stdin.
% 
)


If I start using the path.Path then in order to use
this function my upstream code must be careful on
input to distinguish between filenames which are
really filenames and which are special-cased pseudo
filenames.

Often the code using the API doesn't even know which
names are special.  Even if it is documented,
the library developer may decide in the future to
extend the list of pseudo filenames to include, say,
environment variable style expansion, as
  $HOME/.config

Perhaps the library developer should have come up
with a new naming system to include both types of
file naming schemes, but that's rather overkill.

As a programmer calling the API should I convert
all my path.Path objects to strings before using it?
Or to Unicode?  How do I know which filenames will
be treated specially through time?

Is there a method to turn a path.Path into the actual
string?  str() and unicode() don't work because I
want the result to be unicode if the OS/Python build
supports it, otherwise string.

Is that library example I mentioned brain-dead?
I don't think so.  Instead I think you are pushing
too much for purity and making changes that will
cause problems - and hard to fix problems - with
existing libraries.



Here's an example of code from an existing library
which will break in several ways if it's passed a
path object instead of a string.  It comes from
spambayes/mboxutils.py

#

"""This is mostly a wrapper around the various useful classes in the
standard mailbox module, to do some intelligent guessing of the
mailbox type given a mailbox argument.

+foo  -- MH mailbox +foo
+foo,bar  -- MH mailboxes +foo and +bar concatenated
+ALL  -- a shortcut for *all* MH mailboxes
/foo/bar  -- (existing file) a Unix-style mailbox
/foo/bar/ -- (existing directory) a directory full of .txt and .lorien
             files
/foo/bar/ -- (existing directory with a cur/ subdirectory)
             Maildir mailbox
/foo/Mail/bar/ -- (existing directory with /Mail/ in its path)
             alternative way of spelling an MH mailbox

"""

def getmbox(name):
    """Return an mbox iterator given a file/directory/folder name."""

    if name == "-":
        return [get_message(sys.stdin)]

    if name.startswith("+"):
        # MH folder name: +folder, +f1,f2,f2, or +ALL
        name = name[1:]
        import mhlib
        mh = mhlib.MH()
        if name == "ALL":
            names = mh.listfolders()
        elif ',' in name:
            names = name.split(',')
        else:
            names = [name]
        mboxes = []
        mhpath = mh.getpath()
        for name in names:
            filename = os.path.join(mhpath, name)
            mbox = mailbox.MHMailbox(filename, get_message)
            mboxes.append(mbox)
        if len(mboxes) == 1:
            return iter(mboxes[0])
        else:
            return _cat(mboxes)

    if os.path.isdir(name):
        # XXX Bogus: use a Maildir if /cur is a subdirectory, else a MHMailbox
        # if the pathname contains /Mail/, else a DirOfTxtFileMailbox.
        if os.path.exists(os.path.join(name, 'cur')):
            mbox = mailbox.Maildir(name, get_message)
        elif name.find("/Mail/") >= 0:
            mbox = mailbox.MHMailbox(name, get_message)
        else:
            mbox = DirOfTxtFileMailbox(name, get_message)
    else:
        fp = open(name, "rb")
        mbox = mailbox.PortableUnixMailbox(fp, get_message)
    return iter(mbox)



It breaks with the current sandbox path because:
  - a path can't be compared to "-"
  - range isn't supported, as name = name[1:]

note that this example uses __contains__ (',' in name)


Is this function brain-dead?  Is it reasonable that people might
want to pass a path.Path() directly to it?  If not, what's
the way 

Re: PEP on path module for standard library

2005-07-23 Thread Andrew Dalke
George Sakkis wrote:
 That's why phone numbers would be a subset of integers, i.e. not every
 integer would correspond to a valid number, but with the exception of
 numbers starting with zeros, all valid numbers would be an integers.

But it's that exception which violates the LSP.

With numbers, if x==y then (x,y) = (y,x) makes no difference.
If phone numbers are integers then 001... == 01... but swapping
those two numbers makes a difference.  Hence they cannot be modeled
as integers.

 Regardless, this was not my point; the point was that adding
 two phone numbers or subtracting them never makes sense semantically.

I agree. But modeling them as integers doesn't make sense either.
Your example of adding phone numbers depends on them being represented
as integers.  Since that representation doesn't work, it makes sense
that addition of phone number is suspect.

 There are (at least) two frequently used path string representations,
 the absolute and the relative to the working directory. Which one *is*
 the path ? Depending on the application, one of them woud be more
 natural choice than the other.

Both.  I don't know why one is more natural than the other.

 I trust my intuition on this, I just don't know how to justify it, or
 correct it if I'm wrong.
 
 My intuition also happens to support subclassing string, but for
 practical reasons rather than conceptual.

As you may have read elsewhere in this thread, I give some examples
of why subclassing from string fits best with existing code.

Even if there was no code base, I think deriving from string is the
right approach.  I have a hard time figuring out why though.  I think
if the lowest level Python/C interface used a get the filename
interface then perhaps it wouldn't make a difference.  Which means
I'm also more guided by practical reasons than conceptual.

Andrew
[EMAIL PROTECTED]



Re: unit test nested functions

2005-07-23 Thread Andrew Dalke
Andy wrote:
 How can you unit test nested functions?

I can't think of a good way.  When I write a nested function it's because
the function uses variables from the scope of the function in which it's
embedded, which means it makes little sense to test it independent of the
larger function.

My tests in that case are only of the enclosing function.
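
A toy illustration (mine):

def running_totals(items, start=0):
    total = [start]
    def add(x):               # depends on 'total' from the enclosing scope
        total[0] += x
        return total[0]
    return [add(x) for x in items]

# Testing the enclosing function exercises 'add' as well.
assert running_totals([1, 2, 3]) == [1, 3, 6]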

 Or do you have to pull them out to
 unit test them, which basically means I will never use nested functions.

You don't test every line in a function by itself, right?  Nor
every loop in a function.  It should be possible to test the outer
function enough that the implementation detail - of using an inner
function - doesn't make much difference.

 Also, same thing with private member functions protected by __.  Seems
 like there is a conflict there between using these features and unit
 testing.

In that case the spec defined that the real function name is of
the form _CLASSNAME__METHODNAME.  For example

>>> class Spam:
...   def __sing(self):
...     print "I don't see any Vikings."
... 
>>> spam = Spam()
>>> spam._Spam__sing()
I don't see any Vikings.
>>> 

I've found though that the double-leading-underscore is overkill.
Using a single underscore is enough of a hint that the given
method shouldn't be called directly.

Then again, I don't write enough deep hierarchies where I need
to worry about a subclass implementation using the same private
name as a superclass.
Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
Michael Hoffman wrote:
 Having path descend from str/unicode is extremely useful since I can 
 then pass a path object to any function someone else wrote without 
 having to worry about whether they were checking for basestring. I think 
 there is a widely used pattern of accepting either a basestring[1] or a 
 file-like object as a function argument, and using isinstance() to 
 figure out which it is.

Reinhold Birkenfeld wrote:
 Where do you see that pattern? IIRC it's not in the stdlib.

Here's the first place that comes to mind for me

xml.sax.saxutils

def prepare_input_source(source, base = ""):
    """This function takes an InputSource and an optional base URL and
    returns a fully resolved InputSource object ready for reading."""

    if type(source) in _StringTypes:
        source = xmlreader.InputSource(source)
    elif hasattr(source, "read"):
        f = source
        source = xmlreader.InputSource()
        source.setByteStream(f)
        if hasattr(f, "name"):
            source.setSystemId(f.name)


and xml.dom.pulldom

def parse(stream_or_string, parser=None, bufsize=None):
    if bufsize is None:
        bufsize = default_bufsize
    if type(stream_or_string) in _StringTypes:
        stream = open(stream_or_string)
    else:
        stream = stream_or_string
    if not parser:
        parser = xml.sax.make_parser()
    return DOMEventStream(stream, parser, bufsize)

Using the power of grep

aifc.py
    def __init__(self, f):
        if type(f) == type(''):
            f = __builtin__.open(f, 'rb')
        # else, assume it is an open file object already
        self.initfp(f)

binhex.py
class HexBin:
    def __init__(self, ifp):
        if type(ifp) == type(''):
            ifp = open(ifp)

imghdr.py
    if type(file) == type(''):
        f = open(file, 'rb')
        h = f.read(32)
    else:
        location = file.tell()
        h = file.read(32)
        file.seek(location)
        f = None

mimify.py
    if type(infile) == type(''):
        ifile = open(infile)
        if type(outfile) == type('') and infile == outfile:
            import os
            d, f = os.path.split(infile)
            os.rename(infile, os.path.join(d, ',' + f))
    else:
        ifile = infile

wave.py
    def __init__(self, f):
        self._i_opened_the_file = None
        if type(f) == type(''):
            f = __builtin__.open(f, 'rb')
            self._i_opened_the_file = f
        # else, assume it is an open file object already
        self.initfp(f)


compiler/transformer.py:

        if type(file) == type(''):
            file = open(file)
        return self.parsesuite(file.read())

plat-mac/applesingle.py
        if type(input) == type(''):
            input = open(input, 'rb')
            # Should we also test for FSSpecs or FSRefs?
        header = input.read(AS_HEADER_LENGTH)

site-packages/ZODB/ExportImport.py
        if file is None: file=TemporaryFile()
        elif type(file) is StringType: file=open(file,'w+b')


site-packages/numarray/ndarray.py
        if type(file) == type(""):
            name = 1
            file = open(file, 'wb')


site-packages/kiva/imaging/GdImageFile.py
        if type(fp) == type(""):
            import __builtin__
            filename = fp
            fp = __builtin__.open(fp, "rb")
        else:
            filename = ""

site-packages/reportlab/graphics/renderPM.py
        if type(image.path) is type(''):
            im = _getImage().open(image.path).convert('RGB')
        else:
            im = image.path.convert('RGB')


site-packages/twisted/protocols/irc.py
    def __init__(self, file):
        if type(file) is types.StringType:
            self.file = open(file, 'r')

(hmm, that last one looks buggy.  It should
have an else: self.file = file afterwards.)


Used in the std. lib and used by many different
people.  (I excluded the Biopython libraries
in this list, btw, because I may have influenced
the use of this sort of type check.)

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
Duncan Booth wrote:
 Personally I think the concept of a specific path type is a good one, but 
 subclassing string just cries out to me as the wrong thing to do.

I disagree.  I've tried using a class which wasn't derived from
a basestring and kept running into places where it didn't work well.
For example, open and mkdir take strings as input.  There is no
automatic coercion.

>>> class Spam:
...   def __getattr__(self, name):
...     print "Want", repr(name)
...     raise AttributeError, name
... 
>>> open(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> import os
>>> os.mkdir(Spam())
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: coercing to Unicode: need string or buffer, instance found
>>> 

The solutions to this are:
  1) make the path object be derived from str or unicode.  Doing
this does not conflict with any OO design practice (eg, Liskov
substitution).

  2) develop a new "I represent a filename" protocol, probably done
via adapt().

I've considered the second of these but I think it's a more
complicated solution and it won't fit well with existing APIs
which do things like


  if isinstance(input, basestring):
    input = open(input, "rU")
  for line in input:
    print line

I showed several places in the stdlib and in 3rd party packages
where this is used.


 In other words, to me a path represents something in a filesystem,

Being picky - or something that could be in a filesystem.

 the fact that it 
 has one, or indeed several string representations does not mean that the 
 path itself is simply a more specific type of string.

I didn't follow this.

 You should need an explicit call to convert a path to a string and that 
 forces you when passing the path to something that requires a string to 
 think whether you wanted the string relative, absolute, UNC, uri etc.

You are broadening the definition of a file path to include URIs?
That's making life more complicated.  Eg, the rules for joining
file paths may be different than the rules for joining URIs.
Consider if I have a file named mail:[EMAIL PROTECTED] and I
join that with file://home/dalke/badfiles/.

Additionally, the actions done on URIs are different than on file
paths.  What should os.listdir("http://www.python.org/") do?

As I mentioned, I tried some classes which emulated file
paths.  One was something like

class TempDir:
  """removes the directory when the refcount goes to 0"""
  def __init__(self):
    self.filename = ...  # use a function from the tempfile module
  def __del__(self):
    if os.path.exists(self.filename):
      shutil.rmtree(self.filename)
  def __str__(self):
    return self.filename

I could do

  dirname = TempDir()

but then instead of

  os.mkdir(dirname)
  tmpfile = os.path.join(dirname, "blah.txt")

I needed to write it as

  os.mkdir(str(dirname))
  tmpfile = os.path.join(str(dirname), "blah.txt")

or have two variables, one which could delete the
directory and the other for the name.  I didn't think
that was good design.


If I had derived from str/unicode then things would
have been cleaner.

Please note, btw, that some filesystems are unicode
based and others are not.  As I recall, one nice thing
about the path module is that it chooses the appropriate
base class at import time.  My str() example above
does not and would fail on a Unicode filesystem aware
Python build.
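
For example, a str-derived sketch of that same class (a real version 
would pick str vs. unicode at import time, the way the path module does):

import os, shutil, tempfile

class TempDir(str):
    """removes the directory when the refcount goes to 0"""
    def __new__(cls):
        return str.__new__(cls, tempfile.mkdtemp())
    def __del__(self):
        if os.path.exists(self):
            shutil.rmtree(self)

dirname = TempDir()
tmpfile = os.path.join(dirname, "blah.txt")   # no str() wrapper needed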

 It may even be that we need a hierarchy of path
 classes: URLs need similar but not identical manipulations
 to file paths, so if we want to address the failings
 of os.path perhaps we should also look at the failings 
 of urlparse at the same time.

I've found that hierarchies are rarely useful compared
to the number of times they are proposed and used.  One
of the joys to me of Python is its deemphasis of class
hierarchies.

I think the same is true here.  File paths and URIs are
sufficiently different that there are only a few bits
of commonality between them.  Consider 'split' which
for files creates (dirname, filename) while for urls
it creates (scheme, netloc, path, query, fragment)

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
George Sakkis wrote:
 You're right, conceptually a path
 HAS_A string description, not IS_A string, so from a pure OO point of
 view, it should not inherit string.

How did you decide it's has-a vs. is-a?

All C calls use a char * for filenames and paths,
meaning the C model of the filesystem says
paths are strings.

Paths as strings fit the Liskov substitution principle
in that any path object can be used any time a
string is used (eg, "loading from " + filename)

Good information hiding suggests that a better API
is one that requires less knowledge.  I haven't
seen an example of how deriving from (unicode)
string makes things more complicated than not doing so.
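
The substitution is easy to see with a trivial subclass (sketch):

class Path(str):
    pass

p = Path("/etc/hostname")
print "loading from " + p          # string concatenation: unchanged
print p.startswith("/etc")         # string methods: inherited
print isinstance(p, basestring)    # passes existing type checks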

Andrew
[EMAIL PROTECTED]



Re: Difference between " and '

2005-07-22 Thread Andrew Dalke
François Pinard wrote:
 There is no strong reason to use one and avoid the other.  Yet, while
 representing strings, Python itself has a _preference_ for single
 quotes. 

I use double quoted strings in almost all cases because I
think it's easier to see than 'single quoted quotes'.

Andrew
[EMAIL PROTECTED]



Re: Iterators from urllib2

2005-07-22 Thread Andrew Dalke
Joshua Ginsberg wrote:

  >>> dir(ifs)
 ['__doc__', '__init__', '__iter__', '__module__', '__repr__', 'close',  
 'fileno', 'fp', 'geturl', 'headers', 'info', 'next', 'read',  
 'readline', 'readlines', 'url']
 
 Yep. But what about in my code? I modify my code to print dir(ifs)  
 before creating the DictReader...
 
 ['__doc__', '__init__', '__module__', '__repr__', 'close', 'fp',  
 'geturl', 'headers', 'info', 'read', 'readline', 'url']
 ...
 Whoa! Where did the __iter__, readlines, and next attributes
 go? Ideas?

That difference comes from this code in urllib.py:addbase

class addbase:
    """Base class for addinfo and addclosehook."""

    def __init__(self, fp):
        self.fp = fp
        self.read = self.fp.read
        self.readline = self.fp.readline
        if hasattr(self.fp, "readlines"): self.readlines = self.fp.readlines
        if hasattr(self.fp, "fileno"): self.fileno = self.fp.fileno
        if hasattr(self.fp, "__iter__"):
            self.__iter__ = self.fp.__iter__
        if hasattr(self.fp, "next"):
            self.next = self.fp.next

It looks like the fp for your latter code
doesn't have the additional properties.  Try
adding the following debug code to figure out
what's up

print dir(ifs)
print "fp=", ifs.fp
print dir(fp), dir(ifs.fp)

Odds are you'll get different results.

Andrew
[EMAIL PROTECTED]



Re: Something that Perl can do that Python can't?

2005-07-22 Thread Andrew Dalke
Dr. Who wrote:
 Well, I finally managed to solve it myself by looking at some code.
 The solution in Python is a little non-intuitive but this is how to get
 it:
 
 while 1:
     line = stdout.readline()
     if not line:
         break
     print 'LINE:', line,
 
 If anyone can do it the more Pythonic way with some sort of iteration
 over stdout, please let me know.

Python supports two different but related iterators over
lines of a file.  What you show here is the oldest way.
It reads up to the newline (or eof) and returns the line.

The newer way is

  for line in stdout:
...

which is equivalent to

  _iter = iter(stdout)
  while 1:
    try:
      line = _iter.next()
    except StopIteration:
      break

    ...

The file.__iter__() is implemented by doing
a block read and internally breaking the block
into lines.  This make the read a lot faster
because it does a single system call for the
block instead of a system call for every
character read.  The downside is that the read
can block (err, a different use of block)
waiting for enough data.

If you want to use the for idiom and have
the guaranteed no more than a line at a time
semantics, try this 

  for line in iter(stdout.readline, ):
print LINE:, line
sys.stdout.flush()

Andrew
[EMAIL PROTECTED]



Re: PEP on path module for standard library

2005-07-22 Thread Andrew Dalke
George Sakkis wrote:
 Bringing up how C models files (or anything else other than primitive types
 for that matter) is not a particularly strong argument in a discussion on
 OO design ;-)

While I have worked with C libraries which had a well-developed
OO-like interface, I take your point.

Still, I think that the C model of a file system should be a
good fit since after all C and Unix were developed hand-in-hand.  If
there wasn't a good match then some of the C path APIs should be
confusing or complicated.  Since I don't see that it suggests that
the path is-a string is at least reasonable.

 Liskov substitution principle imposes a rather weak constraint

Agreed.  I used that as an example of the direction I wanted to
go.  What principles guide your intuition of what is a is-a
vs a has-a?

 Take for example the case where a PhoneNumber class is subclass
 of int. According to LSP, it is perfectly ok to add phone numbers
 together, subtract them, etc, but the result, even if it's a valid
 phone number, just doesn't make sense.

Mmm, I don't think an integer is a good model of a phone number.
For example, in the US
  00148762040828
will ring a mobile number in Sweden while
  148762040828
will give a "this isn't a valid phone number" message.

Yet both have the same base-10 representation.  (I'm not using
a syntax where leading '0' indicates an octal number. :)

 I wouldn't say more complicated, but perhaps less intuitive in a few cases, 
 e.g.:
 
 path(r'C:\Documents and Settings\Guest\Local Settings').split()
 ['C:\\Documents', 'and', 'Settings\\Guest\\Local', 'Settings']
 instead of
 ['C:', 'Documents and Settings', 'Guest', 'Local Settings']

That is why the path module uses a different method to split
on pathsep vs. whitespace.  I get what you are saying, I just think
it's roughly equivalent to appealing to LSP in terms of weight.

Mmm, then there's a question of the usefulness of .lower() and
.expandtabs() and similar methods.  Hmmm

 I just noted that conceptually a path is a composite object consisting of
 many properties (dirname, extension, etc.) and its string representation
 is just one of them. Still, I'm not suggesting that a 'pure' solution is
 better that a more practical that covers most usual cases.

For some reason I think that

  path.dirname()

is better than

  path.dirname

Python has properties now so the implementation of the latter is
trivial - put a @property on the line before the def dirname(self):.
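
That is (sketch):

import os.path

class path(str):
    @property
    def dirname(self):
        return path(os.path.dirname(self))

print path("/usr/local/bin").dirname   # no call parentheses needed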

I think that the string representation of a path is so important that
it *is* the path.  The other things you call properties aren't quite
properties in my model of a path and are more like computable values.

I trust my intuition on this, I just don't know how to justify it, or
correct it if I'm wrong.

Andrew
[EMAIL PROTECTED]



Re: What is different with Python ?

2005-06-15 Thread Andrew Dalke
Terry Hancock wrote:
 Of course, since children are vastly better at learning
 than adults, perhaps adults are stupid to do this. ;-)

Take learning a language.  I'm learning Swedish.  I'll
never have a native accent and 6 year olds know more
of the language than I do.  But I make much more
complicated sentences than 6 year olds.  (Doesn't mean
they are grammatically correct, but I can get my point
across given a lot of time.)

 Quantum mechanics notwithstanding, I'm not sure there
 is a bottom most-reduced level of understanding. It's
 certainly not clear that it is relevant to programming.

I agree.  That's why I make this thread branch.  I think
learning is often best taught from extending what you know
and not from some sort of top/bottom approach. I'm also
one who bristles at hierarchies.  Maybe that's why I like
Python and duck typing. :)

Some learning works by throwing yourself in the deep end.
Languages are sometimes learned that way.  The Suzuki method
extends that to music, though that's meant for kids.

 Python is actually remarkably good at solving things in a
 nearly optimal way.

Have you read Richard Gabriel's "Worse is Better" essay?
 http://www.dreamsongs.com/WIB.html
Section 2.2.4 "Totally Inappropriate Data Structures"
relates how knowing the data structure for Lisp affects
the performance and seems relevant to your point.

Andrew
[EMAIL PROTECTED]



Re: also to balance else ?

2005-06-14 Thread Andrew Dalke
Ron Adam wrote:
 True, but I think this is considerably less clear.  The current for-else 
 is IMHO is reversed to how the else is used in an if statement.

As someone else pointed out, that problem could be resolved in
some Python variant by using a different name, like "at end".
Too late for anything before P3K.

 I'm asking if changing the current 'else' in a for statement to 'also'
 would make it's current behavior clearer.  It's been stated before here
 that current behavior is confusing.

"It's been stated" is the passive voice.  You are one, and I
saw a couple of others.  But it isn't the same as "many people say
that the current behavior is confusing".  If memory serves, I
don't even recall an FAQ on this, while there is a FAQ regarding
the case statement.

 You are correct that the 'else' behavior could be nested in the if:break
 statement.  I think the logical non-nested grouping of code in the
 for-also-else form is easier to read.  The block in the if statement
 before the break isn't part of the loop, IMO,  being able to move it to
 after the loop makes it clear it evaluates after the loop is done.

There is a tension with code coherency.  In my version the code
that occurs a result of the condition is only in one place while
in yours its in two spots.

If all (1) break statements in the loop have the same post-branch
code then it might make some sense.  But as I said, I don't think
it occurs all that often.

Given the Python maxim of
  There should be one-- and preferably only one --obvious way to do it.

which of these is the preferred and obvious way?

while f():
  print "Hello!"
  if g():
    break
else:
  print "this is a test"
also:
  print "this is not a pipe"

 -or-

while f():
  print "Hello!"
  if g():
    print "this is a test"
    break
else:
  print "this is not a pipe"


I prefer the second over the first.

Which of these is preferred?

while f():
  print "Hello"
  if g():
    a = 10
    print "world", a
    break
  if h():
    a = 12
    print "world", a
    break

  -or-

while f():
  print "Hello"
  if g():
    a = 10
    break
  if h():
    a = 12
    break
else:  # your else, not std. python's
  print "world", a

The latter is fragile, in some sense.  Suppose I added

  if hg():
    a = 14
    print "there"
    break

Then I have to change all of the existing code to put the
else: block back into the loop.

That for me makes it a big no.

 That is ... funky.  When is it useful?
 
 Any time you've writen code that repeats a section of code at the end of
 all the if/elif statements or sets a variable to check so you can
 conditionally execute a block of code after the if for the same purpose.

Let me clarify.  When is it useful in real code?  Most cases
I can think of have corner cases which treat some paths different
than others.


 My thinking is that this would be the type of thing that would be used
 to argue against more specialized suggestions.  ie...   No a fill in
 new suggested keyword here isn't needed because the also-else form
 already does that.  ;-)

An argument for 'X' because it prevents people from asking for
some theoretical 'Y' isn't that strong.  Otherwise Python would
have had a goto years ago.

 An example of this might be the case statement suggestions which have
 some support and even a PEP.  The if-alif-also-else works near enough to
 a case statement to fulfill that need.  'alif' (also-if) could  be
 spelled 'case' and maybe that would be clearer as many people are
 already familiar with case statements from other languages.

Assuming you are talking about PEP 275 (Switching on Multiple
Values), how does this fulfill that need any better than the
existing if/elif/else chain?

 Vetoing a suggestion on grounds of it can be done in another way, is
 also not sufficient either as by that reasoning we would still be using
 assembly language.  So the question I'm asking here is can an inverse to
   the 'else' be useful enough to be considered?

I disagree.  Given the "one -- and preferably only one -- obvious
way to do it" maxim, there is already a strong bias against language
features which exist only to do something another way but not
a notably better way.

 I'll try to find some use case examples tomorrow, it shouldn't be too
 hard.  It probably isn't the type of thing that going to make huge
 differences.  But I think it's a fairly common code pattern so shouldn't
 be too difficult to find example uses from pythons library.

My guess is that it will be hard.  There's no easy pattern
to grep for and I don't think the use case you mention comes up
often, much less often enough to need another control mechanism.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: also to balance else ?

2005-06-14 Thread Andrew Dalke
Terry Hancock wrote:
 No, I know what it should be.  It should be finally.   It's already
 a keyword, and it has a similar meaning w.r.t. try.

Except that a finally block is executed with normal and exceptional
exit, while in this case you would have 'finally' only called
when the loop exited without a break.
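
The difference is easy to see with a small self-contained
sketch (nested try blocks, since try/except/finally combined
in one statement isn't available in today's Pythons):

# the loop's else runs only when the loop ends without a break
for x in (1, 2, 3):
    if x == 99:
        break
else:
    print "no break, so the else block runs"

# finally runs on normal exit *and* when an exception was raised
try:
    try:
        1 / 0
    except ZeroDivisionError:
        print "caught the exception"
finally:
    print "the finally block runs either way"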

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Andreas Kostyrka wrote:
 On Tue, Jun 14, 2005 at 12:02:29AM +, Andrea Griffini wrote:
 Caching is indeed very important, and sometimes the difference
 is huge.
 ...
 Easy Question:
 You've got 2 programs that are running in parallel.
 Without basic knowledge about caches, the naive answer would be that
 the programs will probably run double time. The reality is different.

Okay, I admit I'm making a comment almost solely to have
Andrea, Andreas and Andrew in the same thread.

I've seen superlinear and sublinear performance for this.
Superlinear when the problem fits into 2x cache size but not
1x cache size and is nicely decomposable, and sublinear when
the data doesn't have good processor affinity.

Do I get an A for Andre.*?  :)
 
Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Peter Maas wrote:
 Yes, but what did you notice first when you were a child - plants
 or molecules? I imagine little Andrew in the kindergarten fascinated
 by molecules and suddenly shouting "Hey, we can make plants out of
 these little thingies!" ;)

One of the first science books that really intrigued me
was a book on stars I read in 1st or 2nd grade.

As I mentioned, I didn't understand the science of biology
until I was in college.

Teaching kids is different than teaching adults.  The
latter can often take bigger steps and start from a
sound understanding of logical and intuitive thought.
Simple for an adult is different than for a child.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is different with Python ?

2005-06-14 Thread Andrew Dalke
Andrea Griffini wrote:
 Wow... I always get surprises from physics. For example I
 thought that no one could drop confutability requirement
 for a theory in an experimental science...

Some physicists (often mathematical physicists) propose
alternate worlds because the math is interesting.

There is a problem in physics in that we know (I was
trained as a physicist hence the we :) quantum mechanics
and gravity don't agree with each other.  String theory
is one attempt to reconcile the two.  One problem is
the math of string theory is hard enough that it's hard
to make a good prediction.  Another problem is the
realm where QM and GR disagree requires such high energies
that it's hard to test directly.

 I was told that
 in physics there are current theories for which there
 is no hypotetical experiment that could prove them wrong...
 (superstrings may be ? it was a name like that but I
 don't really remember).

If we had a machine that could reach Planck scale energies
then I'm pretty sure there are tests.  But we don't, by
a long shot.

Andrew Dalke

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: new string function suggestion

2005-06-13 Thread Andrew Dalke
Andy wrote:
 What do people think of this?
 
 'prefixed string'.lchop('prefix') == 'ed string'
 'string with suffix'.rchop('suffix') == 'string with '
 'prefix and suffix'.chop('prefix', 'suffix') == ' and '

Your use case is

 I get tired of writing stuff like:
 
 if path.startswith('html/'):
   path = path[len('html/'):]
 elif s.startswith('text/'):
   path = path[len('text/'):]
 
 It just gets tedious, and there is duplication.  Instead I could just write:
 
 try:
   path = path.lchop('html/')
   path = path.lchop('text/')
 except SomeException:
   pass

But your posted code doesn't implement your use case.  Consider
if path == "html/text/something".  Then the if/elif code sets
path to "text/something" while the lchop code sets it to "something".

One thing to consider is a function (or string method) which
is designed around the 'or' function, like this.  (Named 'lchop2'
but it doesn't give the same interface as your code.)

def lchop2(s, prefix):
  if s.startswith(prefix):
    return s[len(prefix):]
  return None

path = lchop2(path, "html/") or lchop2(path, "text/") or path


If I saw a function named lchop (or perhaps named lchomp) I
would expect it to be (named 'lchop3' so I can distinguish
between it and the other two)

def lchop3(s, prefix):
  if s.startswith(prefix):
    return s[len(prefix):]
  return s

and not raise an exception if the prefix/suffix doesn't match.
Though in this case your use case is not made any simpler.
Indeed it's uglier with either

newpath = path.lchop3("html/")
if newpath == path:
  newpath = path.lchop3("text/")
  if newpath == path:
    ...

or

if path.startswith("html/"):
  path = path.lchop3("html/")
elif path.startswith("text/"):
  path = path.lchop3("text/")
   ...



I tried finding an example in the stdlib of code that would be
improved with your proposal.  Here's something that would not
be improved, from mimify.py (it was the first grep hit I
looked at)

if prefix and line[:len(prefix)] == prefix:
    line = line[len(prefix):]
    pref = prefix
else:
    pref = ''

In your version it would be:

if prefix:
    try:
        line = line.lchop(prefix)
    except TheException:
        pref = ''
    else:
        pref = prefix
else:
    pref = ''

which is longer than the original.

 From pickle.py (grepping for 'endswith(' and a context of 2)

pickle.py-if ashex.endswith('L'):
pickle.py:ashex = ashex[2:-1]
pickle.py-else:
pickle.py:ashex = ashex[2:]

this would be better with my '3' variant, as

  ashex = ashex.rchop3('L')[2:]

while your version would have to be

  try:
    ashex = ashex.rchop('L')[2:]
  except SomeException:
ashex = ashex[2:]


Even with my '2' version it's the simpler

  ashex = (ashex.rchop2('L') or ashex)[2:]

The most common case will be for something like this

tarfile.py-if self.name.endswith(".gz"):
tarfile.py-self.name = self.name[:-3]

My '3' code handles it best

  self.name = self.name.rchop3(".gz")

Because your code throws an exception for what isn't
really an exceptional case it in essence needlessly
requires try/except/else logic instead of the simpler
if/elif logic.

 Does anyone else find this to be a common need?  Has this been suggested 
 before?

To summarize:
  - I don't think it's needed that often
  - I don't think your implementation's behavior (using an
   exception) is what people would expect
  - I don't think it does what you expect

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is different with Python ?

2005-06-13 Thread Andrew Dalke
Peter Maas wrote:
 I think Peter is right. Proceeding top-down is the natural way of
 learning (first learn about plants, then proceed to cells, molecules,
 atoms and elementary particles).

Why in the world is that way natural?  I could see how biology
could start from molecular biology - how hereditary and self-regulating
systems work at the simplest level - and using that as the scaffolding
to describe how cells and multi-cellular systems work.

Plant biology was my least favorite part of my biology classes.  In
general I didn't like the "learn the names of all these parts" approach
of biology.  Physics, with its more directly predictive view of the world,
was much more interesting.  It wasn't until college when I read some
Stephen J. Gould books that I began to understand that biology was
different than 'the mitochondria is the powerhouse of the cell', here's
the gall bladder, that plant's a dicot, this is a fossilized trilobite.

Similarly, programming is about developing algorithmic thought.
A beginner oriented programming language should focus on that, and
minimize the other details.

Restating my belief in a homologous line: proceeding from simple to
detailed is the most appropriate way of learning.  Of course in some
fields even the simplest form takes a long time to understand, but
programming isn't string theory.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: also to balance else ?

2005-06-13 Thread Andrew Dalke
Ron Adam wrote:
 It occurred to me (a few weeks ago while trying to find the best way to 
 form a if-elif-else block, that on a very general level, an 'also' 
 statement might be useful.  So I was wondering what others would think 
 of it.

 for x in iteriable:
 BLOCK1
 if condition: break   # do else block
 also:
 BLOCK2
 else:
 BLOCK3


For this specific case you could rewrite the code in
current Python as

for x in iterable:
  BLOCK1
  if condition:
    BLOCK3
    break
else:
  BLOCK2

In order for your proposal to be useful you would need an
example more like the following in current Python

for x in iterable:
  ...
  if condition:
    BLOCK3
    break
  ...
  if condition:
    BLOCK3
    break
else:
  BLOCK2

That is, where BLOCK3;break occurs multiple times in
the loop.  My intuition is that that doesn't occur often
enough to need a new syntax to simplify it.

Can you point to some existing code that would be improved
with your also/else?

 while condition1:
  BLOCK1
  if condition2: break  # jump to else
 also:
  BLOCK2
 else:
  BLOCK3
 
 Here if the while loop ends at the while condition1, the BLOCK2 
 executes,  or if the break is executed, BLOCK3 executes.

which is the same (in current Python) as


while condition1:
  BLOCK1
  if condition2:
    BLOCK3
    break
else:
  BLOCK2

 In and if statement...
 
 if condition1:
  BLOCK1
 elif condition2:
  BLOCK2
 elif condition3:
  BLOCK3
 also:
  BLOCK4
 else:
  BLOCK5
 
 Here, the also block would execute if any previous condition is true, 
 else the else block would execute.

That is ... funky.  When is it useful?

One perhaps hackish solution I've done for the rare cases when
I think your proposal is useful is

while 1:
  if condition1:
    BLOCK1
  elif condition2:
    BLOCK2
  elif condition3:
    BLOCK3
  else:
    # couldn't do anything
    break
  BLOCK4
  break

 I think this gives Pythons general flow control some nice symmetrical 
 and consistent characteristics that seem very appealing to me.  Anyone 
 agree?

No.  Having more ways to do control flow doesn't make for code that's
easy to read.

My usual next step after thinking (or hearing) about a new Python
language change is to look at existing code and see if there's
existing code which would be easier to read/understand and get an
idea if it's a common or rare problem.  Perhaps you could point
out a few examples along those lines?

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is different with Python ?

2005-06-13 Thread Andrew Dalke
Andrea Griffini wrote:
 This is investigating. Programming is more similar to building
 instead (with a very few exceptions). CS is not like physics or
 chemistry or biology where you're given a result (the world)
 and you're looking for the unknown laws. In programming *we*
 are building the world. This is a huge fundamental difference!

Philosophically I disagree.  Biology and physics depend on
models of how the world works.  The success of a model depends
on how well it describes and predicts what's observed.

Programming too has its model of how things work; you've mentioned
algorithmic complexity and there are models of how humans
interact with computers.  The success depends in part on how
well it fits with those models.

In biology there's an extremely well developed body of evidence
to show the general validity of evolution.  That doesn't mean
that a biological theory of predator-prey cycles must be based
in an evolutionary model.  Physics too has its share of useful
models which aren't based on QCD or gravity; weather modeling
is one and the general term is phenomenology.

In programming you're often given a result (an inventory
management system) and you're looking for a solution which
combines models of how people, computers, and the given domain work.

Science also has its purely observational domains.  A
biologist friend of mine talked about one of his conferences
where the conversations range from the highly theoretical
to the "look at this sucker we caught!"

My feeling is that most scientists do not develop new fundamental
theories.  They instead explore and explain things within
existing theory.  I think programming is similar.  Both fields
may build new worlds, but success is measured by its impact
in this world.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dealing with marketing types...

2005-06-12 Thread Andrew Dalke
Paul Rubin replied to me:
 If you're running a web site with 100k users (about 1/3 of the size of
 Slashdot) that begins to be the range where I'd say LAMP starts
 running out of gas.

Let me elaborate a bit.  That claim of 100K from me is the
entire population of people who would use bioinformatics or
chemical informatics.  It's the extreme upper bound of the
capacity I ever expect.  It's much more likely I'll only
need to handle a few thousand users.


 I believe
 LiveJournal (which has something more like a million users) uses
 methods like that, as does ezboard.  There was a thread about it here
 a year or so ago.

I know little about it, though I read at
http://goathack.livejournal.org/docs.html
] LiveJournal source is lots of Perl mixed up with lots of MySQL

I found more details at
http://jeremy.zawodny.com/blog/archives/001866.html

It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
memcached.  The linked slides say lots of MySQL usage.
60 servers.

I don't see that example as validating your statement that
LAMP doesn't scale for mega-numbers of hits any better than
whatever you might call printing press systems.

 As a simple example, that article's advice of putting all fine grained
 session state into the database (so that every single browser hit sets
 off SQL queries) is crazy.

To be fair, it does say database plus cache though the author
suggests the place for the cache is at the HTTP level and not
at the DB level.  I would have considered something like memcached
perhaps backed by an asychronous write to a db if you want the
user state saved even after the cache is cleared/reset.

How permanent though does the history need to be?  Your
approach wipes history when the user clears the cookie and it
might not be obvious that doing so should clear the history.

In any case, the implementation cost for this is likely
higher than what you did.  I mention it to suggest an
alternative.


 As for big, hmm, I'd say as production web sites go, 100k users is
 medium sized, Slashdot is largish, Ebay is big, Google is huge.

I'd say that few sites have 100k users, much less
daily users with personalized information.  As a totally made-up
number, only a few dozen sites (maybe a couple hundred?) would
need to worry about those issues.

If that's indeed the case then I'll also argue that each of
them is going to have app-specific choke points which are best
hand-optimized and not framework optimized.  Is there enough
real-world experience to design a EnterpriseWeb-o-Rama (your
printing press) which can handle those examples you gave
any better than starting off with a LAMP system and hand-caching
the parts that need it?

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: ElementTree Namespace Prefixes

2005-06-12 Thread Andrew Dalke
On Sun, 12 Jun 2005 15:06:18 +, Chris Spencer wrote:

 Does anyone know how to make ElementTree preserve namespace prefixes in 
 parsed xml files?

See the recent c.l.python thread titled "ElemenTree and namespaces"
and started May 16 2:03pm.  One archive is at

http://groups-beta.google.com/group/comp.lang.python/browse_thread/thread/31b2e9f4a8f7338c/363f46513fb8de04?rnum=3&hl=en

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dealing with marketing types...

2005-06-12 Thread Andrew Dalke
Paul Rubin wrote:
 Andrew Dalke [EMAIL PROTECTED] writes:
  ...
 I found more details at
 http://jeremy.zawodny.com/blog/archives/001866.html
 
 It's a bunch of things - Perl, C, MySQL-InnoDB, MyISAM, Akamai,
 memcached.  The linked slides say lots of MySQL usage. 60 servers.
 
 LM uses MySQL extensively but what I don't know is whether it serves
 up individual pages by the obvious bunch of queries like a smaller BBS
 might.  I have the impression that it's more carefully tuned than that.

The linked page links to a PDF describing the architecture.
The careful tuning comes in part from a high-performance caching
system - memcached.

 I don't see that example as validating your statement that
 LAMP doesn't scale for mega-numbers of hits any better than
 whatever you might call printing press systems.
 
 What example?  Slashdot?

Livejournal.  You gave it as a counter example to the LAMP
architecture used by /.

]  It seems to me that by using implementation methods that
] map more directly onto the hardware, a site with Slashdot's
] traffic levels could run on a single modest PC (maybe a laptop).
] I believe LiveJournal (which has something more like a million
] users) uses methods like that, as does ezboard. 

Since LJ uses a (highly hand-tuned) LAMP architecture, it isn't
an effective counterexample.

  It uses way more hardware than it needs to,
 at least ten servers and I think a lot more.  If LJ is using 6x as
 many servers and taking 20x (?) as much traffic as Slashdot, then LJ
 is doing something more efficiently than Slashdot.  

I don't know where the 20x comes from.  Registered users?  I
read /. but haven't logged into it in 5+ years.  I know I
hit /. a lot more often than I do LJ (there's only one diary
I follow there).  The use is different as well; all people
hit one story / comments page, and the comments are ranked
based on reader-defined evaluations.  LJ has no one journal
that gets anywhere as many hits and there is no ranking scheme.

 I'd say that few sites have 100k users, much less
 daily users with personalized information. As a totally made-up
 number, only a few dozen sites (maybe a couple hundred?) would
 need to worry about those issues.
 
 Yes, but for those of us interested in how big sites are put together,
 those are the types of sites we have to think about ;-).

My apologies since I know this sounds snide, but then why didn't
you (re)read the LJ architecture overview I linked to above?
That sounds like something you would have been interested in
reading and would have directly provided information that
counters what you said in your followup.

The ibm-poop-heads article by Ryan Tomayko gives pointers to 
several other large-scale LAMP-based web sites.  You didn't
like the Google one.  I checked a couple of the others:

  IMDB -
  http://www.findarticles.com/p/articles/mi_zdpcm/is_200408/ai_ziff130634
  As you might expect, the site is now co-located with other Amazon.com
  sites, served up from machines running Linux and Apache, but ironically,
  most of the IMDb does not use a traditional database back end. Its
  message boards are built on PostgreSQL, and certain parts of IMDb
  Pro-including its advanced search-use MySQL, but most of the site is
  built with good old Perl script.

  del.icio.us
  Took some digging but I found
  http://lists.del.icio.us/pipermail/discuss/2004-November/001421.html
  The database gets corrupted because the machine gets power-cycled,
  not through any fault of MySQL's.

The point is that LAMP systems do scale, both down and up.  It's
a polemic against architecture astronauts who believe the only
way to handle large sites (and /., LJ, IMDB, and del.icio.us are
larger than all but a few sites) is with some spiffy enterprise
architecture framework.

 I'd say
 there's more than a few hundred of them, but it's not like there's
 millions.  And some of them really can't afford to waste so much
 hardware--look at the constant Wikipedia fundraising pitches for more
 server iron because the Wikimedia software (PHP/MySQL, natch) can't
 handle the load.

Could that have, for example, bought EnterpriseWeb-O-Rama and done
any better/cheaper?  Could they have even started the project
had they gone that route?

 Yes, of course there is [exprience in large-scale web apps]. 
 Look at the mainframe transaction systems of the 60's-70's-80's, for
 example. Look at Google.

For the mainframe apps you'll have to toss anything processed
in batch mode, like payrolls.  What had the customization level
and scale comparable to 100K+ sites of today?  ATMs?  Stock trading?

Google is a one-off system.  At present there's no other system
I know of - especially one with that many users - where a single
user request can trigger searches from hundreds of machines.
That's all custom software.  Or should most servers implement
what is in essence a new distributed operating system just to
run a web site?

  Then there's the tons of experience we all have with LAMP systems

Re: Dealing with marketing types...

2005-06-11 Thread Andrew Dalke
Paul Rubin wrote:
 That article makes a lot of bogus claims and is full of hype.  LAMP is
 a nice way to throw a small site together without much fuss, sort of
 like fancy xerox machines are a nice way to print a small-run
 publication without much fuss.  If you want to do something big, you
 still need an actual printing press.

In the comments the author does say he's trying to be provocative.

My question to you is - what is something big?  I've not been
on any project for which LAMP can't be used, and nor do I
expect to be.  After all, there's only about 100,000 people in
the world who might possibly interested using my software.  (Well,
the software I get paid to do; not, say, the couple of patches I've
sent in to Python).

I had one client consider moving from Python/CGI/flat files to
Java/WebLogic/Oracle.  The old code took nearly 10 seconds to
display a page (!).  They were convinced that they had gone past
the point where Python/CGI was useful, and they needed to use a
more scalable enterprise solution.  The conviction meant they
didn't profile the system.  With about a day of work I got the
performance down to under a second by removing some needless imports,
delaying others until they were needed, making sure all the
.pyc files existed, etc.
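
To make the delayed-import part concrete, here's a minimal
sketch of the idea; the module below is just a stand-in for
whatever heavyweight import only some requests actually need:

import time

def expensive_feature():
    # Delayed import: the load cost is paid on the first call
    # that needs the module, not at script startup - which
    # matters when every CGI hit restarts the interpreter.
    import xml.dom.minidom  # stand-in for any slow-to-import module
    return xml.dom.minidom.parseString("<doc/>").documentElement.tagName

t0 = time.time()
print expensive_feature()
print "first call took %.3f seconds" % (time.time() - t0)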

I could have gotten more performance switching to a persistent
Python web server and using a database instead of a bunch of
flat files in a directory, but that wasn't worth the time.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python Developers Handbook

2005-06-10 Thread Andrew Dalke
wooks wrote:
 If I had posted or invited the group to look at my full list of items
 rather than just the python book link then I could see where you are
 coming from.

Take a look at http://www.templetons.com/brad/spamterm.html
for some of the first spams and reactions thereof.

There's a 30+ year history of posts which one person thinks
is relevant or important and others find off-topic, crass,
and rude.  A rough sort of social norms - called netiquette -
have come from that experience.

 If my intention was to spam this NG then the complaints as they were
 phrased would  only have served to make me more determined.

The intention is to prevent it from happening in the future.

If your intention is indeed to spam the group then there
are mechanisms to stop you, including such lovely terms as
killfiles and cancelbots.  Too much of it and you might find
your account suspended.  Or have you not wondered why few
spams make it here?

If your intention is to continue posting then it's a
warning of sorts that as in every community there are social
forms to follow, and often good reasons for those forms.

Terry backed up his response explaining not only the
convention for what you were doing, but also mentioned
(briefly) why he responded in the way he did.


I personally found your original posting blunt.  I thought
it was a virus or spam.  You see, I don't do eBay and
whenever I see that term in my mail in a URL it's either
a spam or a phishing attack.  So I ignored it.  If you
really wanted to sell it then following Terry's advice
and holding to social forms would have been better for
your auction.  There's little incentive for anyone to
follow that link without knowing more about it.

 Maybe we will all learn something from each other.

Hopefully you, but not likely the others involved.  As
I said, this sort of thing has a long history and for
anyone who's been doing this for years (like me) there's
little new to learn on the topic.  

To give an idea of the history, there's even an RFC
on netiquette from 10 years ago:
  http://www.faqs.org/rfcs/rfc1855.html

The directly relevant part is

- Advertising is welcomed on some lists and Newsgroups, and abhorred
  on others!  This is another example of knowing your audience
  before you post.  Unsolicited advertising which is completely
  off-topic will most certainly guarantee that you get a lot of
  hate mail.

Most assuredly, what Terry sent you is *not* hate mail.


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Incorrect number of arguments

2005-06-09 Thread Andrew Dalke
Steven D'Aprano wrote:
 *eureka moment*
 
 I can use introspection on the function directly to see 
 how many arguments it accepts, instead of actually 
 calling the function and trapping the exception.

For funsies, the function 'can_call' below takes a function 'f'
and returns a new function 'g'.  Calling 'g' with a set of
arguments returns True if 'f' would take the arguments,
otherwise it returns False.  See the test case for an
example of use.


import new

def noop():
    pass


def can_call(func):
    # Make a new function with the same signature

    # code(argcount, nlocals, stacksize, flags, codestring, constants, names,
    #      varnames, filename, name, firstlineno, lnotab[, freevars[, cellvars]])
    code = func.func_code
    new_code = new.code(code.co_argcount,
                        code.co_nlocals,
                        noop.func_code.co_stacksize,
                        code.co_flags,
                        noop.func_code.co_code,  # don't do anything
                        code.co_consts,
                        code.co_names,
                        code.co_varnames,
                        code.co_filename,
                        "can_call_" + code.co_name,
                        code.co_firstlineno,
                        noop.func_code.co_lnotab, # for line number info
                        code.co_freevars,
                        # Do I need to set cellvars?  Don't think so.
                        )

    # function(code, globals[, name[, argdefs[, closure]]])
    new_func = new.function(new_code, func.func_globals,
                            "can_call_" + func.func_name,
                            func.func_defaults)

    # Uses a static scope
    def can_call_func(*args, **kwargs):
        try:
            new_func(*args, **kwargs)
        except TypeError, err:
            return False
        return True
    try:
        can_call_func.__name__ = "can_call_" + func.__name__
    except TypeError:
        # Can't change the name in Python 2.3 or earlier
        pass
    return can_call_func


# test

def spam(x, y, z=4):
    raise AssertionError("Don't call me!")


can_spam = can_call(spam)

for (args, kwargs) in (
        ((1,2), {}),
        ((1,), {}),
        ((1,), {"x": 2}),
        ((), {"x": 1, "y": 2}),
        ((), {"x": 1, "z": 2}),
        ((1,2,3), {}),
        ((1,2,3), {"x": 3}),
        ):
    can_spam_result = can_spam(*args, **kwargs)
    try:
        spam(*args, **kwargs)
    except AssertionError:
        could_spam = True
    except TypeError:
        could_spam = False

    if can_spam_result == could_spam:
        continue

    print "Failure:", repr(args), repr(kwargs)
    print "Could I call spam()?", could_spam
    print "Did I think I could?", can_spam_result
    print

print "Done."


 Still a good question though. Why is it TypeError?

My guess - in most languages with types, functions are
typed not only on is callable but on the parameter
signature.  For example, in C


dalke% awk '{printf("%3d %s\n", NR, $0)}' tmp.c
  1 
  2 int f(int x, int y) {
  3 }
  4 
  5 int g(int x) {
  6 }
  7 
  8 main() {
  9   int (*func_ptr)(int, int);
 10   func_ptr = f;
 11   func_ptr = g;
 12 }
% cc tmp.c
tmp.c: In function `main':
tmp.c:11: warning: assignment from incompatible pointer type
% 

'Course the next question might be: then how about an
ArgumentError which is a subclass of TypeError?
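
As a sketch of that suggestion - the name and placement are
purely hypothetical, not an existing Python exception:

class ArgumentError(TypeError):
    pass

try:
    raise ArgumentError("spam() takes at most 3 arguments (4 given)")
except TypeError, err:
    # existing code that catches TypeError keeps working
    print "still caught as a TypeError:", err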

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: tail -f sys.stdin

2005-06-09 Thread Andrew Dalke
garabik:
 what about:
 
 for line in sys.stdin:
 process(line)

This does not meet the OP's requirement, which was
 I'd like to write a prog which reads one line at a time on its sys.stdin
 and immediately processes it.
 If there are'nt any new lines wait (block on input).

It's a subtle difference.  The implementation of iter(file)
reads a block of data at a time and breaks that into lines,
along with the logic to read another block as needed.  If
there isn't yet enough data for the block then Python will
sit there waiting.

The OP already found the right solution which is to call
the readline() method.

Compare the timestamps in the following

% ( echo a ; sleep 2 ; echo b ) | python -c "import sys, time\
for line in sys.stdin:\
  print time.time(),  repr(line)"

1118335675.45 'a\n'
1118335675.45 'b\n'
% ( echo a ; sleep 2 ; echo b ) | python -c "import sys, time\
while 1:\
  line = sys.stdin.readline()\
  if not line: break \
  print time.time(), repr(line)"
1118335678.56 'a\n'
1118335680.28 'b\n'
% 

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Is pyton for me?

2005-06-09 Thread Andrew Dalke
Mark de la Fuente wrote:
 Here is an example of the type of thing I would like to be able to do. 
 Can I do this with python?  How do I get python to execute command line
 functions?
 ...
 # simple script to create multiple sky files.
 
 foreach hour (10 12 14)
   gensky 3 21 $hour > sky$hour.rad
 end

Dan Bishop gave one example using os.system.  The important
thing to know is that in the shell all programs can be used
as commands while in Python there isn't a direct connection.
Instead you need to call a function which translates a
request into something which calls the command-line program.

There are several ways to do that.  In Python before 2.4
the easiest way is with os.system(), which takes the command-line
text as a string.  For example,

import os
os.system("gensky 3 21 10 > sky10.rad")

You could turn this into a Python function rather easily

import os

def gensky(hour):
  os.system("gensky 3 21 %d > sky%d.rad" % (hour, hour))

for hour in (10, 12, 14):
  gensky(hour)


Python 2.4 introduces the subprocess module which makes it
so much easier to avoid nearly all the mistakes that can
occur in using os.system().  You could replace the 'gensky'
python function with

import subprocess
def gensky(hour):
  subprocess.check_call(["gensky", "3", "21", str(hour)],
                        stdout = open("sky%d.rad" % (hour,), "w"))


The main differences here are:
 - the original code didn't check the return value of os.system().
It should do this because, for example, the gensky program might
not be on the path.  The check_call does that test for me.

 - I needed to do the redirection myself.  (I wonder if the
subprocess module should allow

  if isinstance(stdout, basestring):
    stdout = open(stdout, "wb")

Hmmm)


 If I try and do a gensky command from the python interpreter or within a
 python.py file, I get an error message:
 
NameError: name 'gensky' is not defined

That's because Python isn't set up to search the command path
for an executable.  It only knows about variable names defined
in the given Python module or imported from another Python
module.
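
If you really want shell-style lookup you can do it by hand.
A small sketch (my own illustration, not something Python does
for you):

import os

def find_on_path(program):
    # emulate the shell's command search by walking PATH
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, program)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None

print find_on_path("gensky")   # prints None if gensky isn't installed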

 If anyone has any suggestions on how to get python scripts to execute
 this sort of thing, what I should be looking at, or if there is
 something else I might consider besides python, please let me know.

You'll have to remember that Python is not a shell programming
language.  Though you might try IPython - it allows some of the
things you're looking for, though not all.

You should also read through the tutorial document on Python.org
and look at some of the Python Cookbook..  Actually, start with
  http://wiki.python.org/moin/BeginnersGuide

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fast text display?

2005-06-08 Thread Andrew Dalke
Christopher Subich wrote:
 My first requirement is raw speed; none of what I'm doing is 
 processing-intensive, so Python itself shouldn't be a problem here.

There's raw speed and then there's raw speed.  Do you want to
display, say, a megacharacter/second?

 it's highly desirable to have very fast text updates (text 
 inserted only at the end)-- any slower than 20ms/line stretches 
 usability for fast-scrolling.

Ahh, that's 400 bytes per second.  That's pretty slow.

 The second requirement is that it support text coloration.

 The third requirement is cross-platform-osity

qtextedit has all of those.  See
  http://doc.trolltech.com/3.3/qtextedit.html

Looks like LogText mode is exactly what you want
 http://doc.trolltech.com/3.3/qtextedit.html#logtextmode

] Setting the text format to LogText puts the widget in a special mode
] which is optimized for very large texts. Editing, word wrap, and rich
] text support are disabled in this mode (the widget is explicitly made
] read-only). This allows the text to be stored in a different, more
] memory efficient manner.

 and

] By using tags it is possible to change the color, bold, italic and
] underline settings for a piece of text. 
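
An untested sketch of driving that mode from Python, assuming
the PyQt 3 bindings (the "qt" module) - treat the exact names
as assumptions on my part:

import sys
from qt import QApplication, QTextEdit, Qt

app = QApplication(sys.argv)
view = QTextEdit()
view.setTextFormat(Qt.LogText)   # the optimized read-only log mode
view.append("plain line")
view.append("<b>bold</b> and <font color=red>colored</font> via tags")
app.setMainWidget(view)
view.show()
app.exec_loop()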

Depending on what you want, curses talking to a terminal might be
a great fit.  That's how we did MUDs back in the old days.  :)

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fast text display?

2005-06-08 Thread Andrew Dalke
Christopher Subich wrote:
 You're off by a decimal, though, an 80-column line 
 at 20ms is 4kbytes/sec.

D'oh!  Yeah, I did hundredths of a second instead of thousandths.

 My guess is that any faster throughput than 
 10kbytes/sec is getting amusing for a mud, which in theory intends for 
 most of this text to be read anyway.

Which is why I don't think you'll have a problem with any of
the standard GUI libraries.
 
 That looks quite good, except that Trolltech doesn't yet have a GPL-qt 
 for Win32. 

Cost and license weren't listed as requirements.  :)

You *did* say "hobby" though; in post-hoc justification, I've known
people with some pretty expensive hobbies.


 See the scrolling problem in the original post, as to why I can't use it 
 as a temporary user interface. :)

Indeed, but MUDs 15 years ago could run in a terminal and display
colored text via ANSI terminal controls, letting the terminal
itself manage history and scrolling.  I had some sort of TSR for
the latter, under DOS.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: split up a list by condition?

2005-06-07 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
 So I think: Have I overlooked a function which splits up a sequence
 into two, based on a condition? Such as
 
 vees, cons = split(wlist[::-1], lambda c: c in vocals)

 This is clear. I actually wanted to know if there is a function which I
 overlooked which does that, which wouldn't be a maintenance nightmare at
 all.

Not that I know of, but if there is one it should be named
bifilter, or difilter if you prefer Greek roots. :)


def bifilter(test, seq):
  passes = []
  fails = []
  for term in seq:
    if test(term):
      passes.append(term)
    else:
      fails.append(term)
  return passes, fails


>>> bifilter("aeiou".__contains__, "This is a test")
(['i', 'i', 'a', 'e'], ['T', 'h', 's', ' ', 's', ' ', ' ', 't', 's', 't'])
>>> 

Another implementation, though in this case I cheat because I
do the test twice, is

>>> from itertools import ifilter, ifilterfalse, tee
>>> def bifilter(test, seq):
...   seq1, seq2 = tee(seq)
...   return ifilter(test, seq1), ifilterfalse(test, seq2)
... 
>>> bifilter("aeiou".__contains__, "This is another test")
(<itertools.ifilter object at 0x57f050>, <itertools.ifilterfalse object at 
0x57f070>)
>>> map(list, _)
[['i', 'i', 'a', 'o', 'e', 'e'], ['T', 'h', 's', ' ', 's', ' ', 'n', 't', 'h', 
'r', ' ', 't', 's', 't']]
>>> 


Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Software licenses and releasing Python programs for review

2005-06-06 Thread Andrew Dalke
max:
 For me, the fact 
 that corporations are considered people by the law is ridiculous. 

Steven D'Aprano wrote:
 Ridiculous? I don't think so. Take, for example, Acme Inc. Acme purchases
 a new factory. Who owns the factory? The CEO? The Chairperson of the Board
 of Directors? Split in equal shares between all the directors? Split
 between all the thousands of shareholders? Society has to decide between
 these methods.

Getting off-topic for c.l.py.  Might want to move this to, for example,
the talk thread for
  http://en.wikipedia.org/wiki/Corporate_personhood
which is
  http://en.wikipedia.org/wiki/Talk:Corporate_personhood
and read also
  http://en.wikipedia.org/wiki/Corporation

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


RE: About size of Unicode string

2005-06-06 Thread Andrew Dalke
Frank Abel Cancio Bello wrote:
 Can I get how many bytes have a string object independently of its encoding?
 Is the len function the right way of get it?

No.  len(unicode_string) returns the number of characters in the
unicode_string.

Number of bytes depends on how the unicode characters are represented.
Different encodings will use different numbers of bytes.

>>> u = u"G\N{Latin small letter A with ring above}"
>>> u
u'G\xe5'
>>> len(u)
2
>>> u.encode("utf-8")
'G\xc3\xa5'
>>> len(u.encode("utf-8"))
3
>>> u.encode("latin1")
'G\xe5'
>>> len(u.encode("latin1"))
2
>>> u.encode("utf16")
'\xfe\xff\x00G\x00\xe5'
>>> len(u.encode("utf16"))
6
>>> 

 Laci look the following code:
 
   import urllib2
   request = urllib2.Request(url= 'http://localhost:6000')
   data = 'data to send\n'.encode('utf_8')
   request.add_data(data)
   request.add_header('content-length', str(len(data)))
   request.add_header('content-encoding', 'UTF-8')
   file = urllib2.urlopen(request)
 
 Is always true that the size of the entity-body is len(data)
 independently of the encoding of data?

For this case it is true because the logical length of 'data'
(which is a byte string) is equal to the number of bytes in the
string, and the utf-8 encoding of a byte string with character
values in the range 0-127, inclusive, is unchanged from the
original string.

In general, as when 'data' is a unicode string: no.

len() returns the logical length of 'data'.  That number does
not need to be the number of bytes used to represent 'data'.
To get the bytes you must encode the object.
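
A one-line helper makes the distinction explicit; with the
session above, len(u) is 2 while the byte counts differ:

def byte_length(u, encoding):
    # storage size is a property of the encoded bytes,
    # not of the unicode object itself
    return len(u.encode(encoding))

u = u"G\N{Latin small letter A with ring above}"
print len(u), byte_length(u, "utf-8"), byte_length(u, "utf16")
# prints: 2 3 6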

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: the python way?

2005-06-06 Thread Andrew Dalke
Reinhold Birkenfeld wrote:
 To make it short, my version is:
 
 import random
 def reinterpolate2(word, vocals='aeiouy'):
     wlist = list(word)
     random.shuffle(wlist)
     vees = [c for c in wlist[::-1] if c in vocals]
     cons = [c for c in wlist[::-1] if c not in vocals]

Why the [::-1]?  If it's randomly shuffled the order isn't important.

     short, long = sorted((cons, vees), key=len)
     return ''.join(long[i] + short[i] for i in range(len(short))) + \
            ''.join(long[len(short):])

All the cool kids are using 2.4 these days.  :)

Another way to write this is (assuming the order of characters
can be swapped)

 N = min(len(short), len(long))
 return ''.join( [c1+c2 for (c1, c2) in zip(cons, vees)] +
                 cons[N:] + vees[N:])

The main change here is that zip() stops when the first iterator finishes
so there's no need to write the 'for i in range(len(short))'

If the order is important then the older way is

if len(cons) >= len(vees):
    short, long = vees, cons
else:
    short, long = cons, vees
return ''.join( [c1+c2 for (c1, c2) in zip(short, long)] +
                long[len(short):])


'Course to be one of the cool kids, another solution is to use the
roundrobin() implementation found from http://www.python.org/sf/756253

from collections import deque
def roundrobin(*iterables):
    pending = deque(iter(i) for i in iterables)
    while pending:
        task = pending.popleft()
        try:
            yield task.next()
        except StopIteration:
            continue
        pending.append(task)



With it the last line becomes

 return ''.join(roundrobin(short, long))

Anyone know if/when roundrobin() will be part of the std. lib?
The sf tracker implies that it won't be.

Andrew
[EMAIL PROTECTED]
 
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Creating file of size x

2005-06-06 Thread Andrew Dalke
Jan Danielsson wrote:
Is there any way to create a file with a specified size?

Besides the simple

def make_empty_file(filename, size):
  f = open(filename, "wb")
  f.write("\0" * size)
  f.close()

?

If the file is large, try (after testing and fixing any
bugs):

def make_empty_file(filename, size, block = 32*1024):
  f = open(filename, "wb")
  s = "\0" * block
  for i in range(size//block):
    f.write(s)
  remainder = size%block
  f.write(s[:remainder])
  f.close()

As Grant Edwards pointed out, you can do a seek(size-1)
but I don't know if it's fully portable.
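
A sketch of that seek-based variant, under the same "test it
yourself" caveat:

def make_empty_file_seek(filename, size):
    # seek past the end and write a single byte; the OS fills in
    # the rest, often as a sparse file on Unix filesystems
    f = open(filename, "wb")
    if size > 0:
        f.seek(size - 1)
        f.write("\0")
    f.close()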

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: any macro-like construct/technique/trick?

2005-06-05 Thread Andrew Dalke
Mike Meyer wrote:
 I've never tried it with python, but the C preprocessor is available
 as 'cpp' on most Unix systesm. Using it on languages other than C has
 been worthwhile on a few occasions. It would certainly seem to
 directly meet the OP's needs.

Wouldn't that prohibit using #comments in the macro-Python code?
I suppose they could be made with strings, as in


  "here is a comment"
  do_something()

but it's ... strange.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
Nicolas Fleury wrote:
 There's no change in order of deletion, it's just about defining the 
 order of calls to __exit__, and they are exactly the same.

BTW, my own understanding of this proposal is still slight.
I realize a bit better that I'm not explaining myself correctly.

 As far as I 
 know, PEP343 has nothing to do with order of deletion, which is still 
 implementation-dependant.  It's not a constructor/destructor thing like 
 in C++ RAII, but __enter__/__exit__.

I'm mixing (because of my lack of full comprehension) RAII with
your proposal.

What I meant to say was in the PEP

 with locking(someMutex)
 with opening(readFilename) as input
 with opening(writeFilename) as output
 ...

it's very well defined when the __exit__() methods are
called and in which order.  If it's


 with locking(someMutex)
 with opening(readFilename) as input
 with opening(writeFilename) as output

with the __exit__()s called at the end of the scope (as if it
were a __del__, which it isn't) then the implementation could
still get the __exit__ order correct, by being careful.  Though
there would be no way to catch an exception raised in an __exit__.
I think.
 
 Your approach wouldn't allow the following
 
 No, I said making the ':' *optional*.  I totally agree supporting ':' is 
 useful.

Ahh, I think I understand.  You want both

with abc:
  with cde:
    pass

and

with abc
with def

and to have the second form act somewhat like RAII in that
the __exit__() for that case is called when the scope ends.


Hmm.  My first thought is I don't like it because I'm a stodgy
old traditionalist and don't like the ambiguity of having to look
multiple tokens ahead to figure out which form is which.  

I can see that it would work.  Umm, though it's tricky.  Consider

with abc

with defg:
  with ghi
  with jkl:
    1/0



The implementation would need to track all the with/as forms
in a block so they can be __exit__()ed as appropriate.  In this
case ghi.__exit__() is called after jkl.__exit__() and
before defg.__exit__().

The PEP gives an easy-to-understand mapping from the proposed
change to how it could be implemented by hand in the existing
Python.  Can you do the same?

 True.  But does it look as good?  Particularly the _ part?

I have no idea if the problem you propose (multiple with/as
blocks) will even exist so I can't comment on which solution
looks good.  It may not be a problem in real code, so not needing
any solution.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to get name of function from within function?

2005-06-04 Thread Andrew Dalke
I'm with Steven Bethard on this; I don't know what you
(Christopher J. Bottaro) are trying to do.

Based on your example, does the following meet your needs?

>>> class Spam(object):
...   def funcA(self):
...     print "A is called"
...   def __getattr__(self, name):
...     if name.startswith("_"):
...       raise AttributeError, name
...     f = get_function(name)
...     if f is not None:
...       return f
...     raise AttributeError, name
... 
>>> def get_function(name):
...   return globals().get(name + "IMPL", None)
... 
>>> x = Spam()
>>> x.funcA()
A is called
>>> x.funcB()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 10, in __getattr__
AttributeError: funcB
>>> def funcBIMPL():
...   print "Calling all bees"
... 
>>> x.funcB()
Calling all bees
>>> 


Confused-ly-your's

Andrew
[EMAIL PROTECTED]


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
On Sat, 04 Jun 2005 10:43:48 -0600, Steven Bethard wrote:
 Ilpo Nyyssönen wrote:
 How about this instead:
 
 with locking(mutex), opening(readfile) as input:
 ...

 I don't like the ambiguity this proposal introduces.  What is input 
 bound to?

It would use the same logic as the import statement, which already
supports an 'as' like this

>>> import sys, math, cStringIO as StringIO, xml.sax.saxutils as su
>>> 

 But the point is 
 that, whatever decision you make, I now have to *memorize* that decision.

It's the same rule, so the rule would be "ahh, uses the 'as' form".

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
Nicolas Fleury wrote:
 I think it is simple and that the implementation is as much 
 straight-forward.  Think about it, it just means that:

Okay, I think I understand now.

Consider the following

server = open_server_connection()
with abc(server)
with server.lock()
do_something(server)

server.close()

it would be translated to

server = open_server_connection()
with abc(server):
    with server.lock():
        do_something(server)
        server.close()

when I meant for the first code example to be implemented
like this

server = open_server_connection()
with abc(server):
    with server.lock():
        do_something(server)

server.close()


(It should probably use the with-block to handle the server open
and close, but that's due to my lack of imagination in coming up
with a decent example.)

Because of the implicit indentation it isn't easy to see that
the server.close() is in an inner block and not at the outer
one that it appears to be in.  To understand the true scoping
a reader would need to scan the code for 'with' lines, rather
than just looking at the layout.


 Good point.  As a C++ programmer, I use RAII a lot.

And I've used it a few times in Python, before I found
out it wasn't a guaranteed behavior by the language.

 So I come to another conclusion: the indentation syntax will most of the 
 time result in a waste of space.  Typically a programmer would want its 
 with-block to end at the end of the current block.

A test for how often this is needed would be to look in existing
code for the number of try/finally blocks.  I have seen and
written some gnarly deeply stacked blocks but not often - once
a year?

That's not to say it's a good indicator.  A lot of existing code
looks like this

def get_first_line(filename):
  f = open(filename)
  return f.readline()

depending on the gc to clean up the code.  A more ... not
correct, but at least finicky ... implementation could be

def get_first_line(filename):
  f = open(filename)
  try:
    return f.readline()
  finally:
    f.close()

Almost no one does that.  With the PEP perhaps the idiomatic
code would be

def get_first_line(filename):
  with open(filename) as f:
    return f.readline()


(Add __enter__/__exit__ semantics to the file object?  Make
a new 'opening' function?  Don't know.)

What I mean by all of this is that the new PEP may encourage
more people to use indented blocks, in a way that can't be
inferred by simply looking at existing code.  In that case
your proposal, or the one written

  with abc, defg(mutex) as D, server.lock() as L:
..

may be needed.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: For review: PEP 343: Anonymous Block Redux and Generator Enhancements

2005-06-04 Thread Andrew Dalke
Steven Bethard wrote:
 Ahh, so if I wanted the locking one I would write:
 
  with locking(mutex) as lock, opening(readfile) as input:
  ...

That would make sense to me.

 There was another proposal that wrote this as:
 
  with locking(mutex), opening(readfile) as lock, input:
  ...
 
 which is what was confusing me.  Mirroring the 'as' from the import 
 statement seems reasonable.

Ahh, you're right.  That was an earlier proposal.

 
 But it doesn't address my other concern, namely, is
 
  with locking(mutex), opening(readfile) as input:
  ...
 
 equivalent to the nested with-statements, e.g.:

I would think it's the same as

with locking(mutex):
  with opening(readfile) as input:
...

which appears to map to the first of your alternatives

 Or is it equivalent to something different, perhaps:
 
  _locking = locking(mutex)
  _opening = opening(readfile)
  _exc = (None, None, None)
  _locking.__enter__()
  input = _opening.__enter__()
  try:
      try:
          ...
      except:
          _exc = sys.exc_info()
          raise
  finally:
      _opening.__exit__(*_exc)
      _locking.__exit__(*_exc)

That wouldn't work; consider if _opening.__enter__() raised
an exception.  The _locking.__exit__() would never be called,
which is not what anyone would expect from the intent of
this PEP.

 Or maybe:
 
  _locking = locking(mutex)
  _opening = opening(readfile)
  _exc = (None, None, None)
  _locking.__enter__()
  input = _opening.__enter__()

Same problem here

  finally:
      # same order as __enter__ calls this time!!
      _locking.__exit__(*_exc)
      _opening.__exit__(*_exc)

and the order would be wrong since consider multiple
statements as

with server.opening() as connection, connection.lock(column) as C:
  C.replace(X, Y)

The inner with depends on the outer and must be closed
in inverted order.


 And if it *is* just equivalent to the nested with-statements, how often 
 will this actually be useful?  Is it a common occurrence to need 
 multiple with-statements?  Is the benefit of saving a level of 
 indentation going to outweigh the complexity added by complicating the 
 with-statement?

Agreed.

Andrew
[EMAIL PROTECTED]

-- 
http://mail.python.org/mailman/listinfo/python-list

