Re: Comparing lists

2005-10-16 Thread James Dennett
Steven D'Aprano wrote:
> On Sat, 15 Oct 2005 18:17:36 +0200, Christian Stapfer wrote:
> 
> 
I'd prefer a (however) rough characterization
of computational complexity in terms of Big-Oh
(or Big-whatever) *anytime* to marketing-type
characterizations like this one...
>>>
>>>Oh how naive.
>>
>>Why is it that even computer science undergrads
>>are required to learn the basics of Big-Oh and
>>all that? 
> 
> 
> So that they know how to correctly interpret what Big O notation means,
> instead of misinterpreting it. Big O notation doesn't tell you everything
> you need to know to predict the behaviour of an algorithm. It doesn't even
> tell you most of what you need to know about its behaviour. Only actual
> *measurement* will tell you what you need to know.

In my experience, I need both knowledge of algorithmic
complexity (in some pragmatic sense) and measurements.
Neither alone is sufficient.

The proponents of algorithmic complexity measures don't
make the mistake of thinking that constants don't matter
for real-world performance, but they also don't make the
mistake of thinking that you can always measure enough
to tell you how your code will perform in all situations
in which it might be used.

Measurement is complicated -- very often, it just shows
you that tuning to match cache sizes is greatly important
to keep the constant factors down.  And yes, for small
data sizes often a linear-time algorithm can beat one
whose execution time grows only logarithmically, while
often a logarithmic time is close enough to constant over
the range of interest.  How an operation runs on a heavily
loaded system where it shares resources with other tasks
can also be greatly different from what microbenchmarks
might suggest.

If we don't oversimplify, we'll measure some appropriate
performance numbers and combine that with some knowledge
of the effects of caches, algorithmic complexity and other
factors that might matter in given situations.  And of
course there will be many situations where programmer time
and simplicity are more important than saving a millisecond,
or even a second, and we won't waste excessive resources in
optimising runtime at the expense of other factors.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


[offtopic] Re: Set of Dictionary

2005-06-26 Thread James Dennett
Steven D'Aprano wrote:

> On Thu, 16 Jun 2005 21:21:50 +0300, Konstantin Veretennicov wrote:
> 
> 
>>On 6/16/05, Vibha Tripathi <[EMAIL PROTECTED]> wrote:
>>
>>>I need sets as sets in mathematics:
>>
>>That's tough. First of all, mathematical sets can be infinite. It's
>>just too much memory :)
>>Software implementations can't fully match mathematical abstractions.
> 
> 
> :-)
> 
> But lists can be as long as you like, if you have enough memory. 

But you never have enough memory to store, for example,
a list of all the prime integers (not using a regular list,
anyway).

> So
> can longs and strings. So I don't think the infinity issue is a big one.
> 
> 
>>>   sets of any unique type of objects including those
>>>of dictionaries, I should then be able to do:
>>>a_set.__contains__(a_dictionary) and things like that.
> 
> 
> Standard Set Theory disallows various constructions, otherwise you get
> paradoxes.
> 
> For example, Russell's Paradox: the set S of all sets that are not an
> element of themselves. Then S should be a set. If S is an element of
> itself, then it belongs in set S. But if it is in set S, then it is an
> element of itself and it is not an element of S. Contradiction.
> 
> The price mathematicians pay to avoid paradoxes like that is that some
> sets do not exist. For instance, there exists no universal set (the set
> of all sets), no set of all cardinal numbers, etc.
> 
> So even in mathematics, it is not true that sets can contain anything.

See "Set Theory With a Universal Set" by T. Forster, which covers
some set theories in which there *is* a set of all things, and
in which Russell's paradox is avoided in other ways (such as by
restricting the comprehension axioms).

(Sorry for drifting offtopic, I happen to find non-standard
set theories interesting and thought that some others here
might too.)

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-03 Thread James Dennett
Tom Anderson wrote:

> On Sun, 3 Jul 2005, George Sakkis wrote:
> 
>> "Tom Anderson" <[EMAIL PROTECTED]> wrote:
>>
> And finally, does Guido know something about arithmetic that i 
> don't, or
> is this expression:
>
> -1.0 ** 0.5
>
> Evaluated wrongly?


 No, it is evaluated according to the rules of precedence. It is
 equivalent to -(1.0**0.5). You are confusing it with (-1.0)**0.5 which
 fails as expected.
>>>
>>>
>>> Ah. My mistake. I submit that this is also a bug in python's grammar.
>>> There's probably some terribly good reason for it, though.
>>
>>
>> How about 'conformance with standard mathematic notation', does this 
>> count for a terribly good reason?
> 
> 
> Yes. However, it's an excellent reason why python's precedence rules are 
> wrong - in conventional mathematical notation, the unary minus, used to 
> denote the sign of a literal number, does indeed have higher precedence 
> than exponentiation: -1^2 evaluates to 1, not -1.

No... that would mean that -x^2 == x^2 always, and we certainly
expect the lhs to always be negative or zero (for real x), and
the rhs to be non-negative.  Unary minus has lower precendence
than exponentiation, in conventional mathematics.  You'll often
see (-1)^n written in mathematics when we want the alternating
sequence +1, -1, +1, -1, ..., as -1^n wouldn't do it (giving
-1, -1, -1, -1).  -1^n evaluates to -1, for all non-zero integral
n.

> 
>> What would one expect to be the result of 5^2 - 2^2, 29 or 21?
> 
> 
> 21.

True, unambiguously; there are no negative numbers involved
at any time, just a simple binary minus.

> 
>> Would you expect 5^2 + - 2^2 to be different, even if you write it as 
>> 5^2 + -2 ^ 2 ?
> 
> 
> Yes: 5^2 + -2^2 is 29, however you write it.

*If* you take -2 as a number, but not if you take the number
as 2 and the unary minus as an operator with lower precedence
than exponentiation.

> 
>> White space is not significant in math AFAIK ;-)
> 
> 
> No, but there's a very big difference between unary and binary minus.

Not in this respect.  However, the question is whether that's
a unary -, which binds lower than exponentiation in most systems,
or part of the negative number minus 2.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: math.nroot [was Re: A brief question.]

2005-07-04 Thread James Dennett
Steven D'Aprano wrote:

> James Dennett wrote:
> 
>> > Yes: 5^2 + -2^2 is 29, however you write it.
>>
>> *If* you take -2 as a number, but not if you take the number
>> as 2 and the unary minus as an operator with lower precedence
>> than exponentiation.
> 
> 
> [snip]
> 
>> Not in this respect.  However, the question is whether that's
>> a unary -, which binds lower than exponentiation in most systems,
>> or part of the negative number minus 2.
> 
> 
> In Python, all numbers are positive. -2 is a unary minus in front of 2.

Indeed, as I hoped was clear from the rest of my post.  When
we're discussing what the rules of Python *should* be, however,
it's necessary to step back and not assume that the rules of
Python are "right", whatever we might mean by that.

I was illustrating that the choice made by Python in this
respect, i.e., of viewing "-2" as an expression where unary
minus applies to the number 2, naturally leads to
5**2 + -2**2 being evaluated as 21 (because of the precedence
of unary minus), whereas taking "-2" as a number would lead
to the answer 29 (which is not what Python gives).

That said, Python's choice is consistent with conventional
mathematical notation, which is one reason not to lex -2 as
a number in a language with an infix exponentiation operator.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What is Expresiveness in a Computer Language?

2005-07-10 Thread James Dennett
Pete Barrett wrote:

> On 10 Jul 2005 02:57:04 -0700, "Xah Lee" <[EMAIL PROTECTED]> wrote:
> 
> 
>>Similarly, in computer languages, expressiveness is significant with
>>respect to semantics, not syntactical variation.
>>
> 
> It may just be me, but I tend to think of a computer language as a
> tool for directing computers to perform specific actions. Do we talk
> about the expressiveness of a spade?

Spades, however, are rarely used for communication between
human beings, whereas languages, definitely including
programming languages, are.  While there's a good case to
be made that communication from a person to a computer is
the *primary* use of a programming language, communication
between programmers is a major secondary use.

> There's a similar concept in the 'possible uses' of a tool (a spade is
> an excellent tool for digging the garden, but you wouldn't use it to
> clean your teeth; you *could* use a toothbrush to dig the garden, but
> you wouldn't if a spade was available). Similarly with computer
> languages - some are better for certain tasks than others, but I don't
> think 'expressiveness' is the way to describe that.

You're missing, maybe, the very special nature of one task,
namely that of communication between programmers.  Computers
care not if you use assembly code or Python, but people often
find one or the other easier to read and understand.  (Which
is considered easier depends on the background of the reader.)

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: inheritance

2005-07-10 Thread James Dennett
J wrote:

> just to follow up. I managed to get it working by having a closer look
> at the xxmodules. However, the example does the following
> 
> typedef struct
> {
> PyStructBaseClass  mBase;
> int mValue;
> } PyStructDerivedClass;
> 
> Does anything speak agains doing this ?
> 
> typedef struct : public PyStructBaseClass
> {
> int mValue;
> } PyStructDerivedClass;
> 
> It seems to work for me. The advantage is that access to the members in
> that structure is easier. The .mBase is not needed 

Well, it's C++, not C, and in C++ you wouldn't idiomatically
use typedef there, but rather write

struct pyStructDerivedClass : PyStructBaseClass
{
   int mValue;
};

You'll also be relying on assumptions of how single inheritance
is implemented by your C++ compiler -- but that said, I've used
this assumption in the past and found it to be valid for a wide
range of compiler/ABIs.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How can I import a py script by its absolute path name?

2005-07-15 Thread James Dennett
J.Bijsterbosch wrote:

> Hello Edward,
> 
> "Edvard Majakari" <[EMAIL PROTECTED]> schreef in bericht
> news:[EMAIL PROTECTED]
> 
>>Thorsten Kampe <[EMAIL PROTECTED]> writes:
>>
>>
>>>"sys.path.append('c:\\xxx\\yyy')" or "sys.path.append('c:/xxx/yyy')"
>>
>>Well, of course. As I said, it was untested :) I just copied the path
> 
> string,
> 
>>and didn't remember Windows uses path names which need special
>>treatment.
> 
> 
> Hmm, what you call special treatment comes from pythons deep underlying C
> and C++ language heietidge I presume. A backslash in a C or C++ string means
> the following character is a so called escape character, like \n represents
> a newline and \r a return to the beginning of a line.
> If you really want a backslash you need to type it twice like so \\. Has
> nothing to do with Windows...;-))

Actually, it does have a connection to Windows.

On Unix, backslashes are rarely used for anything *except* escape
characters.  Pathnames tend not to include backslashes, so in most
cases it's not necessary to escape backslashes in path names.  On
Windows, however, backslash is a valid path separator, and must be
escaped.

So, on Unix, for a path separator, you type "/".  On Windows you
can either do the same, or type "\\".  (Or (ab)use raw strings.)

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to write a line in a text file

2005-07-31 Thread James Dennett
Peter Hansen wrote:

> Steven D'Aprano wrote:
> 
>> On Wed, 27 Jul 2005 04:26:31 +, Andrew Dalke wrote:
>>
>>> This isn't 1970.  Why does your app code work directly with
>>> files?  Use a in-process database library (ZODB, SQLLite,
>>> BerkeleyDB, etc.) to maintain your system state and let the
>>> library handle transactions for you.
>>
>>
>> And when users are happy to install a relational database system in order
>> for their editor to save their letter to grandma, I'm sure your scheme
>> will work well for them.
> 
> 
> Given that ZODB and PySQLite are simply Python extension modules, which 
> get bundled by your builder tool and are therefore installed 
> transparently along with your app by your installer, this is a total 
> non-issue at least with those packages.
> 
> After all, it's not 1970 any more. ;-)

Indeed; since 1970 we learned to prefer straightforward
file formats where possible, reserving use of databases
for structured data where the extra costs are justified.

Sometime maybe databases will get better to the point
that we don't need to distinguish so much between them
and filesystems, but we're not there yet.  Managing raw
files, carefully, still has a place.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to determine that if a folder is empty?

2005-08-08 Thread James Dennett
Reinhold Birkenfeld wrote:

> could ildg wrote:
> 
>>I want to check if a folder named "foldername" is empty.
>>I use os.listdir(foldername)==[] to do this,
>>but it will be very slow if the folder has a lot of sub-files.
>>Is there any efficient ways to do this?
> 
> 
> try:
> os.rmdir(path)
> empty = True
> except OSError:
> empty = False
> 
> should be efficient. A directory (please stop calling them "folders")
> can only be removed if it's empty.
> 
> Reinhold

Unfortunately that destroys the directory if it was empty,
and then you can't recreate it unless (a) you went to the
trouble of preserving all of its attributes before deleting
it, and (b) your script has OS-level permissions to recreate
the directory with its original attributes (such as owners
and permissions).

Also note that modifying a filesystem can be significantly
slower than reading from it (it varies widely across platforms).

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to write a line in a text file

2005-08-10 Thread James Dennett
Calvin Spealman wrote:
> On 7/31/05, James Dennett <[EMAIL PROTECTED]> wrote:
> 
>>Peter Hansen wrote:
>>
>>
>>>Steven D'Aprano wrote:
>>>
>>>Given that ZODB and PySQLite are simply Python extension modules, which
>>>get bundled by your builder tool and are therefore installed
>>>transparently along with your app by your installer, this is a total
>>>non-issue at least with those packages.
>>>
>>>After all, it's not 1970 any more. ;-)
>>
>>Indeed; since 1970 we learned to prefer straightforward
>>file formats where possible, reserving use of databases
>>for structured data where the extra costs are justified.
>>
>>Sometime maybe databases will get better to the point
>>that we don't need to distinguish so much between them
>>and filesystems, but we're not there yet.  Managing raw
>>files, carefully, still has a place.
>>
>>-- James
> 
> 
> Filesystems are a horrible way to organize information, and even worse
> at structuring it. The mentality that there are any benefits of
> low-overhead the outweigh the benefits of minimal database layers,
> such as ZODB, BSD DB, and SQLite, is a large part of the reason we are
> still stuck with such a mess. Those "extra costs" are so minimal, that
> you wouldn't even notice them, if you hadn't convinced yourself of
> their presense before performing or researching any realy benchmarks.
> A simple RDBMS can be much easier to work with than any flat file
> format, will likely be far faster in processing the data, and has the
> benefit of years of coding making the code that actually reads and
> writes your data as fast and stable as possible.

Read again, and you'll note that I didn't say the costs were
performance related (though in some, just some, situations the
performance cost of using an RDBMS is a reason to avoid it,
and in other cases an RDBMS will be faster than naive file
access).

The reasons for using other formats are things such as better
interoperability, which extends to better future proofing,
easier installations (in some situations -- note again that
there are no hard and fast rules), and the fact that power
users can edit the files by hand (more easily than they can
tweak information in databases in many, but not all, cases).
It can also (again, in some cases) be easier to port filesystem
based storage between systems than to move database access code.

Oh, and please don't assume that I make my decisions without
measuring performance.  If you store enough millions, or
billions, of items, then performance is worth measuring.  I've
found situations where using various database engines made sense,
I've found situations where flat files made more sense, and I've
seen cases where either was used where the other might have been
a better choice.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Euclid's Algorithm in Python?

2005-08-14 Thread James Dennett
Antoon Pardon wrote:

> On 2005-08-08, Bengt Richter <[EMAIL PROTECTED]> wrote:
> 
>>On 7 Aug 2005 17:31:02 -0700, "Jordan Rastrick" <[EMAIL PROTECTED]> wrote:
>>
>>
>>>Good point. I suppose I'd only ever seen it implemented with the if
>>>test, but you're right, the plain while loop should work fine. Silly
>>>me.
>>>
>>>def gcd(a,b):
>>>while b != 0:
>>>  a, b = b, a%b
>>>   return a
>>>
>>>Even nicer.
>>>
>>
>>what is the convention for handling signed arguments? E.g.,
> 
> 
> As far as I understand the convention is it doesn't make
> sense to talk about a gcd if not all numbers are positive.
> 
> I would be very interested if someone knows what the gcd
> of 3 and -3 should/would be.

Within the integers, common definitions of gcd don't distinguish
positive from negative numbers, so if 3 is a gcd of x and y then
-3 is also a gcd.  That's using a definition of gcd as something
like "g is a gcd of x and y if g|x and g|y and, for any h such
that h|x and h|y, h|g", i.e., a gcd is a common divisor and is
divisible by any other common divisor.  The word "greatest" in
this context is wrt the partial ordering imposed by | ("divides").
(Note that "|" in the above is the mathematical use as a|b if
there is an (integral) c such that ac=b, i.e., b is divisible
by a.  These definitions generalize to rings other than the
integers, without requiring a meaningful < comparison on the
ring.)

All of that said, it would be reasonable when working in the
integers to always report the positive value as the gcd, to
make gcd a simpler function.  For some applications, it won't
matter if you choose positive or negative, so going positive
does no harm, and others might be simplified by assuming gcd>0.

-- James
-- 
http://mail.python.org/mailman/listinfo/python-list