Re: lambda closure question

2005-02-20 Thread Dirk Thierbach
Ted Lilley <[EMAIL PROTECTED]> wrote:
> As a side note, I'm familiar with the term currying from a friend who
> learned ML and Scheme quite some time ago.  Not sure if that's the true
> origin, but it was a sufficiently different context from Python (or at
> least I thought) that I didn't want to rely on its meaning.  I was also
> sufficiently unsure of its _exact_ meaning, 

The exact meaning is reasonably well explained, together with the
origins, on Wikipedia:

http://en.wikipedia.org/wiki/Currying

That meaning shouldn't change if applied to different computer languages.
I can understand the temptation to confuse partial application with
currying (for languages like Python which take tuples of arguments
by default, currying is one way to make partial application possible;
supplying default arguments is another), but nonetheless they are different 
things.
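
To make the difference concrete, here is a tiny sketch (added for
illustration, with made-up names): partial application fixes some of a
function's arguments, while currying turns a function of several
arguments into a chain of one-argument functions.

def add(x, y):
    return x + y

# Partial application via a default argument (one of the ways mentioned above):
def add3(y, x=3):
    return add(x, y)

# Currying: rewrite add so that it takes its arguments one at a time:
def curried_add(x):
    return lambda y: x + y

add3_curried = curried_add(3)

assert add3(4) == 7
assert add3_curried(4) == 7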

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list of unique non-subset sets

2005-03-19 Thread Dirk Thierbach
Raymond Hettinger <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] ]

>>> OK, so I need to be more precise.
>>> Given a list of sets, output the largest list of sets (from this list,
>>> order does not matter) such that:
>>> 1) there is no set that is a PROPER subset of another set in this list
>>> 2) no two sets have exactly the same members (100% overlap)

> [Bengt Richter]
>> two from the above come out length 5:

> With multiple outputs satisfying the constraints, I would suspect
> that there is something wrong with the original spec (not as a
> stand-alone problem, but as component of a real application).

I don't know what the application is trying to do, but if I understand
it correctly, he is asking for something called a "maximal anti-chain".

For any partial order (e.g., subset relation), a chain is a set of
elements which are all comparable (and hence there is a linear or
total order on this set). In a similar way, an anti-chain is a set of
elements that are pairwise incomparable. Anti-chains themselves can be
ordered by subset inclusion, and therefore maximal ("largest")
anti-chains can be found, but in general there is no unique "greatest"
anti-chain.

So the spec is perfectly ok and makes a lot of sense mathematically;
it's just that there is no unique solution. 

Maybe googling for "maximal anti-chain" digs up some more information
(I haven't tried).
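
For illustration (a sketch added here, not from the thread): one simple
maximal anti-chain is the set of maximal elements of the subset order;
drop exact duplicates, then drop every set that is a proper subset of
another. Note that this gives *a* maximal anti-chain, not necessarily
the largest one.

def maximal_antichain(sets):
    # condition 2: remove exact duplicates
    unique = []
    for s in sets:
        if not any(s == t for t in unique):
            unique.append(s)
    # condition 1: keep only sets that are not proper subsets of another set
    return [s for s in unique if not any(s < t for t in unique)]

example = [set('ab'), set('abc'), set('cd'), set('ab')]
print(maximal_antichain(example))   # keeps {'a','b','c'} and {'c','d'}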

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: protocols, inheritance and polymorphism

2004-11-29 Thread Dirk Thierbach
Dirk Thierbach <[EMAIL PROTECTED]> wrote:
> Donn Cave <[EMAIL PROTECTED]> wrote:
>> Quoth Jacek Generowicz <[EMAIL PROTECTED]>:

>>> Specifically, dynamic polymorphism is impossible without dynamic
>>> typing.

> I haven't heard the term "dynamic typing" in this context, 

Sorry, that should have been "dynamic polymorphism".

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzling OO design problem

2005-04-09 Thread Dirk Thierbach
George Sakkis <[EMAIL PROTECTED]> wrote:

> 1. There is a (single inheritance) hierarchy of domain classes, say
> A<-B<-..<-Z (arrows point to the parent in the inheritance tree).
> 2. This hierarchy evolved over time to different versions for each
> class. So for example, version's 1 hierarchy would be A_v1 <-B_v1
> <-..<-Z_v1.
> 3. At runtime, only one version is selected by a factory function.

> Up to this point, the inheritance graph would be the following:
> 
> A <- A_V1 ... <- A_Vn
> ^    ^           ^
> |    |           |
> B <- B_V1 ... <- B_Vn
> .    .           .
> .    .           .
> .    .           .
> ^    ^           ^
> |    |           |
> Z <- Z_V1 ... <- Z_Vn

Interesting problem.

> This could be implemented either with multiple inheritance (e.g.
> B_V1(B,A_V1)) 

To help myself thinking about that, let's make a somewhat complicated
example, somewhere in the middle of the graph, with the possibility of
introducing a hole at B3. I also shorten A_Vn to An etc. Consider the
subgraph (with nonstandard annotations of method definition after the bar
(to save space) as explained below):

A2 | f   <-   A3 | f
^             ^
|             |
B2       <-   B3
^             ^
|             |
C2 | g   <-   C3 | h

Assume a method g that is present in C2 but not changed in C3. Now g
calls a method f, which C2 inherits unchanged from A2 (and which is not
changed in B2, B3 or C3, either), but which is changed in A3. Finally,
let h call g in C3. Since the "inheritance columns" have "priority"
here, one would then expect g to call the f from A3, and not the one
from A2.

So what you need is that every method, even if it does not originate
from the "real" class, is looked up first in the column above the
"real" class, then in the column to the left of that, and so on.

Ok. Multiple inheritance can often select a priority for conflicting
methods. If you can specify that you want "column priority" for
each class, you're fine.

> or using the bridge design pattern |Z| times, one per each row.

When I read your description above, I also thought immediately
"bridge pattern", but then I tried to write down details,
and got stuck. How would you do it?

> Now the problem is that there are 'holes' in this
> inheritance lattice: Not all versions introduced new variations of
> all types; [...]

> My first thought was to create all the missing classes dynamically,
> but it's somewhat obscure and it may not be that simple. Is there a
> more elegant solution, either a general design pattern or some
> clever python metaprogramming hack ?

In the most general case, you need to access the whole "upper left"
subgraph at class creation time, collect all methods defined in this
subgraph with "column priority", and overwrite or add to those any
methods defined in the newly defined class.

I don't know enough about the intricacies of Python's class creation
to make a concrete suggestion, but I'd think that would be possible
with the help of __metaclass__. You would need some sort of
repository for the complete inheritance. One way to do that would
be to create the chain A ... Z first, letting A inherit from some
special class with __metaclass__ set, and then store an array
of all versions somewhere inside the class namespace.

You'd also need some special syntax to create a new version of a class
(say, again inheriting from some special class with __metaclass__
set).  You could set the version inside the class definition, and then
let the __metaclass__ routine disable the normal inheritance
mechanism, and add missing methods as appropriate.

This could for example look like

class A(Versioning):
  ...

class B(A):
  ...

class C(B):
  def h(self): ...
  ...

class A2(NewVersion,A):
  __version__ = 2
  def f(self): ...

class B2(NewVersion,B):
  __version__ = 2

class C2(NewVersion,C):
  __version__ = 2
  def g(self): ... self.f() ...

class A3(NewVersion,A):
  __version__ = 3
  def f(self): ...

class C3(NewVersion,C):
  __version__ = 3
  def h(self): ... self.g() ...

with a hole at B3, as in the example. C3 will get g from C2 and f from
A3, and hence the call chain will work correctly. Also, C3 will have no
base classes (or maybe only the __metaclass__ ones); the inherited
classes A, B, C are just used by the class creation process to find
out where to look for the inheritance matrix.

Others who know more about the class creation mechanism will no doubt
improve this suggestion, point out my errors, and tell you how to
implement it :-) Note that you're basically completely changing the
normal inheritance mechanism, and the class objects will be larger,
because they'll have to copy all the necessary methods.
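
Purely to make the above concrete, here is one rough sketch of such a
mechanism (using the newer metaclass keyword syntax rather than
__metaclass__; the names VersionedMeta and _versions are invented for
illustration, and the details are certainly debatable). Every versioned
class registers its locally defined methods in a grid keyed by
(unversioned class, version); a newly defined versioned class is then
assembled by scanning that grid with "column priority" and copying
whatever it needs:

_versions = {}   # (unversioned class name, version) -> locally defined attributes

def _local(ns):
    # ignore __module__, __version__ and other double-underscore entries
    return dict((k, v) for k, v in ns.items() if not k.startswith('__'))

class VersionedMeta(type):
    def __new__(mcls, name, bases, ns):
        version = ns.get('__version__')
        if version is None:
            # a plain class in the A <- B <- ... chain: create it normally
            return super(VersionedMeta, mcls).__new__(mcls, name, bases, ns)
        real = bases[-1]                    # e.g. C3(NewVersion, C) -> C
        chain = [c for c in real.__mro__ if c not in (object, Versioning)]
        collected = _local(ns)
        # column priority: this version first, then the older versions,
        # walking "up" the chain within each column
        for v in range(version, 0, -1):
            for cls in chain:
                for attr, val in _versions.get((cls.__name__, v), {}).items():
                    collected.setdefault(attr, val)
        for cls in chain:                   # finally the unversioned chain itself
            for attr, val in _local(vars(cls)).items():
                collected.setdefault(attr, val)
        _versions[(real.__name__, version)] = _local(ns)
        # the versioned class copies everything and keeps no versioned bases
        return super(VersionedMeta, mcls).__new__(mcls, name, (object,), collected)

class Versioning(object, metaclass=VersionedMeta):
    pass

class NewVersion(Versioning):
    pass

class A(Versioning): pass
class B(A): pass
class C(B):
    def h(self): return "h from C"

class A2(NewVersion, A):
    __version__ = 2
    def f(self): return "f from A2"

class B2(NewVersion, B):
    __version__ = 2

class C2(NewVersion, C):
    __version__ = 2
    def g(self): return self.f()

class A3(NewVersion, A):
    __version__ = 3
    def f(self): return "f from A3"

class C3(NewVersion, C):        # note: no B3 is ever defined
    __version__ = 3
    def h(self): return self.g()

assert C3().h() == "f from A3"  # g is copied from C2, f from A3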

I cannot think of any pattern that would give similar flexibility.

- Dirk

  
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzling OO design problem

2005-04-09 Thread Dirk Thierbach
Dirk Thierbach <[EMAIL PROTECTED]> wrote:

> Ok. Multiple inheritance can often select a priority for conflicting
> methods. If you can specify that you want "column priority" for
> each class, you're fine.

On second thought, I had doubts. Consider the following scenario:

   A2 | f   <-   A3
   ^             ^
   |             |
   B2 | f   <-   B3
   ^             ^
   |             |
   C2       <-   C3 | g

Assume g calls f. Since f is defined in B version 2, it should be taken
over unchanged into B version 3, and that should be the version g
calls. However, with multiple inheritance doing depth-first search,
giving the "upper" class priority over the "left", the first path
searched is B3 - A3 - A2, and hence it finds f at A2. A quick test
confirms this for old-style classes, which failed in exactly that
way. But with new-style classes, it works, and finds f at B2 instead.
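
For reference, the quick test could look roughly like this (a sketch;
the class names follow the diagram, and the method bodies only report
where they are defined):

class A2(object):
    def f(self): return "f from A2"

class A3(A2): pass

class B2(A2):
    def f(self): return "f from B2"

class B3(A3, B2): pass

class C2(B2): pass

class C3(B3, C2):
    def g(self): return self.f()

# The new-style linearization puts B2 before A2; with old-style classes
# (i.e. without the object base) f would be found at A2 instead.
assert C3.__mro__ == (C3, B3, A3, C2, B2, A2, object)
assert C3().g() == "f from B2"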

So I dug through the documentation and found that new-style classes
compute a monotonic linearization of the inheritance graph, observing
local precedence order, using the algorithm also used in Dylan
described here:

http://www.webcom.com/haahr/dylan/linearization-oopsla96.html

And this algorithm will even work correctly if one leaves out B3
in the example above, inheriting directly as in C3(A3,C2), because
the ordering constraint that B2 comes before A2 will still be
valid in the linearization for C3. 

However, in the following case (shortening the notation even more)
it will fail if a method defined at C3 is looked up at D4:

   A1 - A2 - A3 - A4 - ...
   |    |    |    |
   B1 - B2 - +  - B4 - ...
   |    |    |    |
   C1 - +  - C3 - +  - ...
   |    |    |    |
   D1 - D2 - +  - D4 - ...
   |    |    |    |

The solution is simply to include C3 in the list of parents of D4, as
in D4(C3,B4,D2). So for every hole in a column, you have to include
the first class (or classes, if the hole spans multiple rows) to the
left of the hole as parents of the class just below the hole, in order
from bottom to top. This adds the missing constraints, and should
solve the problem without any need to write __metaclass__ stuff.

Interesting question. I learned a lot while thinking about that.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Puzzling OO design problem

2005-04-10 Thread Dirk Thierbach
George Sakkis <[EMAIL PROTECTED]> wrote:
>>A1 - A2 - A3 - A4 - ...
>>|    |    |    |
>>B1 - B2 - +  - B4 - ...
>>|    |    |    |
>>C1 - +  - C3 - +  - ...
>>|    |    |    |
>>D1 - D2 - +  - D4 - ...
>>|    |    |    |

>> The solution is simply to include C3 in the list of parents of D4,
>> as in D4(C3,B4,D2). So for every hole in a column, you have to
>> include the first class (or classes, if the hole spans multiple
>> rows) to the left of the hole as parents of the class just below
>> the hole, in order from bottom to top. 

> Nice. I had taken for granted that you need to fill in the holes
> (D3,B3,C2), either manually or automa[tg]ically, but if you allow a
> class to inherit from more than two bases, you can pick a set of
> parents that does the job, without any boilerplate code or
> __metaclass__ magic. The downside of this approach is that it's even
> harder to see the big picture, as in the schematic notation above;
> remember that each column is a different version that resides in a
> separate module, so it's not obvious which classes should be the
> parents of each variation. 

It's obvious if you know which versions are available, and which
aren't. If you don't have this information, and you're only looking at
each module locally, then it's probably even safer (i.e., less likely
to confuse a casual reader of the source) to just generate all the
dummy classes manually. But that sort of makes your original problem 
irrelevant :-)

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Termination and type systems

2006-06-26 Thread Dirk Thierbach
David Hopwood <[EMAIL PROTECTED]> wrote:
> Marshall wrote:
>> David Hopwood wrote:

>>> A type system that required an annotation on all subprograms that
>>> do not provably terminate, OTOH, would not impact expressiveness
>>> at all, and would be very useful.

>> Interesting. I have always imagined doing this by allowing an
>> annotation on all subprograms that *do* provably terminate. 

Maybe the paper "Linear types and non-size-increasing polynomial time
computation" by Martin Hofmann is interesting in this respect.
From the abstract:

  We propose a linear type system with recursion operators for inductive
  datatypes which ensures that all definable functions are polynomial
  time computable. The system improves upon previous such systems in
  that recursive definitions can be arbitrarily nested; in particular,
  no predicativity or modality restrictions are made.

It ensures not only termination, but termination in polynomial time,
so you can use those types to carry information about this as well.

> If the annotation marks not-provably-terminating subprograms, then it
> calls attention to those subprograms. This is what we want, since it is
> less safe/correct to use a nonterminating subprogram than a terminating
> one (in some contexts).

That would certainly be nice, but it may be almost impossible to do in
practice. It's already hard enough to guarantee termination with the
extra information present in the type annotation. If this information
is not present, then the language probably has to be restricted so
severely to ensure termination that it is more or less useless.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Termination and type systems

2006-06-27 Thread Dirk Thierbach
David Hopwood <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:

> That's interesting, but linear typing imposes some quite severe
> restrictions on programming style. From the example of 'h' on page 2,
> it's clear that the reason for the linearity restriction is just to
> ensure polynomial-time termination, and is not needed if we only want
> to prove termination.

Yes. It's just an example of what can actually be done with a type system.

>> It's already hard enough to guarantee termination with the extra
>> information present in the type annotation. If this information is
>> not present, then the language probably has to be restricted so
>> severely to ensure termination that it is more or less useless.

> I think you're overestimating the difficulty of the problem. The fact
> that *in general* it can be arbitrarily hard to prove termination, can
> obscure the fact that for large parts of a typical program, proving
> termination is usually trivial.

Yes. The problem is the small parts where it's not trivial (and the
size of these parts depends on the actual problem the program is trying
to solve). Either you make the language so restrictive that these
parts are hard (see your remark above) or impossible to write, or
you'll be left with a language where the compiler cannot prove
termination of those small parts, so it's not a "type system" in the
usual sense.

Of course one could write an optional tool that automatically proves
termination of the easy parts, and leaves the rest to the programmer,
but again, that's a different thing.
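
A toy illustration of that split (added here, not from the thread): the
first function below is the kind of code whose termination is trivial
to establish, while the second is the kind of "small part" that no
simple automatic check will handle.

def length(xs):
    # terminates for every finite input: the loop is bounded by the input
    total = 0
    for _ in xs:
        total += 1
    return total

def collatz_steps(n):
    # termination for every positive n is a famous open problem
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps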

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-22 Thread Dirk Thierbach
[Had to drop alt.comp.lang.haskell, otherwise my newsserver doesn't accept it]

Dinko Tenev <[EMAIL PROTECTED]> wrote:
> OK, here's a case that will make your program run in exponential time:
> S = { a, b }, W = { *a*b, *b*a } -- on my machine, it starts getting
> ugly as soon as n is 15 or so.  Note that S^n - W = { a^n, b^n }.

> In general, whenever all the patterns in the set match against the last
> position, your current implementation is guaranteed to have to sift
> through all of S^n.  I'd say the very idea of checking against a
> blacklist is fundamentally flawed, as far as performance is concerned.

If more time during preprocessing is allowed, another idea is to
treat the wildcard expressions as regular expressions, convert
each into a finite state machine, construct the "intersection" of
all these state machines, minimize it and then swap final and non-final
states. Then you can use the resulting automaton to efficiently 
enumerate S^n - W. In the above case, the resulting FSM would have just 
three states.

And it doesn't really matter what language you use to implement this
algorithm; it's the idea that counts. Notation aside, all
implementations will be quite similar.
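
As a sketch of what that looks like for the example above (the
three-state automaton is written down by hand here rather than actually
constructed from the patterns): it accepts exactly the words over
{a, b} that match neither *a*b nor *b*a, and enumerating S^n - W is a
simple walk over it.

# transitions of the complement automaton; all three states are accepting
dfa = {'S': {'a': 'A', 'b': 'B'},
       'A': {'a': 'A'},
       'B': {'b': 'B'}}

def words(state, n, prefix=''):
    # all words of length n in S^n - W, by walking the automaton
    if n == 0:
        return [prefix]
    result = []
    for letter, target in sorted(dfa[state].items()):
        result.extend(words(target, n - 1, prefix + letter))
    return result

print(words('S', 15))   # ['aaaaaaaaaaaaaaa', 'bbbbbbbbbbbbbbb'], instantly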

- Dirk



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-22 Thread Dirk Thierbach
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> What are some good references for finite state machines? Minimization?

A classic is "Introduction to Automata Theory, Languages, and Computation"
by Hopcroft and Ullman. But any other book about finite state machines
should cover these topics, too. Chances are good that you can just
google for a detailed explanation.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-23 Thread Dirk Thierbach
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:
>> If more time during preprocessing is allowed, another idea is to
>> treat the wildcard expressions as regular expressions, convert
>> each into a finite state machine, construct the "intersection" of
>> all these state machines, minimize it and then swap final and non-final
>> states.

> Given the requirements, did you mean taking the *union* and swapping
> states?  Or maybe swapping states first, and then taking the
> intersection?

Whatever the requirements were. Take your pick. :-)

>> Then you can use the resulting automaton to efficiently
>> enumerate S^n - W. In the above case, the resulting FSM would have just
>> three states.

> I don't see immediately how exactly this is going to work.  Unless I'm
> very much mistaken, a FSA in the classical sense will accept or reject
> only after the whole sequence has been consumed, and this spells
> exponential times.  

Exponential with respect to what? You just recursively walk the
automaton for every word of length n, so at most you'll have to check
every word in S^n once. Hence, the complexity is not worse than the
"generate and test" approach (it's in fact better, since testing is
trivial).

However, if the result set is simple (as in the example), then the
result FSM will have states that won't have transitions for every
letter, so I guess the average case will be a lot better.

> For improved asymptotic complexity in this case,
> you need to be able to at least reject in mid-sequence, 

One can do that if there is no valid transition out of some state.

I guess one could even optimize on that: In the minimal automaton,
every state has some transition sequence that will end up in a final
state. Annotate each state with the minimum number of steps for
such a sequence. While walking the automaton, keep the maximum
number of letters still to be produced in a variable. If this number
is smaller than the number in the annotation, stop and backtrack.

This should guarantee that only those states are visited that really
produce some output. 
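
A rough sketch of that optimization (the helper names are invented; the
automaton is given as a dict of transitions plus a set of final
states): compute the annotation by a breadth-first search backwards
from the final states, and backtrack whenever the remaining length is
smaller than the annotation.

from collections import deque

def min_steps_to_final(dfa, finals):
    # the annotation: shortest distance from each state to some final state
    backwards = {}
    for state, trans in dfa.items():
        for target in trans.values():
            backwards.setdefault(target, set()).add(state)
    dist = dict((s, 0) for s in finals)
    queue = deque(finals)
    while queue:
        s = queue.popleft()
        for p in backwards.get(s, ()):
            if p not in dist:
                dist[p] = dist[s] + 1
                queue.append(p)
    return dist

def accepted_words(dfa, finals, start, n):
    # all accepted words of length exactly n, backtracking early
    dist = min_steps_to_final(dfa, finals)
    result = []
    def walk(state, remaining, prefix):
        if state not in dist or remaining < dist[state]:
            return                      # no final state reachable in time
        if remaining == 0:
            if state in finals:
                result.append(prefix)
            return
        for letter, target in sorted(dfa[state].items()):
            walk(target, remaining - 1, prefix + letter)
    walk(start, n, '')
    return result

# the three-state automaton from the example, all states final:
dfa = {'S': {'a': 'A', 'b': 'B'}, 'A': {'a': 'A'}, 'B': {'b': 'B'}}
print(accepted_words(dfa, set('SAB'), 'S', 5))   # ['aaaaa', 'bbbbb']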

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-23 Thread Dirk Thierbach
Dinko Tenev <[EMAIL PROTECTED]> wrote:
>> > I don't see immediately how exactly this is going to work.  Unless I'm
>> > very much mistaken, a FSA in the classical sense will accept or reject
>> > only after the whole sequence has been consumed, and this spells
>> > exponential times.

>> Exponential in respect to what?

> With respect to sequence length.  

But you cannot get rid of this. Consider S = {a, b}, W = {a}.
Then there are |S|^n result elements for n > 1, and you have to enumerate 
all of them.

The best thing one can hope for is to just actually process those
elements that are really in the result set. With the FSM, you're
getting at least close to that.

>> However, if the result set is simple (as in the example), then the
>> result FSM will have states that won't have transitions for every
>> letters, so I guess the average case will be a lot better.

> I don't believe this case should ever arise out of the current
> definition of the problem: labels are specified explicitly only for the
> excluded subset, and you have to accept everything else by default.

If the result set is {a^n, b^n | n \in N}, then you have an automaton
where exactly this happens. Since minimal automata are unique
up to renaming of states, that's the result you'll get.

>> > For improved asymptotic complexity in this case,
>> > you need to be able to at least reject in mid-sequence,

>> One can do that if there is no valid transition out of some state.

> One possibly can, but such states should never be encountered (see
> above.)

The automaton is:

S: a -> A, b -> B
A: a -> A
B: b -> B

All the states are final. A and B have just one transition, so
you'll be able to generate either a^n or b^n efficiently.

> Looking at the minimal DFA for the above example, it may be more
> appropriate to detect being in a state from which there's no path to a
> final state:

This cannot happen, because minimizing removes all those states.
(Or, in case you have a different definition of automaton in mind:
There'll be just one "stuck" state, where all transitions that never
go to a final state end up. Remove this state, and you'll end up
with the other definition).

>> I guess one could even optimize on that: In the minimal automaton,
>> every state has some transition sequence that will end up in a final
>> state. Annotate each state with the minimum number of steps for
>> such a sequence. While walking the automaton, keep the maximum
>> number of letters still to be produced in a variable. If this number
>> is smaller than the number in the annotation, stop and backtrack.

> I don't think this could cut efficiently for patterns like *a*b.

The point is not to "cut efficiently", the point is to enumerate
only those words that are actually in the result set. If you are
enumerating all words shorter than a given length, the above method
should guarantee this.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-24 Thread Dirk Thierbach
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:

[One cannot escape exponential behaviour]

>> But you cannot get rid of this. Consider S = {a, b}, W = {a}.
>> Then there are |S|^n result elements for n > 1, and you have to enumerate
>> all of them.

> Yes, but then, they are in the target set.  

Which is the point. If they are in the target set, you have to enumerate
them. If the target set is of exponential size with respect to n,
then you'll need exponential time to do that.

> The point here is whether
> you can generate S^n - W in Theta( n * |S^n - W| ), which may be
> dramatically different from Theta( n * |S^n| ).

Exactly. Hence, you use a construction that guarantees that the time
needed is proportional to n*|S^n - W|: Every step you do will be
necessary to produce at least one word in the output set. Now, if 
|S^n - W| is still exponential, then you'll still need exponential
time. But nevertheless, that's the best you can hope for.

>> The automaton is:
>>
>> S: a -> A, b -> B
>> A: a -> A
>> B: b -> B

> The target set is specified as S^n - W, where W is everything matching
> (.*a.*b|.*b.*a).  Following the construction procedure in point, this
> exclusion set is matched exactly by my DFA with S initial and F final.
> Then, swapping final and non-final states makes {S, A, B} final, and F
> non-final.  Your DFA above may be equivalent, but to me it is far from
> clear exactly what algorithm would build it from the given data.

Well, it's just the result from the minimization algorithm, where my
variant of the algorithm just prunes away the "stuck" state which can
never produce any output.

>> The point is not to "cut efficiently", the point is to enumerate
>> only those words that are actually in the result set.

> No, this is not the point.  Naive filtering already does that. 

No, it doesn't. Naive filtering will always look at the complete
input set, so, no matter what size |S^n - W| actually is, it will
always take time proportional to |S^n|.

>  By "cutting efficiently" I mean skipping over search sub-trees that
> don't contain any results from the target set.

Yes. Consider a different example: With the wildcard expressions
W = { b*, aa*, ab* }, you'll get S^* - W = { a }. The resulting
minimum FSM will just accept 'a' (start state, one final state, and
the "stuck" state if you insist on it), so you skip over every
other subtree when enumerating results from that automaton.

And for the previous example, you'll need something like 2*n time
to enumerate the output set instead of 2^n, because once you're in
the "a-branch", you're producing only a's, and you're pruning
away all the subtrees that start with a "b". Similarly in the 
"b-branch".

Now clearer?

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-24 Thread Dirk Thierbach
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Call a wc 'free' if it satisfies the property that every letter 'a' in
> it appears only in the form '*a*', and 'anchored' otherwise. 

Would '*ab*' be free or anchored?

> What if all wc's are free? How does this affect the DFA?

I don't know. The important point here is the interaction of all
the wc's. I don't think properties like this reduce the complexity
of that interaction in an obvious way.

> Does it minimize nontrivially?

I am not sure what you mean by this. Every DFA minimizes to some
other DFA, which is unique up to renaming of states. Now the
question is whether that minimization reduces the complexity enough
to be noticeable (maybe that's what you mean by "nontrivially").
In general, I don't think one can say much about that without
looking at concrete examples.

- Dirk
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Programming challenge: wildcard exclusion in cartesian products

2006-03-25 Thread Dirk Thierbach
Dinko Tenev <[EMAIL PROTECTED]> wrote:
> Dirk Thierbach wrote:

> [A lot of stuff]

>> Now clearer?

> Let's leave it there, and take a break.

Maybe it would help to just take a concrete example, and work through
it. Then you'll see exactly what happens.

- Dirk

-- 
http://mail.python.org/mailman/listinfo/python-list