
Re: The Gump3 branch

2005-01-10 Thread David Crossley

Leo Simons wrote:
 
 Pfew. We really should start writing some unit tests. If I had the time I
 would start from scratch one more time using a test-first approach, but I
 haven't figured out how to comfortably do test-first python development yet.

That paragraph sounds extremely important to this novice.

--David

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: The Gump3 branch

2005-01-09 Thread Leo Simons

On 08-01-2005 15:21, Adam R. B. Jack [EMAIL PROTECTED] wrote:
 Phew, have I been busy :-D.
 
 You certainly have.

Ooh, long e-mail! I'm gonna try and split this up... :-D

Inter-component-communication
-----------------------------
 I'm sure your IOC/container experiences have required you to answer this
 before, but how do you allow components to communicate/collaborate?

I firmly believe there is very little need for different components to
communicate. If you architect things the IOC way, components will use just
one or two other components, and their parent can just set up the references
between all those components.

What will happen is that a component needs a certain kind of result
available. For example, something that pushes information in the dynagump
database needs that information, which might be put there by an ant builder
or something like that. This kind of stuff is trivial in python; you just
set the property on the relevant part of the model and then retrieve it
later.

Note that such communication is pretty indirect. For example the start of
the CvsUpdater plugin I did just pushes information into the model (the log
of the cvs command, exit status, etc) without worrying who uses that
information (at the moment, it is just ignored).
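To illustrate the indirect style described above, here is a minimal sketch in modern Python 3 (the thread itself targets Python 2.3). Every class and attribute name below is hypothetical and is not taken from the Gump3 code; the point is only that the producer and the consumer never reference each other, just the model.

```python
# Hypothetical sketch: none of these names come from the Gump3 sources.

class Module:
    """A passive piece of the model; plugins hang their results on it."""
    def __init__(self, name):
        self.name = name


class FakeUpdater:
    """Stands in for an updater plugin: it records results on the model."""
    def visit(self, module):
        module.update_log = "updated %s" % module.name
        module.update_exit_status = 0


class FakeReporter:
    """Stands in for a later plugin that reads whatever was left behind."""
    def visit(self, module):
        # getattr with a default: the reporter works even if no updater ran
        status = getattr(module, "update_exit_status", None)
        return (module.name, status)


module = Module("example-module")
FakeUpdater().visit(module)
report = FakeReporter().visit(module)
```

The reporter would still work on a module that no updater has touched; it would simply see a status of None.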

 There
 were times when building logic wanted to know something historically (had
 this built before, etc.) in order to determine how much effort (or what
 switches) to use. Is inter-component communications like this a real no-no,
 or is this something that might be coincidentally allowed via steps in
 pre-processing, etc.

We don't need steps. Think unix command line utilities. You can make them
communicate:

  find . -type f | grep -v .svn

Without steps. That | there in gump is achieved by setting a property on a
piece of the model.

Threading
---------
 Do you think we have a chance to re-instate threading in this model? [It is
 a minor nit, not a show stopper, but I liked the large run-time reduction of
 concurrent checkouts.]

Yes. We can probably reuse the worker code from gump2. I left it out on
purpose because it was clouding the gump2 code (several of the gump2 bits
all worry about multi-threading) and making it difficult to read.

What you can do for example is multithread each of the three stages, then
join the threads in between. And each plugin might do multithreading on its
own. 
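A rough sketch of "multithread each stage, then join in between", written in modern Python 3 with the standard concurrent.futures module rather than the gump2 worker code. The three stage functions are placeholders invented for the example, and a real build stage would also have to respect dependencies between projects.

```python
from concurrent.futures import ThreadPoolExecutor


def run_stage(stage, items, max_workers=4):
    """Run one stage over all items concurrently and wait for all of them."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() collects every result, so the stage is fully finished
        # (joined) before the next one starts; map() keeps input order.
        return list(pool.map(stage, items))


def run_stages(stages, items):
    """Feed the output of each stage into the next one."""
    for stage in stages:
        items = run_stage(stage, items)
    return items


# Placeholder stages, invented for this sketch.
def first_stage(name):
    return name + ":first"


def second_stage(name):
    return name + ":second"


def third_stage(name):
    return name + ":third"


results = run_stages([first_stage, second_stage, third_stage], ["a", "b"])
```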

What I want to see first is where we need it. Instrument the different bits
of the build and find out where we need the speedups. Keep most of the code
simple! :-D
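One possible way to do that kind of instrumentation is a small timing helper, sketched here in modern Python 3. The labels and the helper name are made up for the example.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(float)  # label -> accumulated seconds


@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the block to timings[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] += time.perf_counter() - start


with timed("checkout"):
    time.sleep(0.01)  # placeholder for real work
with timed("build"):
    pass  # placeholder for real work

# The label that accumulated the most time is the first candidate for a speedup.
slowest = max(timings, key=timings.get)
```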

CLI
---
 I've gotten the Gump3 branch into a state where everything works (for me),
 as far as stuff is implemented. The main core thing that is missing is
 cyclic dependency detection. I've got the right algorithm written down on
 paper, just need to make it happen. The hooks for it are there already
 though (the gump.engine.modeller.Verifier class).
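The algorithm mentioned above is not spelled out in the thread, so the following is only a generic sketch of cyclic dependency detection (a depth-first search that tracks the current path), in modern Python 3. The function name and the dictionary-based input format are assumptions for the example and are not the Verifier API.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None.

    dependencies maps each project name to the names it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {}
    path = []

    def visit(node):
        colour[node] = GREY
        path.append(node)
        for dep in dependencies.get(node, ()):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: slice out the cycle
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        colour[node] = BLACK
        return None

    for node in dependencies:
        if colour.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

For example, a graph where a needs b, b needs c and c needs a yields the cycle a, b, c, a.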
 
 Mind pumping a few command lines up to a wiki or somewhere? I'd like to run
 the engine, and unit tests, and such. Gump2 was a pain to run (we never
 cured its confusion) and I'd like to start comfortably with Gump3 from the
 get-go.

Uhm, yeah, I do :-D. The interface should be so easy to use you don't need
the docs. Try ./gump help for starters. There's work TODO here, but I
really prefer to update the code rather than the wiki!

 One thought in that regard is partial runs. I think Gump2 was believed
 (although not actually true) to be less incremental-build friendly since
 it wouldn't allow one to do build X, update X. [It was there in Gump3,
 just the command line was so crude folks never got to use it.]. I feel we
 need Gump3 to be easy to run in pieces, and in parts.

I disagree, actually! The reason we needed to do stuff like that was because
gump is so complex and difficult to use that one resorts to a model of
"let's try this and see if it works". We need to fix gump so that you don't
need to do that. I.e., make it easy to write correct metadata.

I would like to make the hacky bits like this not part of the core. If you
need an adjusted profile with just a few projects, then change the profile!

 Easily asking for
 things that include/exclude components on the fly. Nicola's (and Sam's)
 wxPython GUI was a nice user this way. Any thoughts on re-instating that?

I'm not against GUIs, but I feel CLI is way more important to get right
first.

Plugins
--- 
  I think that generating plug-ins (perhaps even for loading, and such) is
 key. I'm not sure (yet) if the new model is any better than the old in
 allowing the core steps (loading, modelling) to be plugged in, but I think
 it needs to be investigated.

Yes, it's easy. Change the get_verifier() in config.py to provide a different
implementation, and that's it!
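A hedged sketch of what "change get_verifier() to provide a different implementation" could look like, in modern Python 3. Only the name get_verifier comes from the message above; the verifier classes, the verify() method and the dictionary-shaped model are invented for the example.

```python
class DefaultVerifier:
    """Hypothetical verifier that accepts any model."""
    def verify(self, model):
        return []  # no problems reported


class StrictVerifier:
    """Hypothetical alternative that complains about an empty workspace."""
    def verify(self, model):
        problems = []
        if not model.get("projects"):
            problems.append("workspace defines no projects")
        return problems


def get_verifier():
    """The single place that decides which verifier the engine receives."""
    return StrictVerifier()


def run_engine(model, verifier=None):
    # The engine relies only on the verify() method, not on a concrete class.
    verifier = verifier or get_verifier()
    return verifier.verify(model)
```

Swapping the return value of get_verifier() changes behaviour everywhere without touching the engine.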

 I see you have a Maven parser, but could/should
 that be a plug-in?

I doubt we should be talking about this kind of stuff as a plugin. There's
very specific bits of functionality that *need* to be performed (right
contracts) for gump to

Re: The Gump3 branch

2005-01-09 Thread Leo Simons

On 08-01-2005 20:58, Stefano Mazzocchi [EMAIL PROTECTED] wrote:
<big snip of lots of stuff/>
 I see you have a Maven parser, but could/should
 that be a plug-in?
 
 This is *EXACTLY* the kind of question we should *NOT* be answering. It
 does *NOT* matter if it's a plugin or not, as long as it does the job.
 
 Early refactoring is the root of all evil, even worse than early
 optimization.

Well, I think I disagree that's the case here. What I did was a pretty
late refactoring of gump2. What Adam is basically asking is "is that
refactoring now done?" and the answer is "probably not completely". It makes
sense to figure out at this point if there are some big architectural flaws
to catch now and change.

Please do be critical and ask those questions! The answer could just be
"no", but that just makes us all more confident that we're on the right
path... I really don't mind taking a week or two to be sure of that.

Thinking more about that Q re the to-be-built maven parser...basically you
could have 0..n Normalizers that chain up to all change small bits of the
xml model. Sounds like a pipeline of cocoon transformers :-D. Fortunately
that change is kind-of isolated since you could just refactor the current
Normalizer internally to consist of multiple smaller components.
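The idea of 0..n small normalizers chained together can be sketched as below, in modern Python 3 with the standard xml.etree.ElementTree module. The two normalizing steps, the "type" attribute and its default value are invented purely for illustration; they are not Gump's actual normalization rules.

```python
import xml.etree.ElementTree as ET


def strip_name_whitespace(root):
    """Hypothetical step: trim whitespace around every name attribute."""
    for element in root.iter():
        name = element.get("name")
        if name is not None:
            element.set("name", name.strip())
    return root


def default_module_type(root):
    """Hypothetical step: give modules without a type an illustrative default."""
    for module in root.iter("module"):
        if module.get("type") is None:
            module.set("type", "cvs")
    return root


def normalize(root, steps):
    """Apply each small normalizer in turn, like stages in a pipeline."""
    for step in steps:
        root = step(root)
    return root


doc = ET.fromstring('<workspace><module name=" demo "/></workspace>')
doc = normalize(doc, [strip_name_whitespace, default_module_type])
normalized_module = doc.find("module")
```

Each step sees the output of the previous one, so steps can be added, removed or reordered without touching the others.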

An alternative might be replacing the normalizer with a wrapper around an
XSLT script to handle the transformation. Or ... Or ... :-D

The same could probably be said of all the other xml handling bits. For
example the Objectifier could probably be split into one small class for
each different kind of tag that needs to be turned into a python object.
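As a sketch of "one small class per kind of tag" (modern Python 3, hypothetical names throughout, and not the real Objectifier), a dispatch table keyed on the tag name might look like this:

```python
import xml.etree.ElementTree as ET


class Module:
    def __init__(self, name):
        self.name = name


class Project:
    def __init__(self, name):
        self.name = name


class ModuleObjectifier:
    tag = "module"

    def objectify(self, element):
        return Module(element.get("name"))


class ProjectObjectifier:
    tag = "project"

    def objectify(self, element):
        return Project(element.get("name"))


class Objectifier:
    """Delegates each element to the small handler registered for its tag."""
    def __init__(self, handlers):
        self.handlers = {handler.tag: handler for handler in handlers}

    def objectify(self, root):
        objects = []
        for element in root.iter():  # document order
            handler = self.handlers.get(element.tag)
            if handler is not None:
                objects.append(handler.objectify(element))
        return objects


root = ET.fromstring(
    '<workspace><module name="m"><project name="p"/></module></workspace>'
)
objects = Objectifier([ModuleObjectifier(), ProjectObjectifier()]).objectify(root)
```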

I suggest we don't worry about that for now (no need to build another cocoon
in python! :-D) but keep in mind that it's possible.

 A lot of what Leo has done is to reduce the (bloated, perceived or real
 I don't know) complexity of Gump2, if you start moving stuff over from
 Gump2 to Gump3 before *others* had a look at the new (and much simpler)
 code, we go back to a one man show and the entire effort is useless.

If that would really be the case then the refactoring effort would have
failed. I would hope that adding an RDF generator plugin would be adding a
single sourcefile somewhere where it is easily ignored by someone just
learning the system internals.

Nevertheless, I do agree with:
 so, please, let's work as a team in identifying what needs to be done,
 outline the priorities and then allowing code to get in.
 
 priority #1: avoid one man shows.
 priority #2: keep it simple, stupid (to help people understanding the
 code, then helping on #1)
 priority #3: achieve separation of build from presentation
 priority #4: implement a very usable command line interface
 
 everything else (including sending email!) will come later.

Concretely, doing a grep for TODO should show lots of places where I think
the existing code needs work :-D
 
 Avalon had the notion of a component lifecycle and this is what Leo is
 doing here.

Ssssh! :-D

 no, and we don't need one: you need mysql and if you don't have it Gump3
 won't run. That's as simple as that.

+1 to that. Some decisions about that which are implicit in the new CLI as
I've been building it:

* python 2.3 with all its libs
* unix environment (cygwin will do, probably)
* mysql
* bunch of python libraries we choose to use (like MySQLdb)
* bash, java, ...

The bash script simply checks for all this and complains if something is not
there. Grepping that script for the word "check" hopefully results in a full
list of required software :-D
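The actual checks live in the bash script, which is not shown in this thread. Purely as an illustration of the same idea, a Python 3 sketch of "check for everything and complain about what is missing" could look like this; the function names are made up, and the command names passed in would be the caller's choice.

```python
import importlib.util
import shutil


def missing_commands(commands, which=shutil.which):
    """Return the commands that cannot be found on the PATH."""
    return [command for command in commands if which(command) is None]


def missing_modules(names):
    """Return the top-level Python modules that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def complaints(commands, modules):
    """Build one human-readable complaint per missing prerequisite."""
    messages = ["missing command: %s" % c for c in missing_commands(commands)]
    messages += ["missing python module: %s" % m for m in missing_modules(modules)]
    return messages
```

A caller might pass something like ["bash", "java", "mysql"] and ["MySQLdb"] and print each complaint before refusing to start.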

Next week is scheduled to be rather busy so it might take a while before I
have time to reply!

Cheers,

- Leo




DynaGump (was Re: The Gump3 branch)

2005-01-09 Thread Adam R. B. Jack

 The goal now is to allow Gump3 to perform builds and put its data into
 the database so that dynagump can start publishing it.

 Everything else is secondary.

I agree, but I think Gump3 is a good idea and I'd like to see it for the
long run. The *right*/focused plan for now is to accept that Gump3 is months
(and a lot of work) off (I know from experience) and that the shortest path
to DynaGump is not Gump3. Work with me to finish the DynaGump actor for
Gump2 that I wrote for you, and let's get it up and running. Let's start
exercising/integrating DynaGump now, not wait for a core re-write.

The best thing that happened to Gump2 is that folks were running Gump1 in
parallel. Countless bugs were detected/resolved by being able to run side by
side and compare. The best thing we can do for Gump3 is allow Gump2 to talk
to DynaGump in parallel.

If we create a workspace on Brutus called DynaGump and configure it to a DB
with both old and new DB schemas in it, we can have DynaGump up and running in
no time. Nothing (IMHO) better than running DynaGump against DBs formed by
old and new Gump (2 & 3) and also comparing it to the HTML results generated
by Gump2.

Let's allow Gump3 to be team formed by giving it time, whilst we make one
incremental improvement and allow DynaGump to be born. Can we agree on this
as a step in the plan?


 But I also hope that we'll work as a team this time.

Stefano, you make me smile. :-) You are so strong in your opinions (at least
how they read to others) that you come perilously close to stymieing the
community you love. I gave up on Depot, leaving behind parts I love/long to
see, mainly 'cos it was becoming a one man band. Gump, however, is a thriving
community, and even when I was the only Python coder we had vast community
efforts in metadata/management/communications
(Wikis/Documentation/Blogs)/problem resolution/and so on. Gump's code is
only a small component of its whole.

I welcome more coders into Gump code, in fact I've longed for it & tried to
encourage entry many times. Gump2 was a one man band 'cos nobody else wished
to invest time and effort in a possibly dying venture, and yet out of it (in
part by you helping it become a TLP) Gump was re-born and is once again
thriving. Gump thrives based off its contributions to the community, and
hence their contributions to it (via metadata/effort) not due to the code. I
welcome Gump3 as a great opportunity for discussion and solving some mistakes
of Gump2. Leo has addressed some, but not all (as I'll write) that need
solving. I see no point in doing a re-write if after months and much effort
we are no better off, and we've just shifted the one man team to a new man
who we'll near burn out w/ all the 'implementation nits' that pure theory
doesn't prepare you for. I'm no Leo, but I know this, I've been there.

Stefano, we are a team, and as a team we will have different world
views/skill sets/insights -- and yes, have different weaknesses/make different
mistakes. I'll keep raising concerns/issues based off my one-man-band wealth
of experience, and hope we'll all keep an open mind to what is re-instating
a past mistake, and what is a practical insight. Part of being a team is,
perhaps, you educating me into your views/insights and me pressure testing
them on me.

Let's not let our desire for progress weaken our team.

regards,

Adam



Re: The Gump3 branch

2005-01-09 Thread Adam R. B. Jack

 Ooh, long e-mail! I'm gonna try and split this up... :-D

Sorry Dude, I got excited. :-) I'll try to keep them shorter or split them.
[I'll reply a few times to this one.]

Having slept on what I saw, I do have some serious questions, and (to keep
it short, I'll come right to the point, knowing you know I mean it
respectfully) I wonder if there is as much significant difference between
Gump2 and Gump3 as I first thought.  They are much the same.

I have a slight deja-vu feeling here. You've built a nice (clean) start,
like Sam did, but to get from this to a live running system will take much
the same work that I added last time, and I'm not sure the key problems of
Gump2 have been understood/corrected. I'm going to try (over time) to list
every place in Gump2 that I feel would be as bad in Gump3 so we can address
them. This isn't me being petty, but me trying to pressure test this new
approach against my understanding of reality (for all its/my warts).

 I firmly believe there is very little need for different components to
 communicate. If you architect things the IOC way, components will use just
 one or two other components, and their parent can just set up the
 references between all those components.

[ BTW: I still could use help with IOC. I have a crude understanding of it,
but please don't forget to enlighten me if you see I'm missing a point.]

Sure, I see that components ought not need to communicate directly. In Gump2
we have a model tree (workspace/modules/projects) and a (theoretically
separate, but not) tree of results. That tree is for a few projects, or all,
based off the filter of work to do. As components do work on that tree they
store data at the right level (run/workspace/module/project), perhaps even
setting state (failed, etc.). This is Gump2, and (as I hear it) Gump3, no
differences.

I feel it is that tree that is the weakness people consider bloat. Not
its memory size, but its complexity, all the data stored in there -- and
the fact it is a batch. That is a key similarity between Gump2/Gump3 and
(IMHO) a key issue to address. The closer I look the more I realize the
similarities between Gump2 and Gump3.

 What will happen is that a component needs a certain kind of result
 available. For example, something that pushes information in the dynagump
 database needs that information, which might be put there by an ant
 builder or something like that. This kind of stuff is trivial in python;
 you just set the property on the relevant part of the model and then
 retrieve it later.
 [...]
 Note that such communication is pretty indirect. For example the start of
 the CvsUpdater plugin I did just pushes information into the model (the
 log of the cvs command, exit status, etc) without worrying who uses that
 information (at the moment, it is just ignored).

Part of the problem is ordering/sequencing. The CVS updating would not halt
all efforts on a module (builds would occur) 'cos the CVS failed if it had a
semi-fresh copy. (This was due to SF.net CVS being so flaky for so long
even for Gump-wise stable things like JUnit.) As such, prior to CVS updating
we needed to bring some stats/history information into memory, which enforces
an implicit dependency. [Note: Stats Actor today stores Stats on the Tree,
so users (CVS Actor) just ask for it from there, they don't talk directly.]

I know you can do inter-component communications w/ Python properties,
Gump2 does, but it has no contract (as Stefano would say), it is not clean,
it is intricate internals knowledge from one component to another. It is
stuff like this (and order dependencies like this) that ties components
together, and keeps things fat. [Gump2 at least used typed member
data/methods on the tree in order to allow some contracts.]

What you are suggesting is almost exactly how Gump2 works, and is (I fear)
where the thoughts of bloat come from.

  There
  were times when building logic wanted to know something historically
  (had this built before, etc.) in order to determine how much effort (or
  what switches) to use. Is inter-component communications like this a real
  no-no, or is this something that might be coincidentally allowed via steps
  in pre-processing, etc.

 We don't need steps. Think unix command line utilities. You can make them
 communicate:

   find . -type f | grep -v .svn

I'm a PIPE lover as much as the next guy, but simple flat stream pipes are
not what we are building. Our components use complex results. Do we need
contracts for those, or things (like DOM tree/XML structures) that we can
persist/stream/validate? [How does Cocoon address this?]

 Without steps. That | there in gump is achieved by setting a property on
 a piece of the model.

As with Gump2, but the properties grow and need management. They (and
implicit dependencies) are the bloat.

 Plugins
 --- 
   I think that generating plug-ins (perhaps even for loading, and such) is
  key. I'm not sure (yet) if the new model is any better than

Re: DynaGump (was Re: The Gump3 branch)

2005-01-09 Thread Stefano Mazzocchi

Boy, this really came across wrong.
First of all (and not for the first time, but probably not for the last 
either.. unfortunately) allow me to apologize: I *really* would love to 
just have time to spend on this, showing how gump could potentially be 
the killer app of the semantic web... but no, I'm supposed to deliver 
other things... :-(

Anyway, this is not an excuse to be rude and disrespectful and I'm sorry 
for that.

Adam R. B. Jack wrote:

 The goal now is to allow Gump3 to perform builds and put its data into
 the database so that dynagump can start publishing it.

 Everything else is secondary.

 I agree, but I think Gump3 is a good idea and I'd like to see it for the
 long run. The *right*/focused plan for now is to accept that Gump3 is months
 (and a lot of work) off (I know from experience) and that the shortest path
 to DynaGump is not Gump3. Work with me to finish the DynaGump actor for
 Gump2 that I wrote for you, and let's get it up and running. Let's start
 exercising/integrating DynaGump now, not wait for a core re-write.

Good point: SoC also enforces polymorphism.

 The best thing that happened to Gump2 is that folks were running Gump1 in
 parallel. Countless bugs were detected/resolved by being able to run side by
 side and compare. The best thing we can do for Gump3 is allow Gump2 to talk
 to DynaGump in parallel.

Very good point as well.

 If we create a workspace on Brutus called DynaGump and configure it to a DB
 with both old and new DB schemas in it, we can have DynaGump up and running in
 no time. Nothing (IMHO) better than running DynaGump against DBs formed by
 old and new Gump (2 & 3) and also comparing it to the HTML results generated
 by Gump2.

 Let's allow Gump3 to be team formed by giving it time, whilst we make one
 incremental improvement and allow DynaGump to be born. Can we agree on this
 as a step in the plan?

+1

 But I also hope that we'll work as a team this time.

 Stefano, you make me smile. :-) You are so strong in your opinions (at least
 how they read to others) that you come perilously close to stymieing the
 community you love.

Yeah, well, (looking down) I know.

 I gave up on Depot, leaving behind parts I love/long to
 see, mainly 'cos it was becoming a one man band. Gump, however, is a thriving
 community, and even when I was the only Python coder we had vast community
 efforts in metadata/management/communications
 (Wikis/Documentation/Blogs)/problem resolution/and so on. Gump's code is
 only a small component of its whole.

Agreed. Yet, we *must* have more people touching the code and, IMO, we
should do so by thinking that every line of python code puts us a little
bit farther away from that goal.

 I welcome more coders into Gump code, in fact I've longed for it & tried to
 encourage entry many times.

Yes, I know. I'm *not* blaming it on you. I'm blaming it more on me.

 Gump2 was a one man band 'cos nobody else wished
 to invest time and effort in a possibly dying venture, and yet out of it (in
 part by you helping it become a TLP) Gump was re-born and is once again
 thriving.

Oh, here I really came across wrong: if it wasn't for your effort, I 
wouldn't have been involved in the first place since I thought that 
Sam's try just had failed to attract attention and momentum. Your energy 
and vitality gave me new hope and I think that's why we have a lot more 
gumpers today (even if they still don't touch the code!).

Hopefully, the next wave will be the final one: when the community 
behaves just like any other and it's diversified enough to sustain any 
single individual leaving.

 Gump thrives based off its contributions to the community, and
 hence their contributions to it (via metadata/effort) not due to the code. I
 welcome Gump3 as a great opportunity for discussion and solving some mistakes
 of Gump2. Leo has addressed some, but not all (as I'll write) that need
 solving. I see no point in doing a re-write if after months and much effort
 we are no better off, and we've just shifted the one man team to a new man
 who we'll near burn out w/ all the 'implementation nits' that pure theory
 doesn't prepare you for. I'm no Leo, but I know this, I've been there.

Completely agree.

 Stefano, we are a team, and as a team we will have different world
 views/skill sets/insights -- and yes, have different weaknesses/make different
 mistakes. I'll keep raising concerns/issues based off my one-man-band wealth
 of experience, and hope we'll all keep an open mind to what is re-instating
 a past mistake, and what is a practical insight. Part of being a team is,
 perhaps, you educating me into your views/insights and me pressure testing
 them on me.

 Let's not let our desire for progress weaken our team.

Very wise.

Again, I apologize. I came across wrong, rude and disrespectful. It was 
not my intention.

I very much welcome the idea of using gump2 and gump3 *both* to drive 
dynagump.

I say let's do it :-)
--
Stefano.

Re: DynaGump (was Re: The Gump3 branch)

2005-01-09 Thread Leo Simons

On 09-01-2005 17:40, Adam R. B. Jack [EMAIL PROTECTED] wrote:
 The goal now is to allow Gump3 to perform builds and put its data into
 the database so that dynagump can start publishing it.
 
 Everything else is secondary.
 
 I agree, but I think Gump3 is a good idea and I'd like to see it for the
 long run.

I think we're all in violent agreement :-D

 The *right*/focused plan for now is to accept that Gump3 is months
 (and a lot of work) off

Yep! Oh man, so much work...

 (I know from experience) and that the shortest path
 to DynaGump is not Gump3. Work with me to finish the DynaGump actor for
 Gump2 that I wrote for you, and let's get it up and running. Let's start
 exercising/integrating DynaGump now, not wait for a core re-write.

The power I'm hoping to maximize on (and I think that was something Stefano
was getting at) is the power of the clean sheet. Sam did this as well,
only the clean sheet he provided was so amazingly smart that we all had the
hardest time understanding it!

I'm guessing one of the key bits that makes a gump2/dynagump integration
difficult is just how many cool ideas Stefano has in his head wrt dynagump
and how immensely difficult it is to figure out how to weld those ideas into
gump2. Since gump3 has no outputs, no builders, no generators, we can start
with the kind of output we need and go from there. A reversed approach if
you will.

By starting with an empty shell like gump3, it becomes easier to visualize
how to do those things (it's a hard, hard problem to get right). From there,
we should be able to figure out together how to make gump2 deliver the
outputs that dynagump needs to fly.

It's not an or/or decision, it's and/and. Let's just go with the flow. That's
always worked for the gump community so far. We're that good ;)

 The best thing that happened to Gump2 is that folks were running Gump1 in
 parallel. Countless bugs were detected/resolved by being able to run side by
 side and compare. The best thing we can do for Gump3 is allow Gump2 to talk
 to DynaGump in parallel.

That makes a lot of sense.

 If we create a workspace on Brutus called DynaGump and configure it to a DB
 with both old and new DB schemas in it, we can have DynaGump up and running in
 no time.

Hehehe. Don't be too optimistic. The dynagump database schema as Stefano
built it seems to be completely different in some ways from how gump2 is set
up. Thinking about it hurts :-D

 Nothing (IMHO) better than running DynaGump against DBs formed by
 old and new Gump (2 & 3) and also comparing it to the HTML results generated
 by Gump2.
 
 Let's allow Gump3 to be team formed by giving it time, whilst we make one
 incremental improvement and allow DynaGump to be born. Can we agree on this
 as a step in the plan?

Like I said, it makes sense. What I'd really love to see would be for you to
fully digest all the fledgling concepts in gump3 (after we figure out what
they are :-D) so you can figure out what kind of migration/integration/
reorganisation strategy makes sense.

And also, as I mentioned in my previous e-mail, I think we really don't need
a grand plan to all agree on. Baby steps. Python makes it so easy to glue
things up, there's a myriad of possibilities to make different versions of
gump all interoperate. We can figure all that out!

 But I also hope that we'll work as a team this time.
 
 Stefano, you make me smile. :-) You are so strong in your opinions (at least
 how they read to others) that you come perilously close to stymieing the
 community you love.

Hehehe. Take that, you passionate Italian! :-D

 Gump, however, is a thriving
 community, and even when I was the only Python coder we had vast community
 efforts in metadata/management/communications
 (Wikis/Documentation/Blogs)/problem resolution/and so on. Gump's code is
 only a small component of its whole.
 
 I welcome more coders into Gump code, in fact I've longed for it & tried to
 encourage entry many times. Gump2 was a one man band 'cos nobody else wished
 to invest time and effort in a possibly dying venture, and yet out of it (in
 part by you helping it become a TLP) Gump was re-born and is once again
 thriving. Gump thrives based off its contributions to the community, and
 hence their contributions to it (via metadata/effort) not due to the code. I
 welcome Gump3 as a great opportunity for discussion and solving some mistakes
 of Gump2. Leo has addressed some, but not all (as I'll write) that need
 solving. I see no point in doing a re-write if after months and much effort
 we are no better off, and we've just shifted the one man team to a new man
 who we'll near burn out w/ all the 'implementation nits' that pure theory
 doesn't prepare you for. I'm no Leo, but I know this, I've been there.
 
 Stefano, we are a team, and as a team we will have different world
 views/skill sets/insights -- and yes, have different weaknesses/make different
 mistakes. I'll keep raising concerns/issues based off my one-man-band wealth
 of experience, and hope

Re: The Gump3 branch

2005-01-09 Thread Stefano Mazzocchi

Adam R. B. Jack wrote:

 I'm a PIPE lover as much as the next guy, but simple flat stream pipes are
 not what we are building. Our components use complex results. Do we need
 contracts for those, or things (like DOM tree/XML structures) that we can
 persist/stream/validate? [How does Cocoon address this?]

Cocoon pipelines are not streams of characters but streams of structured
events (using the SAX API). So, for example, if you have <a><b/></a>
to pass along, the events are:

 - startElement(a)
 - startElement(b)
 - endElement(b)
 - endElement(a)
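The same event sequence can be observed with Python's standard xml.sax module, which mirrors the SAX API (Cocoon itself uses the Java one). A small sketch in modern Python 3, with a recorder class invented for the example, applied to a tiny document such as <a><b/></a>:

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Records element events in the order the parser delivers them."""
    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("startElement(%s)" % name)

    def endElement(self, name):
        self.events.append("endElement(%s)" % name)


recorder = EventRecorder()
xml.sax.parseString(b"<a><b/></a>", recorder)
events = recorder.events
```

A pipeline component in this style receives such calls one at a time and forwards (possibly changed) events to the next component, without ever holding the whole document as text.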
--
Stefano.

Re: The Gump3 branch

2005-01-09 Thread Leo Simons

On 09-01-2005 18:28, Adam R. B. Jack [EMAIL PROTECTED] wrote:
 Ooh, long e-mail! I'm gonna try and split this up... :-D
 
 Sorry Dude, I got excited. :-)

Excitement is good!

 I wonder if there is as much significant difference between
 Gump2 and Gump3 as I first thought.

Probably not. Then again, I don't know what you thought...

 I have a slight deja-vu feeling here. You've built a nice (clean) start,
 like Sam did, but to get from this to a live running system will take much
 the same work that I added last time, and I'm not sure the key problems of
 Gump2 have been understood/corrected. I'm going to try (over time) to list
 every place in Gump2 that I feel would be as bad in Gump3 so we can address
 them. This isn't me being petty, but me trying to pressure test this new
 approach against my understanding of reality (for all its/my warts).

It's a good idea. By all means. Software architecture is *hard*.

 [ BTW: I still could use help with IOC. I have a crude understanding of it,
 but please don't forget to enlighten me if you see I'm missing a point.]

That takes years! :-D. Think military and about who is in command. Maybe
you're familiar with patterns like "chain of responsibility". IOC is a
pattern in the same sense. It's a tree of commands. General at the top.

 Sure, I see that components ought not need to communicate directly. In Gump2
 we have a model tree (workspace/modules/projects) and a (theoretically
 separate, but not) tree of results. That tree is for a few projects, or all,
 based off the filter of work to do. As components do work on that tree they
 store data at the right level (run/workspace/module/project), perhaps even
 setting state (failed, etc.). This is Gump2, and (as I hear it) Gump3, no
 differences.

There's a few I think. I had the hardest time fully reading through the
gump2 model code. I decided I needed to start with the XML, retrieve the
fundamental abstractions, and rewrite the tree. It was so much fun I just
kept going.

The gump3 tree is totally passive, and much closer to the way a
mathematician would build a tree. You can let loose algorithms on it that
were figured out in the 30s (ie the topological sort in the walker code is
one of those).
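The walker code is not shown in this thread, so the following is only a generic sketch of a topological sort over a passive dependency mapping, in modern Python 3. It uses Kahn's algorithm, which may or may not be the variant the walker uses; the function name and input format are assumptions for the example.

```python
from collections import deque


def topological_order(dependencies):
    """Return names ordered so that every dependency comes before its users.

    dependencies maps a name to the list of names it depends on.
    Raises ValueError if the graph contains a cycle.
    """
    nodes = set(dependencies)
    for deps in dependencies.values():
        nodes.update(deps)

    remaining = {node: set(dependencies.get(node, ())) for node in nodes}
    dependents = {node: [] for node in nodes}
    for node, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(node)

    # Start with everything that depends on nothing; sort for determinism.
    ready = deque(sorted(node for node, deps in remaining.items() if not deps))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for dependent in sorted(dependents[node]):
            remaining[dependent].discard(node)
            if not remaining[dependent]:
                ready.append(dependent)

    if len(order) != len(nodes):
        raise ValueError("dependency cycle detected")
    return order
```

Because the graph is plain data, the same mapping can be handed to any other graph algorithm without the tree knowing about it.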

The gump3 tree does not do any kind of validation. It does only the most
minimal of defaults.

The gump3 tree is more fully normalized. All references are fully two-way,
like with DOM. The difference between <depend/> and <option/> sucks
conceptually, so now the option-ness is just a property of the edge that
connects two vertices.
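A minimal sketch of that idea in modern Python 3: the dependency is an edge object reachable from both ends, and "optional" is a flag on the edge. The class and attribute names are hypothetical and not taken from the Gump3 model code.

```python
class Dependency:
    """An edge between two projects; optional-ness lives on the edge."""
    def __init__(self, dependency, dependee, optional=False):
        self.dependency = dependency  # the project being depended on
        self.dependee = dependee      # the project that needs it
        self.optional = optional


class Project:
    """A vertex that knows its edges in both directions."""
    def __init__(self, name):
        self.name = name
        self.dependencies = []  # outgoing edges: what this project needs
        self.dependees = []     # incoming edges: who needs this project

    def add_dependency(self, other, optional=False):
        edge = Dependency(dependency=other, dependee=self, optional=optional)
        self.dependencies.append(edge)
        other.dependees.append(edge)  # same edge object, reachable both ways
        return edge


app = Project("app")
junit = Project("junit")
edge = app.add_dependency(junit, optional=True)
```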

I think it's much simpler.

 I feel it is that tree that is the weakness people consider bloat. Not
 its memory size, but its complexity, all the data stored in there -- and
 the fact it is a batch. That is a key similarity between Gump2/Gump3 and
 (IMHO) a key issue to address.

Right.

 Part of the problem is ordering/sequencing. The CVS updating would not halt
 all efforts on a module (builds would occur) 'cos the CVS failed if it had a
 semi-fresh copy. (This was due to SF.net CVS being so flaky for so long
 even for Gump-wise stable things like JUnit.) As such, prior to CVS updating
 we needed to bring some stats/history information into memory, which enforces
 an implicit dependency. [Note: Stats Actor today stores Stats on the Tree,
 so users (CVS Actor) just ask for it from there, they don't talk directly.]

That's a big part of the problem. The solution is in the back of my head,
nearly constantly as I look at gump. Basically these kinds of decisions are
all encapsulated into the graphical algebra formulae Stefano and I found in
September. It would be real nice to meet face-to-face so we could talk about
that one!

 I know you can do inter-component communications w/ Python properties;
 Gump2 does, but it has no contract (as Stefano would say). It is not clean;
 it is intricate internal knowledge passed from one component to another. It is
 stuff like this (and order dependencies like this) that ties components
 together, and keeps things fat. [Gump2 at least used typed member
 data/methods on the tree in order to allow some contracts.]

That's a fundamental difference right there! Strong typing is the way we
write contracts in java, but that really doesn't work as well in python. We
miss the interface keyword. Python OO needs to be built for dynamism. Take a
look at how hard the Zope people tried and failed to add that in and how
immensely hard that has hit them in the face and how bloated their design is
now!

The way to specify contracts in python is to document them.

The CvsUpdater plugin will set a string property cvs_update_log on each
module that is of type 'cvs'. The property contains the log output from the
cvs update command of course.

That's a contract right there. Solidify the contract in a unit test for the
updater. Model stays clean, and blissfully unaware.
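
A sketch of what "document it, then solidify it in a unit test" could look like. The `Module` and `CvsUpdater` classes below are simplified stand-ins written for this example, not the real gump3 classes, and the `run_update` callable and `visit_module` method are assumptions that keep the test from shelling out to cvs.

```python
import unittest


class Module:
    """Bare model object; it knows nothing about any plugin."""

    def __init__(self, name, repository_type):
        self.name = name
        self.repository_type = repository_type


class CvsUpdater:
    """Stand-in plugin.

    Contract: sets a string property `cvs_update_log` on each module whose
    repository type is 'cvs'. Other modules are left untouched.
    """

    def __init__(self, run_update):
        # run_update(module) returns the log text; injected for testability.
        self.run_update = run_update

    def visit_module(self, module):
        if module.repository_type == "cvs":
            module.cvs_update_log = self.run_update(module)


class CvsUpdaterContractTest(unittest.TestCase):
    def setUp(self):
        # Fake update command that returns a canned log line.
        self.updater = CvsUpdater(lambda module: "U build.xml\n")

    def test_cvs_module_gets_string_log(self):
        module = Module("ant", "cvs")
        self.updater.visit_module(module)
        self.assertIsInstance(module.cvs_update_log, str)

    def test_other_modules_are_untouched(self):
        module = Module("gump", "svn")
        self.updater.visit_module(module)
        self.assertFalse(hasattr(module, "cvs_update_log"))
```

The docstring states the contract and the test pins it down, while `Module` itself gains no typed field or method for the log.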

 What you are suggesting is almost exactly how Gump2 works, and is (I fear)
 where the thoughts to bloat come from.

The

Re: The Gump3 branch

2005-01-08 Thread Adam R. B. Jack

 Phew, have I been busy :-D.

You certainly have.

I got up real early (before I go cut up cars w/ the jaws of life) so I could
take a read of this. I'm impressed, inspired and (frankly) a little awed. I
love how you've been far bolder than I ever was with putting your stamp on
this thing, and enforcing clean practices. I was trying to replicate what
existed, and make incremental deviations, but you've stood your ground from
the start, enforcing your will/beliefs on this thing. I'm sure your previous
container/component works have given you a lot of experience to inject here,
and I think Gump lucked out that you gave it the time/framework.

Ok, so love fest over -- a few questions (and there will be more, 'cos I
don't have enough time now.) I guess my questions are concerns about where
the pure theory meets the many practicalities that Gump bumps into.

I'm sure your IOC/container experiences have required you to answer this
before, but how do you allow components to communicate/collaborate? There
were times when building logic wanted to know something historically (had
this built before, etc.) in order to determine how much effort (or what
switches) to use. Is inter-component communications like this a real no-no,
or is this something that might be coincidentally allowed via steps in
pre-processing, etc.

Do you think we have a chance to re-instate threading in this model? [It is
a minor nit, not a show stopper, but I liked the large run-time reduction of
concurrent checkouts.]

 I've gotten the Gump3 branch into a state where
 everything works (for me), as far as stuff is implemented. The main core
 thing that is missing is cyclic dependency detection. I've got the right
 algorithm written down on paper, just need to make it happen. The hooks for
 it are there already though (the gump.engine.modeller.Verifier class).

Mind pumping a few command lines up to a wiki or somewhere? I'd like to run
the engine, and unit tests, and such. Gump2 was a pain to run (we never
cured its confusion) and I'd like to start comfortably with Gump3 from the
get go.

One thought in that regard is partial runs. I think Gump2 was believed
(although not actually true) to be less incremental build friendly since
it wouldn't allow one to do build X, update X. [It was there in Gump3,
just the command line was so crude folks never got to use it.]. I feel we
need Gump3 to be easy to run in pieces, and in parts. Easily asking for
things that include/exclude components on the fly. Nicola's (and Sam's)
wxPython GUI was a nice user this way. Any thoughts on re-instating that?

I think that generating plug-ins (perhaps even for loading, and such) is
key. I'm not sure (yet) if the new model is any better than the old in
allowing the core steps (loading, modelling) to be plugged in, but I think
it needs to be investigated. I see you have a Maven parser, but could/should
that be a plug-in? If you can leverage the framework here, perhaps in
multi-stage runs (e.g. pre/run/post for loading metadata, pre/run/post for
building, etc.) that might be nice. [Not sure if it is overkill, but I think
it was a big weakness in Gump2 that needs to be addressed.]

Another weakness of Gump2 was the (eventually huge) in-memory trees
combining model and results. Hmm, I'm not sure if this goes away here (or
not), and I fear not. How are we going to allow (say) a results plug-in to
inject the build log (and/or commandline or whatever) into the results DB? I
suspect it needs to reach out and touch the memory structures. Maybe little
has changed here. [I half wondered about using XML files between components
so we could complete a build run and later run results generation. I never got
to it 'cos I felt it was a lot of work and maybe overkill. Thoughts?] [Hmm,
do we need a Wiki page w/ re-design goals/objectives to measure this
framework against?]

I think we need to treat internal plug-ins the same as community added, i.e.
eat our own dog food. Do you know Python patterns for discovering and
loading such plug-ins? I'd like to start by writing plug-ins that this
framework can run. Is (say) an RDF generating plug-in missing the point of
DynaGump, or something allowable? I'm game to start work on the DB interface
for generating history, or others.

 The other stuff that's missing is a lot of plugins. The new architecture as
 I set things up identifies three stages:

 - preprocessing
 - build/run
 - postprocessing

Is this a tried and tested model you've used a lot in containers? Just curious
about its origins. I wonder if (eventually) we'd like to be able to break
Gump3 completely from the sequential run, perhaps into an event-based
engine.

 And each of those can have plugins (basically what are now called actors).
 Preprocessing plugins that need to be built include source repository
 updaters. Build tools that need to be built include all the handlers for the
 different Commands. Postprocessing that needs to be built includes the
 dynagump adapter. Basically everything

Re: The Gump3 branch

2005-01-08 Thread Stefano Mazzocchi

Adam R. B. Jack wrote:
 Phew, have I been busy :-D.
 You certainly have.
 I got up real early (before I go cut up cars w/ the jaws of life) so I could
 take a read of this. I'm impressed, inspired and (frankly) a little awed. I
 love how you've been far bolder than I ever was with putting your stamp on
 this thing, and enforcing clean practices. I was trying to replicate what
 existed, and make incremental deviations, but you've stood your ground from
 the start, enforcing your will/beliefs on this thing. I'm sure your previous
 container/component works have given you a lot of experience to inject here,
 and I think Gump lucked out that you gave it the time/framework.

yes, I agree with Adam (and expressed my sentiments to Leo privately so
far), Gump3 is very avalonish, in the pure IoC way, which is *very*
refreshing for me :-)

[and also refreshing to see that all the years spent in trying to get 
avalon working were not that useless]

 Ok, so love fest over -- a few questions (and there will be more, 'cos I
 don't have enough time now.) I guess my questions are concerns about where
 the pure theory meets the many practicalities that Gump bumps into.

 I'm sure your IOC/container experiences have required you to answer this
 before, but how do you allow components to communicate/collaborate?

well, you don't :-)

No, seriously, IoC drives your ability to interact. Leo is not using a
proper component manager (and I agree that would be overkill), but he's
using the idea that if a component needs something, it calls a factory
(or a method factory) that will return a component, or a proxy/facade
for the component to talk to.

This allows isolation and polymorphism.
 There were times when building logic wanted to know something historically
 (had this built before, etc.) in order to determine how much effort (or what
 switches) to use. Is inter-component communications like this a real no-no,
 or is this something that might be coincidentally allowed via steps in
 pre-processing, etc.

if the building component needs the historical component, it will ask
its parent for it and the parent will either know how to obtain one or
will delegate to its parent until somebody knows how.
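
The delegation described here can be sketched roughly as follows. The `Container` class and its `register`/`lookup` methods are hypothetical names chosen for illustration; they are not taken from Gump3 or Avalon.

```python
class Container:
    """Holds components by name and falls back to its parent on a miss."""

    def __init__(self, parent=None):
        self.parent = parent
        self._components = {}

    def register(self, name, component):
        self._components[name] = component

    def lookup(self, name):
        # Answer locally if we can.
        if name in self._components:
            return self._components[name]
        # Otherwise delegate upwards until somebody knows how.
        if self.parent is not None:
            return self.parent.lookup(name)
        raise LookupError(f"no component named {name!r}")


# Example: the builder's container has no "history" entry of its own, so
# the request climbs to the root, which does.
root = Container()
root.register("history", {"ant": "built before"})
builder_scope = Container(parent=root)
history = builder_scope.lookup("history")
```

The requesting component never learns where the historical component lives, which is what keeps the two isolated from each other.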

 Do you think we have a chance to re-instate threading in this model? [It is
 a minor nit, not a show stopper, but I liked the large run-time reduction of
 concurrent checkouts.]

I don't see any architectural impediment for this.

 I've gotten the Gump3 branch into a state where
 everything works (for me), as far as stuff is implemented. The main core
 thing that is missing is cyclic dependency detection. I've got the right
 algorithm written down on paper, just need to make it happen. The hooks for
 it are there already though (the gump.engine.modeller.Verifier class).

 Mind pumping a few command lines up to a wiki or somewhere? I'd like to run
 the engine, and unit tests, and such. Gump2 was a pain to run (we never
 cured its confusion) and I'd like to start comfortably with Gump3 from the
 get go.

No, I think wikis and documentation here are more harmful than useful
at this stage, because they get out of synch.

I would much rather spend time in making the code self-documented and
the error messages and CLI interface very very very explicit.

Leo already made a great start on this.
 One thought in that regard is partial runs. I think Gump2 was believed
 (although not actually true) to be less incremental build friendly since
 it wouldn't allow one to do build X, update X. [It was there in Gump3,
 just the command line was so crude folks never got to use it.].

I think you mean Gump2 and yes, the command line was awful.

 I feel we need Gump3 to be easy to run in pieces, and in parts.

Absolutely.

 Easily asking for things that include/exclude components on the fly.
 Nicola's (and Sam's) wxPython GUI was a nice user this way. Any thoughts on
 re-instating that?

-1 as well. it's pretty much useless to have a GUI when you are running
gump on a server and accessing it over an SSH text shell. We don't need
anything that allows including/excluding components on the fly (other than a
configuration file) and we don't need a gui to edit the metadata.

what we need is:
 1) the ability to run gump on a single project or on a group of them
 2) the ability to validate metadata on a single project or on a group of them
 3) keep it simple stupid
 4) reduce the complexity to a minimum
 5) prevent YAGNI

 I think that generating plug-ins (perhaps even for loading, and such) is
 key. I'm not sure (yet) if the new model is any better than the old in
 allowing the core steps (loading, modelling) to be plugged in, but I think
 it needs to be investigated.

Adam, please, let's not make the same mistakes again: we should fix
things only if they are broken.

The goal now is to allow Gump3 to perform builds and put its data into
the database so that dynagump can start publishing it.

Everything else is secondary.

 I see you have a Maven parser, but could/should
 that be a plug-in?

This is

Re: The Gump3 branch

2005-01-07 Thread Adam R. B. Jack

 What I would like now is a beer and some feedback :-D

First feedback ... my ailing 802.11b WISP network practically puked on all
those Cocoon JARS. Yikes! I had to give up on Eclipse SVN and use command
line SVN so as not to time out. 4+ hours (on third try) and counting. And
no, I won't move to the flatlands, the mountains are worth this pain.
Perhaps I ought to re-install the modem...

More (and hopefully useful) feedback once I've been able to play with this.
Thanks for all your efforts on this. I'm excited to see what you've
injected...

regards

Adam


