Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-07-01 Thread Upayavira
On 1 Jul 2003 at 14:47, Vadim Gritsenko wrote:

> Jeff Turner wrote:
> 
> >I'm not very familiar with the code; is there some cost in keeping
> >the two-pass CLI alive, in the faint hope that caching comes to its
> >rescue one day?
> >
> 
> Guys,
> 
> Before you implement some approach here... Let me suggest something.
> 
> Right now the sitemap implementation automatically adds a link gatherer
> to the pipeline when it is invoked by the CLI. This link gatherer is in
> fact a "hard-coded links view". I suggest replacing this "hard-coded
> links view" a.k.a link gatherer with the "real" links view, BUT attach
> it as a tee to a main pipeline instead of running it as a pipeline by
> itself. As a result, links view "baby" will be used, two-pass "water"
> will be drained, and sitemap syntax will stay the same. Moreover, the
> links view will be still accessible from the outside, meaning that you
> can spider the site using out-of-the-process spiders.
> 
> Example:
> Given the pipeline:
>   G --> T1 (label="content") --> T2 --> S,
> 
> And the links view:
>   from-label="content" --> T3 --> LinkSerializer,
> 
> The pipeline built for the CLI request should be:
>   G --> T1 --> Tee --> T2 --> S --> OutputStream
>                 \
>                  --> LinkSerializer --> NullOutputStream
>                                          \
>                                           --> List of links in environment
> 
> In one request, you will get:
>  * Regular output of the pipeline, which will go to the destination Source
>  * List of links in the environment, which is what the link gatherer was made for

Splendid. I think that is exactly what I would want to do. We'd then have single(ish) 
pass generation with the benefits of link view. And if you just feed directly from the 
label into a serializer, it'll be pretty much the same in terms of performance as the 
LinkGatherer that we have now.
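To make the tee idea concrete, here's a rough, self-contained sketch of SAX-level teeing using plain JAXP. All class names here are invented for illustration; Cocoon's real pipeline wiring uses its own XMLConsumer contracts, so treat this as a sketch of the principle, not the implementation:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// A minimal SAX tee: every event is forwarded to two consumers, so a
// single parse can feed the normal serializer and a link gatherer at once.
public class SaxTee extends DefaultHandler {
    private final DefaultHandler first, second;

    public SaxTee(DefaultHandler first, DefaultHandler second) {
        this.first = first;
        this.second = second;
    }

    @Override public void startElement(String u, String l, String q, Attributes a)
            throws SAXException { first.startElement(u, l, q, a); second.startElement(u, l, q, a); }
    @Override public void endElement(String u, String l, String q)
            throws SAXException { first.endElement(u, l, q); second.endElement(u, l, q); }
    @Override public void characters(char[] c, int s, int n)
            throws SAXException { first.characters(c, s, n); second.characters(c, s, n); }

    // Branch 1: stands in for the real serializer (here it just counts elements).
    static class ElementCounter extends DefaultHandler {
        int count;
        @Override public void startElement(String u, String l, String q, Attributes a) { count++; }
    }

    // Branch 2: gathers href attributes, as a link gatherer would.
    static class LinkGatherer extends DefaultHandler {
        final List<String> links = new ArrayList<>();
        @Override public void startElement(String u, String l, String q, Attributes a) {
            String href = a.getValue("href");
            if (href != null) links.add(href);
        }
    }

    static String demo() {
        try {
            ElementCounter counter = new ElementCounter();
            LinkGatherer gatherer = new LinkGatherer();
            SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(
                    "<page><a href='a.html'>A</a><a href='b.html'>B</a></page>")),
                new SaxTee(counter, gatherer));
            return counter.count + " elements, links=" + gatherer.links;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 3 elements, links=[a.html, b.html]
    }
}
```

Because the events are forwarded as they arrive, the second branch adds almost no cost beyond the gatherer's own work, which is the performance point made above.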

I would need help implementing this. Are you able to explain how?

There's a lot of pipeline building there that I wouldn't yet know how to do (but I'm 
willing to give it a go with guidance).

If we're to use my current approach, we'd add a different serializer at the end of the 
second sub-pipe, which would take the links and put them into a specific List in the 
ObjectModel. In fact, we could create a LinkGatheringOutputStream that'd be handed 
to the LinkSerializer to do that. That would leave most of the complexity simply in 
building the pipeline.
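A hedged sketch of the LinkGatheringOutputStream idea, assuming (as I read the serializer) that LinkSerializer emits one link per line; the class, its method names, and the demo input are all hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical: an OutputStream that parses the LinkSerializer's
// line-per-link output into a List instead of writing it anywhere,
// so the links end up in the environment/ObjectModel.
public class LinkGatheringOutputStream extends OutputStream {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final List<String> links;

    public LinkGatheringOutputStream(List<String> links) {
        this.links = links;
    }

    @Override
    public void write(int b) throws IOException {
        if (b == '\n') flushLine(); else buffer.write(b);
    }

    @Override
    public void close() throws IOException {
        flushLine(); // pick up a trailing link with no newline
    }

    private void flushLine() {
        String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8).trim();
        if (!line.isEmpty()) links.add(line);
        buffer.reset();
    }

    static List<String> demo() {
        List<String> links = new ArrayList<>();
        try (OutputStream out = new LinkGatheringOutputStream(links)) {
            out.write("index.html\nfaq.html\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [index.html, faq.html]
    }
}
```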

Can you guarantee that cocoon.process() will not complete until both sub-pipelines 
have completed their work?

I'll take a bit of a look into the pipeline building code (if I can find it) to see 
what I can 
work out.

This approach excites me. With help, I'd like to see if I can make it happen.

Regards, Upayavira





Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-07-01 Thread Vadim Gritsenko
Jeff Turner wrote:

I'm not very familiar with the code; is there some cost in keeping the
two-pass CLI alive, in the faint hope that caching comes to its rescue
one day?
Guys,

Before you implement some approach here... Let me suggest something.

Right now the sitemap implementation automatically adds a link gatherer to the 
pipeline when it is invoked by the CLI. This link gatherer is in fact a 
"hard-coded links view". I suggest replacing this "hard-coded links 
view" a.k.a link gatherer with the "real" links view, BUT attach it as a 
tee to a main pipeline instead of running it as a pipeline by itself. As 
a result, links view "baby" will be used, two-pass "water" will be 
drained, and sitemap syntax will stay the same. Moreover, the links view 
will be still accessible from the outside, meaning that you can spider 
the site using out-of-the-process spiders.

Example:
Given the pipeline:
 G --> T1 (label="content") --> T2 --> S,
And the links view:
 from-label="content" --> T3 --> LinkSerializer,
The pipeline built for the CLI request should be:
 G --> T1 --> Tee --> T2 --> S --> OutputStream
               \
                --> LinkSerializer --> NullOutputStream
                                        \
                                         --> List of links in environment
In one request, you will get:
* Regular output of the pipeline which will go to the destination Source
* List of links in the environment which is what link gatherer was made for
Comments?

Vadim




Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-29 Thread Jeff Turner
On Sun, Jun 29, 2003 at 11:34:01AM +0200, Nicola Ken Barozzi wrote:
> Jeff Turner wrote, On 29/06/2003 8.03:
...
> >- We're abusing the name 'transformer', since nothing is transformed.
> >  If we're really going to go this way, let's define a new sitemap
> >  element.
> 
> There are transformers that do not transform, it's not unusual,

I can't think of any others?



> "
> So basically we are adding a contract to the sitemap, by saying that
> each sitemap implementation has to provide a list of links if requested
> to (as seen above).
> "
> 
> As you state, a Transformer does not feel right. In fact, a sitemap has 
> now a new contract that it has to give links. The question is: how can 
> it be made more versatile? How can we tell the pipeline where we want 
> the link gathering to occur?
> 
> What about a named pipeline that is inserted by the link gatherer where 
> it gets the links? What about using a special label to indicate where to 
> gather links?

Hmm.. interesting.  Perhaps we just need to augment Resources a bit:


  
  


Ie, a Resource inserted in each pipeline after the 'content' label.
Rather AOP'ish.
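The sitemap snippet above was stripped by the archive; a hedged guess at what it might have looked like, reconstructed from the prose (the insert-after-label attribute is invented to match the "inserted after the 'content' label" idea, and the stylesheet name is made up):

```xml
<map:resources>
  <!-- hypothetical: spliced into every pipeline right after the
       stage labelled "content" -->
  <map:resource name="links" insert-after-label="content">
    <map:transform src="filter-links.xsl"/>
    <map:serialize type="links"/>
  </map:resource>
</map:resources>
```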

> Just food for thought.

Tasty..

--Jeff

> -- 
> Nicola Ken Barozzi   [EMAIL PROTECTED]
> - verba volant, scripta manent -
>(discussions get forgotten, just code remains)
> -


Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-29 Thread Nicola Ken Barozzi


Jeff Turner wrote, On 29/06/2003 8.03:
...
I still have the feeling that a link-gatherer transformer is mixing
concerns a bit, and that two-pass is conceptually nicer:
- We're abusing the name 'transformer', since nothing is transformed.
  If we're really going to go this way, let's define a new sitemap
  element.
There are transformers that do not transform, it's not unusual, 
although, since the sitemap has a new contract on links (see at the 
bottom), it might make sense.

- Link gathering is irrelevant for online situations, so we pay some
  performance penalty having a link-gatherer transformer.  This
  illustrates why I think it mixes concerns.
Exactly.

- It's easy to forget to define a link-gatherer transformer for new
  pipelines.  Link-view is cross-cutting and doesn't have this
  problem.
Again, exactly.

I'm not very familiar with the code; is there some cost in keeping the
two-pass CLI alive, in the faint hope that caching comes to its rescue
one day?
Actually it was three-pass.

http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104013686220328&w=2

Thanks for engaging with me on this - I appreciate it.


Thank _you_; an improved CLI will make Forrest significantly more
usable.
For your pleasure, and of interested parties, the previous threads:

http://marc.theaimsgroup.com/?t=10272571031&r=1&w=2
http://marc.theaimsgroup.com/?t=10401370156&r=1&w=2
http://marc.theaimsgroup.com/?t=10460931492&r=1&w=2
http://marc.theaimsgroup.com/?t=10488703345&r=1&w=2
And a couple of mails:

http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104610949203967&w=2
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104679840022563&w=2
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104687731531754&w=2
The last mail in particular explains the current new-CLI method:

"
So basically we are adding a contract to the sitemap, by saying that
each sitemap implementation has to provide a list of links if requested
to (as seen above).
"
As you state, a Transformer does not feel right. In fact, a sitemap has 
now a new contract that it has to give links. The question is: how can 
it be made more versatile? How can we tell the pipeline where we want 
the link gathering to occur?

What about a named pipeline that is inserted by the link gatherer where 
it gets the links? What about using a special label to indicate where to 
gather links?

Just food for thought.

--
Nicola Ken Barozzi   [EMAIL PROTECTED]
- verba volant, scripta manent -
   (discussions get forgotten, just code remains)
-



RE: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-29 Thread Conal Tuohy
Jeff Turner wrote:

> > That's an issue I've come up against too - it seems that views are
> > still too "tangled" up with labels and can't cut across pipelines
> > properly. At least, that's how I understand it - maybe I'm missing
> > something?
>
> I think labels and Views are independent of each other.  You
> can have a
> view defined with 'from-position', and not use labels.
> Labels are just
> generic markers, with nothing to say they're only useful for defining
> views.

But with from-position you can have only "first" and "last" which is even
more restrictive than labels. If you want to do anything very sophisticated
don't you need labels?

> Views give _every_ public URL in a sitemap an alternative
> form.  If you
> only need an alternative form of some URLs, then that can be
> done just as
> you've described above, with a request-param selector.

So ... I could just use a RequestParamSelector to create my different
views for the crawler? Damn!

My problem was that I wanted to use Lucene to index a "content" view of 2
different pipelines, one of them based on TEI and another on HTML. In the
case of the TEI pipeline I didn't want to convert the TEI to HTML first and
then produce a "content" view based on an HTML-ized view of the TEI - I
wanted an indexable view of the TEI. This is the same issue as you mention
below:

> The problem is that Views don't know the type of data they're getting.
> If we have a view with from-label="content", we know it's content, but
> what _type_ of content?  What schema?  What transformation
> can we apply
> to create a links-view of this content?

If you could create more than one view with the same name, then we could use
labels to specify the schema:

e.g. 2 pipelines containing:
...

...

and



... and 2 views called "content", one with from-label="tei" and the other
with from-label="html".
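The pipeline markup here was eaten by the archive. If multiple views with one name were allowed, the sketch might look roughly like this (hypothetical syntax reconstructed from the surrounding prose; stylesheet and pattern names are assumed):

```xml
<!-- Two pipelines, each labelling the stage that carries its native schema -->
<map:match pattern="tei/**">
  <map:generate src="content/{1}.xml"/>
  <map:transform src="tei-enrich.xsl" label="tei"/>
  ...
</map:match>
<map:match pattern="html/**">
  <map:generate src="content/{1}.html" label="html"/>
  ...
</map:match>

<!-- Two views sharing the name "content", one per schema -->
<map:view name="content" from-label="tei">
  <map:transform src="tei-to-index.xsl"/>
  <map:serialize type="xml"/>
</map:view>
<map:view name="content" from-label="html">
  <map:serialize type="xml"/>
</map:view>
```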

Cheers

Con



Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Jeff Turner
On Sun, Jun 29, 2003 at 05:36:45PM +1200, Conal Tuohy wrote:
> Jeff Turner wrote:
> 
> 
> 
> > Also, it resolves another little dilemma I've had with link views.
> > It's all very well having the notion of a cross-cutting 'view', but
> > there's no way to override the 'view' for a specific pipeline.  With
> > an explicit gather-links transformer, one could have different link
> > analysis for each pipeline.  A *.css pipeline could list @import's as
> > links, for example.
> 
> That's an issue I've come up against too - it seems that views are
> still too "tangled" up with labels and can't cut across pipelines
> properly. At least, that's how I understand it - maybe I'm missing
> something?

I think labels and Views are independent of each other.  You can have a
view defined with 'from-position', and not use labels.  Labels are just
generic markers, with nothing to say they're only useful for defining
views.

> For instance I couldn't see how to have 2 pipelines share a view (i.e.
> both support a view) unless the 2 pipelines had a common stage
> somewhere.
> 
> I've always wondered why views weren't implemented using a Selector?
> 
> 
>   
>   
>   
>   
>   
>   
>   
>   
> 

Views give _every_ public URL in a sitemap an alternative form.  If you
only need an alternative form of some URLs, then that can be done just as
you've described above, with a request-param selector.
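As a hedged illustration of that selector approach: Cocoon ships a request-parameter selector, but the exact configuration below is a sketch from memory and may not match any particular release (parameter name, patterns, and stylesheets are assumed):

```xml
<map:selectors>
  <map:selector name="request-parameter"
      src="org.apache.cocoon.selection.RequestParameterSelector">
    <parameter-name>cocoon-view</parameter-name>
  </map:selector>
</map:selectors>

<map:match pattern="*.html">
  <map:generate src="content/{1}.xml"/>
  <map:select type="request-parameter">
    <!-- ?cocoon-view=links gets the link form of this URL -->
    <map:when test="links">
      <map:serialize type="links"/>
    </map:when>
    <map:otherwise>
      <map:transform src="skin.xsl"/>
      <map:serialize/>
    </map:otherwise>
  </map:select>
</map:match>
```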

The problem is that Views don't know the type of data they're getting.
If we have a view with from-label="content", we know it's content, but
what _type_ of content?  What schema?  What transformation can we apply
to create a links-view of this content?

That's why I'm looking forward to Cocoon 4.0, which will have strongly
typed pipelines.  Then the links view can see what kind of content it's
getting (say *.css), and apply an appropriate transformation to extract
links (@import'ed files).  Given the current release rate, Cocoon 4.0 is
due in early 2030.

--Jeff

> In this way different pipelines could have quite different views,
> without sharing a commonly-labelled component. I guess this is more
> verbose than the current approach, where the view transformations are
> attached by name using a label, but for some reason the label approach
> reminds me powerfully of GOTO.
> 
> Cheers
> 
> Con
> 


Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Jeff Turner
On Sat, Jun 28, 2003 at 03:38:55PM +0100, Upayavira wrote:
...
> Okay. How about defining a links namespace which
> gets consumed by the transformer, that way you choose in your
> previous XSLT which links you want to be spidered by presenting the
> links in that namespace (and then repeat them for the sake
> of the output).

Sounds good.  So you mean, eg, transforming  into , and the gather-links transformer uses
the link:href attribute?

...
> Now the only question that remains is whether to have an implicit
> gatherer if no explicit one is specified. I'd probably say no, as
> other discussions have erred away from hidden things like that.

+1

> I think that telling the sitemap where your links are is a pretty
> reasonable adjustment to your site. In fact, we could have two
> transformers - one that just looks for hrefs and xlinks, and another
> that uses a links namespace - the former would make it real easy to
> convert your site for spidering, and the latter would provide a method
> to do complex link management.

+1, was just going to suggest that.

> Another question - do we still leave link view (two pass) link
> following in the CLI? Or does this method deprecate and thus replace
> it?

I still have the feeling that a link-gatherer transformer is mixing
concerns a bit, and that two-pass is conceptually nicer:

- We're abusing the name 'transformer', since nothing is transformed.
  If we're really going to go this way, let's define a new sitemap
  element.
- Link gathering is irrelevant for online situations, so we pay some
  performance penalty having a link-gatherer transformer.  This
  illustrates why I think it mixes concerns.
- It's easy to forget to define a link-gatherer transformer for new
  pipelines.  Link-view is cross-cutting and doesn't have this
  problem.

I'm not very familiar with the code; is there some cost in keeping the
two-pass CLI alive, in the faint hope that caching comes to its rescue
one day?

> Thanks for engaging with me on this - I appreciate it.

Thank _you_; an improved CLI will make Forrest significantly more
usable.

--Jeff

> Regards, Upayavira
> 


RE: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Conal Tuohy
Jeff Turner wrote:



> Also, it resolves another little dilemma I've had with link
> views.  It's
> all very well having the notion of a cross-cutting 'view',
> but there's no
> way to override the 'view' for a specific pipeline.  With an explicit
> gather-links transformer, one could have different link
> analysis for each
> pipeline.  A *.css pipeline could list @import's as links,
> for example.

That's an issue I've come up against too - it seems that views are still too
"tangled" up with labels and can't cut across pipelines properly. At least,
that's how I understand it - maybe I'm missing something?

For instance I couldn't see how to have 2 pipelines share a view (i.e. both
support a view) unless the 2 pipelines had a common stage somewhere.

I've always wondered why views weren't implemented using a Selector?












In this way different pipelines could have quite different views, without
sharing a commonly-labelled component. I guess this is more verbose than the
current approach, where the view transformations are attached by name using
a label, but for some reason the label approach reminds me powerfully of
GOTO.

Cheers

Con



Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Nicola Ken Barozzi
Upayavira wrote, On 28/06/2003 17.00:

Nicola Ken, 

Sorry Jeff, but I don't have the time or energy to delve into this
discussion further. I'm getting a bit tired about it too. Don't get me
wrong, it's not about you, it's just that sometimes one loses interest
in some things.
FWIW, I'm pleased that Jeff is prepared to go along with these discussions - I think 
our original discussions only went so far. We got it down to one pass, and were pretty 
happy - we didn't really engage further with what the real consequences of that were, 
and what we potentially lost. And I think because we didn't do this, we haven't brought 
the rest of the Forrest community along with us. But now it is happening, which can 
only be good.
Between the two of us, things could only go so far. Jeff has finally brought the 
stuff into "the real world (TM)" and highlighted things we did not think about.

I'm happy that you're here to work with him on this now.
What you have described is what I mean to say too. :-)
--
Nicola Ken Barozzi   [EMAIL PROTECTED]
- verba volant, scripta manent -
   (discussions get forgotten, just code remains)
-



Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Upayavira
Nicola Ken, 

> Sorry Jeff, but I don't have the time or energy to delve into this
> discussion further. I'm getting a bit tired about it too. Don't get me
> wrong, it's not about you, it's just that sometimes one loses interest
> in some things.

FWIW, I'm pleased that Jeff is prepared to go along with these discussions - I think 
our original discussions only went so far. We got it down to one pass, and were pretty 
happy - we didn't really engage further with what the real consequences of that were, 
and what we potentially lost. And I think because we didn't do this, we haven't 
brought 
the rest of the Forrest community along with us. But now it is happening, which can 
only be good.

> First of all, all I want is speed and less memory usage. At least the
> same speed we are now getting with the new CLI. If any alternative
> scheme can be devised to get to comparable speed and possibly memory
> usage, I'm *completely* fine with it. IIUC What came out of the
> initial "new CLI" discussion is that a single pass can be regarded as
> both technically and conceptually better.

I agree entirely.

> Secondly, it seems to me that you are mixing conceptual decisions with
> what are in fact just implementation "details". Things you have
> pointed out, like fixed gatherer position for example, are just fruit
> of the initial implementation, not a thorough and important design
> decision, and thus still have to be improved upon and tested in the
> real world (for us it's Forrest). 

Exactly. And that is the point we are now coming to - we can move the link gathering 
stage where ever we like without slowing down the CLI. In fact, having an explicit one 
will speed things up, as we'll get rid of its use on cocoon: protocol pipelines.

> Finally, the new CLI is a WIP, so I applaud your effort in getting it
> better, so that it does not throw out the baby (link view) with the
> water (3 pass generation). I'm trying to see that in this process also
> the new features of the CLI (one pass gathering) are not thrown out
> themselves in the process of saving the baby ;-)

With the idea of two link gathering transformers, both of which can be placed 
anywhere in a pipeline, one which extracts hrefs and xlinks (as does the current link 
gatherer) and one which consumes a links namespace (which allows complete 
control over your followed links, just like the links view), I think we've got the 
best of 
both worlds. An empty bath with a baby in it :-)

Regards, Upayavira



Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Upayavira
Jeff wrote:

> > So are you saying you can manage without the XSLT stage?
> 
> I'm not sure, perhaps you can advise.  In Forrest we filter the links
> to:
> 
>  - Remove API doc links
>  - Remove links to directories, which break the CLI
>  - Remove image links that have been hacked to work with FOP
> 
> 1) belongs in cli.xconf.  Perhaps the new CLI handles 2) better than
> the original.  I think 3) is obsolete, as LinkSerializer ignores
> XSL:FO-namespaced links anyway.
> 
> > Perhaps I should explain what I had in mind a bit more with that - I
> > guess I would call it a tee, a pipeline element with one input and
> > two outputs. The input is passed unchanged on through to the next
> > stage in the pipeline. But it is also passed through an XSLT before
> > links are gathered from it.
> 
> I'd call it a hack ;)  Why favour XSLT and not STX, or any other
> transformer?  What about XSLT parameters? etc.  If people need XSLT,
> let them use a link view.  I'd suggest just sticking with the basics:
> 

Okay. How about defining a links namespace which gets 
consumed by the transformer. That way you choose in your previous XSLT which links 
you want to be spidered by presenting the links in that namespace (and then 
repeat them for the sake of the output).

This would be an extremely simple transformer to write. Beyond writing the 
transformer, it would take a minimal amount (1/2 hour) of changes to the rest of the 
CLI.
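A rough sketch of such a transformer as a bare SAX filter. The namespace URI and class name are invented, and a real Cocoon transformer would implement Cocoon's Transformer/XMLConsumer contracts rather than extend XMLFilterImpl; this only shows the "gather then strip" logic:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical: any attribute in a dedicated "links" namespace is recorded
// as a link to crawl, then removed so it never reaches the serialized output.
public class LinkNamespaceFilter extends XMLFilterImpl {
    public static final String LINK_NS = "http://example.org/cocoon/links"; // invented URI
    final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        AttributesImpl cleaned = new AttributesImpl();
        for (int i = 0; i < atts.getLength(); i++) {
            if (LINK_NS.equals(atts.getURI(i))) {
                links.add(atts.getValue(i)); // gather, don't forward
            } else {
                cleaned.addAttribute(atts.getURI(i), atts.getLocalName(i),
                        atts.getQName(i), atts.getType(i), atts.getValue(i));
            }
        }
        super.startElement(uri, local, qName, cleaned);
    }

    static List<String> demo() {
        try {
            SAXParserFactory f = SAXParserFactory.newInstance();
            f.setNamespaceAware(true);
            LinkNamespaceFilter filter = new LinkNamespaceFilter();
            filter.setParent(f.newSAXParser().getXMLReader());
            filter.parse(new InputSource(new StringReader(
                "<page xmlns:link='" + LINK_NS + "'>"
                + "<a href='faq.html' link:href='faq.html'>FAQ</a></page>")));
            return filter.links;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [faq.html]
    }
}
```

The "repeat them for the sake of the output" step from the text is visible in the demo input: the page keeps its ordinary href while the link:href copy is consumed by the filter.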

> Which isn't a hack.  In fact it would be great for Forrest, because we
> only have a few matchers where links are relevant.  All the cocoon:
> and image pipelines could go without.

Yup.

> Also, it resolves another little dilemma I've had with link views. 
> It's all very well having the notion of a cross-cutting 'view', but
> there's no way to override the 'view' for a specific pipeline.  With
> an explicit gather-links transformer, one could have different link
> analysis for each pipeline.  A *.css pipeline could list @import's as
> links, for example.

Great.

> > > It certainly fixes the hard-wired'ness problem you mention above
> > > (that 'content' != XML before the serializer).
> > 
> > And it sounds as if it could be a trivial solution.
> 
> 'Solves' the cocoon: sub-pipeline problem too.

Yup.

Now the only question that remains is whether to have an implicit gatherer if no 
explicit one is specified. I'd probably say no, as other discussions have erred away 
from hidden things like that.

I think that telling the sitemap where your links are is a pretty reasonable 
adjustment 
to your site. In fact, we could have two transformers - one that just looks for hrefs 
and 
xlinks, and another that uses a links namespace - the former would make it real easy 
to convert your site for spidering, and the latter would provide a method to do complex 
link 
management.

Another question - do we still leave link view (two pass) link following in the CLI? 
Or 
does this method deprecate and thus replace it?

Thanks for engaging with me on this - I appreciate it.

Regards, Upayavira





Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Jeff Turner
On Sat, Jun 28, 2003 at 11:07:45AM +0100, Upayavira wrote:
> On 28 Jun 2003 at 18:45, Jeff Turner wrote:
...
> > > 
> > >   
> > >   
> > >   
> > >
> > > 
> > > 
> > > So there's no hidden link gatherer. And you've got a single xslt to
> > > filter, etc. Not specifying src="xxx" skips the xsl stage. The
> > > output of this xsl would be xml conforming to a predefined
> > > namespace.
> > 
> > Having eliminated the dont-follow-these-links use-case, I don't see a
> > use-case for XSLT transformations, so it simplifies to a bare gather-links element.
> > 
> > 
> 
> So are you saying you can manage without the XSLT stage?

I'm not sure, perhaps you can advise.  In Forrest we filter the links to:

 - Remove API doc links
 - Remove links to directories, which break the CLI
 - Remove image links that have been hacked to work with FOP

1) belongs in cli.xconf.  Perhaps the new CLI handles 2) better than the
original.  I think 3) is obsolete, as LinkSerializer ignores
XSL:FO-namespaced links anyway.

> Perhaps I should explain what I had in mind a bit more with that - I
> guess I would call it a tee, a pipeline element with one input and two
> outputs. The input is passed unchanged on through to the next stage in
> the pipeline. But it is also passed through an XSLT before links are
> gathered from it.

I'd call it a hack ;)  Why favour XSLT and not STX, or any other
transformer?  What about XSLT parameters? etc.  If people need XSLT, let
them use a link view.  I'd suggest just sticking with the basics:



Which isn't a hack.  In fact it would be great for Forrest, because we
only have a few matchers where links are relevant.  All the cocoon: and
image pipelines could go without.

Also, it resolves another little dilemma I've had with link views.  It's
all very well having the notion of a cross-cutting 'view', but there's no
way to override the 'view' for a specific pipeline.  With an explicit
gather-links transformer, one could have different link analysis for each
pipeline.  A *.css pipeline could list @import's as links, for example.

> > It certainly fixes the hard-wired'ness problem you mention above (that
> > 'content' != XML before the serializer).
> 
> And it sounds as if it could be a trivial solution.

'Solves' the cocoon: sub-pipeline problem too.

--Jeff

> 
> Upayavira


Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Nicola Ken Barozzi
Jeff Turner wrote, On 28/06/2003 3.59:
...
I hope I've convinced you :)  Certainly for simpler needs, hardcoding
a LinkGathererTransformer is fine, but in general (and I hope where
Forrest is going) we need the full power of a link view.
Sorry Jeff, but I don't have the time or energy to delve into this 
discussion further. I'm getting a bit tired about it too.
Don't get me wrong, it's not about you, it's just that sometimes one 
loses interest in some things.

So please excuse me if I won't reply to your points and I just only 
present MHO.

First of all, all I want is speed and less memory usage. At least the 
same speed we are now getting with the new CLI. If any alternative 
scheme can be devised to get to comparable speed and possibly memory 
usage, I'm *completely* fine with it.
IIUC, what came out of the initial "new CLI" discussion is that a single 
pass can be regarded as both technically and conceptually better.

Secondly, it seems to me that you are mixing conceptual decisions with 
what are in fact just implementation "details".
Things you have pointed out, like fixed gatherer position for example, 
are just fruit of the initial implementation, not a thorough and 
important design decision, and thus still have to be improved upon and 
tested in the real world (for us it's Forrest).
This is the sole reason why I ask you to read that thread. It explains 
the design decisions, and can help you in not necessarily 
re-investigating stuff that has already been fruitfully discussed.

Finally, the new CLI is a WIP, so I applaud your effort in getting it 
better, so that it does not throw out the baby (link view) with the 
water (3 pass generation).
I'm trying to see that in this process also the new features of the CLI 
(one pass gathering) are not thrown out themselves in the process of 
saving the baby ;-)

Ciao :-)

--
Nicola Ken Barozzi   [EMAIL PROTECTED]
- verba volant, scripta manent -
   (discussions get forgotten, just code remains)
-



Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Upayavira
On 28 Jun 2003 at 18:45, Jeff Turner wrote:

> On Sat, Jun 28, 2003 at 07:29:49AM +0100, Upayavira wrote:
> > On 28 Jun 2003 at 11:59, Jeff Turner wrote:
> ...
> > Okay. For the CLI, the cli.xconf file is the equivalent of the
> > web.xml and the user agent. 
> > 
> > Now, normally the user agent requests a URI, and that's it. It is up
> > to the user agent as to what to do with that URI.
> 
> Oh I see.  Yep, makes sense that the 'user agent' be the one who
> decides whether or not to chase down links.
> 
> > Are you saying that you want to put the configuration as to where
> > pages should be placed into the sitemap?
> 
> No, that's the user agent's (CLI's) business.

Good.

> > Or an alternative would be to ask: can you always do your link view
> > with a single XSLT stage? If so:
> > 
> > 
> >   
> >   
> >   
> >
> > 
> > 
> > So there's no hidden link gatherer. And you've got a single xslt to
> > filter, etc. Not specifying src="xxx" skips the xsl stage. The
> > output of this xsl would be xml conforming to a predefined
> > namespace.
> 
> Having eliminated the dont-follow-these-links use-case, I don't see a
> use-case for XSLT transformations, so it simplifies to a bare gather-links element.
> 
> 

So are you saying you can manage without the XSLT stage? Perhaps I should 
explain what I had in mind a bit more with that - I guess I would call it a tee, a 
pipeline 
element with one input and two outputs. The input is passed unchanged on through 
to the next stage in the pipeline. But it is also passed through an XSLT before links 
are gathered from it.

Are you saying you can manage without this?

> It certainly fixes the hard-wired'ness problem you mention above (that
> 'content' != XML before the serializer).

And it sounds as if it could be a trivial solution.

Upayavira


Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-28 Thread Jeff Turner
On Sat, Jun 28, 2003 at 07:29:49AM +0100, Upayavira wrote:
> On 28 Jun 2003 at 11:59, Jeff Turner wrote:
...
> Okay. For the CLI, the cli.xconf file is the equivalent of the web.xml and the user 
> agent. 
> 
> Now, normally the user agent requests a URI, and that's it. It is up to the user 
> agent 
> as to what to do with that URI.

Oh I see.  Yep, makes sense that the 'user agent' be the one who decides
whether or not to chase down links.

> Are you saying that you want to put the configuration as to where pages
> should be placed into the sitemap?

No, that's the user agent's (CLI's) business.

...
> Yup. The primary aim was to reduce the number of page generations. And there was 
> an element of hack here - particularly in the 'hard-wired'ness of the LinkGatherer. 
...

> Or an alternative would be to ask: can you always do your link view
> with a single XSLT stage? If so:
> 
> 
>   
>   
>   
>   
>   
> 
> 
> So there's no hidden link gatherer. And you've got a single xslt to filter, etc. Not 
> specifying src="xxx" skips the xsl stage. The output of this xsl would be xml 
> conforming to a predefined namespace.

Having eliminated the dont-follow-these-links use-case, I don't see a
use-case for XSLT transformations, so it simplifies to a bare gather-links element.



It certainly fixes the hard-wired'ness problem you mention above (that
'content' != XML before the serializer).


--Jeff

> 
> Regards, Upayavira


Re: Link view goodness (Re: residuals of MIME type bug ?)

2003-06-27 Thread Upayavira
On 28 Jun 2003 at 11:59, Jeff Turner wrote:

> Conceptually, I like the link-view because:
> 
> 1) Links are URIs
> 2) The sitemap is 100% in control of the URI space
>
> implying:
> 
> 3) The sitemap ought to be in control of link URI manipulation, not
> some external cli.xconf file.

Okay. For the CLI, the cli.xconf file is the equivalent of the web.xml and the user 
agent. 

Now, normally the user agent requests a URI, and that's it. It is up to the user agent 
as to what to do with that URI. Are you saying that you want to put the configuration 
as to where pages should be placed into the sitemap? And which URIs should be 
rendered? If so, how would you do this?

Thing is, for me, that means hardwiring the URIs you want to render into your site, 
and doesn't allow for a dynamic regeneration of different parts of the site.

> Now for practicalities:
> 
> I like the fact that the sitemap writer has full control over what is
> considered a link, and what those links look like.  An invisible
> linkgatherer transformer effectively hardcodes:
> 
>   <map:serializer name="links"
>    src="org.apache.cocoon.serialization.LinkSerializer">
>     <encoding>ISO-8859-1</encoding>
>   </map:serializer>
> 
>   <map:view name="links" from-position="last">
>     <map:serialize type="links"/>
>   </map:view>

Yup. The primary aim was to reduce the number of page generations. And there was 
an element of hack here - particularly in the 'hard-wired'ness of the LinkGatherer. 

It has to be said that the link gatherer uses the same approach as the LinkTranslator, 
which is used by the 'mime-type checking' code. That's where I got the idea.

> There are various points of flexibility that the links view allows:
> 
> Alternative Link schemes
> 
> 
> If the user's XML doesn't happen to use XLink or @href for linking,
> they would implement an alternative to LinkSerializer.
> 
> For example, imagine we want to render only PDFs.  The last XSLT in
> our pipeline would produce xsl:fo.  The standard LinkSerializer
> doesn't know about fo:external-link elements.  Even if it did, we'd
> want to filter out links to images, since PDFs have images inlined.
> What is an image?  That's up to the sitemap writer.
> 
> Encoding
> 
> When serializing links in Japanese or something, wouldn't tweaking the
>  tag be necessary?
> 
> Filtering unwanted links
> 
> We can filter out unwanted links, with arbitrary precision (eg using
> XPath expressions to determine what to throw out).  In Forrest we use
> a filter stylesheet to filter out javadoc links.
> Eventually, 'api/' will be determined at runtime, by querying an input
> module that reads a forrest.xml config file.
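The filtering Jeff describes could be sketched as an identity transform with one pruning rule. The 'api/' prefix comes from the text above; the stylesheet itself is a hedged illustration, not Forrest's actual filter:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity: copy everything through unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- drop javadoc links before the LinkSerializer sees them -->
  <xsl:template match="@href[starts-with(., 'api/')]"/>

</xsl:stylesheet>
```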

I can (and already could) see these benefits. I would like to see a way to meet both 
of 
our requirements (a link view and single pass generation). Now, caching might be the 
simplest way. Or an alternative would be to ask: can you always do your link view with 
a single XSLT stage? If so:


  
  
  
  
  


So there's no hidden link gatherer. And you've got a single xslt to filter, etc. Not 
specifying src="xxx" skips the xsl stage. The output of this xsl would be xml 
conforming to a predefined namespace.
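The pipeline sketch that was lost from the archive might have looked roughly like this; the element name gather-links and the file names are guesses from the surrounding prose, not actual Cocoon syntax:

```xml
<map:match pattern="*.html">
  <map:generate src="content/{1}.xml"/>
  <map:transform src="skin.xsl"/>
  <!-- tee: pass SAX events through unchanged, run links.xsl over a
       copy, and gather links from its output; omitting src="xxx"
       would skip the xsl stage -->
  <map:gather-links src="links.xsl"/>
  <map:serialize/>
</map:match>
```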

> I hope I've convinced you :)  Certainly for simpler needs, hardcoding
> a LinkGathererTransformer is fine, but in general (and I hope where
> Forrest is going) we need the full power of a link view.

I've always been convinced - just don't like the double pass.

Regards, Upayavira

I think there's a place for both, but I'd like to get it