Re: Link view goodness (Re: residuals of MIME type bug ?)
On 1 Jul 2003 at 14:47, Vadim Gritsenko wrote:

> Jeff Turner wrote:
>
> > I'm not very familiar with the code; is there some cost in keeping
> > the two-pass CLI alive, in the faint hope that caching comes to its
> > rescue one day?
>
> Guys,
>
> Before you implement some approach here... let me suggest something.
>
> Right now the sitemap implementation automatically adds a link gatherer
> to the pipeline when it is invoked by the CLI. This link gatherer is in
> fact a "hard-coded links view". I suggest replacing this "hard-coded
> links view", a.k.a. link gatherer, with the "real" links view, BUT
> attaching it as a tee to the main pipeline instead of running it as a
> pipeline by itself. As a result, the links view "baby" will be kept,
> the two-pass "water" will be drained, and the sitemap syntax will stay
> the same. Moreover, the links view will still be accessible from the
> outside, meaning that you can spider the site using out-of-process
> spiders.
>
> Example. Given the pipeline:
>
>     G --> T1 (label="content") --> T2 --> S
>
> and the links view:
>
>     from-label="content" --> T3 --> LinkSerializer
>
> the pipeline built for the CLI request should be:
>
>     G --> T1 --> Tee --> T2 --> S --> OutputStream
>                   \
>                    --> LinkSerializer --> NullOutputStream
>                                       \
>                                        --> List of links in environment
>
> In one request, you will get:
>
> * the regular output of the pipeline, which will go to the destination
>   Source
> * the list of links in the environment, which is what the link gatherer
>   was made for

Splendid. I think that is exactly what I would want to do. We'd then have single(ish)-pass generation with the benefits of the links view. And if you just feed directly from the label into a serializer, it'll be pretty much the same in terms of performance as the LinkGatherer that we have now.

I would need help implementing this. Are you able to explain how? There's a lot of pipeline building there that I wouldn't yet know how to do (but I'm willing to give it a go with guidance).
If we're to use my current approach, we'd add a different serializer at the end of the second sub-pipe, which would take the links and put them into a specific List in the ObjectModel. In fact, we could create a LinkGatheringOutputStream that'd be handed to the LinkSerializer to do that. That would leave most of the complexity simply in building the pipeline.

Can you guarantee that cocoon.process() will not complete until both sub-pipelines have completed their work?

I'll take a bit of a look into the pipeline building code (if I can find it) to see what I can work out. This approach excites me. With help, I'd like to see if I can make it happen.

Regards, Upayavira
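To picture the tee concretely, here is a minimal sketch in plain JAXP/SAX of a consumer that forwards every event to two downstream handlers - one standing in for the rest of the main pipeline, one for the link-gathering branch. All class names are invented for illustration; this is not Cocoon's actual pipeline API.

```java
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed "tee": one SAX stream feeds both the
// main pipeline branch and a link-gathering branch in a single pass.
public class LinkTeeDemo {

    /** Forwards every SAX event to two downstream handlers. */
    static class TeeHandler extends DefaultHandler {
        private final DefaultHandler first, second;
        TeeHandler(DefaultHandler first, DefaultHandler second) {
            this.first = first;
            this.second = second;
        }
        @Override
        public void startElement(String uri, String local, String qName,
                                 Attributes atts) throws SAXException {
            first.startElement(uri, local, qName, atts);
            second.startElement(uri, local, qName, atts);
        }
        @Override
        public void endElement(String uri, String local, String qName)
                throws SAXException {
            first.endElement(uri, local, qName);
            second.endElement(uri, local, qName);
        }
        @Override
        public void characters(char[] ch, int start, int len)
                throws SAXException {
            first.characters(ch, start, len);
            second.characters(ch, start, len);
        }
    }

    /** Stand-in for the main branch (T2 --> S): just counts elements. */
    static class CountingHandler extends DefaultHandler {
        int elements;
        @Override
        public void startElement(String u, String l, String q, Attributes a) {
            elements++;
        }
    }

    /** Stand-in for LinkSerializer: records every href attribute seen. */
    static class LinkGatherer extends DefaultHandler {
        final List<String> links = new ArrayList<>();
        @Override
        public void startElement(String u, String l, String q, Attributes a) {
            String href = a.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    /** Runs one document through the tee; returns the gathered links. */
    public static List<String> runDemo(String xml) throws Exception {
        CountingHandler mainBranch = new CountingHandler();
        LinkGatherer gatherer = new LinkGatherer();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new TeeHandler(mainBranch, gatherer));
        return gatherer.links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><a href='one.html'>x</a><a href='two.html'>y</a></doc>";
        System.out.println(runDemo(xml)); // prints [one.html, two.html]
    }
}
```

Both branches see the same single parse of the document, which is the whole point: the link list is available in the same request that produced the regular output.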
Re: Link view goodness (Re: residuals of MIME type bug ?)
Jeff Turner wrote:

> I'm not very familiar with the code; is there some cost in keeping the
> two-pass CLI alive, in the faint hope that caching comes to its rescue
> one day?

Guys,

Before you implement some approach here... let me suggest something.

Right now the sitemap implementation automatically adds a link gatherer to the pipeline when it is invoked by the CLI. This link gatherer is in fact a "hard-coded links view". I suggest replacing this "hard-coded links view", a.k.a. link gatherer, with the "real" links view, BUT attaching it as a tee to the main pipeline instead of running it as a pipeline by itself. As a result, the links view "baby" will be kept, the two-pass "water" will be drained, and the sitemap syntax will stay the same. Moreover, the links view will still be accessible from the outside, meaning that you can spider the site using out-of-process spiders.

Example. Given the pipeline:

    G --> T1 (label="content") --> T2 --> S

and the links view:

    from-label="content" --> T3 --> LinkSerializer

the pipeline built for the CLI request should be:

    G --> T1 --> Tee --> T2 --> S --> OutputStream
                  \
                   --> LinkSerializer --> NullOutputStream
                                      \
                                       --> List of links in environment

In one request, you will get:

* the regular output of the pipeline, which will go to the destination Source
* the list of links in the environment, which is what the link gatherer was made for

Comments?

Vadim
Re: Link view goodness (Re: residuals of MIME type bug ?)
On Sun, Jun 29, 2003 at 11:34:01AM +0200, Nicola Ken Barozzi wrote:
> Jeff Turner wrote, On 29/06/2003 8.03:
...
> > - We're abusing the name 'transformer', since nothing is transformed.
> >   If we're really going to go this way, let's define a new sitemap
> >   element.
>
> There are transformers that do not transform, it's not unusual,

I can't think of any others?

> "
> So basically we are adding a contract to the sitemap, by saying that
> each sitemap implementation has to provide a list of links if requested
> to (as seen above).
> "
>
> As you state, a Transformer does not feel right. In fact, a sitemap
> has now a new contract that it has to give links. The question is: how
> can it be made more versatile? How can we tell the pipeline where we
> want the link gathering to occur?
>
> What about a named pipeline that is inserted by the link gatherer where
> it gets the links? What about using a special label to indicate where
> to gather links?

Hmm.. interesting. Perhaps we just need to augment Resources a bit. I.e., a Resource inserted in each pipeline after the 'content' label. Rather AOP'ish.

> Just food for thought.

Tasty..

--Jeff

> --
> Nicola Ken Barozzi [EMAIL PROTECTED]
> - verba volant, scripta manent -
> (discussions get forgotten, just code remains)
Re: Link view goodness (Re: residuals of MIME type bug ?)
Jeff Turner wrote, On 29/06/2003 8.03:
...
> I still have the feeling that a link-gatherer transformer is mixing
> concerns a bit, and that two-pass is conceptually nicer:
>
> - We're abusing the name 'transformer', since nothing is transformed.
>   If we're really going to go this way, let's define a new sitemap
>   element.

There are transformers that do not transform, it's not unusual, although, since the sitemap has a new contract on links (see at the bottom), it might make sense.

> - Link gathering is irrelevant for online situations, so we pay some
>   performance penalty having a link-gatherer transformer. This
>   illustrates why I think it mixes concerns.

Exactly.

> - It's easy to forget to define a link-gatherer transformer for new
>   pipelines. Link-view is cross-cutting and doesn't have this problem.

Again, exactly.

> I'm not very familiar with the code; is there some cost in keeping the
> two-pass CLI alive, in the faint hope that caching comes to its rescue
> one day?

Actually it was three-pass.

http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104013686220328&w=2

> > Thanks for engaging with me on this - I appreciate it.
>
> Thank _you_; an improved CLI will make Forrest significantly more
> usable.

For your pleasure, and of interested parties, the previous threads:

http://marc.theaimsgroup.com/?t=10272571031&r=1&w=2
http://marc.theaimsgroup.com/?t=10401370156&r=1&w=2
http://marc.theaimsgroup.com/?t=10460931492&r=1&w=2
http://marc.theaimsgroup.com/?t=10488703345&r=1&w=2

And a couple of mails:

http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104610949203967&w=2
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104679840022563&w=2
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=104687731531754&w=2

The last mail in particular explains the current new-CLI method:

"
So basically we are adding a contract to the sitemap, by saying that each sitemap implementation has to provide a list of links if requested to (as seen above).
"

As you state, a Transformer does not feel right. In fact, a sitemap has now a new contract that it has to give links. The question is: how can it be made more versatile? How can we tell the pipeline where we want the link gathering to occur?

What about a named pipeline that is inserted by the link gatherer where it gets the links? What about using a special label to indicate where to gather links?

Just food for thought.

--
Nicola Ken Barozzi [EMAIL PROTECTED]
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
RE: Link view goodness (Re: residuals of MIME type bug ?)
Jeff Turner wrote:

> > That's an issue I've come up against too - it seems that views are
> > still too "tangled" up with labels and can't cut across pipelines
> > properly. At least, that's how I understand it - maybe I'm missing
> > something?
>
> I think labels and Views are independent of each other. You can have a
> view defined with 'from-position', and not use labels. Labels are just
> generic markers, with nothing to say they're only useful for defining
> views.

But with from-position you can have only "first" and "last", which is even more restrictive than labels. If you want to do anything very sophisticated, don't you need labels?

> Views give _every_ public URL in a sitemap an alternative form. If you
> only need an alternative form of some URLs, then that can be done just
> as you've described above, with a request-param selector.

So ... I could just use a RequestParamSelector to create my different views for the crawler? Damn!

My problem was that I wanted to use Lucene to index a "content" view of 2 different pipelines, one of them based on TEI and another on HTML. In the case of the TEI pipeline I didn't want to convert the TEI to HTML first and then produce a "content" view based on an HTML-ized view of the TEI - I wanted an indexable view of the TEI. This is the same issue as you mention below:

> The problem is that Views don't know the type of data they're getting.
> If we have a view with from-label="content", we know it's content, but
> what _type_ of content? What schema? What transformation can we apply
> to create a links-view of this content?

If you could create more than one view with the same name, then we could use labels to specify the schema: e.g. 2 pipelines, one with a stage labelled "tei" and the other with a stage labelled "html", and 2 views called "content", one with from-label="tei" and the other with from-label="html".

Cheers

Con
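For readers unfamiliar with the syntax being discussed: a view bound to a label is declared in a Cocoon 2.x sitemap roughly as below. This is a hedged sketch - the match pattern, sources, and stylesheet name are invented, and only the general shape (a map:view with from-label, and a label attribute on a pipeline stage) reflects the real convention.

```xml
<map:views>
  <!-- the view taps into any pipeline stage carrying this label -->
  <map:view name="content" from-label="content">
    <map:serialize type="xml"/>
  </map:view>
</map:views>

<map:pipelines>
  <map:pipeline>
    <map:match pattern="*.html">
      <map:generate src="content/{1}.xml"/>
      <!-- label marks the tap point for the view above -->
      <map:transform src="doc2html.xsl" label="content"/>
      <map:serialize type="html"/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```

Conal's proposal amounts to allowing two map:view declarations to share the name "content" while pointing at different labels ("tei", "html"), so each pipeline exposes a schema-appropriate version of the same logical view.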
Re: Link view goodness (Re: residuals of MIME type bug ?)
On Sun, Jun 29, 2003 at 05:36:45PM +1200, Conal Tuohy wrote:
> Jeff Turner wrote:
> > Also, it resolves another little dilemma I've had with link views.
> > It's all very well having the notion of a cross-cutting 'view', but
> > there's no way to override the 'view' for a specific pipeline. With
> > an explicit gather-links transformer, one could have different link
> > analysis for each pipeline. A *.css pipeline could list @import's as
> > links, for example.
>
> That's an issue I've come up against too - it seems that views are
> still too "tangled" up with labels and can't cut across pipelines
> properly. At least, that's how I understand it - maybe I'm missing
> something?

I think labels and Views are independent of each other. You can have a view defined with 'from-position', and not use labels. Labels are just generic markers, with nothing to say they're only useful for defining views.

> For instance I couldn't see how to have 2 pipelines share a view (i.e.
> both support a view) unless the 2 pipelines had a common stage
> somewhere.
>
> I've always wondered why views weren't implemented using a Selector?

Views give _every_ public URL in a sitemap an alternative form. If you only need an alternative form of some URLs, then that can be done just as you've described above, with a request-param selector.

The problem is that Views don't know the type of data they're getting. If we have a view with from-label="content", we know it's content, but what _type_ of content? What schema? What transformation can we apply to create a links-view of this content?

That's why I'm looking forward to Cocoon 4.0, which will have strongly typed pipelines. Then the links view can see what kind of content it's getting (say *.css), and apply an appropriate transformation to extract links (@import'ed files). Given the current release rate, Cocoon 4.0 is due in early 2030.

--Jeff

> In this way different pipelines could have quite different views,
> without sharing a commonly-labelled component. I guess this is more
> verbose than the current approach, where the view transformations are
> attached by name using a label, but for some reason the label approach
> reminds me powerfully of GOTO.
>
> Cheers
>
> Con
Re: Link view goodness (Re: residuals of MIME type bug ?)
On Sat, Jun 28, 2003 at 03:38:55PM +0100, Upayavira wrote:
...
> Okay. How about defining a namespace which gets consumed by the
> transformer, that way you choose in your previous XSLT which links you
> want to be spidered by presenting the links in that namespace (and
> then repeat them for the sake of the output).

Sounds good. So you mean, eg, transforming hrefs into elements in a link namespace, and the gather-links transformer uses the link:href attribute?

...
> Now the only question that remains is whether to have an implicit
> gatherer if no explicit one is specified. I'd probably say no, as
> other discussions have erred away from hidden things like that.

+1

> I think that telling the sitemap where your links are is a pretty
> reasonable adjustment to your site. In fact, we could have two
> transformers - one that just looks for hrefs and xlinks, and another
> that uses a links namespace - the former would make it real easy to
> convert your site for spidering, and the latter providing a method to
> do complex link management.

+1, was just going to suggest that.

> Another question - do we still leave link view (two pass) link
> following in the CLI? Or does this method deprecate and thus replace
> it?

I still have the feeling that a link-gatherer transformer is mixing concerns a bit, and that two-pass is conceptually nicer:

- We're abusing the name 'transformer', since nothing is transformed. If we're really going to go this way, let's define a new sitemap element.

- Link gathering is irrelevant for online situations, so we pay some performance penalty having a link-gatherer transformer. This illustrates why I think it mixes concerns.

- It's easy to forget to define a link-gatherer transformer for new pipelines. Link-view is cross-cutting and doesn't have this problem.

I'm not very familiar with the code; is there some cost in keeping the two-pass CLI alive, in the faint hope that caching comes to its rescue one day?

> Thanks for engaging with me on this - I appreciate it.

Thank _you_; an improved CLI will make Forrest significantly more usable.

--Jeff

> Regards, Upayavira
RE: Link view goodness (Re: residuals of MIME type bug ?)
Jeff Turner wrote:

> Also, it resolves another little dilemma I've had with link views.
> It's all very well having the notion of a cross-cutting 'view', but
> there's no way to override the 'view' for a specific pipeline. With an
> explicit gather-links transformer, one could have different link
> analysis for each pipeline. A *.css pipeline could list @import's as
> links, for example.

That's an issue I've come up against too - it seems that views are still too "tangled" up with labels and can't cut across pipelines properly. At least, that's how I understand it - maybe I'm missing something?

For instance I couldn't see how to have 2 pipelines share a view (i.e. both support a view) unless the 2 pipelines had a common stage somewhere.

I've always wondered why views weren't implemented using a Selector? In this way different pipelines could have quite different views, without sharing a commonly-labelled component. I guess this is more verbose than the current approach, where the view transformations are attached by name using a label, but for some reason the label approach reminds me powerfully of GOTO.

Cheers

Con
Re: Link view goodness (Re: residuals of MIME type bug ?)
Upayavira wrote, On 28/06/2003 17.00:

> Nicola Ken,
>
> > Sorry Jeff, but I don't have time nor energy to delve into this
> > discussion further. I'm getting a bit tired about it too. Don't get
> > me wrong, it's not about you, it's just that sometimes one loses
> > interest in some things.
>
> FWIW, I'm pleased that Jeff is prepared to go along with these
> discussions - I think our original discussions only went so far. We
> got it down to one pass, and were pretty happy - we didn't really
> engage further with what the real consequences of that were, and what
> we potentially lost. And I think because we didn't do this, we haven't
> brought the rest of the Forrest community along with us. But now it is
> happening, which can only be good.

Between two people, things can only go so far. Jeff has finally brought the stuff into "the real world (TM)" and highlighted things we did not think about. I'm happy that you're here to work with him on this now.

What you have described is what I mean to say too. :-)

--
Nicola Ken Barozzi [EMAIL PROTECTED]
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
Re: Link view goodness (Re: residuals of MIME type bug ?)
Nicola Ken,

> Sorry Jeff, but I don't have time nor energy to delve into this
> discussion further. I'm getting a bit tired about it too. Don't get me
> wrong, it's not about you, it's just that sometimes one loses interest
> in some things.

FWIW, I'm pleased that Jeff is prepared to go along with these discussions - I think our original discussions only went so far. We got it down to one pass, and were pretty happy - we didn't really engage further with what the real consequences of that were, and what we potentially lost. And I think because we didn't do this, we haven't brought the rest of the Forrest community along with us. But now it is happening, which can only be good.

> First of all, all I want is speed and less memory usage. At least the
> same speed we are now getting with the new CLI. If any alternative
> scheme can be devised to get to comparable speed and possibly memory
> usage, I'm *completely* fine with it. IIUC what came out of the
> initial "new CLI" discussion is that a single pass can be regarded as
> both technically and conceptually better.

I agree entirely.

> Secondly, it seems to me that you are mixing conceptual decisions with
> what are in fact just implementation "details". Things you have
> pointed out, like the fixed gatherer position for example, are just
> fruit of the initial implementation, not a thorough and important
> design decision, and thus have still to be improved upon and tested in
> the real world (for us it's Forrest).

Exactly. And that is the point we are now coming to - we can move the link gathering stage wherever we like without slowing down the CLI. In fact, having an explicit one will speed things up, as we'll get rid of its use on cocoon: protocol pipelines.

> Finally, the new CLI is a WIP, so I applaud your effort in getting it
> better, so that it does not throw out the baby (link view) with the
> water (3-pass generation). I'm trying to see that in this process also
> the new features of the CLI (one-pass gathering) are not thrown out
> themselves in the process of saving the baby ;-)

With the idea of two link gathering transformers, both of which can be placed anywhere in a pipeline, one which extracts hrefs and xlinks (as does the current link gatherer) and one which consumes a links namespace (which allows complete control over your followed links, just like the links view), I think we've got the best of both worlds. An empty bath with a baby in it :-)

Regards, Upayavira
Re: Link view goodness (Re: residuals of MIME type bug ?)
Jeff wrote:

> > So are you saying you can manage without the XSLT stage?
>
> I'm not sure, perhaps you can advise. In Forrest we filter the links
> to:
>
> - Remove API doc links
> - Remove links to directories, which break the CLI
> - Remove image links that have been hacked to work with FOP
>
> 1) belongs in cli.xconf. Perhaps the new CLI handles 2) better than
> the original. I think 3) is obsolete, as LinkSerializer ignores
> XSL:FO-namespaced links anyway.
>
> > Perhaps I should explain what I had in mind a bit more with that - I
> > guess I would call it a tee, a pipeline element with one input and
> > two outputs. The input is passed unchanged on through to the next
> > stage in the pipeline. But it is also passed through an XSLT before
> > links are gathered from it.
>
> I'd call it a hack ;) Why favour XSLT and not STX, or any other
> transformer? What about XSLT parameters? etc. If people need XSLT,
> let them use a link view. I'd suggest just sticking with the basics.

Okay. How about defining a namespace which gets consumed by the transformer? That way you choose in your previous XSLT which links you want to be spidered by presenting the links in that namespace (and then repeat them for the sake of the output). This would be an extremely simple transformer to write. Beyond writing the transformer, it would take a minimal amount (1/2 hour) of changes to the rest of the CLI.

> Which isn't a hack. In fact it would be great for Forrest, because we
> only have a few matchers where links are relevant. All the cocoon:
> and image pipelines could go without.

Yup.

> Also, it resolves another little dilemma I've had with link views.
> It's all very well having the notion of a cross-cutting 'view', but
> there's no way to override the 'view' for a specific pipeline. With
> an explicit gather-links transformer, one could have different link
> analysis for each pipeline. A *.css pipeline could list @import's as
> links, for example.

Great.

> > > It certainly fixes the hard-wired'ness problem you mention above
> > > (that 'content' != XML before the serializer).
> >
> > And it sounds as if it could be a trivial solution.
>
> 'Solves' the cocoon: sub-pipeline problem too.

Yup.

Now the only question that remains is whether to have an implicit gatherer if no explicit one is specified. I'd probably say no, as other discussions have erred away from hidden things like that. I think that telling the sitemap where your links are is a pretty reasonable adjustment to your site. In fact, we could have two transformers - one that just looks for hrefs and xlinks, and another that uses a links namespace - the former would make it real easy to convert your site for spidering, and the latter providing a method to do complex link management.

Another question - do we still leave link view (two pass) link following in the CLI? Or does this method deprecate and thus replace it?

Thanks for engaging with me on this - I appreciate it.

Regards, Upayavira
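The second transformer described here - one that consumes a links namespace - could look something like the following SAX filter sketch: elements in a dedicated namespace are recorded for the crawler and dropped from the output stream, while everything else passes through untouched. The namespace URI and class name are invented for illustration; this is not the transformer that was eventually written.

```java
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;
import javax.xml.parsers.SAXParserFactory;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical link-namespace transformer: records href attributes of
// elements in LINKS_NS and consumes those elements (they never reach the
// downstream handler); all other events are forwarded unchanged.
public class LinkNamespaceFilter extends XMLFilterImpl {
    static final String LINKS_NS = "http://example.org/links/1.0"; // invented

    final List<String> gathered = new ArrayList<>();
    private int suppressDepth = 0; // > 0 while inside a links-namespace element

    @Override
    public void startElement(String uri, String local, String qName,
                             Attributes atts) throws SAXException {
        if (LINKS_NS.equals(uri)) {
            if (suppressDepth == 0) {
                String href = atts.getValue("href");
                if (href != null) {
                    gathered.add(href);
                }
            }
            suppressDepth++; // swallow the element and everything inside it
            return;
        }
        if (suppressDepth == 0) {
            super.startElement(uri, local, qName, atts);
        }
    }

    @Override
    public void endElement(String uri, String local, String qName)
            throws SAXException {
        if (LINKS_NS.equals(uri)) {
            suppressDepth--;
            return;
        }
        if (suppressDepth == 0) {
            super.endElement(uri, local, qName);
        }
    }

    @Override
    public void characters(char[] ch, int start, int len) throws SAXException {
        if (suppressDepth == 0) {
            super.characters(ch, start, len);
        }
    }

    /** Parses the document through the filter; returns the gathered links. */
    public static List<String> gather(String xml) throws Exception {
        SAXParserFactory f = SAXParserFactory.newInstance();
        f.setNamespaceAware(true);
        LinkNamespaceFilter filter = new LinkNamespaceFilter();
        filter.setParent(f.newSAXParser().getXMLReader());
        filter.setContentHandler(new DefaultHandler()); // downstream sink
        filter.parse(new InputSource(new StringReader(xml)));
        return filter.gathered;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc xmlns:l='" + LINKS_NS + "'>"
                   + "<l:link href='a.html'/><p>text</p></doc>";
        System.out.println(gather(xml)); // prints [a.html]
    }
}
```

The stylesheet upstream decides which links appear in the namespace, which is exactly the "complete control over your followed links" property claimed for this variant.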
Re: Link view goodness (Re: residuals of MIME type bug ?)
On Sat, Jun 28, 2003 at 11:07:45AM +0100, Upayavira wrote:
> On 28 Jun 2003 at 18:45, Jeff Turner wrote:
...
> > > So there's no hidden link gatherer. And you've got a single xslt
> > > to filter, etc. Not specifying src="xxx" skips the xsl stage. The
> > > output of this xsl would be xml conforming to a predefined
> > > namespace.
> >
> > Having eliminated the dont-follow-these-links use-case, I don't see
> > a use-case for XSLT transformations, so it simplifies to
>
> So are you saying you can manage without the XSLT stage?

I'm not sure, perhaps you can advise. In Forrest we filter the links to:

- Remove API doc links
- Remove links to directories, which break the CLI
- Remove image links that have been hacked to work with FOP

1) belongs in cli.xconf. Perhaps the new CLI handles 2) better than the original. I think 3) is obsolete, as LinkSerializer ignores XSL:FO-namespaced links anyway.

> Perhaps I should explain what I had in mind a bit more with that - I
> guess I would call it a tee, a pipeline element with one input and two
> outputs. The input is passed unchanged on through to the next stage in
> the pipeline. But it is also passed through an XSLT before links are
> gathered from it.

I'd call it a hack ;) Why favour XSLT and not STX, or any other transformer? What about XSLT parameters? etc. If people need XSLT, let them use a link view. I'd suggest just sticking with the basics.

Which isn't a hack. In fact it would be great for Forrest, because we only have a few matchers where links are relevant. All the cocoon: and image pipelines could go without.

Also, it resolves another little dilemma I've had with link views. It's all very well having the notion of a cross-cutting 'view', but there's no way to override the 'view' for a specific pipeline. With an explicit gather-links transformer, one could have different link analysis for each pipeline. A *.css pipeline could list @import's as links, for example.

> > It certainly fixes the hard-wired'ness problem you mention above
> > (that 'content' != XML before the serializer).
>
> And it sounds as if it could be a trivial solution.

'Solves' the cocoon: sub-pipeline problem too.

--Jeff

> Upayavira
Re: Link view goodness (Re: residuals of MIME type bug ?)
Jeff Turner wrote, On 28/06/2003 3.59:
...
> I hope I've convinced you :) Certainly for simpler needs, hardcoding a
> LinkGathererTransformer is fine, but in general (and I hope where
> Forrest is going) we need the full power of a link view.

Sorry Jeff, but I don't have time nor energy to delve into this discussion further. I'm getting a bit tired about it too. Don't get me wrong, it's not about you, it's just that sometimes one loses interest in some things. So please excuse me if I don't reply to your points and just present MHO.

First of all, all I want is speed and less memory usage. At least the same speed we are now getting with the new CLI. If any alternative scheme can be devised to get to comparable speed and possibly memory usage, I'm *completely* fine with it. IIUC, what came out of the initial "new CLI" discussion is that a single pass can be regarded as both technically and conceptually better.

Secondly, it seems to me that you are mixing conceptual decisions with what are in fact just implementation "details". Things you have pointed out, like the fixed gatherer position for example, are just fruit of the initial implementation, not a thorough and important design decision, and thus have still to be improved upon and tested in the real world (for us it's Forrest). This is the sole reason why I ask you to read that thread. It explains the design decisions, and can help you in not necessarily re-investigating stuff that has already been fruitfully discussed.

Finally, the new CLI is a WIP, so I applaud your effort in getting it better, so that it does not throw out the baby (link view) with the water (3-pass generation). I'm trying to see that in this process the new features of the CLI (one-pass gathering) are not thrown out themselves in the process of saving the baby ;-)

Ciao :-)

--
Nicola Ken Barozzi [EMAIL PROTECTED]
- verba volant, scripta manent -
(discussions get forgotten, just code remains)
Re: Link view goodness (Re: residuals of MIME type bug ?)
On 28 Jun 2003 at 18:45, Jeff Turner wrote:
> On Sat, Jun 28, 2003 at 07:29:49AM +0100, Upayavira wrote:
> > On 28 Jun 2003 at 11:59, Jeff Turner wrote:
...
> > Okay. For the CLI, the cli.xconf file is the equivalent of the
> > web.xml and the user agent.
> >
> > Now, normally the user agent requests a URI, and that's it. It is up
> > to the user agent as to what to do with that URI.
>
> Oh I see. Yep, makes sense that the 'user agent' be the one who
> decides whether or not to chase down links.
>
> > Are you saying that you want to put the configuration as to where
> > pages should be placed into the sitemap?
>
> No, that's the user agent's (CLI's) business.

Good.

> > Or an alternative would be to ask: can you always do your link view
> > with a single XSLT stage?
> >
> > So there's no hidden link gatherer. And you've got a single xslt to
> > filter, etc. Not specifying src="xxx" skips the xsl stage. The
> > output of this xsl would be xml conforming to a predefined
> > namespace.
>
> Having eliminated the dont-follow-these-links use-case, I don't see a
> use-case for XSLT transformations, so it simplifies to

So are you saying you can manage without the XSLT stage?

Perhaps I should explain what I had in mind a bit more with that - I guess I would call it a tee, a pipeline element with one input and two outputs. The input is passed unchanged on through to the next stage in the pipeline. But it is also passed through an XSLT before links are gathered from it. Are you saying you can manage without this?

> It certainly fixes the hard-wired'ness problem you mention above (that
> 'content' != XML before the serializer).

And it sounds as if it could be a trivial solution.

Upayavira
Re: Link view goodness (Re: residuals of MIME type bug ?)
On Sat, Jun 28, 2003 at 07:29:49AM +0100, Upayavira wrote:
> On 28 Jun 2003 at 11:59, Jeff Turner wrote:
...
> Okay. For the CLI, the cli.xconf file is the equivalent of the web.xml
> and the user agent.
>
> Now, normally the user agent requests a URI, and that's it. It is up
> to the user agent as to what to do with that URI.

Oh I see. Yep, makes sense that the 'user agent' be the one who decides whether or not to chase down links.

> Are you saying that you want to put the configuration as to where
> pages should be placed into the sitemap?

No, that's the user agent's (CLI's) business.

...
> Yup. The primary aim was to reduce the number of page generations. And
> there was an element of hack here - particularly in the
> 'hard-wired'ness of the LinkGatherer.
...
> Or an alternative would be to ask: can you always do your link view
> with a single XSLT stage? If so:
>
> So there's no hidden link gatherer. And you've got a single xslt to
> filter, etc. Not specifying src="xxx" skips the xsl stage. The output
> of this xsl would be xml conforming to a predefined namespace.

Having eliminated the dont-follow-these-links use-case, I don't see a use-case for XSLT transformations, so it simplifies to

It certainly fixes the hard-wired'ness problem you mention above (that 'content' != XML before the serializer).

--Jeff

> Regards, Upayavira
Re: Link view goodness (Re: residuals of MIME type bug ?)
On 28 Jun 2003 at 11:59, Jeff Turner wrote:

> Conceptually, I like the link-view because:
>
> 1) Links are URIs
> 2) The sitemap is 100% in control of the URI space
>
> implying:
>
> 3) The sitemap ought to be in control of link URI manipulation, not
> some external cli.xconf file.

Okay. For the CLI, the cli.xconf file is the equivalent of the web.xml and the user agent.

Now, normally the user agent requests a URI, and that's it. It is up to the user agent as to what to do with that URI. Are you saying that you want to put the configuration as to where pages should be placed into the sitemap? And which URIs should be rendered? If so, how would you do this? Thing is, for me, that means hardwiring the URIs you want to render into your site, and doesn't allow for a dynamic regeneration of different parts of the site.

> Now for practicalities:
>
> I like the fact that the sitemap writer has full control over what is
> considered a link, and what those links look like. An invisible
> linkgatherer transformer effectively hardcodes:
>
>     src="org.apache.cocoon.serialization.LinkSerializer">
>     ISO-8859-1

Yup. The primary aim was to reduce the number of page generations. And there was an element of hack here - particularly in the 'hard-wired'ness of the LinkGatherer. It has to be said that the link gatherer uses the same approach as the LinkTranslator, which is used by the 'mime-type checking' code. That's where I got the idea.

> There are various points of flexibility that the links view allows:
>
> Alternative link schemes
>
> If the user's XML doesn't happen to use XLink or @href for linking,
> they would implement an alternative to LinkSerializer.
>
> For example, imagine we want to render only PDFs. The last XSLT in our
> pipeline would produce xsl:fo. The standard LinkSerializer doesn't
> know about fo:external-link elements. Even if it did, we'd want to
> filter out links to images, since PDFs have images inlined. What is an
> image? That's up to the sitemap writer.
>
> Encoding
>
> When serializing links in Japanese or something, wouldn't tweaking the
> tag be necessary?
>
> Filtering unwanted links
>
> We can filter out unwanted links, with arbitrary precision (eg using
> XPath expressions to determine what to throw out). In Forrest we use
> it to filter out javadoc links. Eventually, 'api/' will be determined
> at runtime, by querying an input module that reads a forrest.xml
> config file.

I can (and already could) see these benefits. I would like to see a way to meet both of our requirements (a link view and single-pass generation). Now, caching might be the simplest way. Or an alternative would be to ask: can you always do your link view with a single XSLT stage? If so:

So there's no hidden link gatherer. And you've got a single xslt to filter, etc. Not specifying src="xxx" skips the xsl stage. The output of this xsl would be xml conforming to a predefined namespace.

> I hope I've convinced you :) Certainly for simpler needs, hardcoding a
> LinkGathererTransformer is fine, but in general (and I hope where
> Forrest is going) we need the full power of a link view.

I've always been convinced - just don't like the double pass.

Regards, Upayavira

I think there's a place for both, but I'd like to get it
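The "filtering unwanted links" point above - dropping javadoc links under 'api/', for instance - can be pictured as a simple exclude-pattern filter applied to the gathered links before the crawler queues them. This is a hypothetical sketch using regex patterns, not the CLI's or Forrest's actual mechanism (which the thread says uses XPath-style rules).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical link filter: drops gathered links matching any exclude
// pattern, as Forrest does for javadoc ("api/") links.
public class LinkFilter {
    private final List<Pattern> excludes = new ArrayList<>();

    /** Adds an exclude pattern; returns this for chaining. */
    public LinkFilter exclude(String regex) {
        excludes.add(Pattern.compile(regex));
        return this;
    }

    /** Returns only the links that match no exclude pattern. */
    public List<String> filter(List<String> links) {
        List<String> kept = new ArrayList<>();
        for (String link : links) {
            boolean excluded = false;
            for (Pattern p : excludes) {
                if (p.matcher(link).find()) {
                    excluded = true;
                    break;
                }
            }
            if (!excluded) {
                kept.add(link);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        LinkFilter f = new LinkFilter().exclude("^api/");
        System.out.println(f.filter(List.of("index.html", "api/Foo.html")));
        // prints [index.html]
    }
}
```

Determining the 'api/' prefix at runtime, as Jeff suggests, would just mean populating the exclude list from configuration instead of hardcoding it.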