[jira] Created: (COCOON-1909) Cache validity of XSLT stylesheets does not reflect included or imported stylesheets.
Cache validity of XSLT stylesheets does not reflect included or imported stylesheets.
Key: COCOON-1909
URL: http://issues.apache.org/jira/browse/COCOON-1909
Project: Cocoon
Issue Type: Bug
Components: Sitemap
Affects Versions: 2.1.9
Reporter: Conal Tuohy

XSLT stylesheets which either import or include other stylesheets are cached too aggressively: if you change an imported or included stylesheet, the change does not take effect until you update the main stylesheet. This bug is supposed to have been fixed years ago, but it still doesn't work for us.
[jira] Created: (COCOON-1880) [PATCH] Allow LuceneIndexTransformer to index large documents (with more than 10k terms)
[PATCH] Allow LuceneIndexTransformer to index large documents (with more than 10k terms)
Key: COCOON-1880
URL: http://issues.apache.org/jira/browse/COCOON-1880
Project: Cocoon
Type: Bug
Components: Blocks: Lucene
Versions: 2.1.8, 2.1.9
Reporter: Conal Tuohy

The LuceneIndexTransformer did not provide a way to override the default maxFieldLength value of a Lucene IndexWriter. Since the default value is 10k, this means that large documents cannot be indexed.
[jira] Updated: (COCOON-1880) [PATCH] Allow LuceneIndexTransformer to index large documents (with more than 10k terms)
[ http://issues.apache.org/jira/browse/COCOON-1880?page=all ]

Conal Tuohy updated COCOON-1880:
Attachment: LuceneIndexTransformer.diff

The patch adds the facility to set a max-field-length parameter, in the same way as all the other parameters. The JavaDoc has also been expanded significantly.

[PATCH] Allow LuceneIndexTransformer to index large documents (with more than 10k terms)
Key: COCOON-1880
URL: http://issues.apache.org/jira/browse/COCOON-1880
Project: Cocoon
Type: Bug
Components: Blocks: Lucene
Versions: 2.1.8, 2.1.9
Reporter: Conal Tuohy
Attachments: LuceneIndexTransformer.diff

The LuceneIndexTransformer did not provide a way to override the default maxFieldLength value of a Lucene IndexWriter. Since the default value is 10k, this means that large documents cannot be indexed.
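For illustration, this is roughly how the new parameter might be passed in a sitemap, assuming the transformer is declared under the type name "lucene-index" and that 100000 is a big enough limit (both the type name and the value are illustrative, not taken from the patch):

<map:transform type="lucene-index">
  <!-- hypothetical setting: raise Lucene's default maxFieldLength of 10,000 terms -->
  <map:parameter name="max-field-length" value="100000"/>
</map:transform>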
RE: Document publishing worklow (was Re: Expert pre-defined/community post-defined?)
Ross Gardler wrote:
Stefano Mazzocchi wrote: I like the notion of daisy -> forrest -> out; it makes very good sense. Now we just need to find a way to automate that workflow a little, but without introducing security vulnerabilities.
How about this (bullet summaries followed by textual description)?
Daisy as editing environment:
- anyone can edit
- people are given quick committership here
- change mails are sent out by Daisy for docs committer review
- docs committers review changes and publish to the daisy repo
Forrest as publishing tool:
- cocoon committers manage a
- brings together XDocs, wiki docs, Daisy docs etc.
- publish to (or dynamically host on) a staging server
Official Docs site:
- live Apache site is pulled from staging server

Regarding the structuring of the resulting documentation (as opposed to the content) ... can you comment on that aspect? Does Daisy have support for tagging (whether folksonomy or controlled vocabularies of some sort)? How might these tags integrate with the core concepts in xdocs, the Forrest sitemap, etc? I've been working recently on a website upgrade project which uses XML Topic Maps to deal with these issues - the site was relaunched just last week. Now I'm wondering if the Cocoon community might benefit from using XTM. It looks like a classic application for topic maps. I think they could help by providing a set of logical hooks to tag content with, and tools for harvesting metadata, merging topics, querying etc.
Con
RE: [RT] CTemplate
Glen Ezkovich wrote: You seem to be assuming this is a global configuration issue. If I were to make this configurable I would do it on a template-by-template basis, since in the general case this would not be necessary, but in a special case it might be expedient. Once you turn it on globally, say hello to Barbara Eden: the genie is out of the bottle. Now if you could turn it on per pipeline in the sitemap, you'd be golden. From a user perspective I think it makes more sense to turn it on in the template as a processing instruction.

NB if the feature were enabled or disabled using an optional attribute on the root element, it could still be controlled centrally through the sitemap. To prevent template authors from enabling this feature, it would be enough to transform their templates before execution to disable the feature:

<map:match pattern="*.jx">
  <map:generate src="{1}.jx"/>
  <map:transform src="xsl/disable-java.xsl"/>
  <map:serialize type="xml"/>
</map:match>

So you could lock this feature (or any feature) away from your template authors. NB this is just another application of the general pattern of dynamic templates, something that was discussed at some length about a month ago(?).
Cheers
Con
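As a minimal sketch of what such a disable-java.xsl might look like: an identity transform that strips the enabling attribute from the template's root element (the attribute name "allow-java" is purely hypothetical; whatever name the feature actually used would go here):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- copy everything through unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- ...except the (hypothetical) feature-enabling attribute on the root element -->
  <xsl:template match="/*/@allow-java"/>
</xsl:stylesheet>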
RE: Version $Id$ style for xml files (Was: SVN behaviour with Id keyword)
Tim Larson wrote: As part of my effort to keep whiteboard/forms mergeable, I am fixing a bunch of Id's and ran across a minor issue. How do we want to mark the version in xml files? There is quite a variety of possibilities:
CVS $Id$
SVN $Id$
<!--+ | $Id$ +-->
Version $Id$
version $Id$
@version $Id$
etc...
Could we pick a style, and then I will make the files I happen to touch match it? --Tim Larson

I only see one choice that is within a comment. How do the others keep from breaking the document? What about a processing instruction?

<?version $Id$?>

This has the advantage over a comment that it can be retrieved unambiguously with an XPath query: processing-instruction('version')
Cheers
Con
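For example, a stylesheet could pull the expanded Id string out of any document with something like this (the output element name is just for illustration):

<xsl:template match="/">
  <!-- emits the data of the <?version ...?> processing instruction, i.e. the expanded $Id$ keyword -->
  <version><xsl:value-of select="processing-instruction('version')"/></version>
</xsl:template>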
RE: Version $Id$ style for xml files (Was: SVN behaviour with Id keyword)
Tim Larson wrote: On Wed, Feb 02, 2005 at 10:35:03AM +1300, Conal Tuohy wrote:
Tim Larson wrote: As part of my effort to keep whiteboard/forms mergeable, <snip/> Could we pick a style, and then I will make the files I happen to touch match it?
What about a processing instruction? <?version $Id$?> This has the advantage over a comment that it can be retrieved unambiguously with an XPath query: processing-instruction('version')
I like the ability to retrieve the Id info... not knowing much about processing instructions, will this produce valid processing instructions when the version control system expands $Id$?

This expands into lots of stuff: datestamp, filename, submitter, etc. So long as it doesn't include the string "?>" it will be OK.

Also, can this mess up any existing document processing?

It shouldn't. Most XML software should be happy to ignore processing instructions (unless the PI is targeted at them), just like with comments. In the SAX pipeline, processing instructions are passed using a processingInstruction method, so completely independently of startElement/endElement/characters/etc. Typically, pipeline components will inherit a processingInstruction method from a superclass, which will do nothing.
Cheers
Con
RE: sitemap, jx and flow design (was: servicemanager and jxtg)
Daniel Fagerstrom wrote:
Instruction Inclusion
Now for the instructions (jx:forEach etc) we have the question of how they should be chosen:
1. Programmatically: There is an abstract base class with common template generator functionality. To implement e.g. JXTG this base class is extended, and the extended class enumerates the instructions that should be used and also chooses the parsing strategy etc.
2. Configuration: Instructions or sets of instructions are made components and connected to the template generator in the configuration.
3. Within the template language: There are special instructions in some common base template language that allow the user to include sets of (Java-written) instructions.

What kind of instructions did you have in mind? The reason I ask is that I wonder whether it's really necessary to add extra instructions to the template language. A language with conditionals, iteration, AND recursion is surely already Turing-complete, and doesn't really need more control structures?
Con
RE: [RT] Escaping Sitemap Hell
On 06 Jan 2005, at 01:54, Daniel Fagerstrom wrote:
* where repository:{1}.x.y.z exists == XYZPipeline
We get the right pipeline by querying the repository instead of encoding it in the URL. A further advantage is that the rule becomes listable, as the where clause in the match expresses what the allowed values for the wildcard are.

Ugo Cei wrote: Unless I misinterpret what you mean, we already can do this:

<map:match pattern="*">
  <map:call function="fun">
    <map:parameter name="par" value="{1}"/>
  </map:call>
</map:match>

function fun() {
  var entity = repo.query("select * from entities where id = " + cocoon.parameters.par + " and {x, y, z}");
  cocoon.sendPage("views/XYZPipeline", { entity : entity });
}

<map:match pattern="XYZPipeline">
  <map:generate type="jx" src="xyz.jx.xml"/>
  ...
</map:match>

Apart from the obviously contrived example, isn't the Flowscript just what we need to get the right pipeline by querying the repository?

I was struck by your example because right now we are revising our website using the same technique you describe, with a single external pipeline calling a flowscript. (BTW the revised website isn't public yet but should be ready next month.) We're using a topic map as the metadata repository (with TM4J). As in Daniel's example, it completely decouples the external URL space from the URLs of internal pipelines. In the external sitemap, we just marshal a couple of parameters out of the URL and request headers, and pass them to a flowscript. This is the first time I've used flowscript, but it has been fairly easy to write and it's worked pretty well. The flowscript queries the topic map to find the topic to display, and the appropriate internal pipeline to use. It also looks up other "scope" topics which define different viewpoints of the other topics. These are such things as different languages, and (since this is a digital library application) we also have "simplified" and "scholarly" scopes. The flowscript traverses the class-instance and superclass-subclass hierarchies between the topics looking for a jxtemplate to use (in the appropriate scope). Finally it passes the content topic and the scope topics (and a basic ontology of other topics) to the specified jxtemplate pipeline.

In my sitemaps, public pipelines contain almost only map:call and map:read (for static resources) elements. All classical generator-transformer-serializer pipelines go into an internal-only pipeline that can be called from flowscripts only. Admittedly, this is fine for webapps, and maybe not so much for publishing-oriented websites.

Yes, I think for webapps you could do the mapping just in javascript, but for publishing I think you really need a metadata repository of some sort. You could use an xml document linkbase, as Daniel suggests, or a cms, or a topic map or rdf store, or a sql db, or any number of things.
Con
RE: Templating: experiments with Conal's html-to-xslt transform
Bertrand wrote: For me, the template author just wants to say "all table elements must be copied with border=1 added", but he has no idea in which order the elements will appear in the source, and shouldn't have to care. Does your for-each syntax allow this? I think it requires declarative rules, and if we're careful about how they are defined, they won't be too scary and will add a lot of power. I've put my rules in a separate div on purpose, to make it clear that the template has a linear part and another rules section, and that the rules section works in a different way than the rest. WDYT?

I like the idea - in fact I had the same idea myself, but without adding a special rules section. I'm not sure I see the point in keeping it in a special div? I think it's good to allow people to define templates anywhere in the template file ... that way you can take a design dummy page and simply annotate it with these attributes without having to rearrange it. Another thing it really should have is a way to declare global parameters, passed to it from the sitemap. The old stylesheet I posted the other day automatically declares parameters "id" and "random" because they were common requirements of our templates, but it would be better to have to declare them explicitly, e.g. <html template:parameters="foo bar baz">. I've done some work (not yet finished) on a similar transform to jxt, but without any pattern-matching templates so far (they're not impossible, just not quite so easy, because jxt doesn't already have pattern-matching templates).
Cheers
Con
RE: JXTG: invoke macro by name from expression
Leszek Gawron wrote: I would like to add one feature to JXTG that would avoid promoting hacks like [1]. Example:

<jx:macro name="fooBar">
  <some>content</some>
</jx:macro>

You can only invoke it by <fooBar/>. If I were able to do

<jx:invoke macro="fooBar"/>

I would be able to pass the macro name as a parameter to another macro, which would allow a quite powerful data injection technique.

Yes! Functional programming! +100
Though I would suggest <jx:call-macro name="fooBar"/>, which would make it clear that the attribute is the NAME of the macro to be invoked, rather than the macro itself.
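A sketch of the kind of data injection Leszek means, using the suggested element name (jx:call-macro is only a proposal at this point, not an existing JXTG instruction, and the item.rendererMacro property is likewise invented for illustration):

<jx:forEach var="item" items="${items}">
  <!-- the macro used to render each item is chosen by the data, not hard-coded in the template -->
  <jx:call-macro name="${item.rendererMacro}"/>
</jx:forEach>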
RE: [RT] Attribute Driven Templates
On 7 Dec 04, at 18:54, Sylvain Wallez wrote:
...(hearing Stefano coming behind me, ready to shout FS!!! in my ears...)
Nahh... his detector exploded yesterday, don't worry.
...Now going back to the annotated-HTML-to-XSL compiler that we (Anyware) wrote many years ago: it allows mixing both syntaxes, as attributes are simply translated to their XSLT equivalent, so you can therefore write plain XSLT in the HTML file (aargh! I hear Stefano coming!). A similar approach could be used for the template language with a single engine, by simply converting directives-as-attributes to directives-as-elements on the fly...
Bertrand Delacretaz: Interesting - do you envision using XSLT for this conversion? Or doing it in java code?

At NZETC we have used a system similar to what Sylvain describes: we use XSLT to convert attribute-driven HTML to XSLT. The XSLT is attached (3.5kb). It performs text substitution both in attribute values, e.g. <p class="{foo/bar}">, and also in the text content, e.g. <p>Blah blah: {foo/bar}</p>. It supports the attributes for-each, if, apply-templates, and copy-of, e.g. <li nzetc:for-each="foo/bar">{.}</li>. Obviously the same thing could easily be done with JXT as the target language rather than XSLT. To my mind, this could be a good thing: the JXT 2 language could have a single (element-driven) syntax for everything, and we could use XSLT to convert from this attribute-driven syntax, or indeed from any other attribute-driven syntax that people wanted, if they felt a need for another attribute (is that FS?)
Cheers
Con
Attachment: html-to-xslt.xsl
RE: [Design] JXTG 2.0 (Just say yes!)
Miles wrote: One concern though: is that results variable a result set or just a collection of data? If the former, how is the database connection handled (closing or returning to the pool)? If the latter, how can large result sets be returned without exhausting memory from a few queries? That's the one case where I see ESQL winning out.

Surely the controller script should handle this too? After calling the template it should clean up the model.
RE: Templating Engine - Next steps?
Coincidentally I'm just starting work on a templater too. I have been using JXTemplate and I really like it, but a few things have frustrated me about it, too. Perhaps it's unusual, but I really like it BECAUSE it's like XSLT, and I don't mind at all that it's not like Java. The things I don't like about it are mostly things where it differs from XSLT. In order of priority:

1) JXTemplate has jx:macro elements, which are like <xsl:template name="foo">, but there's no equivalent to <xsl:template match="foo" mode="bar">. To me, pattern-matching is the thing I need most. I'd like something like:

<template match="[class/name='org.apache.cocoon.foo.Bar']">
  <html:h1><value-of select="baz"/></html:h1>
  etc.
</template>

2) It has 2 expression languages! So you need to say #{foo} instead of foo. I think it should have 1 language, but it could be pluggable. My preferred language is JXPath rather than Jexl, partly because of the specific needs I have in mind (navigating TMAPI object graphs), for which JXPath is ideal. I also think that Jexl is too close to Java and hence a bit over-powered. You can call arbitrary Java methods in JXPath too, but it uses an extension function. I don't know if there are things that Jexl can do that JXPath can't? I don't think so.

3) Logically it is very similar to XSLT, but it has a different syntax for everything: forEach instead of for-each, <var set="foo" value="bar"/> instead of <variable name="foo" select="bar"/>, macro instead of template, etc. I think the syntax should be the same as XSLT as much as possible, to aid learning and to help avoid typographical errors.

4) jx:macro elements don't seem to have access to global variables (strange scoping rules)

5) no jx:attribute

That's my list!
Con

-Original Message-
From: Reinhard Poetz [mailto:[EMAIL PROTECTED]
Sent: Tuesday, 2 November 2004 5:30 a.m.
To: [EMAIL PROTECTED]
Subject: Templating Engine - Next steps?

After a lot of mails talking about expression languages and templating engines, I'll try to summarize the current state of our discussion. I see the following requirements so far (in brackets you find the name of the person that brought up the requirement):
- control structures like for/each, if, choose (RP)
- call methods in a _simple/natural_ way on passed objects (RP)
- stream objects (DOM, XMLizable, ...?) (RP)
- define macros/taglibs (see cForm macros) (RP)
- works as a Transformer (SAX pipeline) (GP)
- ability to directly produce SAX events, for efficiency reasons (SW)
- one default expression language (CZ) - I think most people prefer the dotted JEXL syntax
- support for additional expression languages (SW)
- use the TemplateObjectModelHelper as Cocoon-wide object model (CZ)
and not to forget
- cacheability (RP) (BTW, has caching been implemented to everybody's satisfaction or are there open issues or possible improvements? Leszek?)
and one issue that has bugged me for a long time ...
- add support for dynamic attributes - similar to xsl:attribute (RP)
- o -
So, how do we continue to meet all these requirements?
A.) Rewrite/refactor JXTemplate
- break it up into smaller, more easily understandable/maintainable pieces
- deprecate the #{} syntax and make expression languages pluggable as Sylvain suggested
- investigate the performance problems (I guess there are only problems if macros are used)
- add the missing things from above
B.) use Garbage
C.) talk with Carsten about Tempo
D.) completely new templating engine
- o -
In my opinion we should write JXTemplate 2.0, which would be from the user's POV as similar as possible to 1.0.
Technically this could be a complete rewrite (use Garbage, Tempo, or really start from scratch) or at least a major refactoring. Calling it JXTemplate is better for marketing reasons because it shows more stability than telling our user base that they should use something completely different in the future. Calling it something different gives another impression than incrementing version numbers. WDYT ...
* are there any missing requirements?
* or is it too much (FS)?
* which alternative do you prefer?
Don't forget, this is no vote ;-)
-- Reinhard
RE: [RT] doco lite?
Jorg Heymans wrote: 2) Should any effort towards documentation ATM go into improving its *quality* or improving its {searchability|updateability|scaleability|auto-generateability}? WDYT?

Personally, I think that Cocoon has a lot of good documentation, but poorly/inconsistently organised. I know I have many bookmarks of Cocoon-related resources, many of which deal with the same actual topic. Often I know I've seen some useful documentation somewhere, but it takes a while to find it again. That's why I think that Bertrand's idea of harvesting metadata from the docs and using Lucene to cluster them semantically and automatically is an excellent one. With a live Cocoon instance it would not be too hard (just in XSLT) to harvest a lot of useful Lucene fields from the xdocs, wiki docs, mail archives, etc. Java class names, namespace URIs, sitemap tags, etc, are all there already, and could all be used to pull the docs together into a more topical framework.
Cheers
Con
svn repository
I can't access the SVN repository at http://svn.apache.org/ - or am I looking in the wrong place? I'm accessing it through a proxy, so it's hard to say exactly what happens, but my proxy reports a zero-length response:
ERROR: The requested URL could not be retrieved
While trying to retrieve the URL: http://svn.apache.org/
The following error was encountered:
* Zero Sized Reply
Squid did not receive any data for this request.
-- Conal Tuohy
RE: svn repository
Thanks JD. Actually it seems to have come right just a minute ago!

-Original Message-
From: JD Daniels [mailto:[EMAIL PROTECTED]
Sent: Monday, 18 October 2004 7:04 p.m.
To: [EMAIL PROTECTED]
Subject: Re: svn repository

Try https: https://svn.apache.org/repos/asf/cocoon/branches/BRANCH_2_1_X

Conal Tuohy wrote: I can't access the SVN repository at http://svn.apache.org/ - or am I looking in the wrong place? I'm accessing it through a proxy, so it's hard to say exactly what happens, but my proxy reports a zero-length response:
ERROR: The requested URL could not be retrieved
While trying to retrieve the URL: http://svn.apache.org/
The following error was encountered:
* Zero Sized Reply
Squid did not receive any data for this request.
-- Conal Tuohy
RE: FIXME in IncludeTransformer [Was: Re: Adding Files to Subversion]
Pier Fumagalli wrote: I was thinking about one thing... The one thing that troubles us is the request. That introduces a degree of variability that (I don't know to what degree) might be counter-productive to analyze and cache... What about if we were doing subrequests, much like in Apache... I mean, why make the included request inherit all the variable stuff? Wouldn't it simply be easier to create a new request/response and start from a clean state? Then we could re-create the request with something like:

<incl:include src="proto://whatever">
  <incl:param name="parameterName">value</incl:param>
</incl:include>

There's the raw protocol which does this. http://wiki.apache.org/cocoon/Protocols I like it, because it isolates those internal pipeline calls from the full request environment - it means that the URI is a full specification of their behaviour. I guess there might be cases where you want the full request available to the internal pipeline though, and it may be too much hassle to marshal all the necessary parameters to the internal pipeline call, as you indicate above.
RE: Custom extensions - to be made available if possible
Antonio Fiol Bonnin wrote: Thank you, Con, for your very interesting point of view. We were working on (a) but I have told my team that we will be changing approach in one hour if they do not see a clear end. Other than that, I will look into pdftohtml (is it really html?). http://pdftohtml.sourceforge.net/ It can produce HTML or XML. The XML is closer in form to the content of the PDF - it has pages containing text with typographic and positional formatting. The HTML has some of the formatting information removed (I think) and some kind of guess-work is used to stick lines of text back into paragraphs.
RE: Custom extensions - to be made available if possible
Stefano Mazzocchi wrote: What about using XSL:FO? Would be pretty cool to have the ability to transform PDF into FO, basically reversing what FOP does. I know it would be pretty painful to make it work with all kinds of PDF, but for reasonable ones it shouldn't be that hard (PDF is sort of a markup language already). It would be cool, but sadly I think the PDF format usually has too much information thrown away - there's no concept of a flow of text, or even a paragraph! I think SVG (or a subset of course) would be a better match than FO. In tagged PDF there's more information, but most PDF files have a very much simpler structure, of disconnected lines of text, positioned at particular locations on a page. I think the DTD I quoted actually covers most of what you could extract from most PDF files. :-(
RE: Custom extensions - to be made available if possible
Antonio Fiol Bonnín wrote:
a) Refactoring SimpleLuceneXMLIndexerImpl so that its private method indexDocument is not private, and taking it to an external component.
b) Creating a PDFGenerator (in the cocoon sense of generator, of course).
Option (a) seems to be giving us more headaches than pleasure, and option (b) seems cleaner to a certain point. Option (b) would allow following links in the PDF file, if developed to that point.

I like option (b) too. You could start with plain text, but it could later be developed to extract basic formatting, hyperlinks, bookmarks (the table of contents), images, etc.

However, option (b) implies choosing a format for its output (which?),

An interesting question. Perhaps html, and begin with an implementation which produces:

<html>
  <head/>
  <body>
    blah blah blah<br/>
    blah blah<br/>
    <br class="page"/>
    ...
  </body>
</html>

Later you (or someone else) could add extra things as they need them. Alternatively, you could use a more PDF-oriented DTD. I have used a simple freeware tool called pdftohtml which produces XML according to the following DTD:

<!ELEMENT pdf2xml (page+)>
<!ELEMENT page (fontspec*, text*)>
<!ATTLIST page
  number   CDATA #REQUIRED
  position CDATA #REQUIRED
  top      CDATA #REQUIRED
  left     CDATA #REQUIRED
  height   CDATA #REQUIRED
  width    CDATA #REQUIRED>
<!ELEMENT fontspec EMPTY>
<!ATTLIST fontspec
  id     CDATA #REQUIRED
  size   CDATA #REQUIRED
  family CDATA #REQUIRED
  color  CDATA #REQUIRED>
<!ELEMENT text (#PCDATA | b | i)*>
<!ATTLIST text
  top    CDATA #REQUIRED
  left   CDATA #REQUIRED
  width  CDATA #REQUIRED
  height CDATA #REQUIRED
  font   CDATA #REQUIRED>
<!ELEMENT b (#PCDATA)>
<!ELEMENT i (#PCDATA)>

and also poses some problems wrt. the sitemap. Until now, we have a pipeline using a reader to read pdf files (static, from disk). And we would need a generator to be invoked instead for the content and links views. How can we do that? Maybe with a selector? But that does not seem very clean. Any hints there?

I'm not sure. It might work. I hope someone else can help you with that. But NB there's also another way to build a Lucene index - using the LuceneIndexTransformer rather than by crawling the site and using views. This technique would certainly work with option (b) - a PDFGenerator - but I'm not sure that it would integrate nicely with option (a), since it's a transformer and therefore requires XML. So if you could resolve the sitemap issue with option (b) then it would work with both indexing techniques, whereas option (a) could only ever work with the crawler, I think.
Cheers
Con
RE: accessing the pipeline structure
Jorg Heymans wrote: Recently there were 2 requests on the users list about accessing the current pipeline structure. snip/ Having this extra metadata would allow for components that can produce different output depending on how they are used in a pipeline (but this probably breaks a few cocoon design rules right?). It certainly creates the _potential_ for components to be horribly tangled up with the pipelines that contain them. snip/ Thoughts? Is this difficult to implement? A 2.2 feature perhaps? I don't know about the use cases presented on the user list, but I know another use case is for debugging and maintenance people to have direct access _from a Cocoon-produced resource_ to the pieces of the pipeline that produced it. In this scenario the final output contains a bunch of hyperlinks to the resources (content, transforms, stylesheets, etc) which can be used to get quick access to edit one of these sources. As an experiment I implemented this for some pipelines using XML processing instructions. Each pipeline component adds a PI to the SAX stream to identify itself (a signature), and at the end a transformer can convert them into HTML link elements or similar. NB this is totally different to (the inverse of) the use of PIs embedded into a source XML file to specify an XSL transformation. It wasn't very convenient to add each PI at each stage, but it seemed to me that the pipeline processor could certainly maintain this metadata more easily, and make it available to components when needed. Perhaps a special pipeline processor could keep track of the pipeline, and use a special transformer which is pipeline aware (InsertPipelineMetadataTransformer) to insert this metadata into the pipeline only when needed. Cheers Con
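As a sketch of that last step, an XSLT fragment that turns such signature PIs into links (the PI target "pipeline-component" and the assumption that the PI data is simply the contributing source URI are both illustrative, not taken from the experiment described above):

<xsl:template match="processing-instruction('pipeline-component')">
  <!-- the PI data is assumed to hold the URI of the source, stylesheet, etc. that produced this part of the page -->
  <a class="pipeline-signature" href="{normalize-space(.)}">
    <xsl:value-of select="normalize-space(.)"/>
  </a>
</xsl:template>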
RE: why JXTG sucks? was: FreeMarker integration
-Original Message-
From: Leszek Gawron [mailto:[EMAIL PROTECTED]
Assume you have a collection of Projects. Each project has a project.description property. This property contains a string that can be parsed by a wiki parser to generate html out of it. How would you implement that? Assume that your controller does NOT know which string properties should be wikified, as there are hundreds of properties of this kind, and you also have several orthogonal views which query different model parts.

I think the idea is that your Java model should present these rich text properties in an already-parsed form. Strictly speaking, you shouldn't have to _parse_ your Java model - just _access_ it. This avoids computations in the view layer. If these properties have internal structure relevant to the view, then the properties should be structured in the model (e.g. a DOM or some kind of graph of Java beans, not just a String). That's the idea, anyway ... so proponents of JXTG would tell you to add the parser to your model, not to the view templates. Otherwise, you would be introducing aspects of your model into the view layer. I agree with Joerg that JXTG is already too powerful. Adding extra parsers etc. to it may end up turning it into XSP. As a practical suggestion, maybe you could instantiate a Wiki parser in the controller script and pass it to the view template, as another aspect of the model. http://jakarta.apache.org/commons/jxpath/users-guide.html#Extension%20Functions
Cheers
Con
RE: why JXTG sucks? was: FreeMarker integration
-Original Message-
From: Leszek Gawron
My question is: why do you call a wiki parser a model aspect if in my example I have to pass it for EVERY model? It looks more like a view plugin really.

Where should you draw the line between model and view? In the case of JXTG, it is really only convenient if the view (JXT) can access the full structure of the model programmatically. If the model includes structured text, then the model would have to expose that structure programmatically. For example, I think you have properties (such as your project/description) which contain structured text (in Wiki markup) but are actually just defined as Strings. IMHO the wiki syntax is not an appropriate model for JXT because it is not a _Java_ model but a textual (markup) model. To present Wiki text in Cocoon, you would normally convert it to XML and then style it as HTML, FO, etc. The first step is parsing (converting to an XML infoset, whether a DOM or a SAX stream); the second step is presentation. I think the first step should be in the model, the second in the view. Why should your project/description property type be Wiki-text rather than DOM? A DOM is no more presentational than the Wiki-text, is it? It's just a way of representing the logical structure of the document.
Just my 2c
Con
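For what it's worth, the usual Cocoon shape of that two-step approach looks something like this (the generator type "wiki" and the stylesheet name are only illustrative; in practice the parsing step might be Chaperon or a custom generator):

<map:match pattern="wiki/**">
  <!-- step 1: parse the wiki markup into XML (the model) -->
  <map:generate type="wiki" src="content/{1}.wiki"/>
  <!-- step 2: style the XML as HTML (the view) -->
  <map:transform src="stylesheets/wiki2html.xsl"/>
  <map:serialize type="html"/>
</map:match>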
RE: [RT] A Groovy Kind of Sitemap
Peter Hunsberger wrote: As others have said, one needs to step back and look at the overall objective: what do you want Cocoon to do when you feed it a request (either via http or CLI or whatever)? Figure out all the high level use cases and their interactions, step back, generalize and repeat. Personally, I end up with something more like RDF and ontology traversal than I do with scripting... I don't think many people could afford the hardware to do that in real-time for large scale web sites, so I come back to XML technologies as a reasonable compromise for the near term.

I don't know if this is exactly what you're thinking of, but at my work we are developing something which sounds similar - using XML Topic Maps rather than RDF - and I think (hope) it will be a powerful technique for knowledge-intensive sites. There are 3 parts to it:
1) harvesting or mining the knowledge from the various sources (we use XSLT in Cocoon pipelines to extract knowledge and encode it as XTM).
2) using the semantic network to structure the website itself.
For this second part we have a sitemap which handles all requests with a simple flowscript, passing it the request information. This flowscript looks up the requested topic (a concept) in the topic map database (we use TM4J with Hibernate). Then it finds an appropriate jxtemplate for rendering that topic, and calls sendPage(jxtemplate, topic) to render it. The jxtemplate is responsible for rendering topics and inserting xinclude statements to aggregate topic occurrences (resources). So 90% of the sitemap consists of pipelines for rendering various occurrences, but totally decoupled from the website's external URI space. These pipelines are consumed by the rendering templates. The logical structure of the site is entirely in the topic map, the choice of page layout for each type of topic is also in the topic map, but the page layouts themselves are just jxtemplates.
RE: [RT] A Groovy Kind of Sitemap
Stefano wrote:
Conal Tuohy wrote: It's [XML sitemap syntax] also potentially useful for validation.
Nope, wrong. There is no XML validation language that can tell you if the sitemap is valid from a cocoon-logic point of view (for example there is no "class file name" datatype, and no way for the XML validator to know that that class is inside the classloader). You can validate the xml structure, but the semantics will have to be validated by special code anyway.

There will always be some semantics that are out of reach of validation. I believe the Cocoon sitemap language is already Turing-complete, so it actually would be theoretically impossible to validate it from a cocoon-logic point of view (even just considering the pipelines and leaving aside the component declarations etc). But some level of validation might still be useful (check that pipelines have generators and serializers, etc). But I'm speaking hypothetically because I have never done it. :-)

The fact that XML is a common syntax means that there will always be new things you can do with it.
FS. This was the argument in the past and it never happened.
Personally, I like it as XML. :-)
Don't get me wrong, it's not that I don't like it... but many times it felt just too verbose for the task... so it would be kinda cool to have the ability to have two syntaxes.

I agree. Or even several syntaxes. I have used the graphical editor Dia to generate XML code and I'm sure it could be used as a graphical pipeline editor for Cocoon. I remember seeing some graphical notation that I think you invented, Stefano. Did you have a system for using it as actual source code?
RE: [RT] A Groovy Kind of Sitemap
Stefano Mazzocchi wrote: without java and flow, the sitemap is far from being turing complete.

Well ... this is getting off the topic of the thread, but actually I don't think that's true. Maybe you are forgetting that you have recursion with the cocoon: protocol. With URI matchers and selectors I think this is all you would need. The request URI would play the role of the tape of the Turing machine.

But some level of validation might still be useful (check that pipelines have generators and serializers, etc). But I'm speaking hypothetically because I have never done it. :-)
I disagree. XML-schema-like validation for the sitemap is just misleading because it gives you a false sense of solidity that you just can't get from that point of view.

It depends on how much you expect from it, of course. I agree it would be foolish to imagine that a validator could prove that a sitemap was totally solid in the sense of being logically correct. If you appreciate that the schema doesn't prove that your web app will work correctly, a schema could still be quite useful purely as a syntax checker. It might be more useful to use a schema to help MAINTAIN validity while editing a sitemap. There are a few XML editors that will give you context-sensitive assistance for editing a document based on a schema, e.g. http://pollo.sourceforge.net/cocoon.html
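To give a concrete flavour of the purely syntactic checks meant here, a toy RELAX NG fragment could insist that a match contains a generator and ends with a serializer (this is only a sketch; a real sitemap schema would also have to cover readers, actions, selectors, nested matches and much more):

<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <start>
    <ref name="match"/>
  </start>
  <define name="match">
    <element name="map:match">
      <attribute name="pattern"/>
      <!-- a pipeline must start with a generator... -->
      <element name="map:generate">
        <optional><attribute name="type"/></optional>
        <attribute name="src"/>
      </element>
      <!-- ...may have any number of transformers... -->
      <zeroOrMore>
        <element name="map:transform">
          <optional><attribute name="type"/></optional>
          <attribute name="src"/>
        </element>
      </zeroOrMore>
      <!-- ...and must end with a serializer -->
      <element name="map:serialize">
        <optional><attribute name="type"/></optional>
      </element>
    </element>
  </define>
</grammar>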
RE: [RT] A Groovy Kind of Sitemap
Stefano wrote: The XML syntax makes sense only when you want to process the sitemap itself via a pipeline (for example, to generate an SVG poster of it via XSLT).

And it makes sense if you want to prevent people from adding scripting inside the pipelines (well, actions are kinda like scripting, aren't they?). It's also potentially useful for validation. Another thing I like about XML sitemaps is that you can load them in a browser and use the + and - buttons to reveal only the sections you want. The fact that XML is a common syntax means that there will always be new things you can do with it. Personally, I like it as XML. :-)
RE: Sitemap versionning in TreeProcessor?
Carsten Ziegeler wrote: A long time ago, we agreed that if we change the sitemap syntax, we will change the version number of the sitemap namespace.

Is it really necessary to change the xml namespace? How does this help developers? If it's just a matter of being able to identify the version of the syntax, a version attribute on the map:sitemap element would surely be simpler. This is like what happened with XSLT, where XSLT 2.0 has the same namespace URI as XSLT 1.0, despite a lot of differences in syntax.
Cheers
Con
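In other words, something along these lines, keeping the existing namespace and marking the syntax level with a plain attribute (the attribute name and value here are just an illustration of the suggestion, not an agreed syntax):

<map:sitemap xmlns:map="http://apache.org/cocoon/sitemap/1.0" version="1.1">
  ...
</map:sitemap>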
RE: [chaperon] [patch] .cwiki improvements
Dave Brondsema wrote: Note: the regex used in the 'spacify' template doesn't work. How can I make the replace() function work? Since it gives an error about the function not being found, I haven't tested the regex either: it's supposed to put a space in before capital letters (except for multiple sequential capital letters). D'oh! Of course replace() is an XPath 2 function. So still best to avoid it. :-)
RE: [chaperon] [patch] .cwiki improvements
Dave Brondsema wrote: Note: the regex used in the 'spacify' template doesn't work. How can I make the replace() function work? Since it gives an error about the function not being found, I haven't tested the regex either: it's supposed to put a space in before capital letters (except for multiple sequential capital letters). Dave, replace is not XSLT - I would guess it's some kind of extension function. But I have a recursive template (pure XSLT 1) which does exactly this job, that I wrote just a few weeks ago. I'll see if I can track it down at work tomorrow.
RE: [chaperon] [patch] .cwiki improvements
Dave Brondsema wrote: Note: the regex used in the 'spacify' template doesn't work. How can I make the replace() function work? Since it gives an error about the function not being found, I haven't tested the regex either: it's supposed to put a space in before capital letters (except for multiple sequential capital letters).

Hi Dave
I've attached a text file containing an xslt 1 template called splitString which could either just replace the spacify template or could be called from it:

<xsl:template name="spacify">
  <xsl:param name="name" select="''"/>
  <xsl:call-template name="splitString">
    <xsl:with-param name="restOfString" select="$name"/>
  </xsl:call-template>
</xsl:template>

Cheers
Con

<xsl:template name="splitString">
  <xsl:param name="restOfString"/>
  <xsl:variable name="uppercase">ABCDEFGHIJKLMNOPQRSTUVWXYZ</xsl:variable>
  <xsl:variable name="currentLetter" select="substring($restOfString,1,1)"/>
  <xsl:choose>
    <xsl:when test="contains($restOfString, '(') or contains($restOfString,' ')">
      <xsl:value-of select="$restOfString"/>
    </xsl:when>
    <xsl:when test="string-length($restOfString) &gt;= 2">
      <!-- there's a possibility it needs to be split -->
      <xsl:choose>
        <xsl:when test="contains($uppercase,$currentLetter)">
          <xsl:variable name="followingLetter" select="substring($restOfString,2,1)"/>
          <xsl:if test="not(contains($uppercase,$followingLetter))">
            <xsl:text> </xsl:text>
          </xsl:if>
          <xsl:value-of select="$currentLetter"/>
          <xsl:call-template name="splitString">
            <xsl:with-param name="restOfString" select="substring($restOfString,2)"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <!-- current letter is lower-case - just spit it out -->
          <xsl:value-of select="$currentLetter"/>
          <xsl:call-template name="splitString">
            <xsl:with-param name="restOfString" select="substring($restOfString,2)"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <!-- end of string - just write the remainder -->
      <xsl:value-of select="$restOfString"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
RE: JXTG jx:attribute
On Sat, Apr 24, 2004 at 04:51:35PM -0700, Christopher Oliver wrote:

<foo>
  <jx:attribute name="bar" namespace="http://www.bar.org" value="${bar}"/>
  <jx:if test="${fubaz > 1}">
    <jx:attribute name="xyz" value="100"/>
  </jx:if>
  <jx:forEach var="item" items="${items}">
    <jx:attribute name="${item.name}" value="${item.value}"/>
  </jx:forEach>
  ...
</foo>

The start element event for foo must be buffered until all potential jx:attribute tags have been processed. Since these are within conditional blocks and loops, determining when that is the case in an efficient way isn't easy.

Leszek Gawron wrote: After longer thought it really does not look so easy. As it is not known whether an element has dynamic attributes or not, the startElement on the output would have to be postponed for EVERY element until we reach a startElement for a child node or an endElement for the current one.

... or a characters() event containing non-whitespace. Whitespace characters() between an element and a jx:attribute should not be output at all.
@author tags (WAS: RE: ASF Board Summary for February 18, 2004)
I agree with Antonio about the utility of @author tags (I have also found them very useful), and I also think that the ASF board's concerns about the dangers of ownership are probably overblown. I don't think the ASF should discourage developers from keeping useful metadata about the code inside the source files. What better place to put the metadata than in the code? This makes it more likely to be used and kept up to date than if it were stored somewhere else, IMHO. I don't agree with the idea that banning author tags would make the changes file more useful because it would encourage developers to keep it up to date. On the contrary, I think people are encouraged when you make things easy; I don't think requiring people to do things the hard way constitutes encouragement. :-) If the board insists then ... OK ... but if the board only discourages the use of @author tags on the grounds that they COULD cause problems, then I think Cocoon should note their concern but keep the @author tags, because in THIS CASE they have NOT caused problems. Just my 2c.
Con

-Original Message-
From: Antonio Gallardo [mailto:[EMAIL PROTECTED]
Sent: Friday, 27 February 2004 7:53 a.m.
To: [EMAIL PROTECTED]
Subject: Re: ASF Board Summary for February 18, 2004

Steven Noels wrote:
On 26 Feb 2004, at 17:12, Torsten Curdt wrote: + and we remove all author tags
I find this just a little bit too reactionary - especially for the little known/used areas of code. We haven't had ownership issues because of them in the past, have we? These tags sometimes help to find a contact when questions remain unanswered on the list.

[RT]: Would it be enough to browse the CVS to find the folks involved in a concrete file or block? No, we cannot trace many files back to the first post: the original file, who made changes, etc. When I started to use the auth-fw, the @author tags allowed me to learn the names of the people who were involved in it. After that I also used the names to search the mail archives looking for help with the auth-fw.
Best Regards,
Antonio Gallardo.
RE: Cleaning up unused namespace declaration
I don't know if you can configure the xml serializer to drop a namespace (it seems unlikely, because such a namespace might not be used until the end of the document, for all the serializer knows, so it wouldn't be safe without buffering the entire output document to check). But typically you should suppress the namespace in the XSLT which converts the dir:* content into xhtml, using the exclude-result-prefixes attribute of the xsl:stylesheet or xsl:transform element. This works for me - I just checked! :-)
Con

-Original Message-
From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
Sent: Monday, 9 February 2004 15:18
To: Cocoon
Subject: Cleaning up unused namespace declaration

I'm not normally bugged by namespace declarations which aren't used, but boy, something like this just can't go on without me doing something about it:

<br xmlns:dir="http://apache.org/cocoon/directory/2.0" xmlns:include="http://apache.org/cocoon/include/1.0"/>

[taken from my blog output]
Do you have any suggestions on how to use the xml serializer so that it stops doing that?
-- Stefano.
RE: Cleaning up unused namespace declaration
It's true that xsl:copy copies namespace declarations that are in scope. But how do you have html elements inside the scope of a dir:directory element? Are you using the XPathDirectoryGenerator? If so, or if you've transformed the dir:file elements into inclusions, etc, then you might want to transform the enclosing dir:file elements into xhtml at the same time, e.g. something like:

<xsl:transform ... exclude-result-prefixes="dir">
  <xsl:template match="dir:file">
    <div class="file-content" id="{@name}">
      <xi:include href="{@name}"/>
    </div>
  </xsl:template>
  ...
</xsl:transform>

Then when you come to xsl:copy the content in a later stage of the pipeline, you won't have the dir namespace in scope any more.
Con

-Original Message-
From: Stefano Mazzocchi [mailto:[EMAIL PROTECTED]
Sent: Monday, 9 February 2004 16:18
To: [EMAIL PROTECTED]
Subject: Re: Cleaning up unused namespace declaration

On 8 Feb 2004, at 21:37, Conal Tuohy wrote: I don't know if you can configure the xml serializer to drop a namespace (seems unlikely, because such a namespace might not be used until the end of the document, for all the serializer knows, so it wouldn't be safe without buffering the entire output document to check). But typically you should suppress the namespace in the XSLT which converts the dir:* content into xhtml, using the exclude-result-prefixes attribute of the xsl:stylesheet or xsl:transform element. This works for me - I just checked! :-)

Yeah, well, that doesn't help me because I have the namespace declarations already there in the document I want to process, and it appears that xsl:copy copies over the namespace declarations every time and is not influenced by exclude-result-prefixes. And this isn't true if you use <xsl:element name="{name()}">, which feels hacky, but what the hell. [read http://www.xslt.com/xsl-list/2002-02/msg00026.html for more info]
-- Stefano.
RE: Future of XSP and ESQL [was Re: An idea - transformer logicsheets.]
Stefano Mazzocchi wrote: <snip/> I think that the ESQL logicsheet is the only really valuable and unique thing that we have in XSP, so, yeah, I totally understand your concern.
Ryan Hoegg wrote: Why not pull the ESQL stuff out and provide a stripped down ESQLTransformer?

I haven't used ESQL (I've tried to avoid XSP from day one), but what does it do that the existing SQLTransformer can't?
RE: Another attempt at wrapping code lines
Hi Sylvain
I've checked out your new attempt docnbsp.xsl and compared it with the other approaches in the text-wrap sample. Your attempt is definitely superior to the current approach (no special handling in the sample) implemented by raw.xsl, which uses just an HTML <pre>. The raw.xsl often leads to infeasibly long lines, which have the effect of making the entire web page too wide to print, or to view without horizontal scrolling. Your attempt is also better than the split.xsl result in that it maintains the original lines of source and allows the browser to break lines where necessary to fit into the HTML container width. It's more appropriate for the browser to make decisions on line lengths. If your browser is wide enough and/or you have a small enough font, then no line break will ever happen. So you can still copy and paste snippets of source code from the docs and immediately use them (this is important in my opinion). So I think docnbsp is the best candidate for a default source-handling template.

The part of the text-wrap sample which I did, wrap2para, uses a similar approach to yours in that it retains each line of the original source (as HTML <p>...</p>, rather than as <code>...</code><br/>, but pretty similar). My xsl also replaced all spaces inside double-quotes with &nbsp; to prevent the browser wrapping inside quoted strings, which is a similar idea. But others pointed out that to make these decisions optimally you need to know the type of the source, e.g. xml, java, etc. I also think that for source containing quoted XML a fixed-width font is not important, and a variable-width font would be much more readable, as well as narrower, which would greatly reduce the need for line-breaks in the first place. I realise this can't be done in general, because part of the contract of source is that it is generally rendered in a fixed-width font, but I think if it can be made to apply only to e.g. XML or Java, then it would be OK.

I've also been working on another template, based on wrap2para, which only matches source elements which start with &lt; and end with &gt;, and then adds XML syntax highlighting. I'll merge and upload a patch shortly. It uses a FSM in XSLT to parse the quoted XML markup. I've got it running on my machine at home in the meantime, installed as the stylesheet driving the Cocoon docs, so it handles everything under: http://203.79.120.217/cocoon212/docs/index.html Take a look! I haven't checked every page, but it works for the ones I've read, and in particular it's a big improvement for the modules page, which was really wide: http://203.79.120.217/cocoon212/docs/userdocs/concepts/modules.html ... but I think it's more readable and better even for the narrow source. I like syntax highlighting :-)

So my suggestion is that we should make your docnbsp.xsl the default source handler, and my highlightmarkup.xsl the handler specifically for source containing XML. Any other types of source should also be dealt with by more specific stylesheets. Nicola Ken Barozzi suggested a syntax highlighter for Java on the Forrest list, but I was more keen to do XML first, since the itch that I was scratching was precisely the Cocoon docs, which have mostly XML-flavoured source: sitemap snippets, xslt, basically every kind of xml namespace... I think Java would be only about the same complexity though: for highlighting, say, comments, keywords, strings, numbers, and punctuation symbols. It's not necessary to do a deep parse, just a simple tokenising really.
Cheers
Con

-Original Message-
From: Sylvain Wallez [mailto:[EMAIL PROTECTED]
Sent: Saturday, 15 November 2003 04:57
To: [EMAIL PROTECTED]
Subject: Another attempt at wrapping code lines

Hi all,
I looked at David's text-wrap sample and added another attempt: every sequence of two consecutive spaces is replaced by a non-breaking space and a regular space. This keeps the text indentation that would be obtained with a <pre> while still allowing line wrapping. I tried to find a CSS trick to add a mark at the beginning of a continuation line, but CSS only has :first-line and no :other-lines. Please have a look at it; it seems to me to be a good compromise between keeping code indentation and readability of wrapped lines.
Sylvain
-- Sylvain Wallez Anyware Technologies http://www.apache.org/~sylvain http://www.anyware-tech.com { XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects } Orixo, the opensource XML business alliance - http://www.orixo.com
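A minimal XSLT 1.0 sketch of the substitution Sylvain describes (the template and parameter names here are invented, not taken from the attached docnbsp.xsl):

<xsl:template name="indent-nbsp">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '  ')">
      <xsl:value-of select="substring-before($text, '  ')"/>
      <!-- replace the two spaces with a non-breaking space followed by a regular space -->
      <xsl:text>&#160; </xsl:text>
      <xsl:call-template name="indent-nbsp">
        <xsl:with-param name="text" select="substring-after($text, '  ')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>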
RE: Another attempt at wrapping code lines
-Original Message-
From: Sylvain Wallez [mailto:[EMAIL PROTECTED]
Sent: Saturday, 15 November 2003 04:57
To: [EMAIL PROTECTED]
Subject: Another attempt at wrapping code lines
<snip/> I tried to find a CSS trick to add a mark at the beginning of a continuation line, but CSS only has :first-line and no :other-lines.

Hi Sylvain
What about a marker at the start of every line - or is that too intrusive where (as is usual) there are no wrapped lines at all? I had a play with your code. I tried one way to indicate wrapped lines, by colour-coding alternate lines light grey. This way, if a code element is wrapped then you can see a double-height line (either grey or white). I don't know that it really works visually, though. It looks a bit like that traditional fan-fold paper with alternate-coloured lines: ugly :-) But anyway! I updated my dev site to use docnbsp.xsl as modified above. NB almost all of the source in Cocoon's docs is markup, for which I'm using the syntax-highlighting template, and of the remaining source elements almost all are really narrow, so you have to look hard to find any wrapped lines. But see the DTD at the bottom of this page for a wide non-xml source element (esp. if you narrow your browser window): http://203.79.120.217/cocoon212/docs/catalog-test.html And the installation page has lots of narrow non-xml source: http://203.79.120.217/cocoon212/docs/userdocs/installation/index.html And there's always the text-wrap sample with very wide (though less realistic) data: http://203.79.120.217/cocoon212/samples/text-wrap/
RE: [PATCH] docs pages containing source are sometimes too wide
First I want to say thanks to David for putting the text-wrap sample together.

--- Additional Comments From [EMAIL PROTECTED] 2003-11-09 01:58 ---
Sorry for being destructive, but I do not see a real problem. Or better said: this problem, resulting from trying to handle all types of source usage, is too complex and I would not try to do it.

Agree.

I had no look at the code, only at the samples, where David shows the possible use cases - and these are really extreme samples :-)

<there-are-some-funny-examples-there-alright-no-doubt-about-it/>

It's almost not possible to handle both indenting and line length for every type of source.

Absolutely true. We just need XSLT templates to handle specific types of source. I don't think it's too hard for a type-specific template to recognise its own flavour of source, and the existing source template can handle anything else. The DTD says that source is normally displayed in a fixed-width font, so in general I think authors have the right to expect <pre>. But for sitemap snippets etc. this is really not good, because there are long class names like org.apache.cocoon.generation.WebServiceProxyGenerator and attributes like xmlns:dir="http://apache.org/cocoon/directory/2.0", and in a fixed-width font these are much wider than they need to be. The wrap2para.xsl which I wrote was originally for Cocoon's docs, in which the source elements almost always contain representations of XML markup. I think it does a reasonable job of that, though it could of course be improved. But it turned out that source has a wider applicability than I realised. My suggestion is clearly no good for the ascii-art example of a file-system hierarchy. That kind of source requires a fixed-width font, and the current templates are fine for that. So I plug for the "sniff the content" option:
1) for source which is xml/html/dtd, use wrap2para.xsl
2) for everything else, use the existing stylesheet (i.e. <pre>)
I don't think there's much need for split.xsl, but it could be used for (2), where I guess it would usually produce the same output, and would occasionally have to split a long line. Potential enhancements would be:
For XML: Allow the browser to wrap lines only in the whitespace inside element tags. Syntax highlighting of elements and attributes.
For Java: Prevent line-wrapping in //-style comments. Syntax highlighting for string literals, operators, comments, etc.
NB the same long lines are in the PDF version of the docs too, see http://cocoon.apache.org/2.1/userdocs/concepts/modules.pdf
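A sketch of the "sniff the content" idea in XSLT 1.0, assuming the wrap2para handling is available as a named template (the test itself is only a rough heuristic for "looks like markup"):

<xsl:template match="source">
  <xsl:choose>
    <!-- looks like markup: hand it to the wrapping/highlighting handling -->
    <xsl:when test="starts-with(normalize-space(.), '&lt;')">
      <xsl:call-template name="wrap2para"/>
    </xsl:when>
    <!-- anything else keeps the existing fixed-width treatment -->
    <xsl:otherwise>
      <pre><xsl:value-of select="."/></pre>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>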
RE: Tabs on Cocoon site
Konstantin Piroumian wrote: Had a look at the updated Cocoon site and it is great to see that the layout is now displayed correctly in IE. Though, I don't quite understand why the left menu is still that overcrowded while we have a nice possibility in Forrest to group related items into separate tabs. I agree. I think this would make the left menu narrower - some of the pages are already quite wide and the left menu makes it worse! Cheers Con
RE: [RT] Improved navigation of learning objects
Stefano wrote:

automatic harvesting scares the crap out of me, Conal.

This is conceptually no different to harvesting JavaDoc tags from Java source.

I agree that there must be some kind of automatism going on, but the topic creation is a human task and programs would do a terrible job at doing this.

The example I gave assumed precisely that a human editor had written a namespace topic; the harvester was simply linking a document (which mentioned that namespace) to that existing topic. So this is automatic creation of associations or links, rather than topics.

But topics can also be safely created automatically in some cases: where good structured metadata exists we can confidently base topics on it. e.g. topics can usefully be automatically harvested from Java classes that implement particular interfaces (generators, transformers, etc.).

but anyway, we decided to do a first step with handwritten linkmaps. we can move incrementally from there on.

Yes, that's true.

What I particularly like about TMs is that they invert the usual relationship of resources to metadata - in a TM the topics are central and the resources are attached to them. So the key activity is to identify the high-level topics (the ontology) and then build a harvester to link your resources to the topics in the ontology. This linking can be done by recognising patterns in the resources (e.g. a reference to a namespace), or, better, by recognising explicit metadata (e.g. JavaDoc).

Cheers

Con
RE: [RT] Improved navigation of learning objects
Nicola Ken Barozzi wrote:

We will probably be moving the Forrest DTD to XHTML2 as one of the next things to do. As you know, there are meta tags that are a nice way of adding additional info to the page.

http://academ.hvcc.edu/~kantopet/xhtml/index.php?page=xhtml+meta+content

Types of meta-information include:
* The contents and topics of the document.

DocBook has this too - you can mark up a section as relating to a topic in some external controlled vocabulary. I think this is a good way to relate a document, or part of a document, to a particular concept or set of concepts.

Stefano wrote:

I agree that there must be some kind of automatism going on, but the topic creation is a human task and programs would do a terrible job at doing this.

Nicola wrote:

It's still humans editing them, but the information can be scattered in the documents themselves.

Yes ... the topics (at least the main, ontological, topics) are written by hand, and the documents contain metadata that LINKS them to these ontological topics. So the metadata in the docs are precisely ASSOCIATIONS rather than topics.

but anyway, we decided to do a first step with handwritten linkmaps. we can move incrementally from there on.

What I see is that metadata in the docs cannot and should not totally replace a centralized bell-written site.xml, but can nicely complement it.

Agree 100%. Except for the bell-written bit. :-)
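P.S. To make the DocBook point concrete, this is the sort of thing I mean (a made-up fragment; the scheme name is invented, and from memory the elements are subjectset/subject/subjectterm inside the section's info element):

<section>
  <sectioninfo>
    <subjectset scheme="cocoon-ontology">
      <subject>
        <subjectterm>transformer</subjectterm>
      </subject>
      <subject>
        <subjectterm>sitemap-component</subjectterm>
      </subject>
    </subjectset>
  </sectioninfo>
  <title>Writing your own transformer</title>
  ...
</section>

A harvester only has to read those subjectterms and attach the document as an occurrence of the matching topics in the ontology.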
RE: [RT] Improved navigation of learning objects
Stefano Mazzocchi wrote:

I agree that there must be some kind of automatism going on, but the topic creation is a human task and programs would do a terrible job at doing this.

The example I gave assumed precisely that a human editor had written a namespace topic; the harvester was simply linking a document (which mentioned that namespace) to that existing topic. So this is automatic creation of associations or links, rather than topics.

yes, but this is simply spreading the issue of topic creation all over the place, you are not making it any easier (IMHO)

I don't follow you. It seems to me that harvesting can make it easier. See below:

But topics can also be safely created automatically in some cases: where good structured metadata exists we can confidently base topics on it. e.g. topics can usefully be automatically harvested from Java classes that implement particular interfaces (generators, transformers, etc.).

True, but again, I don't see the point. I'm sure that if we make the editing interface to our doc system easy, people will find it much easier to just make a list of components and update them as we go (especially since there are not so many of them).

I respectfully disagree. For example: there are many sitemap components that are not adequately documented. The same is true of XML namespaces used in Cocoon. Which ones are undocumented? Well, who knows? It's not easy to see, because they're not documented! :-)

If we could harvest a topic from the Java source of the component, then the lack of documentation for a given component or namespace would be immediately obvious, in the form of a topic without any useful content: a blank page saying "write something here". A slightly similar phenomenon exists on the Wiki, where you can reify a topic as a Wiki page just by referring to it in another page.

Cheers

Con
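P.S. A rough sketch of what I mean by harvesting component topics (all of the names here are invented: assume some tool walks the source tree and spits out a simple listing like <class name="org.apache.cocoon.generation.DirectoryGenerator" implements="Generator"/>):

<xsl:template match="class[@implements='Generator' or @implements='Transformer' or @implements='Serializer']">
  <!-- one stub topic per sitemap component; a topic that never acquires
       an occurrence is a component with no documentation -->
  <xtm:topic id="{translate(@name, '.', '-')}">
    <xtm:instanceOf>
      <!-- points at a hand-written ontology topic for the interface, e.g. #Generator -->
      <xtm:topicRef xlink:href="#{@implements}"/>
    </xtm:instanceOf>
    <xtm:baseName>
      <xtm:baseNameString><xsl:value-of select="@name"/></xtm:baseNameString>
    </xtm:baseName>
  </xtm:topic>
</xsl:template>

Merging these stubs with topics harvested from the docs would immediately show which components have occurrences (i.e. documentation) and which are the blank pages.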
RE: [RT] Improved navigation of learning objects
Bertrand Delacretaz wrote:

Some form of topic map would be useful to build "what's related" info though, which helps navigation and discovery a lot.

Stefano Mazzocchi wrote:

Yes, but such a topic map would have to be human edited. this is what scares me. tools for ontology creation (like Protege, http://protege.stanford.edu/ for example) are available but are *incredibly* complex to use.

Certainly some, but not all, of this topic map needs to be human-edited. I think a basic ontology could be written by hand in a text editor and still be large enough to be usable. NB there are other syntaxes for authoring topic maps which are simpler than XTM, too.

To be useful, a hand-written ontology would only need to cover some of the core concepts such as namespace, component, block, howto, generator, transformer. The bulk of the topics and relationships are implicit in the docs, and could be automatically harvested into XTM with a bunch of XML stylesheets and linked to the underlying ontology. For instance, references to Java classes, XML namespaces, etc. could be automatically harvested from xdocs:

<xsl:template match="source[contains(., 'http://apache.org/cocoon/request/2.0')]"
              mode="harvest-namespace-topics">
  <!-- This document contains a reference to the request namespace,
       so it can be harvested as an occurrence of the namespace's topic -->
  <xtm:topic>
    <!-- topic for the request namespace -->
    <xtm:subjectIdentity>
      <xtm:subjectIndicatorRef xlink:href="http://apache.org/cocoon/request/2.0"/>
    </xtm:subjectIdentity>
    <!-- this occurrence -->
    <xtm:occurrence>
      <xtm:resourceRef xlink:href="{$current-page-url}"/>
    </xtm:occurrence>
  </xtm:topic>
</xsl:template>

<xsl:template match="source[contains(., 'org.apache.cocoon.')]">
  <!-- reference to a Cocoon class: -->
  <!-- harvest the class name and link this resource to the topic reifying that class -->
  etc.
</xsl:template>

A topic map layer could be harvested not just from the docs, but also from the Wiki, the javadoc, the cvs, etc., and the resulting topics merged to reveal the relationships which are currently produced by hand. Then you can define a website FROM the topic map. Also, of course, you can use other TM tools such as the TM visualiser tmnav, which displays the topic map using the TouchGraph component.

Cheers

Con
RE: DirectoryGenerator
Hi Alfred

Alfred Fuchs wrote:

Conal Tuohy wrote: [...] Alternatively, if it really is necessary to call back another pipeline, what about using a generic transformer like the XIncludeTransformer?

I will explain what I want to do:

(1) I have a website with a topic "actual". People should be able to upload HTML or (DocBook) XML files to this directory. I want to scan this directory and generate an overview of the files, with a description extracted from the titles.

solutions:
- XPathDirectoryGenerator? but the HTML files?
- DirectoryGenerator - CInclude transformer - WriteTransformer - ... - HTMLGenerator - ...

You could use DirectoryGenerator, transform the result into include statements that use the cocoon: protocol to refer back to your mime-type-specific pipelines (which return the file descriptions), then use the IncludeTransformer to import these descriptions. So the CIncludeTransformer, for instance, can call your pipelines, rather than the generator doing so. (A rough sketch follows below.)

<snip/>

but at this stage a thought: I have a pipeline for this file type (depending on the name) and a pipeline for that file type. why not extend the mechanism of map:aggregate and give a FileInfoDirectoryGenerator a prefix for a pipeline or resource that is called on every found file.

Yes, I see your point. Maybe I've missed something, but I just wondered why this facility should be built into the generator - rather than performed as a subsequent step using an inclusion transformer.
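P.S. Here is roughly what I have in mind (untested, and the stylesheet and pipeline names are invented): an XSLT turns the DirectoryGenerator listing into CInclude statements that call back into your type-specific pipelines via the cocoon: protocol, and the CIncludeTransformer then pulls the descriptions in.

<!-- dir2includes.xsl -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dir="http://apache.org/cocoon/directory/2.0"
    xmlns:cinclude="http://apache.org/cocoon/include/1.0">

  <xsl:template match="dir:directory">
    <overview>
      <xsl:apply-templates select="dir:file"/>
    </overview>
  </xsl:template>

  <xsl:template match="dir:file">
    <!-- description/* pipelines (one per mime type, matched on the file name)
         each return a short description of one file -->
    <cinclude:include src="cocoon:/description/{@name}"/>
  </xsl:template>

</xsl:stylesheet>

and in the sitemap, something like:

<map:match pattern="actual/overview">
  <map:generate type="directory" src="actual"/>
  <map:transform src="dir2includes.xsl"/>
  <map:transform type="cinclude"/>
  <map:serialize type="xml"/>
</map:match>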
RE: DirectoryGenerator
Alfred Fuchs wrote:

in my example I extract the title of an HTML page in this way: if a title exists and the title is not empty, use it as the title; otherwise use the first h1, etc... this is logic that is simply done in an XSLT, but how to do it in a single XPath query?

string(/html/head/title[normalize-space()]|/html/body//h1[1])

Cheers

Con
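P.S. In case it makes the equivalence clearer, here is the same title-else-h1 logic written out long-hand next to the one-liner (fragments only, to go inside whatever template you are using):

<!-- explicit version -->
<xsl:choose>
  <xsl:when test="/html/head/title[normalize-space()]">
    <xsl:value-of select="/html/head/title"/>
  </xsl:when>
  <xsl:otherwise>
    <xsl:value-of select="/html/body//h1[1]"/>
  </xsl:otherwise>
</xsl:choose>

<!-- single expression: string() of a node-set takes the first node in document
     order, and the (non-empty) title always precedes any h1 -->
<xsl:value-of select="string(/html/head/title[normalize-space()]|/html/body//h1[1])"/>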
RE: Related Documents, MapTransformer and Topic Maps
Rogier Peters wrote:

<snip/>

Googling xml and relations quickly brought me another subject that I haven't seen discussed much here - XML topic maps. One of the big advantages of topic maps over my simple mapping is the amount of semantics that topic maps allow. Topic maps allow one thing to be related to another, and also describe what the one thing is, what the other thing is, and what kind of relation they have. So the next step would be to implement a topic map transformer. There is an Apache-licensed topic map project at http://sourceforge.net/projects/tm4j. I'm definitely going to look into it myself, but need to do some reading first, and I would like to discuss it. By the way, if you don't like topic maps, I would like to know too - I wasn't able to find any criticism on the matter (googling 'why topic maps are bad' or 'topic maps suck' didn't help).

I've done some experimental work with Topic Maps in Cocoon - using XSLT to harvest TMs from other data sources, merge them, and then render them as web pages with "related" links. See for example http://www.nzetc.org:8080/tm/corpora.html for a TM-based view of some of our website that shows some of these relations but virtually no actual content (warning: it's very slow).

I think the technology holds a lot of promise, and could be particularly useful in things like Forrest, but we will need some extra components before it will be readily used in Cocoon, particularly a TopicMapMergeTransformer and some kind of TM-oriented templating transformer for rendering. I haven't had a chance yet to deal with it, but it's on my list of things to do :-)

By the way, did you realise that the tm4j project actually already includes some Cocoon components?

Cheers

Con
RE: DirectoryGenerator
Peter Hunsberger wrote:

Conal Tuohy [EMAIL PROTECTED] wrote:

string(/html/head/title[normalize-space()]|/html/body//h1[1])

The string() function returns the string value of the FIRST NODE in the resulting nodeset.

Ahh, that would do the trick. I believe in some old implementations (Xerces/Xalan) I've run into cases where that wasn't so; I'm pretty sure I've seen the union of the strings. Might be a tad expensive, but most likely he can be more specific on the path to the h1...

Yes. Though theoretically the XPath interpreter should optimise the expression and evaluate the right-hand side of the union operator only if the left-hand side is empty (like the 'or' operator in Java if statements). I don't know if this actually happens, but it's an obvious and probably rather easy optimisation to implement.

I tested the XPath expression in an XSLT stylesheet, but in the original example Alfred was replacing the XPathDirectoryGenerator, which looks up an XPath component, I believe ... if it turns out that the default XPath implementation is sub-optimal (or even buggy) it could be worth putting the effort in there.

Alternatively, if it really is necessary to call back another pipeline, what about using a generic transformer like the XIncludeTransformer?

Con
RE: [RT] Improving Views
Sylvain Wallez wrote:

Relating this to the newly introduced virtual components, we can consider the view definition as a virtual serializer. Example:

<map:serializers>
  <map:serializer type="xml" src="oacs.XMLSerializer"/>
  <map:serializer type="pretty-xml">
    <map:transform type="xslt" src="xml2pretty.xsl"/>
    <map:serialize type="xml"/>
  </map:serializer>
</map:serializers>

<map:views>
  <map:view name="content" from-label="content" serializer="xml"/>
  <map:view name="pretty-content" from-label="content" serializer="pretty-xml"/>
</map:views>

I like this, and while we're at it I'd also like to see the ability to define different views with the same name. I don't think this is possible at present. e.g.

<map:views>
  <map:view name="search" from-label="tei" serializer="tei-search"/>
  <map:view name="search" from-label="html" serializer="html-search"/>
  <map:view name="search" from-label="svg" serializer="svg-search"/>
</map:views>

When accessing a pipeline using a view called X, the pipeline processor could branch off from the main pipeline at the FIRST label Y for which there exists a map:view with @name=X and @from-label=Y. This would make it easy to define e.g. a "search" view of heterogeneous pipelines, with different types of content. Typically the label attributes in the pipeline would refer to the TYPE of the content at that stage in the pipeline.

Cheers

Con
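P.S. For anyone following along at home: a view is selected per request, so with definitions like the above the same page can be asked for in its different forms, e.g. (URLs invented):

http://localhost:8888/docs/index.html
http://localhost:8888/docs/index.html?cocoon-view=content
http://localhost:8888/docs/index.html?cocoon-view=pretty-content

and the same-name proposal would let ?cocoon-view=search do something sensible across pipelines that carry quite different content types.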
RE: [RT] Webdavapps with Cocoon
Stefano wrote:

If I have a resource like http://blah.com/news/ and I want to edit it, I could ask for

http://blah.com:/news/
http://edit.blah.com/news/
http://blah.com/news/?view=edit

which are all 'orthogonal' ways of asking for a different view of the same resource, accessing it through a parallel URL space (but with different behaviours).

Marc wrote:

the only thing I am still struggling with is whether there could be an automatic way for the clients to know about these other URIs? I mentioned the possible use of PROPFIND somewhere down the line for communicating READ-ONLY state; in this light this could be extended to having a property that describes the URL or 'view' to get into editing? (or at least a URL to a linkbase that mentions the different allowed view-modes, their role and their URL?) on the web-page side of things I guess there would just be an 'edit' link on the page?

I've been thinking about this same thing too - how to provide write access to distinct aspects of a website to different people. When a web page is created by Cocoon, the different aspects are woven together, but the resulting web page could also be annotated with links to the individual aspects (as e.g. WebDAV or FTP URLs, so they could be edited). Here are my thoughts about it:

1) how to keep track of the editable resources used in any particular pipeline, so that they could be individually exposed for editing?
2) how to encode these URIs for presentation to the user agent?

For the first, my idea was that each stage in the pipeline could insert XML processing instructions into the SAX output, recording the WebDAV or FTP URIs that correspond to the editable resources used in that stage. This would be a trivial addition to any XSLT in the pipeline, or it could be added by a specialised "annotator" XSLT, something like:

<xsl:param name="edit-uri"/>
<xsl:param name="edit-title"/>

<xsl:template match="/">
  <xsl:processing-instruction name="editor">href="{$edit-uri}" title="{$edit-title}"</xsl:processing-instruction>
  <xsl:copy-of select="."/>
</xsl:template>

In the sitemap, this could be called like this:

<!-- generate content -->
<map:generate src="content/{1}.xml"/>
<!-- annotate content with an editing link -->
<map:transform src="annotate-edit-link.xsl">
  <map:parameter name="edit-title" value="content"/>
  <map:parameter name="edit-uri" value="http://blah.com/webdav/content/{1}.xml"/>
</map:transform>
<!-- style content -->
<map:transform src="style/{2}.xsl"/>
<!-- annotate with style-editing link -->
<map:transform src="annotate-edit-link.xsl">
  <map:parameter name="edit-title" value="style"/>
  <map:parameter name="edit-uri" value="ftp://blah.com/style/{2}.xsl"/>
</map:transform>

etc.

Or perhaps a pipeline processor could insert them automatically, based on a small addition to the current sitemap syntax:

<map:generate src="content/{1}.xml" edit-uri="/webdav/content/{1}.xml" label="content"/>
<map:transform src="xsl/some-transform.xsl" edit-uri="/webdav/xsl/some-transform.xsl" label="style"/>

Finally, at the end of the pipeline, the PIs would be harvested by a component that would encode these links for presentation to the user (a rough sketch of one way to do that is below). The problem is how to encode these links in a way that a standard client can handle. It would be really nice to just add elements like this:

<link type="edit-url" title="content" href="/webdav-root/content/blah.xml"/>
<link type="edit-url" title="skin" href="/webdav-root/skin/zebra.xml"/>
<link type="edit-url" title="css" href="/webdav-root/css/default.css"/>

etc. Or RDF could be used instead (again, in conjunction with some client-side javascript to handle the links).
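To make that harvesting step a bit more concrete, something along these lines might do it (a rough, untested sketch; the stylesheet name is invented, and it assumes the PI data looks like the href="..." title="..." pairs produced above):

<!-- harvest-editor-pis.xsl -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- copy everything through unchanged... -->
  <xsl:template match="@*|node()">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- ...except the head, which also gets one <link> per editor PI found in the page -->
  <xsl:template match="head">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
      <xsl:for-each select="//processing-instruction('editor')">
        <link type="edit-url"
              title="{substring-before(substring-after(., 'title=&quot;'), '&quot;')}"
              href="{substring-before(substring-after(., 'href=&quot;'), '&quot;')}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- ...and the PIs themselves, which are dropped from the output -->
  <xsl:template match="processing-instruction('editor')"/>

</xsl:stylesheet>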
Of course, no standard client application will know what to do with these. But with some client-side javascript, these links could easily be converted into an aspect-oriented editing toolbar attached to the browser window frame. One way would be to write the links (and associated javascript) into the HTML pages just before serializing, e.g.

<map:match pattern="*/*.html">
  <map:generate src="content/{1}.xml"/>
  <map:transform src="xsl/{2}.xsl"/>
  <!-- inserted this transform to allow editing: -->
  <map:transform src="insert-editing-links.xsl"/>
  <map:serialize/>
</map:match>

The downside is that this works only for HTML.

Finally, the option that looks best to me would be to communicate the WebDAV URIs to the client browser through a separate channel, such as the Annotea protocol. These links would appear as annotations to an Annotea client (e.g. Snufkin or Annozilla). When an Annotea-aware browser loads a web page, it queries an Annotea server to find any annotations of the page. Cocoon would be producing both the page and the annotations in this case - the annotations would be a Cocoon view of the regular output pipeline; a view in which just the editor PIs are harvested and presented as links. One big advantage of this technique is that it could be used to
RE: Recent changes to LuceneIndexTransformer
Carsten wrote:

After the last changes to LuceneIndexTransformer, a fragment of the code looks like this:

if (analyzerClassname == null)
    analyzerClassname = this.analyzerClassname;
String sMergeFactor = atts.getValue(LUCENE_QUERY_MERGE_FACTOR_ATTRIBUTE);
mergeFactor = this.mergeFactor;

Now, there is no local variable analyzerClassname nor mergeFactor, so the assignments are useless. Looking at the code, I would guess that there is something wrong with it. Does anybody know what?

Yes ... it was me. These used to be local variables as well, but I removed them when refactoring Vadim's code. I will fix it. In the meantime, the lines of code are at least not harmful.

Con
RE: [Vote] Controller/Sitemap integration
Stephan Michels wrote:

<map:pipeline>
  [...]
  <map:continue type="petshop" id="{1}"/>
  [...]
</map:pipeline>

I think it's a potential source of confusion to have id attributes which are not of type ID (i.e. they are not unique identifiers of elements in the sitemap). Of course, I realise that id is just an abbreviation for identifier, and that it's purely conventional that id attributes are of type ID, but I think we should respect this very common convention, because doing so will lower the cognitive burden for people learning Cocoon flow.

http://www.w3.org/TR/REC-xml#id

Similarly, a newbie might expect a state-id attribute to be of type IDREF, which is a mark against it IMHO.

http://www.w3.org/TR/REC-xml#idref

And again, that's why I think src would be a poor name for the same attribute, because an attribute with this name would conventionally contain a URI.

http://www.w3.org/TR/xmlschema-2/#anyURI

That's why I'd prefer any of the other alternative names proposed: state, from, continue, or flow.

Cheers

Con
RE: [Vote] Controller/Sitemap integration
Marc Portier wrote:

The following might seem like nagging, but I do share Sylvain's eagerness to get names really right, so I'm wide open for other alternatives and views...

I don't have a vote either :-) but I agree - names are a very important detail, so I'll stick my nose in...

[C] A state of the controller is called by:

<map:call state="...">
  <map:parameter name="x" value="y"/>
</map:call>

We don't call states in this sense. We continue a continuation ;-) actually I think we continue with the 'use-case', or we continue the 'interaction'. I guess <map:continue continuation="{1}"/> is bad. <map:continue src="{1}"/> or <map:continue id="{1}"/>? Still, <map:continue state-id="..."/> might make sense as well? What about <map:continue from="..."/>?

I agree with your analysis completely ... personally I find map:continue with an attribute of id or source or state jarring - it sounds like you would be continuing a state, or continuing an id, which is wrong, as you say. It doesn't read smoothly in the sense of a regular English sentence, whereas "continue from ..." reads very naturally.

The point is that you continue a FLOW (a use-case, as you say), and that you continue the flow FROM a particular point, which must be identified by this attribute. "Continue" is the active (verb form) word which identifies the activity (the noun form would be "continuation"). "From" is a word that identifies the role that the rhino-continuation or FSM-state plays in this activity, without having to give it some overly-specific name (state, continuation, location, point, etc.)

My 2c

Con