Re: [WSG] Naked metadata - RDF in HTML
Andy Kirkwood|Motive said:

> An interesting application of the technology, although I'm not sure that it addresses how to make it *easier* for administrators to maintain metadata records.

and

> (Assuming the ideal solution would be a wysiwyg editing environment for non-technical content authors.)

Andy, I see value in all the points you raise - I'd like to offer some counterpoint.

I'm approaching the subject with the idea that metadata is important in order for people to find (related) information at some later time. I think the issue being addressed by Jonathan was not how-to in a WYSIWYG editor, but rather that metadata is not front-of-mind when editing an existing resource. The method presents an elegant solution for metadata that is important for an external audience/end users (who wrote it, when, what's it about, what else is there, where am I with regard to related documents), as opposed to the internal management of a collection (similar but slightly or significantly different to the above).

> -adding DC class values to span elements is not a mark-up behaviour likely to be supported by wysiwyg editors

The leading WYSIWYG editor can be extended, with much gnashing of teeth and swearing, to provide this type of functionality. In fact, that is a major selling point.

> -administrators will still not entirely 'see' the metadata they've added, as it is the combination of the name and content values that creates a meaningful record, and this would only be visible at a code level

I think the opposite. Sure, the finer points of the machine-readable part of the record are invisible, but the metadata itself becomes recognisable patterns that are contained within the document, *are* visible, and are not abstracted to another level. How many people do you know who save adequate (any?) metadata with their Word documents? Out of sight, out of mind. Authors have the opportunity to administer the metadata for their own content in a simple, relevant way. Again, the popular WYSIWYG editor can be extended to help less-savvy people.

> -the benefit of metadata is that it can be used to classify content to a significant degree of detail *without encroaching upon the visible page content itself*.

Agreed. Though see my point earlier re: external and internal metadata.

> The example provided, http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml , re-purposes content as metadata. If the content is edited, the record could (unintentionally) be deleted, or the content rewritten to include the records required

I'm missing something here... this reads like an argument in favor of both sides: you can delete the metadata or add it?

> -if metadata records are split between the head and body of a document, review would likely require a greater degree of concentration/quality assurance and/or additional supporting technologies (such as a metadata record 'viewer' that would reveal both conventional and class-based records)
> -etc.
>
> A custom-built CMS, as a companion to a well-supported publishing process, is still your best bet.

For enterprise-sized endeavours with a huge budget or significant in-house savvy, sure.

> The metadata records can be entered at the same time as the content, with values selected from a controlled vocabulary, etc. and then output either into the head or body as required. After all, it's more than just the ability to add or edit metadata records, it's also the relevance of the values entered to the content, end-use of the records and the intended community.

One word: Tags. Bottom-up, ad-hoc, and eventually convergent labelling seems to have a lot more traction with the wider audience than thesauri and controlled vocabs. The problem is that the latter are usually not revealed to end users, and thus run the risk of being pretty meaningless as a tool to help them find stuff. Of course, the opposite is true in a closed community (i.e. where people know the vocab).

Lastly, naked metadata will be indexed by (public) search engines, used to determine relevance, and returned in SERPs.

kind regards
Terrence Wood.

** The discussion list for http://webstandardsgroup.org/ See http://webstandardsgroup.org/mail/guidelines.cfm for some hints on posting to the list and getting help **
Re: [WSG] Naked metadata - RDF in HTML
Hi Terrence,

It feels like we're talking at cross-purposes?

> I'm approaching the subject with the idea that metadata is important in order for people to find (related) information at some later time.

Interesting and valid point, unfortunately not the point being discussed.

> I think the issue being addressed by Jonathan was not how-to in a WYSIWYG editor, but rather that metadata is not front-of-mind when editing an existing resource.

I equate front-of-mindness with visibility, hence reference to an editing interface that will *show* the metadata records--a wysiwyg editor. Jonathan's focus was on the author and not the reader. From the original post:

>> ** The problem **
>> People updating Web pages often don't update the metadata in the header.

> The method presents an elegant solution for metadata that is important for an external audience/end users (who wrote it, when, what's it about, what else is there, where am I with regard to related documents), as opposed to the internal management of a collection (similar but slightly or significantly different to the above).

I was not advocating a separate metadata collection, but rather that metadata within a single document may be more elegantly edited/updated if it is all contained within the head of the document than when the records are mixed in with the content.

> The leading WYSIWYG editor can be extended, with much gnashing of teeth and swearing, to provide this type of functionality. In fact, that is a major selling point.

Moving away from specifics of which tool, the issue is still educating authors on a practice that is peripheral to writing the content. To create and maintain metadata requires the author to care about metadata; it also helps if they *see* the metadata when editing/updating the content. The RDF approach requires the author to have access either to the source code or to the means by which they can assign classes to spans. Wysiwyg editors have *not been created to include a work flow that is optimised for adding metadata records to content in this manner*.

> I think the opposite. Sure, the finer points of the machine-readable part of the record are invisible,

If the incorrect class value is assigned, the meaning of the record changes. Say for example I accidentally mark up the author's name as the title:

    <span class="dc-title">Andy Kirkwood</span>

At a visual level (i.e. without viewing the name value) it is not possible to spot the error. It would also be easy to accidentally add content to a record when editing, e.g.

    <span class="dc-title">Andy Kirkwood will be out of the office until next week</span>

> Authors have the opportunity to administer the metadata for their own content in a simple, relevant way. Again, the popular WYSIWYG editor can be extended to help less-savvy people.

As far as I'm aware, the customisation available does not replace the need for the author to care about metadata :). That the RDF method is simple is definitely debatable. How is adding spans to content more or less relevant to an author than adding records to the head?

>> The example provided, http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml , re-purposes content as metadata. If the content is edited, the record could (unintentionally) be deleted, or the content rewritten to include the records required
>
> I'm missing something here... this reads like an argument in favor of both sides: you can delete the metadata or add it?

At a visual level, that the word 'Anna' is a metadata record for the first name of the author would not be apparent. I might re-edit the copy from "Anna spoke to Susan on the phone" to "She spoke to Susan on the phone". By removing 'Anna', I've removed a metadata record from the document. To maintain the metadata record I would then need to add 'Anna' somewhere else. Keeping track of which records have and haven't been entered would be a nightmare. It's enough as an author to keep an eye on structure, grammar, spelling, etc.

>> -if metadata records are split between the head and body of a document, review would likely require a greater degree of concentration/quality assurance and/or additional supporting technologies (such as a metadata record 'viewer' that would reveal both conventional and class-based records)
>> -etc.
>>
>> A custom-built CMS, as a companion to a well-supported publishing process, is still your best bet.
>
> For enterprise-sized endeavours with a huge budget or significant in-house savvy, sure.

Savvy enough to care about metadata, not savvy enough to edit it when all the records are in the head, but savvy enough to pick through the content and assign classes to spans to approximate metadata records AND keep track of which records have and haven't been completed? An author that is comfortable with adding span elements with class values corresponding to the DC standard is not the 'problem'. It's the person who forgets to add metadata records when authoring content.

Embedding
Re: [WSG] Naked metadata - RDF in HTML
Thanks Jonathan. This is great, I have forwarded a link to your page to our metadata people.

--
kind regards
Terrence Wood.
Re: [WSG] Naked metadata - RDF in HTML
Some more thoughts (that send button is just too easy to press =)

Would using a rel or rev attribute be more appropriate than using a class to delineate the metadata? These attributes imply a relationship whereas class does not. If you needed to get at elements containing metadata at the presentation level you could use an attribute selector (the value is quoted because it contains a dot):

    element[rel="dc.title"]

Maybe it's not too late to have that conversation on the DC-General list, or with Ian Davis? Maybe it's just not that important?

--
kind regards,
Terrence Wood.
Re: [WSG] Naked metadata - RDF in HTML
Hi Jonathan,

An interesting application of the technology, although I'm not sure that it addresses how to make it *easier* for administrators to maintain metadata records.

ISSUES
(Assuming the ideal solution would be a wysiwyg editing environment for non-technical content authors.)

-adding DC class values to span elements is not a mark-up behaviour likely to be supported by wysiwyg editors in such a manner that it would be 'effortless' for an author, i.e. the author would typically need to edit the source code to add appropriate class values
-administrators will still not entirely 'see' the metadata they've added, as it is the combination of the name and content values that creates a meaningful record, and this would only be visible at a code level
-the benefit of metadata is that it can be used to classify content to a significant degree of detail *without encroaching upon the visible page content itself*. The example provided, http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml , re-purposes content as metadata. If the content is edited, the record could (unintentionally) be deleted, or the content rewritten to include the records required
-if metadata records are split between the head and body of a document, review would likely require a greater degree of concentration/quality assurance and/or additional supporting technologies (such as a metadata record 'viewer' that would reveal both conventional and class-based records)
-etc.

A custom-built CMS, as a companion to a well-supported publishing process, is still your best bet. The metadata records can be entered at the same time as the content, with values selected from a controlled vocabulary, etc. and then output either into the head or body as required. After all, it's more than just the ability to add or edit metadata records, it's also the relevance of the values entered to the content, end-use of the records and the intended community.

Food for thought anyway...

Best regards,
--
Andy Kirkwood | Creative Director

Motive | web.design.integrity
http://www.motive.co.nz
ph: (04) 3 800 800 fx: (04) 970 9693
mob: 021 369 693
93 Rintoul St, Newtown
PO Box 7150, Wellington South, New Zealand
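The metadata record 'viewer' mentioned in the issues list could look something like the sketch below. It is only an illustration, not an existing tool: it assumes head records are meta elements whose name starts with "DC.", that class-based records use a "dc-" class prefix (as in the span examples elsewhere in this thread), and that class-based records are not nested inside each other. The names MetadataViewer and list_records are made up for this example.

```python
from html.parser import HTMLParser


class MetadataViewer(HTMLParser):
    """Collect (name, value, source) records from meta elements and dc-* classes."""

    def __init__(self):
        super().__init__()
        self.records = []      # (name, value, "head" or "body") in document order
        self._current = None   # the class-based record currently being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            name = attrs.get("name") or ""
            if name.startswith("DC."):
                self.records.append((name, attrs.get("content") or "", "head"))
            return
        if self._current is not None:
            # Count nested elements of the same tag so the record closes correctly.
            if tag == self._current["tag"]:
                self._current["depth"] += 1
            return
        for cls in (attrs.get("class") or "").split():
            if cls.startswith("dc-"):
                self._current = {"tag": tag, "name": cls, "text": [], "depth": 1}
                break

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"].append(data)

    def handle_endtag(self, tag):
        cur = self._current
        if cur is not None and tag == cur["tag"]:
            cur["depth"] -= 1
            if cur["depth"] == 0:
                value = " ".join("".join(cur["text"]).split())
                self.records.append((cur["name"], value, "body"))
                self._current = None


def list_records(html):
    """Return every conventional and class-based record found in the page."""
    viewer = MetadataViewer()
    viewer.feed(html)
    viewer.close()
    return viewer.records


page = """<html><head><meta name="DC.creator" content="Andy Kirkwood" /></head>
<body><p><span class="dc-title">Andy Kirkwood</span> wrote this.</p></body></html>"""

for name, value, source in list_records(page):
    print(f"{source:5} {name:12} {value}")
```

Listing the name next to the value is what makes a mislabelled record (an author's name marked up as a title, say) possible to spot without reading the source.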
Re: [WSG] Naked metadata - RDF in HTML
Hi Ian, Liddy, Charles, Peter, Misha, Alan, Patrick, Andy, Geoff, DC-General and WSG

Thank you for all your help and comments. In particular, thank you, Ian, for RDF in HTML.

Last week, I wrote to the DC-General and the Web Standards Group mailing lists. I was lamenting the fact that Dublin Core metadata needed to be embedded in the head of the Web page, and that people often didn't update the metadata when they updated the Web page. I proposed a half-baked idea, and asked for comments.

Everyone was extremely helpful, and gave me really valuable feedback. I learnt a lot.

** RDF in HTML **

In particular, I learnt that RDF in HTML [1] will do exactly what I want. It provides a valid way to embed Dublin Core (or other) metadata in the Web page. I can use class attributes, so it is CSS-friendly. It can be harvested using a Gleaning Resource Descriptions from Dialects of Languages [2] (GRDDL)-aware harvester. And Ian has built a GRDDL-aware harvester, Embedded RDF Extractor, [3] that I can use to test my pages.

Now, I have built a page, and it works!
http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html

If anyone would like to have a look at it, I would appreciate feedback. Have I got it right? Are there things that I could be doing better?

** XHTML2 **

And Misha pointed out that XHTML2 [4] deals with this very nicely. In XHTML2, meta elements can appear in the body of the document, not just the head, and any element can link to them.

So, once again, thanks everybody. The Internet continues to blow my mind!

** References **

[1] RDF in HTML: http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml
[2] Gleaning Resource Descriptions from Dialects of Languages (GRDDL): http://www.w3.org/2004/01/rdxh/spec
[3] Embedded RDF Extractor: http://research.talis.com/2005/erdf/extract
[4] eXtensible HyperText Markup Language 2 (XHTML2): http://www.w3.org/TR/xhtml2

--
Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829
[WSG] Naked metadata
Hi WSG'ers

After seeing Sarah's post about CSS for titles, I thought that people might be interested in this idea. It's a half-baked idea. If you have any comments or suggestions, I would love to hear them.

Apologies to those who have already seen this on the DC-General list.

** The problem **

People updating Web pages often don't update the metadata in the header.

** The solution **

Tag appropriate Web data with id attributes. Point to the data from the appropriate metadata field in the header.

** Example **

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title>Naked Metadata</title>
    <meta name="DC.title" content="#title" />
    <meta name="DC.creator" content="#creator" />
    <meta name="DC.rights" content="#rights" />
    </head>
    <body>
    <h1 id="title">Naked Metadata</h1>
    <h2 id="creator">Jonathan O'Donnell</h2>
    <p id="rights">http://purl.nla.gov.au/net/jod/tutorial/metadata.html &copy; Jonathan O'Donnell 23 October 2005</p>
    </body>
    </html>

** Background **

At DC-ANZ 2005, Eve Young and Baden Hughes made the point that people updating Web pages often don't update the metadata. One of the problems that they talked about was that metadata in the header is essentially invisible to people editing the page (when, for example, using some wysiwyg editors).

In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.

Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the URL, and date information.

If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.

** Method **

In html (including xhtml), one way of doing this is to use id attributes. Many Web developers use these already to style particular aspects of a Web site. They can also be used as a target anchor for hypertext links.

For example, if you use this tag:

    <p id="rights">&copy; Jonathan 2005</p>

in the page:

    http://example.net/foo.html

Then the URL

    http://example.net/foo.html#rights

will point to that paragraph.

** Advantages **

+ Metadata sits with the data.
+ As data is updated, the metadata continues to be current.

** Disadvantages **

+ id attributes must be unique within a Web page.

--
Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829
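For the pointing idea to produce usable records, a harvester would have to follow each "#id" reference back into the page. The sketch below shows roughly what that lookup involves. It is an assumption-laden illustration, not something any existing harvester is known to do: FragmentResolver and resolve_metadata are made-up names, and the page is assumed to be well-formed with properly nested elements.

```python
from html.parser import HTMLParser

# Elements that never have content, so they are not tracked as "open".
VOID_TAGS = {"meta", "link", "br", "hr", "img", "input"}


class FragmentResolver(HTMLParser):
    """Record meta name/content pairs and the text of every element with an id."""

    def __init__(self):
        super().__init__()
        self.meta = []        # (name, content) pairs in document order
        self.text_by_id = {}  # id -> list of text chunks inside that element
        self._stack = []      # one entry per open element: its id, or None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            if attrs.get("name"):
                self.meta.append((attrs["name"], attrs.get("content") or ""))
            return
        if tag in VOID_TAGS:
            return
        element_id = attrs.get("id")
        if element_id:
            self.text_by_id[element_id] = []
        self._stack.append(element_id)

    def handle_startendtag(self, tag, attrs):
        # Self-closed elements such as <meta ... /> hold no text to track.
        if tag == "meta":
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        for element_id in self._stack:
            if element_id:
                self.text_by_id[element_id].append(data)


def resolve_metadata(html):
    """Replace '#id' content values with the text of the element they point to."""
    parser = FragmentResolver()
    parser.feed(html)
    parser.close()
    resolved = []
    for name, content in parser.meta:
        target = content[1:]
        if content.startswith("#") and target in parser.text_by_id:
            content = " ".join("".join(parser.text_by_id[target]).split())
        resolved.append((name, content))
    return resolved


page = """<html><head><title>Naked Metadata</title>
<meta name="DC.title" content="#title" />
<meta name="DC.creator" content="#creator" />
</head><body>
<h1 id="title">Naked Metadata</h1>
<h2 id="creator">Jonathan O'Donnell</h2>
</body></html>"""

print(resolve_metadata(page))
```

References that do not match any id are left untouched, so a consumer can still tell a resolved value from a dangling pointer.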
Re: [WSG] Naked metadata
Jonathan O'Donnell wrote:

> <meta name="DC.title" content="#title" />
> <meta name="DC.creator" content="#creator" />
> <meta name="DC.rights" content="#rights" />

I'm not a DC expert, but I believe that that's not a valid way to go about it. When you embed DC information via meta elements, content needs to contain the actual value, not a reference to another location. When the expected value itself is a URI, you should use link elements... but even that doesn't apply in this scenario, as it's usually reserved for things like DC.relation.

Have a read through http://dublincore.org/documents/dcq-html/

Now, a different approach may be to process all pages server side on a regular basis to fill in the correct DC.title etc. meta elements based on the content of the actual page in case they've been left empty, which could probably be achieved with a single XSL transformation or similar.

--
Patrick H. Lauke
__
re·dux (adj.): brought back; returned. used postpositively
[latin : re-, re- + dux, leader; see duke.]
www.splintered.co.uk | www.photographia.co.uk
http://redux.deviantart.com
__
Web Standards Project (WaSP) Accessibility Task Force
http://webstandards.org/
__
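The server-side fill-in described above could be sketched as follows. This uses Python's standard XML library rather than an XSL transformation, purely as the "or similar" route. Several things here are assumptions: the SOURCES mapping (which element supplies which record) and the function name fill_empty_meta are hypothetical, and the page must be well-formed XHTML that uses no named entities such as &copy;, since the XML parser will not know them.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"x": XHTML}

# Serialise XHTML elements without an "ns0:" prefix.
ET.register_namespace("", XHTML)

# Hypothetical mapping: record name -> element whose text supplies the value.
SOURCES = {"DC.title": "h1"}


def fill_empty_meta(xhtml_text):
    """Fill empty DC meta elements from page content; leave completed ones alone."""
    root = ET.fromstring(xhtml_text)
    for meta in root.findall(".//x:meta", NS):
        name = meta.get("name")
        if name not in SOURCES:
            continue
        if (meta.get("content") or "").strip():
            continue  # the author already supplied a value
        source = root.find(".//x:" + SOURCES[name], NS)
        if source is not None:
            text = " ".join("".join(source.itertext()).split())
            meta.set("content", text)
    return ET.tostring(root, encoding="unicode")


page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head><title>Naked Metadata</title>
<meta name="DC.title" content="" />
<meta name="DC.creator" content="Jonathan O'Donnell" />
</head>
<body><h1>Naked Metadata</h1></body>
</html>"""

print(fill_empty_meta(page))
```

Run on a schedule, something like this would write real values into the content attribute, which keeps the records valid for harvesters while still taking them from the visible page.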
Re: [WSG] Naked metadata
Hi Jonathan,

I second Patrick's comment that 'pointing' the DC records to content on the page is not the solution. Although, from a maintenance perspective, this may appear to be a work-around for not completing the metadata records (questionable), metadata harvesting tools are unlikely to populate the content attribute of the meta element with content from the webpage. In other words, the metadata records cease to have value as metadata.

Consider educating content authors or moving to a CMS. For example, in a CMS, a rule could be created that requires a minimum set of metadata records to be completed before content can be published.

If using a static system, then adding a placeholder for metadata content to template pages may be a solution, e.g.

    <meta name="DC.title" content="[tba]" />
    <meta name="DC.creator" content="[tba]" />

(The author would then search the source code for the string '[tba]' as part of the publishing protocol to remind them to complete the md records.)

> ** The problem **
>
> People updating Web pages often don't update the metadata in the header.
>
> ** The solution **
>
> Tag appropriate Web data with id attributes. Point to the data from the appropriate metadata field in the header.

Best regards,
--
Andy Kirkwood | Creative Director

Motive | web.design.integrity
http://www.motive.co.nz
ph: (04) 3 800 800 fx: (04) 970 9693
mob: 021 369 693
93 Rintoul St, Newtown
PO Box 7150, Wellington South, New Zealand
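Both suggestions above (the '[tba]' search and the minimum-set publishing rule) can be automated with a small check. The sketch below is one possible shape, under stated assumptions: meta elements are written with name before content and double-quoted values, as in the template above, and the required set shown is only an example. The names incomplete_records and can_publish are hypothetical.

```python
import re

PLACEHOLDER = "[tba]"

# Assumes the attribute order and quoting used in the template: name, then content.
META_RE = re.compile(
    r'<meta\s+name="(DC\.[^"]+)"\s+content="([^"]*)"\s*/?>',
    re.IGNORECASE,
)


def incomplete_records(html):
    """Return the names of DC records that are empty or still hold the placeholder."""
    return [
        name
        for name, content in META_RE.findall(html)
        if content.strip() in ("", PLACEHOLDER)
    ]


def can_publish(html, required=("DC.title", "DC.creator")):
    """Apply a minimum-set rule: every required record must have a real value."""
    completed = {
        name
        for name, content in META_RE.findall(html)
        if content.strip() not in ("", PLACEHOLDER)
    }
    return all(name in completed for name in required)


template = (
    '<meta name="DC.title" content="[tba]" />\n'
    '<meta name="DC.creator" content="Andy Kirkwood" />'
)

print(incomplete_records(template))
print(can_publish(template))
```

A check like this only confirms that a value was entered; whether the value is relevant to the content still depends on the author.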
Re: [WSG] Naked metadata
Jonathan O'Donnell wrote:

> Hi WSG'ers
>
> In general, data (including metadata) should be stored in one place only. This prevents drift: if it is only stored in one place, it can only be updated in that place.
>
> Often, the information that we want to store as metadata already appears in the Web page. Examples include the title, description (especially as opening paragraph) and the author's name. In footers, we often find rights information, the URL, and date information.
>
> If this information already exists in the data, and we replicate it in the metadata, there is the danger of drift. Perhaps pointing to the data from the metadata fields is a way of preventing drift, and ensuring that the metadata is as up-to-date as the data.
>
> ** Method **

Hi Jonathan,

Given what you have said here, and what I would expect to see in serious authoring tools and CMSs, I think this area is generally neglected in most publishing tools (last time I looked).

Quite a few CMSs say that they are DC compliant, but as you mentioned, do they actually store the data in one place, and not in the web pages? Is it part of the work flow and version control of the documents? I don't think so. I'd be glad if anyone can point me to a product that does address this need.

For a CMS to address this properly, it needs to have incorporated a normalised schema based on DC into its database. This way all the pages published from this system can incorporate the various metadata as well as alt and longdesc for images.

Many organisations have legal requirements where they require snapshots of published data from any given time. A publishing system based on DC not only allows this feature, but also allows a complete analysis of all the subcomponents of a document and the various contributors.

That also leads to problems with document management systems that manage their metadata from properties within the documents and network environment variables. Last time I tried to extract metadata from MS Word, using Perl and Python, I could only get the standard set of properties; any data in custom properties was unretrievable (at least by me). I don't know what OO or the latest MS Office offers.

But I don't think asking users to maintain this data will work, unless they are librarians. I think that it has to be as automated and as transparent to the user as possible, because most users are just not interested in this level of site QA, unless it is an important component of the job.

Regards
Geoff Deering