Re: [WSG] Naked metadata - RDF in HTML

2005-11-10 Thread Terrence Wood
Andy Kirkwood|Motive said:
> An interesting application of the technology, although I'm not sure that
> it addresses how to make it *easier* for administrators to
> maintain metadata records.

and

> (Assuming the ideal solution would be a wysiwyg editing environment for
> non-technical content authors.)

Andy, I see value in all the points you raise -  I'd like to offer some
counterpoint. I'm approaching the subject with the idea that metadata is
important in order for people to find (related) information at some later
time.

I think the issue being addressed by Jonathan was not how-to in a WYSIWYG
editor, rather that metadata is not front-of-mind when editing an existing
resource.

The method presents an elegant solution for metadata that is important for
an external audience/end users (who wrote it, when, what's it about, what
else is there, where am I with regard to related documents), as opposed to
the internal management of a collection (similar but slightly or
significantly different to the above).

> -adding DC class values to span elements is not a mark-up behaviour
> likely to be supported by wysiwyg editors

The leading WYSIWYG editor can be extended, with much gnashing of the teeth
and swearing, to provide this type of functionality. In fact, that is a
major selling point.

> -administrators will still not entirely 'see' the metadata they've
> added, as it is the combination of the name and content values that
> creates a meaningful record, and this would only be visible at a code
> level

I think the opposite. Sure, the finer points of the machine-readable part
of the record are invisible, but the metadata itself becomes a recognisable
pattern that is contained within the document, *is* visible, and is not
abstracted to another level. How many people do you know who save adequate
(any?) metadata with their Word documents? Out of sight, out of mind.

Authors have the opportunity to administer the metadata for their own
content in a simple, relevant way. Again, the popular WYSIWYG editor can
be extended to help less-savvy people.

> -the benefit of metadata is that it can be used to classify content to a
> significant degree of detail *without encroaching upon the
> visible page content itself*.

Agreed. Though see my point earlier re: external and internal metadata.

> The example provided,
> http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml ,
> re-purposes content as metadata. If the content is edited, the record
> could (unintentionally) be deleted, or the content rewritten to
> include the records required

I'm missing something here... this reads like an argument in favor of both
sides: you can delete the metadata or add it?

> -if metadata records are split between the head and body of a
> document, review would likely require a greater degree of
> concentration/quality assurance and/or additional supporting
> technologies (such as a metadata record 'viewer' that would reveal both
> conventional and class-based records)
> -etc.
>
> A custom-built CMS, as a companion to a well-supported publishing
> process, is still your best bet.

For enterprise-sized endeavours with a huge budget or significant in-house
savvy, sure.

> The metadata records can be entered
> at the same time as the content, with values selected from a
> controlled vocabulary, etc. and then output either into the head or
> body as required. After all, it's more than just the ability to add or
> edit metadata records, it's also the relevance of the values
> entered to the content, end-use of the records and the intended
> community.

One word: Tags.

Bottom-up, ad hoc, and eventually convergent labelling seems to have a lot
more traction with the wider audience than thesauri and controlled vocabs.

The problem is that the latter are usually not revealed to end users, and thus
run the risk of being pretty meaningless as a tool to help them find stuff. Of
course, the opposite is true in a closed community (i.e. where people know
the vocab).

Lastly, naked metadata will be indexed by (public) search engines, used to
determine relevance, and returned in SERPs.


kind regards
Terrence Wood.




**
The discussion list for  http://webstandardsgroup.org/

 See http://webstandardsgroup.org/mail/guidelines.cfm
 for some hints on posting to the list & getting help
**



Re: [WSG] Naked metadata - RDF in HTML

2005-11-10 Thread Andy Kirkwood|Motive


Hi Terrence,

It feels like we're talking at cross-purposes?

> I'm approaching the subject with the idea that metadata is
> important in order for people to find (related) information at some later
> time.

Interesting and valid point, unfortunately not the point being
discussed.

> I think the issue being addressed by Jonathan was not how-to in a WYSIWYG
> editor, rather that metadata is not front-of-mind when editing an existing
> resource.

I equate front-of-mindness with visibility, hence reference to an
editing interface that will *show* the metadata records--a wysiwyg
editor. Jonathan's focus was on the author and not the reader. From
the original post:

> ** The problem **
> People updating Web pages often don't update the
> metadata in the header.

> The method presents an elegant solution for metadata that is important for
> an external audience/end users (who wrote it, when, what's it about, what
> else is there, where am I with regard to related documents), as opposed to
> the internal management of a collection (similar but slightly or
> significantly different to the above).

I was not advocating a separate metadata collection, but rather
that metadata within a single document may be more elegantly
edited/updated if all contained within the head of the
document, than when the records are mixed-in with the content.

> The leading WYSIWYG editor can be extended, with much gnashing of the teeth
> and swearing, to provide this type of functionality. In fact, that is a
> major selling point.

Moving away from specifics of which tool, the issue is still
educating authors on a practice that is peripheral to writing the
content. To create and maintain metadata requires the author to
care about metadata; it also helps if they *see* the metadata when
editing/updating the content.

The RDF approach requires the author to have access either to the
source code or the means by which they can assign classes to
spans. Wysiwyg editors have *not been created to include a
work flow that is optimised for adding metadata records to content in
this manner*.

> I think the opposite. Sure, the finer points of the machine-readable part
> of the record are invisible,

If the incorrect class value is assigned, the meaning of the
record changes. Say, for example, I accidentally mark up the author's
name as the title:

<span class="dc-title">Andy Kirkwood</span>

At a visual level (i.e. without viewing the name value) it is not
possible to spot the error. It would also be easy to accidentally add
content to a record when editing, e.g.

<span class="dc-title">Andy Kirkwood will be out of the office until next week</span>
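
The point about mis-assigned class values can be illustrated with a small sketch. It is not the Embedded RDF Extractor or a GRDDL harvester, only a hypothetical `harvest` function built on Python's standard `html.parser`. The `dc-creator` class name is an assumption (only `dc-title` appears above), and nested spans are not handled.

```python
from html.parser import HTMLParser


class ClassRecordParser(HTMLParser):
    """Collect (class, text) pairs from spans whose class starts with 'dc-'.

    Illustrative only: assumes one class per span and no nested spans.
    """

    def __init__(self):
        super().__init__()
        self.records = []     # harvested (name, value) pairs
        self.visible = []     # text a reader would see on the page
        self._current = None  # class of the dc- span we are inside, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class") or ""
        if tag == "span" and cls.startswith("dc-"):
            self._current = cls
            self.records.append((cls, ""))

    def handle_endtag(self, tag):
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        self.visible.append(data)
        if self._current is not None:
            name, value = self.records[-1]
            self.records[-1] = (name, value + data)


def harvest(markup):
    """Return (records, visible_text) for a fragment of markup."""
    parser = ClassRecordParser()
    parser.feed(markup)
    parser.close()
    return parser.records, "".join(parser.visible)


intended = '<p>By <span class="dc-creator">Andy Kirkwood</span></p>'
mistaken = '<p>By <span class="dc-title">Andy Kirkwood</span></p>'

intended_records, intended_text = harvest(intended)
mistaken_records, mistaken_text = harvest(mistaken)
```

Both fragments yield the same visible text, while the harvested record differs in its name, which is the kind of error that cannot be spotted without looking at the source.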

> Authors have the opportunity to administer the metadata for their own
> content in a simple, relevant way. Again, the popular WYSIWYG editor can
> be extended to help less-savvy people.

As far as I'm aware, the customisation available does not replace
the need for the author to care about metadata :).

That the RDF method is simple is definitely debatable. How is
adding spans to content more or less relevant to an author
than adding records to the head?

>> The example provided,
>> http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml ,
>> re-purposes content as metadata. If the content is edited, the record
>> could (unintentionally) be deleted, or the content rewritten to
>> include the records required
>
> I'm missing something here... this reads like an argument in favor of both
> sides: you can delete the metadata or add it?

At a visual level, that the word 'Anna' is a metadata record for
the first name of the author would not be apparent. I might re-edit
the copy from 'Anna spoke to Susan on the phone' to
'She spoke to Susan on the phone'. By removing 'Anna', I've
removed a metadata record from the document. To maintain the metadata
record I would then need to add 'Anna' somewhere else.

Keeping track of which records have and haven't been entered
would be a nightmare. It's enough as an author to keep an eye on
structure, grammar, spelling, etc.

>> -if metadata records are split between the head and body of a
>> document, review would likely require a greater degree of
>> concentration/quality assurance and/or additional supporting
>> technologies (such as a metadata record 'viewer' that would reveal both
>> conventional and class-based records)
>> -etc.
>>
>> A custom-built CMS, as a companion to a well-supported publishing
>> process, is still your best bet.
>
> For enterprise-sized endeavours with a huge budget or significant in-house
> savvy, sure.

Savvy enough to care about metadata, not savvy enough to edit it
when all the records are in the head, but savvy enough to pick
through the content and assign classes to spans to approximate
metadata records AND keep track of which records have and haven't been
completed?

An author that is comfortable with adding span elements
with class values corresponding to the DC standard is not the
'problem'. It's the person who forgets to add metadata records when
authoring content. Embedding

Re: [WSG] Naked metadata - RDF in HTML

2005-11-09 Thread Terrence Wood
Thanks Jonathan. This is great, I have forwarded a link to your page to
our metadata people.

-- 
kind regards
Terrence Wood.






Re: [WSG] Naked metadata - RDF in HTML

2005-11-09 Thread Terrence Wood
Some more thoughts (that send button is just too easy to press =)

Would using a rel or rev attribute be more appropriate than using a class
to delineate the metadata?  These attributes imply a relationship whereas
class does not.

If you needed to get at elements containing metadata at the presentation
level you could use: element[rel="dc.title"]

Maybe it's not too late to have that conversation on the DC.General list,
or with Ian Davis? Maybe, it's just not that important?

--
kind regards,
Terrence Wood.




Re: [WSG] Naked metadata - RDF in HTML

2005-11-09 Thread Andy Kirkwood|Motive

Hi Jonathan,

An interesting application of the technology, although I'm not sure
that it addresses how to make it *easier* for administrators to
maintain metadata records.


ISSUES
(Assuming the ideal solution would be a wysiwyg editing environment 
for non-technical content authors.)


-adding DC class values to span elements is not a mark-up behaviour 
likely to be supported by wysiwyg editors in such a manner that it 
would be 'effortless' for an author, i.e. the author would typically 
need to edit the source code to add appropriate class values
-administrators will still not entirely 'see' the metadata they've 
added, as it is the combination of the name and content values that 
creates a meaningful record, and this would only be visible at a code 
level
-the benefit of metadata is that it can be used to classify content 
to a significant degree of detail *without encroaching upon the 
visible page content itself*. The example provided,  
http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml , 
re-purposes content as metadata. If the content is edited, the record 
could (unintentionally) be deleted, or the content rewritten to
include the records required
-if metadata records are split between the head and body of a 
document, review would likely require a greater degree of 
concentration/quality assurance and/or additional supporting 
technologies (such as a metadata record 'viewer' that would reveal 
both conventional and class-based records)

-etc.

A custom-built CMS,  as a companion to a well-supported publishing 
process, is still your best bet. The metadata records can be entered 
at the same time as the content, with values selected from a 
controlled vocabulary, etc. and then output either into the head or 
body as required. After all, it's more than just the ability to add 
or edit metadata records, it's also the relevance of the values
entered to the content, end-use of the records and the intended 
community.


Food for thought anyway...

Best regards,

--
Andy Kirkwood | Creative Director

Motive | web.design.integrity
http://www.motive.co.nz
ph: (04) 3 800 800  fx: (04) 970 9693
mob: 021 369 693
93 Rintoul St, Newtown
PO Box 7150, Wellington South, New Zealand



Re: [WSG] Naked metadata - RDF in HTML

2005-11-08 Thread Jonathan O'Donnell
Hi Ian, Liddy, Charles, Peter, Misha, Alan, Patrick, Andy, Geoff, 
DC-General and WSG


Thank you for all your help and comments.  In particular, thank you, 
Ian, for RDF in HTML.


Last week, I wrote to the DC-General and the Web Standards Group 
mailing lists. I was lamenting the fact that Dublin Core metadata 
needed to be embedded in the head of the Web page, and that people 
often didn't update the metadata when they updated the Web page.  I 
proposed a half-baked idea, and asked for comments.


Everyone was extremely helpful, and gave me really valuable feedback.  
I learnt a lot.


** RDF in HTML **

In particular, I learnt that RDF in HTML [1] will do exactly what I 
want.  It provides a valid way to embed Dublin Core (or other) metadata 
in the Web page.  I can use class attributes, so it is CSS-friendly. It 
can be harvested using a Gleaning Resource Descriptions from Dialects 
of Languages [2] (GRDDL)-aware harvester.  And Ian has built a 
GRDDL-aware harvester,  Embedded RDF Extractor, [3] that I can use to 
test my pages.


Now, I have built a page, and it works!
http://purl.nla.gov.au/net/jod/tutorial/naked-metadata.html
If anyone would like to have a look at it, I would appreciate feedback. 
 Have I got it right?  Are there things that I could be doing better?


** XHTML2 **

And Misha pointed out that XHTML2 [4] deals with this very nicely.
In XHTML2, meta elements can appear in the body of the document, not
just the head, and any element can link to them.


So, once again, thanks everybody.  The Internet continues to blow my 
mind!


** References **

[1]  RDF in HTML: 
http://research.talis.com/2005/erdf/wiki/Main/RdfInHtml
[2] Gleaning Resource Descriptions from Dialects of Languages (GRDDL): 
http://www.w3.org/2004/01/rdxh/spec

[3] Embedded RDF Extractor: http://research.talis.com/2005/erdf/extract
[4] eXtensible HyperText Markup Language 2 (XHTML2): 
http://www.w3.org/TR/xhtml2


--
Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829




[WSG] Naked metadata

2005-11-05 Thread Jonathan O'Donnell

Hi WSG'ers

After seeing Sarah's post about CSS for titles, I thought that people 
might be interested in this idea.  It's a half baked idea. If you have 
any comments or suggestions, I would love to hear them.


Apologies to those who have already seen this on the DC-General list.

** The problem **
People updating Web pages often don't update the metadata in the
header.


** The solution **
Tag appropriate Web data with id attributes. Point to the data from the 
appropriate metadata field in the header.


** Example **
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Naked Metadata</title>
<meta name="DC.title" content="#title" />
<meta name="DC.creator" content="#creator" />
<meta name="DC.rights" content="#rights" />
</head>
<body>
<h1 id="title">Naked Metadata</h1>
<h2 id="creator">Jonathan O'Donnell</h2>
<p id="rights">http://purl.nla.gov.au/net/jod/tutorial/metadata.html
&copy; Jonathan O'Donnell 23 October 2005</p>

</body>
</html>

** Background **

At DC-ANZ 2005, Eve Young and Baden Hughes made the point that people 
updating Web pages often don't update the metadata. One of the
problems that they talked about was that metadata in the header is 
essentially invisible to people editing the page (when, for example, 
using some wysiwyg editors).


In general, data (including metadata) should be stored in one place 
only. This prevents drift: if it is only stored in one place, it can 
only be updated in that place.


Often, the information that we want to store as metadata already 
appears in the Web page.  Examples include the title, description 
(especially as opening paragraph) and the author's name.  In footers, 
we often find rights information, the URL, and date information.


If this information already exists in the data, and we replicate it in 
the metadata, there is the danger of drift. Perhaps pointing to the 
data from the metadata fields is a way of preventing drift, and 
ensuring that the metadata is as up-to-date as the data.


** Method **

In html (including xhtml), one way of doing this is to use id 
attributes. Many Web developers use these already to style particular 
aspects of a Web site.  They can also be used as a target anchor for 
hypertext links.

For example, if you use this tag:
<p id="rights">&copy; Jonathan 2005</p>
in the page:
http://example.net/foo.html
Then the URL
http://example.net/foo.html#rights
will point to that paragraph.
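
As a rough illustration of how a bare `#rights` reference in a metadata field relates to the full URL, here is a minimal sketch using Python's standard `urllib.parse`. It only shows URL resolution; it does not imply that any metadata harvester follows such references.

```python
from urllib.parse import urldefrag, urljoin

page = "http://example.net/foo.html"

# A bare fragment reference such as "#rights" resolves against the page URL.
target = urljoin(page, "#rights")

# Splitting the result gives back the page and the id being pointed at.
base, fragment = urldefrag(target)
```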

** Advantages **
+   Metadata sits with the data.
+   As data is updated, the metadata continues to be current.

** Disadvantages **
+   id attributes must be unique within a Web page.

--
Jonathan O'Donnell
mailto:[EMAIL PROTECTED]
http://purl.nla.gov.au/net/jod
+61 4 2575 5829




Re: [WSG] Naked metadata

2005-11-05 Thread Patrick H. Lauke

Jonathan O'Donnell wrote:


> <meta name="DC.title" content="#title" />
> <meta name="DC.creator" content="#creator" />
> <meta name="DC.rights" content="#rights" />


I'm not a DC expert, but I believe that that's not a valid way to go 
about it. When you embed DC information via meta elements, content needs 
to contain the actual value, not a reference to another location.
When the expected value itself is a URI, you should use link 
elements...but even that doesn't apply in this scenario, as it's usually 
reserved for things like DC.relation.


Have a read through http://dublincore.org/documents/dcq-html/

Now, a different approach may be to process all pages server side on a 
regular basis to fill in the correct DC.title etc meta elements based on 
the content of the actual page in case they've been left empty, which 
could probably be achieved with a single XSL transformation or similar.
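
A minimal sketch of that server-side idea, written in Python rather than the XSL transformation suggested above. The function name `fill_empty_title`, the choice of the first h1 as the source of the title, and the exact attribute order matched by the pattern are all assumptions made for illustration.

```python
import re
from html import escape
from html.parser import HTMLParser


class FirstHeading(HTMLParser):
    """Capture the text of the first h1 element in a page."""

    def __init__(self):
        super().__init__()
        self.text = None
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and self.text is None:
            self._inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "h1":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.text += data


# Matches a DC.title meta element whose content attribute is empty.
# Assumes name comes before content and both use double quotes.
EMPTY_TITLE = re.compile(r'(<meta\s+name="DC\.title"\s+content=")(")')


def fill_empty_title(page):
    """Copy the first h1 into an empty DC.title record; leave filled ones alone."""
    finder = FirstHeading()
    finder.feed(page)
    finder.close()
    if not finder.text:
        return page  # nothing to copy from
    value = escape(finder.text.strip(), quote=True)
    return EMPTY_TITLE.sub(lambda m: m.group(1) + value + m.group(2), page)
```

Records that already have a value are left untouched, so an author's deliberate entry is never overwritten by the batch job.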



--
Patrick H. Lauke
__
re·dux (adj.): brought back; returned. used postpositively
[latin : re-, re- + dux, leader; see duke.]
www.splintered.co.uk | www.photographia.co.uk
http://redux.deviantart.com
__
Web Standards Project (WaSP) Accessibility Task Force
http://webstandards.org/
__



Re: [WSG] Naked metadata

2005-11-05 Thread Andy Kirkwood|Motive
Hi Jonathan,

I second Patrick's comment that 'pointing' the DC records to content on the 
page is not the solution.

Although, from a maintenance perspective, this may appear to be a work-around 
for not completing the metadata records (questionable), metadata harvesting 
tools are unlikely to populate the content attribute of the meta element with 
content from the webpage. In other words, the metadata records cease to have 
value as metadata.

Consider educating content authors or moving to a CMS. For example, in a CMS, a
rule could be created that requires a minimum set of metadata records to be
completed before content can be published.
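
Such a rule might look something like the sketch below. The required set, the function names and the dictionary representation of a page's records are hypothetical, not taken from any particular CMS.

```python
# Hypothetical minimum set of records a page needs before publishing.
REQUIRED = ("DC.title", "DC.creator")


def missing_records(metadata, required=REQUIRED):
    """Return the required record names that are absent or blank."""
    return [name for name in required if not (metadata.get(name) or "").strip()]


def can_publish(metadata):
    """A page may be published only when no required record is missing."""
    return not missing_records(metadata)
```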

If using a static system, then adding a placeholder for metadata content to 
template pages may be a solution, e.g.
   <meta name="DC.title" content="[tba]" />
   <meta name="DC.creator" content="[tba]" />

(The author would then search the source code for the string '[tba]' as part of 
the publishing protocol to remind them to complete the md records.)
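
The search for '[tba]' could also be scripted. The following is only a sketch using Python's standard `html.parser`; the names `PlaceholderFinder` and `unfinished_records` are made up for the example.

```python
from html.parser import HTMLParser

PLACEHOLDER = "[tba]"


class PlaceholderFinder(HTMLParser):
    """Collect the names of meta elements still holding the placeholder."""

    def __init__(self):
        super().__init__()
        self.pending = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        fields = dict(attrs)
        if fields.get("content") == PLACEHOLDER:
            self.pending.append(fields.get("name"))


def unfinished_records(source):
    """Return the names of metadata records that still need completing."""
    finder = PlaceholderFinder()
    finder.feed(source)
    finder.close()
    return finder.pending
```

An empty list would mean every placeholder in the template has been replaced.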


> ** The problem **
> People updating Web pages often don't update the metadata in the header.
>
> ** The solution **
> Tag appropriate Web data with id attributes. Point to the data from the
> appropriate metadata field in the header.

Best regards,

-- 
Andy Kirkwood | Creative Director

Motive | web.design.integrity
http://www.motive.co.nz
ph: (04) 3 800 800  fx: (04) 970 9693
mob: 021 369 693
93 Rintoul St, Newtown
PO Box 7150, Wellington South, New Zealand



Re: [WSG] Naked metadata

2005-11-05 Thread Geoff Deering

Jonathan O'Donnell wrote:


> Hi WSG'ers
>
> In general, data (including metadata) should be stored in one place
> only. This prevents drift: if it is only stored in one place, it can
> only be updated in that place.
>
> Often, the information that we want to store as metadata already
> appears in the Web page.  Examples include the title, description
> (especially as opening paragraph) and the author's name.  In footers,
> we often find rights information, the URL, and date information.
>
> If this information already exists in the data, and we replicate it in
> the metadata, there is the danger of drift. Perhaps pointing to the
> data from the metadata fields is a way of preventing drift, and
> ensuring that the metadata is as up-to-date as the data.
>
> ** Method **



Hi Jonathan,

Given what you have said here, and what I would expect to see in serious 
authoring tools and CMSs, I think this area is generally neglected in 
most publishing tools (last time I looked).


Quite a few CMSs say that they are DC compliant, but as you mentioned,
do they actually store the data in one place, and not in the web pages?  
Is it part of the work flow and version control of the documents?  I 
don't think so.  I'd be glad if anyone can point me to a product that 
does address this need.


For a CMS to address this properly, it needs to have incorporated a
normalised schema based on DC into its database.  This way all the
pages published from this system can incorporate the various metadata as
well as alt and longdesc for images.
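
To make the idea concrete, here is a very small sketch of what storing DC records once and publishing them into pages could look like, using Python's built-in `sqlite3`. The table names, columns and the `meta_elements` helper are invented for illustration and are far simpler than a real CMS schema.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE dc_record (
    document_id INTEGER NOT NULL REFERENCES document(id),
    element     TEXT NOT NULL,   -- e.g. 'title', 'creator'
    value       TEXT NOT NULL
);
""")

# Each value is stored once, in the database, not in the page itself.
doc_id = conn.execute(
    "INSERT INTO document (url) VALUES (?)", ("http://example.net/foo.html",)
).lastrowid
conn.executemany(
    "INSERT INTO dc_record VALUES (?, ?, ?)",
    [(doc_id, "title", "Naked Metadata"), (doc_id, "creator", "Jonathan O'Donnell")],
)


def meta_elements(conn, url):
    """Render the stored records for a page as meta elements at publish time."""
    rows = conn.execute(
        "SELECT r.element, r.value FROM dc_record r "
        "JOIN document d ON d.id = r.document_id "
        "WHERE d.url = ? ORDER BY r.rowid",
        (url,),
    ).fetchall()
    return [
        f'<meta name="DC.{element}" content="{escape(value, quote=True)}" />'
        for element, value in rows
    ]
```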


Many organisations have legal requirements where they require snapshots
of published data from any given time.  A publishing system based on DC
not only allows this feature, but also allows a complete analysis of all the
subcomponents of a document and the various contributors.


That also leads to problems with document management systems that manage
their metadata from properties within the documents and network
environment variables.


Last time I tried to extract metadata from MS Word, using Perl and 
Python, I could only get the standard set of properties, any data in 
custom properties was unretrievable (at least by me).  I don't know what 
OO or the latest MS Office offers.


But I don't think asking users to maintain this data will work, unless 
they are librarians.  I think that it has to be as automated and as 
transparent to the user as possible, because most users are just not 
interested in this level of site QA, unless it is an important component 
of the job.


Regards
Geoff Deering