>>>If we wish to communicate that level of semantics, yes.  It may not be 
>>>useful to us.  If you *really* need some metadata/semantics, @class probably 
>>>can't convey it with enough granularity.  Check out the big discussion from 
>>>a few months ago about ccRel and RDFa.
 

Not yet maybe, but we could at least try to keep options open for the future.


>>Second: Suppose I want to collect all copyright notices from 1000 websites 
>>(don't ask me why, I just want to), how am I to do this when they are marked 
>>up in <small>s? I will definitely end up with a lot of text that has nothing 
>>to do with copyrights (and probably miss a lot of copyright notices as they 
>>are marked up differently), whereas if they were marked up in (for example) 
>><span class="copyright"> I could retrieve it all based on the class name.

>>>That would be a wonderful perfect world.  I'd like the copyright date as 
>>>well, so I can retrieve only things copyrighted in the last ten years.  
>>>Assuming that metadata will exist is a fool's errand.  The fact is that if 
>>>you are searching for copyright notices, the most efficient way is likely to 
>>>just search for the string "copyright" and the (c) symbol.  That'll net you 
>>>copyright notices with a high accuracy, and some training on real data can 
>>>yield further rules to improve the data-mining accuracy.

You say it yourself: only in a perfect world where all websites were written 
in the same language would your "solution" work. Unfortunately, I would miss 
out on all the Chinese copyright notices.
But another example (based on "siemens"): wouldn't it be nice if I could tell 
Google I am looking for a person named "Siemens" so it would ignore the 
brand name?


>>>While we're hoping for copyright notices to be marked up as <span 
>>>class="copyright">, though, why not wish for <small class="copyright">?  If 
>>>you're going to be providing metadata, it works the same.  Is it that you 
>>>believe people won't provide a special class for copyrights if the <small> 
>>>tag already gives them the preferred display?  Do you believe that everyone 
>>>will automatically use class="copyright" to mark up their copyright notices? 
>>> What if they use class="copyright-notice"?  Or class="license"?  Or any of 
>>>a million other distinct possibilities that would destroy any naive attempt 
>>>to datamine based on a particular class name?


Well, that would have to be defined in the standard, wouldn't it? Again, I'm 
not saying it should be defined NOW, but at least leave the door open.
I have no problem with using small over span; neither one is correct in this 
context, as far as I can see. Whether to use "copyright" rather than "license" 
or "copyright-notice" would have to be defined somewhere, either in the 
standard or in an externally maintained document that is accepted as "best 
practice" or "standards related".
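
To make the difference concrete, here is a rough sketch in Python (standard 
library only) that puts the two approaches side by side. Everything in it is 
an assumption made for illustration: the class name "copyright" is not defined 
anywhere today, and the sample markup and the helper names (by_class, 
by_keyword) are made up.

```python
import html
import re
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collect the text of every element that carries a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.tag = None     # tag name of the matching element
        self.parts = []     # text fragments of the current match
        self.found = []     # finished matches

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth == 0 and self.class_name in classes:
            self.depth, self.tag, self.parts = 1, tag, []
        elif self.depth and tag == self.tag:
            self.depth += 1  # nested element with the same tag name

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.found.append("".join(self.parts).strip())

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def by_class(markup, class_name="copyright"):
    """Class-based retrieval; works with <span>, <small> or any other tag."""
    collector = ClassTextCollector(class_name)
    collector.feed(markup)
    collector.close()
    return collector.found


# Hypothetical keyword rule: the word "copyright", "(c)" or the (c) symbol.
KEYWORDS = re.compile(r"copyright|\(c\)|©", re.IGNORECASE)


def by_keyword(markup):
    """String search on the visible text of each line (tags stripped)."""
    hits = []
    for line in markup.splitlines():
        text = html.unescape(re.sub(r"<[^>]+>", "", line)).strip()
        if text and KEYWORDS.search(text):
            hits.append(text)
    return hits


# Made-up sample page: one English notice, one Chinese notice, two decoys.
SAMPLE = """
<p>Read our article on copyright law reform.</p>
<span class="copyright">&copy; 2008 Example Corp.</span>
<small class="copyright">版权所有 2008 示例公司</small>
<small>Prices exclude VAT.</small>
"""

print(by_class(SAMPLE))
print(by_keyword(SAMPLE))
```

On this sample the class-based lookup returns both notices, including the 
Chinese one, while the keyword search picks up the unrelated article sentence 
and misses the Chinese notice. Of course this only works if authors agree on 
the class name, which is exactly why it would have to be defined somewhere.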

PS: I find it very difficult to respond to rich-text/HTML messages, as they 
seriously mess up the indentation. Sorry, therefore, if this message is unclear 
because the original message and the reply are mixed up.
