Jiri,
Thanks for the feedback.
On 1 Feb 2009, at 21:03, Jiri Prochazka wrote:
In the article I haven't found a solid definition of what a dataset is
and when to use another dataset/subset. I think this has to be clearly
defined.
“A dataset in voiD (void:Dataset) is a collection of data, which is:
- published and maintained by a single provider, and
- available as RDF, and
- accessible, for example, through dereferenceable HTTP URIs or a
SPARQL endpoint.”
I think this is as clear as possible without becoming overly
constraining.
From what I understood, the publisher is the "primary key" of
datasets.
It's three criteria, not just the publisher; see above.
I think it should be emphasized that dataset categorization should
only be used if the data in it are fairly homogeneous, so that the
categorization applies to all of it.
Categorization is an art that is way older than voiD, and we don't
want to tell people how to do it properly! And I definitely don't agree
with you when you say that “a categorization must apply to all of the
dataset”. For example, I think it would be absolutely adequate to say
that DBpedia is about people and geography, because it is a sizable
and valuable resource for both those areas, even though it also
contains data about lots of other things.
I guess categorization is fairly unusable in use cases like a
personal website, because the information there is so varied...
Well, http://dbpedia.org/resource/Personal_web_page might be a nice
subject here. (Assuming that you do have some interesting RDF on your
site!)
(I note with regret that the Wikipedia article on “Random stuff” has
been deleted; it would have made another nice DBpedia resource...)
Another thing: dataset partitioning. The combination of dataset
categorization and partitioning led me to great confusion. I have
thought voiD also wanted to categorize the data in the dataset.
It would be better to add a notice that partitioning should be used
carefully and that it was designed for mirroring of datasets.
I don't understand. “I have thought voiD also wanted to categorize
the data in the dataset” -- yes, that IS what we want. “partitioning
was designed for mirroring of datasets” -- no, it was designed for
cases where voiD authors want to say something about just a part of
the dataset, and not about the entire dataset, for whatever reason.
Best,
Richard
Best regards,
Jiri Prochazka
PS: Please also send replies directly to me, as I am not subscribed
to this list.