I agree they are extremely useful for many scenarios already. Earlier today I 
sorted the human proteins category by popularity, and as I read the articles 
for the most popular ones that I didn't know, I felt like I was browsing the 
table of contents of a live molecular biology book that was more comprehensive 
than any existing book in print.

I do think we are on track for undeniable improvements, though. Arnold 
Schwarzenegger is in about 40 categories right now. 
His Wikidata item has about 20 statements. Eventually, at least all of the 
information you can glean from those categories will be contained in the 
statements on Wikidata. Then we could update the pages so that the links at the 
bottom aren't to relevant categories, but are to relevant queries. At first, it 
would look sort of the same. You can click on the 20th-century American actors 
category now, and you could click on the 20th-century American actors query in 
the future. But when you get to the query page you can easily specialize or 
generalize the query with another click in many more directions than are 
currently supported in the category system. Right now, I can specialize the 
pages I see by going to the subcategory for American silent film actors. I can 
generalize the pages I see by going to a supercategory that drops the American 
requirement, the actor requirement, or the 20th-century requirement. But if 
your first click away from the article takes you to a query page instead of a 
category, you now have many more options. For example, 
you could delete the 20th-century requirement and add a politician requirement 
to the actor requirement. Then you are looking at Americans who are both actors 
and politicians, which you can't do in the category system.
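
To make the specialize/generalize idea concrete, here is a toy sketch in 
Python. It is only an illustration under my own assumptions, not how Wikidata 
queries actually work: the property names, the two "Hypothetical" people, and 
the helper functions run, specialize and generalize are all made up. A query 
is just a set of (property, value) requirements, so one click adds or drops a 
requirement.

```python
# Toy model: each item maps a property to the set of values it holds.
# Property names and the "Hypothetical" people are invented for illustration.
ITEMS = {
    "Arnold Schwarzenegger": {
        "nationality": {"American", "Austrian"},
        "occupation": {"actor", "politician", "bodybuilder"},
        "century": {"20th"},
    },
    "Hypothetical Person A": {
        "nationality": {"American"},
        "occupation": {"actor"},
        "century": {"20th"},
    },
    "Hypothetical Person B": {
        "nationality": {"British"},
        "occupation": {"actor", "politician"},
        "century": {"20th"},
    },
}


def run(query, items=ITEMS):
    """Return the names of items that satisfy every (property, value) pair."""
    return sorted(
        name
        for name, props in items.items()
        if all(value in props.get(prop, set()) for prop, value in query)
    )


def specialize(query, prop, value):
    """Narrow the query by adding one requirement."""
    return query | {(prop, value)}


def generalize(query, prop, value):
    """Broaden the query by dropping one requirement."""
    return query - {(prop, value)}


# The "20th-century American actors" query that would replace the category link.
start = frozenset({
    ("century", "20th"),
    ("nationality", "American"),
    ("occupation", "actor"),
})

# Drop the century requirement, then add a politician requirement.
step = specialize(generalize(start, "century", "20th"), "occupation", "politician")

print(run(start))
print(run(step))
```

The point of the sketch is that any requirement can be added or removed 
independently, whereas a category tree only offers the sub- and 
supercategories that someone has already created.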

> From: [email protected]
> To: [email protected]
> Date: Mon, 6 May 2013 18:08:04 +0000
> Subject: Re: [Wikidata-l] Question about wikipedia categories.
> 
>     From my viewpoint,  biases are an issue of statistical sampling.
> 
>     Wikipedia is an encyclopedia by humans for humans so of course it has an 
> anthropocentric background,  in which the mass of all the concepts swirling 
> around the Earth like an atmosphere curves the graph,  keeping the Sun in 
> orbit around our world.
> 
>     I find Wikipedia categories useful today,  warts and all.  They've got 
> two things going for them:
> 
> (1) Class and out-of-class dichotomies are the atom of ontology. 
> Well-designed categories have an operational definition that allows class 
> members to be determined with practically perfect precision
> (2) They are densely populated.
> 
> Look at the categories on this guy's web page
> 
> http://en.wikipedia.org/wiki/Arnold_Schwarzenegger
> 
> each one of those categories states a useful and correct fact,  even if the 
> organization of those facts is entirely haphazard.
> 
> For instance,  it would be better if he was coded as an "American" and an 
> "Austrian",  "Californian",  "Los Angelino" and he is also a "Bodybuilder" 
> and an "Actor" and a zillion other things and then infer that he was a 
> "American Bodybuilder",  "Austrian Actor" and such.  But it's not that easy 
> because he was an "Austrian soldier" but not an "American soldier" and I'd 
> feel uncomfortable calling him an "Austrian Politician".  A lot of nuance is 
> encoded in that sticky mess.
> 
> It's very easy to analyze those categories and produce desired concepts like 
> "Car" and "Bodybuilder" from junky categories like "Front-wheel drive 
> vehicle," "General Motors Concept Cars",  "Bodybuilder Actor" and "Actor 
> Bodybuilder",  in fact,  that's exactly what the semantic web is for.
> 
> There is so much rich and precise information in the categories that you get 
> great results despite sampling error caused by low recall in the categories.
> 
> I'd love to see better structure,  but not at the cost of fact density or 
> precision.
> 
> If we can take advantage of the knowledge in the graph to exert gentle 
> pressure that improves categorization in Wikipedia that would be great. 
> It's definitely time for the social industry to move beyond "tags".
> 
> 
> 
> 
> _______________________________________________
> Wikidata-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata-l
                                          