Re: [Wikidata-l] DBpedia usage in the bbc - selected highlights

2012-07-05 Thread Yury Katkov
Re: [Wikidata-l] DBpedia usage in the bbc - selected highlights
___
Wikidata-l mailing list
Wikidata-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-l


Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-05 Thread Lin Clark
On Thu, Jul 5, 2012 at 12:08 AM, Gregor Hagedorn g.m.haged...@gmail.com wrote:


 In my observation, numeric-URI-based systems like Drupal tend to have
 minimal links inside their content pages (i.e. beyond the menu
 system), while MediaWiki-based systems tend to have hundreds of links
 inside their content. I believe this is so because links inside Drupal
 pages usually point to something like http://drupal.org/node/21947/ which
 makes it impossible for humans to easily check whether this is an
 intentional or erroneous link.


This is off-topic, but for Drupal this is a configuration issue. One of the
early lessons in books and tutorial series is how to configure this, and
many Drupal sites are configured to use human-readable paths. Drupal.org is
not because it has millions of nodes which often change names.

You are correct that most Drupal sites have fewer internal links than
wikis, but I think that holds for Drupal sites that are configured to use
human-readable paths as well. The cause is more likely a different
interface issue.

I don't mean to spin this out into a tangent about Drupal, just wanted to
point out that correlation doesn't imply causation in this case.

-Lin

-- 
Lin Clark
Drupal Consultant

lin-clark.com
twitter.com/linclark


Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-05 Thread Gregor Hagedorn
 I don't mean to spin this out into a tangent about Drupal.

Me neither. My discussion point here is: there are advantages to both
opaque (like http:something.org/node123456) and nonopaque
(http:something.org/Bonn,_Northrhine-Westfalia,_Germany) URI/IRI
identifiers.

In the light of the use-case of interlinking discussed here: which is
right for Wikidata? Does Wikidata need both in parallel (I believe
this is the current plan)?

Gregor



Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-05 Thread Denny Vrandečić
Yes, we are planning to do both in parallel, as this page explains:

https://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme

Cheers,
Denny

2012/7/5 Gregor Hagedorn g.m.haged...@gmail.com:
 I don't mean to spin this out into a tangent about Drupal.

 Me neither, my discussion point here is: There are advantages for
 opaque (like http:something.org/node123456) and nonopaque
 (http:something.org/Bonn,_Northrhine-Westfalia,_Germany) URI/IRI
 identifiers.

 In the light of the use-case of interlinking discussed here: which is
 right for Wikidata? Does Wikidata need both in parallel (I believe
 this is the current plan)?

 Gregor




-- 
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Eingetragen im Vereinsregister des Amtsgerichts Berlin-Charlottenburg
unter der Nummer 23855 B. Als gemeinnützig anerkannt durch das
Finanzamt für Körperschaften I Berlin, Steuernummer 27/681/51985.



Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-04 Thread Denny Vrandečić
Hello Michael,

thank you for your input, this is extremely valuable.

In general I expect that Wikidata will serve your needs better than an
extraction from Wikipedia could. First, yes, we will have more stable
identifiers. Second, it should be better at identifying items of
interest. Some of the reasons why several meanings are conflated into
one article or spread over several articles in Wikipedia is that it
simply makes sense for a text encyclopedia. I don't see a reason for
Wikidata doing the same.

I do not expect Wikidata to solve all problems. In some glorious
future, Wikidata will have a community. This community will decide on
criteria for inclusion, both with regards to the coverage of items and
with regards to what they are saying about them. The community will
decide on the kind of sources they accept. Etc.

(Actually, decide is too nice a word for the process I expect will unfold... )

We will keep the problems you mentioned in mind, and I fully expect
that we will improve on every single one of them.

2012/7/3 Michael Smethurst michael.smethu...@bbc.co.uk:

 So I think we'd be interested in wikidata for 2 (maybe 3) reasons:
 1. as a source of data for domains where there's no established (open)
 authority (eg the equivalent of musicbrainz for films)
 2. as a better, more stable source of identifiers to triangulate to other
 data sources

Yes, I expect that both use cases will be covered by Wikidata.

 ?3?. Possibly as a place to contribute some of our data (eg we're
 donating our classical music data to musicbrainz; there may be data we have
 that would be useful to wikidata)

It will be up to the community to accept data donations -- the
development team does not speak for the community. Personally I would
be thrilled to see such donations happen. See also:

http://meta.wikimedia.org/wiki/Wikidata/FAQ#I_have_a_lot_of_data_to_contribute._How_can_I_do_that.3F

 Have glanced quickly at the proposed wikidata uri scheme
 (http://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme#Proposal_for_Wikidata)
 and
 snip
 http://{site}.wikidata.org/item/{Title} is a semi-persistent convenience URI
 for the item about the article Title on the selected site
 Semi-persistent refers to the fact that Wikipedia titles can change over
 time, although this happens rarely
 /snip
 Not sure on the definition of infrequently but I know it's caused us
 problems.

Fully agree. But they make for nice looking URIs. The canonical URI
though is the ID-based one, and these are stable. The pretty ones are
for convenience only. I will take a look at the note to see if this
needs to be made more explicit.
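
As an illustration of the split between canonical and convenience URIs
described above, here is a minimal Python sketch. It assumes the two URI
patterns quoted from the proposal; the lookup table, the item number 1234,
and all function names are hypothetical and not part of Wikidata.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import quote

# Hypothetical lookup table: (site, article title) -> numeric item id.
# The id 1234 is made up for this example.
TITLE_TO_ITEM: Dict[Tuple[str, str], int] = {
    ("en", "Fox_News_Channel"): 1234,
}


def canonical_uri(item_id: int) -> str:
    """ID-based URI, following the pattern http://wikidata.org/id/Q{id}."""
    return f"http://wikidata.org/id/Q{item_id}"


def convenience_uri(site: str, title: str) -> str:
    """Title-based URI, following http://{site}.wikidata.org/item/{Title}."""
    return f"http://{site}.wikidata.org/item/{quote(title)}"


def resolve(site: str, title: str) -> Optional[str]:
    """Map a (site, title) pair to the canonical URI, or None if unknown."""
    item_id = TITLE_TO_ITEM.get((site, title))
    return None if item_id is None else canonical_uri(item_id)


def rename(site: str, old_title: str, new_title: str) -> None:
    """Simulate an article rename: the title key moves, the item id does not."""
    TITLE_TO_ITEM[(site, new_title)] = TITLE_TO_ITEM.pop((site, old_title))
```

In this sketch a rename only changes which title resolves to the item; the
ID-based URI is unchanged, which is why it is the safer one to store.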

 Wondering if the id in http://wikidata.org/id/Q{id} is the wikipedia row ID
 (as used by dbpedialite)? Also wondering why there's a different set of URIs
 for machine-readable access rather than just using content negotiation?

No it is not. There is no such thing as the wikipedia row ID, what
you mean is the page ID on the English Wikipedia. As there are
plenty of items that have articles only in Wikipedia other than
English, a reliance on the English Page ID would be problematic. We
introduce new IDs for Wikidata, but we will provide mappings to page
IDs in the different Wikipedia language editions.
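
A rough sketch of what such a mapping could look like, in Python. The site
codes, page IDs, and item IDs below are invented for illustration, and the
function names are hypothetical; nothing here reflects an actual Wikidata
interface.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping rows: (wiki site, page id on that wiki) -> item id.
# Page ids are only unique within one language edition, so the site is
# part of the key.
PAGE_TO_ITEM: Dict[Tuple[str, int], str] = {
    ("enwiki", 1001): "Q500",
    ("dewiki", 2002): "Q500",
    ("dewiki", 2003): "Q501",  # an item with no English article
}


def item_for_page(site: str, page_id: int) -> Optional[str]:
    """Look up the item for a page id on a given wiki, or None if unmapped."""
    return PAGE_TO_ITEM.get((site, page_id))


def pages_for_item(item_id: str) -> List[Tuple[str, int]]:
    """List every (site, page id) pair mapped to the given item, sorted."""
    return sorted(key for key, value in PAGE_TO_ITEM.items() if value == item_id)
```

Keying on the site as well as the page ID is the point of the example: an
item that exists only in a non-English edition still gets an entry.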

Thank you again for your input, and I hope the answers help.

Cheers,
Denny



Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-04 Thread Michael Smethurst



On 04/07/2012 10:48, Denny Vrandečić denny.vrande...@wikimedia.de wrote:

 Hello Michael,
 
 thank you for your input, this is extremely valuable.
 
 In general I expect that Wikidata will serve your needs better than an
 extraction from Wikipedia could. First, yes, we will have more stable
 identifiers. Second, it should be better at identifying items of
 interest. Some of the reasons why several meanings are conflated into
 one article or spread over several articles in Wikipedia is that it
 simply makes sense for a text encyclopedia. I don't see a reason for
 Wikidata doing the same.
 
 I do not expect Wikidata to solve all problems. In some glorious
 future, Wikidata will have a community. This community will decide on
 criteria for inclusion, both with regards to the coverage of items and
 with regards to what they are saying about them. The community will
 decide on the kind of sources they accept. Etc.
 
 (Actually, decide is too nice a word for the process I expect will unfold...
 )
 
 We will keep the problems you mentioned in mind, and I fully think
 that we will improve on every single one of them.

Look forward to seeing it unfold :-)
 
 2012/7/3 Michael Smethurst michael.smethu...@bbc.co.uk:
 
 So I think we'd be interested in wikidata for 2 (maybe 3) reasons:
 1. as a source of data for domains where there's no established (open)
 authority (eg the equivalent of musicbrainz for films)
 2. as a better, more stable source of identifiers to triangulate to other
 data sources
 
 Yes, I expect that both use cases will be covered by Wikidata.
 
 ?3?. Possibly as a place to contribute some of our data (eg we're
 donating our classical music data to musicbrainz; there may be data we have
 that would be useful to wikidata)
 
 It will be up to the community to accept data donations -- the
 development team does not speak for the community.

Yes, that goes for musicbrainz too. We can offer data but it's up to the
community whether or not they accept it

 Personally I would
 be thrilled to see such donations happen. See also:
 
 http://meta.wikimedia.org/wiki/Wikidata/FAQ#I_have_a_lot_of_data_to_contribute._How_can_I_do_that.3F
 
 Have glanced quickly at the proposed wikidata uri scheme
 (http://meta.wikimedia.org/wiki/Wikidata/Notes/URI_scheme#Proposal_for_Wikidata)
 and
 snip
 http://{site}.wikidata.org/item/{Title} is a semi-persistent convenience URI
 for the item about the article Title on the selected site
 Semi-persistent refers to the fact that Wikipedia titles can change over
 time, although this happens rarely
 /snip
 Not sure on the definition of infrequently but I know it's caused us
 problems.
 
 Fully agree. But they make for nice looking URIs.

Aesthetic concerns about uris tend to make me shiver :-)

 The canonical URI
 though is the ID-based one, and these are stable. The pretty ones are
 for convenience only. I will take a look at the note to see if this
 needs to be made more explicit.

Think it is explicit. Just that there are so many flavours of URI knocking
about that it feels a bit confusing. The separation of the human readable
and the machine readable feels like it's following the dbpedia design
pattern and conflating the NIR -> IR step with the content negotiation,
which feels (to me) like a mistake.

Have talked about this in the past on the LOD list so to save typing:
http://lists.w3.org/Archives/Public/public-lod/2012Mar/0337.html

Not sure putting /data in a URI is ever a good idea. Shouldn't whether you
want data or not be decided by your Accept headers? Same for ?format=json
etc.

For reference, we use hash URIs for things but only reference those in RDF
and never link to them. One information resource URI gets exposed in links /
the browser bar and does content negotiation for format (and eventually
language), and the response comes with a Content-Location header of the IR
URI dot the_format.
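
To make the pattern above concrete, here is a simplified Python sketch of
format negotiation on a single information resource URI. It is not the BBC
implementation: the supported formats, the file extensions, and the function
names are assumptions, and the Accept parsing ignores partial wildcards such
as text/*.

```python
from typing import Dict, List, Optional, Tuple

# Formats this hypothetical server can produce: media type -> URI extension.
SUPPORTED: Dict[str, str] = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
    "application/json": "json",
}


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Return (media_type, q) pairs, highest q first; ties keep header order."""
    ranges = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not acceptable"
        ranges.append((media_type, q))
    # sorted() is stable, so equal q values stay in the order the client sent.
    return sorted(ranges, key=lambda item: item[1], reverse=True)


def negotiate(ir_uri: str, accept_header: str) -> Optional[Dict[str, str]]:
    """Pick a format for ir_uri and return the response headers, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type == "*/*":
            media_type = "text/html"  # assumed default representation
        if media_type in SUPPORTED:
            return {
                "Content-Type": media_type,
                "Content-Location": f"{ir_uri}.{SUPPORTED[media_type]}",
            }
    return None  # a real server would answer 406 Not Acceptable here
```

The client always requests the same URI; only the Accept header changes, and
the Content-Location header tells it which format-specific URI was served.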



 
 Wondering if the id in http://wikidata.org/id/Q{id} is the wikipedia row ID
 (as used by dbpedialite)? Also wondering why there's a different set of URIs
 for machine-readable access rather than just using content negotiation?
 
 No it is not. There is no such thing as the wikipedia row ID, what
 you mean is the page ID on the English Wikipedia.

Ah, ok. Think someone once said that was the id of the underlying database
row of the page record. Looking at dbpedialite it seems it does only support
en.wikipedia

 As there are
 plenty of items that have articles only in Wikipedia other than
 English, a reliance on the English Page ID would be problematic. We
 introduce new IDs for Wikidata, but we will provide mappings to page
 IDs in the different Wikipedia language editions.

Cool. Those mappings would be very useful for us. We're using Wikiminer (
https://secure.wikimedia.org/wikipedia/meta/wiki/WikiMiner) for entity
extraction on archive media which also returns the page ID so some systems
only know that ID. Be good to be able to query wikidata by it
 
 Thank you again for your 

Re: [Wikidata-l] DBpedia usage in the bbc

2012-07-03 Thread Andy Mabbett
On 3 July 2012 19:19, Tom Morris tfmor...@gmail.com wrote:

 A few notes on the BBC's use of DBpedia which Dan thought might be of
 interest to this list:

 It's great to see real world use cases to inform the development
 priorities of Wikidata.

Amen to that.


 === some problems we've found when using dbpedia ===

 1. it's not really intended for use for data extraction. The semantics of
 extraction depend on the infobox data and this isn't always applied
 correctly. So http://en.wikipedia.org/wiki/Fox_News_Channel and
 http://en.wikipedia.org/wiki/Fox_News_Channel_controversies share the same
 main infobox meaning dbpedia sees them both as tv channels

 This is partly (mostly) a social problem which Wikidata will need to
 solve at the community level, rather than through technical means.

Indeed; and if we can better explain these issues to the community we
might be more successful in persuading the blockers that such
matters are important.
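
The infobox problem quoted above can be shown with a toy Python sketch. This
is not DBpedia's extraction code: the infobox name, the class name, and the
function are hypothetical, chosen only to show how typing by infobox gives
two different kinds of article the same class.

```python
from typing import Dict, Optional

# Hypothetical mapping from infobox template name to an extracted class.
INFOBOX_TO_CLASS: Dict[str, str] = {
    "Infobox television channel": "TelevisionChannel",
}

# Both articles carry the same main infobox, as described in the thread.
ARTICLE_INFOBOX: Dict[str, str] = {
    "Fox_News_Channel": "Infobox television channel",
    "Fox_News_Channel_controversies": "Infobox television channel",
}


def extracted_class(title: str) -> Optional[str]:
    """Type an article purely by its main infobox, or None if it has none."""
    infobox = ARTICLE_INFOBOX.get(title)
    if infobox is None:
        return None
    return INFOBOX_TO_CLASS.get(infobox)
```

Under this rule both titles come out as the same class, even though only one
of them is about a TV channel.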

Of course, there will always be some Luddites who see Wikipedia as a
prose encyclopedia rather than the database of encyclopedic content
which it really is ;-)

-- 
Andy Mabbett
@pigsonthewing
http://pigsonthewing.org.uk
