Re: [Dbpedia-discussion] Accuracy of coordinates in dbpedia/wikipedia & freebase

2010-09-29 Thread Roberto Mirizzi
  On 29/09/2010 10:10, Jens Lehmann wrote:
> You can check this by comparing the regular DBpedia and DBpedia Live:
>
> http://dbpedia.org/page/Oakville_Assembly
> http://dbpedia-live.openlinksw.com/page/Oakville_Assembly
>
> Indeed, the coordinates have changed, so there was probably an error in
> Wikipedia.

IMHO, comparing DBpedia with its live version is not the solution.
In addition to having to double-check each resource, a difference in
the live version does not necessarily mean that there was an error in
the previous version: the error could be in the live version.
I remember that some time ago I was looking at the DBpedia Live page
about the programming language PHP, and the abstract read: "PHP: Problèmes
d'Hygiène Personnelle". :-) Unfortunately, vandalism is quite widespread
in Wikipedia.
At least for now, there are probably better datasets than DBpedia for
accurate geolocation, as you pointed out with LinkedGeoData.

Besides the problem of geographical coordinates, as I mentioned in my
previous message, I think it is problematic to do "reasoning" over
resources that belong to two different datasets, are linked through
owl:sameAs, and carry inconsistent information. Reasoning on Linked Data
(I mean, across more than a single dataset) is probably not yet
completely mature.
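
To make the concern concrete, here is a minimal sketch (plain Python, no
RDF library) that merges resources linked by owl:sameAs and reports
properties that end up with more than one value. The identifiers and
coordinate values are hypothetical placeholders, and treating geo:lat as
single-valued is an assumption of this sketch: the vocabulary in use may
not declare it functional, in which case a reasoner would not flag a
formal inconsistency.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"


def find_conflicts(triples, single_valued_props):
    """Group subjects connected by owl:sameAs and return the properties
    (among single_valued_props) that carry more than one distinct value."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # First pass: merge every pair of resources linked by owl:sameAs.
    for s, p, o in triples:
        if p == SAME_AS:
            parent[find(s)] = find(o)

    # Second pass: collect values per merged group and property.
    values = defaultdict(set)
    for s, p, o in triples:
        if p in single_valued_props:
            values[(find(s), p)].add(o)

    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Hypothetical data: two descriptions of the same plant with different latitudes.
triples = [
    ("ex:Oakville_Assembly", SAME_AS, "dbpedia:Oakville_Assembly"),
    ("ex:Oakville_Assembly", "geo:lat", "43.5"),
    ("dbpedia:Oakville_Assembly", "geo:lat", "42.0"),
]

for (group, prop), vals in find_conflicts(triples, {"geo:lat"}).items():
    print(group, prop, sorted(vals))
```

The sketch only reports the clash; deciding which value is correct still
needs an external source or human review.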


-- 
Regards,

Roberto Mirizzi
PhD Student at Politecnico di Bari (Italy)
http://sisinflab.poliba.it/mirizzi

--
Start uncovering the many advantages of virtual appliances
and start using them to simplify application deployment and
accelerate your shift to cloud computing.
http://p.sf.net/sfu/novell-sfdev2dev
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Accuracy of coordinates in dbpedia/wikipedia & freebase

2010-09-29 Thread Jens Lehmann

Hello Paul,

On 27.09.2010 22:42, Paul Houle wrote:
> My guess is that
> wikipedia might have been wrong at one time,  and has had it corrected.

You can check this by comparing the regular DBpedia and DBpedia Live:

http://dbpedia.org/page/Oakville_Assembly
http://dbpedia-live.openlinksw.com/page/Oakville_Assembly

Indeed, the coordinates have changed, so there was probably an error in 
Wikipedia.
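
A minimal sketch of such a comparison, assuming the two coordinate pairs
have already been read from the two pages. The values below are
placeholders, not the actual Oakville Assembly coordinates, and the 1 km
tolerance is an arbitrary choice. A large distance only shows that the
two versions disagree, not which one is right.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def diverges(old, new, tolerance_km=1.0):
    """True when two (lat, lon) pairs are further apart than the tolerance."""
    return haversine_km(*old, *new) > tolerance_km


# Placeholder values standing in for the release and live coordinates.
release_point = (42.0, -81.0)
live_point = (43.5, -79.7)

print(diverges(release_point, live_point))
```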

>   From my viewpoint,  I'd like to make a map that doesn't have
> embarrassing errors in it...  What's the best way to clean up this mess?

Consider using LinkedGeoData [1]. As a member of both projects (DBpedia
and LinkedGeoData), I am quite sure that the latter has more accurate
coordinates. LinkedGeoData also has DBpedia links, although they need
to be (and will be) updated because of some changes in LinkedGeoData.

Note that for large objects (e.g. Russia and other countries) both
LinkedGeoData/OpenStreetMap and DBpedia/Wikipedia contain a reference
point that serves as a representative for the object. However, there is
no strict mathematical definition of how to compute it, which means that
reference points in LinkedGeoData and DBpedia do not necessarily coincide.
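
As an illustration of why such points can differ, the sketch below
computes three common candidates for the "reference point" of the same
polygon. It is not how either project derives its points; the L-shaped
polygon is made up, and coordinates are treated as planar x/y for
simplicity.

```python
def vertex_mean(polygon):
    """Arithmetic mean of the polygon's vertices."""
    n = len(polygon)
    return (sum(x for x, _ in polygon) / n, sum(y for _, y in polygon) / n)


def bbox_center(polygon):
    """Center of the axis-aligned bounding box."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return ((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2)


def area_centroid(polygon):
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    twice_area = cx = cy = 0.0
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return (cx / (3 * twice_area), cy / (3 * twice_area))


# Made-up L-shaped outline, vertices in counter-clockwise order.
l_shape = [(0, 0), (4, 0), (4, 1), (1, 1), (1, 4), (0, 4)]

print(vertex_mean(l_shape))
print(bbox_center(l_shape))
print(area_centroid(l_shape))
```

All three are defensible representatives, yet they give three different
points for the same shape.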

Kind regards,

Jens

[1] http://linkedgeodata.org

-- 
Dipl. Inf. Jens Lehmann
Department of Computer Science, University of Leipzig
Homepage: http://www.jens-lehmann.org
GPG Key: http://jens-lehmann.org/jens_lehmann.asc




Re: [Dbpedia-discussion] Accuracy of coordinates in dbpedia/wikipedia & freebase

2010-09-28 Thread Kingsley Idehen
  On 9/28/10 3:43 AM, Roberto Mirizzi wrote:
> On 28/09/2010 00:00, Kingsley Idehen wrote:
>> You have two data spaces: DBpedia and Freebase, you should make a third
>> -- yours, which I think you have via ookaboo.
>>
>> Place the fixed (cleansed) data  in your ookaboo data space, connect the
>> coreferenced entities using an "owl:sameAs" relation, scope queries that
>> are accuracy sensitive to your ookaboo data space.  Use inference rules
>> for union expansion across DBpedia and Freebase via "owl:sameAs", when
>> data quality requirements are low and data expanse requirements high.
>>
>> That's how you clean up the mess and potentially get compensated for
>> doing so, in the process :-)
>>
> I think I'm missing something with this approach. If you link, with
> owl:sameAs, a resource with e.g., the corresponding one in DBpedia, and
> the two resources have two different coordinates, it would lead to an
> inconsistency due to the semantics of owl:sameAs, wouldn't it?
> After all, overcoming this problem is one of the reasons for the Okkam
> Project [1].
> So, how do you plan to deal with this issue?
>
> [1] http://www.okkam.org/okkam-more

No, you have a third data space, and in that data space you fix the
coordinates. You simply use owl:sameAs to keep links to the other data
spaces, which provide access to a broader pool of data, modulo
questionable geo coordinates etc.

The owner of the third data space is providing specific value i.e., 
accurate coordinates.
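
A rough sketch of that arrangement, using plain Python dictionaries in
place of a real triple store. The "ex:" identifiers, the values, and the
function name are hypothetical; the point is only that accuracy-sensitive
lookups stay inside the curated data space, while broader lookups follow
the owl:sameAs links into the other data spaces.

```python
def values_for(subject, prop, curated, others, same_as, accuracy_sensitive=True):
    """Return values of prop for subject.

    accuracy_sensitive=True  -> answer only from the curated data space.
    accuracy_sensitive=False -> also include values from resources that
                                subject is linked to via owl:sameAs.
    """
    found = list(curated.get((subject, prop), []))
    if accuracy_sensitive:
        return found
    for alias in same_as.get(subject, []):
        for space in others.values():
            found.extend(space.get((alias, prop), []))
    return found


# Curated (third) data space holding the corrected coordinate (placeholder value).
curated = {("ex:Oakville_Assembly", "geo:lat"): ["43.5"]}

# Other data spaces: broader data, possibly with questionable coordinates.
others = {
    "dbpedia": {
        ("dbpedia:Oakville_Assembly", "geo:lat"): ["42.0"],
        ("dbpedia:Oakville_Assembly", "rdfs:label"): ["Oakville Assembly"],
    },
}

# owl:sameAs links from the curated resource to its co-references.
same_as = {"ex:Oakville_Assembly": ["dbpedia:Oakville_Assembly"]}

# Accuracy-sensitive: only the curated value is returned.
print(values_for("ex:Oakville_Assembly", "geo:lat", curated, others, same_as))

# Low accuracy needs, high coverage needs: expand across owl:sameAs.
print(values_for("ex:Oakville_Assembly", "rdfs:label", curated, others, same_as,
                 accuracy_sensitive=False))
```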

I think the single canonical URI misconception is what's clouding the 
obvious re. what I am suggesting :-)  There are going to be lots of data 
spaces on the Web that focus on the quality aspects of Linked Data. The 
owners of these data spaces will also discover intriguing business
models in this area.


-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen








Re: [Dbpedia-discussion] Accuracy of coordinates in dbpedia/wikipedia & freebase

2010-09-28 Thread Roberto Mirizzi
  On 28/09/2010 00:00, Kingsley Idehen wrote:
> You have two data spaces: DBpedia and Freebase, you should make a third
> -- yours, which I think you have via ookaboo.
>
> Place the fixed (cleansed) data  in your ookaboo data space, connect the
> coreferenced entities using an "owl:sameAs" relation, scope queries that
> are accuracy sensitive to your ookaboo data space.  Use inference rules
> for union expansion across DBpedia and Freebase via "owl:sameAs", when
> data quality requirements are low and data expanse requirements high.
>
> That's how you clean up the mess and potentially get compensated for
> doing so, in the process :-)
>
I think I'm missing something with this approach. If you link, with 
owl:sameAs, a resource with e.g., the corresponding one in DBpedia, and 
the two resources have two different coordinates, it would lead to an 
inconsistency due to the semantics of owl:sameAs, wouldn't it?
After all, overcoming this problem is one of the reasons for the Okkam 
Project [1].
So, how do you plan to deal with this issue?

[1] http://www.okkam.org/okkam-more

-- 
Regards,

Roberto Mirizzi
PhD Student at Politecnico di Bari (Italy)
http://sisinflab.poliba.it/mirizzi



Re: [Dbpedia-discussion] Accuracy of coordinates in dbpedia/wikipedia & freebase

2010-09-27 Thread Kingsley Idehen
  On 9/27/10 4:42 PM, Paul Houle wrote:
> I've recently put up a site that uses coordinate information from
> Freebase and Dbpedia,  and I'm starting to think about how to clean up
> certain data quality problems I'm encountering,  for instance,  see:
>
> http://ookaboo.com/o/pictures/topic/209440/Oakville_Assembly
>
> In this particular case,  I've only got data from dbpedia,  which drops
> the point a few hundred km from where it really is...  It's obvious that
> this is a bad one because it's right in the middle of Lake Erie.
> Freebase doesn't have any coordinate for this thing (seems to me that it
> should),  and at the moment,  Wikipedia has the right coordinates (at
> least on Google maps I see a big factory building)  My guess is that
> wikipedia might have been wrong at one time,  and has had it corrected.
> It's also possible that the conversion wasn't done right in dbpedia,
> since coordinates are represented differently in a few hundred different
> infoboxes.
>
> It seems to me that both the number of points and the quality of points
> in Wikipedia has been improving dramatically over the last two years...
> About a year ago I plotted the points for Staten Island Railroad
> stations and found that the railroad was displaced a few km east and ran
> right under the middle of the Tappan Zee Bridge...  Now it's much better.
>
> I can find examples where:
>
> (a) dbpedia is right and freebase is wrong (for instance,  a town in
> continental Europe gets its longitude sign flipped and ends up with the
> wrecked ships west of the UK -- maybe here the point got fixed in
> wikipedia but not in freebase)
> (b) dbpedia is wrong and freebase is right
> (c) a point is missing from dbpedia but is in freebase (I see a lot of
> these in Switzerland),  and
> (d) a point is missing from freebase but in dbpedia
>
> An analysis of this is tricky because there are a lot of things where
> the coordinates are iffy:  the location of 'Russia' could vary within a
> few thousand kilometers,  'Tompkins County' could vary by ten or so
> kilometers,  etc.
>
> Looking at a handful of points that have diverged,  I get the impression
> that freebase is more accurate than dbpedia,  but that I get better
> results just looking at the coordinates on the human interface of
> wikipedia -- currently,  it seems like a scan of the current coordinates
> in wikipedia (however wikipedia extracts them from the infoboxes)
> benefits the most from the human labor being done to fix points and also
> avoids errors & missed points from other people's extraction pipelines.
>
>   From my viewpoint,  I'd like to make a map that doesn't have
> embarrassing errors in it...  What's the best way to clean up this mess?

You have two data spaces: DBpedia and Freebase, you should make a third 
-- yours, which I think you have via ookaboo.

Place the fixed (cleansed) data  in your ookaboo data space, connect the 
coreferenced entities using an "owl:sameAs" relation, scope queries that 
are accuracy sensitive to your ookaboo data space.  Use inference rules 
for union expansion across DBpedia and Freebase via "owl:sameAs", when 
data quality requirements are low and data expanse requirements high.

That's how you clean up the mess and potentially get compensated for 
doing so, in the process :-)


Kingsley


-- 

Regards,

Kingsley Idehen 
President & CEO
OpenLink Software
Web: http://www.openlinksw.com
Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca: kidehen





