Re: [Kim-discussion] Large KB Gazetteer

2012-06-18 Thread Keith Cortis
Hi Philip,

You’re right, I wasn’t executing it with DISTINCT when counting the total
number of countries.

Thanks a lot for looking into my problem.

Keith

From: Philip Alexiev [mailto:philip.alex...@ontotext.com] 
Sent: 18 June 2012 16:03
To: Keith Cortis
Cc: 'KIM discussion'
Subject: Re: [Kim-discussion] Large KB Gazetteer

Keith,

Please make sure you are executing it with the DISTINCT keyword. When I run it
without DISTINCT, I also get 329 results.
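
A minimal sketch of why the two counts can differ, assuming that some
countries carry more than one value for the capital property. The query
matches ?capital but does not select it, so every additional capital yields
another identical result row unless DISTINCT is used. The rows below are
hypothetical illustration data, not actual DBpedia output, and ?Cls is left
out for brevity.

```python
# Hypothetical bindings (name, country, capital); not real DBpedia results.
bindings = [
    ("Austria", "dbpedia:Austria", "dbpedia:Vienna"),
    ("Bolivia", "dbpedia:Bolivia", "dbpedia:Sucre"),
    ("Bolivia", "dbpedia:Bolivia", "dbpedia:La_Paz"),
    ("Brunéi", "dbpedia:Brunei", "dbpedia:Bandar_Seri_Begawan"),
]


def project(rows):
    """Keep only the selected variables (?Name, ?Country), as SELECT does."""
    return [(name, country) for name, country, _capital in rows]


def project_distinct(rows):
    """Same projection with duplicate rows removed, as SELECT DISTINCT does."""
    return sorted(set(project(rows)))


without_distinct = project(bindings)
with_distinct = project_distinct(bindings)
print(len(without_distinct), len(with_distinct))  # prints: 4 3
```

The country with two capitals is counted twice in the plain projection and
once in the distinct one, which is the kind of gap seen between 329 and 305.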

 

I will look into the other problem in a moment.

Hth,
Philip

On 18 Jun 2012, at 4:02 PM, Keith Cortis wrote:

Hi Philip,

I double-checked more than once, and I get 329 countries when I execute
the SPARQL query directly against the DBpedia SPARQL endpoint.

To be honest, I’m more concerned about the Large KB Gazetteer issue,
i.e. why it does not recognise names containing a special
character (as in the examples below), although the count
mismatch is important too.

Thanks a lot for your replies.

Keith

From: Philip Alexiev [mailto:philip.alex...@ontotext.com] 
Sent: 18 June 2012 13:52
To: Keith Cortis
Cc: 'KIM discussion'
Subject: Re: [Kim-discussion] Large KB Gazetteer

Keith,

When I executed this SPARQL query against the provided SPARQL endpoint
(http://dbpedia.org/sparql), I got exactly 305 results. Could you
double-check that you get 329?

Thanks,
Philip

On 18 Jun 2012, at 1:56 PM, Keith Cortis wrote:

Hi Philip,

Thanks for your quick reply.

The following is the SPARQL query:

SELECT DISTINCT ?Name ?Country ?Cls
WHERE {
  ?Country a ?Cls ; rdfs:label ?Name ;
    <http://dbpedia.org/property/capital> ?capital .
  OPTIONAL { ?Country dbpedia-owl:dissolutionYear ?year } .
  FILTER (!BOUND(?year))
  FILTER (?Cls = <http://dbpedia.org/ontology/Country>)
  FILTER (langMatches(lang(?Name), "es"))
}
ORDER BY (?Name)

The DBpedia SPARQL endpoint (http://dbpedia.org/sparql) returns a total of
329 country names for the query above, whilst the same query returns only
305 country names within the Large KB Gazetteer.

 

From the tests conducted, I noticed that all the Spanish country names that
do not contain any special characters, such as Austria, Australia, etc., are
recognised (since they have been populated in the gazetteer), whilst the
ones containing special characters, such as Brunéi, Camerún, etc., are not
recognised as Countries, even though some of those country names are in
the gazetteer.
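
One way to check this observation systematically is to split the labels into
those made up of ASCII characters only and those containing anything else,
then test each group against the gazetteer separately. The sketch below
assumes Python 3.7 or later (for str.isascii). The function name
split_by_ascii is invented for illustration, and the sample labels are the
ones mentioned above.

```python
def split_by_ascii(labels):
    """Return (ascii_only, non_ascii) lists, preserving input order."""
    ascii_only = [label for label in labels if label.isascii()]
    non_ascii = [label for label in labels if not label.isascii()]
    return ascii_only, non_ascii


labels = ["Austria", "Australia", "Brunéi", "Camerún"]
plain, special = split_by_ascii(labels)
print(plain)    # ['Austria', 'Australia']
print(special)  # ['Brunéi', 'Camerún']
```

If every unrecognised name falls into the second group, the problem is tied
to the non-ASCII characters rather than to the query results themselves.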

 

I can’t figure out why the names containing special characters are not being
recognised by the Large KB Gazetteer, even though some of the names are
listed within.

Regards,
Keith

From: Philip Alexiev [mailto:philip.alex...@ontotext.com] 
Sent: 18 June 2012 11:11
To: Keith Cortis
Cc: KIM discussion
Subject: Re: [Kim-discussion] Fwd: Large KB Gazetteer

Hi Keith,

Most probably the gazetteer query is not matching the RDF for those labels.

Please provide the RDF for some of the missed countries and also the
gazetteer query, in case you customized it.

Regards,
Philip Alexiev
Software Engineer, KIM team

On 18 Jun 2012, at 12:55 PM, Philip Alexiev wrote:

Begin forwarded message:

I have been testing the Large KB Gazetteer module in GATE (v7.0) and
noticed that country names containing a special character are not being
imported into the newly created gazetteer. For example, if I want to create
a gazetteer containing all the countries in the world in Spanish
(rdfs:label with language tag es), the gazetteer loads only 299 instances
out of a possible 324. Country names such as Afganistán, Azerbaiyán,
Benín, Brunéi, etc. are therefore not loaded, and so are not recognised as
Country entities. The same problem occurs for city names: all the names
are imported into the gazetteer, but the ones containing a special
character (as in the examples above) are not recognised as entities.

 

Do you know what might be causing this issue, please?

Thanks a lot for your help.

Regards,
Keith

Keith Cortis
Digital Enterprise Research Institute (DERI) Galway,
Semantic Collaborative Software Unit (USCS)
National University of Ireland, Galway
Lower Dangan
Galway, Ireland

___
Kim-discussion mailing list
Kim-discussion@ontotext.com
http://ontomail.semdata.org/cgi-bin/mailman/listinfo/kim-discussion

Re: [Kim-discussion] Large KB Gazetteer

2012-03-26 Thread Philip Alexiev
Hello Fabian,

This is the right place to ask questions related to the LKB gazetteer, as it is
developed by Ontotext. I will put my answers inline, under your questions.

On 26 Mar 2012, at 9:18 AM, Fabian Cretton wrote:

 Dear all,
 
 I am writing to this list as it seems to be the place to ask questions about
 the Large KB Gazetteer. This question is not directly related to KIM, so
 please redirect me to a more appropriate list if needed.
 
 I have only done a few small tests with the Large KB Gazetteer in GATE (not
 with KIM), and there are a few things about which I couldn't find more
 information:
 - Can the Large KB Gazetteer use the output of a lemmatisation step? If not,
 is there currently no way to use GATE with a very large gazetteer while also
 doing lookups on lemmas?

We have also run into this requirement. The LKB Gazetteer does not support this
functionality. Internally, we are using its successor, the Linked Data
Gazetteer (LD Gazetteer). Unfortunately, it has not yet been publicly released.

 - When the Large KB Gazetteer is used with a traditional GATE gazetteer (a
 .lst file), is it possible to add features to each entry in the list, as in
 the ANNIE gazetteer?

The LKB Gazetteer can set only the class and instance features of
the annotations, thereby relating them to instances in the semantic
repository. Hence the name: semantic annotations. You can use JAPE rules, or
write your own resource, to set the features you need. If you provide some
more information about the scenario, we could perhaps help you more.

 - I see strange behaviour when working with GATE 6.1 and the Large KB
 Gazetteer, connecting to an OWLIM 4.3 store.
 The initial SPARQL query gives:
 ** Loading completed: 2414240 aliases in 213 second(s).
 but when restarting the project:
 ** Aliases in IGNORE list:0
 ** Loading of trusted entities from C:\Fab\Semantic 
 Web\SoftCust\Ontology\gate_largeKB_remoteRep\kim.trusted.entities.cache
 ** 1662791 elements loaded.
 Is it normal that those figures are not the same?

My guess is that this is caused either by an ignore list (although the log
shows that this list is empty) or by duplicate records. Identical statements
in the query result produce only a single record in the dictionary.
Try adding DISTINCT to your gazetteer query.

  
 Thank you for any information
 Fabian

Hope this helps,
Philip Alexiev
Software Engineer, KIM team
