Re: [DBpedia-discussion] Meetup: SF Bay Area Knowledge Graphs

2019-08-21 Thread Tom Morris
On Wed, Aug 21, 2019 at 3:23 AM Sebastian Hellmann <
hellm...@informatik.uni-leipzig.de> wrote:

> we switched completely to:
>
> - https://forum.dbpedia.org
> - Slack https://dbpedia-slack.herokuapp.com/
>
> and http://blog.dbpedia.org for announcements.
>
That's too bad. Why abandon such a nice lingua franca as email?

Tom
___
DBpedia-discussion mailing list
DBpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [DBpedia-discussion] [Dbpedia-discussion] DBpedia as Tables release

2016-11-06 Thread Tom Morris
Perhaps here:
https://github.com/dbpedia/dbpedia/tree/master/tools/DBpediaAsTables

On Sun, Nov 6, 2016 at 10:24 AM, Dimitris Kontokostas 
wrote:

> Hi Petar,
>
> There is some interest in reviving this project, but I cannot recall / find
> where the code to generate these dumps is.
> Will you be able to help us re-bootstrap this?
> We can create a standalone github repo and we will try to find a new
> maintainer
>
> Cheers,
> Dimitris
>
> On Fri, Dec 13, 2013 at 2:07 AM, Petar Ristoski <
> petar.risto...@informatik.uni-mannheim.de> wrote:
>
>> Hi Pablo,
>>
>>
>>
>> I set up a web page [1] where all classes from the DBpedia ontology are
>> available for download as separate .csv and .json files.
>>
>>
>>
>> Regards,
>>
>>
>>
>> Petar
>>
>>
>>
>> [1] http://web.informatik.uni-mannheim.de/DBpediaAsTables/DBpediaClasses.htm
>>
>>
>>
>>
>>
>> *From:* Pablo N. Mendes [mailto:pablomen...@gmail.com]
>> *Sent:* Thursday, December 12, 2013 4:44 PM
>> *To:* Petar Ristoski
>> *Cc:* ibu ☉ radempa ䷰; dbpedia-discussion@lists.sourceforge.net
>>
>> *Subject:* Re: [Dbpedia-discussion] DBpedia as Tables release
>>
>>
>>
>>
>>
>> Hi Petar,
>>
>> Thanks for sharing this! Tried to use it yesterday, but 3GB still takes
>> quite a long time to download if you're just hacking something together
>> from a Starbucks. From the standpoint of practicality, this would be
>> infinitely more useful if we could download files individually, or at least
>> in smaller chunks.
>>
>>
>>
>> Any chance we'll get something like that shared from [1]?
>>
>>
>>
>> Cheers,
>>
>> Pablo
>>
>>
>>
>> [1] http://wiki.dbpedia.org/DBpediaAsTables
>>
>>
>>
>> On Thu, Nov 28, 2013 at 5:35 AM, Petar Ristoski <
>> petar.risto...@informatik.uni-mannheim.de> wrote:
>>
>> Hi Ibu,
>>
>> Thank you for your feedback.
>>
>> To simplify the parsing of the files, from all literals I removed the
>> following characters: "\" { } | , \n". If there are quotes in the URIs,
>> they are escaped as '""'. Also, there is no URI that starts with "{" and
>> ends with "}", so there is no need to escape "{ } |" inside the URIs.
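A minimal sketch of parsing the multi-value convention Petar describes (the function and sample row below are hypothetical illustrations, not part of the released files):

```python
import csv
import io

def parse_value(field):
    """Split a DBpedia-as-Tables multi-value field of the form {a|b|c};
    plain fields are returned as single-element lists."""
    if field.startswith("{") and field.endswith("}"):
        return field[1:-1].split("|")
    return [field]

# Hypothetical row in the format described above: a URI column and a
# multi-value column; quotes inside fields are escaped as "" per the post.
row = next(csv.reader(io.StringIO(
    'http://dbpedia.org/resource/Guinness,"{Ireland|Diageo}"')))
assert parse_value(row[0]) == ["http://dbpedia.org/resource/Guinness"]
assert parse_value(row[1]) == ["Ireland", "Diageo"]
```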
>>
>> I apologize for those two incorrectly parsed files. I fixed them a couple
>> of days ago, so please download them again.
>>
>> Regards,
>>
>> Petar
>>
>>
>> -Original Message-
>> From: ibu ☉ radempa ䷰ [mailto:i...@radempa.de]
>> Sent: Wednesday, November 27, 2013 10:00 PM
>> To: dbpedia-discussion@lists.sourceforge.net
>> Subject: Re: [Dbpedia-discussion] DBpedia as Tables release
>>
>> On 11/25/2013 02:18 PM, Petar Ristoski wrote:
>> > We are happy to announce the first version of the DBpedia as Tables
>> > tool [1].
>>
>> > Any feedback is welcome!
>>
>> > [1] http://wiki.dbpedia.org/DBpediaAsTables
>>
>> Thanks Petar,
>>
>> your CSV files are really helpful.
>>
>> For all who want to import data into Postgresql, I've written a python
>> script which automatically creates the SQL corresponding to the CSV:
>>
>> https://gitorious.org/dbpedia_csv2sql/dbpedia_csv2sql
>>
>> The column types (often arrays) are inferred from your headers and the
>> data rows; indexes are also created.
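For illustration, the kind of type inference ibu describes might look like the following sketch (the real script lives at the gitorious link above; this function is a hypothetical simplification):

```python
def infer_sql_type(values):
    """Guess a PostgreSQL column type from sample string values:
    arrays for {a|b} multi-value fields, integer/double where every
    sample parses as a number, text otherwise. Illustrative only."""
    if all(v.startswith("{") and v.endswith("}") for v in values):
        return "text[]"
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "double precision"
    except ValueError:
        return "text"

assert infer_sql_type(["1", "42"]) == "integer"
assert infer_sql_type(["3.14", "2"]) == "double precision"
assert infer_sql_type(["{a|b}", "{c}"]) == "text[]"
assert infer_sql_type(["hello", "7"]) == "text"
```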
>>
>> (If people here find this script useful, I could also package it for pypi
>> and improve documentation a bit.)
>>
>> I was assuming that your files are encoded in UTF-8, which worked, but I
>> didn't find either a '""' or a '\"' inside a field value, so I don't know
>> how a '"' would be encoded, if there were one. Also for a multi-value field
>> (e.g. '{1|2|3}') I don't know how '{', '|' and '}' are encoded, if they
>> appear within one of the values. - Maybe you could add some documentation
>> on that.
>>
>> In your data I found 2 format problems (I don't think my download went
>> wrong, but anyway, a checksum might be helpful):
>>
>> * Film.csv seems to have no headers (it has 20004 lines for me).
>> * Aircraft.csv: the 2nd last row
>> (http://dbpedia.org/resource/Marinens_Flyvebaatfabrikk_M.F.10)
>> has too many columns.
>>
>> All other files (except owl#Thing.csv and Agent.csv, which I didn't check
>> due to size and column number) were ok.
>>
>> I also noticed another thing, not concerning your tool, where some parser
>> maybe could be optimized:
>> http://dbpedia.org/resource/Americas
>> has language="American (but see [[#English usage"
>>
>> Regards,
>> ibu
>>

Re: [Dbpedia-discussion] Concept Identifiers

2016-05-31 Thread Tom Morris
Hi Katie. I don't think there are universally agreed best practices in this
space and people often have strongly held views on either side.  You don't
mention internationalization/localization which is, in my experience, a
bigger concern for folks than semantic drift. Those who believe in numeric
identifiers often think that using identifiers in a given natural language
provides that language an undeserved pride of place and priority over other
languages. Folks in this camp include the creators of CIDOC and there are
people dismayed by BibFrame's abandonment of MARC-style numbers.

From a practical point of view, numeric identifiers, while perfectly
sensible in the abstract, suffer from the weak tools that we have, so end
up disadvantaging everyone equally, but everyone more than English
identifiers probably would.

Your note implies that concept URIs could change over time if they had
natural language words as part of the URI.  I don't think this would be a
good practice. If UAT:Black now means "orange," I think you need to either
live with UAT:Black as the URI, mint a synonym UAT:Orange (and keep
UAT:Black), or deprecate UAT:Black as a valid concept and create a new
concept UAT:Orange. Which course of action is most appropriate will depend
on the specific circumstances of a change. If you decide there's a new
concept UAT:DarkGrey, that is split off from UAT:Black, perhaps the
original can exist unchanged, but if you decide that there's really no such
thing as "black" but just UAT:DarkGrey and UAT:DarkestGrey, then perhaps
UAT:Black gets deprecated and removed. Changing the URI segments to
UAT101, UAT102, UAT301, etc. doesn't really affect most of the discussion.
The only case it makes easier is avoiding UAT:Black having a description of
"vibrant orange" if the concept drifts far enough from its original label
(which is embedded in the URI).

Since Dimitris mentioned Freebase, briefly what they did was initially mint
English language URIs based on the label of the topic, but eventually
abandoned the practice because it was too difficult to do automatically and
added too little value. They did keep English identifiers for types &
properties which were part of the scheme, but these were hand assigned and
provided a useful organizing function to group properties with the
associated type, types with their domain, etc. A powerful feature of the
Freebase setup was that a single topic could have arbitrarily many URIs, so
dereferencing /en/Boston, /authority/viaf/1234,
/authority/loc/lcnam/nm1234, /wikipedia/en_title/Boston (city), etc could
all fetch the same content (without the use of redirects). The
core identifiers for non-schema topics were machine generated sequential
IDs encoded with a compact base 37(?) encoding, e.g.  /m/0d_23
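As a rough illustration of such sequential-id encoding (the alphabet below is a guess of 37 symbols; Freebase's actual encoding may well have differed):

```python
def encode_mid(n, alphabet="0123456789abcdefghijklmnopqrstuvwxyz_"):
    """Encode a sequential integer id as a compact /m/-style string.
    The 37-symbol alphabet (digits, lowercase letters, underscore) is
    an assumption for illustration, not Freebase's documented scheme."""
    if n == 0:
        return "/m/" + alphabet[0]
    digits = []
    while n:
        n, r = divmod(n, len(alphabet))
        digits.append(alphabet[r])
    return "/m/" + "".join(reversed(digits))

assert encode_mid(0) == "/m/0"
assert encode_mid(37) == "/m/10"   # one full "carry" in base 37
```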

Tom

p.s. I'm a couple of blocks away if you want to chat about this stuff some
time.

On Thu, May 26, 2016 at 2:43 PM, Katie Frey  wrote:

> Hello,
>
> How are concept IDs handled for DBpedia?  It looks like the concept URIs
> are descriptive (i.e. for the concept http://dbpedia.org/page/Solar_System,
> the concept ID is "Solar_System").  Are the descriptive IDs used throughout
> all of dbpedia (back and front end) or are terms ultimately kept unique by
> using numeric identifiers?
>
> I've been developing a controlled vocabulary and I would also like to use
> URIs so that my terms can be used with other linked data schemes.  My group
> and I have had a lot of discussions regarding the concept IDs; some want
> them to be descriptive, based on the preferred term for each concept so
> that they are human readable but this could cause problems if the terms
> used to describe each concept change over time, others want them to be
> randomly generated so that if the description of a term drifts over time
> the URI for the concept will always remain static.
>
> We are trying to figure out if there are any standards or best practices
> we should be looking towards when it comes to concept IDs.  Any
> thoughts/comments/justifications would be appreciated.
>
> Best,
> Katie
>
> --
> Katie E. Frey
> John G. Wolbach Library, Harvard-Smithsonian Center for Astrophysics
> 60 Garden Street, MS-56, Cambridge, MA 02138
> email: kf...@cfa.harvard.edu   |   phone: 617-496-7579
> http://astrothesaurus.org   |   http://library.cfa.harvard.edu/
>
> "Surprising what you can dig out of books if you read long enough, isn’t
> it?"
> - Rand al'Thor (in Robert Jordan's The Shadow Rising, Book Four of the
> Wheel of Time)
>
> "This is insanity!"   "No, this is scholarship!"
> - Yalb and Shallan (in Brandon Sanderson's Words of Radiance, Book Two of
> the Stormlight Archive)
>
>
> --
> Mobile security can be enabling, not merely restricting. Employees who
> bring their own devices (BYOD) to work are irked by the imposition of MDM
> restrictions. Mobile Device Manager Plus allows you to control 

Re: [Dbpedia-discussion] Clarification regarding the instance type files

2015-12-15 Thread Tom Morris
Two other sources you might consider are Freebase and Wikidata.  Using them
together with DBpedia might give you better results.

Tom

On Tue, Dec 15, 2015 at 5:27 AM, Vihari Piratla 
wrote:

> Thanks Dimitris for a detailed response.
> I see 2,945,956 unique titles in instance-types_en.nt.bz2 and 2,716,774
> unique titles in instance-types-transitive_en.nt.bz2. The number of unique
> titles in the two files together is 2,945,956.
> Currently, Wikipedia contains 5,031,836 articles in English. I am assuming
> the dump is missing 2 million or so titles because of the bug in the
> extraction framework.
>
> When can we expect the 2016 release?
>
> Thanks
>
> On Mon, Dec 14, 2015 at 8:53 PM, Dimitris Kontokostas 
> wrote:
>
>> Hi Vihari,
>>
>> The main reason for the size reduction is due to the split between direct
>> & transitive types [1]
>> There was a bug [2] that indirectly affected some type assignments but is
>> now fixed and the next release will not have this problem.
>> Also note that besides SD-Types, in this release we published two
>> additional type datasets, dbatx and LHD [3]
>>
>> Regarding your 2nd question ('__'). These resources are extracted from
>> additional infoboxes in the same page but when they cannot be merged, we
>> create additional resources.
>> This is also a way to create intermediate node mappings
>> through
>> the mappings wiki e.g. in [4]
>>
>> [1]
>> http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types-transitive_en.nt.bz2
>> [2] https://github.com/dbpedia/extraction-framework/issues/404
>> [3] http://wiki.dbpedia.org/dbpedia-data-set-2015-04
>> [4] http://mappings.dbpedia.org/index.php/Mapping_en:Infobox_officeholder
>>
>> On Mon, Dec 14, 2015 at 1:12 PM, Vihari Piratla 
>> wrote:
>>
>>> Hi,
>>> I am a software developer, we use DBpedia instance type or mapping-based
>>> type files in a pipeline to recognize entities.
>>> We found that the latest instance-types resource available at
>>> http://downloads.dbpedia.org/2015-04/core-i18n/en/instance-types_en.nt.bz2
>>> is much smaller than the corresponding 2014 release
>>> http://data.dws.informatik.uni-mannheim.de/dbpedia/2014/en/instance_types_en.nt.bz2
>>> .
>>> As a result, the latest instance file is missing many entries present on
>>> Wikipedia such as Taj_Mahal, J._Paul_Getty_Museum, Grand_Canyon.
>>> What is the reason for the reduced size (110MB->35MB)
>>> Is this a bug?
>>> Are there some other files that we have to consider along with this file?
>>>
>>> We also sometimes see entries with '__', as in "Abraham_Lincoln__1" in
>>> the line
>>>  <
>>> http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <
>>> http://dbpedia.org/ontology/TimePeriod>
>>> What does '__' mean? Where can I find more information about these
>>> things.
>>>
>>> Thanks
>>> --
>>> Vihari PIratla
>>>
>>>
>>> --
>>>
>>> ___
>>> Dbpedia-discussion mailing list
>>> Dbpedia-discussion@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>>>
>>>
>>
>>
>> --
>> Kontokostas Dimitris
>>
>
>
>
> --
> V
>
>
> --
>
> ___
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
--
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Wordnet instances to DBpedia resources mappings

2015-10-03 Thread Tom Morris
Freebase has mappings to both Wordnet and EN Wikipedia, so you might be
able to bridge from Wordnet to DBpedia via that route if you can't find
anything more direct.

Tom

On Sat, Oct 3, 2015 at 2:54 AM, Nasr Eddine  wrote:

> I wounder if there is a mapping between DBpedia resources and Wordnet
> instances ?
>
> Thanks.
>
>
> --
>
> ___
> Dbpedia-discussion mailing list
> Dbpedia-discussion@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
>
>
--
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Fact Ranking tool

2015-06-24 Thread Tom Morris
On the surface that sounds like a useful bit of research, but I fear
you're building on a foundation of quicksand.

My very first entity was labelled Guinness but had no information
whether it was the beer, the company, the brand or one of the other
similarly named entities.  Given that we don't know what the entity
is, how can we usefully evaluate these assertions?

Guinness is a member of Food and drink in Ireland
Guinness is a member of Guinness advertising
Guinness is a member of History of Ireland 1801–1923
Guinness is a member of Companies formerly listed on the London Stock Exchange
Guinness is a member of Beer and breweries in Ireland
Guinness is a member of Irish alcoholic beverages
Guinness is a member of Diageo beer brands
Guinness is a member of 1759 establishments in Ireland
Guinness is a member of Beer and breweries in multi regions
Guinness is a member of Companies established in 1759

Clearly a single entity can't be a company, a beverage, AND a brand,
so many of the assertions are false, but how do we know which ones?

Tom

On Wed, Jun 24, 2015 at 10:16 AM, Tamara Bobic tamara.bo...@hpi.de wrote:
 Dear all,

 we are trying to get as much input as possible for the purpose of ranking
 DBpedia facts:

 http://s16a.org/fr

 Our goal is to create a generic ground truth and it would be great if you
 could play around with it and provide us with your opinions!
 We are especially looking forward to the keywords in Step1 of the
 evaluation.

 Thanks for your help!!
 --
 Tamara Bobic,
 PhD Student/Research Assistant

 Hasso-Plattner-Institut für Softwaresystemtechnik GmbH
 Prof.-Dr.-Helmert-Str. 2-3
 D-14482 Potsdam
 Germany

 Amtsgericht Potsdam, HRB 12184
 Geschäftsführung: Prof. Dr. Christoph Meinel

 Phone:   +49 (0)331-5509-569
 Fax:   +49 (0)331-5509-325

 Office: H-1.40
 Email: tamara.bo...@hpi.de
 Web:   http://www.hpi.de


 --
 Monitor 25 network devices or servers for free with OpManager!
 OpManager is web-based network management software that monitors
 network devices and physical  virtual servers, alerts via email  sms
 for fault. Monitor 25 devices for free with no restriction. Download now
 http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
 ___
 Dbpedia-discussion mailing list
 Dbpedia-discussion@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


--
Monitor 25 network devices or servers for free with OpManager!
OpManager is web-based network management software that monitors 
network devices and physical  virtual servers, alerts via email  sms 
for fault. Monitor 25 devices for free with no restriction. Download now
http://ad.doubleclick.net/ddm/clk/292181274;119417398;o
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] [Dbpedia-developers] DBpedia-based RDF dumps for Wikidata

2015-03-11 Thread Tom Morris
Sebastian,

Thanks very much for the explanation.  It was a single missing word,
"ontology," which led me astray.  If the opening sentence had said "based
on the DBpedia ontology," I probably would have figured it out.  Your
amplification of the underlying motivation helps me better understand
what's driving this though.

I guess I had naively abandoned critical thinking and assumed DBpedia was
dead now that we had WikiData without thinking about how the two could
evolve / compete / cooperate / thrive.

Good luck!

Best regards,
Tom

On Wed, Mar 11, 2015 at 4:29 PM, Sebastian Hellmann 
hellm...@informatik.uni-leipzig.de wrote:

 Your description sounds quite close to what we had in mind. The high level
 group is manifesting quite well, the domain groups are planned as pilots
 for selected domains (e.g. Law or Mobility).

 I lost a bit the overview on the data classification. We might auto-link
 or crowdsource. I would need to ask others, however.

 We are aiming to create a structure that allows stability and innovation
 in an economic way - - I see this as the real challenge...

 Jolly good show,
 Sebastian




 On 11 March 2015 20:53:55 CET, John Flynn jflyn...@verizon.net wrote:

 This is a very ambitious, but commendable, goal. To map all data on the
 web to the DBpedia ontology is a huge undertaking that will take many
 years of effort. However, if it can be accomplished the potential payoff is
 also huge and could result in the realization of a true Semantic Web. Just
 as with any very large and complex software development effort, there needs
 to be a structured approach to achieving the desired results. That
 structured approach probably involves a clear requirements analysis and
 resulting requirements documentation. It also requires a design document
 and an implementation document, as well as risk assessment and risk
 mitigation. While there is no bigger believer in the "build a little, test
 a little" rapid prototyping approach to development, I don't think that is
 appropriate for a project of this size and complexity. Also, the size and
 complexity also suggest the final product will likely be beyond the scope
 of any individual to fully comprehend the overall ontological structure.
 Therefore, a reasonable approach might be to break the effort into smaller,
 comprehensible segments. Since this is a large ontology development effort,
 segmenting the ontology into domains of interest and creating working
 groups to focus on each domain might be a workable approach. There would
 also need to be a working group that focuses on the top levels of the
 ontology and monitors the domain working groups to ensure overall
 compatibility and reduce the likelihood of duplicate or overlapping
 concepts in the upper levels of the ontology and treats universal concepts
 such as "space" and "time" consistently. There also needs to be a clear,
 and hopefully simple, approach to mapping data on the web to the DBpedia
 ontology that will accommodate both large data developers and web site
 developers.  It would be wonderful to see the worldwide web community
 get behind such an initiative and make rapid progress in realizing this
 commendable goal. However, just as special interests defeated the goal of
 having a universal software development approach (Ada), I fear the same
 sorts of special interests will likely result in a continuation of the
 current myriad development efforts. I understand the "one size doesn't fit
 all" arguments, but I also think "one size could fit a whole lot" could be
 the case here.



 Respectfully,



 John Flynn

 http://semanticsimulations.com





 *From:* Sebastian Hellmann [mailto:hellm...@informatik.uni-leipzig.de]
 *Sent:* Wednesday, March 11, 2015 3:12 AM
 *To:* Tom Morris; Dimitris Kontokostas
 *Cc:* Wikidata Discussion List; dbpedia-ontology;
 dbpedia-discussion@lists.sourceforge.net; DBpedia-Developers
 *Subject:* Re: [Dbpedia-discussion] [Dbpedia-developers] DBpedia-based
 RDF dumps for Wikidata



 Dear Tom,

 let me try to answer this question in a more general way. In the future,
 we honestly consider mapping all data on the web to the DBpedia ontology
 (extending it where it makes sense). We hope that this will enable you to
 query many data sets on the Web using the same queries.

 As a convenience measure, we will get a huge download server that
 provides all data from a single point in consistent  formats and consistent
 metadata, classified by the DBpedia Ontology.  Wikidata is just one
 example, there is also commons, Wiktionary (hopefully via DBnary), data
 from companies, DBpedia members and EU projects.

 all the best,
 Sebastian

 On 11.03.2015 06:11, Tom Morris wrote:

 Dimitris, Soren, and DBpedia team,



 That sounds like an interesting project, but I got lost between the
 statement of intent, below, and the practical consequences:



 On Tue, Mar 10, 2015 at 5:05 PM, Dimitris Kontokostas 
 kontokos...@informatik.uni-leipzig.de wrote:

 we made some different design

Re: [Dbpedia-discussion] DBpedia-based RDF dumps for Wikidata

2015-03-10 Thread Tom Morris
Dimitris, Soren, and DBpedia team,

That sounds like an interesting project, but I got lost between the
statement of intent, below, and the practical consequences:

On Tue, Mar 10, 2015 at 5:05 PM, Dimitris Kontokostas 
kontokos...@informatik.uni-leipzig.de wrote:

 we made some different design choices and map wikidata data directly into
 the DBpedia ontology.


What, from your point of view, is the practical consequence of these
different design choices?  How do the end results manifest themselves to
the consumers?

Tom
--
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the 
conversation now. http://goparallel.sourceforge.net/___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Discovery_Communications missing in infobox dump

2015-01-12 Thread Tom Morris
On Mon, Jan 12, 2015 at 6:02 AM, Volha Bryl 
vo...@informatik.uni-mannheim.de wrote:


 At the time of the extraction the infobox at the corresponding wiki page
 had a line with a strange syntax:

 | [[type]] = [[Public]]

 See the wiki page history, May 2014. This had caused the extraction error.


But surely an infobox parse error shouldn't cause the entire object to be
dropped, right?  At least some basics like the rdfs:label, owl:sameAs, etc
could still be populated rather than leaving a disconnected piece of the
graph.

Tom
--
New Year. New Location. New Benefits. New Data Center in Ashburn, VA.
GigeNET is offering a free month of service with a new server in Ashburn.
Choose from 2 high performing configs, both with 100TB of bandwidth.
Higher redundancy.Lower latency.Increased capacity.Completely compliant.
www.gigenet.com___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Querying for keywords while discarding accents

2014-06-27 Thread Tom Morris
For any type of search application, you not only want to do case and accent
folding, but also Unicode normalization <http://unicode.org/reports/tr15/>
(you could have both precomposed and combining-accent versions of the è in
Isère).  Typically a search engine can be directed to normalize both the
text before indexing and the query.  If DBpedia doesn't support this, you
could look at using something like Apache Jena's SOLR-based text search
support <http://jena.apache.org/documentation/query/text-query.html>.
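A minimal sketch of the fold described above, using Python's standard unicodedata module (the function name is ours, not an API):

```python
import unicodedata

def fold_accents(s):
    """Decompose to NFD, drop combining marks, recompose to NFC:
    'Isère' -> 'Isere'. A search index would apply the same fold to
    both the indexed text and the query string."""
    decomposed = unicodedata.normalize("NFD", s)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return unicodedata.normalize("NFC", stripped)

# Works whether the input uses a precomposed è or e + combining grave:
assert fold_accents("Is\u00e8re") == "Isere"
assert fold_accents("Ise\u0300re") == "Isere"
assert fold_accents("Rh\u00f4ne-Alpes") == "Rhone-Alpes"
```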

Tom


On Fri, Jun 27, 2014 at 8:00 AM, Andrea Di Menna ninn...@gmail.com wrote:

 Hi,
 there is no magic in that.
 It only happens that wikipedia has got a page Isere (
 http://en.wikipedia.org/wiki/Isere) which is actually a mere redirect to
 Isère (http://en.wikipedia.org/wiki/Is%C3%A8re).
 Hence the framework links the two DBpedia entities together in a triple

    dbpedia:Isere dbpedia-owl:wikiPageRedirects dbpedia:Isère

 However, I think this is not always true for all the pages which contain
 non-ASCII chars, that is, Wikipedia is not filled with redirects from
 ASCII-folded pages.

 This is why in my opinion you should enrich the data with additional
 triples which link ASCII-folded and other-language labels to the original
 entity, e.g.

    dbpedia:Italy rdfs:label "Italy"@en
    dbpedia:Italy rdfs:label "Italia"@it
 and
    dbpedia:Isère rdfs:label "Isère"@en
    dbpedia:Isère rdfs:label "Isere"@en

 (this is just an example, I would not use rdfs:label for the ASCII-folded
 label but another property).

 Hope this helps.

 Cheers
 Andrea



 2014-06-27 13:46 GMT+02:00 Mohammad Ghufran emghuf...@gmail.com:

 Hello,

 Thank you for your reply. Yes, I tried doing that. If i try to remove the
 accents, i normally get a redirection page in the search results. I can
 then get the resource uri for this result and get the actual resource page.
 However, this only happens sometimes. For example, a region in France
 called Isère has the following page: http://dbpedia.org/page/Is%C3%A8re
 . If i access the page without the accent, I am still redirected to the
 correct page. However, if I search for the plain string in the label, I
 don't get any results. Here is the query I am using:

 PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
 PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
 SELECT DISTINCT ?place
 WHERE
 {
   ?place a dbpedia-owl:PopulatedPlace .
   ?place rdfs:label ?label .
   FILTER (str(?label) = "Isere") .
 }

 The language is not known a-priori, as i said in my earlier message. I am
 trying to make my code language independent. So I cannot use the language.

 What is interesting is the fact that dbpedia itself redirects the url
 http://dbpedia.org/page/Isere to http://dbpedia.org/page/Is%C3%A8re . I
 am wondering how this magic is done.

 Mohammad Ghufran


 On Fri, Jun 27, 2014 at 1:08 PM, Romain Beaumont romain.r...@gmail.com
 wrote:

 Hello,
 I think you are going to need some preprocessing. For example, to handle
 accents, you can just remove them (in your program/script/...) before
 transforming it to sparql.
 Some labels are present in different languages in DBpedia, maybe you
 could use that ?



 2014-06-27 10:57 GMT+02:00 Mohammad Ghufran emghuf...@gmail.com:

 Hello,

 I am using dbpedia to work with locations in order to compare them and
 determine if two locations are same / similar and to what extent. Since my
 data source can be user input, the data normally does not match the exact
 resource / label defined in dbpedia.

 I am using the sparql endpoint for this (right now, i am using the
 dbpedia endpoint but i intend to use a local mirror at a later stage).

 I am looking to address this but still haven't found a good way to do
 so. I give an example here to elaborate. Take for example the
 region Rhône-Alpes in France. If i search for Rhone-Alpes in the label, i
 don't see any results. Neither in the disambiguation pages or even through
 the keyword search (Lookup) api.

 Is there a way to address this issue? I want to query such that i get
 the page Rhône-Alpes as one of the results when i search for Rhone-Alpes
 for example. This also extends to labels in different languages. My input
 does not specify the language so the input might be in different languages.
 For instance, Italia, Italy, Italie all refer to the country Italy in
 different languages.

 Thank you for any suggestions / help in advance.

 Best Regards,
 Ghufran


 --
 Open source business process management suite built on Java and Eclipse
 Turn processes into business applications with Bonita BPM Community
 Edition
 Quickly connect people, data, and systems into organized workflows
 Winner of BOSSIE, CODIE, OW2 and Gartner awards
 http://p.sf.net/sfu/Bonitasoft
 ___
 Dbpedia-discussion mailing list
 

Re: [Dbpedia-discussion] Constructing the right SPARQL query

2013-12-21 Thread Tom Morris
On Sat, Dec 21, 2013 at 2:24 PM, Ali Gajani aligaj...@gmail.com wrote:

 ... I want to make sure I can use this dataset to count indegrees (high
 influencers) properly. It is impossible to survey all the rows to ensure
 the knowledge is true, but I am asking anyway.


Presumably you mean out-degree if you're talking about influencers.  A
simple count doesn't sound like it'll capture the real influence.  Even if
you assume that Wikipedia has a comprehensive and unbiased coverage of
influential people (almost certainly not true), shouldn't influencing
someone influential count more? That would imply you need to do a page-rank
style aggregation of link weights.

Tom
--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Constructing the right SPARQL query

2013-12-21 Thread Tom Morris
On Sat, Dec 21, 2013 at 2:58 PM, Ali Gajani aligaj...@gmail.com wrote:

 Many thanks for your input Tom. An in-degree is the number of incoming
 edges towards that node. I think that captures *influencer: influencee
 (Aristotle : Alexander, Aristotle : Myself)*, which means, in this
 scenario, Aristotle (the node), has an in-degree of 2. I thought in-degree
 was a measure of influence rather than an outdegree. Remember, this is
 going to be plotted as a *directed* graph in Gephi. I'll be curious to
 know how I'll actually distinguish in-degrees and out-degrees in Gephi
 practically, but anyway.


It really depends on whether your relation runs influencer -> influencee
(an "influenced" edge) or influencee -> influencer (a "hadInfluence" edge),
i.e. which way the directed edges in the graph run.


 Moreover, I didn't quite get about how I could do a Page-Rank style style
 aggregation on this specific scenario. Could you please provide some
 examples using actual person names so I can digest it well in my head.
 Thanks for getting my head working though, but I still believe the
 Wikipedia data gives you a decent impression of influence to an extent,
 albeit not the most accurate, but it kind of appears to be right in one way
 or the other.


You can't do PageRank from just the counts.  You need the full network of
links.  As an example, if Marx had the most direct influencees, but
Aristotle influenced Marx, shouldn't that count for something?  Perhaps
more?  BTW, Freebase actually thinks Nietzsche is first by simple count,
not Marx, but the underlying data is so biased and incomplete for both
Wikipedia & Freebase, that I'm not sure it's worth pursuing a more
sophisticated weighting.
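To illustrate the "influencing someone influential counts more" point, here is a rough power-iteration sketch (plain Python with toy data only; edges are stored citation-style, influencee -> influencer, so rank flows toward influencers):

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Toy power-iteration PageRank over (source, target) edge pairs."""
    nodes = {n for edge in edges for n in edge}
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for src in nodes:
            if out[src]:
                share = damping * rank[src] / len(out[src])
                for dst in out[src]:
                    new[dst] += share
            else:
                # Dangling node: spread its rank evenly over the graph.
                for n in nodes:
                    new[n] += damping * rank[src] / len(nodes)
        rank = new
    return rank

# Citation-style edges (influencee -> influencer): A and B were influenced
# by Marx, and Marx was influenced by Aristotle. By raw counts Marx wins
# (two direct influencees vs. one), but rank flowing through Marx pushes
# Aristotle to the top.
r = pagerank([("A", "Marx"), ("B", "Marx"), ("Marx", "Aristotle")])
```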

Tom

p.s.  If you're using Gephi, it has a PageRank implementation
http://wiki.gephi.org/index.php/PageRank




 On Sat, Dec 21, 2013 at 7:49 PM, Tom Morris tfmor...@gmail.com wrote:

 On Sat, Dec 21, 2013 at 2:24 PM, Ali Gajani aligaj...@gmail.com wrote:

 ... I want to make sure I can use this dataset to count indegrees (high
 influencers) properly. It is impossible to survey all the rows to ensure
 the knowledge is true, but I am asking anyway.


 Presumably you mean out-degree if you're talking about influencers.  A
 simple count doesn't sound like it'll capture the real influence.  Even if
 you assume that Wikipedia has a comprehensive and unbiased coverage of
 influential people (almost certainly not true), shouldn't influencing
 someone influential count more? That would imply you need to do a page-rank
 style aggregation of link weights.

 Tom




 --


 Ali Gajani
 Founder at Mr. Geek
 www.mrgeek.me
 www.aligajani.com




Re: [Dbpedia-discussion] Pagelinks dataset

2013-12-03 Thread Tom Morris
On Tue, Dec 3, 2013 at 1:44 PM, Paul Houle ontolo...@gmail.com wrote:

 Something I found out recently is that the page links don't capture
 links that are generated by macros,  in particular almost all of the
 links to pages like

 http://en.wikipedia.org/wiki/Special:BookSources/978-0-936389-27-1

 don't show up because they are generated by the {cite} macro.  These
 can be easily extracted from the Wikipedia HTML of course,


That's good to know, but couldn't you get this directly from the Wikimedia
API without resorting to HTML parsing by asking for template calls to
http://en.wikipedia.org/wiki/Template:Cite ?
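For instance, a sketch of building such a request (`list=embeddedin` is the standard MediaWiki API module for listing pages that transclude a template; double-check limits and continuation handling against the live API docs before relying on it):

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def embeddedin_url(template, eicontinue=None, limit=500):
    """Build a MediaWiki API request for pages that transclude a template."""
    params = {
        "action": "query",
        "list": "embeddedin",   # pages embedding (transcluding) eititle
        "eititle": template,
        "eilimit": limit,
        "format": "json",
    }
    if eicontinue:
        # Continuation token returned by the previous response page.
        params["eicontinue"] = eicontinue
    return API + "?" + urlencode(params)

url = embeddedin_url("Template:Cite book")
```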

Tom


Re: [Dbpedia-discussion] GSoC 2013 blogpost

2013-11-29 Thread Tom Morris
That's an interesting teaser, but it'd be useful to include a link to get
more detail.  It took me a while to track down the actual GSoC progress
page:
https://github.com/dbpedia/extraction-framework/wiki/GSOC2013_Progress_Kasun

Tom


On Fri, Nov 29, 2013 at 6:57 AM, Marco Fossati hell.j@gmail.com wrote:

 Hi everyone,

 Kasun and I made an article to appear in the DBpedia blog.
 It's about the Google Summer of Code 2013 project dealing with Wikipedia
 categories.
 Below you can find the link of the final draft.


 https://docs.google.com/document/d/1HmRwUWK0Do0xme4TYLMeRllcY1_o9ajPAIs2IXfaU5k/edit?usp=sharing

 Cheers!
 --
 Marco Fossati
 http://about.me/marco.fossati
 Twitter: @hjfocs
 Skype: hell_j


 --
 Rapidly troubleshoot problems before they affect your business. Most IT
 organizations don't have a clear picture of how application performance
 affects their revenue. With AppDynamics, you get 100% visibility into your
 Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics
 Pro!
 http://pubads.g.doubleclick.net/gampad/clk?id=84349351iu=/4140/ostg.clktrk
 ___
 Dbpedia-discussion mailing list
 Dbpedia-discussion@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion



Re: [Dbpedia-discussion] ANN: DBpedia 3.9 released, including wider infobox coverage, additional type statements, and new YAGO and Wikidata links

2013-09-23 Thread Tom Morris
Congratulations on the new release!

On Mon, Sep 23, 2013 at 6:27 AM, Christian Bizer ch...@bizer.de wrote:


 1. the new release is based on updated Wikipedia dumps dating from March /
 April 2013 (the 3.8 release was based on dumps from June 2012), leading to
 an overall increase in the number of concepts in the English edition from
 3.7 to 4.0 million things.


What accounts for the long latency between the date of the dumps and the
date of the release?

Tom


Re: [Dbpedia-discussion] [Dbpedia-developers] Mapping Wikidata properties to DBpedia ones

2013-08-31 Thread Tom Morris
I'm not going to weigh in on the URI minting, but I did want to correct a
couple of misconceptions.

On Sat, Aug 31, 2013 at 3:20 AM, Dimitris Kontokostas jimk...@gmail.com wrote:


 Just like in Wikipedia when we see an Infobox_Person we assume that the
 resource is a dbo:Person


That's a bad assumption.  Not all Wikipedia articles which contain an
Infobox:Person are about a person.


 we want to do the same in Wikidata when we see a P107 Q215627
 (http://www.wikidata.org/entity/Q215627) claim


P107 is going away.  Don't use it.
http://www.wikidata.org/wiki/Property:P107

Tom


Re: [Dbpedia-discussion] Number of infoboxes

2013-08-01 Thread Tom Morris
On Thu, Aug 1, 2013 at 2:18 PM, Andy Mabbett a...@pigsonthewing.org.uk wrote:

 I wonder whether one of you good folk could kindly answer a quick
 question for me, please?

 How many articles on the English Wikipedia have infoboxes? As of what date?

 I appreciate that there will be caveats!


Your subject and body ask two different questions.  Articles can have more
than one infobox.  Are you interested in the count of infoboxes or count of
articles?

Tom


[Dbpedia-discussion] Airpedia vs DBpedia entity counts

2013-06-27 Thread Tom Morris
I've been looking at an analysis of the Airpedia entity types and I have a
question about how things are counted between DBpedia and Airpedia.

If I look at the DBpedia stats
http://wiki.dbpedia.org/Datasets/DatasetStatistics it says there are 71,715
films in EN wikipedia.  If I count the Airpedia films it has 88,997 with a
confidence breakdown of:

  67613  http://airpedia.org/ontology/type_with_conf#10
  15610  http://airpedia.org/ontology/type_with_conf#9
   2995  http://airpedia.org/ontology/type_with_conf#8
   1487  http://airpedia.org/ontology/type_with_conf#7
   1292  http://airpedia.org/ontology/type_with_conf#6

What is the mapping between between these two sets of counts?  What is the
meaning of the confidence levels?  What precision/recall should I expect
for each confidence level?

Also, what's the relationship between the new dataset and the old
dataset versions?  The old version seems to be much more granular in terms
of being able to identify what classifiers were used.  Does the new data
set integrate all the different classifiers in some way?  Is this described
anywhere?

Sorry for all the questions, but it seems like a potentially useful
resource, so I'd like to understand better how the pieces fit together.

Tom


Re: [Dbpedia-discussion] Airpedia resources for the mapping sprint

2013-06-25 Thread Tom Morris
Speaking of wrong mappings, do the algorithms used to generate the Airpedia
class mappings have any concept of classes which are (or should be)
disjoint with each other?  I was looking at the distribution of the number
of classes assigned to entities and was curious what classes were assigned
to the entities with the most classes.  Naturally the very first one I
picked to look at was rather strange looking.

$ zgrep Zosimas airpedia-classes-en.nt.gz
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Eukaryote> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/FloweringPlant> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Plant> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Saint> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Agent> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Species> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Person> .
<http://dbpedia.org/resource/Zosimas_of_Palestine> <http://airpedia.org/ontology/type_with_conf#10> <http://dbpedia.org/ontology/Cleric> .

Looking at the Wikipedia article, I'm not seeing where flowering plant is
coming from, but regardless, it should probably recognize Flowering Plant
(and its parents) as being disjoint from Person.
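A rough consistency pass over extracted types could look like this (a sketch; the disjoint pairs below are hand-picked for illustration, not taken from the DBpedia ontology's actual disjointness axioms, and the triples are simplified to (subject, class) pairs):

```python
# Hand-picked, hypothetical disjointness pairs for illustration only.
DISJOINT = [("Person", "Plant"), ("Person", "Species"), ("Agent", "Eukaryote")]

def find_conflicts(triples):
    """triples: (subject, class_local_name) pairs; returns conflicting subjects."""
    types = {}
    for subj, cls in triples:
        types.setdefault(subj, set()).add(cls)
    bad = set()
    for subj, classes in types.items():
        for a, b in DISJOINT:
            if a in classes and b in classes:
                bad.add(subj)
    return bad

# A couple of the type assertions discussed in this email:
sample = [
    ("Zosimas_of_Palestine", "Person"),
    ("Zosimas_of_Palestine", "FloweringPlant"),
    ("Zosimas_of_Palestine", "Plant"),
    ("Aristotle", "Person"),
]
conflicts = find_conflicts(sample)
```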

Has anyone compared the inferred types against the types assigned in
Freebase?  That'll be my next project if no one else has already done it.

Tom




On Fri, Jun 14, 2013 at 6:37 PM, Alessio Palmero Aprosio apro...@fbk.eu wrote:

  *

 Dear DBpedians,

 we are the team of Airpedia project [1], which aims to enhance the
 classes/properties coverage of DBpedia over Wikipedia using machine
 learning techniques.

  We read about the “mapping sprint”, therefore we want to bring to your
 attention the resource we are producing concerning DBpedia. We think that
 it can help the community to speed up the mapping process.

  Wrong mappings

 The basic idea of our approach is the use of DBpedia resource as training
 data. For this reason, we have to be sure that the mappings are correct. We
 then implement a cross-language validation to discover wrong mappings. We
 found out some obvious errors, which we think should be corrected before the
 release of DBpedia 3.9. See attachment for the list of these mappings.

  Automatic class mappings

 In a paper accepted to I-KNOW conference [2], we present a resource
 obtained by automatically mapping Wikipedia templates in 25 languages.
 Our approach can replicate the human mappings with high reliability, and
 producing an additional set of mappings not included in the original
 DBpedia. The resource can be downloaded from the resource section [3] of
 the Airpedia website and consists of CSV files with two columns: Wikipedia
 infobox name and DBpedia class.

  Automatic properties mappings

 In a second paper submitted to ISWC conference [4], we focus on the
 problem of automatically mapping infobox attributes to properties into the
 DBpedia ontology for extending the coverage of the existing localised
 versions or building from scratch versions for languages not covered in the
 current version. We report results comparable to the ones obtained by a
 human annotator in term of precision, but our approach leads to a
 significant improvement in recall and speed. Specifically, we mapped
 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different
 languages for which mappings were not available yet. Again, it can be
 downloaded from the resource section [3] of the Airpedia website and
 consists of CSV files with two columns: Wikipedia infobox attribute name
 and DBpedia property.

  Enhanced coverage of DBpedia over classes in 31 languages

 Following the work already presented at ESWC conference [5], we enhance
 the coverage of DBpedia over pages devoid of infobox. The resource
 contains 10M computed entity types. It is available in RDF format and can
 be downloaded in the resource section [3] of our website.

  Integration in Italian DBpedia

 The Italian DBpedia team has been the first adopter of our dataset. Next
 week a new version of the SPARQL endpoint containing our statements will be
 released. Stay tuned!

  Cheers!

 Alessio


  [1] http://www.airpedia.org

 [2] http://i-know.tugraz.at/

 [3] http://www.airpedia.org/download/

 [4] http://iswc2013.semanticweb.org/

 [5] http://2013.eswc-conferences.org/

 *


 --
 This SF.net email is sponsored by Windows:

 Build for 

Re: [Dbpedia-discussion] VIAF extraction?

2013-06-19 Thread Tom Morris
On Thu, Jun 13, 2013 at 7:52 PM, Young,Jeff (OR) jyo...@oclc.org wrote:

  Stephen,

 While you're waiting, you could get the owl:sameAs assertions in the
 reverse direction from the VIAF data dumps. An N-TRIPLE form was added to
 the processing about a week ago, which should make it easier to consume:

 http://viaf.org/viaf/data/viaf-20130514-clusters-rdf.nt.gz


 The problem I see with working from the VIAF end is that when these links
were added to Wikpedia, the Wikipedians went through and corrected (or at
least flagged) a lot of errors, so working from the Wikipedia version will
have the benefit of those corrections.

http://en.wikipedia.org/wiki/Wikipedia:VIAF/errors

Tom


[Dbpedia-discussion] Airpedia (was Re: Slovak DBPedia mappings)

2013-06-04 Thread Tom Morris
I too would be interested in more info on Airpedia.  What forum/list is
used to discuss it?


On Tue, Jun 4, 2013 at 6:40 AM, Jona Christopher Sahnwaldt
j...@sahnwaldt.de wrote:

 Hi Alessio,

 I think airpedia looks really interesting. Could you tell as a bit
 more about the precision of these mappings?


There's a precision/recall graph here:
http://www.airpedia.org/about/

Tom


 Maybe we can find a good
 way to add them to the wiki.

 Cheers,
 Christopher

 On 17 May 2013 16:58, Alessio Palmero Aprosio apro...@fbk.eu wrote:
  Dear Alberto,
  if you are interested, you can find the mappings for sk (both classes
  and properties) that we automatically generated at this page:
  http://www.airpedia.org/download/
  The approach used is described on two papers currently under review. If
  you are interested in more details, feel free to ask.
 
  Best regards,
  Alessio
 



Re: [Dbpedia-discussion] Incorrect information in one of the dbpedia entity

2013-04-29 Thread Tom Morris
On Mon, Apr 29, 2013 at 8:45 AM, Andrea Di Menna ninn...@gmail.com wrote:

 Since DBpedia is an effort to extract structured data from Wikipedia, then
 whether something is in Wikipedia or not is not a matter of interpretation.
 It is a simple fact.

 Also as per my understanding, DBpedia is not trying to filter Wikipedia
 data by crosschecking facts and statements.
 The Persondata template is a source of valuable information and hence it
 is used to extract structured data from Wikipedia biographies.
 Still it is possible it contains wrong data, this is the drawback of a
 community curated encyclopedia.


Let me try restating it in a different way.  Information which is invisible
(because persondata template is not rendered) is much less likely to be
correct.

Tom




 Cheers
 Andrea


 2013/4/26 Tom Morris tfmor...@gmail.com

 On Fri, Apr 26, 2013 at 2:14 PM, Jona Christopher Sahnwaldt 
 j...@sahnwaldt.de wrote:

 The information is in Wikipedia, and thus also in DBpedia.

 The source code of http://en.wikipedia.org/wiki/Questlove contains the
 following section (visible when you click 'edit page' and scroll
 down):

 {{Persondata
 |NAME=Thompson, Ahmir Khalib
 |ALTERNATIVE NAMES=?uestlove
 |SHORT DESCRIPTION=[[African-American]] musician
 |DATE OF BIRTH=January 20, 1971
 |PLACE OF BIRTH=[[Philadelphia, Pennsylvania]], [[United States]]
 |DATE OF DEATH=November 24, 2008
 |PLACE OF DEATH==[[Philadelphia, Pennsylvania]], [[United States]]
 }}

 That's why DBpedia extracted a death date. If you think this is an
 error, you should edit the Wikipedia page.


 I think whether or not it's in Wikipedia is a matter of interpretation.
  If the information isn't being rendered on the page, as is the case
 currently, how is anyone going to know that it's wrong (or even there)?
  I'm not sure why it isn't being rendered.

 Tom


 --
 Try New Relic Now  We'll Send You this Cool Shirt
 New Relic is the only SaaS-based application performance monitoring
 service
 that delivers powerful full stack analytics. Optimize and monitor your
 browser, app,  servers with just a few lines of code. Try New Relic
 and get this awesome Nerd Life shirt!
 http://p.sf.net/sfu/newrelic_d2d_apr
 ___
 Dbpedia-discussion mailing list
 Dbpedia-discussion@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion





Re: [Dbpedia-discussion] Missing file at http://noc.wikimedia.org/conf/langlist - can't run generate-settings

2013-04-11 Thread Tom Morris
On Thu, Apr 11, 2013 at 8:40 AM, Jona Christopher Sahnwaldt j...@sahnwaldt.de
 wrote:

 Thanks for the heads up and the mail to noc. Please let us know what they
 say. We'll have to find a new language list file.

Are you after languages or wikis?  You might be able to use one of the
other files in that directory such as

http://noc.wikimedia.org/conf/wikipedia.dblist

Tom







Re: [Dbpedia-discussion] Lua scripting on Wikipedia

2013-04-05 Thread Tom Morris
On Fri, Apr 5, 2013 at 9:40 AM, Jona Christopher Sahnwaldt
j...@sahnwaldt.de wrote:


 thanks for the heads-up!

 On 5 April 2013 10:44, Julien Plu julien@redaction-developpez.com
 wrote:
  Hi,
 
  I saw few days ago that MediaWiki since one month allow to create
 infoboxes
  (or part of them) with Lua scripting language.
  http://www.mediawiki.org/wiki/Lua_scripting
 
  So my question is, if every data in the wikipedia infoboxes are in Lua
  scripts, DBPedia will still be able to retrieve all the data as usual ?

 I'm not 100% sure, and we should look into this, but I think that Lua
 is only used in template definitions, not in template calls or other
 places in content pages. DBpedia does not parse template definitions,
 only content pages. The content pages probably will only change in
 minor ways, if at all. For example, {{Foo}} might change to
 {{#invoke:Foo}}. But that's just my preliminary understanding after
 looking through a few tutorial pages.


As far as I can see, the template calls are unchanged for all the templates,
which makes sense when you consider that some of the templates that they've
upgraded to use Lua, like Template:Coord
(https://en.wikipedia.org/wiki/Template:Coord), are
used on almost a million pages.

Here are the ones which have been updated so far:
https://en.wikipedia.org/wiki/Category:Lua-based_templates
Performance improvement looks impressive:
https://en.wikipedia.org/wiki/User:Dragons_flight/Lua_performance

Tom


Re: [Dbpedia-discussion] Fwd: [extraction-framework] Update the CreateFreebaseLinks based on the new Freebase RDF dump format (#25)

2013-03-25 Thread Tom Morris
I wouldn't claim that Freebase is bug-free, but that's a quite old and
simple algorithm, so unless they're triples from very early in its life
(say, 2007), I'd guess that bad input data from Wikipedia is more likely
than a problem with the transformation.

It might help to give a little background on how Freebase deals with these
links.  The canonical link uses the article number (in the namespace
/wikipedia/en_id), but the alpha title (MQL key escaped) *and all
redirects* are also stored (namespace /wikipedia/en).  Additionally, the
same information has recently been added for number of the other language
wikipedias.

You can see them all here for the example that Andrea mentioned:

  https://www.freebase.com/m/09q3rp?keys

Outbound links from Freebase to Wikipedia are made using the article
number, so that's really the most important link.  The wisdom of including
redirects is debatable, I think.  Sometimes they're good alternate names,
but other times they represent misspellings, related concepts, etc.

If DBpedia has the Wikipedia article number, I'd suggest creating the links
based on those.  If not, I'd suggest using the redirect file to
canonicalize on a single best link.
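Canonicalizing through the redirects file takes only a few lines (a sketch; the redirect map here is a hypothetical title -> target dict, as would be parsed from the DBpedia redirects dump):

```python
def canonicalize(title, redirects, max_hops=10):
    """Follow redirect chains to a single best title; stop on cycles."""
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title

# Hypothetical redirect map: misspelling -> redirect -> article.
redirects = {"K._Marx": "Marx", "Marx": "Karl_Marx"}
canonical = canonicalize("K._Marx", redirects)
```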

Tom


On Mon, Mar 25, 2013 at 6:41 AM, Andrea Di Menna ninn...@gmail.com wrote:

 Hi all,

 it looks like there are actually some pages in Wikipedia which contain
 wrong data, which is where the pages originate from in Freebase, e.g.

 http://en.wikipedia.org/wiki/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila

 This page has been deleted on Jan 21, and this actually led to the
 Freebase key

 Marl$00C3$00ADn$002C_$00C3$0081vila

 since UTF-8 0xC3 0x83 -> Unicode U+00C3, etc.

 Cheers
 Andrea


 2013/3/25 Andrea Di Menna ninn...@gmail.com

 Hi,

 Maybe the only thing that can be done is to notify the freebase
 discussion list about this problem.
 Agree with Jona that the number of problematic references is not relevant.

 Cheers
 Andrea


 2013/3/25 Jona Christopher Sahnwaldt j...@sahnwaldt.de


 On Mar 25, 2013 3:32 AM, Tom Morris tfmor...@gmail.com wrote:
 
  Can someone point to the part of the discussion which talks about what
 the problem is?  This thread seems to start in mid-stream...

 That's right. Sorry. The start of the thread is in the middle of this
 page:

 https://github.com/dbpedia/extraction-framework/pull/25

 
  Freebase's MQL key encoding (
 http://wiki.freebase.com/wiki/MQL_key_escaping) is a completely private
 encoding which shouldn't have any effect on external
 URIs/IRIs/references/etc

 That's correct, and that's how the Scala script has always worked: it
 unescapes the MQL keys and uses the result to form DBpedia IRIs. The
 problems arise because some MQL keys contain invalid escapes (UTF-8 and
 Windows-1252 bytes instead of Unicode code points), and some others contain
 whitespace like U+2003 that is invalid even in IRIs.

 I would guess though that it's not a big problem because the affected
 keys are 1. not many, i.e. <1% and 2. not relevant anyway because they do
 not represent valid, current, non-redirect Wikipedia page titles. That's
 just a guess though, based on only a very cursory look at a few bad keys.

 I don't remember if these problems also came up when I ran the script on
 the old freebase dump format.

 JC

 
  On Sun, Mar 24, 2013 at 9:44 PM, Jona Christopher Sahnwaldt 
 j...@sahnwaldt.de wrote:
 
  On 22 March 2013 23:21, Andrea Di Menna ninn...@gmail.com wrote:
  
   Hi Jona,
  
   thanks for merging the pull request!
  
   Anyway, couldn't we use percent encoding for Unicode code points
 which are
   not allowed in N-Triples? (namely those outside the [#x20,#7E]
 range?
   In this case we should get UTF-8 bytes and percent encode them.
  
   For example, as far as I can see
  
   Marl$00C3$00ADn$002C_$00C3$0081vila
  
   is
  
   http://dbpedia.org/resource/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
  
   where \00C3 is 0xC3 0x83
\00AD is 0xC2 0xAD
\0081 is 0xC2 0x81
 
  Oh, by the way, it would be
  http://dbpedia.org/resource/Marl%C3%ADn,_%C3%81vila because that's
 the
  UTF-8-percent-encoding for Marlín,_Ávila.
 
  The weird thing is that these Wikipedia page titles in the Freebase
  contain UTF-8-encoded characters when they should contain no encoding
  at all, just plain Unicode code points. (Of course, the characters and
  codepoints are also dollar-escaped as usual for Freebase, but that's
  not a problem.)
 
 
  JC
 
  
   WDYT?
  
   Cheers
   Andrea
  
   2013/3/22 Christopher Sahnwaldt notificati...@github.com
  
   Ok, I got it. It has nothing to do with your platform. These are
 actually
   wrong URIs. There's not much we can do about it. I don't know
 where Freebase
   got them from, but I assume they may actually be wrong in
 Wikipedia.
  
   Examples:
  
   Marl$00C3$00ADn$002C_$00C3$0081vila
   AD 2C and C3 81 are UTF-8 encodings, but Freebase says [1] that the
   numbers should be plain Unicode code points, not UTF-8 bytes. 81
 is an
   invalid code point

Re: [Dbpedia-discussion] Fwd: [extraction-framework] Update the CreateFreebaseLinks based on the new Freebase RDF dump format (#25)

2013-03-25 Thread Tom Morris
Another approach might be to use the recently introduced Topic Equivalent
Webpage property:

ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://pt.wikipedia.org/wiki/Marlín>.
ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://es.wikipedia.org/wiki/Marlín_(Ávila)>.
ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://en.wikipedia.org/wiki/Marlín>.
ns:m.09q3rp ns:common.topic.topic_equivalent_webpage <http://it.wikipedia.org/wiki/Marlín>.

It appears to be a single canonical alpha link for each language Wikipedia
with the MQL escaping undone and the redirects resolved.
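For what it's worth, undoing the $XXXX escaping is mechanical, and the double-encoded keys discussed upthread can often be repaired with a latin-1 round trip (a sketch; the repair only applies when every code point in the unescaped key fits in one byte, which is exactly the mojibake case):

```python
import re

def mql_unescape(key):
    """Undo Freebase MQL $XXXX escapes: each is a 4-digit hex code point."""
    return re.sub(r"\$([0-9A-Fa-f]{4})",
                  lambda m: chr(int(m.group(1), 16)), key)

def fix_double_encoding(s):
    """Repair keys whose code points are really UTF-8 bytes (mojibake)."""
    try:
        return s.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return s  # not double-encoded after all; leave it alone

# The broken key from this thread decodes to the intended title.
raw = "Marl$00C3$00ADn$002C_$00C3$0081vila"
title = fix_double_encoding(mql_unescape(raw))
```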

Tom

On Mon, Mar 25, 2013 at 9:18 AM, Tom Morris tfmor...@gmail.com wrote:

 I wouldn't claim that Freebase is bug-free, but that's a quite old and
 simple algorithm, so unless they're triples from very early in its life
 (say, 2007), I'd guess that bad input data from Wikipedia is more likely
 than a problem with the transformation.

 It might help to give a little background on how Freebase deals with these
 links.  The canonical link uses the article number (in the namespace
 /wikipedia/en_id), but the alpha title (MQL key escaped) *and all
 redirects* are also stored (namespace /wikipedia/en).  Additionally, the
 same information has recently been added for number of the other language
 wikipedias.

 You can see them all here for the example that Andrea mentioned:

   https://www.freebase.com/m/09q3rp?keys

 Outbound links from Freebase to Wikipedia are made using the article
 number, so that's really the most important link.  The wisdom of including
 redirects is debatable, I think.  Sometimes they're good alternate names,
 but other times they represent misspellings, related concepts, etc.

 If DBpedia has the Wikipedia article number, I'd suggest creating the
 links based on those.  If not, I'd suggest using the redirect file to
 canonicalize on a single best link.

 Tom



 On Mon, Mar 25, 2013 at 6:41 AM, Andrea Di Menna ninn...@gmail.com wrote:

 Hi all,

 it looks like there are actually some pages in Wikipedia which contain
 wrong data, which is where the pages originate from in Freebase, e.g.

 http://en.wikipedia.org/wiki/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila

 This page has been deleted on Jan 21, and this actually led to the
 Freebase key

 Marl$00C3$00ADn$002C_$00C3$0081vila

 since UTF-8 0xC3 0x83 -> Unicode U+00C3, etc.

 Cheers
 Andrea


 2013/3/25 Andrea Di Menna ninn...@gmail.com

 Hi,

 Maybe the only thing that can be done is to notify the freebase
 discussion list about this problem.
 Agree with Jona that the number of problematic references is not
 relevant.

 Cheers
 Andrea


 2013/3/25 Jona Christopher Sahnwaldt j...@sahnwaldt.de


 On Mar 25, 2013 3:32 AM, Tom Morris tfmor...@gmail.com wrote:
 
  Can someone point to the part of the discussion which talks about
 what the problem is?  This thread seems to start in mid-stream...

 That's right. Sorry. The start of the thread is in the middle of this
 page:

 https://github.com/dbpedia/extraction-framework/pull/25

 
  Freebase's MQL key encoding (
 http://wiki.freebase.com/wiki/MQL_key_escaping) is a completely
 private encoding which shouldn't have any effect on external
 URIs/IRIs/references/etc

 That's correct, and that's how the Scala script has always worked: it
 unescapes the MQL keys and uses the result to form DBpedia IRIs. The
 problems arise because some MQL keys contain invalid escapes (UTF-8 and
 Windows-1252 bytes instead of Unicode code points), and some others contain
 whitespace like U+2003 that is invalid even in IRIs.

 I would guess though that it's not a big problem because the affected
 keys are 1. not many, i.e. < 1%, and 2. not relevant anyway because they do
 not represent valid, current, non-redirect Wikipedia page titles. That's
 just a guess though, based on only a very cursory look at a few bad keys.

 I don't remember if these problems also came up when I ran the script
 on the old freebase dump format.

 JC

 
  On Sun, Mar 24, 2013 at 9:44 PM, Jona Christopher Sahnwaldt 
 j...@sahnwaldt.de wrote:
 
  On 22 March 2013 23:21, Andrea Di Menna ninn...@gmail.com wrote:
  
   Hi Jona,
  
   thanks for merging the pull request!
  
   Anyway, couldn't we use percent encoding for Unicode code points
 which are
   not allowed in N-Triples? (namely those outside the [#x20,#7E]
 range?
   In this case we should get UTF-8 bytes and percent encode them.
  
   For example, as far as I can see
  
   Marl$00C3$00ADn$002C_$00C3$0081vila
  
   is
  
   http://dbpedia.org/resource/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
  
   where \00C3 is 0xC3 0x83
\00AD is 0xC2 0xAD
\0081 is 0xC2 0x81
 
  Oh, by the way, it would be
  http://dbpedia.org/resource/Marl%C3%ADn,_%C3%81vila because that's
 the
  UTF-8-percent-encoding for Marlín,_Ávila.
 
  The weird thing is that these Wikipedia page titles in the Freebase
  contain UTF-8-encoded characters when they should contain

Re: [Dbpedia-discussion] Fwd: [extraction-framework] Update the CreateFreebaseLinks based on the new Freebase RDF dump format (#25)

2013-03-24 Thread Tom Morris
Can someone point to the part of the discussion which talks about what the
problem is?  This thread seems to start in mid-stream...

Freebase's MQL key encoding (http://wiki.freebase.com/wiki/MQL_key_escaping)
is a completely private encoding which shouldn't have any effect on
external URIs/IRIs/references/etc

On Sun, Mar 24, 2013 at 9:44 PM, Jona Christopher Sahnwaldt j...@sahnwaldt.de
 wrote:

 On 22 March 2013 23:21, Andrea Di Menna ninn...@gmail.com wrote:
 
  Hi Jona,
 
  thanks for merging the pull request!
 
  Anyway, couldn't we use percent encoding for Unicode code points which
 are
  not allowed in N-Triples? (namely those outside the [#x20,#7E] range?
  In this case we should get UTF-8 bytes and percent encode them.
 
  For example, as far as I can see
 
  Marl$00C3$00ADn$002C_$00C3$0081vila
 
  is
 
  http://dbpedia.org/resource/Marl%C3%83%C2%ADn,_%C3%83%C2%81vila
 
  where \00C3 is 0xC3 0x83
   \00AD is 0xC2 0xAD
   \0081 is 0xC2 0x81

 Oh, by the way, it would be
 http://dbpedia.org/resource/Marl%C3%ADn,_%C3%81vila because that's the
 UTF-8-percent-encoding for Marlín,_Ávila.

 The weird thing is that these Wikipedia page titles in the Freebase
 contain UTF-8-encoded characters when they should contain no encoding
 at all, just plain Unicode code points. (Of course, the characters and
 codepoints are also dollar-escaped as usual for Freebase, but that's
 not a problem.)
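The percent-encoding mentioned above can be checked with Python's `urllib.parse.quote`, which percent-encodes the UTF-8 bytes of non-ASCII characters by default; keeping `,` in the safe set is my assumption about DBpedia's IRI style, not taken from the extraction framework:

```python
from urllib.parse import quote

# quote() leaves ASCII letters, digits and "_.-~" alone and percent-encodes
# the UTF-8 bytes of everything else; "," is added to the safe set here.
print(quote('Marlín,_Ávila', safe=','))  # Marl%C3%ADn,_%C3%81vila
```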


 JC

 
  WDYT?
 
  Cheers
  Andrea
 
  2013/3/22 Christopher Sahnwaldt notificati...@github.com
 
  Ok, I got it. It has nothing to do with your platform. These are
 actually
  wrong URIs. There's not much we can do about it. I don't know where
 Freebase
  got them from, but I assume they may actually be wrong in Wikipedia.
 
  Examples:
 
  Marl$00C3$00ADn$002C_$00C3$0081vila
  AD 2C and C3 81 are UTF-8 encodings, but Freebase says [1] that the
  numbers should be plain Unicode code points, not UTF-8 bytes. 81 is an
  invalid code point, so we generate an invalid URI.
 
  Bene$009A_decrees
  9A is the Windows-1252 encoding for š, but 9A is invalid in Unicode.
 
  Switzerland$2003
  2003, 2029 etc. are valid Unicode code points, but for whitespace
  characters that are invalid in URIs
 
  In a nutshell: all these characters are invalid in URIs, and it's not
 our
  fault. I'll pull your changes in a moment.
 
  [1] http://wiki.freebase.com/wiki/MQL_key_escaping
 
  —
  Reply to this email directly or view it on GitHub.
 
 


--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Backslash encoding for URIs

2013-03-20 Thread Tom Morris
I suspect multistream bzip2 is the culprit (which is a sensible
correlation with parallel bzip).

For what it's worth Python 2.x can't read these files either.  There's
a backport of the 3.x support, but it requires installing a separate
package.
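To see the multistream behaviour concretely: concatenating two independently compressed streams mimics what pbzip2 produces, and Python 3's `bz2` module reads all of them, where Python 2's stopped after the first stream:

```python
import bz2

# Two independently compressed streams concatenated back to back, which is
# essentially what pbzip2 produces for the DBpedia dump files.
data = bz2.compress(b"first stream\n") + bz2.compress(b"second stream\n")

# Python 3's bz2.decompress transparently concatenates all streams;
# a single-stream decoder would return only b"first stream\n".
print(bz2.decompress(data))  # b'first stream\nsecond stream\n'
```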

Tom

On Wed, Mar 20, 2013 at 9:48 PM, Jona Christopher Sahnwaldt
j...@sahnwaldt.de wrote:
 On 20 March 2013 20:10, Andrea Di Menna ninn...@gmail.com wrote:
 Hi Jona,

 I have tried loading labels_en_uris_de.nt.bz2 from the DBpedia 3.8 release
 using both Jena 2.7.4 and 2.10.0, but both fail with the following error:

 andread@build04:~/tools/apache-jena-2.10.0/bin$ ./tdbloader2 --loc .
 /media/HD2/data/dbpedia-3.8-archive/source_data/labels_en_uris_de.nt.bz2
  19:48:02 -- TDB Bulk Loader Start
  19:48:02 Data phase
 INFO  Load:
 /media/HD2/data/dbpedia-3.8-archive/source_data/labels_en_uris_de.nt.bz2 --
 2013/03/20 19:48:03 CET
 Exception in thread main org.apache.jena.atlas.AtlasException:
 java.nio.charset.MalformedInputException: Input length = 1
 at org.apache.jena.atlas.io.IO.exception(IO.java:154)
 at
 org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:79)
 at
 org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:156)
 at
 org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:139)
 at
 org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:251)
 at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:244)
 at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:169)
 at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:108)
 at
 org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
 at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:130)
 at org.apache.jena.riot.RiotReader.parse(RiotReader.java:115)
 at org.apache.jena.riot.RiotReader.parse(RiotReader.java:93)
 at org.apache.jena.riot.RiotReader.parse(RiotReader.java:66)
 at
 com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.exec(CmdNodeTableBuilder.java:162)
 at arq.cmdline.CmdMain.mainMethod(CmdMain.java:101)
 at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
 at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
 at
 com.hp.hpl.jena.tdb.store.bulkloader2.CmdNodeTableBuilder.main(CmdNodeTableBuilder.java:80)
 Caused by: java.nio.charset.MalformedInputException: Input length = 1
 at java.nio.charset.CoderResult.throwException(CoderResult.java:277)
 at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:338)
 at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
 at java.io.InputStreamReader.read(InputStreamReader.java:184)
 at java.io.Reader.read(Reader.java:140)
 ... 17 more

 Anyway, I have now tried the following:

 1) Download german labels
 2) Run tdbloader2 on the bz2 nt file - failure
 3) Uncompress the bz2 file and run tdbloader2 - SUCCESS
 4) Compress the nt file again - failure

 Looks like Jena is having some problems with bz2 files then.

 Interesting.

 Since 3.8, we use parallel bzip2 [1] to compress the files (it's much
 faster on multi-core machines). The files created by pbzip2 have a
 slightly different format though. Legal for bzip2, but for example
 older versions of Commons Compress cannot deal with it [2][3].

 2) Run tdbloader2 on the bz2 nt file - failure
 3) Uncompress the bz2 file and run tdbloader2 - SUCCESS

 This very much looks like compression is the culprit, not DBpedia encoding.

 4) Compress the nt file again - failure

 This is a bit weird. How do you compress the file?

 Cheers,
 JC

 [1] http://compression.ca/pbzip2/
 [2] https://issues.apache.org/jira/browse/COMPRESS-146
 [3] https://issues.apache.org/jira/browse/COMPRESS-162

 Would you mind giving it a try?

 But anyway please check this JIRA issue out
 https://issues.apache.org/jira/browse/STANBOL-804

 Cheers
 Andrea


 2013/3/20 Jona Christopher Sahnwaldt j...@sahnwaldt.de

 Hi Andrea,

 there used to be encoding problems, but I think they are all fixed
 since the 3.8 release. I tried very hard to make TurtleEscaper do the
 right thing - I checked the relevant standards etc. Could you give an
 example where Jena complains about a DBpedia 3.8 file?

 Cheers,
 JC

 On Wed, Mar 20, 2013 at 6:16 PM, Andrea Di Menna ninn...@gmail.com
 wrote:
  Hi,
 
  I have been using Stanbol [1] to process DBpedia data files and build a
  dbpedia Solr index.
  Stanbol is using Jena TDB in order to load DBpedia files into a triple
  store.
  Unfortunately, almost all the DBpedia N-Triples files must be
  pre-processed
  before being able to import them using Jena [2].
 
  The following sed command is launched:
 
  sed 's//\\u005c\\u005c/g;s/\\\([^u]\)/\\u005c\1/g'
 
  Basically the backslash is replaced with the unicode character escape
  sequence.
 
  Do you think this should/could be fixed in
  org.dbpedia.extraction.util.TurtleEscaper#escapeTurtle ?
 
  Cheers
  

Re: [Dbpedia-discussion] framework lexicalization extraction

2013-03-15 Thread Tom Morris
On Thu, Mar 14, 2013 at 4:03 PM, Aiden Bell aiden...@gmail.com wrote:

 Hi all

 I'm wondering if there is a way to produce spotlight type wikipage link
 surface form RDF using the dbpedia framework rather than Spotlight? I'm
 implementing a disambiguation algorithm with a large amount of existing
 triples and would like to add wikipedia surface forms to this existing
 triplestore.

 Any pointers?


You might also be interested in the UMass/Google Research Wikilinks corpus


http://googleresearch.blogspot.com/2013/03/learning-from-big-data-40-million.html
  http://www.iesl.cs.umass.edu/data/wiki-links

As well as this earlier Google Research corpus:


http://googleresearch.blogspot.com/2012/05/from-words-to-concepts-and-back.html

Both of them have surface forms linked to Wikipedia URLs

Tom
--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_mar
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Wrong type in RDF SPARQL view

2013-02-09 Thread Tom Morris
On Sat, Feb 9, 2013 at 5:22 AM, Mohamed Morsey 
mor...@informatik.uni-leipzig.de wrote:


  the one which equals "138"^^<http://www.w3.org/2001/XMLSchema#int> is
 dbpprop:municipalityCode


That seems very wrong.  Where is this mapping/extraction defined?  And why
are there two different mappings?


  dbpedia-owl:municipalityCode "0138.046"@en ;
 
 the value of dbpedia-owl:municipalityCode is "0138.046"@en, which is the
 same as the one already in the source article [1].


That seems not quite as wrong, but still wrong since the identifier string
doesn't really have an associated language.

Tom

[1] http://en.wikipedia.org/w/index.php?title=Samstagernaction=edit


--
Free Next-Gen Firewall Hardware Offer
Buy your Sophos next-gen firewall before the end March 2013 
and get the hardware for free! Learn more.
http://p.sf.net/sfu/sophos-d2d-feb
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] [ANN] Add your links to DBpedia workflow version 0.1 (this is also an RFC)

2013-01-21 Thread Tom Morris
On Wed, Jan 16, 2013 at 9:06 AM, Sebastian Hellmann
hellm...@informatik.uni-leipzig.de wrote:

 we thought that it might be a nice idea to simplify the workflow for
 creating outgoing links from DBpedia to your data sets. This is why we
 created the following GitHub repository:
 https://github.com/dbpedia/dbpedia-links
...
 Here is a (non-exhaustive) list of open issue:
...
 - yago, freebase and flickrwrappr have been excluded due to their size
   ( > 0.5GB )

Presumably this means excluded from the repository, not excluded from
DBpedia, correct?

Tom

--
Master Visual Studio, SharePoint, SQL, ASP.NET, C# 2012, HTML5, CSS,
MVC, Windows 8 Apps, JavaScript and much more. Keep your skills current
with LearnDevNow - 3,200 step-by-step video tutorials by Microsoft
MVPs and experts. SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122412
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] literatur extraction

2013-01-04 Thread Tom Morris
On Fri, Jan 4, 2013 at 6:54 AM, Robert Glaß rgl...@avantgarde-labs.de wrote:
 Dear dbpedia team,

 I have a question concerning the extraction of literatur information
 from wikipedia pages.
 How can I get an integrated view of the information concerning the data
 inside the literature template.

I don't know the answer, but there are more problems than just the
fact that the property values aren't grouped together.  The ISBNs are
being corrupted because they're extracted as integers instead of
strings and the author property is all author names together rather
than individual values for each author.

Tom

--
Master HTML5, CSS3, ASP.NET, MVC, AJAX, Knockout.js, Web API and
much more. Get web development skills now with LearnDevNow -
350+ hours of step-by-step video tutorials by Microsoft MVPs and experts.
SALE $99.99 this month only -- learn more at:
http://p.sf.net/sfu/learnmore_122812
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Fwd: live dbpedia and new/updated articles

2012-12-24 Thread Tom Morris
On Mon, Dec 24, 2012 at 5:20 AM, Mohamed Morsey
mor...@informatik.uni-leipzig.de wrote:

 You can use the Scala script [3], to generate the owl:sameAs links to
 Freebase.

A word of warning about that script: it works off the old quad dump
format which is deprecated and may have been generated for the last
time on 9 Nov 2012.  There is a weekly RDF dump now containing the
equivalent information and it would only take a small tweak to the
Scala program to be able to use it as input instead.

http://wiki.freebase.com/wiki/Data_dumps#RDF_dump

Tom


 [3]
 http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/CreateFreebaseLinks.scala


--
LogMeIn Rescue: Anywhere, Anytime Remote support for IT. Free Trial
Remotely access PCs and mobile devices and provide instant support
Improve your efficiency, and focus on delivering more value-add services
Discover what IT Professionals Know. Rescue delivers
http://p.sf.net/sfu/logmein_12329d2d
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] local dbpedia dump load to virtuoso server

2012-11-28 Thread Tom Morris
On Wed, Nov 28, 2012 at 10:42 AM, Jona Christopher Sahnwaldt 
j...@sahnwaldt.de wrote:

 On Wed, Nov 28, 2012 at 4:41 PM, Jona Christopher Sahnwaldt
 j...@sahnwaldt.de wrote:
  On Wed, Nov 28, 2012 at 2:29 AM, Hugh Williams hwilli...@openlinksw.com
 wrote:
  Hi Christopher,
 
  I am not aware of a linkset to Freebase in the DBpedia project.
 
  http://wiki.dbpedia.org/Downloads38#links-to-freebase
  http://downloads.dbpedia.org/3.8/links/freebase_links.nt.bz2

 Preview:


 http://downloads.dbpedia.org/preview.php?file=3.8_sl_links_sl_freebase_links.nt.bz2


Note also that if you need a fresher copy (for example to use with DBpedia
Live), it's trivial to generate from the weekly Freebase dump.
You can either use the Scala program:

http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/dump/scripts/src/main/scala/org/dbpedia/extraction/scripts/CreateFreebaseLinks.scala

or a very light postprocessing of the output of this command:

 bzgrep $'/type/object/key\t/wikipedia/en\t'
freebase-datadump-quadruples.tsv.bz2 | cut -f 4,1
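A sketch of the "light postprocessing" step, assuming the quad layout selected by the bzgrep above (mid, `/type/object/key`, `/wikipedia/en`, MQL-escaped title); the exact DBpedia and Freebase IRI shapes below are my assumptions, not the script's verified output:

```python
import re

def quad_to_sameas(line):
    """Turn one grep-selected quad line into an owl:sameAs N-Triple.

    Sketch only: field layout is mid, /type/object/key, /wikipedia/en,
    MQL-escaped title, and the Freebase RDF namespace form is assumed.
    """
    mid, _, _, title = line.rstrip("\n").split("\t")
    # Undo MQL $XXXX escapes: each names a Unicode code point in hex.
    title = re.sub(r"\$([0-9A-Fa-f]{4})",
                   lambda m: chr(int(m.group(1), 16)), title)
    # e.g. /m/010pld -> http://rdf.freebase.com/ns/m.010pld (assumed form)
    freebase = "http://rdf.freebase.com/ns/" + mid[1:].replace("/", ".")
    return ("<http://dbpedia.org/resource/%s> "
            "<http://www.w3.org/2002/07/owl#sameAs> <%s> ."
            % (title, freebase))

print(quad_to_sameas(
    "/m/010pld\t/type/object/key\t/wikipedia/en\tYorktown_$0028Virginia$0029"))
```

Note that `cut -f 4,1` in the shell pipeline emits the fields in file order (1 then 4), so a postprocessor like this one reads them accordingly.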

Tom
--
Keep yourself connected to Go Parallel: 
INSIGHTS What's next for parallel hardware, programming and related areas?
Interviews and blogs by thought leaders keep you ahead of the curve.
http://goparallel.sourceforge.net
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Decoding the Freebase Quad Dump

2012-11-16 Thread Tom Morris
I'm confused by the editorial comments at the end:


On Fri, Nov 16, 2012 at 12:52 PM, p...@ontology2.com wrote:


  I’d like to advise Freebase to resume publication of the quad dump until
 it can demonstrate the correctness of any alternative data export. In fact,
 with infovore available under an Apache License and all of my claims
 independently verifiable, the freebase quad dump could remain in use
 indefinitely.

  Freebase users should demand a correct export.


Google, as far as I'm aware, is still publishing a quad dump and I've never
heard any reports about it not being correct (whatever that means in this
context).

Can you expand on what you're trying to say?

Tom
--
Monitor your physical, virtual and cloud infrastructure from a single
web console. Get in-depth insight into apps, servers, databases, vmware,
SAP, cloud infrastructure, etc. Download 30-day Free Trial.
Pricing starts from $795 for 25 servers or applications!
http://p.sf.net/sfu/zoho_dev2dev_nov
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] sameAs links to Freebase

2012-10-17 Thread Tom Morris
The Freebase dump includes direct links for topics which have interwiki
links:

$ bzgrep "/wikipedia/it" freebase-datadump-quadruples.tsv.bz2 | head
/m/010pld   /type/object/key/wikipedia/it
Yorktown_$0028Virginia$0029
/m/010pld   /type/object/key/wikipedia/it_id1071928
/m/010pld   /type/object/key/wikipedia/it_title
Yorktown_$0028Virginia$0029
/m/010vmb   /type/object/key/wikipedia/it   Walla_Walla
/m/010vmb   /type/object/key/wikipedia/it_id2353022
/m/010vmb   /type/object/key/wikipedia/it_title Walla_Walla
/m/011f6/type/object/key/wikipedia/it   Aerodinamica
/m/011f6/type/object/key/wikipedia/it_id47589
/m/011f6/type/object/key/wikipedia/it_title Aerodinamica
/m/011m0q   /type/object/key/wikipedia/it_id160

On Wed, Oct 17, 2012 at 9:30 AM, Marco Fossati hell.j@gmail.com wrote:


 However, I am wondering how to generate Freebase links for those
 lanugage-specific resources that do not have their counterpart in the
 English chapter. See for instance the Italian Wikipedia article for
 Maurizio Crozza [1]. There is no English counterpart, hence no data in
 the English DBpedia, but only in the Italian one [3]. In Freebase, there
 is indeed some [4].
 So, the core of my question is: how to generate Freebase links from
 scratch?


Practically the only way to make the link for Maurizio Crozza is manually,
because there's no additional information to match on.  If Wikipedia had a
link to MusicBrainz or Freebase had one of his movies and thus a link to
IMDB, the task might be possible to automate, but with the information
that's available it's basically an impossible task.  One might imagine a
super sophisticated matching tool which was able to match Gialappa's Band
(http://it.wikipedia.org/wiki/Gialappa%27s_Band) references from the WP
article and the MusicBrainz album title, but I think that's a real stretch.


 Since there is no real RDF Freebase dump, I assume I cannot use a tool
 like Silk for the linking task.


It's not the data format that's the problem.  You can't use an automated
guessing tool like Silk unless it's got at least a little bit of
information to base its guesses on.  For topics which *do* have other
strong identifiers (IMDB, MusicBrainz, Library of Congress, VIAF, etc) or
additional information (e.g. birth date) it would be possible to do some
automatic matching, but I don't know how many of your articles fall into that
category.

Tom



 [1] http://it.wikipedia.org/wiki/Maurizio_Crozza
 [2] http://dbpedia.org/resource/Maurizio_Crozza
 [3] http://it.dbpedia.org/resource/Maurizio_Crozza
 [4] http://www.freebase.com/view/en/maurizio_crozza

 On 10/17/12 2:04 PM, Dimitris Kontokostas wrote:
  Hi Marco,
 
  In DBpedia Greek we created a script [1] where we created these links
  transitively from the English datasets.
  Note that it needs a few modifications to work with the new directory
  structure.
  If you find it handy and adapt it you can contribute it back for the
  other chapters.
 
  Best,
  Dimitris
 
  [1]
 
 http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/7c499145af2c/scripts/shell-scripts/interlinking
 
   On Wed, Oct 17, 2012 at 2:37 PM, Marco Fossati hell.j@gmail.com wrote:
 
  Forgot the links...
 
  [1] http://wiki.freebase.com/wiki/DBPedia
  [2]
 
 http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/
 
  On 10/17/12 1:30 PM, Marco Fossati wrote:
Hi everyone,
   
In order to improve the Italian DBpedia interlinkage, it would be
  great
to add sameAs links to Freebase.
How did you guys do that on the English version?
I found a couple of posts in the Freebase wiki [1] and DBpedia
  blog [2]
saying it was done some years ago, but I can't find more details.
Could you please tell me more?
Thanks!
   
Cheers,
   
Marco
 
 
 --
  Everyone hates slow websites. So do we.
  Make your web apps faster with AppDynamics
  Download AppDynamics Lite for free today:
  http://p.sf.net/sfu/appdyn_sfd2d_oct
  ___
  Dbpedia-discussion mailing list
  Dbpedia-discussion@lists.sourceforge.net
  https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion
 
 
 
 
  --
  Kontokostas Dimitris


 --
 Everyone hates slow websites. So do we.
 Make your web apps faster with AppDynamics
 Download AppDynamics Lite for free today:
 http://p.sf.net/sfu/appdyn_sfd2d_oct
 ___
 Dbpedia-discussion mailing list
 

Re: [Dbpedia-discussion] sameAs links to Freebase

2012-10-17 Thread Tom Morris
On Wed, Oct 17, 2012 at 10:23 AM, Dimitris Kontokostas jimk...@gmail.com wrote:

 On Wed, Oct 17, 2012 at 5:10 PM, Tom Morris tfmor...@gmail.com wrote:

 The Freebase dump includes direct links for topics which have interwiki
 links:

 $ bzgrep "/wikipedia/it" freebase-datadump-quadruples.tsv.bz2 | head
 /m/010pld   /type/object/key/wikipedia/it
 Yorktown_$0028Virginia$0029
 /m/010pld   /type/object/key/wikipedia/it_id1071928
 /m/010pld   /type/object/key/wikipedia/it_title
 Yorktown_$0028Virginia$0029
 /m/010vmb   /type/object/key/wikipedia/it   Walla_Walla
 /m/010vmb   /type/object/key/wikipedia/it_id2353022
 /m/010vmb   /type/object/key/wikipedia/it_title
 Walla_Walla


 Do you copy the interwiki links as-is or do you post process them?


I don't do anything.  You'd have to ask the Freebase team what links they
use (although it should be pretty easy to figure out by comparing the
Freebase dump with the interwiki links).


 We calculated that more than 90% of interlinking errors come from 1-way
 links and thus, we use only the 2-way links for owl:sameAs [1]


$40?  I'm sure it's a wonderful paper, but I think I'll wait for the movie
version.

Tom



 Best,
 Dimitris

 [1] http://dx.doi.org/10.1016/j.websem.2012.01.001


 On Wed, Oct 17, 2012 at 9:30 AM, Marco Fossati hell.j@gmail.com wrote:


 However, I am wondering how to generate Freebase links for those
 lanugage-specific resources that do not have their counterpart in the
 English chapter. See for instance the Italian Wikipedia article for
 Maurizio Crozza [1]. There is no English counterpart, hence no data in
 the English DBpedia, but only in the Italian one [3]. In Freebase, there
 is indeed some [4].
 So, the core of my question is: how to generate Freebase links from
 scratch?


 Practically the only way to make the link for Maurizio Crozza is
 manually, because there's no additional information to match on.  If
 Wikipedia had a link to MusicBrainz or Freebase had one of his movies and
 thus a link to IMDB, the task might be possible to automate, but with the
 information that's available it's basically an impossible task.  One
 might imagine a super sophisticated matching tool which was able to match
 Gialappa's Band (http://it.wikipedia.org/wiki/Gialappa%27s_Band)
 references from the WP article and the MusicBrainz album title, but I
 think that's a real stretch.


 Since there is no real RDF Freebase dump, I assume I cannot use a tool
 like Silk for the linking task.


 It's not the data format that's the problem.  You can't use an automated
 guessing tool like Silk unless it's got at least a little bit of
 information to base its guesses on.  For topics which *do* have other
 strong identifiers (IMDB, MusicBrainz, Library of Congress, VIAF, etc) or
 additional information (e.g. birth date) it would be possible to do some
 automatic matching, but I don't know how many of your articles fall into that
 category.

 Tom



 [1] http://it.wikipedia.org/wiki/Maurizio_Crozza
 [2] http://dbpedia.org/resource/Maurizio_Crozza
 [3] http://it.dbpedia.org/resource/Maurizio_Crozza
 [4] http://www.freebase.com/view/en/maurizio_crozza

 On 10/17/12 2:04 PM, Dimitris Kontokostas wrote:
  Hi Marco,
 
  In DBpedia Greek we created a script [1] where we created these links
  transitively from the English datasets.
  Note that it needs a few modifications to work with the new directory
  structure.
  If you find it handy and adapt it you can contribute it back for the
  other chapters.
 
  Best,
  Dimitris
 
  [1]
 
 http://dbpedia.hg.sourceforge.net/hgweb/dbpedia/extraction_framework/file/7c499145af2c/scripts/shell-scripts/interlinking
 
   On Wed, Oct 17, 2012 at 2:37 PM, Marco Fossati hell.j@gmail.com wrote:
 
  Forgot the links...
 
  [1] http://wiki.freebase.com/wiki/DBPedia
  [2]
 
 http://blog.dbpedia.org/2008/11/15/dbpedia-is-now-interlinked-with-freebase-links-to-opencyc-updated/
 
  On 10/17/12 1:30 PM, Marco Fossati wrote:
Hi everyone,
   
In order to improve the Italian DBpedia interlinkage, it would
 be
  great
to add sameAs links to Freebase.
How did you guys do that on the English version?
I found a couple of posts in the Freebase wiki [1] and DBpedia
  blog [2]
saying it was done some years ago, but I can't find more
 details.
Could you please tell me more?
Thanks!
   
Cheers,
   
Marco
 
 
 --
  Everyone hates slow websites. So do we.
  Make your web apps faster with AppDynamics
  Download AppDynamics Lite for free today:
  http://p.sf.net/sfu/appdyn_sfd2d_oct
  ___
  Dbpedia-discussion mailing list
  Dbpedia-discussion@lists.sourceforge.net

Re: [Dbpedia-discussion] sameAs links to Freebase

2012-10-17 Thread Tom Morris
On Wed, Oct 17, 2012 at 12:48 PM, Marco Fossati hell.j@gmail.com wrote:


 I checked the whole Freebase dump looking for topics that have an Italian
 Wikipedia id AND NOT an English one. It seems there are none.


That sounds correct.  As far as Wikipedia is concerned, Freebase only
imports from English Wikipedia, so that forms the core.

Tom


 Since the English links are already created and published, I conclude
 there is no need to generate new links from scratch (at least for the
 Italian chapter).
 WRT Italian Wikipedia articles that have no counterparts in the English
 version, I will see whether it is worth to automate the task or not.
 Thank you all for the advices!
 Cheers,

 Marco


 On 10/17/12 4:10 PM, Tom Morris wrote:

 The Freebase dump includes direct links for topics which have interwiki
 links:

 $ bzgrep "/wikipedia/it" freebase-datadump-quadruples.tsv.bz2 | head
 /m/010pld   /type/object/key/wikipedia/it
 Yorktown_$0028Virginia$0029
 /m/010pld   /type/object/key/wikipedia/it_id1071928
 /m/010pld   /type/object/key/wikipedia/it_title
 Yorktown_$0028Virginia$0029
 /m/010vmb   /type/object/key/wikipedia/it   Walla_Walla
 /m/010vmb   /type/object/key/wikipedia/it_id2353022
 /m/010vmb   /type/object/key/wikipedia/it_title
 Walla_Walla
 /m/011f6/type/object/key/wikipedia/it   Aerodinamica
 /m/011f6/type/object/key/wikipedia/it_id47589
 /m/011f6/type/object/key/wikipedia/it_title
 Aerodinamica
 /m/011m0q   /type/object/key/wikipedia/it_id160

 On Wed, Oct 17, 2012 at 9:30 AM, Marco Fossati hell.j@gmail.com wrote:


 However, I am wondering how to generate Freebase links for those
 lanugage-specific resources that do not have their counterpart in the
 English chapter. See for instance the Italian Wikipedia article for
 Maurizio Crozza [1]. There is no English counterpart, hence no data in
 the English DBpedia, but only in the Italian one [3]. In Freebase,
 there
 is indeed some [4].
 So, the core of my question is: how to generate Freebase links from
 scratch?


 Practically the only way to make the link for Maurizio Crozza is
 manually, because there's no additional information to match on.  If
 Wikipedia had a link to MusicBrainz or Freebase had one of his movies
 and thus a link to IMDB, the task might be possible to automate, but
 with the information that's available it's basically an impossible task.
  One might imagine a super sophisticated matching tool which was able
  to match Gialappa's Band
  (http://it.wikipedia.org/wiki/Gialappa%27s_Band)
  references from the WP
  article and the MusicBrainz album title, but I think that's a real
  stretch.


--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_sfd2d_oct
___
Dbpedia-discussion mailing list
Dbpedia-discussion@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dbpedia-discussion


Re: [Dbpedia-discussion] Fetching song and movie related information

2012-10-11 Thread Tom Morris
On Thu, Oct 11, 2012 at 9:48 AM, Venkatesh Channal 
venkateshchan...@gmail.com wrote:

 Hi,

 I think I am still missing on how to query the information.

 On executing the query:
 SELECT * WHERE { ?s <http://purl.org/dc/terms/subject>
 <http://dbpedia.org/resource/Category:Hindi-language_films> . ?s rdf:type
 <http://dbpedia.org/ontology/Film> } LIMIT 100

 One of the values returned was
 http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na.

 To find all triples that have the film as subject (the idea was to find
 the songs and the singers of those songs), the following query was
 executed:

 select * where { <http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
 ?p ?o . }

 One of the values returned is -
 <http://dbpedia.org/property/title>  "Pappu Can't Dance"@en

 Here "Pappu Can't Dance"@en is a song.

 I executed the following query to find the artists associated with songs
 that begin with the character 'P'. The song "Pappu Can't Dance" was not
 returned.

 Select distinct * where { ?song <http://purl.org/dc/terms/subject>
 <http://dbpedia.org/resource/Category:Hindi_songs> . ?song rdf:type
 <http://dbpedia.org/ontology/Song> . ?song
 <http://dbpedia.org/ontology/artist> ?artist . ?song
 <http://dbpedia.org/ontology/runtime> ?runtime . filter
 (regex(str(?song),'P'))} limit 100

 The corresponding Wikipedia link is -
 http://en.wikipedia.org/wiki/Jaane_Tu..._Ya_Jaane_Na


That article is in Category:Hindi_films, not Category:Hindi_songs and it's
a Film, not a song, so it's not going to meet the requirements of your
query.  It looks like the DBpedia extractor attempted to extract as much
information as possible from the page, but that strategy, combined with the
way Wikipedians edit is causing confusion.

Even though the article contains infoboxes for both the film and the
soundtrack album, the subject is principally, in my opinion, the film.
Including triples related to the soundtrack album associated with the same
URI is just going to cause confusion.

Tom



 Appreciate your feedback and help.

 Thanks and regards,
 Venkatesh

 On Thu, Oct 11, 2012 at 6:20 PM, Julien Cojan julien.co...@inria.fr wrote:

 Hi Mohamed,

 --

 *De: *Mohamed Morsey mor...@informatik.uni-leipzig.de
 *À: *Julien Cojan julien.co...@inria.fr
 *Cc: *Venkatesh Channal venkateshchan...@gmail.com,
 dbpedia-discussion@lists.sourceforge.net
 *Envoyé: *Jeudi 11 Octobre 2012 10:49:45

 *Objet: *Re: [Dbpedia-discussion] Fetching song and movie related
 information

 Hi Julien,

 On 10/10/2012 05:34 PM, Julien Cojan wrote:

 Hi,

 Are you running the query on http://live.dbpedia.org/sparql ?
 I don't get any artist associated to
 http://dbpedia.org/resource/Piyu_Bole in this endpoint.
 You must have got it from http://dbpedia.org/sparql which is not synched
 with Wikipedia.

 As Mohamed said, the wikipedia page was redirected to
 http://en.wikipedia.org/wiki/Parineeta_(2005_film).
 There is something weird though, there is a triple

 <http://dbpedia.org/resource/Piyu_Bole>
 <http://dbpedia.org/ontology/wikiPageRedirects>
 <http://dbpedia.org/resource/Parineeta_(2005_film)> .

 but describe http://dbpedia.org/resource/Parineeta_(2005_film) gives
 only this triple, and describe
 http://dbpedia.org/resource/Parineeta_%282005_film%29 very little.


 sorry I didn't get what you mean by very little.
 I tried the query describe
 http://dbpedia.org/resource/Parineeta_%282005_film%29,
 and it gives 109 triples.


 Same for me now, when I tried I had about 10. Maybe the page was being
 extracted after a change.

 There is a bug though about the use of two different URIs/IRIs for
 Parineeta_(2005_film).





 Anyone knows why ?

 Julien



 --
 Kind Regards
 Mohamed Morsey
 Department of Computer Science
 University of Leipzig






Re: [Dbpedia-discussion] Fetching song and movie related information

2012-10-11 Thread Tom Morris
On Thu, Oct 11, 2012 at 11:03 AM, Pablo N. Mendes pablomen...@gmail.com wrote:


 Good point, Tom.

 That article is in Category:Hindi_films, not Category:Hindi_songs and it's
 a Film, not a song, so it's not going to meet the requirements of your
 query.


 But maybe the class hierarchy comes to the rescue (Work is a supertype of
 Song and Film)?


The main point is that the extracted triples are semantic nonsense because
they conflate multiple subjects under a single URI.

You've got

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/runtime>
"9300.0"^^<http://dbpedia.org/datatype/second> .

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/length>
"276.0"^^<http://dbpedia.org/datatype/second> .

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/length>
"1908.0"^^<http://dbpedia.org/datatype/second> .

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/length>
"221.0"^^<http://dbpedia.org/datatype/second> .

Which is the length of what?  They all refer to the same subject.

Similarly "Pappu Can't Dance" isn't a song. It's an (alternate?) title for
the film according to the RDF. A human knows it's a song because of the
surrounding context:

<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/property/title> "Pappu Can't Dance"@en .
<http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na>
<http://dbpedia.org/ontology/Work/runtime>
"155.0"^^<http://dbpedia.org/datatype/minute> .

To make what Venkatesh wants to happen, you'd need to teach the
extractor to figure out what the main subject of a page was and then have
it mint new subject URIs for all related concepts represented on the page
which are different (and don't have their own Wikipedia page), such as the
soundtrack album, songs on the soundtrack album, etc.  Then you'd also need
to teach it that the physical proximity of the track listing and the
soundtrack infobox implies that they refer to the same subject.  Finally,
you'd have to make this robust in the face of different editing and
structuring styles by different Wikipedians.
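That subject-splitting idea can be sketched in a few lines. This is purely illustrative: the function name, the `__<infobox>` URI suffix, and the "first infobox is the main subject" rule are assumptions for the sketch, not part of the real DBpedia extraction framework.

```python
# Hypothetical sketch: mint a distinct subject URI for each secondary
# infobox on a page, instead of attaching every triple to the page URI.
def mint_subject_uris(page_uri, infobox_types):
    """Return one subject URI per infobox; the first infobox is assumed
    to describe the page's main subject and keeps the page URI."""
    uris = {}
    for i, box_type in enumerate(infobox_types):
        if i == 0:
            uris[box_type] = page_uri  # main subject, e.g. the film
        else:
            # secondary subjects get derived URIs, e.g. "...__soundtrack"
            uris[box_type] = f"{page_uri}__{box_type}"
    return uris

page = "http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na"
print(mint_subject_uris(page, ["film", "soundtrack"]))
```

With separate URIs, the two conflicting runtime/length values above would land on different subjects instead of colliding.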

I'd love to see the extractor get this smart, but I'm not holding my breath.

Tom


 On Thu, Oct 11, 2012 at 4:15 PM, Tom Morris tfmor...@gmail.com wrote:



 On Thu, Oct 11, 2012 at 9:48 AM, Venkatesh Channal 
 venkateshchan...@gmail.com wrote:

 Hi,

 I think I am still missing on how to query the information.

 On executing the query:
 SELECT * WHERE { ?s http://purl.org/dc/terms/subject 
 http://dbpedia.org/resource/Category:Hindi-language_films. ?s
  rdf:type http://dbpedia.org/ontology/Film } LIMIT 100

 One of the values got was -
 http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na.

 To find the information about all triples that has the film name as
 subject. The idea was to find song and singer of those songs. The query
 executed:

 select *  where { http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na
 ?p ?o . }

 One of the values returned is -
 http://dbpedia.org/property/title "Pappu Can't Dance"@en

  Here Pappu Can't Dance@en is a song.

 I executed the following query to find artist associated with the song
 that begin with the character 'P'. The song  Pappu Can't Dance was
 not returned.

 Select distinct * where { ?song http://purl.org/dc/terms/subject 
 http://dbpedia.org/resource/Category:Hindi_songs . ?song rdf:type 
 http://dbpedia.org/ontology/Song . ?song 
 http://dbpedia.org/ontology/artist ?artist .  ?song 
 http://dbpedia.org/ontology/runtime ?runtime . filter
 (regex(str(?song),'P'))} limit 100

 The corresponding Wikipedia link is -
 http://en.wikipedia.org/wiki/Jaane_Tu..._Ya_Jaane_Na


 That article is in Category:Hindi_films, not Category:Hindi_songs and
 it's a Film, not a song, so it's not going to meet the requirements of your
 query.  It looks like the DBpedia extractor attempted to extract as much
 information as possible from the page, but that strategy, combined with the
 way Wikipedians edit is causing confusion.

 Even though the article contains infoboxes for both the film and the
 soundtrack album, the subject is principally, in my opinion, the film.
 Including triples related to the soundtrack album associated with the same
 URI is just going to cause confusion.

 Tom



 Appreciate your feedback and help.

 Thanks and regards,
 Venkatesh

 On Thu, Oct 11, 2012 at 6:20 PM, Julien Cojan julien.co...@inria.fr wrote:

 Hi Mohamed,

 --

 *De: *Mohamed Morsey mor...@informatik.uni-leipzig.de
 *À: *Julien Cojan julien.co...@inria.fr
 *Cc: *Venkatesh Channal venkateshchan...@gmail.com,
 dbpedia-discussion@lists.sourceforge.net
 *Envoyé: *Jeudi 11 Octobre 2012 10:49:45

 *Objet: *Re: [Dbpedia-discussion] Fetching song and movie related
 information

 Hi Julien,

 On 10/10/2012 05:34 PM, Julien Cojan wrote:

 Hi,

 Are you running the query on http://live.dbpedia.org/sparql ?
 I don't get any artist associated to
 http://dbpedia.org/resource/Piyu_Bole

Re: [Dbpedia-discussion] Fetching song and movie related information

2012-10-11 Thread Tom Morris
On Thu, Oct 11, 2012 at 11:24 AM, Pablo N. Mendes pablomen...@gmail.com wrote:


 Is this the page?
 http://en.wikipedia.org/wiki/Pappu_Can't_Dance_Saala


No, he's talking about this:
http://en.wikipedia.org/wiki/Jaane_Tu..._Ya_Jaane_Na#Music

It's a page about a film with both film and soundtrack infoboxes.

If you look at the triples extracted :
http://dbpedia.org/data/Jaane_Tu..._Ya_Jaane_Na.ntriples

you can see that both sets of properties have been applied to the same
subject.

Tom



 It contains infobox film, so it should be of type Film, as Tom says. It is
 also not in the category you said, as Tom also pointed out. Check the
 wikipedia page.

 There is another problem. For some reason, artist and runtime information
 were not extracted. Look at this query:

 Select distinct * where {
 ?song <http://purl.org/dc/terms/subject>
 <http://dbpedia.org/resource/Category:Hindi-language_films> .
 ?song rdfs:label ?label .
 ?song rdf:type <http://dbpedia.org/ontology/Work> .
 optional {
   ?song <http://dbpedia.org/ontology/artist> ?artist .
 }
 optional {
   ?song <http://dbpedia.org/ontology/runtime> ?runtime .
 }
 filter (regex(str(?label),'Pappu'))} limit 100


 Also worth pointing out, there is a bug in the Linked Data hosting for
 URIs with special chars. See:

 http://dbpedia.org/page/Pappu_Can%27t_Dance_Saala

 And contrast with:

 describe http://dbpedia.org/page/Pappu_Can%27t_Dance_Saala
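The special-character bug comes down to the same article title having two URI spellings, one with the apostrophe percent-encoded and one without. A small sketch of why the two forms collide (using only the Python standard library; this does not reproduce DBpedia's own handling, just the encoding relationship):

```python
from urllib.parse import quote, unquote

encoded = "Pappu_Can%27t_Dance_Saala"
decoded = unquote(encoded)  # "Pappu_Can't_Dance_Saala"

# Re-encoding the decoded title escapes the apostrophe again, so the two
# spellings denote the same Wikipedia article even though a naive string
# comparison (or a URI-keyed triple store) treats them as different.
assert quote(decoded) == encoded
print(decoded)
```

A Linked Data host has to normalize to one canonical form before lookup, or requests for the two spellings will return different descriptions, which is the behavior Pablo observed.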

 Cheers,
 Pablo

 On Thu, Oct 11, 2012 at 5:07 PM, Venkatesh Channal 
 venkateshchan...@gmail.com wrote:

 Even with rdf:type Work the song starting with Pappu is not among the
 returned values.

 Regards,
 Venkatesh


 On Thu, Oct 11, 2012 at 8:33 PM, Pablo N. Mendes 
 pablomen...@gmail.com wrote:


 Good point, Tom.

 That article is in Category:Hindi_films, not Category:Hindi_songs and
 it's a Film, not a song, so it's not going to meet the requirements of your
 query.


 But maybe the class hierarchy comes to the rescue (Work is a supertype
 of Song and Film)?

 Select distinct * where { ?song http://purl.org/dc/terms/subject 
 http://dbpedia.org/resource/Category:Hindi_songs . ?song rdf:type 
 http://dbpedia.org/ontology/Work . ?song 
 http://dbpedia.org/ontology/artist ?artist .  ?song 
 http://dbpedia.org/ontology/runtime ?runtime . filter
 (regex(str(?song),'P'))} limit 100

 Cheers,
 Pablo

 On Thu, Oct 11, 2012 at 4:15 PM, Tom Morris tfmor...@gmail.com wrote:



 On Thu, Oct 11, 2012 at 9:48 AM, Venkatesh Channal 
 venkateshchan...@gmail.com wrote:

 Hi,

 I think I am still missing on how to query the information.

 On executing the query:
 SELECT * WHERE { ?s http://purl.org/dc/terms/subject 
 http://dbpedia.org/resource/Category:Hindi-language_films. ?s
  rdf:type http://dbpedia.org/ontology/Film } LIMIT 100

 One of the values got was -
 http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na.

 To find the information about all triples that has the film name as
 subject. The idea was to find song and singer of those songs. The query
 executed:

 select *  where { http://dbpedia.org/resource/Jaane_Tu..._Ya_Jaane_Na
 ?p ?o . }

 One of the values returned is -
  http://dbpedia.org/property/title "Pappu Can't Dance"@en

  Here Pappu Can't Dance@en is a song.

 I executed the following query to find artist associated with the song
 that begin with the character 'P'. The song  Pappu Can't Dance was
 not returned.

 Select distinct * where { ?song http://purl.org/dc/terms/subject 
 http://dbpedia.org/resource/Category:Hindi_songs . ?song rdf:type 
 http://dbpedia.org/ontology/Song . ?song 
 http://dbpedia.org/ontology/artist ?artist .  ?song 
 http://dbpedia.org/ontology/runtime ?runtime . filter
 (regex(str(?song),'P'))} limit 100

  The corresponding Wikipedia link is -
 http://en.wikipedia.org/wiki/Jaane_Tu..._Ya_Jaane_Na


 That article is in Category:Hindi_films, not Category:Hindi_songs and
 it's a Film, not a song, so it's not going to meet the requirements of your
 query.  It looks like the DBpedia extractor attempted to extract as much
 information as possible from the page, but that strategy, combined with the
 way Wikipedians edit is causing confusion.

  Even though the article contains infoboxes for both the film and the
  soundtrack album, the subject is principally, in my opinion, the film.
  Including triples related to the soundtrack album associated with the same
  URI is just going to cause confusion.

 Tom



 Appreciate your feedback and help.

 Thanks and regards,
 Venkatesh

 On Thu, Oct 11, 2012 at 6:20 PM, Julien Cojan 
  julien.co...@inria.fr wrote:

 Hi Mohamed,

 --

 *De: *Mohamed Morsey mor...@informatik.uni-leipzig.de
 *À: *Julien Cojan julien.co...@inria.fr
 *Cc: *Venkatesh Channal venkateshchan...@gmail.com,
 dbpedia-discussion@lists.sourceforge.net
 *Envoyé: *Jeudi 11 Octobre 2012 10:49:45

 *Objet: *Re: [Dbpedia-discussion] Fetching song and movie related
 information

 Hi Julien,

 On 10/10/2012 05:34 PM, Julien Cojan wrote:

 Hi,

 Are you running the query on http

Re: [Dbpedia-discussion] Fetching song and movie related information

2012-10-10 Thread Tom Morris
On Wed, Oct 10, 2012 at 12:23 PM, Julien Cojan julien.co...@inria.fr wrote:

 Ok, it is hard to discuss results on changing data.

 There is a mess about this example Piyu_Bole
 (http://dbpedia.org/resource/Piyu_Bole) because it was the page of a
 song:
 http://en.wikipedia.org/w/index.php?title=Piyu_Bole&direction=prev&oldid=503695506
 then it was redirected to a film:
 http://en.wikipedia.org/w/index.php?title=Piyu_Bole&redirect=no


Which is why you can never really infer anything about what a redirect
means on Wikipedia, because it basically means whatever the Wikipedians
want it to mean.  Sometimes it connects equivalent topics, but often, as in
this case, it simply represents something that wasn't notable enough to
have its own page and the redirect goes to something that discusses
multiple topics: a film, the soundtrack for the film, the tracks on the
soundtrack for the film, etc.

Tom


Re: [Dbpedia-discussion] DBpedia Attribution

2012-09-14 Thread Tom Morris
On Fri, Sep 14, 2012 at 1:25 PM, Kingsley Idehen kide...@openlinksw.com wrote:
 All,

 Here is a rehash of a post I made last week about DBpedia attribution in
 line with the fundamental goals and principles of Linked Open Data.

 ** Attribution Guidelines Start ***

A better place for this would be near the DBpedia license statement:
http://wiki.dbpedia.org/Datasets#h18-23

CC-BY-SA requires attribution in the form the author (or in this case
publisher) specifies.  Lacking specificity consumers can basically
make up their rules as to what constitutes compliance.

Tom



Re: [Dbpedia-discussion] How to best choose and use dbpprops for public use in radial tree layout app

2012-08-30 Thread Tom Morris
On Thu, Aug 30, 2012 at 7:58 AM, Kingsley Idehen kide...@openlinksw.com wrote:
 On 8/29/12 11:53 PM, Michael Douma wrote:

 @Kingsley:

 Our app will work offline, so we will need to create packages. We
 probably will not send users to DBpedia in realtime. Also, we are
 merging and massaging results. e.g., to filter from 1000's of hits to
 the top 10.

 My comments aren't about the on or offline modality of your app. Its all
 about keeping the Web of Linked Data intact. For instance, when sharing
 insights, you should use DBpedia and Wikipedia URIs. Right now you only use
 Wikipedia URI.

Isn't there a 1-to-1 mapping between Wikipedia and DBpedia URIs?  That
should allow one to easily map from a Wikipedia URI to a DBpedia URI.
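The mapping Tom describes is a prefix swap: an English Wikipedia article URL becomes a DBpedia resource URI by replacing the `/wiki/` prefix and keeping the title segment unchanged. A minimal sketch (the function name is illustrative; edge cases like non-English chapters or redirects are out of scope):

```python
# Sketch: map an English Wikipedia article URL to its DBpedia resource URI.
WIKI_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"

def wikipedia_to_dbpedia(url):
    if not url.startswith(WIKI_PREFIX):
        raise ValueError("not an English Wikipedia article URL: " + url)
    # The title segment carries over verbatim, encoding and all.
    return DBPEDIA_PREFIX + url[len(WIKI_PREFIX):]

print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Austin,_Texas"))
# http://dbpedia.org/resource/Austin,_Texas
```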

Tom



Re: [Dbpedia-discussion] How to best choose and use dbpprops for public use in radial tree layout app

2012-08-29 Thread Tom Morris
On Wed, Aug 29, 2012 at 11:22 AM, Michael Douma micha...@idea.org wrote:

 I have a question about using DBpedia properties.

 I'm working on an iPad app for visually browsing Wikipedia with radial
 tree layouts. My colleagues and I published a proof-of-concept app
 called 'WikiNodes':
 http://itunes.apple.com/us/app/wikinodes/id433834594?mt=8

 We want to improve the layout and content of the app using DBPedia data.
 For example, for a primary node Paris, a child node could be
 Birthplace of, and then we'd list ~10 of the people with
 dbpedia-owl:birthPlace.

Is there more than one property used for this information?  That
sounds like a very low number of instances.  By comparison, Freebase
has 3583 people born there (which is still probably just a tiny
fraction of those in Wikipedia).

 Where/how can we find some guidance on DBpedia properties, and whether
 to use them? -- e.g., a list of all the properties, their frequency of
 use,  what they mean (if it's not obvious), and whether they are
 reliable or buggy? Does this exist, or should we extract statistics from
 the downloads?

I can't help with documentation, but one way to gauge reliability of
properties would to see how well DBpedia and Freebase correspond for
Wikipedia-based entries, since they're basically starting with the
same source data.  Where there's large discrepancies, you'd probably
need to dig in to it further to figure out which one is more
reliable/representative.

 Also, if this project catches the interest of any list members, I can
 tell you more about our project, as we'd welcome any insights, ideas or
 input on how to use DBpedia data for an app for the general public.

I'd be a lot more interested if you had an Android, Linux, or Windows
app that I could play with on one of my devices. :-)

Tom

p.s. Is Wikipedia really 17+?  How does Apple enforce this on that
tiny portion of the web that they don't control?



Re: [Dbpedia-discussion] question about dbpedia and freebase interlinking

2012-07-10 Thread Tom Morris
I know it has been correct in the past, so this only affects some subset of
the releases (not sure which ones).

Tom
On Jul 10, 2012 8:50 AM, Jona Christopher Sahnwaldt j...@sahnwaldt.de
wrote:

 DBpedia linked to the wrong URI, but that has been fixed. In the
 upcoming release, we'll use the dot. Thanks for the report!

 Christopher

 On Tue, Jul 10, 2012 at 12:21 AM, Juan Sequeda juanfeder...@gmail.com
 wrote:
  I'm fwding my question to this mailing list.
 
  Juan Sequeda
  +1-575-SEQ-UEDA
  www.juansequeda.com
 
 
  -- Forwarded message --
  From: Juan Sequeda juanfeder...@gmail.com
  Date: Sun, Jul 8, 2012 at 7:09 PM
  Subject: question about dbpedia and freebase interlinking
  To: public-lod public-...@w3.org
 
 
  All,
 
  It seems like there is a mismatch with the links between dbpedia and
  freebase.
 
  Take for example http://dbpedia.org/resource/Austin,_Texas
 
  The owl:sameAs link is to: http://rdf.freebase.com/ns/m/0vzm however
 the RDF
  triples that are returned have the URI http://rdf.freebase.com/ns/m.0vzm
  (note that it has a period (.) instead of a slash (/)).
 
  Did freebase change the URIs? Or is DBpedia linking to the wrong URI?
 
  Juan Sequeda
  +1-575-SEQ-UEDA
  www.juansequeda.com
 
 
 
 





Re: [Dbpedia-discussion] question about dbpedia and freebase interlinking

2012-07-10 Thread Tom Morris
I don't speak for Freebase, but I know a fair bit about it, so I'll
provide the background that I'm familiar with.

There are two different issues here:

1. Which identifier(s) are used: guid, mid, or human readable /en/... key
2. The format of the identifier, ie whether they have embedded dots
(.) or slashes (/)

GUIDs are bare metal identifiers and should never be used because they
don't survive merges, splits, etc.  The /en identifiers used to be
given preference, but now are deprecated and MIDs are recommended for
long term linking (plus no new /en identifiers have been minted in a
long time, so you need to be able to handle MIDs).  MIDs are similar
to GUIDs, *except* that they move with topics which are merged, so
they're more stable.

As for separators, the Freebase RDF endpoint has always converted the
native slashes (/) to dots (.) in identifiers.  I'm not 100% certain,
but I think the slashes caused problems in certain processing
frameworks (or perhaps they just wanted the identifiers to be treated
as an opaque string rather than a namespace hierarchy).
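The slash-to-dot conversion described above is mechanical. A small sketch (the function name and the namespace constant are illustrative; the `http://rdf.freebase.com/ns/` prefix and the `/m/0p_47` → `m.0p_47` forms are the ones discussed in this thread):

```python
# Sketch: convert a native Freebase mid (slash form) to the dotted
# identifier used in the Freebase RDF namespace.
FREEBASE_NS = "http://rdf.freebase.com/ns/"

def mid_to_rdf_uri(mid):
    """Map e.g. '/m/0p_47' to 'http://rdf.freebase.com/ns/m.0p_47'."""
    if not mid.startswith("/"):
        raise ValueError("expected a mid like /m/0p_47")
    # Drop the leading slash, then turn the remaining slashes into dots.
    return FREEBASE_NS + mid.lstrip("/").replace("/", ".")

print(mid_to_rdf_uri("/m/0p_47"))  # http://rdf.freebase.com/ns/m.0p_47
```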

DBpedia should, in my opinion, always be using MIDs for the canonical
link, and they should use the dotted form in preference to slashes.
If you want an official recommendation, you should probably ask Jason
Douglas or Shawn Simister (hopefully they won't say anything too
wildly different than the above).

Tom



On Tue, Jul 10, 2012 at 10:41 AM, Jona Christopher Sahnwaldt
j...@sahnwaldt.de wrote:
 I just looked through older DBpedia releases. I think we started
 publishing Freebase links in 3.1. Until 3.5, we used URIs like
 http://rdf.freebase.com/ns/guid.9202a8c04000641f802d1e19 . In
 3.6 and 3.7, we used URIs like http://rdf.freebase.com/ns/m/01006r .
 In 3.8, we will use http://rdf.freebase.com/ns/m.01006r .

 All these URIs can be resolved, but they are not the 'canonical' URIs.
 The ones with the dot also aren't 'canonical': the page
 http://rdf.freebase.com states that Freebase RDF IDs look like
 http://rdf.freebase.com/ns/en.steve_martin . On the other hand, the
 explanations on http://wiki.freebase.com/wiki/Mid and
 http://wiki.freebase.com/wiki/Id seem to say that mids like '/m/0p_47'
 are more stable than ids like 'steve_martin', and while all topics
 have a mid, some don't have an id. So it seems best to use URIs like
 http://rdf.freebase.com/ns/m.0p_47 .

 Also, extracting URIs like http://rdf.freebase.com/ns/en.steve_martin
 would require a bit more coding, but if people (someone at Freebase?)
 think DBpedia should do that, we could probably add it for 3.9.

 Christopher



 On Tue, Jul 10, 2012 at 2:57 PM, Tom Morris tfmor...@gmail.com wrote:
 I know it has been correct in the past, so this only affects some subset of
 the releases (not sure which ones).

 Tom

 On Jul 10, 2012 8:50 AM, Jona Christopher Sahnwaldt j...@sahnwaldt.de
 wrote:

 DBpedia linked to the wrong URI, but that has been fixed. In the
 upcoming release, we'll use the dot. Thanks for the report!

 Christopher

 On Tue, Jul 10, 2012 at 12:21 AM, Juan Sequeda juanfeder...@gmail.com
 wrote:
  I'm fwding my question to this mailing list.
 
  Juan Sequeda
  +1-575-SEQ-UEDA
  www.juansequeda.com
 
 
  -- Forwarded message --
  From: Juan Sequeda juanfeder...@gmail.com
  Date: Sun, Jul 8, 2012 at 7:09 PM
  Subject: question about dbpedia and freebase interlinking
  To: public-lod public-...@w3.org
 
 
  All,
 
  It seems like there is a mismatch with the links between dbpedia and
  freebase.
 
  Take for example http://dbpedia.org/resource/Austin,_Texas
 
  The owl:sameAs link is to: http://rdf.freebase.com/ns/m/0vzm however the
  RDF
  triples that are returned have the URI http://rdf.freebase.com/ns/m.0vzm
  (note that it has a period (.) instead of a slash (/)).
 
  Did freebase change the URIs? Or is DBpedia linking to the wrong URI?
 
  Juan Sequeda
  +1-575-SEQ-UEDA
  www.juansequeda.com
 
 
 
 



Re: [Dbpedia-discussion] decimal and grouping separators doubt

2012-05-30 Thread Tom Morris
On Wed, May 30, 2012 at 10:29 AM, Jona Christopher Sahnwaldt
j...@sahnwaldt.de wrote:

 @developers: We will have to discuss what's the best way to do this...

 - Add a configuration value decimalSeparator whose value may be dot or
 comma: , or .. Bit hard to read... We would also need a
 configuration value groupSeparator.

 - Add a configuration value numberFormat that takes a language code,
 in this case en.

 - Add a configuration value numberFormat that takes a decimal
 separator and a group separator: .,. Bit hard to read...

 Any other ideas?

POSIX sorted this all out a couple of decades ago (and standardized
it).  Why not just use the infrastructure that they've made available
(and reference that standard)?
http://www.chemie.fu-berlin.de/chemnet/use/info/libc/libc_19.html

Note that they specify monetary and non-monetary number formatting
separately.  It may seem like overkill, but there's almost certainly a
good reason for it (although I don't know what it is).
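Whichever configuration syntax is chosen, the parsing itself reduces to two parameters. A sketch of a parser driven by the proposed `decimalSeparator`/`groupSeparator` values (the parameter names mirror the proposal in this thread; they are not real extraction-framework options):

```python
# Sketch: parse a localized number string given explicit decimal and
# grouping separators, so the same code handles "1,234.5" (en) and
# "1.234,5" (de).
def parse_number(text, decimal_sep=".", group_sep=","):
    # Remove grouping separators first, then normalize the decimal
    # separator to '.', which float() understands.
    cleaned = text.replace(group_sep, "").replace(decimal_sep, ".")
    return float(cleaned)

print(parse_number("1,234.5"))                                  # 1234.5
print(parse_number("1.234,5", decimal_sep=",", group_sep="."))  # 1234.5
```

A full POSIX-locale approach would also distinguish monetary from non-monetary formats, as Tom notes, but the two-separator core is the same.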

Tom



Re: [Dbpedia-discussion] values of dbpedia-owl:wikPageDisambiguate - how are they extracted

2012-05-23 Thread Tom Morris
On Wed, May 23, 2012 at 5:44 AM, Ziqi Zhang
ziqizhang.em...@googlemail.com wrote:

 My task is to extract candidate concepts/entities for an ambiguous term
 from dbpedia, e.g., cat (disambiguation).

Like some of the other answers, not directly relevant, but another
signal that you can use is inclusion in Freebase.  Freebase includes
basically all of Wikipedia, but aggressively deletes both
disambiguation pages and list pages.

I'd also point out that not all ambiguous Wikipedia articles are
tagged as disambiguation pages.  If you look at the split_to hint
properties in Freebase, you can find Wikipedia articles which were
split into separate concepts after import.

Tom

p.s. As far as computational complexity goes, none of the things being
discussed strike me as being computationally infeasible unless you are
on a very, very limited budget.



Re: [Dbpedia-discussion] Linking to the Norwegian Company Registry

2012-02-16 Thread Tom Morris
On Thu, Feb 16, 2012 at 4:11 AM, Martin Giese marti...@ifi.uio.no wrote:
 Norway has the additional advantage of being so small that only a few
 hundred companies, organizations, and institutions are present on
 Wikipedia.  (At least with an org. nr.  For the rest, we have little
 hope of linking them up)

Wikipedia categories tend to be pretty noisy (at least in English
Wikipedia), but if you can get a reasonable set of candidates, the
list sounds like it would be small enough that you could use Google
Refine and the OpenCorporates reconciliation service for Refine to
discover any missing registry numbers.  Alternatively, you could use
the DERI RDF extension and your triplified registry database to do the
reconciliation directly against that (I think.  I haven't done this
personally).  The RDF extension can also be used to generate RDF if
you wanted to produce your owl:sameAs triples that way, although
that's probably not a good workflow for anything other than a one-time
effort.

Tom



Re: [Dbpedia-discussion] Linking to the Norwegian Company Registry

2012-02-15 Thread Tom Morris
On Wed, Feb 15, 2012 at 9:39 AM, Martin Giese marti...@ifi.uio.no wrote:

 in general, what is the process to get new extraction functionality
 added to dbpedia?  Do I submit a feature request and wait and hope, or
 should I write code and propose it for inclusion in the extraction
 framework?

 More concretely:  we are in the process of publishing lots of
 interesting information about Norwegian companies and organisations,
 taken from the official national registry
 (http://en.wikipedia.org/wiki/Br%C3%B8nn%C3%B8ysund_Register_Centre), as
 linked data.  This will be RDF data based on daily dumps of the
 registry.  We will obviously try to link our data to dbpedia, but we
 think it would be cool to have dbpedia link to up-to-date information
 from the company register.

 In many cases, this shouldn't be too hard: Norwegian organisations are
 identified by a unique 9 digit organisation number, which will be part
 of the URIs of organisations.  These organisation numbers are stated on
 many Norwegian wikipedia pages, either in an infobox as Org. nummer 123
 456 789 (see e.g.
 http://no.wikipedia.org/wiki/Br%C3%B8nn%C3%B8ysundregistrene), or as
 part of an external reference to Nøkkelopplysninger fra
 Enhetsregisteret (see e.g. ref 3 on the same page).  In the latter
 case, the link goes to an info page created by the company registry,
 based on an organisation number in the URI.

I think the semantics of an external reference are much too loose to
do anything useful with.  Using infobox data is more reliable, but you
should still be cautious about cases where the infobox doesn't refer
to the same entity as the page that hosts it.  To make up an
example: the page of a person who's the managing director of a company
might host an infobox for the company itself.  It may sound unlikely, but
Wikipedia is riddled with stuff like this.  Failing to identify cases
where this happens will cause bad assertions to be made which will
ripple downstream to cause contradictory inferences.

Tom



Re: [Dbpedia-discussion] DBpedia ontology

2011-12-27 Thread Tom Morris
On Mon, Dec 26, 2011 at 7:26 PM, Patrick Cassidy p...@micra.com wrote:
 I have looked briefly at the DBpedia ontology and it appears to leave a
 great deal to be desired in terms of what an ontology is best suited for: to
 carefully and precisely define the meanings of terms so that they can be
 automatically reasoned with by a computer, to accomplish useful tasks.  I
 will be willing to spend some time to reorganize the ontology to make it
 more logically coherent, if (1) there are any others who are interested in
 making the ontology more sound and (2) if there is a process by which that
 can be done without a very long drawn-out debate.

 I think that the general notion of formalizing the content of the WikiPedia
 a a great idea, but to be useful it has to be done carefully.  It is very
 easy, even for those with experience, to put logically inconsistent
 assertions into an ontology, and even easier to put in elements that are so
 underspecified that they are ambiguous to the point of being essentially
 useless for automated reasoning.  The OWL reasoner can catch some things,
 but it is very limited, and unless a first-order reasoner is used one needs
 to be exceedingly careful about how one defines the relations.

You could create an ontology as it should be, or you could use an
ontology that matches the practices and conventions used by the
Wikipedia editors.  The latter is going to be messy in many ways, but
at least it'll have a large quantity of data to work with.  Getting
any use out of the former would require convincing all Wikipedians
to adhere to your strict conventions, which seems unlikely to me.

Another way to approach this would be the MCC/CYC approach.  It'll
take billions of dollars and you'll need to wait many decades for them
to finish, but at the end of it all I'm sure you'd have a perfectly
consistent knowledge base.

Tom



Re: [Dbpedia-discussion] Dbpedia growth trends

2011-06-30 Thread Tom Morris
2011/6/30 Luis Galárraga shamant...@gmail.com:
 Thank you very much for your prompt response and your help. I do not have
 too much time for the presentation so I have used the information you have
 provided plus the dates of release in a simple OpenOffice Calc chart which I
 am sharing now.

Didn't you connect those data points with Bezier curves?  It makes the
two Q1 2010 points (which are almost? directly over each other) look
like they're going back in time.  Straight lines would probably be
more appropriate...

Tom



Re: [Dbpedia-discussion] URL encoding of '(', ')' and '#'

2011-05-12 Thread Tom Morris
I agree it'd be nice to fix the encoding of parentheses, but with regard to #:

 A similar thing is also the case for URLs that include an #
 e.g.
  http://dbpedia.org/resource/Midfielder%23Winger is its own resource
  http://dbpedia.org/resource/Midfielder#Winger returns the contents
 for Midfielder

The number sign is the URI fragment separator and most clients will
only send the left-hand portion to the server (certainly that's true
for browsers).  The reason you're seeing the content for Midfielder is
because that's what you requested.  The #Winger part never makes it
to the server.
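The split the client performs can be seen with Python's standard library (a minimal sketch using the URLs from the message; nothing DBpedia-specific):

```python
from urllib.parse import urlsplit

# The fragment ("#Winger") is client-side only: browsers and most HTTP
# clients split it off before sending the request, so the server never
# sees it.
parts = urlsplit("http://dbpedia.org/resource/Midfielder#Winger")
print(parts.path)      # /resource/Midfielder -- what the server receives
print(parts.fragment)  # Winger               -- kept by the client

# The percent-encoded form is just an opaque character sequence in the
# path, so it names a different resource:
encoded = urlsplit("http://dbpedia.org/resource/Midfielder%23Winger")
print(encoded.path)    # /resource/Midfielder%23Winger
```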

Tom



Re: [Dbpedia-discussion] dbpedia data problems - old movie data pages almost empty

2011-01-31 Thread Tom Morris
On Mon, Jan 31, 2011 at 11:36 AM, Dan Brickley dan...@danbri.org wrote:
 Hi folks

 I have just taken the list of old Archive.org-hosted movies in
 http://tech.blorge.com/Structure:%20/2010/08/11/top-40-best-free-legal-movies-you-can-download-right-now/
  and linked them by hand to DBpedia. In doing so I noticed that many
 of the DBpedia pages were almost empty, even while the main Wikipedia
 page seemed informative and contained structured info panels.

 My working file is at
 http://buttons.notube.tv/moredata/archive.org/films/_titles.tab.txt

 example: http://dbpedia.org/page/A_Star_Is_Born_(1937_film)

 ...hmm it doesn't even have a link back to wikipedia, just Yago categories.

It looks like it does to me.  When I click on the sameAs link at the
bottom of the rendered HTML page, it jumps straight to Wikipedia.

BTW, you can get more structured info including Academy Award
info, Netflix links, etc at

http://rdf.freebase.com/rdf/wikipedia.en.A_Star_Is_Born_$00281937_film$0029/

but, unfortunately, it's not very easy to use the schema.

Tom

 Compare: http://en.wikipedia.org/wiki/A_Star_Is_Born_(1937_film)

 See the .txt file above for more examples. They don't all fail, but in
 general the level of data they contain is suprisingly poor.

 Thanks for any tips,

 cheers,

 Dan

 ps. random aside: I'm using this bookmarklet to hop from wiki to db
 ... sharing in case useful
 javascript:location.href=location.href.replace(/en.wikipedia.org\/wiki/,"dbpedia.org\/page")



Re: [Dbpedia-discussion] dbpedia data problems - old movie data pages almost empty

2011-01-31 Thread Tom Morris
On Mon, Jan 31, 2011 at 11:36 AM, Dan Brickley dan...@danbri.org wrote:

 I have just taken the list of old Archive.org-hosted movies in
 http://tech.blorge.com/Structure:%20/2010/08/11/top-40-best-free-legal-movies-you-can-download-right-now/
  and linked them by hand to DBpedia.
...
 My working file is at
 http://buttons.notube.tv/moredata/archive.org/films/_titles.tab.txt

p.s. Not to be controversial or anything, but that file basically
seems to be a tabified version of Sean Aune's blog post, which I
assume is copyrighted by him.  What license/permissions are associated
with your file?

Tom



Re: [Dbpedia-discussion] Companies

2010-11-11 Thread Tom Morris
You could use the Freebase data dumps to narrow down what you're
looking for and then go to DBpedia for any missing information.
They're done weekly and include both DBpedia IDs as well as the
original Wikipedia article number, so you can easily link to either.
http://wiki.freebase.com/wiki/Data_dumps

Anything with the type /business/business_operation should be a
company, division, subsidiary, etc.
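That type filter could be sketched over the quad dump like this (the tab-separated column layout is an assumption here; check the dump documentation for the exact format):

```python
import csv
import io

def company_ids(dump_lines):
    """Yield ids of topics typed /business/business_operation from a
    tab-separated quad dump (column layout assumed, not verified)."""
    for row in csv.reader(dump_lines, delimiter="\t"):
        if (len(row) >= 3 and row[1] == "/type/object/type"
                and row[2] == "/business/business_operation"):
            yield row[0]

sample = io.StringIO(
    "/en/ibm\t/type/object/type\t/business/business_operation\n"
    "/en/oxygen\t/type/object/type\t/chemistry/chemical_element\n"
)
print(list(company_ids(sample)))  # ['/en/ibm']
```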

You can use the data for anything you want as long as you provide attribution.

Tom

On Thu, Nov 11, 2010 at 6:58 AM, Robert Campbell rrc...@gmail.com wrote:
 Thanks Ed. Is there any way to do all of this offline? I assume since
 dbpedia provides datasets for download, I should be able to have an
 offline RDF database containing everything I need. I'm guessing the
 lookup service is online only, but I could try to find alternatives
 for that piece.


 On Thu, Nov 11, 2010 at 12:55 PM, Ed Summers e...@pobox.com wrote:
 On Thu, Nov 11, 2010 at 6:32 AM, Robert Campbell rrc...@gmail.com wrote:
 In summary: what's the best way to translate a company name to a
 dbpedia resource and what dataset actually contains the information
 shown in that URL for company resources?

 Did you run across http://lookup.dbpedia.org yet? Its ranking is
 quite good, and it has a nice xml webservice, e.g.

  http://lookup.dbpedia.org/api/search.asmx/KeywordSearch?QueryString=IBM&QueryClass=String&MaxHits=10

 Once you get the URL for the resource you want, you can resolve it and
 dig into the RDF to see what's there. There is also the SPARQL
 endpoint too [1] for when you get familiar with the RDF data that's in
 dbpedia.

 //Ed




Re: [Dbpedia-discussion] Companies

2010-11-11 Thread Tom Morris
On Thu, Nov 11, 2010 at 12:32 PM, Kingsley Idehen
kide...@openlinksw.com wrote:
 On 11/11/10 11:55 AM, Tom Morris wrote:
 You could use the Freebase data dumps to narrow down what you're
 looking for and then go to DBpedia for any missing information.
 They're down weekly and include both DBpedia IDs as well as the
 original Wikipedia article number, so you can easily link to either.
 http://wiki.freebase.com/wiki/Data_dumps

 Anything with the type /business/business_operation should be a
 company, division, subsidiary, etc.

 You can use the data for anything you want as long as you provide 
 attribution.

 Tom,

 Where are the RDF format dumps from Freebase?

As far as I know they provide an RDF endpoint, but not RDF dumps.
The wiki page that I linked to has detailed information on the dump
formats (quads similar to N3 triples and a lightly processed Wikipedia
dump format called WEX).

 If they aren't delivering
 RDF dumps, of what value are these dumps to someone working with
 Linked Data?

I've actually heard some people preach that linked data isn't solely
about RDF.  In this particular case the request was for _local data_
about _companies_.  There was no linked or Linked or RDF aspect to it.

Don't get me wrong.  I think Freebase RDF dumps would be nice.  It
just doesn't have anything whatsoever to do with what the user asked.

As long as we're talking about standards though, both DBpedia and
Freebase use largely private schemas/vocabularies, so even if you were
to get things in RDF you would still have a ton of non-standard stuff
to deal with.

Tom



Re: [Dbpedia-discussion] interrogating freebase and dbpedia from the same query

2010-06-21 Thread Tom Morris
On Mon, Jun 21, 2010 at 2:53 PM, Paul Houle p...@ontology2.com wrote:
 Benjamin Good wrote:
 Cassio,

  The short answer to your question (as I understood it) is that you could 
 not issue such a query to the dbpedia sparql endpoint by itself.  Somehow 
 you would need to get access to an endpoint that contains both the freebase 
 data as RDF and the mappings that Paul discusses here in order to run your 
 query.

  Please correct me if I am wrong!

 -Ben

    Ok, actually it is a bit more complex than this.  Both my files and the
 mapping files contain fbase URIs like

 http://rdf.freebase.com/ns/guid.9202a8c04000641f80ac9819

    If you go to ~that~ URL you get redirected to

 http://www.freebase.com/view/en/uab_hospital

    Now,  if you follow a few links you'll eventually find

 http://rdf.freebase.com/rdf/en.uab_hospital

    which contains the facts that you probably want about this subject
 in NT format.

A more direct way to do this is just ask for RDF instead of HTML with
your first request:

curl -L -H 'Accept: application/rdf+xml'
http://rdf.freebase.com/ns/guid.9202a8c04000641f80ac9819

which will redirect you to
http://rdf.freebase.com/rdf/guid.9202a8c04000641f80ac9819

Beware though that there's currently a bug in Freebase which truncates
the results to the first 100 triples.  For the vast majority of the
topics this isn't an issue, but if you receive exactly 100 triples,
you should probably assume that you don't have them all.
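A defensive check for that bug might look like the following sketch (`looks_truncated` and the sample data are illustrative only, not part of any real Freebase client):

```python
def looks_truncated(ntriples_text, limit=100):
    """Heuristic for the truncation bug described above: if exactly
    `limit` triples come back, treat the result as possibly cut off."""
    triples = [line for line in ntriples_text.splitlines()
               if line.strip() and not line.startswith("#")]
    return len(triples) == limit

# 100 fake triples -> flagged as possibly truncated
sample = "\n".join('<http://example.org/s> <http://example.org/p> "v%d" .' % i
                   for i in range(100))
print(looks_truncated(sample))  # True
```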

Tom



Re: [Dbpedia-discussion] Problem with SPARQL query returning 0 results

2010-05-04 Thread Tom Morris
2010/5/4 Benjamin Großmann benja...@neofonie.de:

 The property for abstract has been changed since DBPedia Version 3.5:
 Instead of dbpedia2:abstract you have to query for dbo:abstract now. Then 
 your query works.

Do properties get deprecated for some period of time before they go
away?  Are there tools to help application writers find their uses of
deprecated properties?  More generally, what strategies can
application writers use to keep their applications from breaking when
new versions are released?
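As an illustration of the rename (the resource here is arbitrary; only the dbo: prefix is the point):

```sparql
PREFIX dbo: <http://dbpedia.org/ontology/>

# dbo:abstract replaces the pre-3.5 dbpedia2:abstract
SELECT ?abstract WHERE {
  <http://dbpedia.org/resource/Berlin> dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
```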

Tom



Re: [Dbpedia-discussion] DBpedia 3.5.1?

2010-04-28 Thread Tom Morris
On Thu, Apr 29, 2010 at 12:24 AM, Kingsley Idehen
kide...@openlinksw.com wrote:
 Tom Morris wrote:

 On Wed, Apr 28, 2010 at 6:28 PM, Kingsley Idehen kide...@openlinksw.com
 wrote:
 We are syncing Live with Wikipedia now that 3.5.1 cut is out.


 Do those announcements not get posted here?  Where do they get posted?

 We did announce last year. Anyway, until now there have been issues with
 extraction (the evolution from the old scheme to the new also affected the
 DBpedia-Live effort).

 We will make an announcement in the coming days :-)


Time warp or are we talking about two different things?

I found the DBpedia 3.5.1 announcement here (only):
http://blog.dbpedia.org/2010/04/28/dbpedia-351-released/

I don't know if it'll get posted to the mailing list eventually, but
the announcement says that it's a bug fix release based on March 2010
Wikpedia dumps and includes:

The new release provides the following improvements and changes
compared to the DBpedia 3.5 release:

   1. Some abstracts contained unwanted WikiText markup. The detection
of infoboxes and tables has been improved, so that even most pages
with syntax errors have clean abstracts now.
   2. In 3.5 there has been an issue detecting interlanguage links,
which led to some non-english statements having the wrong subject.
This has been fixed.
   3. Image references to dummy images (e.g.
http://en.wikipedia.org/wiki/Image:Replace_this_image.svg) have been
removed.
   4. DBpedia 3.5.1 uses a stricter IRI validation now. Care has been
taken to only discard URIs from Wikipedia, which are clearly invalid.
   5. Recognition of disambiguation pages has been improved,
increasing the size from 247,000 to 769,000 triples.
   6. More geographic coordinates are extracted now, increasing its
number from 1,200,000 to 1,500,000 in the english version.
   7. For this release, all Freebase links have been regenerated from
the most recent freebase dump.



Re: [Dbpedia-discussion] Links to Geonames

2010-04-20 Thread Tom Morris
On Tue, Apr 20, 2010 at 1:45 AM, Jens Lehmann
lehm...@informatik.uni-leipzig.de wrote:

 For some link datasets in DBpedia, there is no proper update mechanism
 included in the DBpedia SVN repository. In such cases, the link data
 sets are copied from the previous release. For Geonames, this means that
 the links you see were not recently updated (and can be as old as one or
 two years).

Is there a list someplace of who is responsible for each of these link
sets and when they were last updated?

I think I remember reading somewhere that the Freebase links were in a
similar situation.  Also, the last time the links were done they were
made to the GUID form of the Freebase identifier, which I'm not sure
is the best target (conversely, Freebase generates DBpedia links for *every*
Wikipedia article name, including redirects and misspellings, which
doesn't seem right either).

Tom



Re: [Dbpedia-discussion] Links to Geonames

2010-04-20 Thread Tom Morris
On Tue, Apr 20, 2010 at 3:22 PM, Jens Lehmann
lehm...@informatik.uni-leipzig.de wrote:

 Hello,

 Tom Morris wrote:
 On Tue, Apr 20, 2010 at 1:45 AM, Jens Lehmann
 lehm...@informatik.uni-leipzig.de wrote:

 For some link datasets in DBpedia, there is no proper update mechanism
 included in the DBpedia SVN repository. In such cases, the link data
 sets are copied from the previous release. For Geonames, this means that
 the links you see were not recently updated (and can be as old as one or
 two years).

 Is there a list someplace of who is responsible for each of these link
 sets and when they were last updated?

 If you go to the download page and click on a data set, you get some
 information (or scroll to the bottom of the page):
 http://wiki.dbpedia.org/Downloads35

Thanks. I'd seen that.  I was hoping for something more along the
lines of an email address or a person's name.
The entries in question are:

Links to Freebase - Links between DBpedia and Freebase. Update
mechanism: unclear/copy over from previous release.

Links to Geonames - Links between geographic places in DBpedia and
data about them in the Geonames database. Provided by the Geonames
people. Update mechanism: unclear/copy over from previous release.

 Please note that within the last year the extraction framework was
 rewritten and the live extraction was implemented. It's difficult to
 improve all aspects of DBpedia within a short timeframe and most
 interlinking data sets were never designed for long term maintenance,
 but rather one time efforts. (Anyone is invited to contribute mapping
 code to DBpedia, of course, to improve the situation.)

I'm willing to help out with that, but it would seem like the people
who did the original mappings are likely to have knowledge, and
perhaps even code, from their previous efforts which would be highly
applicable to the task.  Is all that knowledge really lost forever?

Tom



[Dbpedia-discussion] Large companies all have exactly 151, 000 employees?

2010-04-16 Thread Tom Morris
This example query from the DBpedia page returns a list of companies
which all have exactly 151,000 employees:

http://dbpedia.org/snorql/?query=SELECT+%3Fsubject+%3Femployees+%3Fhomepage+WHERE+{%0D%0A%3Fsubject+rdf:type+%3Chttp://dbpedia.org/class/yago/Company108058098%3E.%0D%0A%3Fsubject+dbpedia2:numEmployees+%3Femployees%0D%0AFILTER+(xsd:integer(%3Femployees)+%3E%3D+5).%0D%0A%3Fsubject+foaf:homepage+%3Fhomepage.%0D%0A}+ORDER+BY+DESC(xsd:integer(%3Femployees))%0D%0ALIMIT+20%0D%0A

That seems a rather improbable result.  I'm not a real SPARQL guru,
but I don't see anything obviously wrong with the query.  Is the query
incorrect or is the issue with the database or the server?
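Decoded from the snorql URL above (whitespace added, otherwise unchanged), the query reads:

```sparql
SELECT ?subject ?employees ?homepage WHERE {
  ?subject rdf:type <http://dbpedia.org/class/yago/Company108058098> .
  ?subject dbpedia2:numEmployees ?employees
  FILTER (xsd:integer(?employees) >= 5) .
  ?subject foaf:homepage ?homepage .
} ORDER BY DESC(xsd:integer(?employees))
LIMIT 20
```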

Tom



Re: [Dbpedia-discussion] [Data-modeling] English Words

2009-08-18 Thread Tom Morris
[I left dbpedia-discuss on the distribution, but I'm not sure why they
got tacked on at the very end of the conversation.  They'll probably
need to go check the data-modeling archives to get caught up on what
the conversation was about.]

On Tue, Aug 18, 2009 at 10:01 AM, Paul Houle p...@ontology2.com wrote:
 Iain Sproat wrote:
 Are we agreed that a freebase topic is a synset (and vice versa)?

    Here's a better example of a topic that has two meanings,

 http://www.freebase.com/view/en/oxygen

 or

 http://en.wikipedia.org/wiki/Oxygen

    Wikipedia goes right out and says it...This article is about the
 chemical element and its most stable form, O_2 or dioxygen. For other
 forms of this element, see Allotropes of Oxygen.

I think Iain was referring to the defined intent of a Freebase topic
and a synset.  The example that you've identified represents a bug
where the difference between the definition of a Wikipedia article
(whatever the editors wants it to be) and a Freebase topic (a single
concept) hasn't yet been cleaned up.

    This is annoying because you can't make entirely truthful statements
 about Oxygen if you conflate the element and the diatomic gas.

Wikipedia is what it is, so it's really up to the users of structured
data to decide how much effort they want to put into tidying things
up.  Unfortunately, there are lots of folks who would be happy to make
use of a well structured data set when it's done, but not a lot of
folks (so far?) who are interested in contributing time and effort to
making it better structured.  Unless someone figures out how to break
the circle, we'll all be left bemoaning the inadequate and annoying
state of the data.

 It seems to me that the problem is tractable,
 but people have stopped short of the work it takes to do it:

So did you click the little split flag on the edit page?  Better
yet, did you use one of the split tools to cleave the properties into
two sets and distribute them among the appropriate topics for the gas
and the chemical element?

I agree that people need to deal with this, but people includes
everyone with a vested interest.  If you need better data for your
app, that includes you.

Tom



Re: [Dbpedia-discussion] The Lovable ListOf

2009-08-03 Thread Tom Morris
On Mon, Aug 3, 2009 at 12:55 PM, Paul Houle p...@ontology2.com wrote:

  For what it's worth,  metaweb seems to largely remove ListOf pages
 when adding wikipedia resources to Freebase.

Actually they get added to Freebase, but usually without a type
(unless they accidentally get mistyped as a Person or something).  When
users come across them they flag them for deletion, the proposal gets
voted on, then the Metaweb machinery kicks in and acts on the outcome
of the vote.

There's a ton of this stuff in Wikipedia.  Some other examples of
patterns include:

Discography of
Filmography of
National Register of Historic Places in
History of
Telecommunications [or any other topic] in geography
History of Telecommunications in

and on and on and ...

Tom
