Re: DBpedia-based entity recognition service / tool?
Hi Matthias,

We worked on something similar: entity type discovery using linked open data. Our project: given a corpus of documents in the same domain, identify the specific entity types that occur in the documents. The objective was to search a corpus for documents by specific entity type. For example: find articles that are about RDBMSs.

Standard NER tools identify high-level types such as persons, organizations and places because they have been trained on general corpora. I assume tools like OpenCalais have been trained on news-like documents and Zemanta on blog-like documents. We were interested in identifying specific types, such as RDBMS when the word "Oracle" shows up in the text. To do that, we followed several domain term extraction techniques. We used LOD, specifically DBpedia, Freebase and OpenCyc, to disambiguate terms and also to retrieve the entities.

Honestly, evaluation is pretty hard to do, but our current implementation was not that bad (75% precision and 55% recall). We built upon work by IBM on creating a vocabulary from text using LOD [1]. Let me see if I can clean up the code and publish it as a service.

[1] http://data.semanticweb.org/conference/iswc/2009/paper/inuse/143/html

Juan Sequeda
(575) SEQ-UEDA
www.juansequeda.com

On Tue, Feb 2, 2010 at 6:26 AM, Matthias Samwald samw...@gmx.at wrote:

Dear LOD community,

I would be glad to hear your advice on how best to accomplish a simple task: extracting DBpedia entities (identified by DBpedia URIs) from a string of text, with good accuracy and recall, and possibly with some options to constrain the recognized entities to some subset of DBpedia based on categories. The tool or service should be performant enough to process large numbers of strings in a reasonable amount of time. Given the prolific creation of tiny tools and services in this community, I am puzzled by my inability to find anything that accomplishes this task. Could you point me to something like that?

Are there tools/services for Wikipedia that I could use? Zemanta seems to be too much geared towards 'enhanced blogging', while OpenCalais does not return Wikipedia/DBpedia identifiers. Please correct me if I am wrong.

Cheers,
Matthias
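[For the archive: one starting point for the task Matthias describes is the DBpedia Lookup service, which maps a surface string to candidate DBpedia URIs and can restrict hits by ontology class. The sketch below is illustrative only: the endpoint URL, the parameter names (QueryString, QueryClass, MaxHits) and the response schema are assumptions from memory, not verified against the live service.]

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Assumed endpoint of the DBpedia Lookup service (unverified).
LOOKUP_URL = "http://lookup.dbpedia.org/api/search.asmx/KeywordSearch"

def build_lookup_url(term, query_class=None, max_hits=5):
    """Build a Lookup request URL for a surface string. QueryClass (if the
    service supports it as assumed here) restricts hits to a DBpedia
    ontology class - one way to approximate the 'subset of DBpedia'
    constraint Matthias asks for."""
    params = {"QueryString": term, "MaxHits": str(max_hits)}
    if query_class:
        params["QueryClass"] = query_class
    return LOOKUP_URL + "?" + urllib.parse.urlencode(params)

def parse_lookup_xml(xml_text):
    """Extract result URIs from a Lookup XML response (schema assumed).
    Matching by local tag name keeps this independent of namespaces."""
    root = ET.fromstring(xml_text)
    uris = []
    for elem in root.iter():
        if elem.tag.split("}")[-1] == "URI" and elem.text:
            uris.append(elem.text.strip())
    return uris

# A hand-made response in the assumed shape, for illustration only:
SAMPLE = """<ArrayOfResult>
  <Result>
    <Label>Oracle Database</Label>
    <URI>http://dbpedia.org/resource/Oracle_Database</URI>
  </Result>
</ArrayOfResult>"""

print(parse_lookup_xml(SAMPLE))  # ['http://dbpedia.org/resource/Oracle_Database']
```

The actual HTTP fetch is omitted; in practice one would pass build_lookup_url(...) to an HTTP client and feed the body to parse_lookup_xml.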
Announcing Virtuoso Open-Source Edition v 6.1.0
Hi,

OpenLink Software is pleased to announce the official release of Virtuoso Open-Source Edition, Version 6.1.0.

***IMPORTANT NOTE*** for upgraders from pre-6.x versions: The database file format has substantially changed between VOS 5.x and VOS 6.x. To upgrade your database, you must dump all data from the VOS 5.x database and re-load it into VOS 6.x. Complete instructions may be found here.

***IMPORTANT NOTE*** for upgraders from earlier 6.x versions: The database file format has not changed, but the introduction of a newer RDF index requires you to run a script to upgrade the RDF_QUAD table. Since this can be a lengthy task and take extra disk space (up to twice the space used by the original RDF_QUAD table may be required during conversion), this is not done automatically on startup. Complete instructions may be found here.

New and updated product features include:

* Database engine
  - Added new 2+3 index scheme for RDF_QUAD table
  - Added new inlined string table for RDF_QUAD
  - Added optimizations to cost-based optimizer
  - Added RoundRobin connection support
  - Removed deprecated samples/demos
  - Fixed buffer alignment to sizeof pointer to avoid crash on strict-checking platforms like SPARC
  - Fixed text of version mismatch messages
  - Fixed issue with XA exception, double rollback, transact timeout
  - Merged enhancements and fixes from V5 branch

* SPARQL and RDF
  - Added support for owl:inverseOf, owl:SymmetricProperty, and owl:TransitiveProperty
  - Added DB.DBA.BEST_LANGMATCH() and bif_langmatches_pct_http()
  - Added initial support for SPARQL-FED
  - Added initial support for SERVICE { ... }
  - Added support for expressions in LIMIT and OFFSET clauses
  - Added built-in predicate IsRef()
  - Added new error reporting for unsupported syntax
  - Added RDF box id-only serialization; stays compatible with 5/6
  - Added support for SPARQL INSERT DATA / DELETE DATA
  - Added SPARQL 1.1 syntax sugar re. HAVING clause for filtering on GROUP BY
  - Added special code generator for optimized handling of: SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
  - Added support for HTML+RDFa representation re. output from SPARQL CONSTRUCT and DESCRIBE queries
  - Added support for output:maxrows
  - Improved SPARQL parsing and SQL codegen for negative numbers
  - Improved recovery of lists in DB.DBA.RDF_AUDIT_METADATA()
  - Fixed iSPARQL compatibility with 3rd-party SPARQL endpoints
  - Fixed bad init in trans node if multiple inputs or step output values
  - Fixed redundant trailing '' in results of TTL load when IRIs contain special chars
  - Fixed problem with rfc1808_expand_uri not using proper macros, and allocate an extra byte for strings
  - Fixed: when a different TZ is used, find offset and transform via GMT
  - Fixed graph-level security in cluster
  - Fixed redundant equalities in case of multiple OPTIONALs with same variable
  - Fixed BOOLEAN_OF_OBJ in case of incomplete boxes
  - Fixed NTRIPLES serialization of triples
  - Merged enhancements and fixes from V5 branch

* Sponger Middleware
  - Added Extractor Cartridges mapping Zillow, O'Reilly, Amazon, Googlebase, BestBuy, CNET, and Crunchbase content to the GoodRelations Ontology
  - Added Extractor Cartridges for Google Spreadsheet, Google Documents, Microsoft Office docs (Excel, PowerPoint, etc.), OpenOffice, CSV, text files, Disqus, Twitter, and Discogs
  - Added Meta Cartridges covering Google Search, Yahoo! Boss, Bing, Sindice, Yelp, NYT, NPR, AlchemyAPI, Zemanta, OpenCalais, UMBEL, GetGlue, Geonames, DBpedia, Linked Open Data Cloud, BBC Linked Data Space, sameAs.org, whoisi, uclassify, RapLeaf, Journalisted, Dapper, Revyu, Zillow, BestBuy, Amazon, eBay, CNET, Discogs, and Crunchbase

* ODS Applications
  - Added support for CKEditor
  - Added new popup calendar based on OAT
  - Added REST and Virtuoso PL based controllers for user API
  - Added new API functions
  - Added FOAF+SSL groups
  - Added feed admin rights
  - Added Facebook registration and login
  - Removed deprecated rte and Kupu editors
  - Removed support for IE 5 and 6 compatibility
  - Merged enhancements and fixes from V5 branch

Other links:

Virtuoso Open Source Edition:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/
* Download Page: http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload

OpenLink Data Spaces:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex
* SPARQL Usage Examples (re. SIOC, FOAF, AtomOWL, SKOS): http://virtuoso.openlinksw.com/wiki/main/Main/ODSSIOCRef

Best Regards,
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink
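[For the archive: the release notes call out a special code generator for the graph-enumeration query and the new output:maxrows option. The sketch below builds a standard SPARQL-protocol GET request for that query against a Virtuoso endpoint. The default endpoint path (/sparql on port 8890) is Virtuoso's usual default; the exact syntax of the 'define output:maxrows' pragma is an assumption.]

```python
import urllib.parse

def graph_list_request(endpoint="http://localhost:8890/sparql", maxrows=100):
    """Build a SPARQL-protocol GET URL that enumerates named graphs.
    The 'define output:maxrows' pragma is the Virtuoso extension named in
    the release notes (syntax assumed); the query itself is the one the
    notes say gets a dedicated code generator."""
    query = (
        "define output:maxrows %d\n"
        "SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }" % maxrows
    )
    params = urllib.parse.urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return endpoint + "?" + params

url = graph_list_request()
print(url)
```

Fetching the URL with any HTTP client should return a SPARQL JSON result set listing one binding for ?g per named graph.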
Re: DBpedia-based entity recognition service / tool?
Juan Sequeda wrote:
we followed several domain term extraction techniques.

Any chance you could name-drop / point to a few of the techniques? I'm very interested in this myself and, in all honesty, have no idea where to start (other than a crude string split and checking word combinations against a dictionary, which is not very practical!).

Many Regards,
Nathan
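[For the archive: the "crude string split + dictionary" baseline Nathan mentions can at least be made concrete. A minimal sketch, with all names mine; the gazetteer mapping lowercase surface forms to DBpedia URIs is a toy stand-in:]

```python
def match_terms(text, gazetteer, max_n=3):
    """Crude baseline: at each position, greedily check word combinations
    (up to max_n words, longest first) against a dictionary of lowercase
    surface forms -> URIs. Exactly the naive approach described above;
    real term extraction does considerably better."""
    tokens = text.split()
    matches = []
    i = 0
    while i < len(tokens):
        hit = None
        for n in range(min(max_n, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n]).strip(".,;:!?").lower()
            if candidate in gazetteer:
                hit = (candidate, gazetteer[candidate], n)
                break
        if hit:
            matches.append((hit[0], hit[1]))
            i += hit[2]  # skip past the matched span
        else:
            i += 1
    return matches

gazetteer = {"oracle": "http://dbpedia.org/resource/Oracle_Database"}  # toy dictionary
print(match_terms("We store everything in Oracle.", gazetteer))
```

The obvious weakness is visible in the example: nothing here disambiguates "Oracle" the database from "Oracle" the company, which is exactly the problem Juan's LOD-based approach addresses.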
Announcing Virtuoso Open-Source Edition v 5.0.13
Hi,

OpenLink Software is pleased to announce a new release of Virtuoso Open-Source Edition, version 5.0.13. This version includes:

* Database engine
  - Added configuration option BuffersAllocation
  - Added configuration option AsyncQueueMaxThreads
  - Added docbook-xsl-1.75.2
  - Added RoundRobin connection support
  - Removed deprecated samples/demos
  - Fixed copyright and license clarification
  - Fixed: use MD5 from OpenSSL when possible
  - Fixed issue with XA exception, double rollback, transact timeout
  - Fixed issue reading last chunk in http session
  - Fixed: use pipeline client in crawler
  - Fixed: accept different headers in pipeline request
  - Fixed: do not post when no post parameters
  - Fixed checkpoint messages in log
  - Fixed read after allocated memory
  - Fixed: shortened long URLs in the crawler's view to avoid UI breakage
  - Fixed building with external zlib
  - Removed support for deprecated JDK 1.0, 1.1 and 1.2
  - Rebuilt JDBC drivers

* SPARQL and RDF
  - Added initial support for SPARQL-FED
  - Added initial support for SERVICE { ... }
  - Added support for expressions in LIMIT and OFFSET clauses
  - Added built-in predicate IsRef()
  - Added new error reporting for unsupported syntax
  - Added RDF box id-only serialization; stays compatible with 5/6
  - Added support for SPARQL INSERT DATA / DELETE DATA
  - Added support for HAVING in SPARQL
  - Added special optimizations for handling: SPARQL SELECT DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
  - Added support for HTML+RDFa representation re. SPARQL CONSTRUCT and DESCRIBE query results
  - Added support for output:maxrows
  - Updated ontologies API
  - Updated iSPARQL application
  - Fixed IRI parts syntax to match SPARQL 1.0 W3C recommendation
  - Fixed support for XMLLiteral
  - Fixed bad box flags for strings for bnodes and types
  - Fixed: replace lost filters with equivs that have no spog vars and no good subequivs
  - Fixed CNET double awol:content
  - Fixed Googlebase query results with multiple entries
  - Fixed Googlebase location info
  - Fixed default sitemap crawling functions/pages
  - Fixed: use SPARUL LOAD instead of SOFT
  - Fixed: make sure version is intact, as changes to the .ttl file must be reflected in sparql.sql
  - Fixed missing qualification of aggregate
  - Fixed compilation of ORDER BY column_idz clause in iterator of sponge with loop
  - Fixed UNION of SELECTs and for multiple OPTIONALs at one level with good and bad equalities
  - Fixed support for define output:format JSON
  - Fixed crash of rfc1808_expand_uri on base without schema
  - Fixed redundant trailing '' in results of TTL load when IRIs contain special chars
  - Fixed option (score ...) in a gp with multiple OPTIONAL {...}
  - Fixed: when a different TZ is used, must find offset and transform via GMT
  - Fixed SPARQL parsing and SQL codegen for negative numbers
  - Fixed some 'exotic' cases of NT outputs

* ODS Applications
  - Added support for CKEditor
  - Added new popup calendar based on OAT
  - Added VSP and REST implementation for user API
  - Added new API functions
  - Added FOAF+SSL groups
  - Added feed admin rights
  - Added Facebook registration and login
  - Removed support for Kupu editor
  - Removed support for rte editor
  - Removed support for IE 5 and 6 compatibility
  - Fixed users paths to physical location
  - Fixed problem with activity pages

Other links:

Virtuoso Open Source Edition:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/
* Download Page: http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload

OpenLink Data Spaces:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex
* SPARQL Usage Examples (re. SIOC, FOAF, AtomOWL, SKOS): http://virtuoso.openlinksw.com/wiki/main/Main/ODSSIOCRef

Best Regards,
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink
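[For the archive: both the 5.0.13 and 6.1.0 release notes list new support for SPARQL INSERT DATA / DELETE DATA. A minimal helper that serializes IRI triples into such a SPARQL 1.1 update statement; the helper name and shape are mine for illustration, not anything Virtuoso ships:]

```python
def insert_data(graph_iri, triples):
    """Serialize (s, p, o) triples of IRIs into a SPARQL 1.1 INSERT DATA
    statement targeting one named graph. Objects are assumed to be IRIs
    here; literal objects would need proper quoting and escaping."""
    lines = ["  <%s> <%s> <%s> ." % t for t in triples]
    return "INSERT DATA { GRAPH <%s> {\n%s\n} }" % (graph_iri, "\n".join(lines))

stmt = insert_data(
    "http://example.org/g",
    [("http://example.org/s", "http://example.org/p", "http://example.org/o")],
)
print(stmt)
```

The resulting string would be POSTed to the endpoint's update interface; unlike a templated INSERT, INSERT DATA carries only ground triples, which is why no WHERE clause appears.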
Re: [Virtuoso-users] Announcing Virtuoso Open-Source Edition v 6.1.0
Wow - that's a nice release! Did I read it correctly; all the cartridges for the Sponger are now in v6.1??? Also didn't note mention of the GEO extension, is that for Virtuoso Commercial only at this time?

Many Regards Congrats,
Nathan

Hugh Williams wrote:
[snip - full 6.1.0 announcement quoted above]
Re: [Virtuoso-users] Announcing Virtuoso Open-Source Edition v 6.1.0
Nathan wrote:
Wow - that's a nice release! Did I read it correctly; all the cartridges for the Sponger are now in v6.1???

We just have new Cartridges (Extractor and Meta). The Extractors are Open Source (versions 5 and 6) while the Meta Cartridges are commercial only.

Also didn't note mention of the GEO extension, is that for Virtuoso Commercial only at this time?

Yes, re. SPARQL-GEO. Thus, the fundamental differentiators between the Open Source and Commercial Editions come down to:

1. Sponger's Meta Cartridges -- these embellish the basic Extractor Cartridge generated graphs by performing LOD Cloud (and other Linked Data Space) lookups and joins
2. SPARQL-GEO GeoSpatial indexing in general (so it applies to the SQL engine also)
3. Virtual Database Functionality -- RDF Views over ODBC- and JDBC-accessible data sources
4. Replication
5. Clustering / High Availability

Kingsley

Hugh Williams wrote:
[snip - full 6.1.0 announcement quoted above]
RE: DBpedia-based entity recognition service / tool?
Matthias,

OpenCalais does have links to DBpedia URIs for a large subset of entities. The DBpedia URIs are not included in the OpenCalais output but in the Linked Data endpoint. For example:
http://d.opencalais.com/er/geo/city/ralg-geo1/f08025f6-8e95-c3ff-2909-0a5219ed3bfa

The entities which have links to DBpedia are documented here:
http://www.opencalais.com/documentation/linked-data-entities

Rafi

-----Original Message-----
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On Behalf Of Matthias Samwald
Sent: Tuesday, February 02, 2010 2:26 PM
To: public-lod@w3.org
Subject: DBpedia-based entity recognition service / tool?
[snip - question quoted in full above]
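[For the archive: Rafi's point means the DBpedia URI can be recovered by dereferencing the OpenCalais entity URI with an RDF Accept header and following its owl:sameAs links. A minimal sketch of the link-extraction half, on an N-Triples representation (the HTTP fetch and the content negotiation it requires are omitted and assumed to be available):]

```python
import re

# Matches a triple whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r'<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.')
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def dbpedia_links(ntriples):
    """Pull DBpedia URIs out of owl:sameAs triples in an N-Triples
    document, such as one fetched from an OpenCalais entity URI.
    Literal-object triples simply fail the regex and are skipped."""
    links = []
    for line in ntriples.splitlines():
        m = TRIPLE.match(line.strip())
        if m and m.group(2) == SAME_AS and "dbpedia.org" in m.group(3):
            links.append(m.group(3))
    return links

# A hand-made sample triple in the relevant shape:
SAMPLE = (
    '<http://d.opencalais.com/er/geo/city/x> '
    '<http://www.w3.org/2002/07/owl#sameAs> '
    '<http://dbpedia.org/resource/Paris> .'
)
print(dbpedia_links(SAMPLE))  # ['http://dbpedia.org/resource/Paris']
```

A full RDF parser (e.g. rdflib) would be the robust choice; the regex is only enough for well-formed IRI-object N-Triples.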
EKAW 2010 – Call for Workshop Proposals
Apologies for cross-postings. Please send to interested colleagues and students.

*** Call for Workshop Proposals ***
*** Knowledge Engineering and Knowledge Management EKAW 2010 ***
11th October - 15th October 2010 - Lisbon, Portugal
http://ekaw2010.inesc-id.pt
See also the workshop call at: http://www.siegfried-handschuh.net/events/EKAW2010/

Background and Motivation
---
Workshops provide members of a community a forum to discuss common interests in a focused way. If you are working in an emerging area in Knowledge Acquisition, Knowledge Engineering and/or Knowledge Management, consider organizing a workshop. Workshops are a chance to meet like-minded researchers and discover what others are doing. A workshop offers a good opportunity for young researchers to present their work and to obtain feedback from an interested community. Successful workshops may result in edited books or special issues of international journals.

EKAW 2010 is looking for exciting proposals for half-day or full-day workshops. Each workshop should generate discussions that give the EKAW community a new, organized way of thinking about the topic, or ideas that suggest promising directions for future research. A successful workshop can move the field forward and help build community. EKAW workshops provide an informal setting where participants have the opportunity to discuss specific technical topics in an atmosphere that fosters the active exchange of ideas. Our aim for the workshop program is to promote and collect multidisciplinary research directions that contribute to knowledge management and engineering. Members of all research areas related to knowledge are invited to submit workshop proposals.

EKAW 2010 - The 17th International Conference on Knowledge Engineering and Knowledge Management - is concerned with all aspects of eliciting, acquiring, modelling and managing knowledge, and its role in the construction of knowledge-intensive systems and services for the semantic web, knowledge management, e-business, natural language processing, intelligent information integration, etc. We seek high-quality proposals for workshops on topics related to the conference. Of particular interest are proposals that address one of the following areas:

1) Knowledge Management
2) Knowledge Engineering and Acquisition
3) Knowledge In Use
4) Social and Cognitive Aspects of Knowledge Engineering
5) Special focus: knowledge management and engineering by the masses
   * Human-machine synergy in knowledge acquisition
   * Incentives for knowledge creation and semantic annotation
   * Enhancing human productivity (e.g. knowledge workers)
   * Social and human factors in knowledge management
   * Collective and collaborative intelligence in knowledge management
   * Social tagging and folksonomies, social networks
   * Web 2.0 approaches to KM (including semantic wikis, folksonomies, etc.)
   * Games with a Purpose and KM
   * Linked Open Data / Web of Data

Submission Requirements
---
Each workshop should have one or more organizers and an international program committee. Proposals for workshops should be a maximum of 1000 words, and should contain the following information to judge the importance, quality, and benefits for the research community:

A cover page including:
- Workshop title
- Name, affiliation, full postal address, homepage and e-mail address for each organizer
- Identification of the primary contact person(s)
- One-paragraph biography for each workshop organizer

Motivation and Objectives: a brief description of the workshop topic and goals, its relevance to EKAW 2010, and its significance for the research field.
- Motivation: What is the overall topic of the workshop? Relation to the topic of the conference, including a brief discussion of why and to whom the workshop is of interest.
- A tentative list of PC members, clearly stating those that have already accepted
- The tentative dates (submission, notification, camera-ready deadline, etc.)

Workshop intentions and proposals should be submitted via email in PDF format to the EKAW 2010 workshop chair and selection committee: ekaw10worksh...@lists.deri.org, with a subject line of: EKAW 2010 Workshop Proposal

Submission Dates and Details
---
Friday 19 March - Informal "intention to submit" email with general details
Friday 2 April - Submission of proposal by email
Friday 16 April - Notification of workshop acceptance
Friday 23 April - Publication of workshop's Call for Papers; setup of workshop web site
Wed 1 September - Deadline for camera-ready workshop notes and other information

Organizer's Responsibilities
---
The organizers of accepted workshops are expected to:
- Define, produce and distribute the
Re: DBpedia-based entity recognition service / tool?
Nathan,

On Feb 4, 2010, at 8:10, Nathan nat...@webr3.org wrote:
any chance you could name drop / point to a few of the techniques - very interested in this myself and in all honesty, no idea where to start

Have a look at GATE: http://gate.ac.uk/
Re: DBpedia-based entity recognition service / tool?
On Thu, Feb 4, 2010 at 5:10 AM, Nathan nat...@webr3.org wrote:
any chance you could name drop / point to a few of the techniques - very interested in this myself and in all honesty, no idea where to start (other than a crude string split and check word combinations against a dictionary - not very practical!)

Yes, that would be very naive :) Look into the Term Extraction [1] area of Information Extraction. There are several techniques which can be combined, including POS tagging, phrase chunking, etc.

[1] http://en.wikipedia.org/wiki/Terminology_extraction
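[For the archive: short of a real POS-tagging and phrase-chunking pipeline (as Juan suggests, e.g. via NLTK or GATE), one very common crude heuristic is to take runs of capitalized words as candidate terms and rank them by frequency. A toy stand-in, not a substitute for the techniques named above:]

```python
import re
from collections import Counter

# One or more consecutive capitalized words, e.g. "Oracle Database".
CAP_SEQ = re.compile(r'\b(?:[A-Z][A-Za-z0-9]*)(?:\s+[A-Z][A-Za-z0-9]*)*\b')

def candidate_terms(text, min_count=1):
    """Very crude candidate-term extraction: capitalized-word runs ranked
    by frequency. A real pipeline would POS-tag and chunk noun phrases
    instead, which also catches lowercase domain terms this misses."""
    counts = Counter(m.group(0) for m in CAP_SEQ.finditer(text))
    return [term for term, c in counts.most_common() if c >= min_count]

text = ("Oracle Database and MySQL are relational databases. "
        "Oracle Database is proprietary.")
print(candidate_terms(text))  # ['Oracle Database', 'MySQL']
```

The candidates would then be disambiguated against LOD sources, as in Juan's approach, rather than trusted directly.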
Re: DBpedia-based entity recognition service / tool?
On Tue, Feb 2, 2010 at 10:21 AM, Nathan nat...@webr3.org wrote:

I should probably be replying here as I've been doing this, and working on this, for the past few months. I've found from experience that the only viable way to address this need is as follows:

1: Pass content through to both OpenCalais and Zemanta
2: Combine the results to provide a list of string terms to be associated with dbpedia resources (where zemanta hasn't already done it)
3: Look up each string resource and try to associate it to the string
4: Return all matches with results to the end user in order for them to manually confirm the results.

Steps 3 and 4 are the killers here, because no matter how good the service, you can't always match to exact URIs (sometimes you can only determine that you may mean one of X many ambiguous URIs); and ...

I don't understand the roundabout approach, since both of these services output Freebase identifiers, and those are all mapped explicitly to DBpedia by owl:sameAs and to Wikipedia via normal URLs. Why not just follow the links directly? The only time this won't work is where the concept was sourced from someplace other than Wikipedia, or Wikipedia article(s) were split/merged so there isn't a 1:1 correspondence.

Tom
Re: DBpedia-based entity recognition service / tool?
Tom Morris wrote:
I don't understand the roundabout approach since both of these services output Freebase identifiers and they are all mapped explicitly to both DBpedia by owl:sameAs and Wikipedia via normal URL. Why not just follow the links directly?
[snip]

Where they are available, I do - but you still get a number of terms which are not mapped and can be mapped by doing lookups; and where you are unsure, prompting the user to do the disambiguation provides a fuller result :) Also, obviously, as the services improve the need for lookups drops; and finally it allows for domain-specific document/thing relations.

regards!
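[For the archive: steps 2-4 of Nathan's pipeline - merging the OpenCalais and Zemanta results, then separating confident matches from ones needing manual confirmation - can be sketched roughly as below. The data shapes (term -> set-of-URIs per service) are my assumption, not either service's actual output format:]

```python
def merge_annotations(per_service):
    """Merge term -> URI-set annotations from several services. Terms that
    end up with exactly one candidate URI are auto-accepted; terms with
    several candidates are returned separately for the manual-confirmation
    step (step 4 above)."""
    merged = {}
    for annotations in per_service:
        for term, uris in annotations.items():
            merged.setdefault(term, set()).update(uris)
    resolved = {t: next(iter(u)) for t, u in merged.items() if len(u) == 1}
    ambiguous = {t: sorted(u) for t, u in merged.items() if len(u) > 1}
    return resolved, ambiguous

# Hypothetical per-service outputs, already normalized to this shape:
calais = {"Oracle": {"http://dbpedia.org/resource/Oracle_Corporation"}}
zemanta = {"Oracle": {"http://dbpedia.org/resource/Oracle_Database"},
           "Paris": {"http://dbpedia.org/resource/Paris"}}

resolved, ambiguous = merge_annotations([calais, zemanta])
print(resolved)           # {'Paris': 'http://dbpedia.org/resource/Paris'}
print(sorted(ambiguous))  # ['Oracle']
```

The example shows why step 4 is "the killer": when the services disagree on "Oracle", the merge can only surface both candidates for a human to pick from.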