Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Juan Sequeda
Hi Matthias,

We worked on something similar: entity type discovery using linked open
data.

Our project was: given a corpus of documents in the same domain, identify
specific entity types in the documents. The objective was to search a corpus
for documents by specific entity types. For example: find articles that are
about RDBMSs.

Standard NER tools identify high-level types such as persons, organizations,
and places because they have been trained on general corpora. I assume tools
like OpenCalais have been trained on news-like documents and Zemanta has been
trained on blog-like documents.

We were interested in identifying specific types such as RDBMS when the word
"Oracle" shows up in the text. To do that, we followed several domain term
extraction techniques. We used LOD, specifically DBpedia, Freebase and OpenCyc,
to disambiguate terms and also to retrieve the entities. Honestly, evaluation
is pretty hard to do, but our current implementation was not that bad (75%
precision and 55% recall).
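
As a rough illustration of the disambiguation step (this is a sketch, not our
actual code), one can look a surface form up on the public DBpedia SPARQL
endpoint and inspect the types of the candidate resources; the endpoint URL
and the example term are just assumptions:

import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"  # public endpoint, assumed reachable

def candidate_types(term):
    """Return (resource, type) pairs for DBpedia resources labelled `term`."""
    query = """
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        SELECT DISTINCT ?s ?type WHERE {
          ?s rdfs:label "%s"@en ;
             rdf:type ?type .
        } LIMIT 100
    """ % term
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(DBPEDIA_SPARQL + "?" + params) as response:
        results = json.load(response)
    return [(b["s"]["value"], b["type"]["value"])
            for b in results["results"]["bindings"]]

# e.g. the types returned for candidate_types("Oracle Database") let us tag
# the mention as an RDBMS rather than as the company.

The returned types can then be matched against whatever entity types the
domain calls for.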

We built upon work by IBM on creating a vocabulary from text using LOD [1].

Let me see if I can clean up the code and publish it as a service.

[1] http://data.semanticweb.org/conference/iswc/2009/paper/inuse/143/html

Juan Sequeda
(575) SEQ-UEDA
www.juansequeda.com


On Tue, Feb 2, 2010 at 6:26 AM, Matthias Samwald samw...@gmx.at wrote:

 Dear LOD community,

 I would be glad to hear your advice on how to best accomplish a simple
 task: extracting DBpedia entities (identified with DBpedia URIs) from a
 string of text, with good accuracy and recall, and possibly with options to
 constrain the recognized entities to some subset of DBpedia, based on
 categories. The tool or service should be performant enough to process large
 numbers of strings in a reasonable amount of time.
 Given the prolific creation of tiny tools and services in this community I
 am puzzled about my inability to find anything that accomplishes this task.
 Could you point me to something like that? Are there tools/services for
 Wikipedia that I could use?
 Zemanta seems to be too much geared towards 'enhanced blogging', while
 OpenCalais does not return Wikipedia/DBpedia identifiers. Please correct me
 if I am wrong.

 Cheers,
 Matthias




Announcing Virtuoso Open-Source Edition v 6.1.0

2010-02-04 Thread Hugh Williams
Hi,

OpenLink Software is pleased to announce the official release of Virtuoso 
Open-Source Edition, Version 6.1.0:

***IMPORTANT NOTE*** for upgraders from pre-6.x versions: The database file 
format has substantially changed between VOS 5.x and VOS 6.x. To upgrade your 
database, you must dump all data from the VOS 5.x database and re-load it into 
VOS 6.x. Complete instructions may be found here.

***IMPORTANT NOTE*** for upgraders from earlier 6.x versions: The database 
file format has not changed, but the introduction of a newer RDF index requires 
you run a script to upgrade the RDF_QUAD table. Since this can be a lengthy 
task and take extra disk space (up to twice the space used by the original 
RDF_QUAD table may be required during conversion) this is not done 
automatically on startup. Complete instructions may be found here.

New and updated product features include:

* Database engine
  - Added new 2+3 index scheme for RDF_QUAD table
  - Added new inlined string table for RDF_QUAD
  - Added optimizations to cost based optimizer
  - Added RoundRobin connection support
  - Removed deprecated samples/demos
  - Fixed align buffer to sizeof pointer to avoid crash on strict checking 
platforms like sparc
  - Fixed text of version mismatch messages
  - Fixed issue with XA exception, double rollback, transact timeout
  - Merged enhancements and fixes from V5 branch

* SPARQL and RDF
  - Added support for owl:inverseOf, owl:SymmetricProperty, and 
owl:TransitiveProperty.
  - Added DB.DBA.BEST_LANGMATCH() and bif_langmatches_pct_http()
  - Added initial support for SPARQL-FED
  - Added initial support for SERVICE { ... };
  - Added support for expressions in LIMIT and OFFSET clauses
  - Added built-in predicate IsRef()
  - Added new error reporting for unsupported syntax
  - Added rdf box id only serialization; stays compatible with 5/6
  - Added support for SPARQL INSERT DATA / DELETE DATA
  - Added SPARQL 1.1 syntax sugar re. HAVING clause for filtering on GROUP BY (see the sketch after the feature list)
  - Added special code generator for optimized handling of: SPARQL SELECT 
DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
  - Added support for HTML+RDFa representation re. output from SPARQL CONSTRUCT 
and DESCRIBE queries
  - Added support for output:maxrows
  - Improved SPARQL parsing and SQL codegen for negative numbers
  - Improved recovery of lists in DB.DBA.RDF_AUDIT_METADATA()
  - Fixed iSPARQL compatibility with 3rd party SPARQL endpoints
  - Fixed bad init in trans node if multiple inputs or step output values
  - Fixed redundant trailing '' in results of TTL load when IRIs contain 
special chars
  - Fixed problem with rfc1808_expand_uri not using proper macros and allocate 
byte extra for strings
  - Fixed when different TZ is used, find offset and transform via GMT
  - Fixed graph-level security in cluster
  - Fixed redundant equalities in case of multiple OPTIONALs with same variable
  - Fixed BOOLEAN_OF_OBJ in case of incomplete boxes
  - Fixed NTRIPLES serialization of triples
  - Merged enhancements and fixes from V5 branch

* Sponger Middleware
  - Added Extractor Cartridges mapping Zillow, O'Reilly, Amazon, Googlebase, 
BestBuy, CNET, and Crunchbase content to the GoodRelations Ontology.
  - Added Extractor Cartridges for Google Spreadsheet, Google Documents, 
Microsoft Office Docs (Excel, PowerPoint etc), OpenOffice, CSV, Text files, 
Disqus, Twitter, and Discogs.
  - Added Meta Cartridges covering Google Search, Yahoo! Boss, Bing, Sindice, 
Yelp, NYT, NPR, AlchemyAPI, Zemanta, OpenCalais, UMBEL, GetGlue, Geonames, 
DBpedia, Linked Open Data Cloud, BBC Linked Data Space, sameAs.org, whoisi, 
uclassify, RapLeaf, Journalisted, Dapper, Revyu, Zillow, BestBuy, Amazon, eBay, 
CNET, Discogs, and Crunchbase.

* ODS Applications
  - Added support for ckeditor
  - Added new popup calendar based on OAT
  - Added REST and Virtuoso PL based Controllers for user API
  - Added new API functions
  - Added FOAF+SSL groups
  - Added feed admin rights
  - Added Facebook registration and login
  - Removed deprecated rte and kupu editors
  - Removed support for IE 5 and 6 compatibility
  - Merged enhancements and fixes from V5 branch
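
Purely as a hypothetical illustration (this is not part of the release notes),
the sketch below exercises two of the SPARQL additions listed above, the
HAVING clause on GROUP BY and the federated SERVICE clause, against a local
Virtuoso instance assumed to be listening on the default
http://localhost:8890/sparql endpoint:

import json
import urllib.parse
import urllib.request

ENDPOINT = "http://localhost:8890/sparql"  # assumed local Virtuoso default

# Count triples per graph and keep only the larger graphs (HAVING on GROUP BY).
HAVING_QUERY = """
SELECT ?g (COUNT(*) AS ?triples) WHERE { GRAPH ?g { ?s ?p ?o } }
GROUP BY ?g
HAVING (COUNT(*) > 1000)
"""

# Federated lookup of a remote endpoint via the new SERVICE clause.
SERVICE_QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  SERVICE <http://dbpedia.org/sparql> {
    <http://dbpedia.org/resource/Lisbon> rdfs:label ?label .
  }
} LIMIT 5
"""

def run(query):
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(ENDPOINT + "?" + params) as response:
        return json.load(response)

for q in (HAVING_QUERY, SERVICE_QUERY):
    print(json.dumps(run(q), indent=2))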

Other links:

Virtuoso Open Source Edition:
   * Home Page: http://virtuoso.openlinksw.com/wiki/main/
   * Download Page:
  http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload

OpenLink Data Spaces:
   * Home Page: http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex
   * SPARQL Usage Examples (re. SIOC, FOAF, AtomOWL, SKOS):
  http://virtuoso.openlinksw.com/wiki/main/Main/ODSSIOCRef


Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink



Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Nathan
Juan Sequeda wrote:
 we followed several domain term extraction techniques.

any chance you could name drop / point to a few of the techniques - very
interested in this myself and in all honesty, no idea where to start
(other than a crude string split and check word combinations against a
dictionary - not very practical!)

Many Regards,

Nathan



Announcing Virtuoso Open-Source Edition v 5.0.13

2010-02-04 Thread Hugh Williams
Hi

OpenLink Software is pleased to announce a new release of Virtuoso, Open-Source 
Edition, version 5.0.13.

This version includes:

* Database engine
  - Added configuration option BuffersAllocation
  - Added configuration option AsyncQueueMaxThreads
  - Added docbook-xsl-1.75.2
  - Added RoundRobin connection support
  - Removed deprecated samples/demos
  - Fixed copyright and license clarification
  - Fixed use MD5 from OpenSSL when possible
  - Fixed issue with XA exception, double rollback, transact timeout
  - Fixed issue reading last chunk in http session
  - Fixed use pipeline client in crawler
  - Fixed accept different headers in pipeline request
  - Fixed do not post when no post parameters
  - Fixed checkpoint messages in log
  - Fixed read after allocated memory
  - Fixed shortened long URLs in the crawlers view to avoid UI breakage
  - Fixed building with external zlib
  - Removed support for deprecated JDK 1.0, 1.1 and 1.2
  - Rebuilt JDBC drivers
 
* SPARQL and RDF
  - Added initial support for SPARQL-FED
  - Added initial support for SERVICE { ... };
  - Added support for expressions in LIMIT and OFFSET clauses
  - Added built-in predicate IsRef()
  - Added new error reporting for unsupported syntax
  - Added rdf box id only serialization; stays compatible with 5/6
  - Added support for SPARQL INSERT DATA / DELETE DATA
  - Added support for HAVING in sparql
  - Added special optimizations for handling: SPARQL SELECT DISTINCT ?g WHERE { 
GRAPH ?g { ?s ?p ?o } }
  - Added support for HTML+RDFa representation re. SPARQL CONSTRUCT and 
DESCRIBE query results
  - Added support for output:maxrows
  - Updated ontologies API
  - Updated iSPARQL application
  - Fixed IRI parts syntax to match SPARQL 1.0 W3C recommendation
  - Fixed support for XMLLiteral
  - Fixed bad box flags for strings for bnodes and types
  - Fixed replace lost filters with equivs that have no spog vars and no good 
subequivs.
  - Fixed CNET double awol:content
  - Fixed Googlebase query results with multiple entries
  - Fixed Googlebase location info
  - Fixed default sitemap crawling functions/pages
  - Fixed use SPARUL LOAD instead of SOFT
  - Fixed make sure version is intact as changes to .ttl file must reflect in 
sparql.sql
  - Fixed missing qualification of aggregate
  - Fixed compilation of ORDER BY column_idz clause in iterator of sponge with 
loop
  - Fixed UNION of SELECTs and for multiple OPTIONALs at one level with good 
and bad equalities
  - Fixed support for define output:format JSON
  - Fixed crash of rfc1808_expand_uri on base without schema
  - Fixed redundant trailing '' in results of TTL load when IRIs contain 
special chars
  - Fixed option (score ...) in a gp with multiple OPTIONAL {...}
  - Fixed when different TZ is used, must find offset and transform via GMT
  - Fixed SPARQL parsing and SQL codegen for negative numbers
  - Fixed some 'exotic' cases of NT outputs

* ODS Applications
  - Added support for ckeditor
  - Added new popup calendar based on OAT
  - Added VSP and REST implementation for user API
  - Added new API functions
  - Added FOAF+SSL groups
  - Added feed admin rights
  - Added Facebook registration and login
  - Removed support for Kupu editor
  - Removed support for rte editor
  - Removed support for IE 5 and 6 compatibility
  - Fixed users paths to physical location
  - Fixed problem with activity pages

Other links:

Virtuoso Open Source Edition:
   * Home Page: http://virtuoso.openlinksw.com/wiki/main/
   * Download Page:
  http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload

OpenLink Data Spaces:
   * Home Page: http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex
   * SPARQL Usage Examples (re. SIOC, FOAF, AtomOWL, SKOS):
  http://virtuoso.openlinksw.com/wiki/main/Main/ODSSIOCRef

Best Regards
Hugh Williams
Professional Services
OpenLink Software
Web: http://www.openlinksw.com
Support: http://support.openlinksw.com
Forums: http://boards.openlinksw.com/support
Twitter: http://twitter.com/OpenLink





Re: [Virtuoso-users] Announcing Virtuoso Open-Source Edition v 6.1.0

2010-02-04 Thread Nathan
Wow - that's a nice release!

Did I read it correctly; all the cartridges for sponger are now in v6.1???

Also, I didn't note any mention of the GEO extension; is that for Virtuoso
Commercial only at this time?

Many Regards & Congrats,

Nathan

Hugh Williams wrote:
 Hi,
 
 OpenLink Software is pleased to announce the official release of Virtuoso 
 Open-Source Edition, Version 6.1.0:
 
 ***IMPORTANT NOTE*** for upgraders from pre-6.x versions: The database file 
 format has substantially changed between VOS 5.x and VOS 6.x. To upgrade your 
 database, you must dump all data from the VOS 5.x database and re-load it 
 into VOS 6.x. Complete instructions may be found here.
 
 ***IMPORTANT NOTE*** for upgraders from earlier 6.x versions: The database 
 file format has not changed, but the introduction of a newer RDF index 
 requires you run a script to upgrade the RDF_QUAD table. Since this can be a 
 lengthy task and take extra disk space (up to twice the space used by the 
 original RDF_QUAD table may be required during conversion) this is not done 
 automatically on startup. Complete instructions may be found here.
 
 New and updated product features include:
 
 * Database engine
   - Added new 2+3 index scheme for RDF_QUAD table
   - Added new inlined string table for RDF_QUAD
   - Added optimizations to cost based optimizer
   - Added RoundRobin connection support
   - Removed deprecated samples/demos
   - Fixed align buffer to sizeof pointer to avoid crash on strict checking 
 platforms like sparc
   - Fixed text of version mismatch messages
   - Fixed issue with XA exception, double rollback, transact timeout
   - Merged enhancements and fixes from V5 branch
 
 * SPARQL and RDF
   - Added support for owl:inverseOf, owl:SymmetricProperty, and 
 owl:TransitiveProperty.
   - Added DB.DBA.BEST_LANGMATCH() and bif_langmatches_pct_http()
   - Added initial support for SPARQL-FED
   - Added initial support for SERVICE { ... };
   - Added support for expressions in LIMIT and OFFSET clauses
   - Added built-in predicate IsRef()
   - Added new error reporting for unsupported syntax
   - Added rdf box id only serialization; stays compatible with 5/6
   - Added support for SPARQL INSERT DATA / DELETE DATA
   - Added SPARQL 1.1 syntax sugar re. HAVING CLAUSE for filtering on GROUP BY
   - Added special code generator for optimized handling of: SPARQL SELECT 
 DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
   - Added support for HTML+RDFa representation re. output from SPARQL 
 CONSTRUCT and DESCRIBE queries
   - Added support for output:maxrows
   - Improved SPARQL parsing and SQL codegen for negative numbers
   - Improved recovery of lists in DB.DBA.RDF_AUDIT_METADATA()
   - Fixed iSPARQL compatibility with 3rd party SPARQL endpoints
   - Fixed bad init in trans node if multiple inputs or step output values
   - Fixed redundant trailing '' in results of TTL load when IRIs contain 
 special chars
   - Fixed problem with rfc1808_expand_uri not using proper macros and 
 allocate byte extra for strings
   - Fixed when different TZ is used, find offset and transform via GMT
   - Fixed graph-level security in cluster
   - Fixed redundant equalities in case of multiple OPTIONALs with same 
 variable
   - Fixed BOOLEAN_OF_OBJ in case of incomplete boxes
   - Fixed NTRIPLES serialization of triples
   - Merged enhancements and fixes from V5 branch
 
 * Sponger Middleware
   - Added Extractor Cartridges mapping Zillow, O'Reilly, Amazon, Googlebase, 
 BestBuy, CNET, and Crunchbase content to the GoodRelations Ontology.
   - Added Extractor Cartridges for Google Spreadsheet, Google Documents, 
 Microsoft Office Docs (Excel, PowerPoint etc), OpenOffice, CSV, Text files, 
 Disqus, Twitter, and Discogs.
   - Added Meta Cartridges covering Google Search, Yahoo! Boss, Bing, Sindice, 
 Yelp, NYT, NPR, AlchemyAPI, Zemanta, OpenCalais, UMBEL, GetGlue, Geonames, 
 DBpedia, Linked Open Data Cloud, BBC Linked Data Space, sameAs.org, whoisi, 
 uclassify, RapLeaf, Journalisted, Dapper, Revyu, Zillow, BestBuy, Amazon, 
 eBay, CNET, Discogs, and Crunchbase.
 
 * ODS Applications
   - Added support for ckeditor
   - Added new popup calendar based on OAT
   - Added REST and Virtuoso PL based Controllers for user API
   - Added new API functions
   - Added FOAF+SSL groups
   - Added feed admin rights
   - Added Facebook registration and login
   - Removed deprecated rte and kupu editors
   - Removed support for IE 5 and 6 compatibility
   - Merged enhancements and fixes from V5 branch
 
 Other links:
 
 Virtuoso Open Source Edition:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/
* Download Page:
   http://virtuoso.openlinksw.com/wiki/main/Main/VOSDownload
 
 OpenLink Data Spaces:
* Home Page: http://virtuoso.openlinksw.com/wiki/main/Main/OdsIndex
* SPARQL Usage Examples (re. SIOC, FOAF, AtomOWL, SKOS):
   http://virtuoso.openlinksw.com/wiki/main/Main/ODSSIOCRef
 
 
 Best Regards
 Hugh Williams
 Professional Services
 OpenLink 

Re: [Virtuoso-users] Announcing Virtuoso Open-Source Edition v 6.1.0

2010-02-04 Thread Kingsley Idehen

Nathan wrote:

Wow - that's a nice release!
  
Did I read it correctly; all the cartridges for sponger are now in v6.1???
  
We just have new Cartridges (Extractor and Meta). The Extractors are 
Open Source (versions 5 & 6) while the Meta Cartridges are commercial only.

Also didn't note mention of GEO extension, is that for Virtuoso
Commercial only at this time?
  
Yes, re. SPARQL-GEO. Thus, the fundamental differentiators between the 
Open Source and Commercial Editions come down to:


1. Sponger's Meta Cartridges -- these embellish the basic Extractor 
Cartridge generated graphs by performing LOD Cloud (and other Linked 
Data Space) lookups and joins
2. SPARQL-GEO & GeoSpatial indexing in general (so it applies to the SQL 
engine also)
3. Virtual Database Functionality -- RDF Views over ODBC- and JDBC-accessible 
data sources
4. Replication
5. Clustering & High Availability.

Kingsley

Many Regards & Congrats,

Nathan

Hugh Williams wrote:
  

Hi,

OpenLink Software is pleased to announce the official release of Virtuoso 
Open-Source Edition, Version 6.1.0:

***IMPORTANT NOTE*** for upgraders from pre-6.x versions: The database file 
format has substantially changed between VOS 5.x and VOS 6.x. To upgrade your 
database, you must dump all data from the VOS 5.x database and re-load it into 
VOS 6.x. Complete instructions may be found here.

***IMPORTANT NOTE*** for upgraders from earlier 6.x versions: The database 
file format has not changed, but the introduction of a newer RDF index requires 
you run a script to upgrade the RDF_QUAD table. Since this can be a lengthy 
task and take extra disk space (up to twice the space used by the original 
RDF_QUAD table may be required during conversion) this is not done 
automatically on startup. Complete instructions may be found here.

New and updated product features include:

* Database engine
  - Added new 2+3 index scheme for RDF_QUAD table
  - Added new inlined string table for RDF_QUAD
  - Added optimizations to cost based optimizer
  - Added RoundRobin connection support
  - Removed deprecated samples/demos
  - Fixed align buffer to sizeof pointer to avoid crash on strict checking 
platforms like sparc
  - Fixed text of version mismatch messages
  - Fixed issue with XA exception, double rollback, transact timeout
  - Merged enhancements and fixes from V5 branch

* SPARQL and RDF
  - Added support for owl:inverseOf, owl:SymmetricProperty, and 
owl:TransitiveProperty.
  - Added DB.DBA.BEST_LANGMATCH() and bif_langmatches_pct_http()
  - Added initial support for SPARQL-FED
  - Added initial support for SERVICE { ... };
  - Added support for expressions in LIMIT and OFFSET clauses
  - Added built-in predicate IsRef()
  - Added new error reporting for unsupported syntax
  - Added rdf box id only serialization; stays compatible with 5/6
  - Added support for SPARQL INSERT DATA / DELETE DATA
  - Added SPARQL 1.1 syntax sugar re. HAVING CLAUSE for filtering on GROUP BY
  - Added special code generator for optimized handling of: SPARQL SELECT 
DISTINCT ?g WHERE { GRAPH ?g { ?s ?p ?o } }
  - Added support for HTML+RDFa representation re. output from SPARQL CONSTRUCT 
and DESCRIBE queries
  - Added support for output:maxrows
  - Improved SPARQL parsing and SQL codegen for negative numbers
  - Improved recovery of lists in DB.DBA.RDF_AUDIT_METADATA()
  - Fixed iSPARQL compatibility with 3rd party SPARQL endpoints
  - Fixed bad init in trans node if multiple inputs or step output values
  - Fixed redundant trailing '' in results of TTL load when IRIs contain 
special chars
  - Fixed problem with rfc1808_expand_uri not using proper macros and allocate 
byte extra for strings
  - Fixed when different TZ is used, find offset and transform via GMT
  - Fixed graph-level security in cluster
  - Fixed redundant equalities in case of multiple OPTIONALs with same variable
  - Fixed BOOLEAN_OF_OBJ in case of incomplete boxes
  - Fixed NTRIPLES serialization of triples
  - Merged enhancements and fixes from V5 branch

* Sponger Middleware
  - Added Extractor Cartridges mapping Zillow, O'Reilly, Amazon, Googlebase, 
BestBuy, CNET, and Crunchbase content to the GoodRelations Ontology.
  - Added Extractor Cartridges for Google Spreadsheet, Google Documents, 
Microsoft Office Docs (Excel, PowerPoint etc), OpenOffice, CSV, Text files, 
Disqus, Twitter, and Discogs.
  - Added Meta Cartridges covering Google Search, Yahoo! Boss, Bing, Sindice, 
Yelp, NYT, NPR, AlchemyAPI, Zemanta, OpenCalais, UMBEL, GetGlue, Geonames, 
DBpedia, Linked Open Data Cloud, BBC Linked Data Space, sameAs.org, whoisi, 
uclassify, RapLeaf, Journalisted, Dapper, Revyu, Zillow, BestBuy, Amazon, eBay, 
CNET, Discogs, and Crunchbase.

* ODS Applications
  - Added support for ckeditor
  - Added new popup calendar based on OAT
  - Added REST and Virtuoso PL based Controllers for user API
  - Added new API functions
  - Added FOAF+SSL groups
  - Added feed admin rights
  - Added Facebook registration and login
  - 

RE: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Rafi.Shachar
Matthias,

OpenCalais does have links to DBpedia URIs for a large subset of entities.
The DBpedia URIs are not included in the OpenCalais output itself but are
available from the Linked Data endpoint. For example:
http://d.opencalais.com/er/geo/city/ralg-geo1/f08025f6-8e95-c3ff-2909-0a5219ed3bfa

The entities which have links to DBpedia are documented here:
http://www.opencalais.com/documentation/linked-data-entities
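
If you want to automate that, a minimal sketch (assuming the entity URI
content-negotiates to RDF, that the links are expressed as owl:sameAs, and
that rdflib is installed) is to dereference the URI and collect the DBpedia
links:

from rdflib import Graph, URIRef

OWL_SAMEAS = URIRef("http://www.w3.org/2002/07/owl#sameAs")
entity_uri = ("http://d.opencalais.com/er/geo/city/ralg-geo1/"
              "f08025f6-8e95-c3ff-2909-0a5219ed3bfa")

graph = Graph()
graph.parse(entity_uri)  # fetch and parse the RDF served for the entity URI

dbpedia_links = [str(obj)
                 for obj in graph.objects(URIRef(entity_uri), OWL_SAMEAS)
                 if "dbpedia.org" in str(obj)]
print(dbpedia_links)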

Rafi

-Original Message-
From: public-lod-requ...@w3.org [mailto:public-lod-requ...@w3.org] On
Behalf Of Matthias Samwald
Sent: Tuesday, February 02, 2010 2:26 PM
To: public-lod@w3.org
Subject: DBpedia-based entity recognition service / tool?

Dear LOD community,

I would be glad to hear your advice on how to best accomplish a simple task:
extracting DBpedia entities (identified with DBpedia URIs) from a string of
text, with good accuracy and recall, and possibly with options to constrain
the recognized entities to some subset of DBpedia, based on categories. The
tool or service should be performant enough to process large numbers of
strings in a reasonable amount of time.
Given the prolific creation of tiny tools and services in this community I
am puzzled about my inability to find anything that accomplishes this task.
Could you point me to something like that? Are there tools/services for
Wikipedia that I could use?
Zemanta seems to be too much geared towards 'enhanced blogging', while
OpenCalais does not return Wikipedia/DBpedia identifiers. Please correct me
if I am wrong.

Cheers,
Matthias 




This email was sent to you by Thomson Reuters, the global news and information 
company.
Any views expressed in this message are those of the individual sender, except 
where the sender specifically states them to be the views of Thomson Reuters. 





EKAW 2010 – Call for Workshop Proposals

2010-02-04 Thread Siegfried Handschuh
Apologies for cross-postings. Please send to interested colleagues and 
students.


-

*** Call for workshop proposals ***
*** Knowledge Engineering and Knowledge Management – EKAW 2010 ***

11th October-15th October 2010 - Lisbon, Portugal
http://ekaw2010.inesc-id.pt

see also workshop call at:
http://www.siegfried-handschuh.net/events/EKAW2010/


Background and Motivation
---

Workshops provide members of a community with a forum to discuss common 
interests in a focused way. If you are working in an emerging area in 
Knowledge Acquisition, Knowledge Engineering and/or Knowledge 
Management, consider organizing a workshop. Workshops are a chance to meet 
like-minded researchers and discover what others are doing. A workshop 
offers a good opportunity for young researchers to present their work 
and to obtain feedback from an interested community. Successful 
workshops may result in edited books or special issues of international 
journals.


EKAW 2010 is looking for exciting proposals for half-day or full-day 
workshops. Each workshop should generate discussions that give the EKAW 
community a new, organized way of thinking about the topic, or ideas that 
suggest promising directions for future research. A successful workshop 
can move the field forward and help build a community.


EKAW workshops provide an informal setting where the participants have 
the opportunity to discuss specific technical topics in an atmosphere 
that fosters the active exchange of ideas. Our aim for the workshop 
program is to promote and collect multidisciplinary research directions 
that contribute to knowledge management and engineering. Members from 
all research areas related to knowledge are invited to submit workshop 
proposals.


Workshop @ EKAW 2010
-
The 17th International Conference on Knowledge Engineering and Knowledge 
Management is concerned with all aspects of eliciting, acquiring, 
modelling and managing knowledge, and its role in the construction of 
knowledge-intensive systems and services for the semantic web, knowledge 
management, e-business, natural language processing, intelligent 
information integration, etc.


We seek high quality proposals for workshops about topics related to the 
conference. Of particular interest are proposals that address one of the 
following areas:


1) Knowledge Management
2) Knowledge Engineering and Acquisition
3) Knowledge In Use
4) Social and Cognitive Aspects of Knowledge Engineering
5) Special focus: knowledge management and engineering by the masses
  * Human-machine synergy in knowledge acquisition
  * Incentives for knowledge creation and semantic annotation
  * Enhancing human productivity (e.g. knowledge workers)
  * Social and human factors in knowledge management
  * Collective and collaborative intelligence in knowledge management
  * Social tagging and folksonomies, social networks
  * Web2.0 approaches to KM (including semantic wikis, folksonomies, etc.)
  * Games with a Purpose and KM
  * Linked Open Data / Web of Data

Submission Requirements
--
Each workshop should have one or more organizers and an international 
program committee. Proposals for workshops should be a maximum of 1000 
words, and should contain the following information so that the 
importance, quality, and benefits for the research community can be judged:


A cover page including:
- Workshop title
- Name, affiliation, full postal address, homepage and e-mail address
for each organizer
- Identification of the primary contact person(s)
- One-paragraph biography for each workshop organizer

Motivation and Objectives:

This section should give a brief description of the workshop topic and 
goals, its relevance to EKAW 2010, and its significance for the research field.


- Motivation: What is the overall topic of the workshop? Relation to the 
topic of the conference, including a brief  discussion of why and to 
whom the workshop is of interest.
- A tentative list of PC members, clearly stating those that have 
already accepted
- The tentative dates (submission, notification, camera-ready deadline, 
etc.)


Workshop intentions and proposals should be submitted via email in PDF 
format to the EKAW 2010 workshop chair and selection committee:

ekaw10worksh...@lists.deri.org
with a subject line of: EKAW 2010 Workshop Proposal


Submission Dates and Details


Friday 19 March - Informal “intention to submit” email with general details
Friday 2 April - Submission of proposal by email (to the address above)
Friday 16 April - Notification of workshop acceptance
Friday 23 April - Publication of workshop's Call for Papers & setup of 
workshop web site
Wed 1 September - Deadline for camera-ready workshop notes and other 
information


Organizer's Responsibilities


The organizers of accepted workshops are expected to:

- Define, produce and distribute the 

Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Aldo Bucchi

Nathan,

On Feb 4, 2010, at 8:10, Nathan nat...@webr3.org wrote:


Juan Sequeda wrote:

we followed several domain term extraction techniques.


any chance you could name drop / point to a few of the techniques -  
very

interested in this myself and in all honesty, no idea where to start
(other than a crude string split and check word combinations against a
dictionary - not very practical!)


--- http://gate.ac.uk/



Many Regards,

Nathan





Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Juan Sequeda
On Thu, Feb 4, 2010 at 5:10 AM, Nathan nat...@webr3.org wrote:

 Juan Sequeda wrote:
  we followed several domain term extraction techniques.

 any chance you could name drop / point to a few of the techniques - very
 interested in this myself and in all honesty, no idea where to start
 (other than a crude string split and check word combinations against a
 dictionary - not very practical!)


yes, that would be very naive :)

Look into the Term Extraction [1] area of Information Extraction. There are
several techniques which can be combined, including POS tagging, phrase
chunking, etc.

[1] http://en.wikipedia.org/wiki/Terminology_extraction
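
As a very rough sketch of that direction (illustrative only; NLTK and the
naive noun-phrase grammar are just assumptions here, not what we used):

import nltk

# one-time setup: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

GRAMMAR = "NP: {<JJ>*<NN.*>+}"   # adjectives followed by one or more nouns
chunker = nltk.RegexpParser(GRAMMAR)

def candidate_terms(text):
    """POS-tag the text, chunk noun phrases, and return them as candidate terms."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)
    tree = chunker.parse(tagged)
    return [" ".join(word for word, tag in subtree.leaves())
            for subtree in tree.subtrees()
            if subtree.label() == "NP"]

print(candidate_terms("Oracle is a relational database management system."))
# roughly: ['Oracle', 'relational database management system']

Each candidate term then still needs the disambiguation step against
DBpedia/Freebase/OpenCyc that I described in my earlier mail.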


 Many Regards,

 Nathan



Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Tom Morris
On Tue, Feb 2, 2010 at 10:21 AM, Nathan nat...@webr3.org wrote:

 I should probably be replying here as I've been doing this, and working
 on this for the past few months.

 I've found from experience that the only viable way to address this need
 is to do as follows:
 1: Pass content through to both OpenCalais and Zemanta
 2: Combine the results to provide a list of string terms to be
 associated with dbpedia resources (where zemanta hasn't already done it)
 3: Lookup each string resource and try and associate it to the string
 4: Return all matches with results to the end user in order for them to
 manually confirm the results.

 Steps 3 and 4 are the killers here, because no matter how good the
 service is, you can't always match to exact URIs (sometimes you can only
 determine that you may mean one of X many ambiguous URIs); and ...

I don't understand the roundabout approach, since both of these
services output Freebase identifiers and those are all mapped
explicitly both to DBpedia (via owl:sameAs) and to Wikipedia (via the
normal URL).

Why not just follow the links directly? The only time this won't work
is where the concept was sourced from someplace other than Wikipedia,
or where Wikipedia article(s) were split/merged so there isn't a 1:1
correspondence.
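
For concreteness, a sketch of the direct mapping (the Wikipedia rewrite just
follows DBpedia's resource-naming convention; the Freebase branch assumes the
public DBpedia endpoint has its owl:sameAs links to rdf.freebase.com loaded,
so treat the URI forms as assumptions):

import json
import urllib.parse
import urllib.request

DBPEDIA_SPARQL = "http://dbpedia.org/sparql"

def wikipedia_to_dbpedia(wikipedia_url):
    """http://en.wikipedia.org/wiki/Foo -> http://dbpedia.org/resource/Foo"""
    title = wikipedia_url.rsplit("/wiki/", 1)[1]
    return "http://dbpedia.org/resource/" + title

def freebase_to_dbpedia(freebase_uri):
    """Ask DBpedia which of its resources is owl:sameAs the given Freebase URI."""
    query = """
        PREFIX owl: <http://www.w3.org/2002/07/owl#>
        SELECT ?db WHERE { ?db owl:sameAs <%s> } LIMIT 1
    """ % freebase_uri
    params = urllib.parse.urlencode(
        {"query": query, "format": "application/sparql-results+json"})
    with urllib.request.urlopen(DBPEDIA_SPARQL + "?" + params) as response:
        bindings = json.load(response)["results"]["bindings"]
    return bindings[0]["db"]["value"] if bindings else None

print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Semantic_Web"))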

Tom



Re: DBpedia-based entity recognition service / tool?

2010-02-04 Thread Nathan
Tom Morris wrote:
 On Tue, Feb 2, 2010 at 10:21 AM, Nathan nat...@webr3.org wrote:
 
 I should probably be replying here as I've been doing this, and working
 on this for the past few months.

 I've found from experience that the only viable way to address this need
 is to do as follows:
 1: Pass content through to both OpenCalais and Zemanta
 2: Combine the results to provide a list of string terms to be
 associated with dbpedia resources (where zemanta hasn't already done it)
 3: Lookup each string resource and try and associate it to the string
 4: Return all matches with results to the end user in order for them to
 manually confirm the results.

 Steps 3 and 4 are the killers here, because no matter how good the
 service is, you can't always match to exact URIs (sometimes you can only
 determine that you may mean one of X many ambiguous URIs); and ...
 
 I don't understand the roundabout approach since both of these
 services output Freebase identifiers and they are all mapped
 explicitly to both DBpedia by owl:sameAs and Wikipedia via normal URL.
 
 Why not just follow the links directly?  The only time this won't work
 is where the concept was sourced from someplace other than Wikipedia
 or Wikipedia article(s) were split/merged so there isn't a 1:1
 correspondence.

Where they are available, I do - but you still get a number of terms
which are not mapped and can be mapped by doing lookups; and where you
are unsure, prompting the user to do the disambiguation provides a fuller
result :)

Also, obviously, as the services improve the need for lookups drops; and
finally it allows for domain-specific document / thing relations

regards!