Author: rwesten
Date: Mon Jun 20 12:57:29 2011
New Revision: 1137617

URL: http://svn.apache.org/viewvc?rev=1137617&view=rev
Log:
STANBOL-230: Adds support for using the Entityhub for linking Entities for TextAnnotations

* By configuring "local" or "entityhub" as the Referenced Site ID, one can now use the Entityhub to suggest Entities for TextAnnotations.
* Because it is no longer true that only ReferencedSites can be used for entity tagging, the engine was renamed to NamedEntityTaggingEngine. All configuration files (e.g. in the launcher) were adapted to the new name.
* Added a second default configuration to the full launcher that uses the Entityhub for enhancing Entities. This configuration uses the "http://schema.org" types for "Person", "Organization" and "Place" as well as the "name" field for lookup.
* The IDs "local" and "entityhub" are now reserved for the Entityhub and MUST NOT be used as IDs for Referenced Sites.
* Added a list of IDs reserved for the Entityhub to the Entityhub interface.
* Added a list of prohibited IDs for ReferencedSites to the ReferencedSite interface. Currently only the IDs assigned to the Entityhub are in that list.

### other changes

* Corrected a typo in the Entityhub interface (getQueryFavtory -> getQueryFactory).
* Added "http://schema.org" to the NamespaceEnum (prefix is "schema").
* Added support for a default namespace to the NamespaceEnum.
* The NamespaceEnum now uses schema.org as the default namespace. This means that the property "name" in configurations is mapped to "http://schema.org/name".
* The "indexing.properties" files of all indexing utilities used wrong IDs for the EntityDereferencer and EntitySearcher implementations. This is now corrected.
* "rdfs:label" was hard coded to retrieve the value for the entity-label property of TextAnnotations. Now the value of the property used to search for entities is used.
* Corrected a bug in ContentItemResource (enhancer.jersey) that caused NPEs in case a Clerezza PlainLiteral had no language defined.

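The two lookup rules described in the log above can be illustrated roughly as follows. This is a minimal sketch, not the committed code: the class `DefaultNamespaceSketch`, its methods `usesEntityhub` and `toFullName`, and the small prefix table are hypothetical names introduced only for illustration. It assumes that reserved IDs are compared case-insensitively and that a field name without a prefix falls back to the schema.org namespace, as the log states.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it is not part of the Stanbol code base.
final class DefaultNamespaceSketch {

    // The IDs that the log message reserves for the Entityhub.
    static final Set<String> RESERVED_IDS = Collections.unmodifiableSet(
            new HashSet<String>(Arrays.asList("local", "entityhub")));

    // Assumed default namespace, taken from the log message.
    static final String DEFAULT_NAMESPACE = "http://schema.org/";

    // Tiny prefix table used only by this sketch.
    static final Map<String, String> PREFIXES = new HashMap<String, String>();
    static {
        PREFIXES.put("schema", "http://schema.org/");
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
    }

    // True if the configured ID selects the Entityhub instead of a Referenced Site.
    static boolean usesEntityhub(String siteId) {
        return RESERVED_IDS.contains(siteId.toLowerCase(Locale.ROOT));
    }

    // Expands "prefix:local" or a bare local name to a full URI.
    static String toFullName(String field) {
        if (field.contains("://") || field.startsWith("urn:")) {
            return field; // already a full URI
        }
        int idx = field.indexOf(':');
        if (idx < 0) {
            return DEFAULT_NAMESPACE + field; // no prefix: use the default namespace
        }
        String ns = PREFIXES.get(field.substring(0, idx));
        // Unknown prefixes are returned unchanged in this sketch.
        return ns == null ? field : ns + field.substring(idx + 1);
    }

    public static void main(String[] args) {
        System.out.println(usesEntityhub("Entityhub"));
        System.out.println(toFullName("name"));
        System.out.println(toFullName("rdfs:label"));
    }
}
```

With these assumptions, a configuration value of "name" resolves to "http://schema.org/name", while "rdfs:label" still resolves through its explicit prefix.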
Added: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java (with props) incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java (contents, props changed) - copied, changed from r1136995, incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/ReferencedSiteEntityTaggingEnhancementEngine.java incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-dbpedia.config - copied unchanged from r1136995, incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.ReferencedSiteEntityTaggingEnhancementEngine-DBpedia.config incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-local.config Removed: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/ReferencedSiteEntityTaggingEnhancementEngine.java incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.ReferencedSiteEntityTaggingEnhancementEngine-DBpedia.config Modified: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/EnhancementRDFUtils.java incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/resources/OSGI-INF/metatype/metatype.properties incubator/stanbol/trunk/enhancer/engines/entitytagging/src/test/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/TestEntityLinkingEnhancementEngine.java 
incubator/stanbol/trunk/enhancer/jersey/src/main/java/org/apache/stanbol/enhancer/jersey/resource/ContentItemResource.java incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/EntityhubImpl.java incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/ReferencedSiteImpl.java incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/Entityhub.java incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/site/ReferencedSite.java incubator/stanbol/trunk/entityhub/indexing/dblp/src/main/resources/indexing/config/indexing.properties incubator/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/indexing.properties incubator/stanbol/trunk/entityhub/indexing/genericrdf/src/main/resources/indexing/config/indexing.properties Modified: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/EnhancementRDFUtils.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/EnhancementRDFUtils.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/EnhancementRDFUtils.java (original) +++ incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/EnhancementRDFUtils.java Mon Jun 20 12:57:29 2011 @@ -61,17 +61,19 @@ public class EnhancementRDFUtils { * enhancements this textAnnotation is related to * @param 
entity * the related entity + * @param nameField the field used to extract the name */ public static UriRef writeEntityAnnotation(EnhancementEngine engine, LiteralFactory literalFactory, MGraph graph, UriRef contentItemId, Collection<NonLiteral> relatedEnhancements, - Entity entity) { + Entity entity, + String nameField) { // 1. check if the returned Entity does has a label -> if not return null // add labels (set only a single label. Use "en" if available! Text label = null; - Iterator<Text> labels = entity.getRepresentation().getText(RDFS_LABEL.getUnicodeString()); + Iterator<Text> labels = entity.getRepresentation().getText(nameField); while (labels.hasNext()) { Text actLabel = labels.next(); if (label == null) { Added: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java?rev=1137617&view=auto ============================================================================== --- incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java (added) +++ incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java Mon Jun 20 12:57:29 2011 @@ -0,0 +1,36 @@ +package org.apache.stanbol.enhancer.engines.entitytagging.impl; + +import java.util.Map; + +import org.apache.felix.scr.annotations.Component; +import org.apache.felix.scr.annotations.ConfigurationPolicy; +import org.apache.felix.scr.annotations.Service; +import org.apache.stanbol.enhancer.servicesapi.ContentItem; +import org.apache.stanbol.enhancer.servicesapi.EngineException; +import 
org.apache.stanbol.enhancer.servicesapi.EnhancementEngine; +import org.apache.stanbol.enhancer.servicesapi.ServiceProperties; + +@Component(configurationFactory = true, policy = ConfigurationPolicy.REQUIRE, // the baseUri is required! + specVersion = "1.1", metatype = true, immediate = true) +@Service +public class LabelBasedEntityTaggingEngine implements EnhancementEngine, ServiceProperties { + + @Override + public int canEnhance(ContentItem ci) throws EngineException { + // TODO Auto-generated method stub + return 0; + } + + @Override + public void computeEnhancements(ContentItem ci) throws EngineException { + // TODO Auto-generated method stub + + } + + @Override + public Map<String,Object> getServiceProperties() { + // TODO Auto-generated method stub + return null; + } + +} Propchange: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/LabelBasedEntityTaggingEngine.java ------------------------------------------------------------------------------ svn:mime-type = text/plain Copied: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java (from r1136995, incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/ReferencedSiteEntityTaggingEnhancementEngine.java) URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java?p2=incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java&p1=incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/ReferencedSiteEntityTaggingEnhancementEngine.java&r1=1136995&r2=1137617&rev=1137617&view=diff 
============================================================================== --- incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/ReferencedSiteEntityTaggingEnhancementEngine.java (original) +++ incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java Mon Jun 20 12:57:29 2011 @@ -22,12 +22,15 @@ import static org.apache.stanbol.enhance import static org.apache.stanbol.enhancer.servicesapi.rdf.Properties.RDF_TYPE; import java.util.ArrayList; +import java.util.Arrays; import java.util.Collections; import java.util.Dictionary; import java.util.HashMap; +import java.util.HashSet; import java.util.Iterator; import java.util.List; import java.util.Map; +import java.util.Set; import org.apache.clerezza.rdf.core.LiteralFactory; import org.apache.clerezza.rdf.core.MGraph; @@ -54,6 +57,8 @@ import org.apache.stanbol.enhancer.servi import org.apache.stanbol.enhancer.servicesapi.rdf.OntologicalClasses; import org.apache.stanbol.enhancer.servicesapi.rdf.Properties; import org.apache.stanbol.enhancer.servicesapi.rdf.TechnicalClasses; +import org.apache.stanbol.entityhub.servicesapi.Entityhub; +import org.apache.stanbol.entityhub.servicesapi.EntityhubException; import org.apache.stanbol.entityhub.servicesapi.defaults.NamespaceEnum; import org.apache.stanbol.entityhub.servicesapi.model.Entity; import org.apache.stanbol.entityhub.servicesapi.query.FieldQuery; @@ -77,45 +82,53 @@ import org.slf4j.LoggerFactory; @Component(configurationFactory = true, policy = ConfigurationPolicy.REQUIRE, // the baseUri is required! 
specVersion = "1.1", metatype = true, immediate = true) @Service -public class ReferencedSiteEntityTaggingEnhancementEngine implements EnhancementEngine, ServiceProperties { +public class NamedEntityTaggingEngine implements EnhancementEngine, ServiceProperties { private final Logger log = LoggerFactory.getLogger(getClass()); - @Property(value = "dbpedia") + @Property//(value = "dbpedia") public static final String REFERENCED_SITE_ID = "org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId"; - @Property(boolValue = true) + @Property//(boolValue = true) public static final String PERSON_STATE = "org.apache.stanbol.enhancer.engines.entitytagging.personState"; - @Property(value = "dbp-ont:Person") + @Property//(value = "dbp-ont:Person") public static final String PERSON_TYPE = "org.apache.stanbol.enhancer.engines.entitytagging.personType"; - @Property(boolValue = true) + @Property//(boolValue = true) public static final String ORG_STATE = "org.apache.stanbol.enhancer.engines.entitytagging.organisationState"; - @Property(value = "dbp-ont:Organisation") + @Property//(value = "dbp-ont:Organisation") public static final String ORG_TYPE = "org.apache.stanbol.enhancer.engines.entitytagging.organisationType"; - @Property(boolValue = true) + @Property//(boolValue = true) public static final String PLACE_STATE = "org.apache.stanbol.enhancer.engines.entitytagging.placeState"; - @Property(value = "dbp-ont:Place") + @Property//(value = "dbp-ont:Place") public static final String PLACE_TYPE = "org.apache.stanbol.enhancer.engines.entitytagging.placeType"; - + /** + * Use the RDFS label as default + */ @Property(value = "rdfs:label") public static final String NAME_FIELD = "org.apache.stanbol.enhancer.engines.entitytagging.nameField"; /** - * Service of the RICK that manages all the active referenced Site. This Service is used to lookup the + * Service of the Entityhub that manages all the active referenced Site. 
This Service is used to lookup the * configured Referenced Site when we need to enhance a content item. */ @Reference protected ReferencedSiteManager siteManager; /** - * This is the configured name of the referenced Site used to find entities. The - * {@link ReferencedSiteManager} service of the RICK is used to get the actual {@link ReferencedSite} - * instance for each request to this Engine. + * Used to lookup Entities if the {@link #REFERENCED_SITE_ID} property is + * set to "entityhub" or "local" + */ + @Reference + protected Entityhub entityhub; + + /** + * This holds the id of the {@link ReferencedSite} used to lookup Entities + * or <code>null</code> if the {@link Entityhub} is used. */ protected String referencedSiteID; @@ -125,6 +138,7 @@ public class ReferencedSiteEntityTagging */ public static final Integer defaultOrder = ORDERING_EXTRACTION_ENHANCEMENT; + /** * State if text annotations of type {@link OntologicalClasses#DBPEDIA_PERSON} are enhanced by this engine */ @@ -222,6 +236,10 @@ public class ReferencedSiteEntityTagging throw new ConfigurationException(REFERENCED_SITE_ID, "The ID of the Referenced Site is a required Parameter and MUST NOT be an empty String!"); } + if(Entityhub.ENTITYHUB_IDS.contains(this.referencedSiteID.toLowerCase())){ + log.info("Init NamedEntityTaggingEngine instance for the Entityhub"); + this.referencedSiteID = null; + } Object state = config.get(PERSON_STATE); personState = state == null ? true : Boolean.parseBoolean(state.toString()); state = config.get(ORG_STATE); @@ -252,21 +270,28 @@ public class ReferencedSiteEntityTagging } public void computeEnhancements(ContentItem ci) throws EngineException { - ReferencedSite site = siteManager.getReferencedSite(referencedSiteID); - if (site == null) { - String msg = String.format( - "Unable to enhance %s because Referenced Site %s is currently not active!", ci.getId(), - referencedSiteID); - log.warn(msg); - // TODO: throwing Exceptions is currently deactivated. 
We need a more clear - // policy what do to in such situations - // throw new EngineException(msg); - return; - } - if (isOfflineMode() && !site.supportsLocalMode()) { - log.warn("Unable to enhance ci {} because OfflineMode is not supported by ReferencedSite {}.", - ci.getId(), site.getId()); - return; + final ReferencedSite site; + if(referencedSiteID != null) { //lookup the referenced site + site = siteManager.getReferencedSite(referencedSiteID); + //ensure that it is present + if (site == null) { + String msg = String.format( + "Unable to enhance %s because Referenced Site %s is currently not active!", ci.getId(), + referencedSiteID); + log.warn(msg); + // TODO: throwing Exceptions is currently deactivated. We need a more clear + // policy what do to in such situations + // throw new EngineException(msg); + return; + } + //and that it supports offline mode if required + if (isOfflineMode() && !site.supportsLocalMode()) { + log.warn("Unable to enhance ci {} because OfflineMode is not supported by ReferencedSite {}.", + ci.getId(), site.getId()); + return; + } + } else { // null indicates to use the Entityhub to lookup Entities + site = null; } UriRef contentItemId = new UriRef(ci.getId()); @@ -292,20 +317,33 @@ public class ReferencedSiteEntityTagging for (Map.Entry<UriRef,List<UriRef>> entry : textAnnotations.entrySet()) { try { - computeEntityRecommentations(site, literalFactory, graph, contentItemId, entry.getKey(), + computeEntityRecommentations(site,literalFactory, graph, contentItemId, entry.getKey(), entry.getValue()); - } catch (ReferencedSiteException e) { + } catch (EntityhubException e) { throw new EngineException(this, ci, e); } } } + /** + * Computes the Enhancements + * @param site The {@link ReferencedSiteException} id or <code>null</code> to + * use the {@link Entityhub} + * @param literalFactory the {@link LiteralFactory} used to create RDF Literals + * @param graph the graph to write the lined entities + * @param contentItemId the id of the 
contentItem + * @param textAnnotation the text annotation to enhance + * @param subsumedAnnotations other text annotations for the same entity + * @return the suggested {@link Entity entities} + * @throws EntityhubException On any Error while looking up Entities via + * the Entityhub + */ protected final Iterable<Entity> computeEntityRecommentations(ReferencedSite site, - LiteralFactory literalFactory, - MGraph graph, - UriRef contentItemId, - UriRef textAnnotation, - List<UriRef> subsumedAnnotations) throws ReferencedSiteException { + LiteralFactory literalFactory, + MGraph graph, + UriRef contentItemId, + UriRef textAnnotation, + List<UriRef> subsumedAnnotations) throws EntityhubException { // First get the required properties for the parsed textAnnotation // ... and check the values String name = EnhancementEngineHelper.getString(graph, textAnnotation, ENHANCER_SELECTED_TEXT); @@ -325,7 +363,9 @@ public class ReferencedSiteEntityTagging name = cleanupKeywords(name); log.debug("Process TextAnnotation " + name + " type=" + type); - FieldQuery query = site.getQueryFactory().createFieldQuery(); + FieldQuery query = site == null ? //if site is NULL use the Entityhub + entityhub.getQueryFactory().createFieldQuery() : + site.getQueryFactory().createFieldQuery(); // replace spaces with plus to create an AND search for all words in the name! query.setConstraint(nameField, new TextConstraint(name));// name.replace(' ', '+'))); if (OntologicalClasses.DBPEDIA_PERSON.equals(type)) { @@ -360,7 +400,9 @@ public class ReferencedSiteEntityTagging } } query.setLimit(this.numSuggestions); - QueryResultList<Entity> results = site.findEntities(query); + QueryResultList<Entity> results = site == null? 
//if site is NULL + entityhub.findEntities(query) : //use the Entityhub + site.findEntities(query); //else the referenced site log.debug("{} results returned by query {}", results.size(), query); List<NonLiteral> annotationsToRelate = new ArrayList<NonLiteral>(); @@ -370,7 +412,7 @@ public class ReferencedSiteEntityTagging for (Entity guess : results) { log.debug("Adding {} to ContentItem {}", guess, contentItemId); EnhancementRDFUtils.writeEntityAnnotation(this, literalFactory, graph, contentItemId, - annotationsToRelate, guess); + annotationsToRelate, guess, nameField); } return results; } Propchange: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/NamedEntityTaggingEngine.java ------------------------------------------------------------------------------ svn:mime-type = text/plain Modified: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/resources/OSGI-INF/metatype/metatype.properties URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/resources/OSGI-INF/metatype/metatype.properties?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/resources/OSGI-INF/metatype/metatype.properties (original) +++ incubator/stanbol/trunk/enhancer/engines/entitytagging/src/main/resources/OSGI-INF/metatype/metatype.properties Mon Jun 20 12:57:29 2011 @@ -1,8 +1,8 @@ #=============================================================================== #Properties and Options used to configure ReferencedSiteEntityTaggingEnhancementEngine #=============================================================================== -org.apache.stanbol.enhancer.engines.entitytagging.impl.ReferencedSiteEntityTaggingEnhancementEngine.name=Entityhub Referenced Site based Entity-Tagging-Engine 
-org.apache.stanbol.enhancer.engines.entitytagging.impl.ReferencedSiteEntityTaggingEnhancementEngine.description=Enhancement Engine that uses Entities managed by a Entityhub Referenced Site for semantic lifting of TextAnnotations +org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.name=Named Entity Tagging Engine +org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.description=Links named entities (Persons, Organisations, Places) to Entities managed by an Entityhub Referenced Site org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId.name=Referenced Site org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId.description=The ID of the Entityhub Referenced Site used for semantic lifting of TextAnnotations Modified: incubator/stanbol/trunk/enhancer/engines/entitytagging/src/test/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/TestEntityLinkingEnhancementEngine.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/engines/entitytagging/src/test/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/TestEntityLinkingEnhancementEngine.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/enhancer/engines/entitytagging/src/test/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/TestEntityLinkingEnhancementEngine.java (original) +++ incubator/stanbol/trunk/enhancer/engines/entitytagging/src/test/java/org/apache/stanbol/enhancer/engines/entitytagging/impl/TestEntityLinkingEnhancementEngine.java Mon Jun 20 12:57:29 2011 @@ -63,8 +63,8 @@ public class TestEntityLinkingEnhancemen */ public static final String PLACE = "New Zealand"; - static ReferencedSiteEntityTaggingEnhancementEngine entityLinkingEngine - = new ReferencedSiteEntityTaggingEnhancementEngine(); + static NamedEntityTaggingEngine entityLinkingEngine + = new NamedEntityTaggingEngine(); 
@BeforeClass public static void setUpServices() throws IOException { Modified: incubator/stanbol/trunk/enhancer/jersey/src/main/java/org/apache/stanbol/enhancer/jersey/resource/ContentItemResource.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/enhancer/jersey/src/main/java/org/apache/stanbol/enhancer/jersey/resource/ContentItemResource.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/enhancer/jersey/src/main/java/org/apache/stanbol/enhancer/jersey/resource/ContentItemResource.java (original) +++ incubator/stanbol/trunk/enhancer/jersey/src/main/java/org/apache/stanbol/enhancer/jersey/resource/ContentItemResource.java Mon Jun 20 12:57:29 2011 @@ -409,7 +409,7 @@ public class ContentItemResource extends Resource object = abstracts.next().getObject(); if (object instanceof PlainLiteral) { PlainLiteral abstract_ = (PlainLiteral) object; - if (abstract_.getLanguage().equals(new Language("en"))) { + if (new Language("en").equals(abstract_.getLanguage())) { return abstract_.getLexicalForm(); } } Modified: incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/EntityhubImpl.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/EntityhubImpl.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/EntityhubImpl.java (original) +++ incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/EntityhubImpl.java Mon Jun 20 12:57:29 2011 @@ -402,7 +402,7 @@ public final class EntityhubImpl impleme } } private void deleteEntities(Yard yard, Collection<String> ids) throws YardException { - FieldQuery fieldQuery = 
getQueryFavtory().createFieldQuery(); + FieldQuery fieldQuery = getQueryFactory().createFieldQuery(); Collection<String> toDelete = new HashSet<String>(ids); for(String id : ids){ if(id != null && !id.isEmpty()){ @@ -420,7 +420,7 @@ public final class EntityhubImpl impleme private void deleteMappingsbyTarget(Yard yard,String id) throws YardException { if(id != null && !id.isEmpty()){ - FieldQuery fieldQuery = getQueryFavtory().createFieldQuery(); + FieldQuery fieldQuery = getQueryFactory().createFieldQuery(); fieldQuery.setConstraint(RdfResourceEnum.mappingTarget.getUri(), new ReferenceConstraint(id)); deleteEntities(yard, ModelUtils.asCollection( yard.findReferences(fieldQuery).iterator())); @@ -578,7 +578,7 @@ public final class EntityhubImpl impleme log.warn("NULL parsed as Reference -> call to getMappingByEntity ignored (return null)"); return null; } - FieldQuery fieldQuery = getQueryFavtory().createFieldQuery(); + FieldQuery fieldQuery = getQueryFactory().createFieldQuery(); fieldQuery.setConstraint(RdfResourceEnum.mappingSource.getUri(), new ReferenceConstraint(reference)); Yard entityhubYard = lookupYard(); QueryResultList<Representation> resultList = entityhubYard.findRepresentation(fieldQuery); @@ -609,7 +609,7 @@ public final class EntityhubImpl impleme log.warn("NULL parsed as Reference -> call to getMappingsBySymbol ignored (return null)"); return null; } - FieldQuery fieldQuery = getQueryFavtory().createFieldQuery(); + FieldQuery fieldQuery = getQueryFactory().createFieldQuery(); fieldQuery.setConstraint(RdfResourceEnum.mappingTarget.getUri(), new ReferenceConstraint(targetId)); Yard enttiyhubYard = lookupYard(); QueryResultList<Representation> resultList = enttiyhubYard.findRepresentation(fieldQuery); @@ -725,7 +725,7 @@ public final class EntityhubImpl impleme } } @Override - public FieldQueryFactory getQueryFavtory() { + public FieldQueryFactory getQueryFactory() { Yard entityhubYard = getYard(); return entityhubYard==null? 
//if no yard available DefaultQueryFactory.getInstance(): //use the default Modified: incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/ReferencedSiteImpl.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/ReferencedSiteImpl.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/ReferencedSiteImpl.java (original) +++ incubator/stanbol/trunk/entityhub/generic/core/src/main/java/org/apache/stanbol/entityhub/core/impl/ReferencedSiteImpl.java Mon Jun 20 12:57:29 2011 @@ -612,6 +612,12 @@ public class ReferencedSiteImpl implemen } //NOTE that the constructor also validation of the parsed configuration siteConfiguration = new DefaultSiteConfiguration(config); + if(PROHIBITED_SITE_IDS.contains(siteConfiguration.getId().toLowerCase())){ + throw new ConfigurationException(SiteConfiguration.ID, String.format( + "The ID '%s' of this Referenced Site is one of the following " + + "prohibited IDs: {} (case insensitive)",siteConfiguration.getId(), + PROHIBITED_SITE_IDS)); + } log.info(" > initialise Referenced Site {}",siteConfiguration.getName()); //if the accessUri is the same as the queryUri and both the dereferencer and //the entitySearcher uses the same component, than we need only one component Modified: incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/Entityhub.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/Entityhub.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- 
incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/Entityhub.java (original) +++ incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/Entityhub.java Mon Jun 20 12:57:29 2011 @@ -16,7 +16,11 @@ */ package org.apache.stanbol.entityhub.servicesapi; +import java.util.Arrays; import java.util.Collection; +import java.util.Collections; +import java.util.HashSet; +import java.util.Set; import org.apache.stanbol.entityhub.servicesapi.mapping.FieldMapper; import org.apache.stanbol.entityhub.servicesapi.mapping.FieldMapping; @@ -46,6 +50,18 @@ import org.apache.stanbol.entityhub.serv public interface Entityhub { String DEFAUTL_ENTITYHUB_PREFIX = "urn:org.apache.stanbol:entityhub"; + /** + * Protected keys to be used as name for the Entityhub. Such keys MUST NOT + * be used as {@link ReferencedSite#getId() id}s for + * {@link ReferencedSite}s. (case insensitive)<p> + * The protected values are <ul> + * <li><code>"local"</code> + * <li><code>"entityhub"</code> + * </ul> + */ + Set<String> ENTITYHUB_IDS = Collections.unmodifiableSet( + new HashSet<String>(Arrays.asList( + "local","entityhub"))); /** * Getter for the Yard storing the Entities and Mappings managed by this @@ -150,7 +166,7 @@ public interface Entityhub { * {@link Yard} used by the entity hub. * @return the query factory */ - FieldQueryFactory getQueryFavtory(); + FieldQueryFactory getQueryFactory(); /** * Getter for the FieldMappings configured for this Site * @return The {@link FieldMapping} present for this Site. 
Modified: incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java (original) +++ incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/defaults/NamespaceEnum.java Mon Jun 20 12:57:29 2011 @@ -19,6 +19,9 @@ package org.apache.stanbol.entityhub.ser import java.util.Collections; import java.util.HashMap; import java.util.Map; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; /** * Defines commonly used name spaces to prevent multiple definitions in several * classes @@ -26,6 +29,7 @@ import java.util.Map; * */ public enum NamespaceEnum { + //Namespaces defined by the entityhub entityhubModel("entityhub","http://www.iks-project.eu/ontology/rick/model/"), entityhubQuery("entityhub-query","http://www.iks-project.eu/ontology/rick/query/"), @@ -69,18 +73,76 @@ public enum NamespaceEnum { geonames("http://www.geonames.org/ontology#"), //copyright and license cc("http://creativecommons.org/ns#"), + //Schema.org (see http://schema.org/docs/schemaorg.owl for the Ontology) + schema("http://schema.org/",true), ; - private String ns; - private String prefix; + /** + * The logger + */ + private static final Logger log = LoggerFactory.getLogger(NamespaceEnum.class); + + private final String ns; + private final String prefix; + private final boolean defaultPrefix; + /** + * Defines a namespace that used the {@link #name()} as prefix. + * @param ns the namespace. 
MUST NOT be NULL nor empty + */ NamespaceEnum(String ns) { - if(ns == null){ - throw new IllegalArgumentException("The namespace MUST NOT be NULL"); - } - this.ns = ns; + this(null,ns,false); + } + /** + * Defines a namespace by using the {@link #name()} as prefix. If + * <code>true</code> is parsed a second parameter this namespace is marked + * as the default<p> + * <b>NOTE: </b> Only a single namespace can be defined as default. In case + * multiple namespaces are marked as default the one with the lowest + * {@link #ordinal()} will be used as default. This will be the topmost entry + * in this enumeration. + * @param ns the namespace. MUST NOT be <code>null</code> nor empty + * @param defaultPrefix the default namespace indicator + */ + NamespaceEnum(String ns,boolean defaultPrefix) { + this(null,ns,defaultPrefix); } + /** + * Defines a namespace with a customised prefix. This should be used if the + * prefix needs to be different as the {@link #name()} of the enumeration + * entry. + * @param prefix the prefix. If <code>null</code> the {@link #name()} is + * used. MUST NOT be an empty string + * @param ns the namespace. MUST NOT be <code>null</code> nor empty + */ NamespaceEnum(String prefix, String ns) { - this(ns); - this.prefix = prefix; + this(prefix,ns,false); + } + /** + * Defines a namespace with a customised prefix. This should be used if the + * prefix needs to be different as the {@link #name()} of the enumeration + * entry.<p> + * <b>NOTE: </b> Only a single namespace can be defined as default. In case + * multiple namespaces are marked as default the one with the lowest + * {@link #ordinal()} will be used as default. This will be the topmost entry + * in this enumeration. + * @param prefix the prefix. If <code>null</code> the {@link #name()} is + * used. MUST NOT be an empty string + * @param ns the namespace. 
MUST NOT be <code>null</code> nor empty + * @param defaultPrefix the default namespace indicator + */ + NamespaceEnum(String prefix, String ns,boolean defaultPrefix) { + if(ns == null || ns.isEmpty()){ + throw new IllegalArgumentException("The namespace MUST NOT be NULL nor empty"); + } + this.ns = ns; + if(prefix == null){ + this.prefix = name(); + } else if(prefix.isEmpty()){ + throw new IllegalArgumentException("The prefix MUST NOT be emtpty." + + "Use NULL to use the name or parse the prefix to use"); + } else { + this.prefix = prefix; + } + this.defaultPrefix = defaultPrefix; } public String getNamespace(){ return ns; @@ -95,14 +157,27 @@ public enum NamespaceEnum { /* * ==== Code for Lookup Methods based on Prefix and Namespace ==== */ - private static Map<String, NamespaceEnum> prefix2Namespace; - private static Map<String, NamespaceEnum> namespace2Prefix; + private final static Map<String, NamespaceEnum> prefix2Namespace; + private final static Map<String, NamespaceEnum> namespace2Prefix; + private final static NamespaceEnum defaultNamespace; static { Map<String,NamespaceEnum> p2n = new HashMap<String, NamespaceEnum>(); Map<String,NamespaceEnum> n2p = new HashMap<String, NamespaceEnum>(); //The Exceptions are only thrown to check that this Enum is configured //correctly! + NamespaceEnum defaultNs = null; for(NamespaceEnum entry : NamespaceEnum.values()){ + if(entry.isDefault()){ + if(defaultNs == null){ + defaultNs = entry; + } else { + log.warn("Found multiple default namespace definitions! 
Will use the one with the lowest ordinal value."); + log.warn(" > used default: prefix:{}, namespace:{}, ordinal:{}", + new Object[]{defaultNs.getPrefix(),defaultNs.getNamespace(),defaultNs.ordinal()}); + log.warn(" > this one : prefix:{}, namespace:{}, ordinal:{}", + new Object[]{entry.getPrefix(),entry.getNamespace(),entry.ordinal()}); + } + } if(p2n.containsKey(entry.getPrefix())){ throw new IllegalStateException( String.format("Prefix %s used for multiple namespaces: %s and %s", @@ -110,6 +185,7 @@ public enum NamespaceEnum { p2n.get(entry.getPrefix()), entry.getNamespace())); } else { + log.debug("add {} -> {} mapping",entry.getPrefix(),entry.getNamespace()); p2n.put(entry.getPrefix(), entry); } if(n2p.containsKey(entry.getNamespace())){ @@ -119,11 +195,13 @@ public enum NamespaceEnum { p2n.get(entry.getNamespace()), entry.getNamespace())); } else { + log.debug("add {} -> {} mapping",entry.getNamespace(),entry.getPrefix()); n2p.put(entry.getNamespace(), entry); } } prefix2Namespace = Collections.unmodifiableMap(p2n); namespace2Prefix = Collections.unmodifiableMap(n2p); + defaultNamespace = defaultNs; } /** * Getter for the {@link NamespaceEnum} entry based on the string namespace @@ -136,18 +214,18 @@ public enum NamespaceEnum { } /** * Getter for the {@link NamespaceEnum} entry based on the prefix - * @param prefix the prefix + * @param prefix the prefix or <code>null</code> to get the default namespace * @return the {@link NamespaceEnum} entry or <code>null</code> if the prased * prefix is not present */ public static NamespaceEnum forPrefix(String prefix){ - return prefix2Namespace.get(prefix); + return prefix == null ? defaultNamespace : prefix2Namespace.get(prefix); } /** - * Lookup if the parsed URI uses one of the registered prefixes of this - * Enumeration. If this is the case, the prefix is replaced by the namespace - * and the full URI is returned. 
If no prefix is returned, the - * parsed URI is returned + * Lookup if the parsed short URI (e.g "rdfs:label") uses one of the + * registered prefixes of this Enumeration of if the parsed short URI uses + * the default namespace (e.g. "name"). In case the prefix could not be found + * the parsed URI is returned unchanged * @param shortUri the short URI * @return the full URI if the parsed shortUri uses a prefix defined by this * Enumeration. Otherwise the parsed value. @@ -162,7 +240,15 @@ public enum NamespaceEnum { if(namespace!= null){ shortUri = namespace.getNamespace()+shortUri.substring(index+1); } + } else if(defaultNamespace != null){ + shortUri = defaultNamespace.getNamespace()+shortUri; } return shortUri; } + /** + * @return the defaultPrefix + */ + public boolean isDefault() { + return defaultPrefix; + } } Modified: incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/site/ReferencedSite.java URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/site/ReferencedSite.java?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/site/ReferencedSite.java (original) +++ incubator/stanbol/trunk/entityhub/generic/servicesapi/src/main/java/org/apache/stanbol/entityhub/servicesapi/site/ReferencedSite.java Mon Jun 20 12:57:29 2011 @@ -17,7 +17,9 @@ package org.apache.stanbol.entityhub.servicesapi.site; import java.io.InputStream; +import java.util.Set; +import org.apache.stanbol.entityhub.servicesapi.Entityhub; import org.apache.stanbol.entityhub.servicesapi.mapping.FieldMapper; import org.apache.stanbol.entityhub.servicesapi.mapping.FieldMapping; import org.apache.stanbol.entityhub.servicesapi.model.Representation; @@ -31,9 +33,15 @@ import 
org.apache.stanbol.entityhub.serv public interface ReferencedSite { /** + * List of {@link #getId() ids} that are not allowed to be used (case + * insensitive) for referenced sites. + */ + Set<String> PROHIBITED_SITE_IDS = Entityhub.ENTITYHUB_IDS; + /** * The Id of this site. This Method MUST return the same value as - * <code>{@link #getConfiguration()}.getId()</code>. It is only there to - * make it more easy to access the Id of the site + * <code>{@link #getConfiguration()}.getId()</code>. + * The configured ID MUST NOT be <code>null</code>, empty or one of the + * {@link #PROHIBITED_SITE_IDS}. * @return the ID of this site */ String getId(); Modified: incubator/stanbol/trunk/entityhub/indexing/dblp/src/main/resources/indexing/config/indexing.properties URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/indexing/dblp/src/main/resources/indexing/config/indexing.properties?rev=1137617&r1=1137616&r2=1137617&view=diff ============================================================================== --- incubator/stanbol/trunk/entityhub/indexing/dblp/src/main/resources/indexing/config/indexing.properties (original) +++ incubator/stanbol/trunk/entityhub/indexing/dblp/src/main/resources/indexing/config/indexing.properties Mon Jun 20 12:57:29 2011 @@ -31,10 +31,10 @@ indexingDestination=org.apache.stanbol.e org.apache.stanbol.entityhub.site.entityPrefix=http://dblp.l3s.de/d2r/resource org.apache.stanbol.entityhub.site.accessUri=http://dblp.l3s.de/d2r/resource/ -org.apache.stanbol.entityhub.site.dereferencerType=org.apache.stanbol.entityhub.site.CoolUriDereferencer +org.apache.stanbol.entityhub.site.dereferencerType=org.apache.stanbol.entityhub.dereferencer.CoolUriDereferencer # The SPARQL endpoint is a d2r server so use a standard sparql server org.apache.stanbol.entityhub.site.queryUri=http://dblp.l3s.de/d2r/sparql -org.apache.stanbol.entityhub.site.searcherType=org.apache.stanbol.entityhub.site.SparqlSearcher 
+org.apache.stanbol.entityhub.site.searcherType=org.apache.stanbol.entityhub.searcher.SparqlSearcher

# The mappings used when importing an entity form this site to the Entityhub (optional)
# The value need to point to the file with the mappings within the config directory

Modified: incubator/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/indexing.properties
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/indexing.properties?rev=1137617&r1=1137616&r2=1137617&view=diff
==============================================================================
--- incubator/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/indexing.properties (original)
+++ incubator/stanbol/trunk/entityhub/indexing/dbpedia/src/main/resources/indexing/config/indexing.properties Mon Jun 20 12:57:29 2011
@@ -49,9 +49,9 @@ org.apache.stanbol.entityhub.site.entity
# Dereferencer should use SPARQL because Cool URI will omit statements for popular
# Entities.
org.apache.stanbol.entityhub.site.accessUri=http://dbpedia.org/sparql/
-org.apache.stanbol.entityhub.site.dereferencerType=org.apache.stanbol.entityhub.site.SparqlDereferencer
+org.apache.stanbol.entityhub.site.dereferencerType=org.apache.stanbol.entityhub.dereferencer.SparqlDereferencer
# The SPARQL endpoint of DBpedia supports Virtuoso specific extensions
-org.apache.stanbol.entityhub.site.searcherType=org.apache.stanbol.entityhub.site.VirtuosoSearcher
+org.apache.stanbol.entityhub.site.searcherType=org.apache.stanbol.entityhub.searcher.VirtuosoSearcher
org.apache.stanbol.entityhub.site.queryUri=http://dbpedia.org/sparql

# The mappings used when importing an entity form this site to the Entityhub (optional)

Modified: incubator/stanbol/trunk/entityhub/indexing/genericrdf/src/main/resources/indexing/config/indexing.properties
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/entityhub/indexing/genericrdf/src/main/resources/indexing/config/indexing.properties?rev=1137617&r1=1137616&r2=1137617&view=diff
==============================================================================
--- incubator/stanbol/trunk/entityhub/indexing/genericrdf/src/main/resources/indexing/config/indexing.properties (original)
+++ incubator/stanbol/trunk/entityhub/indexing/genericrdf/src/main/resources/indexing/config/indexing.properties Mon Jun 20 12:57:29 2011
@@ -114,16 +114,16 @@ indexingDestination=org.apache.stanbol.e
#org.apache.stanbol.entityhub.site.accessUri=http://example.org/resource"
#org.apache.stanbol.entityhub.site.dereferencerType=
# available EntityDereferencer implementation
-# - org.apache.stanbol.entityhub.site.CoolUriDereferencer
-# - org.apache.stanbol.entityhub.site.SparqlDereferencer
+# - org.apache.stanbol.entityhub.dereferencer.CoolUriDereferencer
+# - org.apache.stanbol.entityhub.dereferencer.SparqlDereferencer
# (b) search entities (queryUri and EntitySearcher implementation)
#org.apache.stanbol.entityhub.site.queryUri=http://example.org/sparql
#org.apache.stanbol.entityhub.site.searcherType=
# available EntitySearcher implementation
-# - org.apache.stanbol.entityhub.site.SparqlSearcher (generic SPARQL)
-# - org.apache.stanbol.entityhub.site.LarqSearcher (Larq SPARQL extensions)
-# - org.apache.stanbol.entityhub.site.VirtuosoSearcher (Virtuoso SPARQL extensions)
+# - org.apache.stanbol.entityhub.searcher.SparqlSearcher (generic SPARQL)
+# - org.apache.stanbol.entityhub.searcher.LarqSearcher (Larq SPARQL extensions)
+# - org.apache.stanbol.entityhub.searcher.VirtuosoSearcher (Virtuoso SPARQL extensions)

# The referenced site can also specify additional mappings to be used in the
# case an entity of this site is imported to the Entityhub.

Added: incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-local.config
URL: http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-local.config?rev=1137617&view=auto
==============================================================================
--- incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-local.config (added)
+++ incubator/stanbol/trunk/launchers/full/src/main/resources/resources/config/org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine-local.config Mon Jun 20 12:57:29 2011
@@ -0,0 +1,8 @@
+org.apache.stanbol.enhancer.engines.entitytagging.nameField="name"
+org.apache.stanbol.enhancer.engines.entitytagging.personType="Person"
+org.apache.stanbol.enhancer.engines.entitytagging.personState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.referencedSiteId="local"
+org.apache.stanbol.enhancer.engines.entitytagging.placeState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.organisationState=B"true"
+org.apache.stanbol.enhancer.engines.entitytagging.organisationType="Organization"
+org.apache.stanbol.enhancer.engines.entitytagging.placeType="Place"
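
Note on the unprefixed values in the configuration above ("name", "Person", "Organization", "Place"): according to the log message and the NamespaceEnum change in this commit, a short name without a prefix is resolved against the default namespace, which is now http://schema.org/. The sketch below imitates that lookup with a plain map rather than the real enum. The class DefaultNamespaceSketch and the method expand are hypothetical, the prefix table is reduced to two entries, and splitting at the first ':' is an assumption, because that part of the method is not visible in the diff.

```java
import java.util.HashMap;
import java.util.Map;

public class DefaultNamespaceSketch {

    // Reduced prefix table (assumption: stands in for the NamespaceEnum entries).
    static final Map<String, String> PREFIXES = new HashMap<String, String>();
    // The namespace this commit marks as the default.
    static final String DEFAULT_NS = "http://schema.org/";

    static {
        PREFIXES.put("schema", "http://schema.org/");
        PREFIXES.put("rdfs", "http://www.w3.org/2000/01/rdf-schema#");
    }

    // Hypothetical equivalent of the lookup described in the diff:
    // known prefix -> replace it by its namespace,
    // unknown prefix -> return the value unchanged,
    // no prefix -> prepend the default namespace.
    static String expand(String shortUri) {
        int index = shortUri.indexOf(':');
        if (index > 0) {
            String ns = PREFIXES.get(shortUri.substring(0, index));
            return ns != null ? ns + shortUri.substring(index + 1) : shortUri;
        }
        return DEFAULT_NS + shortUri;
    }

    public static void main(String[] args) {
        System.out.println(expand("name"));       // http://schema.org/name
        System.out.println(expand("Person"));     // http://schema.org/Person
        System.out.println(expand("rdfs:label")); // http://www.w3.org/2000/01/rdf-schema#label
        // "http" is not a registered prefix, so a full URI passes through unchanged.
        System.out.println(expand("http://dbpedia.org/resource/Paris"));
    }
}
```

Under these assumptions the nameField value "name" would end up as http://schema.org/name, which matches the mapping described in the log message.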
