Hi Rupert,
Thanks for testing it on your side.
I investigated and compared the IPTC configuration with mine, and found the problem!
It comes from these lines in indexing.properties:
# the entity prefixes are used to determine if an entity needs to be searched
# on a referenced site. If not specified requests for any entity will be
# forwarded to this referenced site.
# use ';' to seperate multiple values
#org.apache.stanbol.entityhub.site.entityPrefix=http://example.org/resource;urn:mycompany:
Reading this comment, I first left the line commented out (no entityPrefix
specified), because I understood from it that in that case all requests
go to the referenced site... and that's fine! :)
But in fact, with this configuration, the Felix configuration "Apache
Stanbol Entityhub Referenced Site Configuration" has the entity prefixes
set by default to:
- http://dbpedia.org/resource/
- http://dbpedia.org/ontology/
So IMO, either there is a bug in the code, or the comment should be changed.
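To illustrate why those default prefixes hide my entities, here is a minimal sketch of the behaviour the comment describes. It is not Stanbol's actual code: the class EntityPrefixFilter, the method isHandled and the simple startsWith check are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, NOT taken from the Stanbol sources.
class EntityPrefixFilter {

    // Returns true if a site configured with the given prefixes would be
    // asked for the entity. An empty prefix list means "any entity".
    static boolean isHandled(List<String> prefixes, String entityId) {
        if (prefixes.isEmpty()) {
            return true; // documented behaviour when no prefix is specified
        }
        for (String prefix : prefixes) {
            if (entityId.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";

        // What the comment in indexing.properties promises: no prefixes.
        List<String> none = Collections.emptyList();
        // What the Felix console actually shows for the imported index.
        List<String> defaults = Arrays.asList(
                "http://dbpedia.org/resource/",
                "http://dbpedia.org/ontology/");

        System.out.println(isHandled(none, id));     // true
        System.out.println(isHandled(defaults, id)); // false
    }
}
```

Under these assumptions, an ID starting with http://www.test.fr/ would never reach a site whose prefixes are the two dbpedia ones, which matches the "not found" answer I was getting.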
During this investigation I also "discovered" these (not closely related
to this problem):
1) There is a typo in mapping.txt:
==> change
# copy dc:titel to rdfs:label
dc:titel > rdfs:label
==> to
# copy dc:title to rdfs:label
dc:title > rdfs:label
2) In the Felix console, when I try to modify the "Apache Stanbol
Entityhub Referenced Site Configuration" of an imported index,
there is an AJAX error on save:
The request failed:
[object XMLDocument]
(I see this when I try to modify the entity prefixes of my imported index.)
Please tell me if you would prefer Jira tickets for these issues (if they
really are issues).
Thanks for your help.
++
On 06/13/2011 03:54 PM, Rupert Westenthaler wrote:
Hi florent
Using a '#' in the URI has the disadvantage that browsers will not
send the part behind the hash to the server, because they assume that
they need to download the whole document and navigate to the anchor
within the document.
Using curl (or javascript) I think the full URL should be sent to the
server (was not able to find some good information about this, but at
least "curl -v" says that it sends the whole URL to the server).
However, on the server side Jersey also does not provide the #{anchor}
part of the URL.
Sending
"http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
will pass only "http://www.test.fr/terminology" to a method annotated with
@GET
@Path("/entity")
public Response getEntity(@QueryParam(value = "id") String id) {
// get the Entity
...
URL-encoding the '#' to '%23' causes Jersey to pass
"http://www.test.fr/terminology#entity_gradient_1306341921902".
In this case the query for an entity with this ID is correctly passed
to the ReferencedSite (with '#', not '%23'). So if you send '%23' and the
indexed Entity uses '#', it should work as long as Entities are cached
locally. If a remote service is used, then the same problem with the '#'
reappears for the remote service.
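The difference between the two request forms can be checked on the client side with plain JDK classes. This is only a sketch: the class EntityIdEncoding and the helper buildLookupUrl are made-up names, and it shows how java.net.URI splits the URL, not what Jersey does internally.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration only.
class EntityIdEncoding {

    // Builds the lookup URL with the entity ID URL-encoded, so that '#'
    // becomes '%23' and stays inside the query string.
    static String buildLookupUrl(String endpoint, String entityId) {
        return endpoint + "?id=" + URLEncoder.encode(entityId, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String endpoint = "http://localhost:8080/entityhub/sites/entity";
        String id = "http://www.test.fr/terminology#entity_gradient_1306341921902";

        // Unencoded: everything behind '#' becomes the fragment of the
        // request URL and is no longer part of the "id" parameter.
        URI raw = URI.create(endpoint + "?id=" + id);
        System.out.println(raw.getQuery());    // id=http://www.test.fr/terminology
        System.out.println(raw.getFragment()); // entity_gradient_1306341921902

        // Encoded: the whole ID stays in the query and there is no fragment.
        URI encoded = URI.create(buildLookupUrl(endpoint, id));
        System.out.println(encoded.getRawQuery()); // still contains %23
        System.out.println(encoded.getQuery());    // decoded, contains '#'
        System.out.println(encoded.getFragment()); // null
    }
}
```

Note that URLEncoder.encode (with a Charset argument, available since Java 10) also encodes ':' and '/', which corresponds to URL-encoding the whole entity ID rather than only the '#'.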
To test on my side I have done the following:
* renamed the Entities of the IPTC worldregions from
"http://cv.iptc.org/newscodes/worldregion/r001" to
"http://cv.iptc.org/newscodes/worldregion#r001"
* indexed the IPTC using the indexing tools
* installed the index to the entityhub
* curl -v
"http://localhost:8080/entityhub/sites/entity?id=http://cv.iptc.org/newscodes/worldregion%23r001"
Assuming that
curl
"http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
==> answer is
Entity with ID
'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
an any referenced site
happened on a referenced site with a full cache (e.g. as created by the
Indexing Utility), I was not able to reproduce the error. If the
referenced site uses a remote service to dereference entity IDs (e.g.
the Cool URI), this might happen. In this case I suggest testing the
remote service directly.
best
Rupert Westenthaler
On Mon, Jun 13, 2011 at 1:06 PM, florent andré
<[email protected]> wrote:
Hi Rupert,
Hope you are fine.
I have another problem...
In my SKOS, entities are identified by a #, like this:
<rdf:Description
rdf:about="http://www.test.fr/terminology#entity_gradient_1306341921902">
<skos:broader
rdf:resource="http://www.test.fr/terminology#entity_operateur_mathematique_1306341918995"/>
<skos:prefLabel>GRADIENT</skos:prefLabel>
<skos:inScheme
rdf:resource="http://www.test.fr/terminology#space_mathematiques_1306341820765"/>
<rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
</rdf:Description>
And I can't manage to find the entity with the entity endpoint.
* With the # char :
curl
"http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology#entity_gradient_1306341921902"
==> answer is
Entity with ID 'http://www.test.fr/terminology' not found an any referenced
site
==> the part after the # is removed
* With the # replaced by %23 (the URL-encoded equivalent):
curl
"http://localhost:8080/entityhub/sites/entity?id=http://www.test.fr/terminology%23space_mathematiques_1306341820765"
==> answer is
Entity with ID
'http://www.test.fr/terminology#space_mathematiques_1306341820765' not found
an any referenced site
==> the whole id is kept, but it is still not found...
The result is the same if I URL-encode the whole entity id.
Is this related to a bug, or am I doing something wrong?
Thanks.
++