Know I should add this to JIRA, just want to make sure I didn't need to do some
additional step to get the index to work

Was actually getting a different error, not the cache one: "index not
yet installed" when using /engines

On Windows with latest code get the index not yet installed error (and
weirdly also with what I built 7/10 that used to work with the
sling/datafiles workaround on Windows)  (Linux with 7/10 code is still
fine):

(org.apache.stanbol.enhancer.servicesapi.EngineException:
'NamedEntityTaggingEngine' failed to process content item
'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with type
'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
SolrIndex entityhub is not available. The necessary Index is not yet
installed.) org.apache.stanbol.enhancer.servicesapi.EngineException:
'NamedEntityTaggingEngine' failed to process content item
'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with type
'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
SolrIndex entityhub is not available. The necessary Index is not yet
installed.
        at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEnhancements(NamedEntityTaggingEngine.java:323)

Have the workarounds in place in sling/datafiles (en-*.bin,
dbpedia_43k.solrindex.zip).
The change for STANBOL-259, as Fabian commented, didn't fix the en-*.bin load
issue, so I still needed the workaround.

From the Felix web console, "Stanbol Data File Provider":
It seems to be looking for entityhub.solrindex.zip and not finding it
(tried copying dbpedia_43k.solrindex.zip to entityhub.solrindex.zip in
datafiles but got the same error after restart and engine use)
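
To rule out a simple naming or location problem, it can help to check which of the expected archives are actually present in the datafiles directory. The following is only a minimal sketch: the helper name `missing_datafiles` is hypothetical and not part of Stanbol, and it assumes the directory is `sling/datafiles` relative to where the launcher was started.

```python
from pathlib import Path


def missing_datafiles(datafiles_dir, expected_names):
    """Return the expected file names that are not present in datafiles_dir.

    Hypothetical helper for illustration only; it just compares file names
    and knows nothing about how Stanbol itself resolves data files.
    """
    directory = Path(datafiles_dir)
    # A missing directory is treated the same as an empty one.
    present = {p.name for p in directory.iterdir()} if directory.is_dir() else set()
    return [name for name in expected_names if name not in present]
```

For example, `missing_datafiles("sling/datafiles", ["entityhub.solrindex.zip"])` returns an empty list when the file is where the check expects it, which would point to a problem other than the file name.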

Also tried from a clean state: blow away the sling dir, mvn clean, run the shell
script to get the defaultdata files, mvn install -DskipTests
(with MAVEN_OPTS=-Xmx1024M -XX:MaxPermSize=128M in the environment)

Steve
-----Original Message-----
From: Steve Reiner [mailto:[email protected]] 
Sent: Wednesday, July 13, 2011 12:09 AM
To: '[email protected]'
Subject: RE: EntityHub and DBpedia

I am getting something like this too after updating with the code checked in
yesterday. Problem wasn't there in the code the day before.

(using /engines page)

-----Original Message-----
From: David Riccitelli [mailto:[email protected]]
Sent: Tuesday, July 12, 2011 11:58 PM
To: [email protected]
Subject: Re: EntityHub and DBpedia

Thanks Rupert,

I'm trying to follow your instructions but I encounter a couple of issues
(probably due to inexperience):
 [1] when dropping the config files, they enter some loop of
REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
bundle), is that normal?
 [2] after I restart Stanbol, and try to query an entity from the entityhub
I receive the following error:

13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
org.apache.felix.http.jetty /entityhub/sites/entity/
(java.lang.IllegalStateException: Unable to initialize the Cache with Yard
dbpediaCache! This is usually caused by Errors while reading the Cache
Configuration from the Yard.) java.lang.IllegalStateException: Unable to
initialize the Cache with Yard dbpediaCache! This is usually caused by
Errors while reading the Cache Configuration from the Yard.
at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)


Do I need to initialize the Cache in some way?

Thanks for your help,

David


On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
[email protected]> wrote:

> Hi
>
> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese 
> <[email protected]> wrote:
> > I solved it the same way, but at the cost of losing the caching capabilities.
> > Is there any possibility to keep both all the data and the cache?
> >
> > Andrea
> >
> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >
> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
> >>
> >> Thanks,
> >> David
> >>
> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli < 
> >> [email protected]> wrote:
> >>
> >>> Hi Rupert,
> >>>
> >>> I recently updated the Stanbol install, and I found that the RDF returned
> >>> by the EntityHub is missing some props (specifically the dbprop, as far
> >>> as I can see).
> >>>
> >>> This is the command that I use for testing:
> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >>>
> >>> which outputs the attached RDF file.
> >>>
> >>> I cleared the whole sling folder (rm -fr sling) and checked with the
> >>> SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >>>
> >>> Does this depend on the mapping.txt file?
> >>>
>
> If you plan to create your own dbpedia index, then the mapping.txt
> file is the way to configure which properties are
> included/excluded.
> Typically dbprop values are low quality. They are just naive 1:1
> mappings of the key-value pairs found in the info boxes. Because of
> this they are excluded from the indexes.
>
> At runtime the returned data depends on the Cache strategy in use.
>
> Currently there are three possibilities (configured with the
> referenced Site):
> 1) no cache: both queries and retrieval use the remote service.
> 2) used: Queries are executed by the remote service. Retrieved
> Entities are stored locally. The cached data depends on the mappings
> defined for the cache.
> 3) all: Both queries and retrieval are based on the cache. The remote
> service is only used as a fallback in case the cache is not
> available (e.g. if you deactivate solrYard).
>
> So if you are fine with (2) then you could use the configuration
> previously used by the stable launcher [1].
> I think the easiest way to install this is to add the
> Felix File Installer [2] to the Stanbol Environment. You will need to
> delete the current referencedSite for dbpedia first and then add the
> three configuration files as described by [1].
>
> If your requirements are not covered by the currently available options,
> it would be nice if you could write a short user story, because I am
> thinking about how to improve this feature and input like that would
> be really valuable.
>
> best
> Rupert Westenthaler
>
> [1] The dbpedia config consists of three files: the referenced site,
> cache and solryard components with the "-dbpedia" endings.
>
> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>
> [2] http://felix.apache.org/site/apache-felix-file-install.html
>
> p.s. I keep this part because it describes very well how the cache
> strategy "used" works:
> >>>>> Hi David
> >>>>>
> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
> >>>>>
> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
> >>>>> - answered by retrieving the Entity from DBpedia.org only the first time
> >>>>> - the information is then stored in a local cache; in doing so, the values
> >>>>> of the documents are filtered (see (a) for details)
> >>>>> - the cached version is returned
> >>>>>
> >>>>> (a) The default configuration for dbpedia stores all fields, but filters
> >>>>> literal values so that only values with the language "en, de, fr, it, es"
> >>>>> or no language are stored.
> >>>>>
> >>>>>
> >>>>> Assuming that you started from zero when updating to a new version,
> >>>>> this also means that you have downloaded a new version of this
> >>>>> Entity from DBpedia.
> >>>>>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>
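
The three cache strategies described in the quoted message (no cache, "used", "all") can be pictured with a small toy model. This is only a sketch of the behaviour as described in the mail, not Stanbol's implementation: the class name, the `fetch_remote` callable and the use of a plain dict as cache are all assumptions made for illustration, and two behaviours are guesses (that "used" also falls back to the remote service when the cache is unavailable, and that a miss under "all" is simply reported as not found).

```python
class ReferencedSiteSketch:
    """Toy model of the cache strategies "none", "used" and "all".

    Illustration only; none of these names exist in Stanbol.
    """

    def __init__(self, strategy, fetch_remote, cache=None):
        if strategy not in ("none", "used", "all"):
            raise ValueError(f"unknown cache strategy: {strategy!r}")
        self.strategy = strategy
        self.fetch_remote = fetch_remote  # callable: entity id -> dict or None
        self.cache = cache                # dict, or None if the cache is unavailable

    def query_backend(self):
        """Name the side that answers queries under the current strategy."""
        if self.strategy == "all" and self.cache is not None:
            return "cache"
        return "remote"

    def get_entity(self, entity_id):
        # "none": always remote. An unavailable cache also falls back to
        # remote (stated in the mail for "all"; assumed here for "used").
        if self.strategy == "none" or self.cache is None:
            return self.fetch_remote(entity_id)
        if self.strategy == "all":
            # Retrieval is based on the cache only (assumption: a miss
            # is returned as None rather than triggering a remote call).
            return self.cache.get(entity_id)
        # "used": fetch on the first request, store locally, then
        # return the cached version on this and every later request.
        if entity_id not in self.cache:
            entity = self.fetch_remote(entity_id)
            if entity is None:
                return None
            self.cache[entity_id] = entity
        return self.cache[entity_id]
```

With strategy "used", a second `get_entity` call for the same id is served from the dict without calling `fetch_remote` again, which matches the "only the first time" behaviour in the p.s. of the quoted message.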

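The language filtering mentioned in note (a) of the quoted message, where all fields are kept but literal values are stored only for the languages "en, de, fr, it, es" or for no language, could look roughly like this. It is a sketch under assumptions: the data shape (a dict mapping field names to lists of `(value, language)` pairs, with `None` for "no language") and the function name are made up for illustration and say nothing about how the cache mappings are actually implemented.

```python
ALLOWED_LANGUAGES = frozenset({"en", "de", "fr", "it", "es"})


def filter_literals(fields, allowed=ALLOWED_LANGUAGES):
    """Keep every field, but drop literal values in languages not allowed.

    fields: dict of field name -> list of (value, language) pairs,
            where language is None for literals without a language tag.
    """
    return {
        name: [
            (value, lang)
            for value, lang in values
            if lang is None or lang in allowed
        ]
        for name, values in fields.items()
    }
```

A field whose values are all in other languages is kept with an empty list here; whether the real cache keeps or drops such a field is not stated in the thread.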


--
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
