I know I should add this to Jira; I just want to make sure I didn't need to do
some additional step to get the index to work.
I was actually getting a different error, not the cache one: "index not
yet installed" when I use /engines.
On Windows with the latest code I get the "index not yet installed" error (and,
weirdly, also with what I built on 7/10, which used to work with the
sling/datafiles workaround on Windows). Linux with the 7/10 code is still
fine:
(org.apache.stanbol.enhancer.servicesapi.EngineException:
'NamedEntityTaggingEngine' failed to process content item
'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with type
'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
SolrIndex entityhub is not available. The necessary Index is not yet
installed.) org.apache.stanbol.enhancer.servicesapi.EngineException:
'NamedEntityTaggingEngine' failed to process content item
'urn:content-item-sha1-88a2b5f6520df87e4567c06b48e742b7d1c71e9c' with type
'text/plain': org.apache.stanbol.entityhub.servicesapi.yard.YardException:
SolrIndex entityhub is not available. The necessary Index is not yet
installed.
at org.apache.stanbol.enhancer.engines.entitytagging.impl.NamedEntityTaggingEngine.computeEnhancements(NamedEntityTaggingEngine.java:323)
I have the workarounds in sling/datafiles (en-*.bin and
dbpedia_43k.solrindex.zip).
(The change for STANBOL-259, as Fabian commented, didn't fix the en-*.bin load
issue; I still needed the workaround.)
From the Felix web console, "Stanbol Data File Provider":
It seems to be looking for entityhub.solrindex.zip and not finding it.
(I tried copying dbpedia_43k.solrindex.zip to entityhub.solrindex.zip in
datafiles but got the same error after a restart and engine use.)
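A quick way to double-check which of the files mentioned above are actually present in sling/datafiles is sketched below. This is only a minimal sketch: the helper name `missing_datafiles` is made up for illustration, the directory is assumed to be relative to where the launcher was started, and it only checks file names. It says nothing about whether Stanbol will actually load them.

```python
from pathlib import Path


def missing_datafiles(datafiles_dir, patterns):
    """Return the glob patterns that match no file in datafiles_dir."""
    directory = Path(datafiles_dir)
    # A pattern counts as "present" if at least one file matches it.
    return [p for p in patterns if not any(directory.glob(p))]


# File names taken from this message; the directory location is an assumption.
expected = ["en-*.bin", "dbpedia_43k.solrindex.zip", "entityhub.solrindex.zip"]
print(missing_datafiles("sling/datafiles", expected))
```

An empty list means every expected name was found; anything printed is a pattern with no matching file.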
I also tried from a clean state: blew away the sling dir, ran mvn clean, ran
the shell script to get the defaultdata files, then ran mvn install -DskipTests
with MAVEN_OPTS=-Xmx1024M -XX:MaxPermSize=128M in the environment.
Steve
-----Original Message-----
From: Steve Reiner [mailto:[email protected]]
Sent: Wednesday, July 13, 2011 12:09 AM
To: '[email protected]'
Subject: RE: EntityHub and DBpedia
I am getting something like this too after updating with the code checked in
yesterday. Problem wasn't there in the code the day before.
(using /engines page)
-----Original Message-----
From: David Riccitelli [mailto:[email protected]]
Sent: Tuesday, July 12, 2011 11:58 PM
To: [email protected]
Subject: Re: EntityHub and DBpedia
Thanks Rupert,
I'm trying to follow your instructions but I encounter a couple of issues
(probably due to inexperience):
[1] when dropping the config files, they enter some loop of
REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
bundle), is that normal?
[2] after I restart Stanbol, and try to query an entity from the entityhub
I receive the following error:
13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
org.apache.felix.http.jetty /entityhub/sites/entity/
(java.lang.IllegalStateException: Unable to initialize the Cache with Yard
dbpediaCache! This is usually caused by Errors while reading the Cache
Configuration from the Yard.) java.lang.IllegalStateException: Unable to
initialize the Cache with Yard dbpediaCache! This is usually caused by
Errors while reading the Cache Configuration from the Yard.
at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
Do I need to initialize the Cache in some way?
Thanks for your help,
David
On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
[email protected]> wrote:
> Hi
>
> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> <[email protected]> wrote:
> > I solved it in the same way, but losing the caching capabilities.
> > Is there any possibility to keep both all the data and the cache?
> >
> > Andrea
> >
> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >
> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
> >>
> >> Thanks,
> >> David
> >>
> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> >> [email protected]> wrote:
> >>
> >>> Hi Rupert,
> >>>
> >>> I recently updated the Stanbol install, and I found that the RDF
> >>> returned by the EntityHub is missing some props (specifically the
> >>> dbprop as far as I can see).
> >>>
> >>> This is the command that I use for testing:
> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
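For scripted testing, the same request can be built in Python. This is a minimal sketch, not part of Stanbol: the helper name `build_entity_request` is made up, the host and path are copied from the curl command above, and it percent-encodes the `id` parameter, on the assumption that the server accepts the encoded form as well as the raw one.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL copied from the curl command in this thread.
BASE = "http://localhost:8080/entityhub/site/dbpedia/entity"


def build_entity_request(entity_uri, base=BASE):
    """Build (but do not send) a GET request for one entity as RDF/XML."""
    query = urlencode({"id": entity_uri})  # percent-encodes the entity URI
    return Request(f"{base}?{query}", headers={"Accept": "application/rdf+xml"})


req = build_entity_request("http://dbpedia.org/resource/Valentino_Rossi")
print(req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it against a running Stanbol instance.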
> >>>
> >>> which outputs the attached RDF file.
> >>>
> >>> I cleared the whole sling folder (rm -fr sling) and checked with
> >>> the SPARQL endpoint at DBpedia, but I wasn't able to fix it.
> >>>
> >>> Does this depend on the mapping.txt file?
> >>>
>
> If you plan to create your own dbpedia index, then the mapping.txt
> file is the way to configure which properties are
> included/excluded.
> Typically dbprop values are low quality. They are just naive 1:1
> mappings of key value pairs as found in the info boxes. Because of
> this they are excluded from the indexes.
>
> At runtime the returned data depends on the Cache strategy used.
>
> Currently there are three possibilities (configured with the
> referenced Site):
> 1) no cache: both queries and retrieval use the remote service.
> 2) used: Queries are executed by the remote service. Retrieved
> Entities are stored locally. The cached data depends on the mappings
> defined for the cache.
> 3) all: Both queries and retrieval are based on the cache. The remote
> service is only used as a fallback in case the cache is not
> available (e.g. if you deactivate solrYard).
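The three strategies can be summarised as a small decision table. The sketch below only illustrates the description above; it is not Stanbol code. The names `CacheStrategy` and `lookup_plan` are invented. The "cache first, then remote" order for "used" is a reading of the p.s. further down this thread (first request answered from DBpedia, later ones from the cache). What "used" does when the cache is unavailable is not described here, so the sketch does not model it.

```python
from enum import Enum


class CacheStrategy(Enum):
    NO_CACHE = "no cache"
    USED = "used"
    ALL = "all"


def lookup_plan(strategy, cache_available=True):
    """Return which sources serve queries and retrievals, in the order tried.

    cache_available only affects the "all" strategy in this sketch.
    """
    remote_only = {"query": ["remote"], "retrieve": ["remote"], "store_retrieved": False}
    if strategy is CacheStrategy.NO_CACHE:
        return remote_only
    if strategy is CacheStrategy.USED:
        # Queries go remote; retrieved entities are kept in the local cache.
        return {"query": ["remote"], "retrieve": ["cache", "remote"], "store_retrieved": True}
    if strategy is CacheStrategy.ALL:
        if cache_available:
            return {"query": ["cache"], "retrieve": ["cache"], "store_retrieved": False}
        # Fallback to the remote service, e.g. when solrYard is deactivated.
        return remote_only
    raise ValueError(f"unknown strategy: {strategy!r}")


for s in CacheStrategy:
    print(s.value, lookup_plan(s))
```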
>
> So if you are fine with (2), then you could use the configuration
> previously used by the stable launcher [1].
> I think the easiest way to install it is to add the
> Felix File Installer [2] to the Stanbol environment. You will need to
> delete the current referencedSite for dbpedia first and then add the
> three configuration files as described in [1].
>
> If your requirements are not covered by the currently available options,
> it would be nice if you could write a short user story, because I am
> thinking about how to improve this feature and input like that would
> be really valuable.
>
> best
> Rupert Westenthaler
>
> [1] The dbpedia config consists of three files: the referenced site,
> cache and solrYard components with the "-dbpedia" endings.
>
> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>
> [2] http://felix.apache.org/site/apache-felix-file-install.html
>
> p.s. I keep this part because it describes very well how the cache
> strategy "used" works:
> >>>>> Hi David
> >>>>>
> >>>>> Assuming that you are using the default distribution of Apache
> >>>>> Stanbol.
> >>>>>
> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
> >>>>> - answered by retrieving the Entity from DBpedia.org only the
> >>>>> first time
> >>>>> - the information is cached in a local cache. In doing so, values
> >>>>> of the documents are filtered (see (a) for details)
> >>>>> - the cached version is returned
> >>>>>
> >>>>> (a) The default configuration for dbpedia stores all fields but
> >>>>> filters literal values so that only values with the language
> >>>>> "en, de, fr, it, es" or no language are stored.
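The language filter in (a) amounts to the following. This is a minimal sketch of the rule as described, not the actual Stanbol mapping code. The function name and the sample labels are made up for illustration, and literals are modelled as plain (text, language) pairs, with None meaning "no language".

```python
# Languages named in the default dbpedia configuration described above.
ALLOWED_LANGUAGES = {"en", "de", "fr", "it", "es"}


def filter_literals(values, allowed=ALLOWED_LANGUAGES):
    """Keep (text, lang) pairs whose language is allowed or absent (None)."""
    return [(text, lang) for text, lang in values if lang is None or lang in allowed]


# Illustrative data only, not taken from DBpedia.
labels = [("Valentino Rossi", "en"), ("Валентино Росси", "ru"), ("46", None)]
print(filter_literals(labels))
```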
> >>>>>
> >>>>>
> >>>>> Assuming that you started from zero when updating to a new
> >>>>> version, this also means that you have downloaded a new version
> >>>>> of this Entity from DBpedia.
> >>>>>
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
>
--
David Riccitelli
Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma
T +39 06 58318 301
F +39 06 58318 303