Thanks Rupert,

A description on how to do this is available in [1].


I can't see the [1] :-)

David

On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
[email protected]> wrote:

> Hi
>
> Yes, this is possible, but it would take (depending on the hardware) quite
> some time.
> A description on how to do this is available in [1].
>
> Instead of installing the dbpedia.solrindex.zip file as described in
> the readme, you could directly
>
> * shut down Stanbol
> * delete the "dbpedia_43k" index in
> "{stanbol-root}/sling/entityhub/solrYard/indexes"
> * copy the index located in the
> "{indexing-root}/indexing/destination/indexes" to
> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
> "dbpedia_43k"
> * restart stanbol.
>
> After that Stanbol should use the new index.
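
The manual steps above could be scripted roughly as follows. This is only a sketch: swap_index and its parameters are hypothetical names, the directory layout is taken from the paths quoted above, and the name of the freshly built index directory (built_name) is an assumption that has to be checked against the indexer's destination folder. Stanbol must be shut down before running it and restarted afterwards.

```python
import shutil
from pathlib import Path


def swap_index(stanbol_root, indexing_root, built_name, target_name="dbpedia_43k"):
    """Replace the index Stanbol uses with a locally built one.

    Run this only while Stanbol is shut down, and restart Stanbol afterwards.
    built_name is the directory name produced by the indexer (an assumption;
    look into the destination folder for the actual name).
    """
    yard_indexes = Path(stanbol_root) / "sling" / "entityhub" / "solrYard" / "indexes"
    built = Path(indexing_root) / "indexing" / "destination" / "indexes" / built_name
    target = yard_indexes / target_name

    if not built.is_dir():
        raise FileNotFoundError(f"built index not found: {built}")

    # Delete the old index directory, if present.
    if target.exists():
        shutil.rmtree(target)

    # Copy the new index into place under the old name.
    shutil.copytree(built, target)
    return target
```

Keeping the old name means the "Solr Index/Core" value in the SolrYard configuration does not have to change.
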
>
> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
> then changing the value of "Solr Index/Core" in the configuration of
> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
> work.
>
> best
> Rupert
>
> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
> <[email protected]> wrote:
> > Hi,
> >
> > As another workaround, I was thinking that I could actually generate
> > the DBpedia index locally with all the data using the dumps
> > (http://wiki.dbpedia.org/Downloads36), in a way similar to the
> > dbpedia_43k.
> >
> > What do you think?
> >
> > Thanks,
> > David
> >
> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
> > [email protected]> wrote:
> >
> >> Hi
> >>
> >> I will try to find some time in the evening to reproduce this.
> >>
> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
> >> <[email protected]> wrote:
> >> > Thanks Rupert,
> >> >
> >> > I'm trying to follow your instructions but I encounter a couple of
> >> > issues (probably due to inexperience):
> >> >  [1] when dropping the config files, they enter some loop of
> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
> >> > bundle), is that normal?
> >>
> >> This is very strange and should not be caused by the FileInstaller.
> >> Maybe there is some loop between the Sling Installer - trying to
> >> install the default configuration and the FileInstaller that may cause
> >> this under some circumstances.
> >>
> >> >  [2] after I restart Stanbol, and try to query an entity from the
> >> > entityhub I receive the following error:
> >> >
> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with
> >> > Yard dbpediaCache! This is usually caused by Errors while reading the
> >> > Cache Configuration from the Yard.) java.lang.IllegalStateException:
> >> > Unable to initialize the Cache with Yard dbpediaCache! This is usually
> >> > caused by Errors while reading the Cache Configuration from the Yard.
> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >> >
> >> >
> >> > Do I need to initialize the Cache in some way?
> >> >
> >> No, it does not need to be initialized. Prepared indexes include a
> >> document that provides a list of the indexed fields. In the future this
> >> may be used to determine whether a query can be successfully executed on
> >> the local index or not. In addition, this is used in case an Entity
> >> within the index is updated with a newer version.
> >> However, this configuration is optional and not required. This
> >> exception should only appear if the document is present but illegally
> >> formatted. However, the SolrYard initialized for the dbpediaCache
> >> should be empty.
> >>
> >> Therefore I think it is somehow related to the above problem of
> >> overriding configurations.
> >>
> >> In general, the way the default configuration is loaded is
> >> sub-optimal at the moment. In particular, using a single defaultdata
> >> bundle for both the OpenNLP models and the dbpedia configuration +
> >> default index was not a good idea, because one cannot exclude/change
> >> the dbpedia stuff without affecting other components that depend on
> >> OpenNLP.
> >> Therefore I think we need to discuss how to better structure the
> >> configurations and data needed to run Stanbol.
> >>
> >> There is also another issue: the SolrYard copies provided indexes
> >> only once and does not check for updates. This would make it hard
> >> to upgrade from the small index provided with the default data
> >> to a bigger version.
> >>
> >> Both of these things are related to the problems and need to be addressed
> >> before the first Stanbol release. Independently of those, I will try to
> >> find a simple solution for what you intend to do.
> >>
> >> In the meantime I suggest you go for the initially proposed workaround.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> > Thanks for your help,
> >> >
> >> > David
> >> >
> >> >
> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
> >> > [email protected]> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> >> >> <[email protected]> wrote:
> >> >> > I solved it in the same way, but lost the caching capabilities.
> >> >> > Is there any possibility to keep both all the data and the cache?
> >> >> >
> >> >> > Andrea
> >> >> >
> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >> >> >
> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> David
> >> >> >>
> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
> >> >> >> [email protected]> wrote:
> >> >> >>
> >> >> >>> Hi Rupert,
> >> >> >>>
> >> >> >>> I recently updated the Stanbol install, and I found that the RDF
> >> >> >>> returned by the EntityHub is missing some props (specifically the
> >> >> >>> dbprop as far as I can see).
> >> >> >>>
> >> >> >>> This is the command that I use for testing:
> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >> >> >>>
> >> >> >>> which outputs the attached RDF file.
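
For scripted checks, the same request as the curl command above could be built in Python roughly like this. It is a sketch only: build_entity_request and fetch_entity_rdf are hypothetical helper names, and unlike the curl call the entity URI is percent-encoded inside the "id" parameter, which a server normally decodes back to the same value.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint taken from the curl command quoted above.
ENTITYHUB = "http://localhost:8080/entityhub/site/dbpedia/entity"


def build_entity_request(entity_uri, base=ENTITYHUB):
    """Build the GET request for one entity, asking for RDF/XML."""
    # urlencode percent-encodes the entity URI inside the "id" parameter.
    url = f"{base}?{urlencode({'id': entity_uri})}"
    return Request(url, headers={"Accept": "application/rdf+xml"})


def fetch_entity_rdf(entity_uri):
    """Return the response body as text; needs a running Stanbol instance."""
    with urlopen(build_entity_request(entity_uri)) as response:
        return response.read().decode("utf-8")
```

Searching the returned text for the dbprop namespace is then a quick way to see whether those properties are present.
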
> >> >> >>>
> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked
> >> >> >>> with the SPARQL end-point at DBpedia, but I wasn't able to fix it.
> >> >> >>>
> >> >> >>> Does this depend on the mapping.txt file?
> >> >> >>>
> >> >>
> >> >> If you plan to create your own dbpedia index, then the mapping.txt
> >> >> file would be the way to configure which properties are
> >> >> included/excluded.
> >> >> Typically dbprop values are low quality. They are just naive 1:1
> >> >> mappings of key value pairs as found in the info boxes. Because of
> >> >> this they are excluded from the indexes.
> >> >>
> >> >> At runtime the returned data depends on the Cache strategy used:
> >> >>
> >> >> Currently there are three possibilities (configured with the
> >> >> referenced Site)
> >> >> 1) no cache: both queries and retrieval use a remote service
> >> >> 2) used: Queries are executed by the remote service. Retrieved
> >> >> Entities are stored locally. The cached data depends on the mappings
> >> >> defined for the cache.
> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
> >> >> service is only used as fallback in the case that the cache is not
> >> >> available (e.g. if you deactivate solrYard).
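
To make the three strategies easier to compare, here is a small model of where requests end up. It is not Stanbol code: CacheStrategy and route are hypothetical names, and the behaviour of "used" when the cache is unavailable is an assumption, since the description above only states the fallback for "all".

```python
from enum import Enum


class CacheStrategy(Enum):
    NONE = "none"  # 1) no cache
    USED = "used"  # 2) cache the entities that were retrieved
    ALL = "all"    # 3) serve everything from the cache


def route(strategy, cache_available=True):
    """Return which backend serves queries and entity retrieval."""
    if strategy is CacheStrategy.ALL and cache_available:
        return {"query": "cache", "retrieve": "cache"}
    if strategy is CacheStrategy.USED and cache_available:
        # Queries go to the remote service; an entity is fetched remotely
        # the first time and served from the local cache afterwards.
        return {"query": "remote", "retrieve": "cache, remote on a miss"}
    # No cache configured, or the cache is unavailable (assumed for "used").
    return {"query": "remote", "retrieve": "remote"}
```

For example, route(CacheStrategy.ALL, cache_available=False) models deactivating the solrYard: both queries and retrieval fall back to the remote service.
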
> >> >>
> >> >> So if you are fine with (2) then you could use the configuration
> >> >> as previously used by the stable launcher [1].
> >> >> I think the easiest way to install this is to add the
> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
> >> >> delete the current referencedSite for dbpedia first and then add the
> >> >> three configuration files as described by [1].
> >> >>
> >> >> If your requirements are not covered by the currently available
> >> >> options,
> >> >> it would be nice if you could write a short user story, because I am
> >> >> thinking about how to improve this feature and input like that would
> >> >> be really valuable.
> >> >>
> >> >> best
> >> >> Rupert Westenthaler
> >> >>
> >> >> [1] The dbpedia config consists of three files: the referenced site,
> >> >> cache and solryard components with the "-dbpedia" endings.
> >> >>
> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
> >> >>
> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
> >> >>
> >> >> p.s. I keep this part because it describes very well how the cache
> >> >> strategy "used" works:
> >> >> >>>>> Hi David
> >> >> >>>>>
> >> >> >>>>> Assuming that you are using the default distribution of Apache
> >> >> >>>>> Stanbol.
> >> >> >>>>>
> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
> >> >> >>>>> - answered only the first time by retrieving the Entity from
> >> >> >>>>> DBpedia.org
> >> >> >>>>> - the information is cached in a local cache. In doing so, values
> >> >> >>>>> of the documents are filtered (see (a) for details)
> >> >> >>>>> - the cached version is returned
> >> >> >>>>>
> >> >> >>>>> (a) The default configuration for dbpedia stores all fields but
> >> >> >>>>> filters values for literals so that only values with the language
> >> >> >>>>> "en, de, fr, it, es" or no language are stored.
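
The language filter described in (a) amounts to something like the following. This is an illustration, not the actual cache mapping syntax: filter_literals is a hypothetical name, and literals are modelled as (text, language) pairs with None standing for "no language".

```python
# Languages named in (a) above.
ALLOWED_LANGUAGES = {"en", "de", "fr", "it", "es"}


def filter_literals(literals, allowed=ALLOWED_LANGUAGES):
    """Keep (text, lang) pairs whose language is allowed or missing (None)."""
    return [(text, lang) for text, lang in literals
            if lang is None or lang in allowed]
```
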
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> Assuming that you have started from zero when updating to a new
> >> >> >>>>> version, this also means that you have downloaded a new version of
> >> >> >>>>> this Entity from dbPedia.
> >> >> >>>>>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler             [email protected]
> >> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > David Riccitelli
> >> >
> >> > Interact SpA
> >> > Via A. Bargoni 78 (scala F)
> >> > 00153 Roma
> >> >
> >> > T +39 06 58318 301
> >> > F +39 06 58318 303
> >> >
> >>
> >>
> >>
> >> --
> >> | Rupert Westenthaler             [email protected]
> >> | Bodenlehenstraße 11                             ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
> >
> >
> >
> > --
> > David Riccitelli
> >
> > Interact SpA
> > Via A. Bargoni 78 (scala F)
> > 00153 Roma
> >
> > T +39 06 58318 301
> > F +39 06 58318 303
> >
>
>
>
> --
> | Rupert Westenthaler             [email protected]
> | Bodenlehenstraße 11                             ++43-699-11108907
> | A-5500 Bischofshofen
>



-- 
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
