Hi David, all
With the changes from yesterday (revisions r1148947 and r1148948) it is
now easily possible to deactivate the default configuration for
DBpedia provided by the Stanbol launcher and to replace it with the
one that uses the remote services with a local cache.
Steps:
1. use the current launcher
2. go to the Bundle tab of the Apache Felix Webconsole
3. stop the Bundle "Apache Stanbol Data: DBpedia.org defaultdata
version (org.apache.stanbol.data.sites.dbpedia.default)"
4. install and start the Bundle "Apache Stanbol Data: Remote
DBpedia.org with local cache
(org.apache.stanbol.data.sites.dbpedia.cached)". You can find this
bundle in "{stanbol-trunk}/data/sites/dbpediacached".
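For those who prefer scripting over clicking, the stop in step 3 could also be sent to the Webconsole over HTTP. This is only a minimal sketch: it assumes the console is reachable at http://localhost:8080/system/console, that the default admin/admin credentials are still in place, and that the console accepts a POST with "action=stop" addressed by the bundle's symbolic name. The helper name bundle_action_request is made up for this example. The script only builds the request; sending it is left as a comment.

```python
import base64
import urllib.parse
import urllib.request

# Assumption: location of the Felix Webconsole of a locally started launcher.
CONSOLE = "http://localhost:8080/system/console"


def bundle_action_request(symbolic_name, action, user="admin", password="admin"):
    """Build (but do not send) a POST request asking the Webconsole to
    perform `action` (e.g. "stop" or "start") on the given bundle.

    The credentials default to admin/admin, which is an assumption about
    an unmodified local setup.
    """
    url = f"{CONSOLE}/bundles/{urllib.parse.quote(symbolic_name)}"
    body = urllib.parse.urlencode({"action": action}).encode("ascii")
    request = urllib.request.Request(url, data=body, method="POST")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    request.add_header("Authorization", f"Basic {token}")
    return request


# Step 3: stop the bundle with the default DBpedia data.
stop_default = bundle_action_request(
    "org.apache.stanbol.data.sites.dbpedia.default", "stop")
print(stop_default.get_method(), stop_default.full_url)

# To actually send it against a running launcher:
# urllib.request.urlopen(stop_default)
```

Installing the cached bundle (step 4) needs a file upload and is easier to do in the Bundle tab itself.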
best
Rupert
On Mon, Jul 18, 2011 at 1:15 PM, Rupert Westenthaler
<[email protected]> wrote:
> Hi
>
> Rather than working on the Workaround I decided to invest some time in
> finishing STANBOL-140 and implementing STANBOL-287.
> Together with the proposal made in [1] to split up the default data in
> several bundles this should solve the issues described/discussed here.
>
> best
> Rupert
>
> [1] http://markmail.org/message/bf7qurmzos45h23b
>
> On Thu, Jul 14, 2011 at 8:34 AM, Rupert Westenthaler
> <[email protected]> wrote:
>> On Thu, Jul 14, 2011 at 8:30 AM, David Riccitelli
>> <[email protected]> wrote:
>>> Thanks Rupert,
>>>
>>> A description on how to do this is available in [1].
>>>
>>>
>>> I can't see the [1] :-)
>>
>> does this count as missing attachment? ^^
>>
>> [1]
>> http://svn.apache.org/repos/asf/incubator/stanbol/trunk/entityhub/yard/solr/src/main/resources/solr/core/
>>
>>>
>>> David
>>>
>>> On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <
>>> [email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> Yes this is possible, but would need (depending on the hardware) quite
>>>> some time.
>>>> A description on how to do this is available in [1].
>>>>
>>>> Instead of installing the dbpedia.solrindex.zip file as described in
>>>> the readme, you could directly
>>>>
>>>> * shutdown stanbol
>>>> * delete the "dbpedia_43k" index in
>>>> "{stanbol-root}/sling/entityhub/solrYard/indexes"
>>>> * copy the index located in the
>>>> "{indexing-root}/indexing/destination/indexes" to
>>>> "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>>>> "dbpedia_43k"
>>>> * restart stanbol.
>>>>
>>>> After that Stanbol should use the new index.
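The manual index swap listed above could be scripted roughly as follows. This is a sketch under assumptions: the function name swap_index is made up, and the name of the freshly built index directory ("dbpedia" here) is a guess that needs to be checked against what the indexer actually wrote to the destination folder. It must only be run while Stanbol is shut down.

```python
import shutil
from pathlib import Path


def swap_index(stanbol_root, indexing_root,
               built_name="dbpedia", target_name="dbpedia_43k"):
    """Replace the installed SolrYard index with a locally built one.

    built_name is an assumption about the directory name produced by the
    indexer; target_name is the index name used by the default data.
    """
    target_dir = Path(stanbol_root, "sling", "entityhub", "solrYard", "indexes")
    source = Path(indexing_root, "indexing", "destination", "indexes", built_name)
    if not source.is_dir():
        raise FileNotFoundError(f"no built index found at {source}")

    target = target_dir / target_name
    if target.exists():
        # Delete the small default index first.
        shutil.rmtree(target)
    # Copy the new index and give it the name Stanbol expects.
    shutil.copytree(source, target)
    return target


# Usage (paths are placeholders):
# swap_index("/path/to/stanbol-root", "/path/to/indexing-root")
```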
>>>>
>>>> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
>>>> then changing the value of "Solr Index/Core" in the configuration of
>>>> the SolrYard for DBpedia from "dbpedia_43k" to "dbpedia" should also
>>>> work.
>>>>
>>>> best
>>>> Rupert
>>>>
>>>> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
>>>> <[email protected]> wrote:
>>>> > Hi,
>>>> >
>>>> > As another workaround, I was thinking that I could actually generate locally
>>>> > the DBpedia index with all the data using the dumps (
>>>> > http://wiki.dbpedia.org/Downloads36), in a way similar to the dbpedia_43k.
>>>> >
>>>> > What do you think?
>>>> >
>>>> > Thanks,
>>>> > David
>>>> >
>>>> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler <
>>>> > [email protected]> wrote:
>>>> >
>>>> >> Hi
>>>> >>
>>>> >> I will try to find some time in the evening to reproduce this.
>>>> >>
>>>> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
>>>> >> <[email protected]> wrote:
>>>> >> > Thanks Rupert,
>>>> >> >
>>>> >> > I'm trying to follow your instructions but I encounter a couple of issues
>>>> >> > (probably due to inexperience):
>>>> >> > [1] when dropping the config files, they enter some loop of
>>>> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
>>>> >> > bundle), is that normal?
>>>> >>
>>>> >> This is very strange and should not be caused by the FileInstaller.
>>>> >> Maybe there is some loop between the Sling Installer - trying to
>>>> >> install the default configuration and the FileInstaller that may cause
>>>> >> this under some circumstances.
>>>> >>
>>>> >> > [2] after I restart Stanbol, and try to query an entity from the entityhub
>>>> >> > I receive the following error:
>>>> >> >
>>>> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
>>>> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
>>>> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
>>>> >> > dbpediaCache! This is usually caused by Errors while reading the Cache
>>>> >> > Configuration from the Yard.) java.lang.IllegalStateException: Unable to
>>>> >> > initialize the Cache with Yard dbpediaCache! This is usually caused by
>>>> >> > Errors while reading the Cache Configuration from the Yard.
>>>> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
>>>> >> >
>>>> >> >
>>>> >> > Do I need to initialize the Cache in some way?
>>>> >> >
>>>> >> No, you do not. Prepared indexes include a document that
>>>> >> provides a list of the indexed fields. In the future this may be used to
>>>> >> determine whether a query can be successfully executed on the local index
>>>> >> or not. In addition, it is used when an Entity within the index is
>>>> >> updated with a newer version.
>>>> >> However, this configuration is optional. This
>>>> >> Exception should only appear if the document is present but illegally
>>>> >> formatted. However, the SolrYard initialized for the dbpediaCache
>>>> >> should be empty.
>>>> >>
>>>> >> Therefore I think it is somehow related to the above problem of
>>>> >> overriding configurations.
>>>> >>
>>>> >> In general, the way the default configuration is loaded is
>>>> >> sub-optimal at the moment. In particular, using a single defaultdata
>>>> >> bundle for both the OpenNLP models and the dbpedia configuration +
>>>> >> default index was not a good idea, because one cannot exclude/change
>>>> >> the dbpedia stuff without affecting other components that depend on
>>>> >> OpenNLP.
>>>> >> Therefore I think we need to discuss how to better structure the
>>>> >> configurations and data needed to run stanbol.
>>>> >>
>>>> >> There is also another issue: the SolrYard copies provided indexes
>>>> >> only once and does not check for updates. This would make it
>>>> >> hard to upgrade from the small index provided with the default data
>>>> >> to a bigger version.
>>>> >>
>>>> >> Both of these things are related to the problems and need to be addressed
>>>> >> before the first Stanbol release. Independent of those I will try to
>>>> >> find a simple solution for what you intend to do.
>>>> >>
>>>> >> In the meantime I suggest you go for the initially proposed workaround.
>>>> >>
>>>> >> best
>>>> >> Rupert Westenthaler
>>>> >>
>>>> >> > Thanks for your help,
>>>> >> >
>>>> >> > David
>>>> >> >
>>>> >> >
>>>> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler <
>>>> >> > [email protected]> wrote:
>>>> >> >
>>>> >> >> Hi
>>>> >> >>
>>>> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
>>>> >> >> <[email protected]> wrote:
>>>> >> >> > I solved it in the same way, but losing the caching capabilities.
>>>> >> >> > Is there any possibility to keep both all the data and the cache?
>>>> >> >> >
>>>> >> >> > Andrea
>>>> >> >> >
>>>> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
>>>> >> >> >
>>>> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved it for me.
>>>> >> >> >>
>>>> >> >> >> Thanks,
>>>> >> >> >> David
>>>> >> >> >>
>>>> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli <
>>>> >> >> >> [email protected]> wrote:
>>>> >> >> >>
>>>> >> >> >>> Hi Rupert,
>>>> >> >> >>>
>>>> >> >> >>> I recently updated the Stanbol install, and I found that the RDF returned
>>>> >> >> >>> by the EntityHub is missing some props (specifically the dbprop as far as I
>>>> >> >> >>> can see).
>>>> >> >> >>>
>>>> >> >> >>> This is the command that I use for testing:
>>>> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
>>>> >> >> >>>
>>>> >> >> >>> which outputs the attached RDF file.
>>>> >> >> >>>
>>>> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked with the
>>>> >> >> >>> SPARQL end-point at DBpedia, but I wasn't able to fix it.
>>>> >> >> >>>
>>>> >> >> >>> Does this depend on the mapping.txt file?
>>>> >> >> >>>
>>>> >> >>
>>>> >> >> If you plan to create your own dbpedia index, then the mapping.txt
>>>> >> >> file would be the way to configure which properties are
>>>> >> >> included/excluded.
>>>> >> >> Typically dbprop values are low quality. They are just naive 1:1
>>>> >> >> mappings of key value pairs as found in the info boxes. Because of
>>>> >> >> this they are excluded from the indexes.
>>>> >> >>
>>>> >> >> At runtime the returned data depends on the Cache strategy used.
>>>> >> >>
>>>> >> >> Currently there are three possibilities (configured with the referenced
>>>> >> >> Site):
>>>> >> >> 1) no cache: both queries and retrieval use a remote service.
>>>> >> >> 2) used: Queries are executed by the remote service. Retrieved
>>>> >> >> Entities are stored locally. The cached data depends on the mappings
>>>> >> >> defined for the cache.
>>>> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
>>>> >> >> service is only used as a fallback in case the cache is not
>>>> >> >> available (e.g. if you deactivate the solrYard).
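The three strategies listed above can be pictured with a small model of entity retrieval. This is not Entityhub code, just a sketch under assumptions: the names CacheStrategy and retrieve are made up, a plain dict stands in for the local yard, and a callable stands in for the remote service. Treating a cache miss under "all" as "not found" and falling back to the remote service whenever the cache is unavailable are my reading of the description, and the filtering done by the cache mappings is left out.

```python
from enum import Enum


class CacheStrategy(Enum):
    NONE = "none"   # 1) no cache
    USED = "used"   # 2) cache what has been retrieved
    ALL = "all"     # 3) cache is expected to hold everything


def retrieve(entity_id, strategy, cache, remote, cache_available=True):
    """Return the data for entity_id according to the chosen strategy.

    cache  -- dict standing in for the local yard (hypothetical)
    remote -- callable standing in for the remote service (hypothetical)
    """
    if strategy is CacheStrategy.NONE or not cache_available:
        # No cache configured, or the cache cannot be reached: go remote.
        return remote(entity_id)
    if entity_id in cache:
        return cache[entity_id]
    if strategy is CacheStrategy.ALL:
        # Assumption: with "all" a miss means the entity is not known.
        return None
    # "used": fetch once remotely, store locally, serve from cache later.
    data = remote(entity_id)
    cache[entity_id] = data
    return data
```

With "used", a second request for the same entity is answered from the dict without calling the remote callable again, which matches the behaviour described in the p.s. further down.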
>>>> >> >>
>>>> >> >> So if you are fine with (2) then you could use the configuration
>>>> >> >> as previously used by the stable launcher [1].
>>>> >> >> I think the easiest way to install this is to add the
>>>> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
>>>> >> >> delete the current referencedSite for dbpedia first and then add the
>>>> >> >> three configuration files as described by [1].
>>>> >> >>
>>>> >> >> If your requirements are not covered by the currently available options
>>>> >> >> it would be nice if you could write a short user story, because I am
>>>> >> >> thinking about how to improve this feature and input like that would
>>>> >> >> be really valuable.
>>>> >> >>
>>>> >> >> best
>>>> >> >> Rupert Westenthaler
>>>> >> >>
>>>> >> >> [1] The dbpedia config consists of three files: the referenced site,
>>>> >> >> cache and solryard components with the "-dbpedia" endings.
>>>> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
>>>> >> >>
>>>> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
>>>> >> >>
>>>> >> >> p.s. I keep this part because it describes very well how the cache
>>>> >> >> strategy "used" works:
>>>> >> >> >>>>> Hi David
>>>> >> >> >>>>>
>>>> >> >> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
>>>> >> >> >>>>>
>>>> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
>>>> >> >> >>>>> - only the first time answered by retrieving the Entity from DBpedia.org
>>>> >> >> >>>>> - the information is cached in a local cache. In doing so, values of the
>>>> >> >> >>>>> documents are filtered (see (a) for details)
>>>> >> >> >>>>> - the cached version is returned
>>>> >> >> >>>>>
>>>> >> >> >>>>> (a) The default configuration for dbpedia stores all fields, however it
>>>> >> >> >>>>> filters values for literals so that only values with the language "en,
>>>> >> >> >>>>> de, fr, it, es" or no language are stored.
>>>> >> >> >>>>>
>>>> >> >> >>>>>
>>>> >> >> >>>>> Assuming that you have started from zero when updating to a new version
>>>> >> >> >>>>> this also means that you have downloaded a new version of this Entity
>>>> >> >> >>>>> from DBpedia.
>>>> >> >> >>>>>
>>>> >> >>
>>>> >> >> --
>>>> >> >> | Rupert Westenthaler [email protected]
>>>> >> >> | Bodenlehenstraße 11 ++43-699-11108907
>>>> >> >> | A-5500 Bischofshofen
>>>> >> >>
>>>> >> >
>>>> >> >
>>>> >> >
>>>> >> > --
>>>> >> > David Riccitelli
>>>> >> >
>>>> >> > Interact SpA
>>>> >> > Via A. Bargoni 78 (scala F)
>>>> >> > 00153 Roma
>>>> >> >
>>>> >> > T +39 06 58318 301
>>>> >> > F +39 06 58318 303
>>>> >> >
--
| Rupert Westenthaler [email protected]
| Bodenlehenstraße 11 ++43-699-11108907
| A-5500 Bischofshofen