Thanks Rupert,

> A description on how to do this is available in [1].

I can't see the [1] :-)

David

On Thu, Jul 14, 2011 at 8:56 AM, Rupert Westenthaler <[email protected]> wrote:

> Hi
>
> Yes this is possible, but would need (depending on the hardware) quite
> some time.
> A description on how to do this is available in [1].
>
> Instead of installing the dbpedia.solrindex.zip file as described in
> the readme, you could directly
>
> * shutdown stanbol
> * delete the "dbpedia_43k" index in
>   "{stanbol-root}/sling/entityhub/solrYard/indexes"
> * copy the index located in the
>   "{indexing-root}/indexing/destination/indexes" to
>   "{stanbol-root}/sling/entityhub/solrYard/indexes" and rename it to
>   "dbpedia_43k"
> * restart stanbol.
>
> After that Stanbol should use the new index.
>
> Copying the "dbpedia.solrindex.zip" to the datafiles directory and
> then changing the value of "Solr Index/Core" in the configuration of
> the SolrYard for dbPedia from "dbpedia_43k" to "dbpedia" should also
> work.
>
> best
> Rupert
>
> On Wed, Jul 13, 2011 at 11:58 AM, David Riccitelli
> <[email protected]> wrote:
> > Hi,
> >
> > As another workaround, I was thinking that I could actually generate locally
> > the DBpedia index with all the data using the dumps
> > (http://wiki.dbpedia.org/Downloads36), in a way similar to the dbpedia_43k.
> >
> > What do you think?
> >
> > Thanks,
> > David
> >
> > On Wed, Jul 13, 2011 at 12:11 PM, Rupert Westenthaler
> > <[email protected]> wrote:
> >
> >> Hi
> >>
> >> I will try to find some time in the evening to reproduce this.
> >>
> >> On Wed, Jul 13, 2011 at 8:57 AM, David Riccitelli
> >> <[email protected]> wrote:
> >> > Thanks Rupert,
> >> >
> >> > I'm trying to follow your instructions but I encounter a couple of issues
> >> > (probably due to inexperience):
> >> > [1] when dropping the config files, they enter some loop of
> >> > REGISTERING/UNREGISTERING (which I solve by stopping the FileInstall
> >> > bundle), is that normal?
> >>
> >> This is very strange and should not be caused by the FileInstaller.
> >> Maybe there is some loop between the Sling Installer (trying to
> >> install the default configuration) and the FileInstaller that may cause
> >> this under some circumstances.
> >>
> >> > [2] after I restart Stanbol, and try to query an entity from the entityhub
> >> > I receive the following error:
> >> >
> >> > 13.07.2011 09:54:17.939 *WARN* [509017110@qtp-1586831707-0]
> >> > org.apache.felix.http.jetty /entityhub/sites/entity/
> >> > (java.lang.IllegalStateException: Unable to initialize the Cache with Yard
> >> > dbpediaCache! This is usually caused by Errors while reading the Cache
> >> > Configuration from the Yard.) java.lang.IllegalStateException: Unable to
> >> > initialize the Cache with Yard dbpediaCache! This is usually caused by
> >> > Errors while reading the Cache Configuration from the Yard.
> >> > at org.apache.stanbol.entityhub.core.site.CacheImpl.getCacheYard(CacheImpl.java:214)
> >> >
> >> >
> >> > Do I need to initialize the Cache in some way?
> >> >
> >> No, you do not. Prepared indexes do include a document that
> >> provides a list of the indexed fields. In future this may be used to
> >> determine if a query can be successfully executed on the local index
> >> or not. In addition this is used in case an Entity within the index is
> >> updated with a newer version.
> >> However this configuration is optional and is not required. This
> >> Exception should only appear if the document is present but illegally
> >> formatted. However the SolrYard initialized for the dbpediaCache
> >> should be empty.
> >>
> >> Therefore I think it is somehow related to the above problem of
> >> overriding configurations.
> >>
> >> In general the way the default configuration is loaded is
> >> suboptimal at the moment.
> >> Especially using a single defaultdata
> >> bundle for both the OpenNLP models and the dbpedia configuration +
> >> default index was not a good idea, because one cannot exclude/change
> >> the dbpedia stuff without affecting other components that depend on
> >> OpenNLP.
> >> Therefore I think we need to discuss how to better structure the
> >> configurations and data needed to run stanbol.
> >>
> >> There is also another issue: the SolrYard copies provided indexes
> >> only once and does not check for updates. This would make it
> >> hard to upgrade from the small index provided with the default data
> >> to a bigger version.
> >>
> >> Both of these things are related to the problems and need to be addressed
> >> before the first stanbol release. Independent of those I will try to
> >> find a simple solution for what you intend to do.
> >>
> >> In the meantime I suggest you go for the initially proposed workaround.
> >>
> >> best
> >> Rupert Westenthaler
> >>
> >> > Thanks for your help,
> >> >
> >> > David
> >> >
> >> >
> >> > On Mon, Jul 11, 2011 at 11:42 PM, Rupert Westenthaler
> >> > <[email protected]> wrote:
> >> >
> >> >> Hi
> >> >>
> >> >> On Mon, Jul 11, 2011 at 8:17 PM, Andrea Giovanni Nuzzolese
> >> >> <[email protected]> wrote:
> >> >> > I solved it in the same way, but losing the caching capabilities.
> >> >> > Is there any possibility to keep both all the data and the cache?
> >> >> >
> >> >> > Andrea
> >> >> >
> >> >> > On Jul 11, 2011, at 4:08 PM, David Riccitelli wrote:
> >> >> >
> >> >> >> Ok, stopping the solrYard dbpedia_43k component solved for me.
> >> >> >>
> >> >> >> Thanks,
> >> >> >> David
> >> >> >>
> >> >> >> On Mon, Jul 11, 2011 at 4:13 PM, David Riccitelli
> >> >> >> <[email protected]> wrote:
> >> >> >>
> >> >> >>> Hi Rupert,
> >> >> >>>
> >> >> >>> I recently updated the Stanbol install, and I found that the RDF returned
> >> >> >>> by the EntityHub is missing some props (specifically the dbprop as far as I
> >> >> >>> can see).
> >> >> >>>
> >> >> >>> This is the command that I use for testing:
> >> >> >>> curl -H "accept: application/rdf+xml" "http://localhost:8080/entityhub/site/dbpedia/entity?id=http://dbpedia.org/resource/Valentino_Rossi"
> >> >> >>>
> >> >> >>> which outputs the attached RDF file.
> >> >> >>>
> >> >> >>> I cleared all of the sling folder (rm -fr sling) and checked with the
> >> >> >>> SPARQL end-point at DBpedia, but I wasn't able to fix it.
> >> >> >>>
> >> >> >>> Does this depend on the mapping.txt file?
> >> >> >>>
> >> >>
> >> >> If you plan to create your own dbpedia index, then the mapping.txt
> >> >> file would be the way to configure which properties are
> >> >> included/excluded.
> >> >> Typically dbprop values are low quality. They are just naive 1:1
> >> >> mappings of key value pairs as found in the info boxes. Because of
> >> >> this they are excluded from the indexes.
> >> >>
> >> >> At runtime the returned data depend on the used Cache strategy:
> >> >>
> >> >> Currently there are three possibilities (configured with the referenced
> >> >> Site)
> >> >> 1) no cache: both queries and retrieval use a remote service
> >> >> 2) used: Queries are executed by the remote service. Retrieved
> >> >> Entities are stored locally. The cached data depend on the mappings
> >> >> defined for the cache.
> >> >> 3) all: Both queries and retrieval are based on the cache. The remote
> >> >> service is only used as fallback in the case that the cache is not
> >> >> available (e.g.
> >> >> if you deactivate solrYard).
> >> >>
> >> >> So if you are fine with (2) then you could use the configuration
> >> >> as previously used by the stable launcher [1].
> >> >> I think the easiest way to install this is to add the
> >> >> Felix File Installer [2] to the Stanbol Environment. You will need to
> >> >> delete the current referencedSite for dbpedia first and then add the
> >> >> three configuration files as described by [1].
> >> >>
> >> >> If your requirements are not covered by the currently available options
> >> >> it would be nice if you could write a short user story, because I am
> >> >> thinking about how to improve this feature and input like that would
> >> >> be really valuable.
> >> >>
> >> >> best
> >> >> Rupert Westenthaler
> >> >>
> >> >> [1] The dbpedia config consists of three files: the referenced site,
> >> >> cache and solryard components with the "-dbpedia" endings.
> >> >>
> >> >> http://svn.apache.org/viewvc/incubator/stanbol/trunk/launchers/stable/src/main/resources/resources/config/?pathrev=1140181
> >> >>
> >> >> [2] http://felix.apache.org/site/apache-felix-file-install.html
> >> >>
> >> >> p.s. I keep this part because it describes very well how the cache
> >> >> strategy "used" works:
> >> >> >>>>> Hi David
> >> >> >>>>>
> >> >> >>>>> Assuming that you are using the default distribution of Apache Stanbol.
> >> >> >>>>>
> >> >> >>>>> Requests for http://dbpedia.org/resource/Valentino_Rossi will be
> >> >> >>>>> - only the first time answered by retrieving the Entity from DBpedia.org
> >> >> >>>>> - the Information are cached in a local cache.
> >> >> >>>>> By that, values of the
> >> >> >>>>> documents are filtered (see (a) for details)
> >> >> >>>>> - the cached version is returned
> >> >> >>>>>
> >> >> >>>>> (a) The default configuration for dbpedia stores all fields however
> >> >> >>>>> filters values for literals so that only values with the language "en,
> >> >> >>>>> de, fr, it, es" or no language are stored.
> >> >> >>>>>
> >> >> >>>>>
> >> >> >>>>> Assuming that you have started from zero when updating to a new version
> >> >> >>>>> this also means that you have downloaded a new version of this Entity
> >> >> >>>>> from dbPedia.
> >> >> >>>>>
> >> >>
> >> >> --
> >> >> | Rupert Westenthaler [email protected]
> >> >> | Bodenlehenstraße 11 ++43-699-11108907
> >> >> | A-5500 Bischofshofen
> >> >>
> >> >
> >> > --
> >> > David Riccitelli
> >> >
> >> > Interact SpA
> >> > Via A. Bargoni 78 (scala F)
> >> > 00153 Roma
> >> >
> >> > T +39 06 58318 301
> >> > F +39 06 58318 303
> >> >
> >>
> >> --
> >> | Rupert Westenthaler [email protected]
> >> | Bodenlehenstraße 11 ++43-699-11108907
> >> | A-5500 Bischofshofen
> >>
> >
> > --
> > David Riccitelli
> >
> > Interact SpA
> > Via A. Bargoni 78 (scala F)
> > 00153 Roma
> >
> > T +39 06 58318 301
> > F +39 06 58318 303
> >
>
> --
> | Rupert Westenthaler [email protected]
> | Bodenlehenstraße 11 ++43-699-11108907
> | A-5500 Bischofshofen
>

--
David Riccitelli

Interact SpA
Via A. Bargoni 78 (scala F)
00153 Roma

T +39 06 58318 301
F +39 06 58318 303
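
For reference, the manual index swap Rupert describes in his Jul 14 reply at the top of this thread (delete "dbpedia_43k", copy the locally built index, rename it) could be scripted roughly as below. This is only a sketch under assumptions: the function name swap_index and the source_name parameter are hypothetical and not part of Stanbol, the name of the directory produced by the indexing tool is not given in the thread and has to be supplied by the caller, and Stanbol must be shut down before running it and restarted afterwards.

```python
import shutil
from pathlib import Path


def swap_index(stanbol_root, indexing_root, source_name, target_name="dbpedia_43k"):
    """Replace a SolrYard index directory with a locally built one.

    Hypothetical helper, not a Stanbol API. Run it only while Stanbol is
    stopped, then restart Stanbol so it picks up the new index.
    """
    # Directory layout as quoted in the thread.
    yard_indexes = Path(stanbol_root) / "sling" / "entityhub" / "solrYard" / "indexes"
    built_index = (
        Path(indexing_root) / "indexing" / "destination" / "indexes" / source_name
    )

    # Refuse to delete anything unless the replacement index is really there.
    if not built_index.is_dir():
        raise FileNotFoundError(f"no built index at {built_index}")

    target = yard_indexes / target_name
    if target.exists():
        shutil.rmtree(target)  # delete the old "dbpedia_43k" index

    # Copy the new index and give it the name Stanbol is configured to use.
    shutil.copytree(built_index, target)
    return target
```

A call might look like swap_index("/opt/stanbol", "/opt/dbpedia-indexing", "dbpedia"), where both paths and the "dbpedia" directory name are placeholders for the local setup. The source is copied rather than moved so the freshly built index stays available if the swap has to be repeated.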
