Hi Rupert,
see comments inline
Hi Alex,
sorry for the late response ...
See my comments inline
On 07.11.2011, at 14:50, Alex Lopez wrote:
Hi Rupert,
I had to 'svn update' and rebuild the indexer to see the full stack
trace of the exception. The problem turned out to be an encoding issue in
the stopwords_PT.txt file I'm using to process Portuguese text. It's fixed now
:)
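A minimal sketch of how such an encoding problem can be spotted before indexing, assuming the file is expected to be UTF-8 (the thread does not say which encoding was wrong). The function name find_non_utf8_lines is hypothetical and not part of Stanbol or Solr:

```python
def find_non_utf8_lines(path):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad_lines = []
    # Read in binary mode so that Python does not decode (and fail) implicitly.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                raw.decode("utf-8")
            except UnicodeDecodeError:
                # Typical for a stop-word list saved as ISO-8859-1.
                bad_lines.append((number, raw))
    return bad_lines
```

Calling find_non_utf8_lines("stopwords_PT.txt") returns an empty list when every line decodes cleanly; any entries point at the lines that need to be re-saved as UTF-8.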
Did you download the file from the Snowball Stemmer web page
(http://snowball.tartarus.org/)? I ask because I had exactly the same problem with
the German stop words.
If you agree to contribute your Solr configuration for Portuguese to Stanbol,
I could add it to the defaults used by the SolrYard. Currently the defaults
only include special configurations for English and German. Adding more would
be a great thing to do.
Yes, I downloaded the stop-word list from the Snowball Stemmer page,
compared it to other sources, and decided to trim the list down a bit,
because I saw that the English one has ~30 entries and I didn't want a big
difference. However, I now see that the German list in SVN (the
revision I was working with still had only the English one) has 200+
stop-words. So I might change things again and go with the full
Portuguese stop-word list from the Snowball Stemmer web page, which has 200+
words. I still have to test the results of my custom index more thoroughly (I'll
post some impressions about this to the mailing list in a separate
mail). Once I have done that, I'll of course post the configuration and
stop-word list for Portuguese if you want to incorporate them into the defaults.
Now I'm getting occasional WARNs about some XMLSchema#date values not being in
lexical form, but otherwise it's indexing all right.
This is typically caused by invalid dates in DBpedia, e.g. there is no 31st
February and no 31st November.
In such cases the values are still added to the index, but as String values. So
you might not find such entities if you search for date values, but you can
still retrieve the values as Strings.
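To illustrate why such values are rejected as dates, here is a minimal sketch of a calendar check for plain YYYY-MM-DD values. It is not the code Stanbol uses, it ignores the time zone suffix and the extended year forms that xsd:date also allows, and the name is_calendar_date is hypothetical:

```python
import re
from datetime import date

# Simplified pattern: four-digit year, two-digit month, two-digit day.
_PLAIN_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_calendar_date(lexical):
    """Return True if the string is YYYY-MM-DD and names a real calendar day."""
    match = _PLAIN_DATE.fullmatch(lexical)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        # date() raises ValueError for days that do not exist, e.g. 31 February.
        date(year, month, day)
    except ValueError:
        return False
    return True
```

With this check, is_calendar_date("2011-02-31") and is_calendar_date("2011-11-31") are both False, while is_calendar_date("2011-11-07") is True.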
best
Rupert
Thanks!
Alex
On 04-11-2011 12:24, Rupert Westenthaler wrote:
Hi Alex,
the YardException occurs if the SolrIndex used to store the indexed
data encounters an error. Normally the exception should be reported,
but a small bug prevented that from happening. I corrected this in
revision 1197527 [1]. Now you should see the stack trace of the
YardExceptions in the log.
To use the newest version you need to rebuild the indexing jar and copy it
to your indexing folder:
# assuming you are in the stanbol root folder
cd entityhub/indexing
mvn clean install
cd dbpedia
mvn assembly:single
cd target
cp org.apache.stanbol.entityhub.indexing.dbpedia-0.9.0-incubating-SNAPSHOT-jar-with-dependencies.jar {your-indexing-folder}
Independent of that, you could check whether deleting the
{your-indexing-folder}/indexing/destination
{your-indexing-folder}/indexing/dist
folders solves this issue.
If you have not changed the Solr configuration manually, you could also
try deleting the
{your-indexing-folder}/indexing/config/dbpedia
folder. This will reinitialize the Solr configuration with the
defaults included in the indexing jar.
best
Rupert
[1] http://svn.apache.org/viewvc?rev=1197527&view=rev
On Fri, Nov 4, 2011 at 12:29 PM, Alex Lopez wrote:
Hi!
After filling up the TDB store and calculating the incoming_links scores, I
restarted the indexing and I'm getting:
...
11:13:02,916 [Thread-2] INFO solryard.SolrYardIndexingDestination - ...
create SolrYard
11:13:03,119 [main] INFO impl.IndexerImpl - Initialisation completed
11:13:03,119 [main] INFO impl.IndexerImpl - Start Indexing
11:13:03,119 [main] INFO impl.IndexerImpl - Indexing started ...
11:14:11,981 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/France: Unable to store
Entity http://dbpedia.org/resource/France to Yard dbpediaIndex because of an
YardException
11:14:11,981 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Category:Living_people:
Unable to store Entity http://dbpedia.org/resource/Category:Living_people to
Yard dbpediaIndex because of an YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Germany: Unable to store
Entity http://dbpedia.org/resource/Germany to Yard dbpediaIndex because of
an YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Animal: Unable to store
Entity http://dbpedia.org/resource/Animal to Yard dbpediaIndex because of an
YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Canada: Unable to store
Entity http://dbpedia.org/resource/Canada to Yard dbpediaIndex because of an
YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/United_Kingdom: Unable to
store Entity http://dbpedia.org/resource/United_Kingdom to Yard dbpediaIndex
because of an YardException
...
And many more, one after another, as if nothing was actually being indexed. Is
there a way to get better error reporting to find the cause of the problem,
or does any of you happen to know why this could be happening?
Best,
Alex