Hi Rupert,

see comments inline
Hi Alex,

sorry for the late response ...

See my comments inline
On 07.11.2011, at 14:50, Alex Lopez wrote:

Hi Rupert,

I had to 'svn update' and regenerate the indexer to see the full stack 
trace of the exception. The problem turned out to be an encoding issue in 
the stopwords_PT.txt file I'm using to process Portuguese text. It's fixed 
now :)

Did you download the file from the Snowball Stemmer web page 
(http://snowball.tartarus.org/)? I ask because I had exactly the same 
problem with the German stop words.
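
For anyone hitting the same encoding problem, here is a minimal shell sketch of the conversion. It assumes the downloaded list is ISO-8859-1 and that Solr should read it as UTF-8; neither is confirmed in this thread, so check the real encoding first (e.g. with `file`). The function name to_utf8 is hypothetical.

```shell
# Hypothetical helper: print a stop-word file re-encoded as UTF-8.
# $1 = input file, $2 = source encoding (assumed ISO-8859-1 if omitted).
to_utf8() {
    iconv -f "${2:-ISO-8859-1}" -t UTF-8 "$1"
}
```

Calling `to_utf8 stopwords_PT.txt > stopwords_PT.utf8.txt` would write a converted copy and leave the original file untouched.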

If you agree to contribute your Solr configurations for Portuguese to Stanbol, 
I could add them to the defaults used by the SolrYard. Currently the defaults 
only include special configurations for English and German. Adding more would 
be a great thing to do.

Yes, I downloaded the stop-word list from the Snowball Stemmer page and compared it to other sources. I decided to trim the list down a bit because the English list I saw had only ~30 words and I didn't want a big difference. However, I now see that the German list in SVN has 200+ stop words (the revision I was working with still had only the English one), so I might change things again and go with the full Portuguese list from the Snowball Stemmer web page, which also has 200+ words.

I still have to test the results of my custom index more thoroughly (I'll post some impressions to the mailing list in a separate mail). Once I've done that, I'll of course post the configurations and stop-word list for Portuguese if you want to incorporate them into the defaults.
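
To compare list sizes like the ~30 versus 200+ mentioned above, a rough count can be scripted. This sketch assumes the Snowball list format, where '|' starts a comment and a line may hold several words; count_stopwords is a made-up name.

```shell
# Hypothetical helper: count the stop words in a Snowball-format list ($1).
count_stopwords() {
    # Cut each line at the first '|', then count the remaining words.
    sed 's/|.*//' "$1" | awk '{ n += NF } END { print n + 0 }'
}
```

Running it on both the trimmed and the full list gives a quick idea of how much was removed.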


Now I'm getting occasional WARNs about some XMLSchema#date not in lexical form 
but otherwise it's indexing alright.

This is typically caused by invalid dates in DBpedia, e.g. there is no 31st 
February and also no 31st November …

In such cases the values are still added to the index (as String values). So 
you might not find such entities if you search for date values, but you can 
still retrieve the values as Strings.
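
The kind of check that rejects such values can be sketched in shell. This only illustrates calendar validation for the plain YYYY-MM-DD form and is not the code Stanbol uses; time zones and other xsd:date variants are ignored, and is_valid_date is a hypothetical name.

```shell
# Hypothetical helper: succeed only if $1 is a real calendar date
# written as YYYY-MM-DD.
is_valid_date() {
    printf '%s\n' "$1" | awk -F- '
        # Reject anything that is not exactly four, two and two digits.
        !/^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]$/ { exit 1 }
        {
            y = $1 + 0; m = $2 + 0; d = $3 + 0
            split("31 28 31 30 31 30 31 31 30 31 30 31", days, " ")
            # Gregorian leap-year rule gives February a 29th day.
            if ((y % 4 == 0 && y % 100 != 0) || y % 400 == 0) days[2] = 29
            if (m < 1 || m > 12 || d < 1 || d > days[m]) exit 1
        }'
}
```

With this, `is_valid_date 2011-02-31` fails while `is_valid_date 2012-02-29` succeeds, which mirrors why some DBpedia dates end up indexed as plain strings.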

best
Rupert


Thanks!
Alex

On 04-11-2011 12:24, Rupert Westenthaler wrote:
Hi Alex,

the YardException occurs if the SolrIndex used to store the indexed
data encounters an error. Normally the exception should be reported,
but a small bug prevented that from happening. I corrected this with
revision 1197527 [1]. Now you should see the stack trace of the
YardException in the log.

To use the newest version you need to update the indexing jar by running:

     # assuming you are in the stanbol root folder
     cd entityhub/indexing
     mvn clean install
     cd dbpedia
     mvn assembly:single
     cd target
     cp org.apache.stanbol.entityhub.indexing.dbpedia-0.9.0-incubating-SNAPSHOT-jar-with-dependencies.jar {your-indexing-folder}


Independently of that, you could check whether deleting the

     {your-indexing-folder}/indexing/destination
     {your-indexing-folder}/indexing/dist

folders solves this issue.

If you have not changed the Solr configuration manually, you could also
try deleting the

         {your-indexing-folder}/indexing/config/dbpedia

folder. This will reinitialize the Solr configuration with the
defaults included in the indexing jar.
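
The two clean-up steps can be wrapped in a small shell sketch. The function name and the --reset-solr-config flag are made up for illustration; the first argument stands for your indexing folder. The Solr configuration is removed only when the flag is given, because that step is safe only if the configuration was not edited by hand.

```shell
# Hypothetical helper: remove generated index data below the indexing folder.
# $1 = indexing folder, $2 = optional --reset-solr-config
reset_indexing_output() {
    dir="$1"
    if [ ! -d "$dir/indexing" ]; then
        echo "no indexing folder found in $dir" >&2
        return 1
    fi
    # Output of earlier indexing runs.
    rm -rf "$dir/indexing/destination" "$dir/indexing/dist"
    # Optional: drop the Solr configuration too, so the next run starts
    # again from the defaults in the indexing jar.
    if [ "${2:-}" = "--reset-solr-config" ]; then
        rm -rf "$dir/indexing/config/dbpedia"
    fi
}
```

Since `rm -rf` is not reversible, it may be worth copying the folders elsewhere before running something like this.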

best
Rupert

[1] http://svn.apache.org/viewvc?rev=1197527&view=rev

On Fri, Nov 4, 2011 at 12:29 PM, Alex Lopez wrote:
Hi!

After filling up the TDB store and calculating the incoming_links scores, I
restarted the indexing and I'm getting:

...
11:13:02,916 [Thread-2] INFO  solryard.SolrYardIndexingDestination - ...
create SolrYard
11:13:03,119 [main] INFO  impl.IndexerImpl - Initialisation completed
11:13:03,119 [main] INFO  impl.IndexerImpl - Start Indexing
11:13:03,119 [main] INFO  impl.IndexerImpl - Indexing started ...
11:14:11,981 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/France: Unable to store
Entity http://dbpedia.org/resource/France to Yard dbpediaIndex because of an
YardException
11:14:11,981 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Category:Living_people:
Unable to store Entity http://dbpedia.org/resource/Category:Living_people to
Yard dbpediaIndex because of an YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Germany: Unable to store
Entity http://dbpedia.org/resource/Germany to Yard dbpediaIndex because of
an YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Animal: Unable to store
Entity http://dbpedia.org/resource/Animal to Yard dbpediaIndex because of an
YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/Canada: Unable to store
Entity http://dbpedia.org/resource/Canada to Yard dbpediaIndex because of an
YardException
11:14:11,982 [Indexer: Entity Error Logging Daemon] ERROR impl.IndexerImpl -
Error while indexing http://dbpedia.org/resource/United_Kingdom: Unable to
store Entity http://dbpedia.org/resource/United_Kingdom to Yard dbpediaIndex
because of an YardException
...

And many more, one after another, as if nothing was actually being indexed.
Is there a way to get better error reporting to find the cause of the
problem, or does any of you happen to know why this could be happening?

Best,
Alex




