[ 
https://issues.apache.org/jira/browse/STANBOL-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108798#comment-13108798
 ] 

Olivier Grisel commented on STANBOL-328:
----------------------------------------

Here is a sample error that gets ignored: the parse error is only logged, and 
the rest of the images_en.nt.bz2 file is skipped without being indexed.

17:24:25,985 [Thread-4] ERROR openjena.riot - [line: 1999931, col: 67] illegal escape sequence value: \ (0x5C)
17:24:25,986 [Thread-4] ERROR source.ResourceLoader - Unable to load resource /tmp/dbpedia-index/indexing/resources/rdfdata/images_en.nt.bz2!
org.openjena.riot.RiotException: [line: 1999931, col: 67] illegal escape sequence value: \ (0x5C)
        at org.openjena.riot.ErrorHandlerLib$ErrorHandlerStd.fatal(ErrorHandlerLib.java:97)
        at org.openjena.riot.lang.LangBase.raiseException(LangBase.java:205)
        at org.openjena.riot.lang.LangBase.nextToken(LangBase.java:152)
        at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:50)
        at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:22)
        at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:58)
        at org.openjena.riot.lang.LangBase.parse(LangBase.java:75)
        at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:173)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:154)
        at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:113)
        at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:282)
        at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:193)
        at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:74)
        at org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfResourceImporter.importResource(RdfResourceImporter.java:72)
        at org.apache.stanbol.entityhub.indexing.core.source.ResourceLoader.loadResource(ResourceLoader.java:199)
        at org.apache.stanbol.entityhub.indexing.core.source.ResourceLoader.loadResources(ResourceLoader.java:135)
        at org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource.initialise(RdfIndexingSource.java:244)
        at org.apache.stanbol.entityhub.indexing.core.impl.IndexingSourceInitialiser.run(IndexingSourceInitialiser.java:43)
        at java.lang.Thread.run(Thread.java:662)

> EntityHub batch indexing tools should fail early if the input is incorrect
> --------------------------------------------------------------------------
>
>                 Key: STANBOL-328
>                 URL: https://issues.apache.org/jira/browse/STANBOL-328
>             Project: Stanbol
>          Issue Type: Bug
>            Reporter: Olivier Grisel
>
> If exceptions occur during the initialization step of the RdfIndexingSource, 
> they are just logged and the indexing process continues with incomplete data: 
> if the error is near the beginning of an RDF dump, most of the dump is 
> ignored and the SolrYard indexer goes on with partial data, which is a 
> waste of CPU resources since the resulting index is incomplete.
> Instead, the indexer should fail early by raising the exceptions that are 
> caught in ResourceLoader#loadResource (while adding the offending filename 
> to the exception message) and ask the user to fix the input files before 
> proceeding.
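
The fail-early behaviour requested above could look roughly like the 
following. This is only a minimal sketch, not the actual Stanbol code: the 
class FailFastLoader, the Importer interface and the loadAll method are 
hypothetical stand-ins for ResourceLoader and RdfResourceImporter, and the 
choice of IllegalStateException as the wrapper is an assumption.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is NOT the real Stanbol ResourceLoader. */
public class FailFastLoader {

    /** Hypothetical stand-in for an importer such as RdfResourceImporter. */
    public interface Importer {
        void importResource(String fileName) throws IOException;
    }

    /**
     * Imports each file in order and stops at the first failure,
     * rethrowing with the offending file name in the message.
     */
    public static List<String> loadAll(List<String> fileNames, Importer importer) {
        List<String> loaded = new ArrayList<>();
        for (String fileName : fileNames) {
            try {
                importer.importResource(fileName);
            } catch (IOException | RuntimeException e) {
                // Fail early instead of logging and continuing with partial data.
                throw new IllegalStateException(
                        "Unable to load resource " + fileName
                        + "; fix the input file before indexing again", e);
            }
            loaded.add(fileName);
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Simulated importer: any file whose name starts with "bad" fails to parse.
        Importer importer = fileName -> {
            if (fileName.startsWith("bad")) {
                throw new IOException("illegal escape sequence value");
            }
        };
        try {
            loadAll(List.of("good.nt", "bad.nt", "never-reached.nt"), importer);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Keeping the original exception as the cause preserves the parser's line and 
column information, while the wrapper message tells the user which file to 
fix.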

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
