[ https://issues.apache.org/jira/browse/STANBOL-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13108798#comment-13108798 ]
Olivier Grisel commented on STANBOL-328:
----------------------------------------
Here is a sample error that is only logged: everything in images_en.nt.bz2
after the faulty line is skipped and never indexed, yet indexing carries on.
17:24:25,985 [Thread-4] ERROR openjena.riot - [line: 1999931, col: 67] illegal escape sequence value: \ (0x5C)
17:24:25,986 [Thread-4] ERROR source.ResourceLoader - Unable to load resource /tmp/dbpedia-index/indexing/resources/rdfdata/images_en.nt.bz2!
org.openjena.riot.RiotException: [line: 1999931, col: 67] illegal escape sequence value: \ (0x5C)
    at org.openjena.riot.ErrorHandlerLib$ErrorHandlerStd.fatal(ErrorHandlerLib.java:97)
    at org.openjena.riot.lang.LangBase.raiseException(LangBase.java:205)
    at org.openjena.riot.lang.LangBase.nextToken(LangBase.java:152)
    at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:50)
    at org.openjena.riot.lang.LangNQuads.parseOne(LangNQuads.java:22)
    at org.openjena.riot.lang.LangNTuple.runParser(LangNTuple.java:58)
    at org.openjena.riot.lang.LangBase.parse(LangBase.java:75)
    at org.openjena.riot.RiotReader.parseQuads(RiotReader.java:173)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadQuads$(BulkLoader.java:154)
    at com.hp.hpl.jena.tdb.store.bulkloader.BulkLoader.loadDataset(BulkLoader.java:113)
    at com.hp.hpl.jena.tdb.TDBLoader.loadDataset$(TDBLoader.java:282)
    at com.hp.hpl.jena.tdb.TDBLoader.loadDataset(TDBLoader.java:193)
    at com.hp.hpl.jena.tdb.TDBLoader.load(TDBLoader.java:74)
    at org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfResourceImporter.importResource(RdfResourceImporter.java:72)
    at org.apache.stanbol.entityhub.indexing.core.source.ResourceLoader.loadResource(ResourceLoader.java:199)
    at org.apache.stanbol.entityhub.indexing.core.source.ResourceLoader.loadResources(ResourceLoader.java:135)
    at org.apache.stanbol.entityhub.indexing.source.jenatdb.RdfIndexingSource.initialise(RdfIndexingSource.java:244)
    at org.apache.stanbol.entityhub.indexing.core.impl.IndexingSourceInitialiser.run(IndexingSourceInitialiser.java:43)
    at java.lang.Thread.run(Thread.java:662)
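
Since the parser gives up on the whole file at the first bad escape, fixing a large dump one error per indexing run is slow. A rough pre-flight scan like the one below could list every suspicious escape in a single pass. It is only a sketch: EscapeScanner is a hypothetical helper, not part of Stanbol or Jena. It assumes the dump has already been decompressed to plain N-Triples (for example with `bunzip2 -k`) and applies the RDF 1.1 N-Triples escape rules (only \u/\U inside IRIs; \t \b \n \r \f \" \' \\ plus \u/\U inside literals), which may not match the parser version in use exactly. It also skips checks a real parser would make, such as validating hex digits. Columns are the 1-based position of the backslash and may differ from what RIOT reports.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports backslash escapes that N-Triples (RDF 1.1 rules, assumed) does not allow. */
public class EscapeScanner {

    /** Letters allowed after '\' inside a literal: ECHAR plus UCHAR (\\u, \\U). */
    private static final String LITERAL_ESCAPES = "tbnrf\"'\\uU";
    /** Inside an IRI only UCHAR escapes (\\u, \\U) are allowed. */
    private static final String IRI_ESCAPES = "uU";

    /** Returns "line:col" (both 1-based, col = position of the backslash) for each illegal escape. */
    public static List<String> scan(Reader in) throws IOException {
        List<String> problems = new ArrayList<>();
        BufferedReader reader = new BufferedReader(in);
        String line;
        int lineNo = 0;
        while ((line = reader.readLine()) != null) {
            lineNo++;
            boolean inIri = false;
            boolean inLiteral = false;
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (c == '\\' && (inIri || inLiteral)) {
                    String allowed = inIri ? IRI_ESCAPES : LITERAL_ESCAPES;
                    char next = i + 1 < line.length() ? line.charAt(i + 1) : '\0';
                    if (allowed.indexOf(next) < 0) {
                        problems.add(lineNo + ":" + (i + 1));
                    }
                    i++; // skip the escaped character so \" does not close a literal
                } else if (inLiteral) {
                    if (c == '"') inLiteral = false;
                } else if (inIri) {
                    if (c == '>') inIri = false;
                } else if (c == '<') {
                    inIri = true;
                } else if (c == '"') {
                    inLiteral = true;
                } else if (c == '#') {
                    break; // comment outside IRIs and literals: ignore the rest of the line
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) throws IOException {
        String sample =
              "<http://example.org/a> <http://example.org/p> \"ok \\\"quoted\\\"\" .\n"
            + "<http://example.org/b\\c> <http://example.org/p> \"x\" .\n";
        System.out.println(scan(new StringReader(sample))); // prints [2:22]
    }
}
```

Running it over the decompressed dump before indexing would point the user at every line to fix at once, rather than at one line per failed run.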
> EntityHub batch indexing tools should fail early if the input is incorrect
> --------------------------------------------------------------------------
>
> Key: STANBOL-328
> URL: https://issues.apache.org/jira/browse/STANBOL-328
> Project: Stanbol
> Issue Type: Bug
> Reporter: Olivier Grisel
>
> If exceptions occur during the initialization step of the RdfIndexingSource,
> they are just logged and the indexing process continues with incomplete data:
> if the error is at the beginning of an RDF dump, most of the dump is simply
> ignored and the SolrYard indexer goes on with partial data, which is a
> waste of CPU resources since the resulting index is incomplete.
> Instead, the indexer should fail early by rethrowing the exceptions that are
> caught in ResourceLoader#loadResource (adding the offending filename
> to the exception message) and ask the user to fix the input files before
> proceeding.
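
A minimal sketch of the proposed fail-early behaviour follows. The Importer interface, the InvalidInputException class and the FailFastLoader name are hypothetical stand-ins, not Stanbol's actual ResourceLoader API. Instead of logging and moving on to the next file, the loop wraps the first failure in an unchecked exception whose message names the offending file and asks the user to fix it.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical sketch of a fail-fast resource loading loop (not Stanbol's real API). */
public class FailFastLoader {

    /** Hypothetical stand-in for the importer that parses one resource (e.g. into Jena TDB). */
    public interface Importer {
        void importResource(InputStream in, String resourceName) throws Exception;
    }

    /** Unchecked exception that aborts indexing and names the offending input file. */
    public static class InvalidInputException extends RuntimeException {
        public InvalidInputException(Path file, Throwable cause) {
            super("Unable to load resource " + file + ": " + cause.getMessage()
                    + " (fix this input file before re-running the indexer)", cause);
        }
    }

    /** Imports the files in order; the first failure stops the whole run. */
    public static void loadAll(List<Path> files, Importer importer) {
        for (Path file : files) {
            try (InputStream in = Files.newInputStream(file)) {
                importer.importResource(in, file.getFileName().toString());
            } catch (Exception e) {
                // Instead of logging and continuing with the next file, abort with context.
                throw new InvalidInputException(file, e);
            }
        }
    }
}
```

Because initialisation runs on its own thread (IndexingSourceInitialiser.run in the trace above), throwing alone would only end that worker thread; the code that starts indexing would also have to check for the failure, for example by joining the initialiser thread and rethrowing whatever it recorded, before handing data to the SolrYard indexer.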