Hi,

I can offer some advice as a fellow Virtuoso user; I'm not with
OpenLink.

Whilst not a direct solution, I'd recommend splitting the files into
smaller chunks and then isolating the chunks with errors.  If you're not
getting feedback on where the error is, just keep splitting the bad files
in half, like a binary search, to narrow down the problem.
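Here's a rough Python sketch of the halving approach. It assumes a line-based format like N-Quads, where each line stands on its own, and `loads_ok` is a hypothetical callback you would supply (e.g. write the chunk to a temp file, run the loader, report success); it is not a Virtuoso API.

```python
from typing import Callable, List, Sequence


def find_bad_lines(
    lines: Sequence[str],
    loads_ok: Callable[[Sequence[str]], bool],
) -> List[str]:
    """Return the lines that fail to load, found by repeated halving.

    loads_ok is a hypothetical stand-in for "try to load this chunk and
    report whether it loaded cleanly".
    """
    if not lines or loads_ok(lines):
        return []                 # this chunk is clean
    if len(lines) == 1:
        return list(lines)        # narrowed down to one bad line
    mid = len(lines) // 2
    # Only halves that fail get split further.
    return find_bad_lines(lines[:mid], loads_ok) + find_bad_lines(lines[mid:], loads_ok)
```

With k bad lines among n lines this needs on the order of k * log2(n) load attempts, which is far fewer than one per line when errors are rare.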

You can then manually inspect the bad chunks, identify the classes of
errors they contain and write regexps to detect them for manual correction.
As you identify each erroneous triple, move it into a file of only
errors and trim it from the input files.  Eventually you'll have all the
erroneous triples in one file that you can correct by hand or with
regexps.  Patterns like '<<' are easy to detect, as is an invalid
language tag like '@1'.  This is all easy to do with bash or perl
scripts.
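As a starting point, something like this Python would do the triage for the two cases in your message. The patterns are my own guesses based on those two examples, not a complete N-Quads validator: '<<' will also flag a literal that happens to contain '<<', and the language-tag check only looks for a quote followed by '@' and something other than letters with optional '-' subtags.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Heuristic patterns for the error classes seen in this thread
# (an assumption, not a full N-Quads grammar check).
ERROR_PATTERNS = {
    # A doubled angle bracket, e.g. <<http://purl.org/dc/terms/title>
    "double_angle_bracket": re.compile(r"<<"),
    # '@' after a closing quote that is not followed by a well-formed tag
    # (letters, optional -subtags, then whitespace), e.g. ""@1 or "x"@en,JA,US
    "bad_language_tag": re.compile(r'"@(?![a-zA-Z]+(?:-[a-zA-Z0-9]+)*\s)'),
}


def classify(line: str) -> Optional[str]:
    """Return the name of the first error class that matches, or None."""
    for name, pattern in ERROR_PATTERNS.items():
        if pattern.search(line):
            return name
    return None


def partition(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split lines into (clean, suspect) according to ERROR_PATTERNS."""
    clean: List[str] = []
    suspect: List[str] = []
    for line in lines:
        (suspect if classify(line) else clean).append(line)
    return clean, suspect


def triage_file(src: str, clean_path: str, error_path: str) -> int:
    """Write clean and suspect lines to separate files; return the suspect count."""
    with open(src, encoding="utf-8") as f:
        clean, suspect = partition(f)
    with open(clean_path, "w", encoding="utf-8") as f:
        f.writelines(clean)
    with open(error_path, "w", encoding="utf-8") as f:
        f.writelines(suspect)
    return len(suspect)
```

Call `triage_file("dump.nq", "clean.nq", "errors.nq")` (file names are just examples) and load clean.nq as usual; each new error class you find becomes another entry in ERROR_PATTERNS.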

I used to use RDFabout.com to do this kind of thing but that no longer 
exists.  You can write a tool with ARC2 to do this as well (which is 
what RDFabout used to be) but that requires actual coding which might be 
more trouble than you want to go to.

If you don't mind a very slow approach, it would be easy to write
a bash/expect script that reads individual lines from the input files,
calls the RDF loader on each line and saves that line to an error file
if it doesn't load.  You could do this in Virtuoso/PL as well, though it
might not be as easy because handling very large text files in
Virtuoso can be problematic.  Just split the files into
manageable chunks first (under 2GB at least, under 20MB would be better)
and then you should be able to use Virtuoso's string handling functions
to process each chunk.
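To split the files on line boundaries, a Python sketch like this should work for N-Quads, where every statement sits on one line (it would not be safe for Turtle, where prefixes and statements span lines). One caveat I'd assume applies: blank node labels like _:b1 are normally scoped to a single file, so splitting a file can turn one blank node into several. The part-file naming is an arbitrary choice for illustration.

```python
from typing import Iterable, Iterator, List


def chunk_lines(lines: Iterable[bytes], max_bytes: int) -> Iterator[List[bytes]]:
    """Group lines into chunks of at most max_bytes, never splitting a line.

    A single line longer than max_bytes is emitted as a chunk on its own.
    """
    chunk: List[bytes] = []
    size = 0
    for line in lines:
        if chunk and size + len(line) > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(line)
        size += len(line)
    if chunk:
        yield chunk


def split_file(path: str, max_bytes: int = 20 * 1024 * 1024) -> List[str]:
    """Write path.part0000, path.part0001, ... and return their names.

    The ~20MB default follows the suggestion in this thread.
    """
    names: List[str] = []
    with open(path, "rb") as src:  # binary mode, so len() counts bytes
        for i, chunk in enumerate(chunk_lines(src, max_bytes)):
            name = f"{path}.part{i:04d}"
            with open(name, "wb") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```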

You could hack the bulk loader script to do something like this.  
Ideally, just modify the error handler in the bulk loader to call a 
specialised loader that loads/parses the input line by line and then 
writes any erroneous lines out to an output file for manual correction.
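The fallback idea for the error handler looks roughly like this in Python. `load` is a hypothetical function standing in for whatever actually hands data to Virtuoso, assumed to raise an exception on a parse error; it is not part of the bulk loader. Re-loading lines that went in before the failure should be harmless for plain quads, since a quad store keeps quads as a set, but lines containing blank nodes could end up duplicated.

```python
from typing import Callable, List, Sequence


def load_with_fallback(
    lines: Sequence[str],
    load: Callable[[str], object],
) -> List[str]:
    """Load lines as one chunk; if that fails, retry line by line.

    `load` is a hypothetical loader that raises on unparseable input.
    Returns the lines that still failed, for writing to an error file.
    """
    try:
        load("".join(lines))      # fast path: the whole chunk is valid
        return []
    except Exception:
        pass                      # fall through to the slow path
    failed: List[str] = []
    for line in lines:
        try:
            load(line)
        except Exception:
            failed.append(line)   # keep the bad line for manual correction
    return failed
```

As a toy check, Python's `int` behaves like a loader that rejects anything that isn't a number: `load_with_fallback(["1\n", "2\n", "x\n"], int)` fails on the whole chunk, then on "x\n" alone, and returns `["x\n"]`.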

Quentin.
Guiding Hand Solutions.

On 2016-05-02 15:50, Thomas Trattnig wrote:
> Hello,
> 
> I have lots of nq-dump files. Some lines in these files are not valid,
> e.g.
> <http://asdf.com/> <http://purl.org/dc/terms/title> ""@1 <http://asdf.com/> .
> or
> <http://asdf.com/> <<http://purl.org/dc/terms/title> "adsf"@en,JA,US <http://asdf.com/> .
> 
> I want to load all dumps into Virtuoso.
> 
> If I use the standard ld_dir import function, the loader stops importing
> the current file at the first invalid line with the message:
> syntax error processed pending to here.
> 
> and moves on to the next file.
> 
> Is there a way to just skip the invalid lines, so I can import all
> other lines of files which contain some invalid lines?
> 
> Kind regards,
> Thomas
> Virtuoso Open Source Edition (Column Store) (multi threaded)
> Version 7.2.4.2.3217-pthreads as of Apr 29 2016
> 
> ------------------------------------------------------------------------------
> Find and fix application performance issues faster with Applications 
> Manager
> Applications Manager provides deep performance insights into multiple 
> tiers of
> your business applications. It resolves application problems quickly 
> and
> reduces your MTTR. Get your free trial!
> https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
> 
> _______________________________________________
> Virtuoso-users mailing list
> Virtuoso-users@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/virtuoso-users
