I want to link two Linked Open Data sets (specifically DBpedia and 
data.deichman.no) that are 800,000 and 200,000 records respectively. The only 
way I have of accessing them is through their open SPARQL endpoints. Both run 
Virtuoso, but I have no JDBC, iSQL or any other form of access except SPARQL.

Trying to do simple paging via

  order by ?uri  limit 1000 offset 4000

and stepping up the offset doesn't work, because I get

  Virtuoso 22023 Error SR353: Sorted TOP clause specifies more then 10001000 
  rows to sort. Only 10000 are allowed. Either decrease the offset and/or row 
  count or use a scrollable cursor
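
A minimal sketch of this kind of offset paging, for concreteness. The triple
pattern, the page size constant, and the function names are illustrative
assumptions, not the exact query or code from the linkage tool; only the
ORDER BY / LIMIT / OFFSET shape comes from the query above. Against a
Virtuoso endpoint with the limits described here, a loop like this can be
expected to fail with the SR353 error once the offset grows large enough.

```python
import json
import urllib.parse
import urllib.request

PAGE_SIZE = 1000  # assumed page size, matching "limit 1000" above


def paged_query(offset, limit=PAGE_SIZE):
    """Build one page of the query; the triple pattern is a placeholder."""
    return (
        "SELECT DISTINCT ?uri WHERE { ?uri ?p ?o } "
        f"ORDER BY ?uri LIMIT {limit} OFFSET {offset}"
    )


def fetch_page(endpoint, offset, limit=PAGE_SIZE):
    """Send one page request using the standard SPARQL protocol (HTTP GET)."""
    params = urllib.parse.urlencode({"query": paged_query(offset, limit)})
    request = urllib.request.Request(
        endpoint + "?" + params,
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [b["uri"]["value"] for b in data["results"]["bindings"]]


def fetch_all(endpoint):
    """Step the offset up one page at a time until a page comes back empty."""
    offset = 0
    while True:
        page = fetch_page(endpoint, offset)
        if not page:
            break
        yield from page
        offset += PAGE_SIZE
```

Calling paged_query(4000) produces the same clause as the example above:
ORDER BY ?uri LIMIT 1000 OFFSET 4000.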


I've also tried turning off paging and fetching the entire data set in one 
go, but then the result set is cut off at 10,000 rows.


I see that this can be solved in SQL with scrollable cursors, and also via JDBC:
  http://boards.openlinksw.com/phpBB3/viewtopic.php?f=12&t=1452

but that doesn't help me at all. :-(


This is a recurring problem for me, as I'm developing a record linkage tool[1] 
and use open data sets to try it out. Many of the open data sets are hosted in 
Virtuoso, and so this keeps hitting me.

Is there any general way to solve this or work around it?


[1] http://code.google.com/p/duke/

--Lars M.
http://www.garshol.priv.no/tmphoto/
http://www.garshol.priv.no/blog/

