Hi!

I'm trying to run a simple update query that reads strings from one graph,
processes them, and stores the results in another:

------------------------------------------------------------------------------
  insert {
    graph vice:pageocrdata_clean {
      ?page vice:ocrtext ?ocr7 .
    }
  }
  where {
    graph vice:pageocrdata {
      ?page vice:ocrtext ?ocr .
    }
    bind (replace(str(?ocr),'ſ','s') as ?ocr1)
    bind (replace(?ocr1,'uͤ','ü') as ?ocr2)
    bind (replace(?ocr2,'aͤ','ä') as ?ocr3)
    bind (replace(?ocr3,'oͤ','ö') as ?ocr4)
    bind (replace(?ocr4,"[⸗—]\n",'') as ?ocr5)
    bind (replace(?ocr5,"\n",' ') as ?ocr6)
    bind (replace(?ocr6,"[ ]+",' ') as ?ocr7)
  }
-------------------------------------------------------------------------------

The source graph has some 250,000 triples that match the WHERE clause. The
strings are one to two thousand characters long.

I'm running the query using the Fuseki web UI, and it ends each time with "Bad
Request (#400) Java heap space". The Fuseki log does not show any error except
for the Bad Request #400. I'm quite surprised by this problem, because the
update is simple, straightforward data processing, with no ordering etc.

I started with -Xmx2G, but even increasing the heap to -Xmx12G only increases 
the time it takes for Fuseki to return the same error.

Is there something wrong with the SPARQL above? Is there something that 
increases the memory use unnecessarily? 
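
As a sanity check, the replace chain can be run on its own as a SELECT over a
small sample, which separates a problem in the expressions from a problem of
scale. A minimal sketch, assuming the same vice: prefix declaration as the
update above; the LIMIT of 10 is an arbitrary illustrative value:

------------------------------------------------------------------------------
  # Same replace chain as the update, but read-only and over 10 pages only.
  select ?page ?ocr7
  where {
    {
      # The subquery restricts the sample before any string processing runs.
      select ?page ?ocr
      where {
        graph vice:pageocrdata {
          ?page vice:ocrtext ?ocr .
        }
      }
      limit 10
    }
    bind (replace(str(?ocr),'ſ','s') as ?ocr1)
    bind (replace(?ocr1,'uͤ','ü') as ?ocr2)
    bind (replace(?ocr2,'aͤ','ä') as ?ocr3)
    bind (replace(?ocr3,'oͤ','ö') as ?ocr4)
    bind (replace(?ocr4,"[⸗—]\n",'') as ?ocr5)
    bind (replace(?ocr5,"\n",' ') as ?ocr6)
    bind (replace(?ocr6,"[ ]+",' ') as ?ocr7)
  }
------------------------------------------------------------------------------

If this returns the expected cleaned strings, the expressions themselves are
fine and the question is only about how the full update uses memory.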

Best,

Harri Kiiskinen
