Hi!
I'm trying to run a simple update query that reads strings from one graph,
processes them, and stores them in another:
------------------------------------------------------------------------------
insert {
  graph vice:pageocrdata_clean {
    ?page vice:ocrtext ?ocr7 .
  }
}
where {
  graph vice:pageocrdata {
    ?page vice:ocrtext ?ocr .
  }
  bind (replace(str(?ocr), 'ſ', 's') as ?ocr1)
  bind (replace(?ocr1, 'uͤ', 'ü') as ?ocr2)
  bind (replace(?ocr2, 'aͤ', 'ä') as ?ocr3)
  bind (replace(?ocr3, 'oͤ', 'ö') as ?ocr4)
  bind (replace(?ocr4, "[⸗—]\n", '') as ?ocr5)
  bind (replace(?ocr5, "\n", ' ') as ?ocr6)
  bind (replace(?ocr6, "[ ]+", ' ') as ?ocr7)
}
-------------------------------------------------------------------------------
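(For clarity, the intended transformation can be sketched in Python; this is only a rough, hypothetical equivalent of the replace() chain above, with the combining characters written out as U+0364 sequences to make them unambiguous:)

```python
import re

def clean_ocr(text: str) -> str:
    """Rough Python equivalent of the SPARQL replace() chain."""
    text = text.replace('ſ', 's')            # long s -> round s
    text = text.replace('u\u0364', 'ü')      # u + combining small e (U+0364) -> ü
    text = text.replace('a\u0364', 'ä')      # a + combining small e -> ä
    text = text.replace('o\u0364', 'ö')      # o + combining small e -> ö
    text = re.sub('[⸗—]\n', '', text)        # rejoin words hyphenated across lines
    text = text.replace('\n', ' ')           # remaining newlines -> spaces
    text = re.sub('[ ]+', ' ', text)         # collapse runs of spaces
    return text

print(clean_ocr('Diſ⸗\nſertation  u\u0364ber\nTexte'))
# -> Dissertation über Texte
```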
The source graph has some 250,000 triples that match the WHERE pattern. The
strings are one to two thousand characters long.
I'm running the query from the Fuseki web UI, and it ends each time with "Bad
Request (#400) Java heap space". The Fuseki log shows no error other than the
Bad Request #400. This problem surprises me, because the update is simple,
straightforward data processing, with no ordering etc.
I started with -Xmx2G, but even increasing the heap to -Xmx12G only increases
the time it takes for Fuseki to return the same error.
Is there something wrong with the SPARQL above? Is there something in it that
increases memory use unnecessarily?
Best,
Harri Kiiskinen