Hello,
We have 23K texts averaging 10 pages each (229K pages in total) at roughly
17KB per page, for a total of about 4GB of text. These texts are currently
indexed via Lucene in an XMLdb, and we would like to know whether there are
any known issues with large literals in Jena.
In other words we are considering storing the texts like:
    :Text_08357 a :EText ;
        # various metadata about the EText
        :hasPage
            [ :pageNum 1 ;
              :content ". . . 17,000 Bytes . . ." ] ,
            [ :pageNum 2 ;
              :content ". . . 17,000 Bytes . . ." ] ,
            . . .
We know that Lucene is happy with this data, but we're not sure whether
Jena/TDB will be stressed by 229K triples that each carry a 17KB literal.
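As a rough sanity check on the sizing, here is a small Python sketch using only
the figures quoted above. It assumes 1KB = 1000 bytes and three triples per
page (the :hasPage link, :pageNum and :content, as in the layout shown); the
variable names are only illustrative.

```python
# Figures quoted in this message; KB is taken as 1000 bytes (an assumption).
PAGES = 229_000
BYTES_PER_PAGE = 17_000
# One :hasPage link plus :pageNum and :content per page (assumed layout).
TRIPLES_PER_PAGE = 3

total_bytes = PAGES * BYTES_PER_PAGE
total_gb = total_bytes / 1e9
total_triples = PAGES * TRIPLES_PER_PAGE

print(f"literal text: {total_gb:.2f} GB")
print(f"page triples: {total_triples:,}")
```

So the large literals account for just under 4GB, and the page structure adds
well under a million triples, not counting the per-text metadata.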
The jena-text module offers the possibility of indexing in Lucene via a
separate process and using only the search from Jena, without actually storing
the literals in TDB. This is a somewhat complex configuration, and we would
prefer not to use this approach unless the size of the literals will present a
problem.
Thank you,
Chris