large literals best practice?

Chris Tomlinson Thu, 17 Aug 2017 10:48:53 -0700

Hello,

We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page, for 
a total of about 4GB of text. These texts are currently indexed via Lucene in an 
XMLdb, and we would like to know whether there are any known issues with large 
literals in Jena.


In other words, we are considering storing the texts like this:

    :Text_08357 a :EText ;
        # various metadata about the EText
        :hasPage 
          [ :pageNum 1 ;
            :content ". . . 17,000 Bytes . . ." ] ,
          [ :pageNum 2 ;
            :content ". . . 17,000 Bytes . . ." ] ,
          . . .
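
To make the intended shape concrete, here is a minimal Python sketch that renders one text in the layout above. It is only an illustration: the function names are hypothetical, the `:` prefix is assumed to be declared elsewhere, and the metadata properties are left out because they are not specified here.

```python
def turtle_literal(text: str) -> str:
    """Quote a string as a short-form Turtle literal, escaping special characters."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'


def etext_to_turtle(text_id: str, pages: list[str]) -> str:
    """Render one EText with one blank node per page (hypothetical helper).

    The output is a fragment: it assumes the default ':' prefix is declared
    elsewhere, and it omits the metadata properties of the EText.
    """
    if not pages:
        raise ValueError("at least one page is required")
    page_nodes = [
        f"      [ :pageNum {num} ;\n        :content {turtle_literal(body)} ]"
        for num, body in enumerate(pages, start=1)
    ]
    return (
        f":{text_id} a :EText ;\n    :hasPage\n"
        + " ,\n".join(page_nodes)
        + " .\n"
    )


if __name__ == "__main__":
    print(etext_to_turtle("Text_08357", ["page one text", "page two text"]))
```

Each page body ends up as a single literal, so in the real data every `:content` value would be roughly 17KB long.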

We know that Lucene is happy with this data, but we’re not sure whether 
Jena/TDB will be stressed by 229K triples with 17KB literals.
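
As a rough check on these figures, here is a back-of-the-envelope calculation in Python. It assumes decimal units (1KB = 1,000 bytes) and three triples per page as in the sketch above (the `:hasPage` link plus `:pageNum` and `:content` on the blank node); metadata triples are not counted.

```python
# Back-of-the-envelope sizing using the figures quoted in this message.
PAGES = 229_000           # total pages across all texts
BYTES_PER_PAGE = 17_000   # approximate average page size (~17KB)
TRIPLES_PER_PAGE = 3      # assumption: :hasPage, :pageNum, :content

total_bytes = PAGES * BYTES_PER_PAGE
total_gb = total_bytes / 1e9
page_triples = PAGES * TRIPLES_PER_PAGE
large_literals = PAGES    # one :content literal per page

print(f"{total_gb:.2f} GB of literal text")
print(f"{page_triples} page-level triples, {large_literals} with a large literal")
```

So the triple count itself is modest; the open question is the size of each `:content` literal.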

The jena-text module offers the possibility of indexing in Lucene via a separate 
process and just using the search in Jena without actually storing the literals 
in TDB. This is a somewhat complex configuration, and we would prefer not to 
use this approach unless the size of the literals presents a problem.

Thank you,
Chris
