Performance with very long strings - Re: large literals best practice?

Chris Tomlinson Sat, 19 Aug 2017 07:20:50 -0700

Hi again,

Is anyone aware of any issues that may arise when storing triples in TDB that 
have very large string literals (~17KB)?


The use case is illustrated in the quoted message below. I ask because 
literals are usually assumed to be small - names, titles, maybe summaries or 
abstracts and such - rather than entire pages of text.

Thanks,
Chris


> On Aug 17, 2017, at 12:48 PM, Chris Tomlinson wrote:
> 
> Hello,
> 
> We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page, 
> for a total of 4GB of text. These texts are currently indexed via Lucene in 
> an XMLdb, and we would like to know whether there are any known issues with 
> large literals in Jena.
> 
> In other words we are considering storing the texts like:
> 
>     :Text_08357 a :EText ;
>         # ... various metadata about the EText ...
>         :hasPage
>           [ :pageNum 1 ;
>             :content ". . . 17,000 Bytes . . ." ] ,
>           [ :pageNum 2 ;
>             :content ". . . 17,000 Bytes . . ." ] ,
>           . . .
> 
> We know that Lucene is happy with this data, but we’re not sure whether 
> Jena/TDB will be stressed with 229K triples with 17KB literals.
> 
> The jena-text module offers the possibility of indexing in Lucene via a 
> separate process and using only the search from Jena, without actually 
> storing the literals in TDB. This is a somewhat complex configuration, and 
> we would prefer to avoid it unless the size of the literals will present a 
> problem.
> 
> Thank you,
> Chris
> 
>
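
For a rough sense of scale, the figures in the quoted message can be checked with a short Python sketch. The counts and sizes come from the message. The helper names (corpus_estimate, turtle_escape, page_node) are hypothetical. The escaping handles only backslash, double quote, and line breaks, on the assumption that this covers what the page text contains. It says nothing about how TDB behaves with literals of this size.

```python
# Figures taken from the quoted message; everything else is illustrative.
NUM_TEXTS = 23_000        # "23K texts"
TOTAL_PAGES = 229_000     # "total pages: 229K"
BYTES_PER_PAGE = 17_000   # "~17KB/page"


def corpus_estimate(total_pages: int, bytes_per_page: int) -> dict:
    """Estimate how many :content triples and how many bytes of literal text result."""
    total_bytes = total_pages * bytes_per_page
    return {
        "content_triples": total_pages,   # one :content literal per page
        "total_bytes": total_bytes,
        "total_gb": total_bytes / 1e9,    # decimal gigabytes
    }


def turtle_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")   # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def page_node(page_num: int, content: str) -> str:
    """Render one page as a blank node, following the structure in the message."""
    return f'[ :pageNum {page_num} ;\n    :content "{turtle_escape(content)}" ]'


if __name__ == "__main__":
    est = corpus_estimate(TOTAL_PAGES, BYTES_PER_PAGE)
    print(f"pages per text: {TOTAL_PAGES / NUM_TEXTS:.1f}")
    print(f"content triples: {est['content_triples']:,}")
    print(f"literal text: {est['total_gb']:.2f} GB")
    print(page_node(1, 'First line of a "page"\nsecond line'))
```

The estimate comes out slightly under the 4GB quoted, which is consistent with the per-page size being approximate.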
