Hi again,

Is anyone aware of any issues that may arise when storing triples in TDB that have very large string literals (~17KB)?
The use case is illustrated below. This seems a reasonable question under the assumption that literals are presumed to be small - like names, titles, maybe summaries or abstracts and such, rather than entire pages of text.

Thanks,
Chris

> On Aug 17, 2017, at 12:48 PM, Chris Tomlinson <[email protected]> wrote:
>
> Hello,
>
> We have 23K texts averaging 10 pp/text (total pages: 229K) and ~17KB/page,
> for a total of 4GB of text. These texts are currently indexed via Lucene in
> an XMLdb and we’re wanting to know if there are any known issues regarding
> large literals in Jena.
>
> In other words we are considering storing the texts like:
>
> :Text_08357 a :EText ;
>     various metadata about the EText
>     :hasPage
>         [ :pageNum 1 ;
>           :content “. . . 17,000 Bytes . . .” ] ,
>         [ :pageNum 2 ;
>           :content “. . . 17,000 Bytes . . .” ] ,
>         . . .
>
> We know that Lucene is happy with this data, but we’re not sure whether
> Jena/TDB will be stressed with 229K triples with 17KB literals.
>
> The Jena-text offers the possibility of indexing in Lucene via a separate
> process and just using the search in Jena without actually storing the
> literals in TDB. This is a somewhat complex configuration and it would be
> preferred to not use this approach unless the size of the literals will
> present a problem.
>
> Thank you,
> Chris
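As a rough sanity check on the figures quoted above, here is a small Python sketch of the arithmetic. It is only an estimate under stated assumptions: decimal units (1 KB = 1,000 bytes), which the message does not specify, and three triples per page (:hasPage, :pageNum, :content) as read off the Turtle shape above. All function and constant names are illustrative.

```python
# Back-of-the-envelope sizing for the corpus described in the message.
# Assumption: decimal units (1 KB = 1,000 bytes; 1 GB = 1,000,000,000 bytes).

PAGES = 229_000           # total pages, from the message
BYTES_PER_PAGE = 17_000   # approximate literal size per page, from the message

# In the Turtle shape shown in the message, each page contributes three triples:
# one :hasPage link, one :pageNum, and one :content carrying the large literal.
TRIPLES_PER_PAGE = 3


def total_literal_bytes(pages: int, bytes_per_page: int) -> int:
    """Total bytes held in :content literals."""
    return pages * bytes_per_page


def page_triples(pages: int) -> int:
    """Triples needed for the page structure alone (EText metadata excluded)."""
    return pages * TRIPLES_PER_PAGE


if __name__ == "__main__":
    size = total_literal_bytes(PAGES, BYTES_PER_PAGE)
    print(f"literal text:   {size / 1e9:.2f} GB")
    print(f"page triples:   {page_triples(PAGES):,}")
    print(f"large literals: {PAGES:,}")
```

Under these assumptions the literal text comes to roughly 3.9 GB, in line with the 4GB quoted, and only one in three of the page triples carries a large literal.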
