Hello,

@Peter I've already removed the full-text index for the literals of the predicate sequence, but so far I have not seen a big difference in size. As for the O index, I would need to remove only the entries concerning the predicate sequence and keep the others, since I might want to run queries on chromosome numbers, for example. So I would need to tweak the GSPO indexes so that they stay in place but exclude the predicate sequence, using rules like the ones I can define for the full-text index.
@Ivan Do you want a fully compiled virtuoso.db for your tests? I have a complete NCBI RefSeq (68 GB compressed, 388 GB uncompressed). The corresponding N3 files weigh 33 GB compressed. RefSeq has long literals containing sequences. I also have GenBank in N3, but I haven't finished loading it into a virtuoso.db, so it is only available as compressed N3 files, 74 GB in total. Tell me what you want for your tests and I will make it available to you for download.

Bye !!

Marc-Alexandre

2010/3/12 Ivan Mikhailov <[email protected]>:
> Hello Marc-Alexandre,
>
> There were identical requests before but the implementation was
> postponed. The reason was that 6.0 makes indexes more compact at a price
> of slightly increased CPU load, it was not practical to make other
> changes at the same time. Now we can (and will) try.
> Note that the compression will not be efficient for texts shorter than
> 1Kb and give bad results for texts of length 4-8Kb. If shorter than 1Kb
> then the won space does not pay for the spent CPU. If 4 to 8Kb then the
> compression of 1 blob page will usually produce 1 half-complete page of
> inlined data in index plus probably a remap version of that page, so
> disk image size will finally _grow_ with the compression. It contradicts
> with a common sense but matches the experiment.
>
> I can make a storage version that will selectively compress some
> literals, but then I will need an experiment with some real data. Not
> right now in any case, in a week or two. Meanwhile I can download your
> data if you have some big (and live) specimen.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://openlinksw.com
>
> On Wed, 2010-03-10 at 10:09 -0500, Marc-Alexandre Nolin wrote:
>> I've a N3 dump I'm currently loading into a Virtuoso Server (a
>> complete NCBI Genbank). One literal have always huge size. Its the one
>> related to the predicate "sequence". Is it possible to compress
>> literal with a rule based on predicate?
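
For illustration, the size thresholds Ivan describes, combined with the per-predicate rule asked about in the original message, could be sketched in Python roughly as follows. This is only one reading of the thread, not Virtuoso's implementation: the function name, the example predicate IRI, and the choice to treat literals of 1-4Kb and above 8Kb as worth compressing are all assumptions.

```python
# Hypothetical sketch of a "compress by predicate and length" rule.
# Nothing here is a Virtuoso API; names and the IRI are illustrative only.

KB = 1024

# Assumed set of predicates whose literals are candidates for compression.
COMPRESSIBLE_PREDICATES = {"http://example.org/ns#sequence"}


def should_compress(predicate: str, literal: str) -> bool:
    """Return True if the literal looks worth compressing.

    Thresholds follow the thread: under 1Kb the saved space does not pay
    for the CPU, and at 4-8Kb the on-disk size tends to grow. Treating
    every other length as worthwhile is an assumption.
    """
    if predicate not in COMPRESSIBLE_PREDICATES:
        return False
    size = len(literal.encode("utf-8"))
    if size < 1 * KB:
        return False
    if 4 * KB <= size <= 8 * KB:
        return False
    return True


if __name__ == "__main__":
    seq = "http://example.org/ns#sequence"
    for n in (500, 2000, 5000, 20000):
        print(n, should_compress(seq, "A" * n))
```

Whether the 1-4Kb range actually pays off would have to be checked against real data, which is what the experiment proposed above is for.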
