Re: [Virtuoso-users] compression

Marc-Alexandre Nolin Fri, 12 Mar 2010 19:18:49 +0000

Hello,

@Peter
I've already removed the full-text index for the literals of the
predicate sequence, but so far I have not seen a big difference in
size. For the O index, I would need to remove the entries concerning
the predicate sequence but not the others, since I might want to run
queries on chromosome numbers, for example. So I would need to keep
the GSPO indexes in place but exclude the predicate sequence, with
rules like the ones I can use with the full-text index.


@Ivan
Do you want a fully loaded virtuoso.db for your tests? I have a
complete NCBI RefSeq (68 GB compressed or 388 GB uncompressed). The
corresponding n3 files weigh 33 GB compressed. RefSeq has long
literals containing sequences. I also have GenBank in n3, but I
haven't finished loading it into a virtuoso.db. It is only available
as compressed n3 files weighing 74 GB. Tell me what you want for your
tests and I will make it available to you for download.

Bye !!

Marc-Alexandre

2010/3/12 Ivan Mikhailov <[email protected]>:
> Hello Marc-Alexandre,
>
> There were identical requests before but the implementation was
> postponed. The reason was that 6.0 makes indexes more compact at the
> price of slightly increased CPU load, so it was not practical to make
> other changes at the same time. Now we can (and will) try.
> Note that the compression will not be efficient for texts shorter than
> 1Kb and will give bad results for texts of length 4-8Kb. If shorter
> than 1Kb, the space gained does not pay for the CPU spent. If 4 to 8Kb,
> the compression of 1 blob page will usually produce 1 half-complete
> page of inlined data in the index plus probably a remap version of that
> page, so the disk image size will finally _grow_ with the compression.
> That contradicts common sense but matches the experiment.
>
> I can make a storage version that will selectively compress some
> literals, but then I will need an experiment with some real data. Not
> right now in any case, in a week or two. Meanwhile I can download your
> data if you have some big (and live) specimen.
>
> Best Regards,
>
> Ivan Mikhailov
> OpenLink Software
> http://openlinksw.com
>
> On Wed, 2010-03-10 at 10:09 -0500, Marc-Alexandre Nolin wrote:
>> I have an N3 dump that I'm currently loading into a Virtuoso Server
>> (a complete NCBI GenBank). One literal is always huge: the one
>> related to the predicate "sequence". Is it possible to compress
>> literals with a rule based on the predicate?
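
To make the size thresholds from Ivan's message concrete, here is a minimal
Python sketch of a per-predicate, size-aware compression rule. It is only an
illustration and not Virtuoso's implementation: the names should_compress,
store_literal and COMPRESSIBLE_PREDICATES are hypothetical, zlib stands in for
whatever codec the server would use, and treating the 1Kb and 4-8Kb limits as
exact byte boundaries is an assumption.

```python
import zlib

KB = 1024

# Hypothetical rule set: the thread only names the "sequence" predicate.
COMPRESSIBLE_PREDICATES = {"sequence"}


def should_compress(predicate: str, literal: bytes) -> bool:
    """Decide whether a literal is a candidate for compression.

    Thresholds follow the thread: texts shorter than 1Kb are not worth
    the CPU, and texts of 4-8Kb tend to make the disk image grow.
    The exact boundary handling here is an assumption.
    """
    if predicate not in COMPRESSIBLE_PREDICATES:
        return False
    size = len(literal)
    if size < 1 * KB:
        return False
    if 4 * KB <= size <= 8 * KB:
        return False
    return True


def store_literal(predicate: str, literal: bytes) -> tuple[bool, bytes]:
    """Return (is_compressed, payload) for a literal about to be stored."""
    if should_compress(predicate, literal):
        packed = zlib.compress(literal)
        # Keep the compressed form only if it is actually smaller.
        if len(packed) < len(literal):
            return True, packed
    return False, literal
```

With this rule a 16Kb "sequence" literal is compressed, while a 6Kb one, or
any literal of another predicate such as a chromosome number, is stored as is.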
