Re: [Virtuoso-users] compression

Kingsley Idehen Fri, 12 Mar 2010 20:33:26 +0000

Marc-Alexandre Nolin wrote:

Hello,


@Peter
I've already removed the full-text index for the literals of the predicate
sequence, but so far I have not seen a big difference in size. As for
the O index, I would need to remove the entries for the predicate
sequence but not the others, since I might want to query on
chromosome numbers, for example. So I would need to tweak the GSPO indexes
to stay in place but exclude the predicate sequence, with rules like the ones I can
use for the full-text index.
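To make the idea concrete, here is a toy in-memory model (not Virtuoso's API or its actual index layout) of an object index that skips one predicate while the primary quad index keeps everything. The predicate IRIs are hypothetical placeholders; the thread only names the predicate "sequence".

```python
from collections import defaultdict

# Hypothetical placeholder IRI; the real RefSeq/GenBank predicate IRI is not given in the thread.
SEQUENCE_PRED = "http://example.org/sequence"

class ToyQuadStore:
    """Toy model: a primary index holding every (g, s, p, o) quad, plus an
    object index that omits quads whose predicate is in `exclude_from_o_index`."""

    def __init__(self, exclude_from_o_index):
        self.exclude = set(exclude_from_o_index)
        self.gspo = set()                    # primary index: all quads are kept
        self.o_index = defaultdict(set)      # object value -> {(g, s, p)}

    def add(self, g, s, p, o):
        self.gspo.add((g, s, p, o))
        # Rule-based exclusion, analogous to a per-predicate full-text index rule.
        if p not in self.exclude:
            self.o_index[o].add((g, s, p))

    def by_object(self, o):
        # .get() on a defaultdict does not insert a new empty entry.
        return self.o_index.get(o, set())
```

With `ToyQuadStore([SEQUENCE_PRED])`, a lookup by a chromosome number still works, while the long sequence literals never enter the object index and so cost no space there.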

@Ivan
Do you want a fully compiled virtuoso.db for your tests? I have one with the
complete NCBI RefSeq (68 GB compressed, 388 GB uncompressed). The
corresponding n3 files weigh 33 GB compressed. RefSeq has long
literals containing sequences. I also have GenBank in n3, but I haven't finished
loading it into a virtuoso.db; it is only available as compressed n3 files,
weighing 74 GB. Tell me what you want for your tests and I will make
it available to you for download.

A fully compressed .db or backup set would be preferred.

Kingsley

Bye !!

Marc-Alexandre

2010/3/12 Ivan Mikhailov <[email protected]>:

Hello Marc-Alexandre,

There have been identical requests before, but the implementation was
postponed. Version 6.0 makes indexes more compact at the price
of slightly increased CPU load, so it was not practical to make other
changes at the same time. Now we can (and will) try.
Note that compression will not be efficient for texts shorter than
1 KB and will give bad results for texts 4 to 8 KB long. Below 1 KB,
the space saved does not pay for the CPU spent. For 4 to 8 KB, the
compression of one blob page will usually produce one half-full page of
data inlined in the index, plus probably a remap version of that page, so the
disk image will actually _grow_ with compression. This contradicts
common sense but matches the experiment.
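The size thresholds described above can be expressed as a simple selection policy. This is a minimal sketch, not Virtuoso's storage code: it assumes 1 KB = 1024 bytes, uses zlib as a stand-in compressor, and the function names are hypothetical. It does not model page layout; the 4 to 8 KB penalty comes from how the database stores inlined data and remap pages, not from zlib itself.

```python
import zlib

# Thresholds from the description above; 1 KB = 1024 bytes is an assumption.
MIN_WORTHWHILE = 1 * 1024          # below this, CPU cost outweighs the space saved
BAD_BAND = (4 * 1024, 8 * 1024)    # in this range, compression tends to grow the disk image

def should_compress(text: bytes) -> bool:
    """Hypothetical policy: compress only literals outside the two unfavourable ranges."""
    n = len(text)
    if n < MIN_WORTHWHILE:
        return False
    if BAD_BAND[0] <= n <= BAD_BAND[1]:
        return False
    return True

def store_literal(text: bytes) -> tuple[bool, bytes]:
    """Return (was_compressed, payload) under the policy above."""
    if should_compress(text):
        return True, zlib.compress(text)
    return False, text
```

A 500-byte label or a 6 KB literal is stored as-is, while a 2 KB or 20 KB sequence literal is compressed.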

I can make a storage version that will selectively compress some
literals, but I will then need to experiment with some real data. Not
right now in any case; in a week or two. Meanwhile, I can download your
data if you have a big (and live) specimen.

Best Regards,

Ivan Mikhailov
OpenLink Software
http://openlinksw.com

On Wed, 2010-03-10 at 10:09 -0500, Marc-Alexandre Nolin wrote:

I have an N3 dump (a complete NCBI GenBank) that I'm currently loading into a
Virtuoso server. One literal is always huge: the one attached to the
predicate "sequence". Is it possible to compress literals with a rule
based on the predicate?
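What a predicate-based compression rule might look like can be sketched outside the database. This is illustrative only and assumes nothing about Virtuoso's API; per the replies in this thread, no such rule existed at the time. The rule set, the placeholder IRI, and the one-byte tag marking compressed values are all hypothetical choices, and zlib stands in for whatever compressor a storage engine would use.

```python
import zlib

# Hypothetical rule set: literals of these predicates get compressed.
# The IRI is a placeholder; the thread only names the predicate "sequence".
COMPRESS_PREDICATES = {"http://example.org/sequence"}

def encode_object(predicate: str, literal: str) -> bytes:
    raw = literal.encode("utf-8")
    if predicate in COMPRESS_PREDICATES:
        return b"Z" + zlib.compress(raw, 6)   # tag byte "Z": value is compressed
    return b"R" + raw                          # tag byte "R": value is stored raw

def decode_object(stored: bytes) -> str:
    tag, body = stored[:1], stored[1:]
    raw = zlib.decompress(body) if tag == b"Z" else body
    return raw.decode("utf-8")
```

Keying the rule on the predicate means short literals such as chromosome numbers are never touched, while the long, highly repetitive sequence strings are the only ones paying the compression CPU cost.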


------------------------------------------------------------------------------
Download Intel&#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
_______________________________________________
Virtuoso-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/virtuoso-users





--

Regards,

Kingsley Idehen
President & CEO
OpenLink Software
Web: http://www.openlinksw.com

Weblog: http://www.openlinksw.com/blog/~kidehen

Twitter/Identi.ca: kidehen
