[chromium-dev] SQLite compression in history database.

2009-11-24 Thread Scott Hess
Long ago, when developing fts1, I experimented with using zlib
compression as part of the implementation.  I dropped it because it
did not improve performance enough (I needed an order of magnitude
and it did not deliver that), and because of licensing issues
(fts1/2/3 are part of core SQLite, which does not include zlib).

Chromium already has zlib, and I don't think there's any particular
reason not to hack our version of fts to support it.  Looking at my
October history file, I get the following (numbers are in megabytes):

ls -lh History\ Index\ 2009-10
# -rw-r--r--@ 1 shess  eng    66M Nov 24 09:38 History Index 2009-10
.../sqlite3 History\ Index\ 2009-10
select round(sum(length(c0url)+length(c1title)+length(c2body))/1024.0/1024.0,2)
from pages_content;
# 34.9
select round(sum(length(compress(c0url))+length(compress(c1title))+length(compress(c2body)))/1024.0/1024.0,2)
from pages_content;
# 12.29
select round(sum(length(block))/1024.0/1024.0,2) from pages_segments;
# 24.6
select round(sum(length(compress(block)))/1024.0/1024.0,2) from pages_segments;
# 14.3
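
For anyone who wants to repeat this kind of measurement, here is a minimal sketch in Python. It assumes that compress() in the queries above is a zlib-backed SQL function; stock SQLite does not provide one as far as I know, so the sketch registers its own. The helper names register_compress and size_report are hypothetical, and the table is a plain stand-in that reuses the column names from the queries. Unlike the queries above, it casts text to BLOB so that length() counts bytes rather than characters.

```python
import sqlite3
import zlib


def register_compress(conn):
    """Expose a zlib-backed compress(X) SQL function on this connection."""
    def compress(value):
        if value is None:
            return None
        if isinstance(value, str):
            value = value.encode("utf-8")
        # Returning bytes makes SQLite store a BLOB, so length() counts bytes.
        return zlib.compress(value)

    conn.create_function("compress", 1, compress)


def size_report(conn):
    """Return (raw_bytes, compressed_bytes) summed over the three text columns."""
    row = conn.execute(
        "SELECT "
        "sum(length(CAST(c0url AS BLOB)) + length(CAST(c1title AS BLOB))"
        " + length(CAST(c2body AS BLOB))), "
        "sum(length(compress(c0url)) + length(compress(c1title))"
        " + length(compress(c2body))) "
        "FROM pages_content"
    ).fetchone()
    return row[0], row[1]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    register_compress(conn)
    # Stand-in table; a real fts content table is created by fts itself.
    conn.execute(
        "CREATE TABLE pages_content (c0url TEXT, c1title TEXT, c2body TEXT)"
    )
    conn.execute(
        "INSERT INTO pages_content VALUES (?, ?, ?)",
        ("http://example.com/", "Example", "some page text " * 1000),
    )
    raw, packed = size_report(conn)
    print("raw bytes:", raw, "compressed bytes:", packed)
```

Pointing the same two helpers at a copy of a real history index file would be the closest analogue of the session above, assuming the table and column names match.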

pages_segments is the fts index.  Since it is consulted very
frequently, I'd be slightly nervous about compressing it.
pages_content is the document data, which is hit after the index (or
when doing a lookup by document id), so compressing it shouldn't have
much performance impact.
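
To make the idea concrete, here is a rough sketch of compressing document data on write and decompressing it on read while leaving everything else alone. It is not how fts stores its content table; CompressedContentStore is a hypothetical wrapper around a plain table, and the docid column is an assumption. Compressing only the body is a choice made for this sketch, since very short strings such as URLs and titles can grow slightly under zlib because of its fixed header and checksum.

```python
import sqlite3
import zlib


class CompressedContentStore:
    """Hypothetical wrapper: body text is deflated on write, inflated on read."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS pages_content ("
            "docid INTEGER PRIMARY KEY, c0url TEXT, c1title TEXT, c2body BLOB)"
        )

    def add(self, url, title, body):
        """Insert one document and return its docid."""
        blob = zlib.compress(body.encode("utf-8"))
        cur = self.conn.execute(
            "INSERT INTO pages_content (c0url, c1title, c2body) VALUES (?, ?, ?)",
            (url, title, blob),
        )
        return cur.lastrowid

    def get(self, docid):
        """Look up one document by id; return (url, title, body) or None."""
        row = self.conn.execute(
            "SELECT c0url, c1title, c2body FROM pages_content WHERE docid = ?",
            (docid,),
        ).fetchone()
        if row is None:
            return None
        url, title, blob = row
        return url, title, zlib.decompress(blob).decode("utf-8")


if __name__ == "__main__":
    store = CompressedContentStore(sqlite3.connect(":memory:"))
    docid = store.add("http://example.com/", "Example", "page text " * 500)
    print(store.get(docid)[:2])
```

Taking the figures reported above at face value, compressing pages_content alone would save about 34.9 - 12.29 = 22.6 megabytes, roughly a third of the 66M file, and compressing pages_segments as well would save about 10.3 megabytes more.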

Does this seem like a win worth pursuing?

-scott

-- 
Chromium Developers mailing list: chromium-dev@googlegroups.com 
View archives, change email options, or unsubscribe: 
http://groups.google.com/group/chromium-dev


Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Elliot Glaysher (Chromium)
I'm all for it. I vaguely remember people complaining about the size
of our history files, and most of my history files are over 50M.

-- Elliot




Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Nico Weber
On Tue, Nov 24, 2009 at 10:21 AM, Elliot Glaysher (Chromium)
e...@chromium.org wrote:
> I'm all for it. I vaguely remember people complaining about the size
> of our history files, and most of my history files are over 50M.

Part of the reason for this is bugs like
http://code.google.com/p/chromium/issues/detail?id=24946 . Shouldn't
we fix these first?





Re: [chromium-dev] SQLite compression in history database.

2009-11-24 Thread Evan Martin
Due to bugs, we've seen users with 10 GB history files, which may
contribute to the complaints.
  http://code.google.com/p/chromium/issues/detail?id=24947

Even if compression ends up being pretty slow, you could imagine using
it for our archived history (history more than a month old).
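
A minimal sketch of that archive-only variant, under assumptions of my own: a hypothetical pages table with a Unix-seconds visited column and a compressed flag, and a 30-day cutoff standing in for "more than a month old". Recent rows stay as plain text so that reads of current history pay no decompression cost.

```python
import sqlite3
import zlib

# Hypothetical schema for illustration; not Chromium's actual history schema.
SCHEMA = (
    "CREATE TABLE pages ("
    "docid INTEGER PRIMARY KEY, visited INTEGER, body BLOB, "
    "compressed INTEGER NOT NULL DEFAULT 0)"
)

THIRTY_DAYS = 30 * 24 * 60 * 60


def compress_archived(conn, now, max_age=THIRTY_DAYS):
    """Deflate bodies of rows older than max_age seconds; return rows changed."""
    cutoff = now - max_age
    rows = conn.execute(
        "SELECT docid, body FROM pages WHERE compressed = 0 AND visited < ?",
        (cutoff,),
    ).fetchall()
    for docid, body in rows:
        conn.execute(
            "UPDATE pages SET body = ?, compressed = 1 WHERE docid = ?",
            (zlib.compress(body.encode("utf-8")), docid),
        )
    conn.commit()
    return len(rows)


def read_body(conn, docid):
    """Return the body text, inflating it only if the row was archived."""
    row = conn.execute(
        "SELECT body, compressed FROM pages WHERE docid = ?", (docid,)
    ).fetchone()
    if row is None:
        return None
    body, compressed = row
    return zlib.decompress(body).decode("utf-8") if compressed else body


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    now = 1259000000  # arbitrary fixed timestamp for the demo
    conn.execute(
        "INSERT INTO pages (visited, body) VALUES (?, ?)",
        (now - 40 * 24 * 60 * 60, "old page text " * 200),
    )
    print("rows archived:", compress_archived(conn, now))
```

Since the history index in this thread is already split into per-month files, the same idea could instead be applied to whole files rather than rows; the per-row flag here is just the simplest form to show.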

