Re: [sqlite] SQLite and Columnar Databases

2007-12-15 Thread John Stanton

[EMAIL PROTECTED] wrote:

Joe Wilson <[EMAIL PROTECTED]> wrote:

--- [EMAIL PROTECTED] wrote:

Joe Wilson <[EMAIL PROTECTED]> wrote:
The reason why I asked is that I haven't had much luck with sqlite3 
performance for databases larger than the size of RAM on my machine
regardless of PRAGMA settings.

This is probably due to the cache locality problem.  We know how
to fix this, Joe.  Would you like to have a go at it?

I think setting aside contiguous pages in the file for exclusive use 
by each btree would help improve locality of reference on disk.


For example, let A, B and C represent in-use pages of 3 btrees and 
a, b and c represent free pages corresponding to the same btrees:


 AAaAAaAaaaBBbbBBBbbbCCcCCC

Is this what you had in mind in your post?



No.

The problem is when inserting into a large database that is
indexed, the values being indexed are randomly distributed.
So with each insert, SQLite has to seek to a new random
place in the file to insert the new index entry there.
It does not matter that pages of the index are not in
consecutive order.  What matters is that each insertion
is into a different place and that the places are randomly
distributed over a large file - larger than will fit in cache.
In this situation, each new entry probably needs to go on 
a page that is not in cache, and hence a real disk
seek and read is required (as opposed to simply reading
the value out of cache) when inserting each new entry.  Disk
seeks and reads are much, much slower than disk cache hits,
which really slows down insertion performance.

If you do many inserts such that the indexed values are
in sorted order, then most inserts will go on the same page
as the previous.  The previous page is already in cache
so it doesn't need to be read from disk.  It has also
already been journaled, so no excess writing is required.
Disk I/O is greatly reduced. Things go much, much faster.
The problem is that you rarely have the luxury of being
able to insert entries in sorted order.  And if you are
indexing multiple columns, it is impossible to sort
the entries on both columns at once.

The usual way to work around this problem is to only do random
inserts into indices which are small enough to fit into
your disk cache.  Suppose you start inserting into index A.
Once A gets too big to fit entirely in cache, stop inserting
into it and start inserting into B.  Once B gets to be the
cache size, merge A and B together into a new index C.
The merge operation requires reading A and B straight through
from beginning to end once.  This is a lot of disk I/O but
it is still much faster than jumping around within the file
reading bits here and there.  After creating C, reset A and
B back to being empty (since all records have been transferred
into C).  Start inserting into A again until it fills up.
Then fill up B again.  Merge A and B into D, then merge C and D
into E.  Reset A, B, C, and D.  Keep doing this, merging
smaller indices into larger indices, until you insert all
the records you need to insert.  Then make a single pass
through all of the smaller indices and merge them all
together into a single big index Z.  Z becomes the new
index used for search operations on the database.

The algorithm above should probably go into SQLite 
to construct an index as part of the CREATE INDEX
statement.  When populating a database with a large
amount of data, first put all of the data into an
unindexed table.  This will be very fast because each
new entry goes at the end of the table (and also at the
end of the file.)  After all data is in place, issue
the CREATE INDEX statements to index any fields you
think need indexing.  The CREATE INDEX statement has
to populate the index.  The current algorithm is to
scan through the original table and create and insert
index entries one by one as they are encountered.  I am
proposing that you substitute the algorithm outlined in
the previous paragraph in place of the current algorithm.

--
D. Richard Hipp <[EMAIL PROTECTED]>

I experimented with that problem of building B-Tree indices on tables 
and found that the following approach produced good performance on large 
tables even in a restricted memory environment: first build the table, 
then write a list of keys into a file, sort that file using an iterative 
combination of quicksorts and merges, and finally build the index in an 
optimal manner, since the space needed for non-leaf nodes can be 
calculated and reserved up front.
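
As a rough illustration of that space calculation (this is not SQLite code,
and the fanout is an assumed constant): once the keys are sorted, the number
of interior pages needed above the leaves follows directly from the leaf
count and the fanout, so the space can be reserved, and kept contiguous,
before any leaves are written.

  /* Rough sketch, not SQLite code: given the number of leaf pages and an
  ** assumed fanout (children per interior node), compute how many interior
  ** pages a bottom-up B-tree build will need, so that space can be reserved
  ** up front and kept contiguous. */
  static long interior_pages(long leaf_pages, int fanout){
    long total = 0;
    long level = leaf_pages;
    while( level > 1 ){
      level = (level + fanout - 1)/fanout;  /* parents needed for this level */
      total += level;
    }
    return total;  /* e.g. interior_pages(1000000, 100) == 10101 */
  }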


Using this approach the data table and temporary files are only ever 
accessed sequentially and the "locality of reference" situation is 
sidestepped.  If there are multiple disk drives available the 
intermediate files can be on another drive to further limit head movement.


Ensuring that interior nodes were contiguous seemed to be a winning 
strategy.  Only filling nodes to, say, 80% also delivered an index which 
would accept quite a lot of inserts before fragmenting with node splits, 
and implementing node merging if possible instead of a 

Re: [sqlite] SQLite and Columnar Databases

2007-12-15 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote:
> The problem is when inserting into a large database that is
> indexed, the values being indexed are randomly distributed.
> So with each insert, SQLite has to seek to a new random
> place in the file to insert the new index entry there.
> It does not matter that pages of the index are not in
> consecutive order.

For bulk INSERTs I see your point.

But for SELECT performance on the resulting database in a cold OS file 
cache situation (assuming you do not have the luxury of running VACUUM 
beforehand) or when the database file is many times the size of RAM, it 
does improve SELECT performance if the pages of each index are contiguous.
It would be nice if contiguous index pages could be a fortunate "by-product" 
of the bulk INSERT optimization algorithm for this reason.

>  What matters is that each insertion
> is into a different place and that the places are randomly
> distributed over a large file - larger than will fit in cache.
> In this situation, each new entry probably needs to go on 
> a page that is not in cache, and hence a real disk
> seek and read is required (as opposed to simply reading
> the value out of cache) when inserting each new entry.  Disk
> seeks and reads are much, much slower than disk cache hits,
> which really slows down insertion performance.
> 
> If you do many inserts such that the indexed values are
> in sorted order, then most inserts will go on the same page
> as the previous.  The previous page is already in cache
> so it doesn't need to be read from disk.  It has also
> already been journaled, so no excess writing is required.
> Disk I/O is greatly reduced. Things go much, much faster.
> The problem is that you rarely have the luxury of being
> able to insert entries in sorted order.  And if you are
> indexing multiple columns, it is impossible to sort
> the entries on both columns at once.
> 
> The usual way to work around this problem is to only do random
> inserts into indices which are small enough to fit into
> your disk cache.  Suppose you start inserting into index A.
> Once A gets too big to fit entirely in cache, stop inserting
> into it and start inserting into B.  Once B gets to be the
> cache size, merge A and B together into a new index C.
> The merge operation requires reading A and B straight through
> from beginning to end once.  This is a lot of disk I/O but
> it is still much faster than jumping around within the file
> reading bits here and there.  After creating C, reset A and
> B back to being empty (since all records have been transferred
> into C).  Start inserting into A again until it fills up.
> Then fill up B again.  Merge A and B into D, then merge C and D
> into E.  Reset A, B, C, and D.  Keep doing this, merging
> smaller indices into larger indices, until you insert all
> the records you need to insert.  Then make a single pass
> through all of the smaller indices and merge them all
> together into a single big index Z.  Z becomes the new
> index used for search operations on the database.
> 
> The algorithm above should probably go into SQLite 
> to construct an index as part of the CREATE INDEX
> statement.  When populating a database with a large
> amount of data, first put all of the data into an
> unindexed table.  This will be very fast because each
> new entry goes at the end of the table (and also at the
> end of the file.)  After all data is in place, issue
> the CREATE INDEX statements to index any fields you
> think need indexing.  The CREATE INDEX statement has
> to populate the index.  The current algorithm is to
> scan through the original table and create and insert
> index entries one by one as they are encountered.  I am
> proposing that you substitute the algorithm outlined in
> the previous paragraph in place of the current algorithm.

Are you assuming that you do not have any indexes already in place on 
the table being populated?

It would be nice if the optimization could work in the general case
where the table already has data and indexes (implicit or explicit) 
as in:

  create table foo (
    a UNIQUE,
    b UNIQUE,
    c,
    d,
    primary key(d, b, c)
  );

Or are you suggesting that you first determine whether you're running 
INSERTs within a larger transaction and perform the new table/index population 
algorithm for each INSERT and only combine the various indexes at the
end of the transaction or whenever a page threshold is exceeded?

Guessing the end of INSERTs into the same table could be tricky within a 
larger transaction. The end might be defined as the point where an SQL 
command other than an insert into the same table is run, as in:

  BEGIN;
  INSERT INTO foo VALUES ...;
-- detect that we've started inserting into table foo
-- and begin the bulk insert optimization
  INSERT INTO foo VALUES ...;
  ...a million other inserts into foo...
  INSERT INTO foo VALUES ...;
  INSERT INTO foo VALUES ...;
  INSERT INTO foo VALUES ...;
-- detect that we've stopped inserting into foo 

Re: [sqlite] SQLite and Columnar Databases

2007-12-15 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- [EMAIL PROTECTED] wrote:
> > Joe Wilson <[EMAIL PROTECTED]> wrote:
> > > The reason why I asked is that I haven't had much luck with sqlite3 
> > > performance for databases larger than the size of RAM on my machine
> > > regardless of PRAGMA settings.
> > 
> > This is probably due to the cache locality problem.  We know how
> > to fix this, Joe.  Would you like to have a go at it?
> 
> I think setting aside contiguous pages in the file for exclusive use 
> by each btree would help improve locality of reference on disk.
> 
> For example, let A, B and C represent in-use pages of 3 btrees and 
> a, b and c represent free pages corresponding to the same btrees:
> 
>  AAaAAaAaaaBBbbBBBbbbCCcCCC
> 
> Is this what you had in mind in your post?
> 

No.

The problem is when inserting into a large database that is
indexed, the values being indexed are randomly distributed.
So with each insert, SQLite has to seek to a new random
place in the file to insert the new index entry there.
It does not matter that pages of the index are not in
consecutive order.  What matters is that each insertion
is into a different place and that the places are randomly
distributed over a large file - larger than will fit in cache.
In this situation, each new entry probably needs to go on 
a page that is not in cache, and hence a real disk
seek and read is required (as opposed to simply reading
the value out of cache) when inserting each new entry.  Disk
seeks and reads are much, much slower than disk cache hits,
which really slows down insertion performance.

If you do many inserts such that the indexed values are
in sorted order, then most inserts will go on the same page
as the previous.  The previous page is already in cache
so it doesn't need to be read from disk.  It has also
already been journaled, so no excess writing is required.
Disk I/O is greatly reduced. Things go much, much faster.
The problem is that you rarely have the luxury of being
able to insert entries in sorted order.  And if you are
indexing multiple columns, it is impossible to sort
the entries on both columns at once.
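
One rough way to get that sorted-order behaviour when bulk loading against a
single index is to stage the rows and then copy them over ordered by the
indexed column.  A sketch against the C API follows; the table and column
names ("target", "staging", "indexed_col") are made up for illustration, and
the trick only buys you one sort order, which is exactly the multi-column
limitation mentioned above.

  #include <sqlite3.h>
  #include <stdio.h>

  /* Sketch only: copy staged rows into the indexed table ordered by the
  ** indexed column, so successive index insertions tend to land on the same
  ** already-cached, already-journaled page as the previous one.
  ** "target", "staging" and "indexed_col" are hypothetical names. */
  static int load_in_index_order(sqlite3 *db){
    char *zErr = 0;
    int rc = sqlite3_exec(db,
        "BEGIN;"
        "INSERT INTO target SELECT * FROM staging ORDER BY indexed_col;"
        "COMMIT;",
        0, 0, &zErr);
    if( rc!=SQLITE_OK ){
      fprintf(stderr, "bulk load failed: %s\n", zErr);
      sqlite3_free(zErr);
    }
    return rc;
  }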

The usual way to work around this problem is to only do random
inserts into indices which are small enough to fit into
your disk cache.  Suppose you start inserting into index A.
Once A gets too big to fit entirely in cache, stop inserting
into it and start inserting into B.  Once B gets to be the
cache size, merge A and B together into a new index C.
The merge operation requires reading A and B straight through
from beginning to end once.  This is a lot of disk I/O but
it is still much faster than jumping around within the file
reading bits here and there.  After creating C, reset A and
B back to being empty (since all records have been transferred
into C).  Start inserting into A again until it fills up.
Then fill up B again.  Merge A and B into D, then merge C and D
into E.  Reset A, B, C, and D.  Keep doing this, merging
smaller indices into larger indices, until you insert all
the records you need to insert.  Then make a single pass
through all of the smaller indices and merge them all
together into a single big index Z.  Z becomes the new
index used for search operations on the database.
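
The access pattern behind that scheme can be illustrated with a toy
external-merge sketch in plain C (nothing to do with SQLite's actual btree
code, and error handling is omitted): fill a RAM-sized buffer, spill it to a
file as a sorted run, and merge runs with purely sequential reads and writes.

  #include <stdio.h>
  #include <stdlib.h>

  /* Toy illustration of the staged-merge idea above, not SQLite code. */
  static int cmp_key(const void *a, const void *b){
    long x = *(const long*)a, y = *(const long*)b;
    return (x>y) - (x<y);
  }

  /* "Index A (or B) filled the cache": sort the in-memory buffer and
  ** spill it to a run file. */
  static void spill_run(long *buf, size_t n, const char *path){
    FILE *f = fopen(path, "wb");
    qsort(buf, n, sizeof(long), cmp_key);
    fwrite(buf, sizeof(long), n, f);
    fclose(f);
  }

  /* "Merge A and B into C": read both runs straight through once,
  ** writing one larger sorted run, all with sequential I/O. */
  static void merge_runs(const char *a, const char *b, const char *out){
    FILE *fa = fopen(a, "rb"), *fb = fopen(b, "rb"), *fo = fopen(out, "wb");
    long x, y;
    int hasx = fread(&x, sizeof x, 1, fa)==1;
    int hasy = fread(&y, sizeof y, 1, fb)==1;
    while( hasx || hasy ){
      if( hasx && (!hasy || x<=y) ){
        fwrite(&x, sizeof x, 1, fo);
        hasx = fread(&x, sizeof x, 1, fa)==1;
      }else{
        fwrite(&y, sizeof y, 1, fo);
        hasy = fread(&y, sizeof y, 1, fb)==1;
      }
    }
    fclose(fa); fclose(fb); fclose(fo);
  }

Repeating merge_runs over pairs of runs until one remains is the "merge
everything into Z" step; the final run can then be loaded into the real
index in sorted order.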

The algorithm above should probably go into SQLite 
to construct an index as part of the CREATE INDEX
statement.  When populating a database with a large
amount of data, first put all of the data into an
unindexed table.  This will be very fast because each
new entry goes at the end of the table (and also at the
end of the file.)  After all data is in place, issue
the CREATE INDEX statements to index any fields you
think need indexing.  The CREATE INDEX statement has
to populate the index.  The current algorithm is to
scan through the original table and create and insert
index entries one by one as they are encountered.  I am
proposing that you substitute the algorithm outlined in
the previous paragraph in place of the current algorithm.
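
In application terms, the load order described above looks like this (the
table, column and index names are illustrative only):

  #include <sqlite3.h>

  /* Populate the table with no indexes so each new row lands at the end of
  ** the table (and the file), then build the indexes once all the data is
  ** in place.  Names here are illustrative. */
  static int bulk_load_then_index(sqlite3 *db){
    return sqlite3_exec(db,
        "CREATE TABLE bulk(a, b, c);"
        "BEGIN;"
        /* ...the many INSERT INTO bulk VALUES(...) statements go here... */
        "COMMIT;"
        "CREATE INDEX bulk_a ON bulk(a);"
        "CREATE INDEX bulk_b ON bulk(b);",
        0, 0, 0);
  }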

--
D. Richard Hipp <[EMAIL PROTECTED]>






Re: [sqlite] SQLite and Columnar Databases

2007-12-15 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote:
> Joe Wilson <[EMAIL PROTECTED]> wrote:
> > The reason why I asked is that I haven't had much luck with sqlite3 
> > performance for databases larger than the size of RAM on my machine
> > regardless of PRAGMA settings.
> 
> This is probably due to the cache locality problem.  We know how
> to fix this, Joe.  Would you like to have a go at it?

I think setting aside contiguous pages in the file for exclusive use 
by each btree would help improve locality of reference on disk.

For example, let A, B and C represent in-use pages of 3 btrees and 
a, b and c represent free pages corresponding to the same btrees:

 AAaAAaAaaaBBbbBBBbbbCCcCCC

Is this what you had in mind in your post?

 http://www.mail-archive.com/sqlite-users@sqlite.org/msg27851.html

If not, could you elaborate?




  





Re: [sqlite] SQLite and Columnar Databases

2007-12-14 Thread Michael Scharf

This is probably due to the cache locality problem.  We know how
to fix this, Joe.  Would you like to have a go at it?


What is the fix to the "cache locality problem"?


Michael
--
http://MichaelScharf.blogspot.com/





RE: [sqlite] SQLite and Columnar Databases

2007-12-14 Thread Tom Briggs
 
> This model is completely removed from how the data is physically 
> stored, eg whether in rows first or in columns first, and the 
> physical store is determined just by the DBMS behind the scenes, and 
> hence is an implementation detail.  The DBMS can arrange how it likes 
> in order to satisfy the logical model in a way that performs well. 
> The users do not contort their schemas beyond what is logically clean 
> in order to gain performance; having a clean schema will let the DBMS 
> infer this automatically.

   This may be true in theory, but if it were really true in practice,
why would anyone bother building databases with different storage
models?  How a database does what it does impacts everything from where
you should use it to how you design your schema to how you write your
queries.

   And if you think that no one contorts their schemas to make their
queries run faster then, umm... Well, I don't think I need to say
anything else about that.

> According to some comments, Vertica (a column-store maker) is making 
> the same case that I am, which is just to have a logically clean 
> schema, and performance benefits will automatically follow from that.

   They say that because their system performs better when your schema
is simpler.  It's not just a philosophical belief that simpler is
better.  

> I will also note that a column-based store essentially works like a 
> heavily indexed row-based store, in which there is an index on every 
> key or every column, and so all searches, including those used to 
> perform joins, are effectively lookups in indexes. 

   Conceptually I agree, though I do not think you could build an
equally-effective system by indexing a row store.

> This is potentially slower for updates (or maybe not), but can be 
> faster for queries, depending on circumstances.

   But... But... How the data is stored is an implementation detail that
users don't need to worry about, right?  So who cares if updates are
slower?  The logical model is the same, right?

   Thank you for making my point. :)

   -T






Re: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote:
> The reason why I asked is that I haven't had much luck with sqlite3 
> performance for databases larger than the size of RAM on my machine
> regardless of PRAGMA settings.
> 

This is probably due to the cache locality problem.  We know how
to fix this, Joe.  Would you like to have a go at it?
--
D. Richard Hipp <[EMAIL PROTECTED]>





RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Darren Duncan

At 7:59 AM -0500 12/13/07, Tom Briggs wrote:

 > Something I will say about this, for people who don't know, is that
 > this columnar thing is strictly an implementation detail.  While


   I think that this is an oversimplification.  That's somewhat like
saying that the way you use a sledge hammer is no different than how you
use a claw hammer, because they're both hammers.  Anyone who tries to
hang a picture with a sledge hammer will be rather unhappy with your
advice.

   Though at some level how the data is stored is indeed an
implementation detail, to take full advantage of the fact that it is
requires re-thinking schema design and in some cases even query design.
See
http://www.fulltablescan.com/index.php?/archives/44-Compression-as-Normalization...-Kinda.html 
for more info.


If you read further into your own link, including the comments, 
you'll see my case also being made.


I don't see your claw hammer / sledge hammer argument being applicable here.

The relational model of data is all about letting users define the 
logical/possible structure of their data, with a focus on the meaning 
of the data relative to other data.  It is intended to make data 
manipulation and queries both easy to use and logically rigorous. 
The schemas that users design, including their so-called 
normalization traits, are just for reflecting the meaning of their 
data in a logical manner.


This model is completely removed from how the data is physically 
stored, eg whether in rows first or in columns first, and the 
physical store is determined just by the DBMS behind the scenes, and 
hence is an implementation detail.  The DBMS can arrange how it likes 
in order to satisfy the logical model in a way that performs well. 
The users do not contort their schemas beyond what is logically clean 
in order to gain performance; having a clean schema will let the DBMS 
infer this automatically.


According to some comments, Vertica (a column-store maker) is making 
the same case that I am, which is just to have a logically clean 
schema, and performance benefits will automatically follow from that.


I will also note that the highest level of schema normalization, 6th 
Normal Form, essentially puts every non-key column in its own table.


I will also note that a column-based store essentially works like a 
heavily indexed row-based store, in which there is an index on every 
key or every column, and so all searches, including those used to 
perform joins, are effectively lookups in indexes. 
This is potentially slower for updates (or maybe not), but can be 
faster for queries, depending on circumstances.


-- Darren Duncan




Re: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread drh
Dennis Cote <[EMAIL PROTECTED]> wrote:
> 
> Richard,
> 
> How was the current default page size of 1K determined? Was there any 
> testing to see if larger page sizes would be beneficial for general use? 
> Or is it just a historical circumstance (i.e. an arbitrarily chosen 
> value that has been maintained for backwards compatibility)?
> 
> I'm thinking the "usual" case may be changing as available storage gets 
> larger, and users start storing more and larger BLOBs  in the database.
> 

Experiments with version 2.0.0 showed that 1KiB was fastest
on Linux.

Since version 3.5.0, SQLite will sometimes choose a larger
default page size.  The size chosen is the largest of:

* 1024 bytes
* The "sector size" as reported by the VFS layer
* The largest size that can be written atomically

In the current implementations for win32 and unix, the
sector size is hard-coded to 512.  (This may need to change
in the future with newer 2048-byte sector disk drives.)
And both interfaces assume writes are never atomic.  So
the default still ends up being 1024.  But the mechanism
is in place to choose different page sizes automatically.
Some embedded device makers with custom VFS layers use
this mechanism already.
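
As a standalone sketch of that selection rule (this is not the actual pager
code; sector_size and atomic_write_size stand in for whatever the VFS layer
reports):

  /* Sketch of the rule above: take the largest of 1024 bytes, the reported
  ** sector size, and the largest unit the device claims it can write
  ** atomically, capped at the maximum page size. */
  static int default_page_size(int sector_size, int atomic_write_size){
    int pgsz = 1024;
    if( sector_size > pgsz ) pgsz = sector_size;
    if( atomic_write_size > pgsz ) pgsz = atomic_write_size;
    if( pgsz > 32768 ) pgsz = 32768;  /* cap at the maximum page size */
    return pgsz;
  }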

Note that a page is the minimum unit of I/O for SQLite.
So increasing the page size from 1024 to 4096 means that
4 times as many bytes have to be written in order to make
a 1-byte change to a single record.

--
D. Richard Hipp <[EMAIL PROTECTED]>






Re: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Dennis Cote

[EMAIL PROTECTED] wrote:


Whenever you start a new transaction, SQLite has to
allocate and clear a bitmap used to record which
pages have changes in the database file.  The size
of this bitmap is proportional to the size of the
database file.  The size of the bitmap is 256 bytes
per megabyte of database file assuming a 1KiB page
size.  That translates into about 52MB for a 200GiB
database.  This is doable, but probably not optimal.

Methods for reducing the memory requirements of this
bitmap (so that it is proportional to the size of the
change rather than the size of the database) have been
proposed, but we have not taken any steps toward
implementing them since the current approach works
well enough.  But as people start to push SQLite 
into the 100GiB size and beyond, we will likely revisit
that decision.

  


Richard,

How was the current default page size of 1K determined? Was there any 
testing to see if larger page sizes would be beneficial for general use? 
Or is it just a historical circumstance (i.e. an arbitrarily chosen 
value that has been maintained for backwards compatibility)?


I'm thinking the "usual" case may be changing as available storage gets 
larger, and users start storing more and larger BLOBs  in the database.


Dennis Cote





Re: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Dennis Cote

Samuel R. Neff wrote:

That also brings up the simple solution in that if you know your db is
going to be in the 200GiB range, declare a larger page size before creating
the DB.

  


My thoughts exactly.

Dennis Cote









RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Samuel R. Neff

That also brings up the simple solution in that if you know your db is
going to be in the 200GiB range, declare a larger page size before creating
the DB.
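
In other words, something like the following when the file is first created
(the pragma has to run before any content is written to the new database):

  #include <sqlite3.h>

  /* Sketch of the suggestion above: set the page size before anything is
  ** written to the new database file, i.e. before the first table exists. */
  static int create_big_db(const char *zPath){
    sqlite3 *db = 0;
    int rc = sqlite3_open(zPath, &db);
    if( rc==SQLITE_OK ){
      rc = sqlite3_exec(db,
          "PRAGMA page_size=8192;"       /* must precede the first table */
          "CREATE TABLE t(a, b, c);",
          0, 0, 0);
    }
    sqlite3_close(db);
    return rc;
  }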

Sam


---
We're Hiring! Seeking a passionate developer to join our team building Flex
based products. Position is in the Washington D.C. metro area. If interested
contact [EMAIL PROTECTED]
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, December 13, 2007 12:21 PM
To: sqlite-users@sqlite.org
Subject: Re: [sqlite] SQLite and Columnar Databases

Joe Wilson <[EMAIL PROTECTED]> wrote:

... The size of the bitmap is 256 bytes
per megabyte of database file assuming a 1KiB page
size.  That translates into about 52MB for a 200GiB
database.  ...

--
D. Richard Hipp <[EMAIL PROTECTED]>







RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Joe Wilson
The reason why I asked is that I haven't had much luck with sqlite3 
performance for databases larger than the size of RAM on my machine
regardless of PRAGMA settings.

Maybe querying speed is fine for multi-gigabyte database files, but 
INSERT speed into tables with multiple indexes is slow when you exceed 
the RAM threshold, so I did not pursue it.

I guess the answer is to get more RAM - it's cheap.
...and use a 64 bit OS.

High speed non-volatile RAM drives will be commonplace soon enough 
and alleviate this problem.

--- Tom Briggs <[EMAIL PROTECTED]> wrote:
>Heh, no, I've never tried, but I don't see much reason why I
> couldn't.  I was just trying to make the point that labeling SQLite as
> "good ... for smaller databases" was not a slight. 
> 
>-T
> 
> > -Original Message-
> > From: Joe Wilson [mailto:[EMAIL PROTECTED] 
> > Sent: Thursday, December 13, 2007 11:51 AM
> > To: sqlite-users@sqlite.org
> > Subject: RE: [sqlite] SQLite and Columnar Databases
> > 
> > --- Tom Briggs <[EMAIL PROTECTED]> wrote:
> > >For clarity, my definition of small is about 200GB, so I'm not
> > > selling SQLite short here... 
> > 
> > Are you able to get decent performance out of sqlite3 for a 
> > 200GB database?
> > 
> > How much RAM do you have on such a machine?




  






Re: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote:
> --- Tom Briggs <[EMAIL PROTECTED]> wrote:
> >For clarity, my definition of small is about 200GB, so I'm not
> > selling SQLite short here... 
> 
> Are you able to get decent performance out of sqlite3 for a 200GB database?
> 
> How much RAM do you have on such a machine?
> 

Whenever you start a new transaction, SQLite has to
allocate and clear a bitmap used to record which
pages have changes in the database file.  The size
of this bitmap is proportional to the size of the
database file.  The size of the bitmap is 256 bytes
per megabyte of database file assuming a 1KiB page
size.  That translates into about 52MB for a 200GiB
database.  This is doable, but probably not optimal.
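
Spelled out, the arithmetic behind those figures:

  #include <stdio.h>

  /* 256 bytes of bitmap per megabyte of database file (at a 1KiB page
  ** size), applied to a 200GiB file. */
  int main(void){
    long long dbBytes  = 200LL*1024*1024*1024;   /* 200 GiB           */
    long long dbMB     = dbBytes/(1024*1024);    /* 204800 megabytes  */
    long long bmpBytes = dbMB*256;               /* 52,428,800 bytes  */
    printf("bitmap: about %lld MB\n", bmpBytes/1000000);  /* ~52 MB */
    return 0;
  }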

Methods for reducing the memory requirements of this
bitmap (so that it is proportional to the size of the
change rather than the size of the database) have been
proposed, but we have not taken any steps toward
implementing them since the current approach works
well enough.  But as people start to push SQLite 
into the 100GiB size and beyond, we will likely revisit
that decision.

--
D. Richard Hipp <[EMAIL PROTECTED]>






RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Tom Briggs

   Heh, no, I've never tried, but I don't see much reason why I
couldn't.  I was just trying to make the point that labeling SQLite as
"good ... for smaller databases" was not a slight. 

   -T

> -Original Message-
> From: Joe Wilson [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, December 13, 2007 11:51 AM
> To: sqlite-users@sqlite.org
> Subject: RE: [sqlite] SQLite and Columnar Databases
> 
> --- Tom Briggs <[EMAIL PROTECTED]> wrote:
> >For clarity, my definition of small is about 200GB, so I'm not
> > selling SQLite short here... 
> 
> Are you able to get decent performance out of sqlite3 for a 
> 200GB database?
> 
> How much RAM do you have on such a machine?
> 
> 
>   



RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Joe Wilson
--- Tom Briggs <[EMAIL PROTECTED]> wrote:
>For clarity, my definition of small is about 200GB, so I'm not
> selling SQLite short here... 

Are you able to get decent performance out of sqlite3 for a 200GB database?

How much RAM do you have on such a machine?


  





RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Tom Briggs

   Based on my experience with SQLite, it would be a huge undertaking to
re-work it to use column-oriented storage.  And I don't think it would
really fit with SQLite's goal, either; column oriented databases are
best suited to aggregate queries against large amounts of data, while
SQLite is best at transactional operations against smaller amounts of
data.

   For clarity, my definition of small is about 200GB, so I'm not
selling SQLite short here... Don't everyone get all upset at me. :)

   -T

> -Original Message-
> From: Yuvaraj Athur Raghuvir [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, December 12, 2007 6:12 PM
> To: sqlite-users@sqlite.org
> Subject: [sqlite] SQLite and Columnar Databases
> 
> Hello,
> 
> There seems to be a high interest in columnar databases recently.
> 
> Is there any plan of supporting data organization as a 
> columnar database in
> SQLite? What are the challenges here?
> 
> Regards,
> Yuva
> 




RE: [sqlite] SQLite and Columnar Databases

2007-12-13 Thread Tom Briggs
 

> Something I will say about this, for people who don't know, is that 
> this columnar thing is strictly an implementation detail.  While 

   I think that this is an oversimplification.  That's somewhat like
saying that the way you use a sledge hammer is no different than how you
use a claw hammer, because they're both hammers.  Anyone who tries to
hang a picture with a sledge hammer will be rather unhappy with your
advice.

   Though at some level how the data is stored is indeed an
implementation detail, to take full advantage of the fact that it is
requires re-thinking schema design and in some cases even query design.
See
http://www.fulltablescan.com/index.php?/archives/44-Compression-as-Normalization...-Kinda.html
for more info.

   -T




Re: [sqlite] SQLite and Columnar Databases

2007-12-12 Thread Darren Duncan

At 4:42 AM +0530 12/13/07, Yuvaraj Athur Raghuvir wrote:

Hello,
There seems to be a high interest in columnar databases recently.
Is there any plan of supporting data organization as a columnar database in
SQLite? What are the challenges here?
Regards,
Yuva


Something I will say about this, for people who don't know, is that 
this columnar thing is strictly an implementation detail.  While 
adopting it would probably mean a backwards-incompatible file format, 
the programmatic API and SQL et al. wouldn't have to change a bit and 
would remain compatible. -- Darren Duncan

