Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
That's what I'm testing right now... thanks a lot for the heads up...
b

On Nov 14, 2007 1:41 AM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
>
> > Then each column is added on the fly, using:
> >
> > ALTER TABLE table ADD COLUMN column REAL;
> >
> > The columns are then filled using UPDATE as described previously.
>
> I haven't investigated the storage layout of this, but from what the
> docs imply, I suspect making use of ADD COLUMN will result in SELECTs
> being as slow as UPDATEs, possibly worse.  When I said I expect the
> final queries to be faster, I was envisioning all of the columns
> created initially, just filled with zeros.  I suspect filling them
> with NULL initially would make update performance worse also.
>
> I'll let others confirm/deny this, and comment on the performance of
> ADD COLUMN in general.



Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:

> Then each column is added on the fly, using:
>
> ALTER TABLE table ADD COLUMN column REAL;
>
> The columns are then filled using UPDATE as described previously.

I haven't investigated the storage layout of this, but from what the
docs imply, I suspect making use of ADD COLUMN will result in SELECTs
being as slow as UPDATEs, possibly worse.  When I said I expect the
final queries to be faster, I was envisioning all of the columns
created initially, just filled with zeros.  I suspect filling them
with NULL initially would make update performance worse also.

I'll let others confirm/deny this, and comment on the performance of
ADD COLUMN in general.
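
A minimal sketch of that alternative -- creating every column up front with a
zero default instead of ALTER TABLE ADD COLUMN -- might look like this with
the C API (the sample1..sample2000 naming is taken from the thread; the table
name, id column, and error handling are illustrative only):

#include <stdio.h>
#include <string.h>
#include <sqlite3.h>

#define NCOLS 2000   /* number of sample columns, per the thread */

/* Build "CREATE TABLE alleleA (id INTEGER PRIMARY KEY,
 * sample1 REAL DEFAULT 0, ..., sample2000 REAL DEFAULT 0)" and run it.
 * Note: a table this wide may need SQLITE_MAX_COLUMN raised at compile time. */
int create_wide_table(sqlite3 *db)
{
    static char sql[64 * NCOLS + 128];
    char col[64], *err = NULL;
    int i;

    strcpy(sql, "CREATE TABLE alleleA (id INTEGER PRIMARY KEY");
    for (i = 1; i <= NCOLS; i++) {
        sprintf(col, ", sample%d REAL DEFAULT 0", i);
        strcat(sql, col);
    }
    strcat(sql, ");");

    if (sqlite3_exec(db, sql, NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "CREATE TABLE failed: %s\n", err);
        sqlite3_free(err);
        return 1;
    }
    return 0;
}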




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Trevor Talbot
On 11/13/07, Joe Wilson <[EMAIL PROTECTED]> wrote:

> If you use the idiom whereby each thread solely takes its tasks
> from a thread-safe work queue, you can have a clear separation of
> responsibilities and minimal or preferably no shared-state between
> threads. You get concurrency as a side effect of this simple hybrid
> event-passing/thread scheme.

That model is one Windows NT has explicit support for.  Its "I/O
Completion Port" is essentially a message queue that worker threads
can wait on for tasks.  In cooperation with the scheduler, it tries to
keep exactly as many threads as there are CPU cores active at any
given time, such as by waking a new thread when a busy one blocks for
I/O.
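
A bare-bones illustration of using a completion port purely as a work queue
(the way work is encoded in the completion key, and the start_pool helper, are
assumptions for the sketch, not anything SQLite-specific):

#include <windows.h>

static HANDLE port;   /* the shared work queue */

/* Workers block here; the completion key carries a pointer to the task. */
DWORD WINAPI worker(LPVOID unused)
{
    DWORD nbytes;
    ULONG_PTR key;
    LPOVERLAPPED ov;
    while (GetQueuedCompletionStatus(port, &nbytes, &key, &ov, INFINITE)) {
        /* process the task pointed to by 'key' here */
    }
    return 0;
}

void start_pool(int nworkers)
{
    int i;
    /* 0 = let the kernel keep one runnable thread per CPU */
    port = CreateIoCompletionPort(INVALID_HANDLE_VALUE, NULL, 0, 0);
    for (i = 0; i < nworkers; i++)
        CreateThread(NULL, 0, worker, NULL, 0, NULL);
}

/* Producers enqueue with: PostQueuedCompletionStatus(port, 0, (ULONG_PTR)task, NULL); */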

In the context of sqlite, though, I don't see much point to sharing a
single connection across threads.  I'd prefer to just dedicate a
message-based thread to the job.




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Joe Wilson
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> 
> On Nov 13, 2007, at 10:55 PM, Joe Wilson wrote:
> 
> > http://home.pacbell.net/ouster/threads.pdf
> 
> JO and I reach a similar conclusion but by different
> reasoning, I think.

I like this line:

  Should You Abandon Threads?
  * No: important for high-end servers (e.g. databases).

If you use the idiom whereby each thread solely takes its tasks 
from a thread-safe work queue, you can have a clear separation of 
responsibilities and minimal or preferably no shared-state between 
threads. You get concurrency as a side effect of this simple hybrid 
event-passing/thread scheme.

Thread madness lies in complex multi-layer mutexes and shared data.
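
A bare-bones sketch of that idiom in C with pthreads -- a mutex/condvar
protected queue that worker threads drain; the task structure and its run()
hook are illustrative, not from any particular library:

#include <pthread.h>
#include <stdlib.h>

struct task {                     /* one unit of work */
    struct task *next;
    void (*run)(void *arg);
    void *arg;
};

static struct task *head, *tail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  qwait = PTHREAD_COND_INITIALIZER;

/* Producer: append a task and wake one sleeping worker. */
void queue_push(struct task *t)
{
    t->next = NULL;
    pthread_mutex_lock(&qlock);
    if (tail) tail->next = t; else head = t;
    tail = t;
    pthread_cond_signal(&qwait);
    pthread_mutex_unlock(&qlock);
}

/* Worker: pop a task and run it; all per-task state travels inside the task,
 * so nothing is shared between threads except the queue itself. */
void *worker(void *unused)
{
    for (;;) {
        struct task *t;
        pthread_mutex_lock(&qlock);
        while (head == NULL)
            pthread_cond_wait(&qwait, &qlock);
        t = head;
        head = t->next;
        if (head == NULL) tail = NULL;
        pthread_mutex_unlock(&qlock);
        t->run(t->arg);
        free(t);
    }
    return NULL;
}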


  




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread D. Richard Hipp


On Nov 13, 2007, at 10:55 PM, Joe Wilson wrote:


http://home.pacbell.net/ouster/threads.pdf


JO and I reach a similar conclusion but by different
reasoning, I think.



--- Richard Klein <[EMAIL PROTECTED]> wrote:

[EMAIL PROTECTED] wrote:

What?  And encourage people to write multithreaded programs?
Not likely...


I've been meaning to ask ... When you say that multiple threads
are evil, do you mean "as opposed to multiple processes"?  Or
do you feel that multiprogramming in general is evil?

- Richard Klein




   





D. Richard Hipp
[EMAIL PROTECTED]







Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread D. Richard Hipp


On Nov 13, 2007, at 10:37 PM, Richard Klein wrote:


[EMAIL PROTECTED] wrote:

What?  And encourage people to write multithreaded programs?
Not likely...


I've been meaning to ask ... When you say that multiple threads
are evil, do you mean "as opposed to multiple processes"?  Or
do you feel that multiprogramming in general is evil?


Threads are (usually) fine as long as each thread has its own
address space that the other threads cannot mess with.  In other
words, I  have no issues with separate processes provided that
separate processes really are needed.  For example, it is often
a good idea to run your GUI in a separate process from your
compute engine so that long computations don't freeze the display.


D. Richard Hipp
[EMAIL PROTECTED]







Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Joe Wilson
http://home.pacbell.net/ouster/threads.pdf

--- Richard Klein <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > What?  And encourage people to write multithreaded programs?
> > Not likely...
> 
> I've been meaning to ask ... When you say that multiple threads
> are evil, do you mean "as opposed to multiple processes"?  Or
> do you feel that multiprogramming in general is evil?
> 
> - Richard Klein



  




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Richard Klein

[EMAIL PROTECTED] wrote:

What?  And encourage people to write multithreaded programs?
Not likely...


I've been meaning to ask ... When you say that multiple threads
are evil, do you mean "as opposed to multiple processes"?  Or
do you feel that multiprogramming in general is evil?

- Richard Klein



Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote:
> Joe Wilson <[EMAIL PROTECTED]> wrote:
> > You might mention the library is multi-thread safe in the Features 
> > section of http://www.sqlite.org/about.html
> 
> What?  And encourage people to write multithreaded programs?
> Not likely...

Good luck with that quest.

People are going to want to know it whether you want to encourage it
or not. 

How about mentioning it on the http://www.sqlite.org/compile.html page,
at least?

> > Do you have a page that describes all the SQLITE_OMIT_* ifdefs 
> > and compile options?
> > 
> 
> http://www.sqlite.org/compile.html  
> Likely it needs to be updated.

Is there a link to it from the Features section of the About page?
Maybe with a comment like "Highly customizable" or something.



  




Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
It appears my storage bandwidth is only awesome when I have just 10
columns (as my toy example did).

My script creates the table and adds values to the primary key field.

Then each column is added on the fly, using:

ALTER TABLE table ADD COLUMN column REAL;

The columns are then filled using UPDATE as described previously.

Maybe you guys don't care and I apologize in case the contents of this
message are not of interest, but the following link demonstrates the
time needed to add one extra column of 1M reals on our system here:

http://www.biostat.jhsph.edu/~bcarvalh/sqlite.png

I totally understand that the amount of time should increase, but I
found it striking to see the jump when I add the 111th column.

I'll try to process and add a few columns at a time as suggested.

Thank you very much,

Kindest regards,

--benilton

On Nov 13, 2007 9:29 PM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
>
> > I have "fixed" my problem, working late night and not sleeping usually
> > causes that: I missed a WHERE on the UPDATE statement, so I could
> > modify multiple rows at a time. Basically what I did was create a
> > primary key (integer), which is the "row number of my matrix" and run
> > the UPDATE like:
> >
> > UPDATE table SET column = <value> WHERE id = <row id>;
> >
> > shame on me... and I apologize for missing such a basic thing...
>
> No worries, we all miss that one at one time or another.
>
> > But, anyways, the first INSERT takes 4secs for all 1M rows.. the
> > updates take 9secs, each.
>
> I'm impressed with your storage bandwidth.  If you don't need the
> database to be persistent (you restart the whole process in case of
> failure), turning the synchronous pragma off may help your I/O times a
> bit.  SQLite won't try so hard to keep the data safe from
> interruptions that way.
>
> > I'm in the process of creating the whole db now and will give it a
> > try... maybe creation is the worse part, as I'm going to be accessing
> > contiguous sets of rows (all columns) at a time... something like:
> >
> > SELECT * FROM table LIMIT 1 OFFSET 2;
>
> If possible, I'd recommend a simple "SELECT * FROM table;" and only
> fetch 1 at a time.  I don't know what the RSQLite interface is
> like though.  (The reasoning is that OFFSET must look at all the rows
> prior, so each query will take longer than the previous.)  But yes, I
> would expect this to work much faster than the updates, as it should
> be very nearly contiguous on disk.
>
> My only other suggestion at the moment would be to buffer and batch
> updates if possible.  Process a few columns at a time, and update them
> all at once like:
>
> UPDATE table SET column1 = <value1>, column2 = <value2>, ... WHERE id = <row id>;
>
> Each update should take about the same amount of time (9s), but you'll
> get more done each time.
>
>



Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread drh
Joe Wilson <[EMAIL PROTECTED]> wrote:
> You might mention the library is multi-thread safe in the Features 
> section of http://www.sqlite.org/about.html

What?  And encourage people to write multithreaded programs?
Not likely...

> 
> Do you have a page that describes all the SQLITE_OMIT_* ifdefs 
> and compile options?
> 

http://www.sqlite.org/compile.html  
Likely it needs to be updated.

--
D. Richard Hipp <[EMAIL PROTECTED]>





Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Joe Wilson
You might mention the library is multi-thread safe in the Features 
section of http://www.sqlite.org/about.html

Do you have a page that describes all the SQLITE_OMIT_* ifdefs 
and compile options?



  




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Joe Wilson
--- [EMAIL PROTECTED] wrote:
>   *  Somebody please suggest a better tag line - 
>  something better than "The World's Most Widely
>  Used SQL Database".

New site looks good. The tag line is perfect.

Some ideas:

- Reduce the size of the logo and fonts everywhere

- Allow clicking on the logo to go to the home page

- This line is not necessary: "There are no known issues 
  affecting database integrity or correctness."
  It implies past versions had problems - which is true of
  all software ever made.

- remove  dots

- How much are you attached to the color cyan?
  Might you consider an alternate color scheme?



  




Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:

> I have "fixed" my problem, working late night and not sleeping usually
> causes that: I missed a WHERE on the UPDATE statement, so I could
> modify multiple rows at a time. Basically what I did was create a
> primary key (integer), which is the "row number of my matrix" and run
> the UPDATE like:
>
> UPDATE table SET column = <value> WHERE id = <row id>;
>
> shame on me... and I apologize for missing such a basic thing...

No worries, we all miss that one at one time or another.

> But, anyways, the first INSERT takes 4secs for all 1M rows.. the
> updates take 9secs, each.

I'm impressed with your storage bandwidth.  If you don't need the
database to be persistent (you restart the whole process in case of
failure), turning the synchronous pragma off may help your I/O times a
bit.  SQLite won't try so hard to keep the data safe from
interruptions that way.
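
For example, a one-line sketch (only appropriate when the database can be
rebuilt from scratch after a crash):

    /* trade durability for speed during the bulk load; illustrative only */
    sqlite3_exec(db, "PRAGMA synchronous = OFF;", NULL, NULL, NULL);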

> I'm in the process of creating the whole db now and will give it a
> try... maybe creation is the worse part, as I'm going to be accessing
> contiguous sets of rows (all columns) at a time... something like:
>
> SELECT * FROM table LIMIT 1 OFFSET 2;

If possible, I'd recommend a simple "SELECT * FROM table;" and only
fetch 1 at a time.  I don't know what the RSQLite interface is
like though.  (The reasoning is that OFFSET must look at all the rows
prior, so each query will take longer than the previous.)  But yes, I
would expect this to work much faster than the updates, as it should
be very nearly contiguous on disk.

My only other suggestion at the moment would be to buffer and batch
updates if possible.  Process a few columns at a time, and update them
all at once like:

UPDATE table SET column1 = <value1>, column2 = <value2>, ... WHERE id = <row id>;

Each update should take about the same amount of time (9s), but you'll
get more done each time.
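
A rough sketch of that batching with the C API -- one prepared statement
reused for every row inside a single transaction (table/column names follow
the thread; the value arrays, the 1-based id assumption, and the missing
error handling are illustrative only):

#include <sqlite3.h>

int update_batch(sqlite3 *db, const double *c1, const double *c2,
                 const double *c3, long nrows)
{
    sqlite3_stmt *stmt = NULL;
    long i;

    sqlite3_exec(db, "BEGIN;", NULL, NULL, NULL);
    sqlite3_prepare_v2(db,
        "UPDATE alleleA SET sample2 = ?1, sample3 = ?2, sample4 = ?3 "
        "WHERE id = ?4;", -1, &stmt, NULL);

    for (i = 0; i < nrows; i++) {
        sqlite3_bind_double(stmt, 1, c1[i]);
        sqlite3_bind_double(stmt, 2, c2[i]);
        sqlite3_bind_double(stmt, 3, c3[i]);
        sqlite3_bind_int64(stmt, 4, (sqlite3_int64)(i + 1)); /* id assumed 1..nrows */
        if (sqlite3_step(stmt) != SQLITE_DONE) break;
        sqlite3_reset(stmt);
    }

    sqlite3_finalize(stmt);
    return sqlite3_exec(db, "COMMIT;", NULL, NULL, NULL);
}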




RE: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Samuel R. Neff

I think the "about" text misses some of what, to me, are the most important
parts of SQLite

- in-process
- zero maintenance

Also as a .NET developer I would be put off by the "C-Library" reference.
SQLite works very well in many languages regardless of the fact that it's
written in C.

I would propose something along these lines..


SQLite is a high-efficiency, in-process, transactional database engine that
supports the majority of SQL92 syntax, stores data in a single disk file,
and requires zero maintenance.


HTH,

Sam 


---
We're Hiring! Seeking a passionate developer to join our team building Flex
based products. Position is in the Washington D.C. metro area. If interested
contact [EMAIL PROTECTED]
 
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 13, 2007 8:40 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] Request for help with the SQLite Website

The new look for the SQLite website is now in place,
if you haven't already noticed:

http://www.sqlite.org/





Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Roger Binns

[EMAIL PROTECTED] wrote:
> I am still looking for suggestions, comments,
> and bug reports. 

There is no search on the front page or even anywhere else on the site
that I can find.  You don't even have to go to the hassle of adding your
own - Google has a widget that is trivial to add.  As an example that is
how the Apache site deals with their searching.

 http://www.google.com/coop/cse/

Roger




Re: [sqlite] Request for help with the SQLite Website

2007-11-13 Thread Rick Langschultz
This is a big improvement to the website. Keep up the great work on  
SQLite. Looking forward to 4.0

On Nov 13, 2007, at 7:40 PM, [EMAIL PROTECTED] wrote:


The new look for the SQLite website is now in place,
if you haven't already noticed:

   http://www.sqlite.org/

Even though the new look is "in place" you should
understand this as a work in progress, not a done
deal.  I am still looking for suggestions, comments,
and bug reports.  I am particularly interested in
help in the following ways:

 *  Suggestions for something better to put on
the home page.

 *  Suggestions for better CSS for the CVSTrac
pages.  Example:  http://www.sqlite.org/cvstrac/timeline
If you can save off a copy of that page, adjust
the CSS to make it look better, then send me
your adjustments (or post them here) that would
be a *big* help.

 *  Somebody please suggest a better tag line -
something better than "The World's Most Widely
Used SQL Database".

 *  Suggest changes that will provide a search
bar in the upper right-hand corner.

Over the next couple of weeks we plan on adding
some additional pages to the site (mostly moving
over information currently in the wiki) and doing
additional reorganization and cleanup.  We also
plan to offer the documentation pages as a ZIP or
tarball download so that users can view them
offline.

The feedback from this mailing list has so far been
very helpful.  Please don't stop offering suggestions.

--
D. Richard Hipp <[EMAIL PROTECTED]>






Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
Unfortunately the matrix is not sparse, nor can the direction of
operations be changed.

The reason is that I have 2K samples, which are given in one file each
(2K binary files). Each of the files has roughly 7M numbers, which I
process by running some algorithms I have, reducing from 7M to 1M
elements.

The best scenario I see here is dumping these data somewhere before
proceeding to the next sample... This processing is done on a sample by
sample (column) basis. After the process is done, the second step
is done on a row by row basis (actually batches of rows, so reads are
minimized and there's still enough RAM to perform the computations I
need).

I have "fixed" my problem, working late night and not sleeping usually
causes that: I missed a WHERE on the UPDATE statement, so I could
modify multiple rows at a time. Basically what I did was create a
primary key (integer), which is the "row number of my matrix" and run
the UPDATE like:

UPDATE table SET column = <value> WHERE id = <row id>;

shame on me... and I apologize for missing such a basic thing...

But, anyways, the first INSERT takes 4secs for all 1M rows.. the
updates take 9secs, each.

I'm in the process of creating the whole db now and will give it a
try... maybe creation is the worse part, as I'm going to be accessing
contiguous sets of rows (all columns) at a time... something like:

SELECT * FROM table LIMIT 1 OFFSET 2;

Any recommendation/suggestion is welcome... I very much appreciate it...

Best,

Benilton

On Nov 13, 2007 6:31 PM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, John Stanton <[EMAIL PROTECTED]> wrote:
>
> > The point is that the more statements you have in each transaction the
> > better, up to a limit of perhaps a thousand or so.
>
> Actually, I get the impression that number of statements is not really
> the problem here.
>
>
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
>
> > I'm in a situation where I need to handle 6 matrices with 1M rows and
> > about 2K columns, ie each matrix takes roughly 15GB RAM.
>
> > My (very naive) idea was to use SQLite to store these matrices (via
> > RSQLite package). So I have the following:
> >
> > CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
> > sample2000 REAL);
> >
> > When I have the data for sample1, I use an INSERT statement, which
> > takes about 4secs.
>
> For all 1 million rows?
>
> > For all the other columns, I use an UPDATE statement, which is taking
> > hours (more than 8 now).
>
> Using one UPDATE per row, identifying each individual row with SQLite's rowid?
>
>
> So to verify: you're building a 1M x 2000 matrix by column, and
> intend to process it by row afterward.  Is there any way you can make
> the two operations use the same direction: either build by row or
> process by column?
>
> In physical layout terms, handling a matrix of that size in two
> directions is lousy in the efficiency department, as there's no room
> for effective caching.  If you consider a storage format that stores a
> row at a time sequentially, then the first pass simply writes out rows
> of 2K zero values, in order, which is fine.  In disk and OS terms, a
> file consists of blocks or pages, and it does I/O and caching in units
> of that size.  We'll say 4KB pages for the sake of argument.  So that
> first pass collects 4KB of data and writes it to the disk in one shot,
> which is about as efficient as you can get.
>
> When you decide to update a column at a time, then it's skipping
> around: modify a tiny value on the first page, skip 16000 bytes ahead
> (2000x 8byte REAL values) modify another tiny value, etc.  The I/O
> cost of this is huge, since you're reading and writing an entire page
> just to touch one tiny little value.  Since you never do anything else
> with that page, it gets evicted from cache pretty quickly, to make
> room for the following pages as you work through the file.
>
> That works out to roughly 4GB of raw I/O for each column updated.
> Repeat 2000 times.
>
> If you can't change how you do the operations, you'll have to find
> some storage structure that's more efficient.  There are still some
> things to try, but it may simply mean discarding SQLite (and most
> other relational databases) as inappropriate for the job.
>
>



[sqlite] Request for help with the SQLite Website

2007-11-13 Thread drh
The new look for the SQLite website is now in place,
if you haven't already noticed:

http://www.sqlite.org/

Even though the new look is "in place" you should
understand this as a work in progress, not a done
deal.  I am still looking for suggestions, comments,
and bug reports.  I am particularly interested in
help in the following ways:

  *  Suggestions for something better to put on
 the home page.  

  *  Suggestions for better CSS for the CVSTrac
 pages.  Example:  http://www.sqlite.org/cvstrac/timeline
 If you can save off a copy of that page, adjust
 the CSS to make it look better, then send me
 your adjustments (or post them here) that would
 be a *big* help.

  *  Somebody please suggest a better tag line - 
 something better than "The World's Most Widely
 Used SQL Database".

  *  Suggest changes that will provide a search
 bar in the upper right-hand corner.

Over the next couple of weeks we plan on adding
some additional pages to the site (mostly moving
over information currently in the wiki) and doing
additional reorganization and cleanup.  We also 
plan to offer the documentation pages as a ZIP or
tarball download so that users can view them
offline.

The feedback from this mailing list has so far been
very helpful.  Please don't stop offering suggestions.

--
D. Richard Hipp <[EMAIL PROTECTED]>






RE: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Griggs, Donald
Regarding: building a 1M row x 2000 column matrix

If by chance these are sparse matrices (i.e., the majority of the values
are empty) then conceivably you could store only the available values.
The schema might then be something like:

CREATE TABLE myTable ( 
 matrixRow   INTEGER,
 matrixCol   INTEGER,
 sample  REAL );

I imagine you'd want to create an index on matrixRow after import.
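
A small sketch of that index plus a per-row fetch with the C API (names follow
the schema above; error handling omitted, purely illustrative):

#include <sqlite3.h>

void fetch_matrix_row(sqlite3 *db, int row)
{
    sqlite3_stmt *stmt;

    sqlite3_exec(db, "CREATE INDEX IF NOT EXISTS idx_row "
                     "ON myTable(matrixRow);", NULL, NULL, NULL);

    sqlite3_prepare_v2(db, "SELECT matrixCol, sample FROM myTable "
                           "WHERE matrixRow = ?1 ORDER BY matrixCol;",
                       -1, &stmt, NULL);
    sqlite3_bind_int(stmt, 1, row);
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        int    col = sqlite3_column_int(stmt, 0);
        double val = sqlite3_column_double(stmt, 1);
        /* (col, val) pairs missing here are the implicit zeros of the matrix */
        (void)col; (void)val;
    }
    sqlite3_finalize(stmt);
}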

If the matrices are NOT sparse, then you'd need 2 thousand million
inserts per matrix (2 billion in USA parlance) and more disk space than
your original schema -- probably not fun.







Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Trevor Talbot
On 11/13/07, John Stanton <[EMAIL PROTECTED]> wrote:

> The point is that the more statements you have in each transaction the
> better, up to a limit of perhaps a thousand or so.

Actually, I get the impression that number of statements is not really
the problem here.


On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:

> I'm in a situation where I need to handle 6 matrices with 1M rows and
> about 2K columns, ie each matrix takes roughly 15GB RAM.

> My (very naive) idea was to use SQLite to store these matrices (via
> RSQLite package). So I have the following:
>
> CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
> sample2000 REAL);
>
> When I have the data for sample1, I use an INSERT statement, which
> takes about 4secs.

For all 1 million rows?

> For all the other columns, I use an UPDATE statement, which is taking
> hours (more than 8 now).

Using one UPDATE per row, identifying each individual row with SQLite's rowid?


So to verify: you're building a 1M x 2000 matrix by column, and
intend to process it by row afterward.  Is there any way you can make
the two operations use the same direction: either build by row or
process by column?

In physical layout terms, handling a matrix of that size in two
directions is lousy in the efficiency department, as there's no room
for effective caching.  If you consider a storage format that stores a
row at a time sequentially, then the first pass simply writes out rows
of 2K zero values, in order, which is fine.  In disk and OS terms, a
file consists of blocks or pages, and it does I/O and caching in units
of that size.  We'll say 4KB pages for the sake of argument.  So that
first pass collects 4KB of data and writes it to the disk in one shot,
which is about as efficient as you can get.

When you decide to update a column at a time, then it's skipping
around: modify a tiny value on the first page, skip 16000 bytes ahead
(2000x 8byte REAL values) modify another tiny value, etc.  The I/O
cost of this is huge, since you're reading and writing an entire page
just to touch one tiny little value.  Since you never do anything else
with that page, it gets evicted from cache pretty quickly, to make
room for the following pages as you work through the file.

That works out to roughly 4GB of raw I/O for each column updated (one
4KB page read and rewritten per row, times 1M rows).  Repeat 2000 times.

If you can't change how you do the operations, you'll have to find
some storage structure that's more efficient.  There are still some
things to try, but it may simply mean discarding SQLite (and most
other relational databases) as inappropriate for the job.




Re: [sqlite] One line batch file?

2007-11-13 Thread John

Owen Watson wrote:

I'd like to have a one line batch file:
sqlite3 test 'insert into testable values ('value1','value2')'


have you tried something like:

echo insert into testable values ('value1', 'value2'); | sqlite3 test

note ";" at end of statement and "|" pipe char in command line.
This should work in a Windows "cmd" console - not tested, unix/linux 
would be similar.




but the few variants of this I've tried don't work.

I've seen and understood the batch file that calls another text file
approach but I was wondering  if I could avoid this overhead for a
one-liner.






--
Regards
   John McMahon
  [EMAIL PROTECTED]





Re: [sqlite] BLOB data performance?

2007-11-13 Thread John Stanton
You might find the method used by Squid to manage its cache would be 
worth emulating.


Using TransmitFile on Windows or sendfile on Unix to despatch the file 
to the network is by far the most efficient way to pass on files from a 
cache.  It avoids a number of levels of buffer shadowing.
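
A rough illustration of the sendfile() variant on Linux (the connected socket
and file path are placeholders supplied by the caller; TransmitFile on Windows
follows the same idea):

#include <sys/sendfile.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

/* Push a cached file straight to a socket without copying it through
 * user-space buffers. */
int send_cached_file(int sock, const char *path)
{
    struct stat st;
    off_t off = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0 || fstat(fd, &st) < 0) {
        if (fd >= 0) close(fd);
        return -1;
    }
    while (off < st.st_size) {
        if (sendfile(sock, fd, &off, st.st_size - off) < 0)
            break;
    }
    close(fd);
    return off == st.st_size ? 0 : -1;
}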


Andreas Volz wrote:

Am Tue, 13 Nov 2007 07:18:19 -0600 schrieb John Stanton:

In a cache situation I would expect that keeping the binary data in 
files would be preferable because you can use far more efficient 
mechanisms for loading them into your cache and in particular in 
transmitting them downstream.  Your DB only needs to store a pathname.


Just be wary of directory size, and do not put them all in the one 
directory.


I noticed that problem in my current situation. I don't know the file
number and size limit in Linux or Windows, but I'm sure there is a
limit.

My main problem is to find a good algorithm to name the cached files
and split them into directories. My current idea is:

1) Put the URL into DB
2) Use a hash function to create a unique name for the cache file
3) Insert the hash name into the same row as the URL

The problem with many files in a directory:

4) Use e.g. 'modulo 11' on the URL hash value to get one of ten
directory names where to find a file.

But this has the drawback of a static number of cache directories.
The algorithm doesn't scale as the number of files grows.

Do you think this is a good way? Or do you have another idea?

regards
Andreas




Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread John Stanton
The point is that the more statements you have in each transaction the 
better, up to a limit of perhaps a thousand or so.


Benilton Carvalho wrote:

I'm using RSQLite functions like:

dbBeginTransaction(db2)
dbGetPreparedQuery(db2, sql, bind.data=df)
dbCommit(db2)

where db2 is the connection to the SQLite db, sql is either the INSERT
or UPDATE statements I mentioned and df is the R object (data.frame)
containing the data to be inserted/updated.

INSERT INTO alleleA (sample1) VALUES (:x);
UPDATE alleleA SET sample2 = :x;

where "x" is the column in the data.frame "df".

Is that type of thing you want to know? Sorry for any misunderstanding.

b

On Nov 13, 2007 9:26 AM, John Stanton <[EMAIL PROTECTED]> wrote:

How are you using transactions?


Benilton Carvalho wrote:

Hi Everyone,

I use R to create tools for analysis of microarrays
(http://www.bioconductor.org).

I'm in a situation where I need to handle 6 matrices with 1M rows and
about 2K columns, ie each matrix takes roughly 15GB RAM.

The procedure I'm working on can be divided in two parts:

1) I read an input file, from which I compute one column for each of
the matrices I mentioned above;

2) After the matrices are ready, all the computations I do can be
performed in batches of rows (say 10K rows at a time), so there's no
need to have all the matrices in memory at the same time.

My (very naive) idea was to use SQLite to store these matrices (via
RSQLite package). So I have the following:

CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
sample2000 REAL);

When I have the data for sample1, I use an INSERT statement, which
takes about 4secs.

For all the other columns, I use an UPDATE statement, which is taking
hours (more than 8 now).

What are the obvious things I'm missing here? Or do you have any other
suggestions in order to improve the performance?

Thank you very much,

b




Re: [sqlite] BLOB data performance?

2007-11-13 Thread Ken
I think your blob file performance may greatly depend upon the file system that 
is used and the workload.

I found this article:

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf


Andreas Volz <[EMAIL PROTECTED]> wrote: Am Tue, 13 Nov 2007 07:18:19 -0600 
schrieb John Stanton:

> In a cache situation I would expect that keeping the binary data in 
> files would be preferable because you can use far more efficient 
> mechanisms for loading them into your cache and in particular in 
> transmitting them downstream.  Your DB only needs to store a pathname.
> 
> Just be wary of directory size, and do not put them all in the one 
> directory.

I noticed that problem in my current situation. I don't know the file
number and size limit in Linux or Windows, but I'm sure there is a
limit.

My main problem is to find a good algorithm to name the cached files
and split them into directories. My current idea is:

1) Put the URL into DB
2) Use a hash function to create a unique name for the cache file
3) Insert the hash name into the same row as the URL

The problem with many files in a directory:

4) Use e.g. 'modulo 11' on the URL hash value to get one of ten
directory names where to find a file.

But this has the drawback of a static number of cache directories.
The algorithm doesn't scale as the number of files grows.

Do you think this is a good way? Or do you have another idea?

regards
Andreas





RE: [sqlite] BLOB data performance?

2007-11-13 Thread Griggs, Donald
Regarding:  
The problem with many files in a directory:

Another possible algorithm:
Simply name the blob using the ROWID, e.g. Image0783.png

In ancient days, a DOS directory of several hundred files might bog
things down, but you can put thousands into a modern O.S.'s directory if
necessary.  So even though you can't predict the exact size of the
customer's data, you can probably get the maximum order of magnitude --
and that's all you need.

For example,  you could create a two level directory structure of 100
directories as:
/0/0/
/0/1/
/0/2/
etc., up to /9/9

Then, a blob with ROWID of "783" could be stored in directory /8/3/.
(Alternately, if you want to allow for future three-level, four-level,
etc, you could 
 store it in /3/8/ with the first directory chosen for the LEAST
significant digit, etc.
and get better dispersion.)
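
A tiny sketch of that mapping (least significant digit first, as suggested;
the base directory and file-name format are just examples):

#include <stdio.h>

/* rowid 783 -> "cache/3/8/Image0000783.png" */
void blob_path(char *buf, size_t buflen, long long rowid)
{
    snprintf(buf, buflen, "cache/%lld/%lld/Image%07lld.png",
             rowid % 10, (rowid / 10) % 10, rowid);
}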






Re: [sqlite] BLOB data performance?

2007-11-13 Thread Andreas Volz
Am Tue, 13 Nov 2007 07:18:19 -0600 schrieb John Stanton:

> In a cache situation I would expect that keeping the binary data in 
> files would be preferable because you can use far more efficient 
> mechanisms for loading them into your cache and in particular in 
> transmitting them downstream.  Your DB only needs to store a pathname.
> 
> Just be wary of directory size, and do not put them all in the one 
> directory.

I noticed that problem in my current situation. I don't know the file
number and size limit in Linux or Windows, but I'm sure there is a
limit.

My main problem is to find a good algorithm to name the cached files
and split them into directories. My current idea is:

1) Put the URL into DB
2) Use a hash function to create a unique name for the cache file
3) Insert the hash name into the same row as the URL

The problem with many files in a directory:

4) Use e.g. 'modulo 11' on the URL hash value to get one of ten
directory names where to find a file.

But this has the drawback of a static number of cache directories.
The algorithm doesn't scale as the number of files grows.

Do you think this is a good way? Or do you have another idea?
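
For what it's worth, steps 2) and 4) above might look roughly like this in C
(djb2 is only one example hash; the ten-directory split and the path format
are illustrative):

#include <stdio.h>

/* djb2 string hash -- any reasonable hash of the URL would do */
unsigned long url_hash(const char *url)
{
    unsigned long h = 5381;
    while (*url)
        h = h * 33 + (unsigned char)*url++;
    return h;
}

/* e.g. "cache/<hash mod 10>/<hex hash>" as the cache file name */
void cache_path(char *buf, size_t buflen, const char *url)
{
    unsigned long h = url_hash(url);
    snprintf(buf, buflen, "cache/%lu/%016lx", h % 10, h);
}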

regards
Andreas




[sqlite] Foreign keys ?

2007-11-13 Thread Lothar Behrens

Hi,

I am new to the list and to Sqlite. Reading over the archive didn't 
help me much. It was only a 'flight' over the result searching for
'foreign'.


What I have captured is the ability to define foreign keys in the 
tables and with the help of Cody Pisto creating the constraints as 
triggers.


My question:

Is there an API function to get the foreign keys per table ?

Or must I use always the code from Cody to get the information ?

If there is no solution, is it possible to automate this by modifying 
the CREATE TABLE code to trigger the code of Cody and hold the 
information
in some 'system tables' (DROP TABLE could remove this information
again)?


Then the API could be extended by looking in these tables for foreign 
keys, which would be reasonably fast.


Thanks, Lothar

--
Lothar Behrens  |   Rapid Prototyping ...
Heinrich-Scheufelen-Platz 2 |   
73252 Lenningen |   www.lollisoft.de



RE: [sqlite] Data encryption

2007-11-13 Thread Wilson, Ron
After some googling, Gunter has this domain:  http://greschenz.de but it merely 
redirects to the dyndns url.  It was active as late as 10/13/07 though.  Maybe 
he shuts down his server at night?

Ron Wilson, Senior Engineer, MPR Associates, 518.831.7546

-Original Message-
From: A.J.Millan [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 13, 2007 2:58 AM
To: sqlite-users@sqlite.org
Subject: RE: [sqlite] Data encryption

Andreas:

Some time ago, in response to a similar question, Günter Greschenz sent to 
this forum an email:

>hi,

>i've written some sqlite-functions to crypt (blowfish) or compress (bzip) 
>data in sqlite:

>e.g.
>   insert into blubs values (crypt('data','pwd'))
>or
>   select from xyz where decompress(data) = 'blablabla'
>or
>   select from xyz where data = compress('blablabla')

>but you have to wait until next weekend, because i'v traveling for my 
>company at the moment and >return on friday (i hope :-)

Some day later:

>after a long time being on a business trip, i finally came home and have 
>now the chance to upload >the sources to my webserver:

>http://greschenz.dyndns.org/sqlite.html

>these sources have never been in a productive system, i just implemented 
>it for fun...

>what i want to say: i never tested it really good !

To me it was a wonderful source of information and ideas, but this morning 
the URL was unavailable.  Maybe if you recite a certain magic spell, Günter can 
appear again...

Cheers

A.J.Millan 





Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
I'm using RSQLite functions like:

dbBeginTransaction(db2)
dbGetPreparedQuery(db2, sql, bind.data=df)
dbCommit(db2)

where db2 is the connection to the SQLite db, sql is either the INSERT
or UPDATE statements I mentioned and df is the R object (data.frame)
containing the data to be inserted/updated.

INSERT INTO alleleA (sample1) VALUES (:x);
UPDATE alleleA SET sample2 = :x;

where "x" is the column in the data.frame "df".

Is that type of thing you want to know? Sorry for any misunderstanding.

b

On Nov 13, 2007 9:26 AM, John Stanton <[EMAIL PROTECTED]> wrote:
> How are you using transactions?
>
>
> Benilton Carvalho wrote:
> > Hi Everyone,
> >
> > I use R to create tools for analysis of microarrays
> > (http://www.bioconductor.org).
> >
> > I'm in a situation where I need to handle 6 matrices with 1M rows and
> > about 2K columns, ie each matrix takes roughly 15GB RAM.
> >
> > The procedure I'm working on can be divided in two parts:
> >
> > 1) I read an input file, from which I compute one column for each of
> > the matrices I mentioned above;
> >
> > 2) After the matrices are ready, all the computations I do can be
> > performed in batches of rows (say 10K rows at a time), so there's no
> > need to have all the matrices in memory at the same time.
> >
> > My (very naive) idea was to use SQLite to store these matrices (via
> > RSQLite package). So I have the following:
> >
> > CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
> > sample2000 REAL);
> >
> > When I have the data for sample1, I use an INSERT statement, which
> > takes about 4secs.
> >
> > For all the other columns, I use an UPDATE statement, which is taking
> > hours (more than 8 now).
> >
> > What are the obvious things I'm missing here? Or do you have any other
> > suggestions in order to improve the performance?
> >
> > Thank you very much,
> >
> > b
> >



Re: [sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread John Stanton

How are you using transactions?

Benilton Carvalho wrote:

Hi Everyone,

I use R to create tools for analysis of microarrays
(http://www.bioconductor.org).

I'm in a situation where I need to handle 6 matrices with 1M rows and
about 2K columns, ie each matrix takes roughly 15GB RAM.

The procedure I'm working on can be divided in two parts:

1) I read an input file, from which I compute one column for each of
the matrices I mentioned above;

2) After the matrices are ready, all the computations I do can be
performed in batches of rows (say 10K rows at a time), so there's no
need to have all the matrices in memory at the same time.

My (very naive) idea was to use SQLite to store these matrices (via
RSQLite package). So I have the following:

CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
sample2000 REAL);

When I have the data for sample1, I use an INSERT statement, which
takes about 4secs.

For all the other columns, I use an UPDATE statement, which is taking
hours (more than 8 now).

What are the obvious things I'm missing here? Or do you have any other
suggestions in order to improve the performance?

Thank you very much,

b




[sqlite] suggestion on improving performance on UPDATE

2007-11-13 Thread Benilton Carvalho
Hi Everyone,

I use R to create tools for analysis of microarrays
(http://www.bioconductor.org).

I'm in a situation where I need to handle 6 matrices with 1M rows and
about 2K columns, ie each matrix takes roughly 15GB RAM.

The procedure I'm working on can be divided in two parts:

1) I read an input file, from which I compute one column for each of
the matrices I mentioned above;

2) After the matrices are ready, all the computations I do can be
performed in batches of rows (say 10K rows at a time), so there's no
need to have all the matrices in memory at the same time.

My (very naive) idea was to use SQLite to store these matrices (via
RSQLite package). So I have the following:

CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ...,
sample2000 REAL);

When I have the data for sample1, I use an INSERT statement, which
takes about 4secs.

For all the other columns, I use an UPDATE statement, which is taking
hours (more than 8 now).

What are the obvious things I'm missing here? Or do you have any other
suggestions in order to improve the performance?

Thank you very much,

b




Re: [sqlite] BLOB data performance?

2007-11-13 Thread John Stanton
In a cache situation I would expect that keeping the binary data in 
files would be preferable because you can use far more efficient 
mechanisms for loading them into your cache and in particular in 
transmitting them downstream.  Your DB only needs to store a pathname.


Just be wary of directory size, and do not put them all in the one 
directory.


Andreas Volz wrote:

Hello,

I'll tell you my current situation. I implemented a web cache function
for images and other data in my application. In the past I saved the
data on the hard disk with a special name and had a text file with the
relation "cache file name <-> url". But I didn't like it. Now I like to
evaluate sqlite as solution.

So my question is about the binary data. Is it better to insert the
images and other media data (e.g. videos with < 10 MB of size) into the
DB or only a "pointer" to a file laying around on my hard disk? I would
estimate a maximum DB size of several hundred MB.

How good/bad is reading/writing this data into a BLOB compared to write
it as file beside the DB and write only a small name into the DB? Where
is the difference between both ways regarding memory and CPU usage?

BTW: My current use case writes data slow, but reads data fast. Reading
BLOB's must be as fast as reading on the hard disk.

regards
Andreas




[sqlite] Version 3.5.2 text columns problem

2007-11-13 Thread Jacek Oleksy

Hi All,
I have following problem:

I'm using VC++6.0.
I want to perform a select query on a single database table.
What I do is:
1. prepare the statement using sqlite3_prepare_v2
2. step through the results using sqlite3_step
No other process or thread is accessing the database at the time.
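
For reference, the loop being described is essentially this (a minimal sketch;
the SQL text is a placeholder, and the real query selects several columns
including text ones):

#include <sqlite3.h>

int run_select(sqlite3 *db)
{
    sqlite3_stmt *stmt = NULL;
    int rc = sqlite3_prepare_v2(db, "SELECT id, name FROM mytable;",
                                -1, &stmt, NULL);
    if (rc != SQLITE_OK) return rc;

    while ((rc = sqlite3_step(stmt)) == SQLITE_ROW) {
        /* read values with sqlite3_column_*() here */
    }
    /* rc should be SQLITE_DONE on success; SQLITE_ERROR is the failure seen above */
    sqlite3_finalize(stmt);
    return rc;
}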

Everything works fine for version 3.4.2.
Now, I want to move to v3.5.2.
I perform the same steps, but the sqlite3_step fails with SQLITE_ERROR.

What is interesting: the problem occurs only for a few records in the
table. The row is always the same; after deleting the row, the problem
arises after e.g. 100 rows.
This happens when my select includes some text columns, but it has 
nothing to do with the data inside (it fails even for a text column 
which is null for all rows in select).


When I try to select only one of the "problematic" rows, sqlite3_step 
fails immediately. If I select multiple rows, sqlite3_step fails when it 
reaches one of these "bad" rows (e.g. ~1000 rows are fetched and then 
SQLITE_ERROR is returned).


What is even more interesting: this happens only in release build, in 
debug everything is ok. I tried to turn optimization off, but that did 
not help.


As I said, everything was fine for 3.4.2.
Does anyone have a clue what's wrong here?

Thanks in advance,
Jacek


