Re: [sqlite] suggestion on improving performance on UPDATE
That's what I'm testing right now... thanks a lot for the heads up...

b

On Nov 14, 2007 1:41 AM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> > Then each column is added on the fly, using:
> >
> > ALTER TABLE table ADD COLUMN column REAL;
> >
> > The columns are then filled using UPDATE as described previously.
>
> I haven't investigated the storage layout of this, but from what the
> docs imply, I suspect making use of ADD COLUMN will result in SELECTs
> being as slow as UPDATEs, possibly worse. When I said I expect the
> final queries to be faster, I was envisioning all of the columns
> created initially, just filled with zeros. I suspect filling them
> with NULL initially would make update performance worse also.
>
> I'll let others confirm/deny this, and comment on the performance of
> ADD COLUMN in general.

-
To unsubscribe, send email to [EMAIL PROTECTED]
-
Re: [sqlite] suggestion on improving performance on UPDATE
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> Then each column is added on the fly, using:
>
> ALTER TABLE table ADD COLUMN column REAL;
>
> The columns are then filled using UPDATE as described previously.

I haven't investigated the storage layout of this, but from what the docs imply, I suspect making use of ADD COLUMN will result in SELECTs being as slow as UPDATEs, possibly worse. When I said I expect the final queries to be faster, I was envisioning all of the columns created initially, just filled with zeros. I suspect filling them with NULL initially would make update performance worse also.

I'll let others confirm/deny this, and comment on the performance of ADD COLUMN in general.
Re: [sqlite] Request for help with the SQLite Website
On 11/13/07, Joe Wilson <[EMAIL PROTECTED]> wrote:
> If you use the idiom whereby each thread solely takes its tasks
> from a thread-safe work queue, you can have a clear separation of
> responsibilities and minimal or preferably no shared state between
> threads. You get concurrency as a side effect of this simple hybrid
> event-passing/thread scheme.

That model is one Windows NT has explicit support for. Its "I/O Completion Port" is essentially a message queue that worker threads can wait on for tasks. In cooperation with the scheduler, it tries to keep exactly as many threads as there are CPU cores active at any given time, such as by waking a new thread when a busy one blocks for I/O.

In the context of SQLite, though, I don't see much point in sharing a single connection across threads. I'd prefer to just dedicate a message-based thread to the job.
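The work-queue idiom described above can be sketched in a few lines of Python. This is a minimal illustration of workers pulling tasks from a thread-safe queue with no shared mutable state, not the NT completion-port API; all names and the squaring "work" are made up:

```python
import queue
import threading

def worker(tasks: "queue.Queue", results: "queue.Queue") -> None:
    # Each worker only touches what it pulls off the queue.
    while True:
        item = tasks.get()
        if item is None:          # sentinel: shut this worker down
            tasks.task_done()
            break
        results.put(item * item)  # stand-in for real work
        tasks.task_done()

tasks: "queue.Queue" = queue.Queue()
results: "queue.Queue" = queue.Queue()
threads = [threading.Thread(target=worker, args=(tasks, results)) for _ in range(4)]
for t in threads:
    t.start()
for n in range(10):
    tasks.put(n)
for _ in threads:
    tasks.put(None)               # one sentinel per worker
for t in threads:
    t.join()

squares = sorted(results.get() for _ in range(10))
print(squares)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```

The concurrency falls out of the queue itself: no mutexes beyond the ones the queue hides internally.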
Re: [sqlite] Request for help with the SQLite Website
--- "D. Richard Hipp" <[EMAIL PROTECTED]> wrote:
> On Nov 13, 2007, at 10:55 PM, Joe Wilson wrote:
> > http://home.pacbell.net/ouster/threads.pdf
>
> JO and I reach a similar conclusion but by different
> reasoning, I think.

I like this line:

  Should You Abandon Threads?
  * No: important for high-end servers (e.g. databases).

If you use the idiom whereby each thread solely takes its tasks from a thread-safe work queue, you can have a clear separation of responsibilities and minimal or preferably no shared state between threads. You get concurrency as a side effect of this simple hybrid event-passing/thread scheme. Thread madness lies in complex multi-layer mutexes and shared data.
Re: [sqlite] Request for help with the SQLite Website
On Nov 13, 2007, at 10:55 PM, Joe Wilson wrote:
> http://home.pacbell.net/ouster/threads.pdf

JO and I reach a similar conclusion but by different reasoning, I think.

> --- Richard Klein <[EMAIL PROTECTED]> wrote:
> > [EMAIL PROTECTED] wrote:
> > > What? And encourage people to write multithreaded programs?
> > > Not likely...
> >
> > I've been meaning to ask ... When you say that multiple threads
> > are evil, do you mean "as opposed to multiple processes"? Or
> > do you feel that multiprogramming in general is evil?
> >
> > - Richard Klein

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Request for help with the SQLite Website
On Nov 13, 2007, at 10:37 PM, Richard Klein wrote:
> [EMAIL PROTECTED] wrote:
> > What? And encourage people to write multithreaded programs?
> > Not likely...
>
> I've been meaning to ask ... When you say that multiple threads
> are evil, do you mean "as opposed to multiple processes"? Or
> do you feel that multiprogramming in general is evil?

Threads are (usually) fine as long as each thread has its own address space that the other threads cannot mess with. In other words, I have no issues with separate processes, provided that separate processes really are needed. For example, it is often a good idea to run your GUI in a separate process from your compute engine so that long computations don't freeze the display.

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Request for help with the SQLite Website
http://home.pacbell.net/ouster/threads.pdf

--- Richard Klein <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > What? And encourage people to write multithreaded programs?
> > Not likely...
>
> I've been meaning to ask ... When you say that multiple threads
> are evil, do you mean "as opposed to multiple processes"? Or
> do you feel that multiprogramming in general is evil?
>
> - Richard Klein
Re: [sqlite] Request for help with the SQLite Website
[EMAIL PROTECTED] wrote:
> What? And encourage people to write multithreaded programs?
> Not likely...

I've been meaning to ask ... When you say that multiple threads are evil, do you mean "as opposed to multiple processes"? Or do you feel that multiprogramming in general is evil?

- Richard Klein
Re: [sqlite] Request for help with the SQLite Website
--- [EMAIL PROTECTED] wrote:
> Joe Wilson <[EMAIL PROTECTED]> wrote:
> > You might mention the library is multi-thread safe in the Features
> > section of http://www.sqlite.org/about.html
>
> What? And encourage people to write multithreaded programs?
> Not likely...

Good luck with that quest. People are going to want to know it whether you want to encourage it or not. How about mentioning it on the http://www.sqlite.org/compile.html page, at least?

> > Do you have a page that describes all the SQLITE_OMIT_* ifdefs
> > and compile options?
>
> http://www.sqlite.org/compile.html
> Likely it needs to be updated.

Is there a link to it from the Features section of the About page? Maybe with a comment like "Highly customizable" or something.
Re: [sqlite] suggestion on improving performance on UPDATE
It appears my storage bandwidth is awesome only if I have 10 columns (as my toy example had). My script creates the table and adds values to the primary key field. Then each column is added on the fly, using:

ALTER TABLE table ADD COLUMN column REAL;

The columns are then filled using UPDATE as described previously.

Maybe you guys don't care, and I apologize in case the contents of this message are not of interest, but the following link demonstrates the time needed to add one extra column of 1M reals on our system here:

http://www.biostat.jhsph.edu/~bcarvalh/sqlite.png

I totally understand that the amount of time should increase, but I found it striking to see the jump when I add the 111th column. I'll try to process and add a few columns at a time as suggested.

Thank you very much,

Kindest regards,

--benilton

On Nov 13, 2007 9:29 PM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> > I have "fixed" my problem; working late at night and not sleeping
> > usually causes that: I missed a WHERE on the UPDATE statement, so I
> > could modify multiple rows at a time. Basically what I did was create
> > a primary key (integer), which is the "row number of my matrix", and
> > run the UPDATE like:
> >
> > UPDATE table SET column = WHERE id = ;
> >
> > shame on me... and I apologize for missing such a basic thing...
>
> No worries, we all miss that one at one time or another.
>
> > But, anyways, the first INSERT takes 4 secs for all 1M rows.. the
> > updates take 9 secs each.
>
> I'm impressed with your storage bandwidth. If you don't need the
> database to be persistent (you restart the whole process in case of
> failure), turning the synchronous pragma off may help your I/O times a
> bit. SQLite won't try so hard to keep the data safe from
> interruptions that way.
>
> > I'm in the process of creating the whole db now and will give it a
> > try... maybe creation is the worst part, as I'm going to be accessing
> > contiguous sets of rows (all columns) at a time... something like:
> >
> > SELECT * FROM table LIMIT 1 OFFSET 2;
>
> If possible, I'd recommend a simple "SELECT * FROM table;" and only
> fetch 1 at a time. I don't know what the RSQLite interface is
> like though. (The reasoning is that OFFSET must look at all the rows
> prior, so each query will take longer than the previous.) But yes, I
> would expect this to work much faster than the updates, as it should
> be very nearly contiguous on disk.
>
> My only other suggestion at the moment would be to buffer and batch
> updates if possible. Process a few columns at a time, and update them
> all at once like:
>
> UPDATE table SET column1 = , column2 = ... WHERE id = ;
>
> Each update should take about the same amount of time (9s), but you'll
> get more done each time.
Re: [sqlite] Request for help with the SQLite Website
Joe Wilson <[EMAIL PROTECTED]> wrote:
> You might mention the library is multi-thread safe in the Features
> section of http://www.sqlite.org/about.html

What? And encourage people to write multithreaded programs? Not likely...

> Do you have a page that describes all the SQLITE_OMIT_* ifdefs
> and compile options?

http://www.sqlite.org/compile.html

Likely it needs to be updated.

--
D. Richard Hipp <[EMAIL PROTECTED]>
Re: [sqlite] Request for help with the SQLite Website
You might mention the library is multi-thread safe in the Features section of http://www.sqlite.org/about.html

Do you have a page that describes all the SQLITE_OMIT_* ifdefs and compile options?
Re: [sqlite] Request for help with the SQLite Website
--- [EMAIL PROTECTED] wrote:
> * Somebody please suggest a better tag line -
> something better than "The World's Most Widely
> Used SQL Database".

New site looks good. The tag line is perfect.

Some ideas:
- Reduce the size of the logo and fonts everywhere
- Allow clicking on the logo to go to the home page
- This line is not necessary: "There are no known issues affecting database integrity or correctness." It implies past versions had problems - which is true of all software ever made.
- remove dots
- How much are you attached to the color cyan? Might you consider an alternate color scheme?
Re: [sqlite] suggestion on improving performance on UPDATE
On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> I have "fixed" my problem; working late at night and not sleeping
> usually causes that: I missed a WHERE on the UPDATE statement, so I
> could modify multiple rows at a time. Basically what I did was create
> a primary key (integer), which is the "row number of my matrix", and
> run the UPDATE like:
>
> UPDATE table SET column = WHERE id = ;
>
> shame on me... and I apologize for missing such a basic thing...

No worries, we all miss that one at one time or another.

> But, anyways, the first INSERT takes 4 secs for all 1M rows.. the
> updates take 9 secs each.

I'm impressed with your storage bandwidth. If you don't need the database to be persistent (you restart the whole process in case of failure), turning the synchronous pragma off may help your I/O times a bit. SQLite won't try so hard to keep the data safe from interruptions that way.

> I'm in the process of creating the whole db now and will give it a
> try... maybe creation is the worst part, as I'm going to be accessing
> contiguous sets of rows (all columns) at a time... something like:
>
> SELECT * FROM table LIMIT 1 OFFSET 2;

If possible, I'd recommend a simple "SELECT * FROM table;" and only fetch 1 at a time. I don't know what the RSQLite interface is like though. (The reasoning is that OFFSET must look at all the rows prior, so each query will take longer than the previous.) But yes, I would expect this to work much faster than the updates, as it should be very nearly contiguous on disk.

My only other suggestion at the moment would be to buffer and batch updates if possible. Process a few columns at a time, and update them all at once like:

UPDATE table SET column1 = , column2 = ... WHERE id = ;

Each update should take about the same amount of time (9s), but you'll get more done each time.
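The batch-update suggestion above can be sketched with Python's sqlite3 module rather than RSQLite. The tiny table, column names, and values here are illustrative, not the poster's real schema; the point is one UPDATE touching several columns per row, inside one transaction:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col1 REAL, col2 REAL)")
con.executemany("INSERT INTO t (id) VALUES (?)", [(i,) for i in range(5)])

# Buffer several columns' worth of values, then update them together
# instead of issuing one UPDATE per column per row.
rows = [(float(i), float(i) * 2, i) for i in range(5)]  # (col1, col2, id)
with con:  # wraps all the updates in a single transaction
    con.executemany("UPDATE t SET col1 = ?, col2 = ? WHERE id = ?", rows)

print(con.execute("SELECT col1, col2 FROM t WHERE id = 3").fetchone())  # (3.0, 6.0)
```

Each row's page is then written once per batch of columns rather than once per column.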
RE: [sqlite] Request for help with the SQLite Website
I think the "about" text misses some of what, to me, are the most important parts of SQLite:
- in-process
- zero maintenance

Also, as a .NET developer I would be put off by the "C-Library" reference. SQLite works very well in many languages regardless of the fact that it's written in C.

I would propose something along these lines:

SQLite is a high-efficiency, in-process, transactional database engine that supports the majority of SQL92 syntax, stores data in a single disk file, and requires zero maintenance.

HTH,
Sam

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 13, 2007 8:40 PM
To: sqlite-users@sqlite.org
Subject: [sqlite] Request for help with the SQLite Website

The new look for the SQLite website is now in place, if you haven't already noticed: http://www.sqlite.org/
Re: [sqlite] Request for help with the SQLite Website
[EMAIL PROTECTED] wrote:
> I am still looking for suggestions, comments,
> and bug reports.

There is no search on the front page or even anywhere else on the site that I can find. You don't even have to go to the hassle of adding your own - Google has a widget that is trivial to add. As an example, that is how the Apache site deals with their searching.

http://www.google.com/coop/cse/

Roger
Re: [sqlite] Request for help with the SQLite Website
This is a big improvement to the website. Keep up the great work on SQLite. Looking forward to 4.0.

On Nov 13, 2007, at 7:40 PM, [EMAIL PROTECTED] wrote:
> The new look for the SQLite website is now in place, if you haven't
> already noticed: http://www.sqlite.org/
> [...]
Re: [sqlite] suggestion on improving performance on UPDATE
Unfortunately the matrix is not sparse, nor can the direction of operations be changed.

The reason is that I have 2K samples, which are given in one file each (2K binary files). Each of the files has roughly 7M numbers, which I process by running some algorithms I have, reducing from 7M to 1M elements. The best scenario I see here is dumping these data somewhere before proceeding to the next sample... This processing is on a sample-by-sample (column) basis.

After the process is done, the second step is done on a row-by-row basis (actually batches of rows, so reads are minimized and there's still enough RAM to perform the computations I need).

I have "fixed" my problem; working late at night and not sleeping usually causes that: I missed a WHERE on the UPDATE statement, so I could modify multiple rows at a time. Basically what I did was create a primary key (integer), which is the "row number of my matrix", and run the UPDATE like:

UPDATE table SET column = WHERE id = ;

shame on me... and I apologize for missing such a basic thing...

But, anyways, the first INSERT takes 4 secs for all 1M rows.. the updates take 9 secs each.

I'm in the process of creating the whole db now and will give it a try... maybe creation is the worst part, as I'm going to be accessing contiguous sets of rows (all columns) at a time... something like:

SELECT * FROM table LIMIT 1 OFFSET 2;

Any recommendation/suggestion is welcome... I very much appreciate it...

Best,

Benilton

On Nov 13, 2007 6:31 PM, Trevor Talbot <[EMAIL PROTECTED]> wrote:
> On 11/13/07, John Stanton <[EMAIL PROTECTED]> wrote:
> > The point is that the more statements you have in each transaction the
> > better, up to a limit of perhaps a thousand or so.
>
> Actually, I get the impression that number of statements is not really
> the problem here.
>
> On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> > I'm in a situation where I need to handle 6 matrices with 1M rows and
> > about 2K columns, ie each matrix takes roughly 15GB RAM.
> > My (very naive) idea was to use SQLite to store these matrices (via
> > the RSQLite package). So I have the following:
> >
> > CREATE TABLE alleleA (sample1 REAL, sample2 REAL ... sample2000 REAL);
> >
> > When I have the data for sample1, I use an INSERT statement, which
> > takes about 4 secs.
>
> For all 1 million rows?
>
> > For all the other columns, I use an UPDATE statement, which is taking
> > hours (more than 8 now).
>
> Using one UPDATE per row, identifying each individual row with SQLite's
> rowid?
>
> So to verify: you're building a 1M x 2000 matrix by column, and
> intend to process it by row afterward. Is there any way you can make
> the two operations use the same direction: either build by row or
> process by column?
>
> In physical layout terms, handling a matrix of that size in two
> directions is lousy in the efficiency department, as there's no room
> for effective caching. If you consider a storage format that stores a
> row at a time sequentially, then the first pass simply writes out rows
> of 2K zero values, in order, which is fine. In disk and OS terms, a
> file consists of blocks or pages, and it does I/O and caching in units
> of that size. We'll say 4KB pages for the sake of argument. So that
> first pass collects 4KB of data and writes it to the disk in one shot,
> which is about as efficient as you can get.
>
> When you decide to update a column at a time, then it's skipping
> around: modify a tiny value on the first page, skip 16000 bytes ahead
> (2000x 8-byte REAL values), modify another tiny value, etc. The I/O
> cost of this is huge, since you're reading and writing an entire page
> just to touch one tiny little value. Since you never do anything else
> with that page, it gets evicted from cache pretty quickly, to make
> room for the following pages as you work through the file.
>
> That works out to roughly 4GB of raw I/O for each column updated.
> Repeat 2000 times.
>
> If you can't change how you do the operations, you'll have to find
> some storage structure that's more efficient. There are still some
> things to try, but it may simply mean discarding SQLite (and most
> other relational databases) as inappropriate for the job.
[sqlite] Request for help with the SQLite Website
The new look for the SQLite website is now in place, if you haven't already noticed: http://www.sqlite.org/

Even though the new look is "in place" you should understand this as a work in progress, not a done deal. I am still looking for suggestions, comments, and bug reports. I am particularly interested in help in the following ways:

* Suggestions for something better to put on the home page.

* Suggestions for better CSS for the CVSTrac pages. Example: http://www.sqlite.org/cvstrac/timeline - If you can save off a copy of that page, adjust the CSS to make it look better, then send me your adjustments (or post them here), that would be a *big* help.

* Somebody please suggest a better tag line - something better than "The World's Most Widely Used SQL Database".

* Suggest changes that will provide a search bar in the upper right-hand corner.

Over the next couple of weeks we plan on adding some additional pages to the site (mostly moving over information currently in the wiki) and doing additional reorganization and cleanup. We also plan to offer the documentation pages as a ZIP or tarball download so that users can view them offline.

The feedback from this mailing list has so far been very helpful. Please don't stop offering suggestions.

--
D. Richard Hipp <[EMAIL PROTECTED]>
RE: [sqlite] suggestion on improving performance on UPDATE
Regarding: building a 1M row x 2000 column matrix

If by chance these are sparse matrices (i.e., the majority of the values are empty) then conceivably you could store only the available values. The schema might then be something like:

CREATE TABLE myTable (
    matrixRow INTEGER,
    matrixCol INTEGER,
    sample REAL
);

I imagine you'd want to create an index on matrixRow after import.

If the matrices are NOT sparse, then you'd need 2 thousand million inserts per matrix (2 billion in USA parlance) and more disk space than your original schema -- probably not fun.
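The triplet schema above in use, sketched with Python's sqlite3 module (the cell values here are made up; only non-empty cells are stored, and the index on matrixRow is created after the bulk import as suggested):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myTable (matrixRow INTEGER, matrixCol INTEGER, sample REAL)"
)

# Only the non-empty cells of a (hypothetical) sparse matrix are inserted.
cells = [(0, 0, 1.5), (0, 3, 2.25), (2, 1, -0.5)]
con.executemany("INSERT INTO myTable VALUES (?, ?, ?)", cells)

# Index after import, so the bulk load itself stays fast.
con.execute("CREATE INDEX idx_row ON myTable (matrixRow)")

row0 = con.execute(
    "SELECT matrixCol, sample FROM myTable WHERE matrixRow = 0 ORDER BY matrixCol"
).fetchall()
print(row0)  # [(0, 1.5), (3, 2.25)]
```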
Re: [sqlite] suggestion on improving performance on UPDATE
On 11/13/07, John Stanton <[EMAIL PROTECTED]> wrote:
> The point is that the more statements you have in each transaction the
> better, up to a limit of perhaps a thousand or so.

Actually, I get the impression that number of statements is not really the problem here.

On 11/13/07, Benilton Carvalho <[EMAIL PROTECTED]> wrote:
> I'm in a situation where I need to handle 6 matrices with 1M rows and
> about 2K columns, ie each matrix takes roughly 15GB RAM.
> My (very naive) idea was to use SQLite to store these matrices (via
> the RSQLite package). So I have the following:
>
> CREATE TABLE alleleA (sample1 REAL, sample2 REAL ... sample2000 REAL);
>
> When I have the data for sample1, I use an INSERT statement, which
> takes about 4 secs.

For all 1 million rows?

> For all the other columns, I use an UPDATE statement, which is taking
> hours (more than 8 now).

Using one UPDATE per row, identifying each individual row with SQLite's rowid?

So to verify: you're building a 1M x 2000 matrix by column, and intend to process it by row afterward. Is there any way you can make the two operations use the same direction: either build by row or process by column?

In physical layout terms, handling a matrix of that size in two directions is lousy in the efficiency department, as there's no room for effective caching. If you consider a storage format that stores a row at a time sequentially, then the first pass simply writes out rows of 2K zero values, in order, which is fine. In disk and OS terms, a file consists of blocks or pages, and it does I/O and caching in units of that size. We'll say 4KB pages for the sake of argument. So that first pass collects 4KB of data and writes it to the disk in one shot, which is about as efficient as you can get.

When you decide to update a column at a time, then it's skipping around: modify a tiny value on the first page, skip 16000 bytes ahead (2000x 8-byte REAL values), modify another tiny value, etc. The I/O cost of this is huge, since you're reading and writing an entire page just to touch one tiny little value. Since you never do anything else with that page, it gets evicted from cache pretty quickly, to make room for the following pages as you work through the file.

That works out to roughly 4GB of raw I/O for each column updated. Repeat 2000 times.

If you can't change how you do the operations, you'll have to find some storage structure that's more efficient. There are still some things to try, but it may simply mean discarding SQLite (and most other relational databases) as inappropriate for the job.
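The rough 4GB figure above follows from touching one page per row; a quick check of the arithmetic, using the thread's 1M rows and the 4KB example page size:

```python
rows = 1_000_000           # matrix rows, from the thread
page_size = 4 * 1024       # the example 4KB page size, in bytes
io_per_column = rows * page_size
print(io_per_column)       # 4096000000 bytes, i.e. roughly 4GB per column updated
```

Repeated for 2000 columns, that is on the order of 8TB of raw page traffic for the whole build.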
Re: [sqlite] One line batch file?
Owen Watson wrote:
> I'd like to have a one line batch file:
>
> sqlite3 test 'insert into testable values ('value1','value2')'
>
> but the few variants of this I've tried don't work. I've seen and
> understood the batch-file-that-calls-another-text-file approach, but I
> was wondering if I could avoid this overhead for a one-liner.

Have you tried something like:

echo insert into testable values ('value1', 'value2'); | sqlite3 test

Note the ";" at the end of the statement and the "|" pipe char in the command line. This should work in a Windows "cmd" console - not tested; unix/linux would be similar.

--
Regards,
John McMahon
[EMAIL PROTECTED]
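A self-contained sketch of both one-liner styles on a unix-like shell, assuming the sqlite3 shell is on PATH; "test.db" and the "testable" table are the illustrative names from the question:

```shell
rm -f test.db   # start from a clean database for this demo
sqlite3 test.db "CREATE TABLE testable (a TEXT, b TEXT);"

# Pipe the statement in; note the trailing ";" and the "|" in the command:
echo "INSERT INTO testable VALUES ('value1','value2');" | sqlite3 test.db

# Or pass the SQL directly as the second command-line argument:
sqlite3 test.db "INSERT INTO testable VALUES ('value1','value2');"
```

The second form avoids the echo entirely, so a one-line batch file reduces to a single sqlite3 invocation.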
Re: [sqlite] BLOB data performance?
You might find the method used by Squid to manage its cache would be worth emulating. Using TransmitFile on Windows or sendfile on Unix to despatch the file to the network is by far the most efficient way to pass on files from a cache. It avoids a number of levels of buffer shadowing.

Andreas Volz wrote:
> Am Tue, 13 Nov 2007 07:18:19 -0600 schrieb John Stanton:
> > In a cache situation I would expect that keeping the binary data in
> > files would be preferable ...
>
> My main problem is to find a good algorithm to name the cached files
> and split them into directories.
> [...]
Re: [sqlite] suggestion on improving performance on UPDATE
The point is that the more statements you have in each transaction the better, up to a limit of perhaps a thousand or so.

Benilton Carvalho wrote:
> I'm using RSQLite functions like:
>
> dbBeginTransaction(db2)
> dbGetPreparedQuery(db2, sql, bind.data=df)
> dbCommit(db2)
>
> where db2 is the connection to the SQLite db, sql is either the INSERT
> or UPDATE statement I mentioned, and df is the R object (data.frame)
> containing the data to be inserted/updated:
>
> INSERT INTO alleleA (sample1) VALUES (:x);
> UPDATE alleleA SET sample2 = :x;
>
> where "x" is the column in the data.frame "df". Is that the type of
> thing you want to know? Sorry for any misunderstanding.
>
> b
>
> On Nov 13, 2007 9:26 AM, John Stanton <[EMAIL PROTECTED]> wrote:
> > How are you using transactions?
> >
> > Benilton Carvalho wrote:
> > > Hi Everyone,
> > >
> > > I use R to create tools for analysis of microarrays
> > > (http://www.bioconductor.org).
> > >
> > > I'm in a situation where I need to handle 6 matrices with 1M rows
> > > and about 2K columns, ie each matrix takes roughly 15GB RAM.
> > >
> > > The procedure I'm working on can be divided in two parts:
> > >
> > > 1) I read an input file, from which I compute one column for each
> > > of the matrices I mentioned above;
> > >
> > > 2) After the matrices are ready, all the computations I do can be
> > > performed in batches of rows (say 10K rows at a time), so there's
> > > no need to have all the matrices in memory at the same time.
> > >
> > > My (very naive) idea was to use SQLite to store these matrices
> > > (via the RSQLite package). So I have the following:
> > >
> > > CREATE TABLE alleleA (sample1 REAL, sample2 REAL ... sample2000 REAL);
> > >
> > > When I have the data for sample1, I use an INSERT statement, which
> > > takes about 4 secs. For all the other columns, I use an UPDATE
> > > statement, which is taking hours (more than 8 now).
> > >
> > > What are the obvious things I'm missing here? Or do you have any
> > > other suggestions in order to improve the performance?
> > >
> > > Thank you very much,
> > >
> > > b
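The transaction-grouping point above, sketched with Python's sqlite3 module rather than RSQLite (table name and batch size are illustrative; the idea is many statements between one BEGIN and one COMMIT):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x REAL)")

# A thousand INSERTs grouped into a single transaction, instead of
# paying one commit (and its fsync, on a disk database) per statement.
batch = [(float(i),) for i in range(1000)]
with con:  # one BEGIN ... COMMIT around the whole batch
    con.executemany("INSERT INTO t VALUES (?)", batch)

n_rows = con.execute("SELECT count(*) FROM t").fetchone()[0]
print(n_rows)  # 1000
```

On an on-disk database the difference is dramatic, since each commit normally forces the journal to be flushed to stable storage.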
Re: [sqlite] BLOB data performance?
I think your blob file performance may greatly depend upon the file system used and the workload. I found this article:

http://oss.sgi.com/projects/xfs/papers/filesystem-perf-tm.pdf

Andreas Volz <[EMAIL PROTECTED]> wrote:
> Am Tue, 13 Nov 2007 07:18:19 -0600 schrieb John Stanton:
> > Just be wary of directory size, and do not put them all in the one
> > directory.
>
> My main problem is to find a good algorithm to name the cached files
> and split them into directories.
> [...]
RE: [sqlite] BLOB data performance?
Regarding:
> The problem with many files in a directory:

Another possible algorithm: simply name the blob using the ROWID, e.g. Image0783.png.

In ancient days, a DOS directory of several hundred files might bog things down, but you can put thousands into a modern O.S.'s directory if necessary. So even though you can't predict the exact size of the customer's data, you can probably get the maximum order of magnitude -- and that's all you need.

For example, you could create a two-level directory structure of 100 directories as:

/0/0/
/0/1/
/0/2/
etc., up to /9/9/

Then, a blob with ROWID of "783" could be stored in directory /8/3/. (Alternately, if you want to allow for future three-level, four-level, etc. structures, you could store it in /3/8/, with the first directory chosen for the LEAST significant digit, and get better dispersion.)
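The digit-bucketing scheme above fits in a few lines. This is a hypothetical Python helper (the function and file names are illustrative, not from the original post) mapping a ROWID to its two-level directory using the trailing digits:

```python
def blob_path(rowid: int) -> str:
    """Map a ROWID to a two-level /tens/ones/ directory plus a file name."""
    digits = str(rowid).zfill(2)      # pad so rowids < 10 still have two digits
    d1, d2 = digits[-2], digits[-1]   # tens digit, ones digit
    return f"{d1}/{d2}/Image{rowid:04d}.png"

print(blob_path(783))  # 8/3/Image0783.png
print(blob_path(5))    # 0/5/Image0005.png
```

Swapping `d1` and `d2` gives the least-significant-digit-first variant mentioned at the end, which disperses sequential ROWIDs across directories more evenly.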
Re: [sqlite] BLOB data performance?
Am Tue, 13 Nov 2007 07:18:19 -0600 schrieb John Stanton:
> In a cache situation I would expect that keeping the binary data in
> files would be preferable because you can use far more efficient
> mechanisms for loading them into your cache and in particular in
> transmitting them downstream. Your DB only needs to store a pathname.
>
> Just be wary of directory size, and do not put them all in the one
> directory.

I noticed that problem in my current situation. I don't know the file number and size limit in Linux or Windows, but I'm sure there is a limit.

My main problem is to find a good algorithm to name the cached files and split them into directories. My current idea is:

1) Put the URL into the DB
2) Use a hash function to create a unique name for the cache file
3) Insert the hash name into the same row as the URL

The problem with many files in a directory:

4) Use e.g. 'modulo 10' on the URL hash value to get one of ten directory names where to find a file. But this has the drawback of a static number of cache directories; the algorithm isn't scalable as the number of files grows.

Do you think this is a good way? Or do you have another idea?

regards
Andreas
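Steps 1-4 above can be sketched roughly as follows. This is an illustrative Python fragment (the choice of SHA-1 and all names are assumptions, not Andreas's actual code); it shows the hash-name plus fixed-modulo bucket idea, including the static `N_DIRS` limitation he points out:

```python
import hashlib

N_DIRS = 10  # static number of cache directories -- the drawback noted above

def cache_path(url: str) -> str:
    """Derive a unique cache file name from the URL and pick a bucket dir."""
    h = hashlib.sha1(url.encode("utf-8")).hexdigest()  # step 2: unique name
    bucket = int(h, 16) % N_DIRS                       # step 4: modulo bucket
    return f"{bucket}/{h}"

path = cache_path("http://example.com/image.png")
print(path)
```

The DB row would then hold both the URL and `path` (steps 1 and 3). Because `N_DIRS` is baked into the placement, growing the directory count later means rehashing or moving existing files, which is exactly the scalability concern raised.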
[sqlite] Foreign keys ?
Hi,

I am new to the list and to SQLite. Reading over the archive didn't help me much; it was only a 'flight' over the results searching for 'foreign'. What I have gathered is the ability to define foreign keys in the tables and, with the help of Cody Pisto's code, to create the constraints as triggers.

My question: is there an API function to get the foreign keys per table? Or must I always use the code from Cody to get the information?

If there is no solution, is it possible to automate this by modifying the CREATE TABLE code to trigger the code of Cody and hold the information in some 'system tables' (DROP TABLE could remove this information again)? Then the API could be extended by looking in these tables for foreign keys, which would be reasonably fast.

Thanks,

Lothar

--
Lothar Behrens | Rapid Prototyping ...
Heinrich-Scheufelen-Platz 2 | 73252 Lenningen | www.lollisoft.de
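On the API question: SQLite does expose declared foreign keys per table through PRAGMA foreign_key_list, at least in current versions (whether the build Lothar was using supports it would need checking). A small Python illustration with a hypothetical parent/child schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES parent(id)
);
""")

# Each result row describes one declared foreign key:
# (id, seq, referenced table, local column, referenced column,
#  on_update, on_delete, match)
fks = list(con.execute("PRAGMA foreign_key_list(child)"))
print(fks)
```

This reads only the declarations from the schema; enforcing them (in SQLite of that era) still required triggers such as Cody's.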
RE: [sqlite] Data encryption
After some googling: Gunter has this domain: http://greschenz.de but it merely redirects to the dyndns url. It was active as late as 10/13/07, though. Maybe he shuts down his server at night?

Ron Wilson, Senior Engineer, MPR Associates, 518.831.7546

-----Original Message-----
From: A.J.Millan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 13, 2007 2:58 AM
To: sqlite-users@sqlite.org
Subject: RE: [sqlite] Data encryption

Andreas:

Some time ago, in response to a similar question, Günter Greschenz sent this forum an email:

> hi,
> i've written some sqlite-functions to crypt (blowfish) or compress (bzip)
> data in sqlite, e.g.:
>
>   insert into blubs values (crypt('data','pwd'))
> or
>   select * from xyz where decompress(data) = 'blablabla'
> or
>   select * from xyz where data = compress('blablabla')
>
> but you have to wait until next weekend, because i'm traveling for my
> company at the moment and return on friday (i hope :-)

Some days later:

> after a long time being on a business trip, i finally came home and now
> have the chance to upload the sources to my webserver:
> http://greschenz.dyndns.org/sqlite.html
>
> these sources have never been in a productive system, i just implemented
> them for fun... what i want to say: i never tested them really well!

To me it was a wonderful source of information and ideas, but this morning the URL was unavailable.

Maybe if you recite a certain magic spell, Günter can appear again...

Cheers

A.J.Millan
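The crypt()/compress() calls Günter describes are ordinary application-defined SQL functions registered on the connection. A rough Python sketch of the same mechanism (substituting stdlib zlib for the blowfish/bzip he actually used; not his code):

```python
import sqlite3
import zlib

con = sqlite3.connect(":memory:")
# Register scalar SQL functions usable directly inside SQL statements.
con.create_function("compress", 1, lambda s: zlib.compress(s.encode("utf-8")))
con.create_function("decompress", 1, lambda b: zlib.decompress(b).decode("utf-8"))

con.execute("CREATE TABLE xyz (data BLOB)")
con.execute("INSERT INTO xyz VALUES (compress('blablabla'))")

# Both query styles from the quoted mail work:
(value,) = con.execute("SELECT decompress(data) FROM xyz").fetchone()
match = con.execute(
    "SELECT COUNT(*) FROM xyz WHERE data = compress('blablabla')").fetchone()[0]
print(value, match)  # blablabla 1
```

The `data = compress(...)` form only works because the compressor is deterministic; an encryption function with a random IV would not compare equal that way.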
Re: [sqlite] suggestion on improving performance on UPDATE
I'm using RSQLite functions like:

  dbBeginTransaction(db2)
  dbGetPreparedQuery(db2, sql, bind.data=df)
  dbCommit(db2)

where db2 is the connection to the SQLite db, sql is either the INSERT or UPDATE statement I mentioned, and df is the R object (data.frame) containing the data to be inserted/updated.

  INSERT INTO alleleA (sample1) VALUES (:x);
  UPDATE alleleA SET sample2 = :x;

where "x" is the column in the data.frame "df". Is that the type of thing you want to know? Sorry for any misunderstanding.

b

On Nov 13, 2007 9:26 AM, John Stanton <[EMAIL PROTECTED]> wrote:
> How are you using transactions?
>
> Benilton Carvalho wrote:
> > Hi Everyone,
> >
> > I use R to create tools for analysis of microarrays
> > (http://www.bioconductor.org).
> >
> > I'm in a situation where I need to handle 6 matrices with 1M rows and
> > about 2K columns, i.e. each matrix takes roughly 15GB RAM.
> >
> > The procedure I'm working on can be divided in two parts:
> >
> > 1) I read an input file, from which I compute one column for each of
> > the matrices I mentioned above;
> >
> > 2) After the matrices are ready, all the computations I do can be
> > performed in batches of rows (say 10K rows at a time), so there's no
> > need to have all the matrices in memory at the same time.
> >
> > My (very naive) idea was to use SQLite to store these matrices (via
> > the RSQLite package). So I have the following:
> >
> > CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ..., sample2000 REAL);
> >
> > When I have the data for sample1, I use an INSERT statement, which
> > takes about 4 secs.
> >
> > For all the other columns, I use an UPDATE statement, which is taking
> > hours (more than 8 now).
> >
> > What are the obvious things I'm missing here? Or do you have any other
> > suggestions in order to improve the performance?
> >
> > Thank you very much,
> >
> > b
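For reference, the binding pattern (and one likely pitfall) can be mimicked with the stdlib sqlite3 module; this is an illustrative sketch, not the poster's actual code. Note that the quoted UPDATE has no WHERE clause, so if a prepared query is executed once per bound row, every execution rewrites sample2 in every row of the table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE alleleA (sample1 REAL, sample2 REAL)")

# dbGetPreparedQuery binds the data.frame column "x" against the :x
# placeholder once per row; executemany with dicts is the closest analogue.
df = [{"x": 0.1}, {"x": 0.2}, {"x": 0.3}]
con.executemany("INSERT INTO alleleA (sample1) VALUES (:x)", df)

# No WHERE clause: each of the 3 executions sets sample2 in ALL rows,
# so the table is rewritten once per bound value and only the last
# value (0.3) survives everywhere.
con.executemany("UPDATE alleleA SET sample2 = :x", df)
con.commit()

rows = con.execute(
    "SELECT sample1, sample2 FROM alleleA ORDER BY sample1").fetchall()
print(rows)  # [(0.1, 0.3), (0.2, 0.3), (0.3, 0.3)]
```

If that matches what RSQLite does with this statement, it would explain both the hours-long runtime and, worse, a silently wrong result; an UPDATE keyed by a rowid (`UPDATE alleleA SET sample2 = :x WHERE rowid = :i`) avoids both.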
Re: [sqlite] suggestion on improving performance on UPDATE
How are you using transactions?

Benilton Carvalho wrote:
> Hi Everyone,
>
> I use R to create tools for analysis of microarrays
> (http://www.bioconductor.org).
>
> I'm in a situation where I need to handle 6 matrices with 1M rows and
> about 2K columns, i.e. each matrix takes roughly 15GB RAM.
>
> The procedure I'm working on can be divided in two parts:
>
> 1) I read an input file, from which I compute one column for each of
> the matrices I mentioned above;
>
> 2) After the matrices are ready, all the computations I do can be
> performed in batches of rows (say 10K rows at a time), so there's no
> need to have all the matrices in memory at the same time.
>
> My (very naive) idea was to use SQLite to store these matrices (via
> the RSQLite package). So I have the following:
>
> CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ..., sample2000 REAL);
>
> When I have the data for sample1, I use an INSERT statement, which
> takes about 4 secs.
>
> For all the other columns, I use an UPDATE statement, which is taking
> hours (more than 8 now).
>
> What are the obvious things I'm missing here? Or do you have any other
> suggestions in order to improve the performance?
>
> Thank you very much,
>
> b
[sqlite] suggestion on improving performance on UPDATE
Hi Everyone,

I use R to create tools for analysis of microarrays (http://www.bioconductor.org).

I'm in a situation where I need to handle 6 matrices with 1M rows and about 2K columns, i.e. each matrix takes roughly 15GB RAM.

The procedure I'm working on can be divided in two parts:

1) I read an input file, from which I compute one column for each of the matrices I mentioned above;

2) After the matrices are ready, all the computations I do can be performed in batches of rows (say 10K rows at a time), so there's no need to have all the matrices in memory at the same time.

My (very naive) idea was to use SQLite to store these matrices (via the RSQLite package). So I have the following:

CREATE TABLE alleleA (sample1 REAL, sample2 REAL, ..., sample2000 REAL);

When I have the data for sample1, I use an INSERT statement, which takes about 4 secs.

For all the other columns, I use an UPDATE statement, which is taking hours (more than 8 now).

What are the obvious things I'm missing here? Or do you have any other suggestions in order to improve the performance?

Thank you very much,

b
Re: [sqlite] BLOB data performance?
In a cache situation I would expect that keeping the binary data in files would be preferable, because you can use far more efficient mechanisms for loading them into your cache and in particular for transmitting them downstream. Your DB only needs to store a pathname.

Just be wary of directory size, and do not put them all in the one directory.

Andreas Volz wrote:
> Hello,
>
> I'll tell you my current situation. I implemented a web cache function
> for images and other data in my application. In the past I saved the data
> on the hard disk with a special name and had a text file with the
> relation "cache file name <-> url". But I didn't like it. Now I'd like to
> evaluate sqlite as a solution.
>
> So my question is about the binary data. Is it better to insert the
> images and other media data (e.g. videos of < 10 MB in size) into the DB,
> or only a "pointer" to a file laying around on my hard disk? I would
> estimate a maximum DB size of several hundred MB.
>
> How good/bad is reading/writing this data as a BLOB compared to writing
> it as a file beside the DB with only a small name in the DB? Where is the
> difference between both ways regarding memory and CPU usage?
>
> BTW: My current use case writes data slowly, but reads data fast. Reading
> BLOBs must be as fast as reading from the hard disk.
>
> regards
> Andreas
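For comparison with the file-per-blob approach, round-tripping binary data through SQLite is straightforward. A minimal Python sketch with a hypothetical cache table and URL (names are illustrative only):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE cache (url TEXT PRIMARY KEY, body BLOB)")

payload = bytes(range(256)) * 4  # stand-in for image/video bytes
con.execute("INSERT INTO cache VALUES (?, ?)",
            ("http://example.com/a.png", payload))
con.commit()

# Read the blob back keyed by URL -- no separate name<->url text file needed.
(body,) = con.execute("SELECT body FROM cache WHERE url = ?",
                      ("http://example.com/a.png",)).fetchone()
print(len(body), body == payload)  # 1024 True
```

Either way the URL-to-data mapping lives in one table; the open question in the thread is only whether `body` holds the bytes themselves or a pathname.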
[sqlite] Version 3.5.2 text columns problem
Hi All,

I have the following problem. I'm using VC++ 6.0 and want to perform a SELECT query on a single database table. What I do is:

1. prepare the statement using sqlite3_prepare_v2
2. step through the results using sqlite3_step

No other process or thread is accessing the database at the time. Everything works fine with version 3.4.2. Now I want to move to v3.5.2. I perform the same steps, but sqlite3_step fails with SQLITE_ERROR.

What is interesting: the problem occurs only for a few records in the table. The row is always the same; after deleting the row, the problem arises again after e.g. 100 rows. This happens when my select includes some text columns, but it has nothing to do with the data inside (it fails even for a text column which is NULL for all rows in the select).

When I try to select only one of the "problematic" rows, sqlite3_step fails immediately. If I select multiple rows, sqlite3_step fails when it reaches one of these "bad" rows (e.g. ~1000 rows are fetched and then SQLITE_ERROR is returned).

What is even more interesting: this happens only in a release build; in debug everything is ok. I tried turning optimization off, but that did not help. As I said, everything was fine with 3.4.2.

Does anyone have a clue what's wrong here?

Thanks in advance,
Jacek