[sqlite] Using SQLite to record Web log file data (a threading question)
I've been exploring SQLite for a number of applications, but one I'd like to embark on soon is using SQLite to record Web site log file data, so I can perform SQL-based analysis on my logs rather than using something like Analog. Unfortunately, each Web access on the server is likely to be handled in its own thread. The scenario I'd probably have to build might go something like this:

* Web server launches
* Web server calls sqlite3_open on the log db, gets a handle (log_ID)
* Web server handles user 1 in thread 1, which writes to the log db
* Web server handles user 2 in thread 2, which writes to the log db
* Web server handles user n in thread n, which writes to the log db
* Web server handles an admin request for log analysis, which reads from the log db
* Web server begins shutdown, closes the log db
* Web server shuts down

From my reading, it's just not clear to me whether this is bad behavior for SQLite 3.3.7. Can SQLite handle this sort of sequence reliably? If not, any suggestions about how I might proceed or how I should think about it? Thanks, David - To unsubscribe, send email to [EMAIL PROTECTED] -
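One way to make a scenario like this safe, regardless of how a particular SQLite build handles cross-thread use, is to funnel all writes through a single thread that owns the connection. A minimal sketch in Python (the table name, schema, queue protocol, and request count are all illustrative assumptions, not from the original post):

```python
import queue
import sqlite3
import tempfile
import threading

# All request threads push rows onto a queue; a single writer thread owns
# the SQLite connection and performs every insert, so no two threads ever
# touch the database at once.
db_path = tempfile.NamedTemporaryFile(suffix=".db", delete=False).name
log_queue = queue.Queue()
N_REQUESTS = 5

def writer():
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE access_log (ts TEXT, path TEXT)")
    for _ in range(N_REQUESTS):  # a real server would loop until shutdown
        conn.execute("INSERT INTO access_log VALUES (?, ?)", log_queue.get())
    conn.commit()
    conn.close()

def handle_request(i):
    # Request handlers never see the connection, only the queue.
    log_queue.put(("2006-10-28T12:00:00", "/page/%d" % i))

w = threading.Thread(target=writer)
w.start()
workers = [threading.Thread(target=handle_request, args=(i,))
           for i in range(N_REQUESTS)]
for t in workers:
    t.start()
for t in workers:
    t.join()
w.join()

rows = sqlite3.connect(db_path).execute(
    "SELECT COUNT(*) FROM access_log").fetchone()[0]
print(rows)  # 5
```

The admin's analysis queries can go through the same queue (as request/reply messages) or through a separate read-only connection, depending on how fresh the results need to be.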
Re: [sqlite] Re: Regarding sqlite3_exec
You shouldn't even consider loading a thousand records directly. Get a set of ID numbers and load each record as it's needed. This is very basic stuff, and not even that hard to implement. I am just saying for the record that this is not hard to do; hopefully no one else will be scared away from the concept. As a general rule of coding, do it right the first time. One thread reading a list of integers while another runs the GUI is not complex. Isaac On 10/28/06, Da Martian <[EMAIL PROTECTED]> wrote: Hi, thanks for the response. The main reason is that my record count could be anywhere from a few thousand to a million. But even at the lowly numbers of around 1 the interface can seem sluggish if you read every record before displaying anything. As you mention, and as has been discussed above, doing stuff in the background is a good way to go, but more complex. As a general rule of coding I put as few unnecessary threads into a "phase 1" program as I can, because the complexity goes up hugely (threads can be complex to use, coordinate, interrupt, etc.) and the chance of bugs goes up dramatically. So I tend to leave that for a "Phase 2" bells-and-whistles pass, and only when there isn't a simpler way. This thread has covered just about all the approaches I can think of :-) Thanks for the responses. S On 10/27/06, Isaac Raway <[EMAIL PROTECTED]> wrote: > > Why don't you design the table with a unique row ID, stored in an > integer field, then fetch a list of those ID numbers? > > For 5000 rows, assuming you store them in your application as 4-byte > longs, that's about 19 k of memory. > > Counting that result as you receive it isn't that difficult. If it > takes a long time (it probably won't) you can do it in another thread > and update the interface as appropriate. > > I'm not seeing a downside here. > > Isaac > > On 10/26/06, Da Martian <[EMAIL PROTECTED]> wrote: > > No there isn't, but RDBM systems are a generalised data retrieval > > mechanism. As such they suffer from that generality.
> > Don't get me wrong, RDBM systems are appropriate for 95% of all data requirements I have had to deal with and I would never dream of trying to write one from scratch, nor can I imagine a world without them. > > However, certain applications (weather data, genome data, large indices like google's) require something designed specifically for that purpose. If you customise data retrieval (and in particular your sorting/indices/access paths) you can leave rdbms in the dust in terms of performance. All I have read about google suggests they do exactly this, although I must point out I don't actually know anything about google with any certainty, just what has "leaked" out over the years on the rumour mill. But in designing my own "google"-like indices (on a smaller scale of course) and some specialised weather stuff, it was necessary to throw away the rdbms and do it myself. For a google query, for instance, they know they will get a list of 1 or more words. They also know they will only ever search through the index of words. They don't have other data types, records or tables. Why go through all the hassle of compiling SQL, and all that generic overhead, when your application will only ever do one thing? You can just make an API like this: "search(wordlist): Resultset." > > You immediately save yourself complexity and processing time. Then for large indices you will know your data set, so instead of using a std BTree you would use a more appropriate data structure, possibly with skip lists etc. > > As for performing a database search twice, this whole thread has shown that sometimes you have to :-) > > S > > On 10/25/06, John Stanton <[EMAIL PROTECTED]> wrote: > > > There is no magic in data retrieval. Google use the same physical laws as us ordinary mortals. > > > I see no reason to ever perform a database search twice.
-- Isaac Raway Entia non sunt multiplicanda praeter necessitatem. http://blueapples.org - blog http://stonenotes.com - personal knowledge management
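The ID-list approach Isaac describes can be sketched in a few lines; the `items` table, `payload` column, and row count here are invented for the example:

```python
import sqlite3

# Sketch of the ID-list pattern: one cheap query fetches only the row IDs,
# and individual records are loaded on demand as the UI displays them.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO items (payload) VALUES (?)",
                 [("row %d" % i,) for i in range(5000)])

# Step 1: the ID list -- 5000 integers, roughly 20 KB as 4-byte longs.
ids = [r[0] for r in conn.execute("SELECT id FROM items ORDER BY id")]

# Step 2: fetch a single record only when it is actually needed,
# e.g. when its row scrolls into view.
def load_record(item_id):
    return conn.execute("SELECT payload FROM items WHERE id = ?",
                        (item_id,)).fetchone()

print(len(ids), load_record(ids[0])[0])
```

The interface becomes responsive as soon as the ID query returns, and the per-row lookups are primary-key searches, which SQLite serves in logarithmic time.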
Re: [sqlite] serious performance problems with indexes
If you pre-sort the data prior to inserting into sqlite you will see much better timings for both the default cache size and the larger cache size. When I run your script (unmodified, except for the removal of MySQL) up to 40:

tablesize  sqlite  sqlite-largecache
0    1.956   1.956
2    2.064   2.079
4    2.095   2.111
6    2.173   2.173
8    2.189   2.283
10   2.345   2.298
12   2.345   2.360
14   2.407   2.361
16   2.470   2.423
18   2.548   2.501
20   2.564   2.439
22   2.830   2.439
24   2.876   2.907
26   2.829   2.564
28   3.423   3.533
30   4.939   3.564
32   7.236   3.736
34   7.283   3.751
36   10.611  3.767
38   11.142  3.845
40   13.736  3.798

When I make the following change to your script to simulate an ordered data set (okay, it's cheating slightly: no pre-sort need be performed here):

# set r [expr {int(rand()*50)}]
set r [expr {($num+$i)*5}]

tablesize  sqlite  sqlite-largecache
0    1.878   1.894
2    1.925   1.923
4    1.923   1.923
6    1.923   1.954
8    1.970   1.923
10   1.923   1.908
12   1.923   1.970
14   1.940   2.383
16   2.048   1.908
18   1.923   2.002
20   1.923   1.923
22   1.939   1.923
24   1.923   1.923
26   1.939   1.923
28   1.939   1.939
30   1.954   1.939
32   1.939   1.923
34   1.970   1.954
36   1.939   1.954
38   1.923   1.939
40   1.970   1.939

I wonder if other databases pre-sort their batch inserts by index order to achieve more consistent insert speed with data sets that exceed the cache size.

----- Original Message ----- From: Peter De Rijk <[EMAIL PROTECTED]> To: sqlite-users@sqlite.org Sent: Saturday, October 28, 2006 10:40:03 AM Subject: Re: [sqlite] serious performance problems with indexes I have checked at a finer grain (results and code at the end of the mail), and it seems indeed that the extreme growth of time needed for inserts only starts at around 16 rows (on my system). Before that it also rises, but much, much more slowly. As this indeed looked like a cache effect, I tried setting the default_cache_size to something big (2). This was enough to remove the bend (at least till 200 rows), so that the increase in time kept going at the same slower pace as for the smaller data sizes.
Curiously, mysql, which I included as a reference, manages to keep nearly the same insert time even for the largest data sets. Using the large cache size, sqlite is faster than mysql until about 70 rows. At 200 rows mysql is about 3 times faster than sqlite with the large cache (vs more than 20 times faster than sqlite with the default cache size).

---
tablesize  sqlite  sqlite-largecache  mysql
0     1.12    1.04   2.86
2     1.09    1.17   2.89
4     1.14    1.15   3.27
6     1.23    1.2    3.18
8     1.47    1.25   3.01
10    1.3     1.32   2.94
12    1.4     1.34   2.94
14    2.06    1.38   3
16    2.41    1.46   2.99
18    3.09    1.54   3.03
20    3.31    1.62   3.04
22    3.84    1.72   3.06
24    4.22    1.7    3.03
26    4.23    1.76   3.06
28    4.83    1.94   2.98
30    6.27    2.08   3.07
32    9.02    2.06   3.11
34    9.48    1.94   2.88
36    10      2.13   3.04
38    10.6    2.21   3.13
40    10.72   2.35   3.05
42    11.35   2.24   3.11
44    11.55   2.49   3.05
46    11.75   2.5    3.07
48    16.89   2.39   3.1
50    17.81   2.59   3.09
52    19.22   2.4    3.03
54    19.35   2.77   3.05
56    19.44   2.59   3.1
58    19.87   2.67   3.08
60    21.47   2.88   3.09
62    20.82   2.74   3.08
64    21.55   3      3.11
66    25.17   3.09   3.12
68    29.89   2.84   3.16
70    28.48   3.25   3.09
72    28.62   3.31   3.2
74    28.69   3.23   3.07
76    30.34   3.59   3.1
78    30.27   3.64   3.13
80    30.57   3.37   3.14
82    31.41   3.57   3.06
84    31.89   3.89   3.09
86    33.77   3.57   3.05
88    36.01   3.87   3.1
90    37.58   4.04   3.07
92    40.59   3.68   3.04
94    40.9    3.72   3.09
96    41.45   4.24   3.1
98    42.05   4.28   3.13
100   41.11   4.09   3.03
102   42.29   4.63   3.11
104   42.09   4.81   3.17
106   42.8    4.34   3.14
108   43.82   4.43   3.02
110   43.01   4.5    3.12
112   44.08   4.84   3.1
114   45.4    5.08   3.16
116   47.51   4.66   3.13
118   47.46   5.02   3.17
120   48.21   5.42   3.17
122   49.09   5.33   3.21
124   49.65   5.12   3.1
126   50.55   5.3    3.11
128   50.56   5.8    3.17
130   52.35   5.7    3.17
132   53.67   5.43   3.14
134   52.38   5.79   3.26
136   51.44   6.03   3.18
138   52.69   6.16   3.08
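The pre-sort change measured above amounts to one line of code before the batch insert. A sketch in Python (table and index names are invented; the original benchmark is a Tcl script driving the sqlite3 shell):

```python
import random
import sqlite3

# Sorting each batch by the indexed column before inserting turns random
# B-tree insertions into near-append operations, which is why the ordered
# timings above stay flat while the random ones blow up past cache size.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.execute("CREATE INDEX t_v ON t(v)")

batch = [(random.randint(0, 10**9),) for _ in range(10000)]
batch.sort()  # the "pre-sort": order the batch by the indexed key

with conn:  # one transaction per batch also matters a lot for insert speed
    conn.executemany("INSERT INTO t (v) VALUES (?)", batch)

count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(count)  # 10000
```

Pre-sorting only helps within each batch, of course; if later batches overlap the key ranges of earlier ones, the index updates become scattered again as the table grows.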
Re: [sqlite] serious performance problems with indexes
--On Friday 27 October 2006 15:31, [EMAIL PROTECTED] wrote: > When a table is indexed, INSERT performance is logarithmic in the > number of rows in the table and linear in the number of indices. > This is because entries have to be inserted into the index in > sorted order (otherwise it wouldn't be an index), and each > insert thus requires a binary search. > > If your index becomes very large so that it no longer fits > in your disk cache, then the binary search can involve a lot > of disk I/O which can make things really slow. The usual > work-around here is to keep a much smaller staging table into > which you do your inserts and then periodically merge the > staging table into the main table. This makes your queries > more complex (and slower) since you are now having to > search multiple tables. But it does speed up inserts. > -- > D. Richard Hipp <[EMAIL PROTECTED]> > I have checked at a finer grain (results and code at the end of the mail), and it seems indeed that the extreme growth of time needed for inserts only starts at around 16 rows (on my system). Before that it also rises, but much, much more slowly. As this indeed looked like a cache effect, I tried setting the default_cache_size to something big (2). This was enough to remove the bend (at least till 200 rows), so that the increase in time kept going at the same slower pace as for the smaller data sizes. Curiously, mysql, which I included as a reference, manages to keep nearly the same insert time even for the largest data sets. Using the large cache size, sqlite is faster than mysql until about 70 rows. At 200 rows mysql is about 3 times faster than sqlite with the large cache (vs more than 20 times faster than sqlite with the default cache size).
---
tablesize  sqlite  sqlite-largecache  mysql
0     1.12    1.04   2.86
2     1.09    1.17   2.89
4     1.14    1.15   3.27
6     1.23    1.2    3.18
8     1.47    1.25   3.01
10    1.3     1.32   2.94
12    1.4     1.34   2.94
14    2.06    1.38   3
16    2.41    1.46   2.99
18    3.09    1.54   3.03
20    3.31    1.62   3.04
22    3.84    1.72   3.06
24    4.22    1.7    3.03
26    4.23    1.76   3.06
28    4.83    1.94   2.98
30    6.27    2.08   3.07
32    9.02    2.06   3.11
34    9.48    1.94   2.88
36    10      2.13   3.04
38    10.6    2.21   3.13
40    10.72   2.35   3.05
42    11.35   2.24   3.11
44    11.55   2.49   3.05
46    11.75   2.5    3.07
48    16.89   2.39   3.1
50    17.81   2.59   3.09
52    19.22   2.4    3.03
54    19.35   2.77   3.05
56    19.44   2.59   3.1
58    19.87   2.67   3.08
60    21.47   2.88   3.09
62    20.82   2.74   3.08
64    21.55   3      3.11
66    25.17   3.09   3.12
68    29.89   2.84   3.16
70    28.48   3.25   3.09
72    28.62   3.31   3.2
74    28.69   3.23   3.07
76    30.34   3.59   3.1
78    30.27   3.64   3.13
80    30.57   3.37   3.14
82    31.41   3.57   3.06
84    31.89   3.89   3.09
86    33.77   3.57   3.05
88    36.01   3.87   3.1
90    37.58   4.04   3.07
92    40.59   3.68   3.04
94    40.9    3.72   3.09
96    41.45   4.24   3.1
98    42.05   4.28   3.13
100   41.11   4.09   3.03
102   42.29   4.63   3.11
104   42.09   4.81   3.17
106   42.8    4.34   3.14
108   43.82   4.43   3.02
110   43.01   4.5    3.12
112   44.08   4.84   3.1
114   45.4    5.08   3.16
116   47.51   4.66   3.13
118   47.46   5.02   3.17
120   48.21   5.42   3.17
122   49.09   5.33   3.21
124   49.65   5.12   3.1
126   50.55   5.3    3.11
128   50.56   5.8    3.17
130   52.35   5.7    3.17
132   53.67   5.43   3.14
134   52.38   5.79   3.26
136   51.44   6.03   3.18
138   52.69   6.16   3.08
140   53.88   5.67   3.04
142   52.61   5.86   3.2
144   53.61   5.92   3.01
146   52.98   5.98   3.08
148   54.74   6.27   3.18
150   55.4    6.81   3.18
152   58.45   6.54   3.12
154   57.14   6.65   3.12
156   58.92   6.97   3.17
158   58.17   6.86   3.22
160   58.29   6.83   3.15
162   58.19   6.6    3.18
164   58.77   7.3    3.16
166   60.59   7.44   3.22
168   61.12   7.14   3.18
170   61.25   7.41   3.33
172   62.2    7.18   3.21
174   61.48   7.95   3.02
176   62.63   7.91   3.17
178   62.58   7.52   3.19
180   62.28   7.56   3.14
182   64.2    7.62   3.33
184   64.18   8.6    3.17
186   64.34   8.71   3.33
188   63.93   8.44   3.22
190   62.98   8.13   3.3
192   63.9    8.45   3.29
194   64.85   8.36   3.21
196   68.26   10.75  4.81
198   67.81   9.46   3.17

-- code --
proc runtest {sqlfile} {
    set result {}
    set delay 1000
    exec sync; after $delay;
    set t [time "exec sqlite3 def.db <$sqlfile" 1]
    set t [expr {[lindex $t 0]/100.0}]
    lappend result [format %.3f $t]
    exec sync; after $delay;
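Dr. Hipp's staging-table workaround quoted above can be sketched as follows: a small indexed staging table absorbs inserts while its index is still cache-resident, gets merged into the main table in sorted order periodically, and queries consult both tables. All table and column names here are invented for the example:

```python
import sqlite3

# Staging-table pattern: fast inserts into a small table, periodic sorted
# merge into the big one, lookups via UNION ALL across both.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE main_t (k INTEGER);  CREATE INDEX main_k ON main_t(k);
    CREATE TABLE staging (k INTEGER); CREATE INDEX stage_k ON staging(k);
""")

def insert(k):
    # Cheap: the staging index is small enough to stay in cache.
    conn.execute("INSERT INTO staging (k) VALUES (?)", (k,))

def merge():
    with conn:
        # ORDER BY makes the merge an in-order, append-friendly index update.
        conn.execute("INSERT INTO main_t SELECT k FROM staging ORDER BY k")
        conn.execute("DELETE FROM staging")

def lookup(k):
    # Queries must now search both tables, as the post warns.
    return conn.execute(
        "SELECT k FROM main_t WHERE k=? UNION ALL SELECT k FROM staging WHERE k=?",
        (k, k)).fetchall()

for k in (5, 3, 9):
    insert(k)
merge()
insert(7)  # lives in staging until the next merge
found = [lookup(k) for k in (3, 7)]
print(found)
```

The trade-off is exactly the one stated in the quoted message: inserts get faster, while every query pays for scanning two indexes instead of one.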
Re: [sqlite] Regarding sqlite3_open
On 10/28/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: > I am doing it on Windows. Hence permissions shouldn't be a problem. NTFS has permissions too. Sysinternals' FileMon utility may be useful, as it will show exactly what is going on in terms of file I/O.
RE: [sqlite] Regarding sqlite3_open
I am doing it on Windows. Hence permissions shouldn't be a problem. Regards, Ravi K -Original Message- From: Lloyd [mailto:[EMAIL PROTECTED] Sent: Saturday, October 28, 2006 4:49 PM To: sqlite-users@sqlite.org Subject: Re: [sqlite] Regarding sqlite3_open Which platform are you using? Do you (your application) have enough permission to create files in the specified location? On Sat, 2006-10-28 at 16:19 +0530, [EMAIL PROTECTED] wrote: > Hi All, > > I am using sqlite3_open() in my program. It is returning > error code 14 (file not found) every time. > > Where does it search for the file? I even tried with the fully qualified > path, and also placing the file in the working directory. Even if the file > does not exist, it should create the file and return SQLITE_OK. > > Please throw some light on it. > > Thanks, > > Regards, > > Ravi K > > The information contained in this electronic message and any attachments to this message are intended for the exclusive use of the addressee(s) and may contain proprietary, confidential or privileged information. If you are not the intended recipient, you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately and destroy all copies of this message and any attachments. > > WARNING: Computer viruses can be transmitted via email. The recipient should check this email and any attachments for the presence of viruses. The company accepts no liability for any damage caused by any virus transmitted by this email. > > www.wipro.com
[sqlite] Compatibility sqlite 2.8 and 3.x
Hi all, Is it possible to make requests on a database created with 2.8.(14 or 17) with a (Tcl) API developed for sqlite 3.0? I am still trying to find a way to create/have access from OpenWatcom Fortran. Since I noted that version 2.8.14 of sqlite can be compiled with OpenWatcom (is that also true of 2.8.17, available on the sqlite homepage?), the idea would be to create a sqlite database in 2.8, access it from Fortran with a sqlite library in 2.8, and use a Tcl API in 3.0. Does that make sense? Victor
Re: [sqlite] Regarding sqlite3_open
Which platform are you using? Do you (your application) have enough permission to create files in the specified location? On Sat, 2006-10-28 at 16:19 +0530, [EMAIL PROTECTED] wrote: > Hi All, > > I am using sqlite3_open() in my program. It is returning > error code 14 (file not found) every time. > > Where does it search for the file? I even tried with the fully qualified > path, and also placing the file in the working directory. Even if the file > does not exist, it should create the file and return SQLITE_OK. > > Please throw some light on it. > > Thanks, > > Regards, > > Ravi K
[sqlite] Regarding sqlite3_open
Hi All, I am using sqlite3_open() in my program. It is returning error code 14 (file not found) every time. Where does it search for the file? I even tried with the fully qualified path, and also placing the file in the working directory. Even if the file does not exist, it should create the file and return SQLITE_OK. Please throw some light on it. Thanks, Regards, Ravi K
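For what it's worth, result code 14 is SQLITE_CANTOPEN ("unable to open database file") rather than literally "file not found". sqlite3_open will create a missing database file, but it cannot create missing parent directories, and it also needs write permission on the directory (the journal file lives there too). A quick sketch using Python's built-in sqlite3 module (the paths are invented for the demo) that reproduces both the working and the failing case:

```python
import os
import sqlite3
import tempfile

# Opening a missing *file* succeeds (the file is created); a missing
# parent *directory* yields "unable to open database file", which is
# SQLITE_CANTOPEN, result code 14.
workdir = tempfile.mkdtemp()

good = os.path.join(workdir, "test.db")
conn = sqlite3.connect(good)      # file does not exist yet: it is created
conn.execute("CREATE TABLE t (x)")
conn.close()
created = os.path.exists(good)

bad = os.path.join(workdir, "no_such_dir", "test.db")
try:
    sqlite3.connect(bad)          # parent directory missing: open fails
    failed = False
except sqlite3.OperationalError:  # "unable to open database file"
    failed = True

print(created, failed)  # True True
```

So the things to check are that every directory in the path actually exists, and that the process has create/write rights on the final directory, not just on the file.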
Re: [sqlite] Re: Regarding sqlite3_exec
Hi, thanks for the response. The main reason is that my record count could be anywhere from a few thousand to a million. But even at the lowly numbers of around 1 the interface can seem sluggish if you read every record before displaying anything. As you mention, and as has been discussed above, doing stuff in the background is a good way to go, but more complex. As a general rule of coding I put as few unnecessary threads into a "phase 1" program as I can, because the complexity goes up hugely (threads can be complex to use, coordinate, interrupt, etc.) and the chance of bugs goes up dramatically. So I tend to leave that for a "Phase 2" bells-and-whistles pass, and only when there isn't a simpler way. This thread has covered just about all the approaches I can think of :-) Thanks for the responses. S On 10/27/06, Isaac Raway <[EMAIL PROTECTED]> wrote: Why don't you design the table with a unique row ID, stored in an integer field, then fetch a list of those ID numbers? For 5000 rows, assuming you store them in your application as 4-byte longs, that's about 19 k of memory. Counting that result as you receive it isn't that difficult. If it takes a long time (it probably won't) you can do it in another thread and update the interface as appropriate. I'm not seeing a downside here. Isaac On 10/26/06, Da Martian <[EMAIL PROTECTED]> wrote: > No there isn't, but RDBM systems are a generalised data retrieval mechanism. As such they suffer from that generality. > > Don't get me wrong, RDBM systems are appropriate for 95% of all data requirements I have had to deal with and I would never dream of trying to write one from scratch, nor can I imagine a world without them. > > However, certain applications (weather data, genome data, large indices like google's) require something designed specifically for that purpose. If you customise data retrieval (and in particular your sorting/indices/access paths) you can leave rdbms in the dust in terms of performance. All I have read about google suggests they do exactly this, although I must point out I don't actually know anything about google with any certainty, just what has "leaked" out over the years on the rumour mill. But in designing my own "google"-like indices (on a smaller scale of course) and some specialised weather stuff, it was necessary to throw away the rdbms and do it myself. For a google query, for instance, they know they will get a list of 1 or more words. They also know they will only ever search through the index of words. They don't have other data types, records or tables. Why go through all the hassle of compiling SQL, and all that generic overhead, when your application will only ever do one thing? You can just make an API like this: "search(wordlist): Resultset." > > You immediately save yourself complexity and processing time. Then for large indices you will know your data set, so instead of using a std BTree you would use a more appropriate data structure, possibly with skip lists etc. > > As for performing a database search twice, this whole thread has shown that sometimes you have to :-) > > S > > On 10/25/06, John Stanton <[EMAIL PROTECTED]> wrote: > > There is no magic in data retrieval. Google use the same physical laws as us ordinary mortals. > > I see no reason to ever perform a database search twice. -- Isaac Raway Entia non sunt multiplicanda praeter necessitatem. http://blueapples.org - blog http://stonenotes.com - personal knowledge management