[sqlite] Using SQLite in threaded Web server applications (was: Using SQLite to record Web log file data (a threading question))

2006-10-29 Thread David Gewirtz
 
Those suggestions are great. For something like log analysis, there's no
problem either reading in log file tails every so often or funneling writes
through a single thread to give more real-time access. I think I'm leaning
towards the more real-time view, but I might take the tail option if it's
easier to code.

On the topic of threads, though, this opens up a pile of other "running
behind a Web server" sorts of operations:

* To keep user authentication data in an SQLite db, users will need some
level of real-time response to logins

* To keep a db of, say, a discussion forum, user posts will need to be
written and made available in near-real time

* To update Web pages dynamically with information drawn from a user's
preferences or usage patterns, the page will need to be generated in
real time

And each of these interactions with the server will occur in different user
sessions on different threads. Can SQLite be used to build applications that
perform the operations above, or do I need some single-threaded queue
structure for each approach?
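
To make the question concrete, here is the sort of thing I imagine each
handler thread doing. This is a purely hypothetical sketch (the table, file
name, and helper are made up), with each thread holding its own connection
and leaning on SQLite's busy timeout:

#include <sqlite3.h>

/* Hypothetical per-request handler: one sqlite3 handle per thread.
   Assumes a table created elsewhere: CREATE TABLE hits(path TEXT, ts INTEGER). */
void handle_request(const char *path) {
    sqlite3 *db;
    if (sqlite3_open("app.db", &db) != SQLITE_OK) return;
    sqlite3_busy_timeout(db, 2000);  /* wait up to 2s if another thread holds the write lock */
    char *sql = sqlite3_mprintf(
        "INSERT INTO hits VALUES(%Q, strftime('%%s','now'));", path);
    sqlite3_exec(db, sql, 0, 0, 0);  /* error handling omitted for brevity */
    sqlite3_free(sql);
    sqlite3_close(db);
}

Would something along those lines hold up under load, or is a dedicated
writer thread the better plan?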

I'd love some guiding thoughts on these things to help me understand how to
really approach the problem.

Thanks!

-- David






Re: [sqlite] serious performance problems with indexes

2006-10-29 Thread Joe Wilson
The last test, which simulated pre-sorting 2 million rows
in index order prior to insert, may show optimal insert speed
since it is only appending pages to the table and the index,
but it may not be realistic, since you may not have sufficient
memory to pre-sort all the data in memory prior to insert.
(Although if you have the memory to spare, this is the way to go.)

Here's another test on the same machine that uses
random inputs but inserts batches of 2 pre-sorted
rows at a time. You can see that insert time increases with
the size of the table, but it is not exponential, as was the
case with the original test program.

You can also see that having a large SQLite cache does
not significantly affect the insert timings under 2M rows
as compared to the default SQLite cache.

Tcl code change follows the data.

tablesize   sqlite  sqlite-largecache
0   1.909   1.909
2   2.019   2.048
4   2.064   2.079
6   2.111   2.079
8   2.142   2.158
10  2.157   2.157
12  2.282   2.235
14  2.298   2.267
16  2.392   2.298
18  2.439   2.314
20  2.361   2.345
22  2.501   2.439
24  2.689   3.439
26  2.704   3.423
28  2.673   3.564
30  2.767   3.533
32  2.829   3.564
34  2.876   2.657
36  2.955   2.626
38  3.017   3.673
40  3.001   3.704
42  3.048   2.767
44  2.970   3.783
46  3.048   2.845
48  3.095   3.861
50  3.157   3.861
52  3.173   3.923
54  3.142   3.970
56  3.236   3.986
58  3.220   4.002
60  3.408   4.970
62  3.267   4.142
64  3.423   5.017
66  3.454   4.189
68  3.611   4.220
70  3.611   5.142
72  3.704   5.251
74  3.829   4.330
76  3.657   4.330
78  3.970   4.455
80  4.001   4.470
82  4.046   4.454
84  4.173   4.470
86  4.142   4.611
88  4.126   4.548
90  4.298   4.501
92  4.517   4.642
94  4.641   4.689
96  4.517   5.579
98  4.673   4.736
100 4.736   4.751
102 4.689   4.845
104 4.923   4.798
106 5.111   4.939
108 5.204   4.314
110 5.048   4.876
112 5.064   5.814
114 5.205   5.876
116 5.314   5.876
118 5.501   5.142
120 6.517   5.032
122 5.736   5.048
124 5.861   6.986
126 5.892   5.986
128 5.767   5.298
130 6.095   6.095
132 6.080   5.329
134 5.954   5.283
136 6.548   5.986
138 6.157   5.423
140 6.642   6.330
142 6.439   5.643
144 6.705   5.579
146 6.626   5.611
148 6.814   6.517
150 6.720   6.533
152 7.111   6.579
154 7.142   5.845
156 7.533   6.173
158 7.345   6.064
160 7.689   6.142
162 7.798   6.283
164 7.876   6.486
166 7.704   6.408
168 8.283   6.470
170 8.455   6.517
172 8.126   8.049
174 8.314   7.720
176 8.564   7.017
178 8.845   7.095
180 8.814   7.079
182 9.126   7.002
184 8.798   7.174
186 9.189   7.236
188 9.111   7.184
190 9.986   11.830
192 9.658   7.673
194 9.876   9.126
196 9.892   7.908
198 9.626   7.580

# Write timing results as tab-separated values.
set o [open results.csv w]
puts $o "tablesize\tsqlite\tsqlite-largecache"
set step 2
for {set num 0} {$num < 200} {incr num $step} {
  puts $num
  # Generate a batch of $step rows with random values in the second column.
  set fd [open test.sql w]
  set rows [list]
  for {set i 1} {$i <= $step} {incr i} {
    lappend rows [list [expr {$num+$i}] [expr {int(rand()*50)}]]
  }
  # Pre-sort the batch on the indexed (second) column and wrap the whole
  # batch in a single transaction.
  puts $fd "BEGIN;"
  foreach i [lsort -integer -index 1 $rows] {
    puts $fd "INSERT INTO t1 VALUES([lindex $i 0], [lindex $i 1]);"
  }
  puts $fd "COMMIT;"
  close $fd
  # runtest comes from the benchmark script earlier in this thread.
  puts $o $num\t[join [runtest test.sql] \t]
  flush $o
}
close $o


- Original Message 
From: Joe Wilson <[EMAIL PROTECTED]>
To: sqlite-users@sqlite.org
Sent: Saturday, October 28, 2006 1:08:20 PM
Subject: Re: [sqlite] serious performance problems with indexes

If you pre-sort the data prior to inserting into sqlite you will
see much better timings for both the default cache size and
the larger cache size.
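
As a rough illustration of the same idea in C (a minimal sketch, not part
of the original script; it assumes the two-column table t1 with an index on
the second column, and error handling is omitted):

#include <stdlib.h>
#include <sqlite3.h>

/* Sort each batch on the indexed column, then insert the whole batch
   inside a single transaction using one prepared statement. */
static int cmp_indexed_col(const void *a, const void *b) {
    const int *ra = a, *rb = b;  /* a row is (id, value); the index is on value */
    return ra[1] - rb[1];
}

void insert_batch(sqlite3 *db, int (*rows)[2], size_t n) {
    qsort(rows, n, sizeof rows[0], cmp_indexed_col);
    sqlite3_exec(db, "BEGIN", 0, 0, 0);
    sqlite3_stmt *st;
    sqlite3_prepare_v2(db, "INSERT INTO t1 VALUES(?,?)", -1, &st, 0);
    for (size_t i = 0; i < n; i++) {
        sqlite3_bind_int(st, 1, rows[i][0]);
        sqlite3_bind_int(st, 2, rows[i][1]);
        sqlite3_step(st);
        sqlite3_reset(st);
    }
    sqlite3_finalize(st);
    sqlite3_exec(db, "COMMIT", 0, 0, 0);
}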

When I run your script (unmodified, except for the removal of
MySQL) up to a table size of 40:

tablesize   sqlite  sqlite-largecache
0   1.956   1.956
2   2.064   2.079
4   2.095   2.111
6   2.173   2.173
8   2.189   2.283
10  2.345   2.298
12  2.345   2.360
14  2.407   2.361
16  2.470   2.423
18  2.548   2.501
20  2.564   2.439
22  2.830   2.439
24  2.876   2.907
26  2.829   2.564
28  3.423   3.533
30  4.939   3.564
32  7.236   3.736
34  7.283   3.751
36  10.611  3.767
38  11.142  3.845
40  13.736  3.798

When I make the following change to your script to simulate an ordered data set
(okay, it's cheating slightly: no pre-sort needs to be performed here):

   # set r [expr {int(rand()*50)}]
   set r [expr {($num+$i)*5}]

tablesize   sqlite  sqlite-largecache
0   1.878   1.894
2   1.925   1.923
4   1.923   1.923
6   1.923   1.954
8   1.970   1.923
10  1.923   1.908
12  1.923   

Re: [sqlite] Using SQLite to record Web log file data (a threading question)

2006-10-29 Thread John Stanton

[EMAIL PROTECTED] wrote:

"David Gewirtz" <[EMAIL PROTECTED]> wrote:


I've been exploring SQLite for a number of applications, but one I'd like to
embark on soon is using SQLite to record Web site log file data, so I can
perform SQL-based analysis on my logs, rather than using something like
Analog.

Unfortunately, each Web access on the server is likely to be in its own
thread. The scenario I'd probably have to build might go something like
this:

* Web server launches
* Web server issues sqlite3_open on the log db, gets log_ID
* Web server handles user 1 in thread 1, which writes to log db
* Web server handles user 2 in thread 2, which writes to log db
* Web server handles user n in thread n, which writes to log db
* Web server handles admin request for log analysis, which reads from log db
* Web server begins shutdown, closes log db
* Web server shuts down


From my reading, it's just not clear to me whether this is bad behavior for
SQLite 3.3.7. Can SQLite handle this sort of sequence reliably? If not, any
suggestions about how I might proceed or how I should think about it?




The way I handle this at www.sqlite.org is that web log data just
gets appended to an ordinary text file.  Then when I want to do
analysis, I make a copy of the text file and import it into an
SQLite database.  I then do queries against the SQLite database
to extract the information I want to know.

You could perhaps automate this so that a background task takes
the unprocessed tail of your log file and adds it to an SQLite
database every 10 minutes or so.  Or every 30 seconds.  Just
rig it so that you only have one process trying to write at
a time, and so that you do not have to take transaction overhead
for every single web hit.

SQLite itself is not the best tool for doing multiple concurrent
writes from different threads or processes.

--
D. Richard Hipp  <[EMAIL PROTECTED]>
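
As a minimal sketch of that tail-import idea (the raw one-line-per-row
schema, file names, and sidecar offset file are all illustrative, and error
handling is kept to a minimum):

#include <stdio.h>
#include <sqlite3.h>

/* Import any log lines appended since the last run, in one transaction. */
int import_tail(const char *logpath, const char *dbpath) {
    long off = 0;
    FILE *state = fopen("import.offset", "r");  /* remembers how far we got last time */
    if (state) { fscanf(state, "%ld", &off); fclose(state); }

    FILE *log = fopen(logpath, "r");
    if (!log) return 1;
    fseek(log, off, SEEK_SET);

    sqlite3 *db;
    if (sqlite3_open(dbpath, &db) != SQLITE_OK) { fclose(log); return 1; }
    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS rawlog(line TEXT)", 0, 0, 0);
    sqlite3_exec(db, "BEGIN", 0, 0, 0);

    sqlite3_stmt *ins;
    sqlite3_prepare_v2(db, "INSERT INTO rawlog VALUES(?)", -1, &ins, 0);
    char buf[4096];
    while (fgets(buf, sizeof buf, log)) {
        sqlite3_bind_text(ins, 1, buf, -1, SQLITE_TRANSIENT);
        sqlite3_step(ins);
        sqlite3_reset(ins);
    }
    sqlite3_finalize(ins);
    sqlite3_exec(db, "COMMIT", 0, 0, 0);  /* one commit for the whole batch */
    sqlite3_close(db);

    off = ftell(log);  /* persist the new offset for the next run */
    fclose(log);
    state = fopen("import.offset", "w");
    if (state) { fprintf(state, "%ld\n", off); fclose(state); }
    return 0;
}

Run it from cron (or an in-server timer) every few minutes and you get
exactly one writer and one transaction per batch.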



I made SQLite accept log data and analysis requests from a web server.  From
a multi-threaded web server I used a separate thread to handle the DB
insertion.  It is driven by a queue of log transactions posted by the
active threads.  That way it is single-streamed and has no contention
problems, and the active threads have a very fast mechanism to post their
log data: just queue a pointer to the buffer used by the thread.  Each
time a thread activates it gets a freed buffer from a pool, and when the
log thread finishes a transaction it drops the buffer onto the free list.
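
A bare-bones sketch of that queue, assuming POSIX threads (the schema and
names are illustrative, the buffer pool is replaced with plain malloc/free
to keep it short, and error handling is omitted):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sqlite3.h>

/* One node per log record: posted by request threads, consumed by the
   single writer thread. */
typedef struct node { char *line; struct node *next; } node;

static node *head, *tail;
static pthread_mutex_t mu = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cv = PTHREAD_COND_INITIALIZER;

/* Called from any request thread: O(1), touches no database state. */
void post_log(const char *line) {
    node *n = malloc(sizeof *n);
    n->line = strdup(line);
    n->next = NULL;
    pthread_mutex_lock(&mu);
    if (tail) tail->next = n; else head = n;
    tail = n;
    pthread_cond_signal(&cv);
    pthread_mutex_unlock(&mu);
}

/* The only thread that ever touches the database. */
void *writer_thread(void *dbpath) {
    sqlite3 *db;
    sqlite3_open((const char *)dbpath, &db);
    sqlite3_stmt *ins;
    sqlite3_prepare_v2(db, "INSERT INTO rawlog VALUES(?)", -1, &ins, 0);
    for (;;) {
        pthread_mutex_lock(&mu);
        while (!head) pthread_cond_wait(&cv, &mu);
        node *batch = head;  /* take everything queued so far in one grab */
        head = tail = NULL;
        pthread_mutex_unlock(&mu);
        sqlite3_exec(db, "BEGIN", 0, 0, 0);
        while (batch) {
            sqlite3_bind_text(ins, 1, batch->line, -1, SQLITE_TRANSIENT);
            sqlite3_step(ins);
            sqlite3_reset(ins);
            node *done = batch;
            batch = batch->next;
            free(done->line);
            free(done);
        }
        sqlite3_exec(db, "COMMIT", 0, 0, 0);  /* one commit per drained batch */
    }
    return NULL;  /* not reached */
}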


The advantage of this approach is that it gives real-time log analysis;
to that end some frequently accessed summaries are maintained.  Just
maintaining the log in a simple form in an SQLite database offers little,
if any, advantage over keeping a text file and analysing it from time to
time.





Re: [sqlite] Using SQLite to record Web log file data (a threading question)

2006-10-29 Thread Nemanja Corlija

On 10/29/06, David Gewirtz <[EMAIL PROTECTED]> wrote:


I've been exploring SQLite for a number of applications, but one I'd like to
embark on soon is using SQLite to record Web site log file data, so I can
perform SQL-based analysis on my logs, rather than using something like
Analog.

Unfortunately, each Web access on the server is likely to be in its own
thread. The scenario I'd probably have to build might go something like
this:

* Web server launches
* Web server issues sqlite3_open on the log db, gets log_ID
* Web server handles user 1 in thread 1, which writes to log db
* Web server handles user 2 in thread 2, which writes to log db
* Web server handles user n in thread n, which writes to log db
* Web server handles admin request for log analysis, which reads from log db
* Web server begins shutdown, closes log db
* Web server shuts down

From my reading, it's just not clear to me whether this is bad behavior for
SQLite 3.3.7. Can SQLite handle this sort of sequence reliably? If not, any
suggestions about how I might proceed or how I should think about it?


If you really need real-time insight into your logs, then see
http://www.sqlite.org/sharedcache.html and
http://www.sqlite.org/cvstrac/fileview?f=sqlite/src/server.c&v=1.6
for an example implementation and some more docs on the subject.
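
For reference, a minimal sketch of turning shared-cache mode on. The
sqlite3_enable_shared_cache() call is the real API; everything else here is
illustrative, and note that in the 3.3.x series connections sharing a cache
had to live in the same thread:

#include <sqlite3.h>

/* Open a connection with shared-cache mode enabled.  Enabling must
   happen before any connection is opened. */
sqlite3 *open_shared(const char *path) {
    sqlite3 *db = 0;
    sqlite3_enable_shared_cache(1);
    if (sqlite3_open(path, &db) == SQLITE_OK)
        sqlite3_busy_timeout(db, 2000);  /* retry briefly when the db is busy */
    return db;
}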

Though if you can live with storing logs in text files and reading them
into an SQLite db periodically and/or on demand (if that happens rarely
enough to make more sense than the above scenario), it will probably be
much simpler to implement.

--
Nemanja Corlija <[EMAIL PROTECTED]>
