Hello everyone, I'm doing some web logs analysis. I decided to use a sqlite database because the application has to be easily portable to another workstation or server and there is absolutly no need of multi-clients. Moreover, I don't want to spend my time dealing with the database.
What I want to do is quite simple. I want to parse the logs from a high traffic apache website (loadbalanced on 8 front servers) and integrated them in a database. That's the easy part. In apache configuration I added the mod_usertrack in order to be able to follow a user in the logs. Then I want to create parent / child relations between requests from a unique user (id by the mod_usertrack). The final goal is to see all users. Then by user we can see the requests issued from a click on a webpage (manual human interface --> parent). Finaly for each parent requests (click) I want to have all sub requests issued from the parent page (automatic fetch (images, css, javascript, ...)). The global aim of all this is to analyse the behaviour of visitors on our website and to extract different populations of users and make statistics. Here are the tables I use: CREATE TABLE IF NOT EXISTS requests ( id integer primary key autoincrement, date date not null, request varchar not null, method varchar not null, code integer not null, referer varchar, response_time integer not null, vhost varchar DEFAULT NULL, cookie not null REFERENCES cookies(id)) CREATE TABLE IF NOT EXISTS cookies ( id integer primary key autoincrement, cookie varchar not null UNIQUE) CREATE TABLE IF NOT EXISTS siblings ( parent not null REFERENCES requests(id), child not null REFERENCES requests(id)) I ran a test on 2 hours logs from the 18th of february (from 10:00AM to 11:59AM). The first step is to parse the logs in order to populate the `requests` and `cookies` tables. It took about 2min to parse and insert 238304 cookies and 1163728 requests. After this step the database is 317Mo sized. Then I have to populate the `siblings` table. I use the following request: INSERT INTO siblings (parent, child) SELECT a.id, b.id FROM requests a INNER JOIN requests b ON a.cookie = b.cookie WHERE a.page = b.referer AND b.date BETWEEN a.date AND a.date + 20 It took about 3hours and 30 minutes. At the end there is 7158001 rows in the `siblings` table and the database is about 600Mo sized. My questions: 1- Is there a better way to populate the `siblings` table ? (optimize my sql request) 2- What can I do to optimize the all process ? (play with some PRAGMA parameters maybe) 3- Is sqlite a good choice for this case ? Would mysql or postgresql a better choice ? Thanks in advance for your time. ++ Jerome _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users