Re: [sqlite] Sqlite optim
> On Monday 23 February 2009 13:09:58 Jérôme Loyet wrote:
>> My questions:
>> 1- Is there a better way to populate the `siblings` table ? (optimize
>> my sql request)
>
> You may use a composite index on (cookie, referer, date) and the REAL
> datatype for dates.

I chose INTEGER for the date because everything is stored as a unix
timestamp, so it's just a matter of comparing integers. But maybe I'm wrong.

>> 2- What can I do to optimize the whole process ? (play with some PRAGMA
>> parameters maybe)
>
> pragma page_size=4096;
> pragma default_cache_size=200000;
> pragma cache_size=200000;
>
> (the cache with these options is 200 000 * 4k = 800 MB).
>
>> 3- Is sqlite a good choice for this case ? Would mysql or postgresql be
>> a better choice ?
>
> SQLite is a good choice, but for real-time analysis a data-flow parser is
> better than SQL queries. You could write a daemon that prepares the data
> and loads it into the database periodically, inside transactions.

I'm not planning to do real-time analysis. I want to compute statistics
afterwards (over several days or months) to study user behaviour on the
website.

> Best regards.

Thanks a lot.

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
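The tuning suggested in the reply can be sketched with Python's sqlite3 module. This is a minimal sketch, not the poster's actual code: the in-memory database, the trimmed-down `requests` schema, and the index name are assumptions for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# page_size must be set before any table is created (or before a VACUUM).
con.execute("PRAGMA page_size = 4096")
# cache_size is counted in pages: 200000 pages * 4 KB/page ~= 800 MB of cache.
con.execute("PRAGMA cache_size = 200000")

# Trimmed-down version of the requests table from the post.
con.execute("""CREATE TABLE IF NOT EXISTS requests (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    date INTEGER NOT NULL,
    request VARCHAR NOT NULL,
    referer VARCHAR,
    cookie NOT NULL)""")

# Composite index covering the join and filter columns of the self-join,
# as the reply suggests.
con.execute("""CREATE INDEX IF NOT EXISTS idx_cookie_referer_date
               ON requests (cookie, referer, date)""")
```

Note that cache_size only holds for the current connection; default_cache_size (deprecated in later SQLite versions) persisted it in the database file.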
[sqlite] Is sqlite the good choice in my case ? And how can I optimize my SQL requests
Hello everyone,

I'm doing some web log analysis. I decided to use an SQLite database
because the application has to be easily portable to another workstation
or server, there is absolutely no need for multiple clients, and I don't
want to spend my time administering a database.

What I want to do is quite simple: parse the logs from a high-traffic
apache website (load-balanced across 8 front servers) and load them into a
database. That's the easy part. In the apache configuration I enabled
mod_usertrack so that I can follow a user through the logs.

Then I want to create parent/child relations between the requests of a
single user (identified by mod_usertrack). The final goal is to see all
users; then, per user, the requests issued by a click on a web page
(manual human interaction --> parent); and finally, for each parent
request (click), all sub-requests issued from the parent page (automatic
fetches: images, css, javascript, ...). The overall aim is to analyse the
behaviour of visitors on our website, extract different populations of
users, and compute statistics.

Here are the tables I use:

  CREATE TABLE IF NOT EXISTS requests (
    id integer primary key autoincrement,
    date date not null,
    request varchar not null,
    method varchar not null,
    code integer not null,
    referer varchar,
    response_time integer not null,
    vhost varchar DEFAULT NULL,
    cookie not null REFERENCES cookies(id));

  CREATE TABLE IF NOT EXISTS cookies (
    id integer primary key autoincrement,
    cookie varchar not null UNIQUE);

  CREATE TABLE IF NOT EXISTS siblings (
    parent not null REFERENCES requests(id),
    child not null REFERENCES requests(id));

I ran a test on 2 hours of logs from the 18th of February (from 10:00AM to
11:59AM). The first step is to parse the logs to populate the `requests`
and `cookies` tables. It took about 2 minutes to parse and insert 238304
cookies and 1163728 requests. After this step the database is 317 MB.
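The load step described above can be sketched with Python's sqlite3 module. This is a hypothetical sketch: the real apache log parser is not shown, so `parsed_lines` stands in for its output, and wrapping the whole batch in one transaction is what keeps a million-row insert down to minutes.

```python
import sqlite3

# Stand-in for the output of the (not shown) apache log parser:
# (date, request, method, code, referer, response_time, vhost, cookie)
parsed_lines = [
    (1234952400, "/index.html", "GET", 200, None, 12, "www.example.com", "A1"),
    (1234952401, "/style.css", "GET", 200, "/index.html", 3, "www.example.com", "A1"),
]

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE cookies (id INTEGER PRIMARY KEY AUTOINCREMENT,
                          cookie VARCHAR NOT NULL UNIQUE);
    CREATE TABLE requests (id INTEGER PRIMARY KEY AUTOINCREMENT,
                           date DATE NOT NULL, request VARCHAR NOT NULL,
                           method VARCHAR NOT NULL, code INTEGER NOT NULL,
                           referer VARCHAR, response_time INTEGER NOT NULL,
                           vhost VARCHAR DEFAULT NULL,
                           cookie NOT NULL REFERENCES cookies(id));
""")

with con:  # one transaction around the whole batch
    for date, request, method, code, referer, rtime, vhost, cookie in parsed_lines:
        # Deduplicate cookies via the UNIQUE constraint, then look up the id.
        con.execute("INSERT OR IGNORE INTO cookies (cookie) VALUES (?)",
                    (cookie,))
        (cookie_id,) = con.execute("SELECT id FROM cookies WHERE cookie = ?",
                                   (cookie,)).fetchone()
        con.execute("""INSERT INTO requests
                       (date, request, method, code, referer,
                        response_time, vhost, cookie)
                       VALUES (?, ?, ?, ?, ?, ?, ?, ?)""",
                    (date, request, method, code, referer, rtime,
                     vhost, cookie_id))
```

Without the enclosing transaction, SQLite commits (and syncs to disk) once per INSERT, which is orders of magnitude slower for bulk loads.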
Then I have to populate the `siblings` table. I use the following request:

  INSERT INTO siblings (parent, child)
    SELECT a.id, b.id
    FROM requests a
    INNER JOIN requests b ON a.cookie = b.cookie
    WHERE a.request = b.referer
      AND b.date BETWEEN a.date AND a.date + 20;

It took about 3 hours and 30 minutes. At the end there are 7158001 rows in
the `siblings` table and the database is about 600 MB.

My questions:
1- Is there a better way to populate the `siblings` table ? (optimize my
sql request)
2- What can I do to optimize the whole process ? (play with some PRAGMA
parameters maybe)
3- Is sqlite a good choice for this case ? Would mysql or postgresql be a
better choice ?

Thanks in advance for your time.
++ Jerome
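The siblings step above can be sketched on a toy dataset. The key change suggested in the reply is an index on (cookie, referer, date), so that each row of `a` finds its candidate children by index lookup instead of a full scan of `requests`; without an index the self-join degenerates to O(n^2) comparisons. The three sample rows are assumptions for the example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE requests (id INTEGER PRIMARY KEY, date INTEGER,
                           request VARCHAR, referer VARCHAR, cookie INTEGER);
    CREATE TABLE siblings (parent REFERENCES requests(id),
                           child REFERENCES requests(id));
    -- Toy data: one user (cookie 7); the css is fetched 5s after the
    -- click, the second page 40s after (outside the 20s window).
    INSERT INTO requests VALUES
        (1, 100, '/index.html', NULL,          7),
        (2, 105, '/style.css',  '/index.html', 7),
        (3, 140, '/other.html', '/index.html', 7);
    -- The composite index the reply recommends, built before the join.
    CREATE INDEX idx_cookie_referer_date
        ON requests (cookie, referer, date);
""")

with con:
    # Same query as the post; all equality conditions are in the ON
    # clause so the optimizer can drive them through the index.
    con.execute("""INSERT INTO siblings (parent, child)
                   SELECT a.id, b.id
                   FROM requests a
                   JOIN requests b ON b.cookie = a.cookie
                                  AND b.referer = a.request
                   WHERE b.date BETWEEN a.date AND a.date + 20""")
```

Only request 2 lands in `siblings` as a child of request 1: request 3 matches on cookie and referer but falls outside the 20-second window.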