Re: [sqlite] finding pairwise similarity
> where `similarity` is a measure of overlap of `t` between each pair of `c`
> and is calculated as the number of `t` common between a given pair divided
> by the total number of unique `t` in the pair.

Just like Igor, I take "unique `t` in pair" to mean "distinct `t` in pair".

> I want the result to be
>
>    source_c  target_c  similarity
>    ------------------------------
>    1         2         0.2  (= 1 common `t` between the 2 `c` / by total 4 `t`)

Here I take 0.2 to mean 0.25. If so, here's some more ammunition:

select src.c_no "Source c", trg.c_no "Target c",
  (select count(*) from
     (select 1 from c_t where c_t.c_no in (src.c_no, trg.c_no)
      group by c_t.t_no having count(*) > 1)) * 1.0 /
  (select count(*) from
     (select distinct t_no from c_t
      where c_t.c_no in (src.c_no, trg.c_no))) Similarity
from c src join c trg on (src.c_no < trg.c_no);

If I'm not mistaken, both versions should give identical results. Check for
yourself on real data and see which is faster for you (most probably Igor's).

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
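A quick way to sanity-check this GROUP BY/HAVING counting is a small sketch with Python's built-in sqlite3 module (this harness is my own, not from the thread; it assumes no duplicate `(c_no, t_no)` rows, since a duplicate would trip the `count(*) > 1` test). It loads the sample data from the original question and applies the same arithmetic, parameterized per pair:

```python
import sqlite3

# Assumed harness, not from the thread: verify the GROUP BY/HAVING
# counting on the sample rows from the original post.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);
    INSERT INTO c_t VALUES (1,1),(1,2),(1,3),(2,1),(2,5);
""")

def similarity(con, a, b):
    # t values shared by both c's: any t_no appearing more than once
    # across the pair (assumes (c_no, t_no) pairs are unique)
    common = con.execute("""
        SELECT count(*) FROM (
            SELECT 1 FROM c_t WHERE c_no IN (?, ?)
            GROUP BY t_no HAVING count(*) > 1)""", (a, b)).fetchone()[0]
    # distinct t values across the pair
    total = con.execute("""
        SELECT count(*) FROM (
            SELECT DISTINCT t_no FROM c_t WHERE c_no IN (?, ?))""",
        (a, b)).fetchone()[0]
    return common / total

print(similarity(con, 1, 2))   # 1 shared t / 4 distinct t -> 0.25
```

The 0.25 here matches the "I take 0.2 to mean 0.25" reading above.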
Re: [sqlite] Some pieces of the puzzle are coming together
On Fri, Jan 13, 2012 at 5:13 PM, Max Vlasov wrote:
> John,
> I read your previous post and unfortunately, your conversation with Richard
> didn't reveal many details. I worked with a scenario similar to yours
> (Delphi + statically linked sqlite) for several years, and during this
> period an advanced admin/console form was developed that allows dynamic
> loading of different versions of sqlite, reporting vfs bandwidth, times, and
> even the number of memory requests. So consider sharing more information
> about your db and queries here, or contact me directly if you wish; I hope I
> can help at least a little.
>
> As a quick guess, I remember that fts was one of the places that allocated
> heavily with some queries, but I can't recall right now.

There are 88 unique queries run against the database. They are mostly
automatically created from a class structure. However, without dumping the
entire database in front of you, the following query should provide a good,
isolated example. 5,008 calls to:

UPDATE RESPONSES
SET RESPONSE_NAME = :RESPONSE_NAME, prelisted_value = :prelisted_value
WHERE RESPONSE_OID = :RESPONSE_OID

3.6.17: 382 ms
3.7.9: 5,924 ms

There are 80,021 rows in RESPONSES. The CREATE statement for RESPONSES reads:

CREATE TABLE responses (
  response_oid integer primary key autoincrement,
  response_name varchar,
  definition_parent integer,
  instance_parent integer,
  prelisted_value varchar,
  override_behavior_oid integer,
  soft_deleted_char varchar default 'F'
);
CREATE INDEX response_definition_parent_idx on responses (definition_parent);
CREATE UNIQUE INDEX response_instance_parent_idx on responses (instance_parent, definition_parent);
CREATE UNIQUE INDEX response_instance_parent_idx2 on responses (response_name, instance_parent, definition_parent);
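The reported workload is easy to replay in shape, if not in environment. The sketch below (my own harness, not the poster's Delphi code; column list trimmed to the ones the UPDATE touches, and 5,000 rows rather than the reported 5,008 calls over 80,021 rows) times repeated single-row UPDATEs by primary key inside one transaction, which is where a malloc-heavy regression would show up as wall-clock time:

```python
import sqlite3
import time

# Assumed harness replaying the shape of the reported workload:
# repeated single-row UPDATEs by primary key inside one transaction.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE responses (
    response_oid INTEGER PRIMARY KEY AUTOINCREMENT,
    response_name VARCHAR,
    prelisted_value VARCHAR)""")
con.executemany(
    "INSERT INTO responses (response_name, prelisted_value) VALUES (?, ?)",
    [("name%d" % i, "v") for i in range(5000)])

start = time.perf_counter()
with con:   # one transaction around all 5,000 updates
    con.executemany(
        "UPDATE responses SET response_name = ?, prelisted_value = ? "
        "WHERE response_oid = ?",
        [("new%d" % i, "w", i + 1) for i in range(5000)])
elapsed = time.perf_counter() - start
print("5000 updates in %.3f s" % elapsed)
```

Running the same harness against two SQLite builds is the quickest way to confirm whether the regression reproduces outside the original application.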
Re: [sqlite] Some pieces of the puzzle are coming together
On Sat, Jan 14, 2012 at 10:42 AM, Dan Kennedy wrote:
> On 01/14/2012 04:49 AM, John Elrick wrote:
>> So, obviously the problem is that _malloc is being called a much larger
>> number of times by 3.7.9 in this particular circumstance -- roughly 70
>> times as often. _memcpy demonstrates roughly 6 times as many calls. I ran
>> a test with a much smaller dataset tracking every call to malloc back to
>> the main source line. The results were as follows:
>>
>> 3.6.17:
>> _sqlite3_exec calls _malloc 1,101 times
>> _sqlite3_step calls _malloc 1,812 times
>>
>> 3.7.9:
>> _sqlite3_exec calls _malloc 65,227 times
>> _sqlite3_step calls _malloc 47,109 times
>>
>> At this point I've exhausted my avenues of research. Does anyone have any
>> further ideas as to what may be going on which could cause this increase
>> in memory management activity? Or any suggestions of where to look next?
>
> Is there a CREATE INDEX statement in your program? There was a change
> a few versions back that caused CREATE INDEX to make a lot more calls
> to malloc.
>
> If you build SQLite with SQLITE_OMIT_MERGE_SORT it bypasses the new
> code and builds indexes the old way - with fewer mallocs.

There are many CREATE INDEX statements, which are all run at the beginning
when the database is initially created. I can run a test with your suggestion
early next week.
Re: [sqlite] Some pieces of the puzzle are coming together
On Fri, Jan 13, 2012 at 5:58 PM, Richard Hipp wrote:
> On Fri, Jan 13, 2012 at 4:49 PM, John Elrick wrote:
>>
>> 3.6.17
>>
>> Procedure       % Time    Time       Calls
>> _sqlite3_step    58.4%    1.79       5,052
>> _memcpy          22.8%    0.69   1,342,957
>> _memset           7.8%    0.24     465,299
>> ...
>> _malloc           1.9%    0.06      95,505
>>
>>
>> 3.7.9
>>
>> Procedure       % Time    Time       Calls
>> _malloc          56.9%   44.39   6,975,335
>> _sqlite3_step    30.4%   23.68       5,052
>> _memcpy           4.8%    3.70   7,710,259
>>
>> Very curious
>
> SQLite has lots of interfaces that can be used to determine performance and
> status information. Some example code from the command-line shell that
> accesses this status information is here:
> http://www.sqlite.org/src/artifact/aa4183d4a5243d?ln=990-1076
>
> I would very much like to see some of these same stats from your
> application. Do you think you could add some code like that found in the
> shell and give us some more insight into what is going on?

Thanks for the direction. I'll start working on it on Monday.

John
Re: [sqlite] Some pieces of the puzzle are coming together
On Fri, Jan 13, 2012 at 5:20 PM, Pavel Ivanov wrote:
>> At this point I've exhausted my avenues of research. Does anyone have any
>> further ideas as to what may be going on which could cause this increase
>> in memory management activity? Or any suggestions of where to look next?
>
> IIRC, SQLite 3.7.9 changed its PCACHE interface and thus page cache
> implementation. It introduced one level of indirection into each page
> in the cache. But of course I don't think it should result in such a
> huge number of additional mallocs.
> Also, as you see the effect only inside Delphi and not in the command-line
> utility, it suggests to me that Delphi installed its own page cache. And
> as it uses the old (deprecated in 3.7.9) interface, it may have some bad
> interaction with the new SQLite.
>
> And just a thought: the number of calls to memcpy with 3.7.9 is larger
> than with 3.6.17 by roughly the same amount as the number of mallocs,
> which suggests that all the new calls are actually calls to realloc()
> (is the number of calls to free() also bigger, on the order of 6M?).

The number of calls to _free is 6,968,427 in 3.7.9.
[sqlite] sqlite3_set_auxdata & invalid pointer
Hello,

I am trying to add custom aggregate function support in a golang driver
(scalar functions are OK). While testing, I got this:

*** glibc detected *** ./6.out: realloc(): invalid pointer: 0x02daa1c5 ***
=== Backtrace: =
/lib/x86_64-linux-gnu/libc.so.6(+0x72656)[0x2b9a7b5da656]
/lib/x86_64-linux-gnu/libc.so.6(realloc+0x312)[0x2b9a7b5e0762]
/home/gwen/Test/sqlite-autoconf-3070900/.libs/libsqlite3.so.0(+0x30387)[0x2b9a7b2ec387]
/home/gwen/Test/sqlite-autoconf-3070900/.libs/libsqlite3.so.0(+0x16a5b)[0x2b9a7b2d2a5b]
/home/gwen/Test/sqlite-autoconf-3070900/.libs/libsqlite3.so.0(+0x234da)[0x2b9a7b2df4da]
/home/gwen/Test/sqlite-autoconf-3070900/.libs/libsqlite3.so.0(sqlite3_set_auxdata+0xb6)[0x2b9a7b2e2d86]

I just have enough skills to debug with gdb and to find this line:

62056: pVdbeFunc = sqlite3DbRealloc(pCtx->s.db, pVdbeFunc, nMalloc);

Could you please help me find what I am doing wrong? I just call
sqlite3_set_auxdata in my xStep function.

Thanks.
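For context on the distinction involved: `sqlite3_set_auxdata` is documented for caching metadata about (effectively constant) arguments of scalar functions, whereas per-aggregate working state belongs in `sqlite3_aggregate_context`. The Python-stdlib analogue below (my own illustration, not the Go driver in question) shows the same separation: the state lives with the aggregate object itself, one instance per aggregation:

```python
import sqlite3

# Illustration (stdlib analogue, not the Go driver from the post):
# per-aggregate state belongs with the aggregate -- the C-API
# counterpart is sqlite3_aggregate_context(), not sqlite3_set_auxdata().
class Product:
    def __init__(self):
        self.acc = 1          # state lives on the instance, one per aggregation

    def step(self, value):    # corresponds to the C xStep callback
        self.acc *= value

    def finalize(self):       # corresponds to the C xFinal callback
        return self.acc

con = sqlite3.connect(":memory:")
con.create_aggregate("product", 1, Product)
con.execute("CREATE TABLE nums (n INTEGER)")
con.executemany("INSERT INTO nums VALUES (?)", [(2,), (3,), (4,)])
result = con.execute("SELECT product(n) FROM nums").fetchone()[0]
print(result)   # -> 24
```

In C, the equivalent move is to allocate the accumulator via `sqlite3_aggregate_context(pCtx, sizeof(...))` inside xStep rather than calling `sqlite3_set_auxdata` there.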
Re: [sqlite] finding pairwise similarity
Mr. Puneet Kishor wrote:
> I have two tables like so
>
>    CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT);
>    CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT);
>    CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);
>
> Every row in `c` has one or more `t`. I want, in a single SQL, the following
>
>    source_c, target_c, similarity
>
> where `similarity` is a measure of overlap of `t` between each pair of `c`
> and is calculated as the number of `t` common between a given pair divided
> by the total number of unique `t` in the pair.

select src.c_no source_c, trg.c_no target_c,
  (select count(*) from (
     select t_no from c_t where c_t.c_no = src.c_no
     intersect
     select t_no from c_t where c_t.c_no = trg.c_no)) * 1.0 /
  (select count(*) from (
     select t_no from c_t where c_t.c_no = src.c_no
     union
     select t_no from c_t where c_t.c_no = trg.c_no)) similarity
from c src join c trg on (src.c_no < trg.c_no);

--
Igor Tandetnik
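The INTERSECT/UNION arithmetic here is exactly the Jaccard index of the two `t` sets. A small check with Python's built-in sqlite3 module (my own harness, not from the thread) applies the same arithmetic per pair with bound parameters on the sample data from the question:

```python
import sqlite3

# Assumed harness: Igor's INTERSECT/UNION arithmetic (the Jaccard
# index), applied to one pair at a time with bound parameters.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);
    INSERT INTO c_t VALUES (1,1),(1,2),(1,3),(2,1),(2,5);
""")

def jaccard(con, a, b):
    common = con.execute("""
        SELECT count(*) FROM (
            SELECT t_no FROM c_t WHERE c_no = ?
            INTERSECT
            SELECT t_no FROM c_t WHERE c_no = ?)""", (a, b)).fetchone()[0]
    total = con.execute("""
        SELECT count(*) FROM (
            SELECT t_no FROM c_t WHERE c_no = ?
            UNION
            SELECT t_no FROM c_t WHERE c_no = ?)""", (a, b)).fetchone()[0]
    return common / total

print(jaccard(con, 1, 2))   # {1,2,3} vs {1,5}: |{1}| / |{1,2,3,5}| -> 0.25
```

Note the result is 0.25 rather than the 0.2 in the original question; the follow-up post on this thread reads the question's 0.2 the same way.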
Re: [sqlite] SSD with TRIM
On Sat, Jan 14, 2012 at 11:52 PM, Simon Slavin wrote:
>
> On 14 Jan 2012, at 7:24pm, Max Vlasov wrote:
>
>> On Sat, Jan 14, 2012 at 6:12 AM, Simon Slavin wrote:
>>
>>> Fast. Fasty fast. Speed is high. INSERT, UPDATE, and DELETE all
>>> significantly faster. SELECT is a bit faster, but there's less
>>> difference.
>>
>> Simon, very interesting. Can you make some tests related to internal
>> fragmentation?
>>
>> INSERT INTO TestTable (id, text) VALUES (abs(random() % (1 << 48)),
>> '12345678901234567...') {the text is about 512 bytes long}
>
> If you want to supply the CREATE command and tell me that the '...' in the
> above means that that line should continue to 512 digits, I'm happy to run
> the above.

OK:

CREATE TABLE [TestTable] (
  [Id] INTEGER PRIMARY KEY AUTOINCREMENT,
  [Text] TEXT
)

The string just continues to be 512 bytes in length; I think it can be even
bigger. The idea is for the primary key to jump around heavily inside the
b-tree. As I understand it, the test should benefit from the famously
near-instant access time of an SSD, but on an Asus T91MT with an SSD I still
noticed degradation toward 50% of the job, when the cpu dropped from 80% to
50% (so it's not sqlite that was waiting). So either Windows 7 on a
two-year-old ssd system isn't optimized enough, or it's something else.

Max
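The test is straightforward to reproduce at a smaller scale. The sketch below (my own harness: 10,000 rows instead of 1,000,000, in-memory instead of on disk, and an assumed ~512-byte payload since the original elides the literal) keeps the essential ingredient, which is the randomly scattered primary keys forcing inserts all over the b-tree:

```python
import sqlite3
import time

# Scaled-down sketch of Max's fragmentation test (assumed harness).
# The scattered 48-bit random primary keys are the point of the test.
con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE TestTable (
    Id INTEGER PRIMARY KEY AUTOINCREMENT,
    Text TEXT)""")
payload = "1234567890" * 52          # ~512 bytes, per the post

t0 = time.perf_counter()
with con:
    # OR IGNORE guards against the (vanishingly rare) random-key collision
    con.executemany(
        "INSERT OR IGNORE INTO TestTable (Id, Text) "
        "VALUES (abs(random() % (1 << 48)), ?)",
        [(payload,) for _ in range(10000)])
n = con.execute("SELECT count(*) FROM TestTable").fetchone()[0]
print("inserted", n, "rows in %.2f s" % (time.perf_counter() - t0))
```

For the on-disk variant, replace `":memory:"` with a file path; that is where the SSD-vs-HDD difference the thread is after would show up.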
Re: [sqlite] SSD with TRIM
On 14 Jan 2012, at 7:24pm, Max Vlasov wrote:

> On Sat, Jan 14, 2012 at 6:12 AM, Simon Slavin wrote:
>
>> Fast. Fasty fast. Speed is high. INSERT, UPDATE, and DELETE all
>> significantly faster. SELECT is a bit faster, but there's less
>> difference.
>
> Simon, very interesting. Can you make some tests related to internal
> fragmentation? As an advanced user of sqlite, you probably will easily
> invent your own tests :), but if not, there is a test I used at least once:
>
> INSERT INTO TestTable (id, text) VALUES (abs(random() % (1 << 48)),
> '12345678901234567...') {the text is about 512 bytes long}

If you want to supply the CREATE command and tell me that the '...' in the
above means that that line should continue to 512 digits, I'm happy to run
the above.

I was running a real webserver app (Apache/PHP/SQLite). But I was debugging
it on a new laptop -- a MacBook Pro 2.4GHz Core i7 with SSD -- and noticed
an improvement in speed over when I debugged it with an older laptop with a
hard disk. But the older one has two cores instead of four, and the
motherboard architecture may not be as good. And the app does quite a few
different things, and there's a possible ethernet bandwidth bottleneck, so
it's not suitable as a benchtest. So the speeds are not directly comparable.

Simon.
Re: [sqlite] SSD with TRIM
On Sat, Jan 14, 2012 at 6:12 AM, Simon Slavin wrote:
>
> Fast. Fasty fast. Speed is high. INSERT, UPDATE, and DELETE all
> significantly faster. SELECT is a bit faster, but there's less difference.

Simon, very interesting. Can you make some tests related to internal
fragmentation? As an advanced user of sqlite, you probably will easily
invent your own tests :), but if not, there is a test I used at least once:

INSERT INTO TestTable (id, text) VALUES (abs(random() % (1 << 48)),
'12345678901234567...') {the text is about 512 bytes long}

1,000,000 inserts should create a 1 GB database, and the time with a regular
hard disk was above 15 minutes, maybe even half an hour. So I'm very
interested to see whether there has been any progress with overall ssd
performance.

Thanks,

Max
[sqlite] finding pairwise similarity
I have two tables like so

CREATE TABLE c (c_no INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE t (t_no INTEGER PRIMARY KEY, t_name TEXT);
CREATE TABLE c_t (c_no INTEGER, t_no INTEGER);

Every row in `c` has one or more `t`. I want, in a single SQL, the following

source_c, target_c, similarity

where `similarity` is a measure of overlap of `t` between each pair of `c`
and is calculated as the number of `t` common between a given pair divided
by the total number of unique `t` in the pair. For example, given the
following records

c    t
---  ---
1    1
1    2
1    3
2    1
2    5

I want the result to be

source_c  target_c  similarity
------------------------------
1         2         0.2  (= 1 common `t` between the 2 `c` / by total 4 `t`)

Would appreciate any nudge toward a solution.

--
Puneet Kishor
Re: [sqlite] Some pieces of the puzzle are coming together
On 01/14/2012 04:49 AM, John Elrick wrote:

> So, obviously the problem is that _malloc is being called a much larger
> number of times by 3.7.9 in this particular circumstance -- roughly 70
> times as often. _memcpy demonstrates roughly 6 times as many calls. I ran
> a test with a much smaller dataset tracking every call to malloc back to
> the main source line. The results were as follows:
>
> 3.6.17:
> _sqlite3_exec calls _malloc 1,101 times
> _sqlite3_step calls _malloc 1,812 times
>
> 3.7.9:
> _sqlite3_exec calls _malloc 65,227 times
> _sqlite3_step calls _malloc 47,109 times
>
> At this point I've exhausted my avenues of research. Does anyone have any
> further ideas as to what may be going on which could cause this increase
> in memory management activity? Or any suggestions of where to look next?

Is there a CREATE INDEX statement in your program? There was a change
a few versions back that caused CREATE INDEX to make a lot more calls
to malloc.

If you build SQLite with SQLITE_OMIT_MERGE_SORT it bypasses the new
code and builds indexes the old way - with fewer mallocs.

Dan.
Re: [sqlite] Slow commits
Hi Normand,

Just for sanity, check whether Windows is not creating previous versions for
you: right-click on the file in Explorer -> Properties -> Previous Versions.

Also check whether caching is enabled on your HDD: Control Panel -> Device
Manager -> Disk Drives -> (click on your HDD), then go to the "Policies" tab
(should be second). There are two settings: "Enable write caching on disk"
and "Turn off Windows write-cache buffer flushing on device". The first one
should probably be checked, but the second one (if checked) would be more
dangerous, though faster (at least for me). It depends how precious your
data is. Ours can be regenerated, and taken back from P4, and at worst I can
lose some hours of work (programming).

It's also a good idea to check with SysInternals ProcMon what exactly is
going on. Another good tool is SysInternals RamMap - with it you can check
what portions of the file are in the file-cache (meta-data). You can also
flush the cache with it, and test various scenarios.

Cheers,
Dimiter 'malkia' Stanev,
Senior Software Engineer,
Treyarch, Activision.

On 1/13/2012 3:56 PM, Normand Mongeau wrote:
> On 2012-01-13 18:10, Roger Binns wrote:
>> On 13/01/12 14:35, Normand Mongeau wrote:
>>> It gets worse. On a clean empty database, the same 534 transactions
>>> take 140 seconds. That's a not very impressive rate of 3.8 inserts per
>>> second. The FAQ says that SQLite should be able to do a "few dozen
>>> transactions per second". I'd be happy to see that.
>>
>> What file extension are you using for the database? There is a long list
>> of extensions that System Restore monitors, and makes backups of the
>> files as they change. This will kill your performance.
>>
>> Extension list and terse details are at:
>> http://msdn.microsoft.com/en-us/library/Aa378870
>>
>> Roger
>
> I was using .db as an extension, and changed it to something ludicrous,
> but it didn't make a difference. Good idea though.
>
> Thanks,
>
> Normand

___
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
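Alongside the OS-side checks above, there are SQLite-side knobs for the same trade-off. The sketch below (general SQLite tuning, not advice given in this thread; `tune_demo.db` is a hypothetical filename) shows the usual combination when, as here, the data can be regenerated: WAL journaling plus `synchronous=NORMAL`, which cuts the per-commit fsync cost sharply:

```python
import os
import sqlite3
import tempfile

# General SQLite-side tuning sketch (assumed workload, hypothetical file):
# trade a little durability for much cheaper commits.
path = os.path.join(tempfile.mkdtemp(), "tune_demo.db")
con = sqlite3.connect(path)

# WAL appends to a log instead of rewriting the rollback journal per commit
mode = con.execute("PRAGMA journal_mode = WAL").fetchone()[0]
# NORMAL (1) syncs at checkpoints rather than on every commit
con.execute("PRAGMA synchronous = NORMAL")
sync = con.execute("PRAGMA synchronous").fetchone()[0]
print(mode, sync)
```

With a rollback journal and `synchronous=FULL` (the defaults of that era), each of the 534 transactions pays for multiple syncs, which matches the "few dozen transactions per second" figure the FAQ quotes for spinning disks.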
[sqlite] Bug: "Invalid statement in fillWindow()" from Explain query plan using subquery
Using SQLite 3.7.4 on Android. To reproduce:

create table S (id string, name string);
create table F (id string, sid string);
explain query plan
  select (select count(*) from F where F.sid=S.id) as FCount, S.name from S;

Result: empty cursor, with the log showing:

SqliteCursor.cpp Invalid statement in fillWindow()

Just running the select, without explain query plan, seems to be OK. Could
not find anything like this among existing bugs.

Luke
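Since the failure is in Android's cursor wrapper (`fillWindow()`), it is worth separating the query from the wrapper. The sketch below (my own check with Python's stdlib sqlite3 and assumed sample rows; it says nothing about the Android layer) shows the plain SELECT is well-formed on stock SQLite:

```python
import sqlite3

# Assumed sample data; schema taken from the bug report.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE S (id TEXT, name TEXT);
    CREATE TABLE F (id TEXT, sid TEXT);
    INSERT INTO S VALUES ('s1', 'alpha'), ('s2', 'beta');
    INSERT INTO F VALUES ('f1', 's1'), ('f2', 's1'), ('f3', 's2');
""")
rows = con.execute("""
    SELECT (SELECT count(*) FROM F WHERE F.sid = S.id) AS FCount, S.name
    FROM S ORDER BY S.name""").fetchall()
print(rows)   # -> [(2, 'alpha'), (1, 'beta')]
```

That EXPLAIN QUERY PLAN returns rows with a different column shape than the query itself is a plausible reason a cursor-window wrapper chokes on it while the bare SELECT works.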
Re: [sqlite] Using non-aggregate columns in group by (with an additional issue)
A very minor change on your query is all that is needed:

select max(x.timestamp), x.value, x.person
from table as x,
     (select person, max(value) as maxvalue from table
      group by person order by maxvalue desc) as y
where x.person = y.person
  and x.value = y.maxvalue
group by x.person, x.value

I don't know how this compares in terms of performance with Igor's solution.

--David Garfield

On Fri, Jan 13, 2012 at 06:07, Dilip Ranganathan wrote:
> I have a table that looks something like this:
>
>    timestamp            value  person
>    ==================================
>    2010-01-12 00:00:00  33     emp1
>    2010-01-12 11:00:00  22     emp1
>    2010-01-12 09:00:00  16     emp2
>    2010-01-12 08:00:00  16     emp2
>    2010-01-12 12:12:00  45     emp3
>    2010-01-12 13:44:00  64     emp4
>    2010-01-12 06:00:00  33     emp1
>    2010-01-12 15:00:00  12     emp5
>
> I wanted to find the maximum value associated with each person. The
> obvious query was:
>
>    select person, max(value) from table group by person
>
> Now I wanted to include the timestamp associated with each max(value). I
> could not use the timestamp column in the above query because, as everyone
> knows, it won't appear in the group by clause. So I wrote this instead:
>
>    select x.timestamp, x.value, x.person from table as x,
>    (select person, max(value) as maxvalue from table group by person
>     order by maxvalue desc) as y
>    where x.person = y.person
>    and x.value = y.maxvalue
>
> This works -- to an extent. I now see:
>
>    timestamp            value  person
>    ==================================
>    2010-01-12 13:44:00  64     emp4
>    2010-01-12 12:12:00  45     emp3
>    2010-01-12 06:00:00  33     emp1
>    2010-01-12 00:00:00  33     emp1
>    2010-01-12 08:00:00  16     emp2
>    2010-01-12 09:00:00  16     emp2
>    2010-01-12 15:00:00  12     emp5
>
> The problem is now I get all the entries for emp1 and emp2 that end up
> with the same max(value).
>
> Suppose among emp1 and emp2, I only want to see the entry with the latest
> timestamp. IOW, I want this:
>
>    timestamp            value  person
>    ==================================
>    2010-01-12 13:44:00  64     emp4
>    2010-01-12 12:12:00  45     emp3
>    2010-01-12 06:00:00  33     emp1
>    2010-01-12 09:00:00  16     emp2
>    2010-01-12 15:00:00  12     emp5
>
> What kind of query would I have to write? Is it possible to extend the
> nested query I wrote to achieve what I want, or does one have to rewrite
> everything from scratch?
>
> If it's important, timestamps are actually stored as julian days. I use
> the datetime() function to convert them back to a string representation
> in every query.
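The suggested `max(x.timestamp) ... GROUP BY x.person, x.value` tie-break is easy to verify on the sample rows. A sketch with Python's stdlib sqlite3 (my own harness; the table is named `t` here since `table` is a reserved word, and text timestamps stand in for the julian-day storage the question mentions, purely for readability):

```python
import sqlite3

# Assumed harness: sample rows from the question, text timestamps
# instead of julian days.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (ts TEXT, value INTEGER, person TEXT)")
con.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    ("2010-01-12 00:00:00", 33, "emp1"),
    ("2010-01-12 11:00:00", 22, "emp1"),
    ("2010-01-12 09:00:00", 16, "emp2"),
    ("2010-01-12 08:00:00", 16, "emp2"),
    ("2010-01-12 12:12:00", 45, "emp3"),
    ("2010-01-12 13:44:00", 64, "emp4"),
    ("2010-01-12 06:00:00", 33, "emp1"),
    ("2010-01-12 15:00:00", 12, "emp5"),
])
rows = con.execute("""
    SELECT max(x.ts) AS ts, x.value, x.person
    FROM t AS x
    JOIN (SELECT person, max(value) AS maxvalue
          FROM t GROUP BY person) AS y
      ON x.person = y.person AND x.value = y.maxvalue
    GROUP BY x.person, x.value
    ORDER BY x.value DESC""").fetchall()
for r in rows:
    print(r)
```

The output keeps one row per person, with emp1 at 06:00:00 and emp2 at 09:00:00 - the latest timestamps among the tied max values, exactly the five rows the question asks for.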