> > At first I was just doing something like this pseudocode:
> >
> >     update_counter(k1, k2, count=count+1, expires=now+count*1day)
> >     if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
>
> Assuming these 2 statements constitute each of the 10k-100k steps you
> mentioned above, and all of these steps are wrapped up in a
> BEGIN-COMMIT block, this is probably the most efficient way of doing
> this. The only improvement could be if you are doing creates more
> often than updates. Then you can switch and do the INSERT first and
> then the UPDATE if necessary. It could gain you a little time.

Yeah. I even tried keeping track of how many hits/misses I had and
re-ordering the attempted INSERT/UPDATE as appropriate. A batch of 100k
of these is done in a single transaction.
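
For concreteness, each batch is roughly of this shape (a minimal Python
sqlite3 sketch, assuming a counters(k1, k2, count, expires) table and
the 1-day expiry from the pseudocode above; everything beyond that is
illustrative):

    import sqlite3
    import time

    ONE_DAY = 86400  # seconds

    def apply_batch(conn, batch):
        """Apply one batch of counter bumps as a single transaction.

        batch is an iterable of (k1, k2) key pairs. With the default
        isolation level, sqlite3 opens one implicit transaction at the
        first statement and conn.commit() closes it, so the whole
        batch runs inside one BEGIN-COMMIT block.
        """
        now = int(time.time())
        cur = conn.cursor()
        for k1, k2 in batch:
            # UPDATE first: the common case once most key pairs exist.
            # Note (count + 1): column references on the right-hand
            # side of an UPDATE see the pre-update values.
            cur.execute(
                "UPDATE counters SET count = count + 1,"
                " expires = ? + (count + 1) * ?"
                " WHERE k1 = ? AND k2 = ?",
                (now, ONE_DAY, k1, k2))
            if cur.rowcount != 1:
                # No existing row for this key pair: create it.
                cur.execute(
                    "INSERT INTO counters (k1, k2, count, expires)"
                    " VALUES (?, ?, 1, ?)",
                    (k1, k2, now + ONE_DAY))
        conn.commit()

Swapping the two statements (INSERT first, UPDATE only when the INSERT
hits an existing key) is the hit/miss re-ordering mentioned above; it
only pays off when creates outnumber updates.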

> > but was having serious performance problems that seemed to be
> > confined to those lines. So I converted it to INSERT OR REPLACE,
> > which had no noticeable impact on performance.
>
> Actually my understanding would suggest that INSERT OR REPLACE should
> execute slower than UPDATE + INSERT (or INSERT + UPDATE). […]
>
> > Convinced the problem was in my code, I decided to offload as much
> > as possible to sqlite. Now my code looks like:
>
> This should be much, much slower than UPDATE + INSERT.

That's unfortunate, because the overall performance was about the same
(±10%) between all three approaches :(

> First of all, in the statement above you don't gain benefit from
> uniqueness and replace about 10k rows twice.

Are you sure? The SELECT in the INSERT OR REPLACE selects "FROM
trans_counters_v AS c", the grouped temporary view, so it should only
see any given key pair once before it starts doing any inserting at
all.

> Second, with such low repeatability you don't gain much from doing it
> with such a complicated INSERT. And about journal size: imagine that
> you've got "lucky" and all those 94k rows are each in its own page in
> the counters table. SQLite will have to save each of those pages in
> the journal, which will give a journal size of about
> 94k * 4096 ~ 400M.

I hadn't thought about it that way; that's true. And it's probably
wildly seeking all over the disk to do it. The reads are probably fine
because the machine has plenty of RAM to devote to the page cache; it's
the random writes that are killing it.

> I don't think there's anything better than what you did initially.

As for the fundamental approach, I figured as much. The rearrangement
into the giant INSERT OR REPLACE was just to prove to myself that the
problem wasn't elsewhere in my code.

For optimising it on the sqlite front, I've played with page sizes,
journaling modes, and changing the transaction batch size, without much
luck. I don't have strong consistency requirements for e.g. power
failures or OS crashes, but I do need an application crash to not take
the database out, so I can't just go without the journal altogether
(which does help the problem, but isn't huge).
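
For what it's worth, the knobs being tuned here are all per-connection
PRAGMAs; a hedged sketch, with illustrative values and a placeholder
filename rather than recommendations:

    import sqlite3

    conn = sqlite3.connect("counters.db")
    cur = conn.cursor()

    # page_size only takes effect on a brand-new database file (or
    # after a VACUUM).
    cur.execute("PRAGMA page_size = 8192")

    # Keep a rollback journal so an application crash cannot corrupt
    # the database. PERSIST (or TRUNCATE) avoids deleting and
    # re-creating the journal file on every transaction, unlike the
    # default DELETE mode. MEMORY and OFF are ruled out here, since
    # they give up exactly the application-crash safety needed above.
    cur.execute("PRAGMA journal_mode = PERSIST")

    # synchronous = OFF skips the fsync()s: durability across power
    # failures and OS crashes is lost, but a plain application crash
    # is still safe, which matches the stated requirement.
    cur.execute("PRAGMA synchronous = OFF")

    # A larger page cache (measured in pages) keeps more of the
    # counters table in memory while the batch's scattered writes are
    # applied.
    cur.execute("PRAGMA cache_size = 100000")

None of these change the underlying random-write pattern against the
counters table, which may be why tuning them hasn't helped much.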