>> First of all, in the statement above you don't gain any benefit from
>> uniqueness and you replace about 10k rows twice.
>
> Are you sure? The SELECT in the INSERT OR REPLACE selects "FROM 
> trans_counters_v AS c", the grouped temporary view. So it should only see any 
> given key pair once before it starts doing any inserting at all.

Sorry, you are right. I missed the GROUP BY part...


Pavel


On Tue, Mar 12, 2013 at 11:03 PM, David King <dk...@ketralnis.com> wrote:
>> > At first I was just doing something like this pseudocode:
>> > update_counter(k1, k2, count=count+1, expires=now+count*1day)
>> > if rows_updated != 1: insert_counter(k1, k2, count=1, expires=now+1day)
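
(A minimal sketch of that pattern using Python's sqlite3 module, assuming a
hypothetical counters(k1, k2, count, expires) table with (k1, k2) as its
primary key; the names and schema here are illustrative, not the original:)

    import sqlite3

    def bump_counter(conn, k1, k2):
        # Try the UPDATE first. SET expressions see the pre-update row,
        # so count + 1 is the new count.
        cur = conn.execute(
            "UPDATE counters"
            "   SET count = count + 1,"
            "       expires = datetime('now', '+' || (count + 1) || ' days')"
            " WHERE k1 = ? AND k2 = ?", (k1, k2))
        if cur.rowcount != 1:  # nothing updated -> this key pair is new
            conn.execute(
                "INSERT INTO counters (k1, k2, count, expires)"
                " VALUES (?, ?, 1, datetime('now', '+1 day'))", (k1, k2))

    conn = sqlite3.connect("counters.db")
    conn.execute("CREATE TABLE IF NOT EXISTS counters ("
                 " k1 TEXT, k2 TEXT, count INTEGER, expires TEXT,"
                 " PRIMARY KEY (k1, k2))")
    with conn:  # one BEGIN ... COMMIT around the whole batch
        for k1, k2 in [("a", "x"), ("a", "y"), ("a", "x")]:
            bump_counter(conn, k1, k2)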
>>
>> Assuming these two statements constitute each of the 10k-100k steps you
>> mentioned above, and all of these steps are wrapped in a BEGIN-COMMIT
>> block, this is probably the most efficient way of doing it. The only
>> improvement would be if you are doing creates more often than updates:
>> then you could switch and do the INSERT first and the UPDATE only if
>> necessary. That could gain you a little time.
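
(The INSERT-first variant, reusing the assumed schema and connection from the
sketch above, would look roughly like this; it only pays off when most key
pairs in a batch are new:)

    import sqlite3

    def bump_counter_insert_first(conn, k1, k2):
        try:
            # Optimistically assume the key pair doesn't exist yet.
            conn.execute(
                "INSERT INTO counters (k1, k2, count, expires)"
                " VALUES (?, ?, 1, datetime('now', '+1 day'))", (k1, k2))
        except sqlite3.IntegrityError:
            # Row already exists: fall back to the UPDATE. The failed INSERT
            # aborts only that statement, not the enclosing transaction.
            conn.execute(
                "UPDATE counters"
                "   SET count = count + 1,"
                "       expires = datetime('now', '+' || (count + 1) || ' days')"
                " WHERE k1 = ? AND k2 = ?", (k1, k2))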
>
>
> Yeah. I even tried keeping track of how many hits/misses I had and 
> re-ordering the attempted INSERT/UPDATE as appropriate. A batch of 100k of 
> these is done in a single transaction.
>
>> > but was having serious performance problems that seemed to be confined to 
>> > those lines. So I converted it to INSERT OR REPLACE, which had no 
>> > noticeable impact on performance.
>> Actually, my understanding would suggest that INSERT OR REPLACE should
>> execute slower than UPDATE + INSERT (or INSERT + UPDATE).
>
> […]
>> > Convinced the problem was in my code, I decided to offload as much as 
>> > possible to SQLite. Now my code looks like:
>>
>> This should be much, much slower than UPDATE + INSERT.
>
>
> That's unfortunate, because the overall performance was about the same (±10%) 
> across all three approaches :(
>
>> First of all, in the statement above you don't gain any benefit from
>> uniqueness and you replace about 10k rows twice.
>
>
> Are you sure? The SELECT in the INSERT OR REPLACE selects "FROM 
> trans_counters_v AS c", the grouped temporary view. So it should only see any 
> given key pair once before it starts doing any inserting at all.
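
(The actual statement isn't quoted in this message, but an INSERT OR REPLACE
driven off a grouped temporary view, of the shape being discussed, might look
roughly like the following. Everything beyond the trans_counters_v and
counters names is a guess, including the pending_increments staging table:)

    # Hypothetical reconstruction, reusing conn and counters from the sketches
    # above; not the poster's actual code.
    conn.executescript("""
        CREATE TEMP TABLE IF NOT EXISTS pending_increments (k1 TEXT, k2 TEXT);

        -- collapse the pending increments so each (k1, k2) pair appears once
        CREATE TEMP VIEW IF NOT EXISTS trans_counters_v AS
            SELECT k1, k2, COUNT(*) AS n
              FROM pending_increments
             GROUP BY k1, k2;

        INSERT OR REPLACE INTO counters (k1, k2, count, expires)
            SELECT c.k1,
                   c.k2,
                   COALESCE(o.count, 0) + c.n,
                   datetime('now', '+' || (COALESCE(o.count, 0) + c.n) || ' days')
              FROM trans_counters_v AS c
              LEFT JOIN counters AS o ON o.k1 = c.k1 AND o.k2 = c.k2;
    """)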
>
>> Second, with such low repeatability you don't gain much from doing it with
>> such a complicated INSERT. And about journal size: imagine that you've got
>> "lucky" and all those 94k rows are each in its own page of the counters
>> table. SQLite will have to save each of those pages in the journal, which
>> gives a journal size of about 94k * 4096 ~ 400 MB.
>
>
> I hadn't thought about it that way; that's true. And it's probably wildly 
> seeking all over the disk to do it. The reads are probably fine because the 
> machine has plenty of RAM to devote to the page cache; it's the random 
> writes that are killing it.
>
>> I don't think there's anything better than what you did initially.
>
> As for the fundamental approach, I figured as much. The rearrangement into 
> the giant INSERT OR REPLACE was just to prove to myself that the problem 
> wasn't elsewhere in my code.
>
> For optimising it on the SQLite front, I've played with page sizes, 
> journaling modes, and changing the transaction batch size, without much 
> luck. I don't have strong consistency requirements for e.g. power failures 
> or OS crashes, but I do need an application crash not to take it out, so I 
> can't just go without the journal altogether (which does help the problem, 
> but isn't huge).
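
(One combination that fits those requirements is to keep a journal for
application-crash safety but drop the fsyncs that only matter for power loss
or OS crashes; the specific values below are just a starting point, not
something from this thread:)

    import sqlite3

    conn = sqlite3.connect("counters.db")
    # WAL appends changed pages to a log instead of rewriting the database
    # in place, so the random writes become mostly sequential.
    conn.execute("PRAGMA journal_mode=WAL")
    # No fsync: survives an application crash (the OS still has the data),
    # but a power failure or OS crash could corrupt the database.
    conn.execute("PRAGMA synchronous=OFF")
    # ~256 MiB of page cache (a negative value is interpreted as KiB).
    conn.execute("PRAGMA cache_size=-262144")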
>
>
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users