[h2] Re: compactRewriteFully safety with DB writes

2021-07-25 Thread Matthew Phillips
Hi Andrei,

sorry to be unclear. I'll try to summarise:

* I have a small file-based MVStore (~30MB on disk with LZW).
* For the reasons described previously, I can't estimate the memory size of 
entry values in any meaningful way, so instead the size reported for entry 
values is fixed at 1024 and the cache is made very large. The idea is that 
entries read from the store and entries added to the store are never evicted 
from the cache; instead we flush it manually every now and then. This should 
work, as the dataset is far smaller than the available heap.
* This approach does work stably with 1.4.197, staying at around 150MB of heap 
usage over a long period of many writes, reads, and deletes.
* When switching to 1.4.200, the same approach results in 800MB+ of heap 
usage, and growing. I stopped the experiment when I started getting low-heap 
warnings (> 90% usage).
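For readers unfamiliar with the trick above, the accounting can be sketched in plain Java (illustrative only, not H2 or the poster's code; the class name, map type, and payload sizes are made up, while the 1024 fixed cost and ~32GB budget mirror the numbers in this thread):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// When every entry reports a fixed cost of 1024 bytes and the cache budget is
// huge, a size-bounded cache effectively never evicts: the accounted total
// stays far below the limit regardless of the entries' real heap footprint.
public class FixedCostCache {
    static final int FIXED_ENTRY_COST = 1024;              // what getMemory() reports
    static final long CACHE_LIMIT = 32_000L * 1024 * 1024; // ~32 GB "budget"

    public static void main(String[] args) {
        Map<String, byte[]> cache = new LinkedHashMap<>();
        long accountedBytes = 0;
        for (int i = 0; i < 100_000; i++) {
            cache.put("key-" + i, new byte[64]); // real size is never consulted
            accountedBytes += FIXED_ENTRY_COST;
        }
        // 100k entries * 1024 bytes = ~100 MB accounted, far below ~32 GB,
        // so nothing would ever be evicted.
        System.out.println(accountedBytes < CACHE_LIMIT); // prints "true"
        System.out.println(cache.size());                 // prints "100000"
    }
}
```

The catch, as discussed below, is that this only stays bounded if everything accounted for (including entries for since-deleted keys) actually leaves the heap when the cache is flushed.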

Looking at the code for the LIRS cache, I can see it has gained weak 
references, as you pointed out, so I now wonder whether the JVM in my 
experiment would have taken action at some point, perhaps clearing those 
references and reducing heap usage. However, I'd like to understand where all 
that heap is going, and whether I can avoid triggering those warnings.

One thing that may be atypical in this workload is that there are a *lot* of 
deletes, typically balanced by a lot of adds (the total data size tends to 
stay the same over time). If those deleted entries stay on the heap, that 
would definitely explain this.

Hope that's clearer.

Cheers,

Matt.

On Sunday, July 25, 2021 at 11:55:25 PM UTC+9:30 andrei...@gmail.com wrote:

> Hi Matt,
>
> After reading your last message, I still fail to understand what exactly 
> "no longer works" with 1.4.200.
> If your concern is the heap usage increase, I would say it should never be 
> a problem on its own, and in this case it is kind of expected, because the 
> cache may now keep weak references to items that were simply dropped in the 
> previous version.
> You also mention an attempt to run an in-memory configuration; does it fail 
> with 1.4.200?
>
> On Sunday, July 25, 2021 at 2:41:45 AM UTC-4 matt...@gmail.com wrote:
>
>> Sorry to keep replying to myself, but another question.
>>
>> After trialling the latest H2 (1.4.200) with our existing data, I saw 
>> heap usage shoot up from the 150-200MB where it hovers with 1.4.197 to 
>> 840MB and rising after a fairly short run time. 
>>
>> Now, I expect this is entirely caused by my setup, since (a) our 
>> DataType.getMemory() simply returns 1024, and (b) our MVStore cache size 
>> is set to 32000 (~32GB). I chose this due to the above issues with 
>> object-size estimation, so that data would effectively never leave the 
>> cache, and we manage this long-term by flushing the cache every 24 hours.
>>
>> Since the dataset in this case fits in a 35MB .mv file (it is compressed 
>> with LZW, though), the all-in-memory idea seemed reasonable, and indeed it 
>> ran stably with 1.4.197. 
>>
>> I can see from the source code for Page and MVMap that a lot of work has 
>> gone into this area, but after some time looking at it I'm still not sure 
>> how to proceed, other than trying other numbers for getMemory() and/or the 
>> cache size.
>>
>> Do you have any ideas why my hacky solution no longer works and/or 
>> suggestions on how I might approach making it work with 1.4.200?
>>
>> Cheers,
>>
>> Matt.
>>
>> On Monday, July 19, 2021 at 11:24:30 AM UTC+9:30 Matthew Phillips wrote:
>>
>>> Hi Andrei,
>>>
>>> thanks again. We have a use case that's very heavy on writes/deletes, and 
>>> the DB was apparently getting very fragmented. Using regular compact() 
>>> with an 80 fill rate has halved the size, and we've done away with the 
>>> nightly "big compact" task. So for this use case it's working very well 
>>> so far.
>>>
>>> The need for a cache flush comes from the fact that we store many 
>>> versions of the same data, most of them as small diffs against a previous 
>>> version. We reconstitute these versions into Clojure persistent maps, 
>>> which means there's a very high level of structural sharing between 
>>> objects. So when adding two maps A & B to the cache, the memory used by 
>>> those maps is almost certainly nothing like getMemory(A) + getMemory(B), 
>>> due to (probable) structural sharing between them. 
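For other readers, the double-counting problem can be shown with a minimal Java sketch (illustrative only; plain HashMap copies stand in for Clojure's persistent maps, and all names and sizes are made up):

```java
import java.util.HashMap;
import java.util.Map;

// Two "versions" of a map share almost all of their value objects, so summing
// a per-map size estimate counts nearly every payload twice. Persistent
// (immutable) maps make this kind of sharing pervasive.
public class SharingDemo {
    public static void main(String[] args) {
        Map<String, byte[]> v1 = new HashMap<>();
        for (int i = 0; i < 1000; i++) {
            v1.put("k" + i, new byte[1024]);
        }
        Map<String, byte[]> v2 = new HashMap<>(v1); // copies entries, not values
        v2.put("k0", new byte[1024]);               // only one key actually differs

        int shared = 0;
        for (String k : v1.keySet()) {
            if (v1.get(k) == v2.get(k)) {           // same object, not an equal copy
                shared++;
            }
        }
        // 999 of 1000 payloads are the same objects, so a naive
        // sizeOf(v1) + sizeOf(v2) would count ~1 MB of shared data twice.
        System.out.println(shared); // prints "999"
    }
}
```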
>>>
>>> We did try some heuristic workarounds, but were always off by so much 
>>> that performance suffered badly (either flushing the cache when we didn't 
>>> need to, or writing too often). So, we use an effectively infinite cache, 
>>> flush overnight and let it re-fill. Not ideal, but don't have a better 
>>> solution right now.
>>>
>>> Cheers,
>>>
>>> Matt.
>>> On Sunday, July 18, 2021 at 11:49:43 PM UTC+9:30 andrei...@gmail.com 
>>> wrote:
>>>
 Hi Matt,
 IMHO, the best way to compact will be the off-line one - 
 MVStoreTool.compact() - and it can take only seconds (your mileage may 
 vary, of course).
 If you can not afford 1 min off-line interruption, then you 

[h2] Re: compactRewriteFully safety with DB writes

2021-07-25 Thread Andrei Tokar
Hi Matt,

After reading your last message, I still fail to understand what exactly 
"no longer works" with 1.4.200.
If your concern is the heap usage increase, I would say it should never be a 
problem on its own, and in this case it is kind of expected, because the 
cache may now keep weak references to items that were simply dropped in the 
previous version.
You also mention an attempt to run an in-memory configuration; does it fail 
with 1.4.200?

On Sunday, July 25, 2021 at 2:41:45 AM UTC-4 matt...@gmail.com wrote:

> Sorry to keep replying to myself, but another question.
>
> After trialling the latest H2 (1.4.200) with our existing data, I saw heap 
> usage shoot up from the 150-200MB where it hovers with 1.4.197 to 840MB and 
> rising after a fairly short run time. 
>
> Now, I expect this is entirely caused by my setup, since (a) our 
> DataType.getMemory() simply returns 1024, and (b) our MVStore cache size is 
> set to 32000 (~32GB). I chose this due to the above issues with object-size 
> estimation, so that data would effectively never leave the cache, and we 
> manage this long-term by flushing the cache every 24 hours.
>
> Since the dataset in this case fits in a 35MB .mv file (it is compressed 
> with LZW, though), the all-in-memory idea seemed reasonable, and indeed it 
> ran stably with 1.4.197. 
>
> I can see from the source code for Page and MVMap that a lot of work has 
> gone into this area, but after some time looking at it I'm still not sure 
> how to proceed, other than trying other numbers for getMemory() and/or the 
> cache size.
>
> Do you have any ideas why my hacky solution no longer works and/or 
> suggestions on how I might approach making it work with 1.4.200?
>
> Cheers,
>
> Matt.
>
> On Monday, July 19, 2021 at 11:24:30 AM UTC+9:30 Matthew Phillips wrote:
>
>> Hi Andrei,
>>
>> thanks again. We have a use case that's very heavy on writes/deletes, and 
>> the DB was apparently getting very fragmented. Using regular compact() with 
>> an 80 fill rate has halved the size, and we've done away with the nightly 
>> "big compact" task. So for this use case it's working very well so far.
>>
>> The need for a cache flush comes from the fact that we store many versions 
>> of the same data, most of them as small diffs against a previous version. 
>> We reconstitute these versions into Clojure persistent maps, which means 
>> there's a very high level of structural sharing between objects. So when 
>> adding two maps A & B to the cache, the memory used by those maps is 
>> almost certainly nothing like getMemory(A) + getMemory(B), due to 
>> (probable) structural sharing between them. 
>>
>> We did try some heuristic workarounds, but were always off by so much 
>> that performance suffered badly (either flushing the cache when we didn't 
>> need to, or writing too often). So, we use an effectively infinite cache, 
>> flush overnight and let it re-fill. Not ideal, but don't have a better 
>> solution right now.
>>
>> Cheers,
>>
>> Matt.
>> On Sunday, July 18, 2021 at 11:49:43 PM UTC+9:30 andrei...@gmail.com 
>> wrote:
>>
>>> Hi Matt,
>>> IMHO, the best way to compact will be the off-line one - 
>>> MVStoreTool.compact() - and it can take only seconds (your mileage may 
>>> vary, of course).
>>> If you can not afford a 1 min off-line interruption, then you can try 
>>> just to let it run and do its own maintenance in the background (assuming 
>>> autoCommitDelay > 0).
>>> If I only knew some "best/better" way for on-line compaction, it would 
>>> probably be there already, as a background maintenance procedure.
>>> I expect that the existing one will fit the bill, unless your update 
>>> rate is quite high.
>>> BTW, flushing the cache looks like a futile exercise, indeed.
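For anyone following along, the off-line route mentioned above looks roughly like this (a sketch, not a tested program: the file name is illustrative, the H2 jar must be on the classpath, the store must not be open in any other process, and the exact overloads should be checked against the MVStoreTool javadoc for your H2 version):

```java
import org.h2.mvstore.MVStoreTool;

public class CompactOffline {
    public static void main(String[] args) {
        // Rewrites the store file; the boolean requests a compressed result.
        MVStoreTool.compact("data.mv", true);
    }
}
```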
>>>
>>> On Thursday, July 15, 2021 at 7:36:05 PM UTC-4 matt...@gmail.com wrote:
>>>
 Hello Andrei,

 thanks very much for your reply. 

 Yes, I'm aware I'm on an old version: if it's not broken, don't fix it 
 ;) Version 1.4.197 has been rock-solid for us for years, and I'm always 
 loath to change things for no reason. But you have given me a good reason, 
 so I'll give the latest MVStore a try.

 Can you recommend the best way to 'manually' compact the database in 
 the latest release?

 And just to be sure: could there be any data-loss issues from flushing 
 the cache?

 Cheers,

 Matt.

 On Friday, July 16, 2021 at 4:45:12 AM UTC+9:30 andrei...@gmail.com 
 wrote:

> Hi Matt,
>
> If you are experiencing a problem which looks and smells like a 
> concurrency issue, then there is definitely a good reason to suspect a 
> concurrency issue. 8-)
> The real question here is: if you care enough about those problems, why 
> are you still on version 1.4.197? MVStore's concurrency / 
> synchronization was totally re-designed since then (and we are talking 
> years here), for example you will not even find 
> 

[h2] Re: compactRewriteFully safety with DB writes

2021-07-25 Thread Matthew Phillips
Sorry to keep replying to myself, but another question.

After trialling the latest H2 (1.4.200) with our existing data, I saw heap 
usage shoot up from the 150-200MB where it hovers with 1.4.197 to 840MB and 
rising after a fairly short run time. 

Now, I expect this is entirely caused by my setup, since (a) our 
DataType.getMemory() simply returns 1024, and (b) our MVStore cache size is 
set to 32000 (~32GB). I chose this due to the above issues with object-size 
estimation, so that data would effectively never leave the cache, and we 
manage this long-term by flushing the cache every 24 hours.

Since the dataset in this case fits in a 35MB .mv file (it is compressed 
with LZW, though), the all-in-memory idea seemed reasonable, and indeed it 
ran stably with 1.4.197. 

I can see from the source code for Page and MVMap that a lot of work has 
gone into this area, but after some time looking at it I'm still not sure 
how to proceed, other than trying other numbers for getMemory() and/or the 
cache size.

Do you have any ideas why my hacky solution no longer works and/or 
suggestions on how I might approach making it work with 1.4.200?

Cheers,

Matt.

On Monday, July 19, 2021 at 11:24:30 AM UTC+9:30 Matthew Phillips wrote:

> Hi Andrei,
>
> thanks again. We have a use case that's very heavy on writes/deletes, and 
> the DB was apparently getting very fragmented. Using regular compact() with 
> an 80 fill rate has halved the size, and we've done away with the nightly 
> "big compact" task. So for this use case it's working very well so far.
>
> The need for a cache flush comes from the fact that we store many versions 
> of the same data, most of them as small diffs against a previous version. 
> We reconstitute these versions into Clojure persistent maps, which means 
> there's a very high level of structural sharing between objects. So when 
> adding two maps A & B to the cache, the memory used by those maps is almost 
> certainly nothing like getMemory(A) + getMemory(B), due to (probable) 
> structural sharing between them. 
>
> We did try some heuristic workarounds, but were always off by so much that 
> performance suffered badly (either flushing the cache when we didn't need 
> to, or writing too often). So, we use an effectively infinite cache, flush 
> overnight and let it re-fill. Not ideal, but don't have a better solution 
> right now.
>
> Cheers,
>
> Matt.
> On Sunday, July 18, 2021 at 11:49:43 PM UTC+9:30 andrei...@gmail.com 
> wrote:
>
>> Hi Matt,
>> IMHO, the best way to compact will be the off-line one - 
>> MVStoreTool.compact() - and it can take only seconds (your mileage may 
>> vary, of course).
>> If you can not afford a 1 min off-line interruption, then you can try just 
>> to let it run and do its own maintenance in the background (assuming 
>> autoCommitDelay > 0).
>> If I only knew some "best/better" way for on-line compaction, it would 
>> probably be there already, as a background maintenance procedure.
>> I expect that the existing one will fit the bill, unless your update rate 
>> is quite high.
>> BTW, flushing the cache looks like a futile exercise, indeed.
>>
>> On Thursday, July 15, 2021 at 7:36:05 PM UTC-4 matt...@gmail.com wrote:
>>
>>> Hello Andrei,
>>>
>>> thanks very much for your reply. 
>>>
>>> Yes, I'm aware I'm on an old version: if it's not broken, don't fix it 
>>> ;) Version 1.4.197 has been rock-solid for us for years, and I'm always 
>>> loath to change things for no reason. But you have given me a good 
>>> reason, so I'll give the latest MVStore a try.
>>>
>>> Can you recommend the best way to 'manually' compact the database in the 
>>> latest release?
>>>
>>> And just to be sure: could there be any data-loss issues from flushing 
>>> the cache?
>>>
>>> Cheers,
>>>
>>> Matt.
>>>
>>> On Friday, July 16, 2021 at 4:45:12 AM UTC+9:30 andrei...@gmail.com 
>>> wrote:
>>>
 Hi Matt,

 If you are experiencing a problem which looks and smells like a 
 concurrency issue, then there is definitely a good reason to suspect a 
 concurrency issue. 8-)
 The real question here is: if you care enough about those problems, why 
 are you still on version 1.4.197? MVStore's concurrency / synchronization 
 was totally re-designed since then (and we are talking years here), for 
 example you will not even find MVStore.compactRewriteFully() method 
 anymore, but instead it might just do all that space management, so you 
 won't need that background operation at all.
 In any case, I would not expect anyone to look at 1.4.197 issues at this 
 point. On the other hand, if you find a similar problem with the current 
 trunk version and are able to reproduce it, I will be more than happy to 
 work on it.

 Cheers,
 Andrei.

 On Thursday, July 15, 2021 at 3:39:40 AM UTC-4 matt...@gmail.com wrote:

> Hello,
>
> I'm trying to track down a perplexing problem when using an MVStore, 
> where it appears 

Re: [h2] H2 Inmemory not able to load 7lc records and giving heapspace issues

2021-07-25 Thread Noel Grandin
try using nioMemFS to store the data on the native heap
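For example (a config fragment; the database name is illustrative): nioMemFS is one of H2's pluggable file systems, selected via a prefix in the database URL, and it keeps data in direct byte buffers rather than on the Java heap:

```
# In-memory database backed by direct (off-heap) byte buffers; "db1" is an
# arbitrary name. See the H2 "Advanced" documentation on pluggable file
# systems for the full list of prefixes (memFS:, nioMemFS:, nioMemLZF:, ...).
jdbc:h2:nioMemFS:db1
```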

-- 
You received this message because you are subscribed to the Google Groups "H2 
Database" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to h2-database+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/h2-database/CAFYHVnUhgJAP3CLUED_jD5r62odjngxLGoyovV03EO7j4AfKYQ%40mail.gmail.com.


Re: [h2] Autonomous commit - commit single savepoint or transaction

2021-07-25 Thread Noel Grandin
you're going to need to use two connections to achieve that
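A rough sketch of that pattern (illustrative only: the URL, table, and statements are hypothetical, exception handling is omitted, and the H2 driver must be on the classpath):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Main connection: transactional work that may still roll back.
Connection main = DriverManager.getConnection("jdbc:h2:./data/app");
main.setAutoCommit(false);
main.createStatement().execute("UPDATE accounts SET balance = balance - 10");

// Second connection: the "autonomous" commit, independent of the first.
try (Connection side = DriverManager.getConnection("jdbc:h2:./data/app");
     Statement st = side.createStatement()) {
    side.setAutoCommit(true); // each statement commits immediately
    st.execute("INSERT INTO audit_log(msg) VALUES('attempted withdrawal')");
}

main.rollback(); // the audit row written via 'side' survives this rollback
```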
