Re: Max connections reached without max connections reached
> On Fri, Dec 03, 2021 at 10:32:03AM -0500, Tom Lane wrote:
> Dilip Kumar writes:
> > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar wrote:
> >> I think there is no such view or anything which tells about which
> >> backend or transaction has more than 64 sub transaction. But if we
> >> are ready to modify the code then we can LOG that information in
> >> GetNewTransactionId(), when first time we are marking it overflown.
> >
> > I have prepared a small patch to log this information.
>
> Putting an elog call into GetNewTransactionId seems like a completely
> horrid idea from a performance standpoint. Especially if you put it
> inside the XidGenLock hold, where it can block the entire system, not
> just the one process. But even without that, this seems like a
> performance penalty with basically no real-world benefit. People who
> have issues like this are not going to want to trawl the postmaster log
> for such messages.

Maybe it's worth a trace point then, to allow collecting such info sort
of on demand?
Re: Why is writing JSONB faster than just JSON?
> On Fri, Apr 23, 2021 at 01:56:57AM -0700, Mitar wrote:
> Hi!
>
> On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > My point was that for JSON, after validating that the input is
> > > syntactically correct, we just store it as-received. So in particular
> > > the amount of whitespace in the value would depend on how the client
> > > had chosen to format the JSON. This'd affect the stored size of
> > > course, and I think it would have an effect on compression time too.
> >
> > Yes, I got it and just wanted to confirm you were right - this was the
> > reason I've observed slowdown trying to reproduce the report.
>
> Thank you for trying to reproduce the report. I did a bit more digging
> myself and I am still confused.
>
> ...
>
> So I do not know what is happening and why you cannot reproduce it.

Could you maybe get a profile with perf for both cases? Since they're
executed within a single backend, you can profile just that one pid. A
reasonable sampling frequency, --call-graph dwarf, and probably limiting
events to user space with precise tagging (cycles:uppp) should give an
impression of what's going on.
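A minimal sketch of such a profiling session, using the flags described
above (the backend pid and the 30-second window are placeholders to fill
in for the actual run):

```
# Attach to the single backend executing the INSERT workload.
# -F 99: a reasonable sampling frequency; --call-graph dwarf records
# full user stacks; cycles:uppp limits sampling to user space with
# precise tagging.
perf record -F 99 --call-graph dwarf -e cycles:uppp -p <backend_pid> -- sleep 30

# Then inspect where the time goes (e.g. pglz_find_match vs. parsing):
perf report --no-children
```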
Re: Why is writing JSONB faster than just JSON?
> On Thu, Apr 15, 2021 at 12:47:25PM -0400, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> > On Wed, Apr 14, 2021 at 10:26:23AM -0400, Tom Lane wrote:
> >> My own guess about this, without having tried to reproduce it, is that
> >> JSONB might end up physically smaller than JSON, resulting in less work
> >> to push the toasted datum out to disk. This'd depend a lot on your
> >> formatting habits for JSON, of course. But in any case, it'd be worth
> >> comparing pg_column_size() results to see what's up with that.
> >
> > Oh, of course I've missed that the input I was using was indeed
> > formatted, without formatting both cases perform equally well and I
> > can't reproduce the issue. Although if I understand correctly the
> > original code in question doesn't actually do any formatting.
>
> My point was that for JSON, after validating that the input is
> syntactically correct, we just store it as-received. So in particular
> the amount of whitespace in the value would depend on how the client
> had chosen to format the JSON. This'd affect the stored size of
> course, and I think it would have an effect on compression time too.

Yes, I got it and just wanted to confirm you were right - this was the
reason I observed a slowdown while trying to reproduce the report.
Re: Why is writing JSONB faster than just JSON?
> On Wed, Apr 14, 2021 at 10:26:23AM -0400, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> >> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
> >> ... Namely, it looks like writing into a jsonb typed
> >> column is 30% faster than writing into a json typed column. Why is
> >> that? Does not jsonb require parsing of JSON and conversion? That
> >> should be slower than just storing a blob as-is?
> >
> > * Parsing is happening in both cases, for json it's done for validation
> >   purposes.
>
> Right.
>
> > * Compression of json data actually dominates the runtime load for
> >   large json objects, making other factors less visible and reducing
> >   the difference in size (jsonb also should be a bit bigger, which is
> >   why I would think it would be slower).
> > * At the same time perf shows that json compression spends a bit more
> >   time in pglz_find_match (mostly around accessing and checking
> >   history entries), which is a compression optimization via a history
> >   table. So probably due to differences in data layout this
> >   optimization works slightly worse for plain json than for jsonb?
>
> Interesting. I recall that we made some last-minute changes in the JSONB
> physical representation after finding that the original choices resulted
> in sucky compression behavior. But I think we were only looking at the
> resultant compressed size, not time-to-compress.
>
> My own guess about this, without having tried to reproduce it, is that
> JSONB might end up physically smaller than JSON, resulting in less work
> to push the toasted datum out to disk. This'd depend a lot on your
> formatting habits for JSON, of course. But in any case, it'd be worth
> comparing pg_column_size() results to see what's up with that.

Oh, of course, I've missed that the input I was using was indeed
formatted; without formatting, both cases perform equally well and I
can't reproduce the issue.
Although, if I understand correctly, the original code in question doesn't actually do any formatting.
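The pg_column_size() comparison suggested above shows this directly; a
small sketch (the literal values here are just illustrative):

```sql
-- json stores the text as received, so whitespace counts toward its
-- size; jsonb normalizes into its binary form regardless of input
-- formatting.
select pg_column_size('{"a": 1,   "b": 2}'::json)  as json_formatted,
       pg_column_size('{"a":1,"b":2}'::json)       as json_compact,
       pg_column_size('{"a": 1,   "b": 2}'::jsonb) as jsonb_formatted,
       pg_column_size('{"a":1,"b":2}'::jsonb)      as jsonb_compact;
```

Here json_formatted comes out larger than json_compact, while the two
jsonb sizes are identical.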
Re: Why is writing JSONB faster than just JSON?
> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
> Hi!
>
> I have a project where we among other data want to store static JSON
> objects which can get pretty large (10-100 KB). I was trying to
> evaluate how it would work if we simply store it as an additional
> column in a PostgreSQL database. So I made a benchmark [1]. The
> results surprised me a bit and I am writing here because I would like
> to understand them. Namely, it looks like writing into a jsonb typed
> column is 30% faster than writing into a json typed column. Why is
> that? Does not jsonb require parsing of JSON and conversion? That
> should be slower than just storing a blob as-is?
>
> [1] https://gitlab.com/mitar/benchmark-pg-json

Interesting. I've tried to reproduce the schema and queries from the
repository above (with different generated data though, and without the
app itself) and indeed got somewhat slower (not 30%, but visibly)
execution for a json column than for jsonb. There are a couple of
important points:

* Parsing happens in both cases; for json it's done for validation
  purposes.
* Compression of json data actually dominates the runtime load for
  large json objects, making other factors less visible and reducing
  the difference in size (jsonb should also be a bit bigger, which is
  why I would have expected it to be slower).
* At the same time, perf shows that json compression spends a bit more
  time in pglz_find_match (mostly around accessing and checking history
  entries), which is a compression optimization via a history table. So
  probably, due to differences in data layout, this optimization works
  slightly worse for plain json than for jsonb?
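For reference, a minimal version of such a comparison (my own
simplified schema and data generation, not the code from the benchmark
repository; with \timing enabled in psql the two inserts can be
compared):

```sql
create table t_json  (data json);
create table t_jsonb (data jsonb);

-- Build one largish object per row from generated keys and values.
insert into t_json
select json_object_agg(i, repeat(md5(i::text), 10))
from generate_series(1, 10000) i
group by i % 100;

insert into t_jsonb
select jsonb_object_agg(i, repeat(md5(i::text), 10))
from generate_series(1, 10000) i
group by i % 100;
```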
Re: jsonb_set() strictness considered harmful to data
> On Sat, Oct 19, 2019 at 1:08 PM Tomas Vondra wrote:
> > Here is how other implementations handle this case:
> >
> > MySQL/MariaDB:
> >
> > select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> >   {"a":null,"b":2,"c":3}
> >
> > Microsoft SQL Server:
> >
> > select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> >   {"b":2,"c":3}
> >
> > Both of these outcomes make sense, given the nature of JSON objects.
> > I am actually more in favor of what MSSQL does however, I think that
> > makes the most sense of all.
>
> I do mostly agree with this. The json[b]_set behavior seems rather
> surprising, and I think I've seen a couple of cases running into exactly
> this issue. I've solved that with a simple CASE, but maybe changing the
> behavior would be better. That's unlikely to be back-patchable, though,
> so maybe a better option is to create non-strict wrappers. But that
> does not work when the user is unaware of the behavior :-(

Agreed, that can be confusing. If I remember correctly, so far I've seen
four or five such complaints on the mailing lists, and of course the
number of people who didn't reach out to -hackers is probably bigger. If
we want to change it, the question is where to stop? Essentially we
have:

    update table set data = some_func(data, some_args_with_null);

where some_func happened to be jsonb_set, but could be any strict
function.

I wonder if in this case it makes sense to think about an alternative?
For example, there is the generic type subscripting patch, which allows
updating a jsonb in the following way:

    update table set jsonb_data[key] = 'value';

It doesn't look like a function, so it's not a big deal if it handles
NULL values differently. And at the same time one can argue that people
who are not aware of this caveat with jsonb_set and NULL values will
most likely use it anyway, due to its slightly simpler syntax (more
similar to some popular programming languages).
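To illustrate the pitfall and the CASE workaround mentioned above (the
table and column names here are hypothetical):

```sql
-- jsonb_set is strict: a NULL third argument makes the whole result
-- NULL, so an UPDATE silently wipes the column.
select jsonb_set('{"a":1,"b":2,"c":3}', '{a}', NULL);  -- returns NULL

-- Defensive wrapper: keep the old value when the new value is NULL.
update some_table
set data = case
             when new_value is null then data
             else jsonb_set(data, '{a}', new_value)
           end;
```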
Re: JSON vs. JSONB storage size
> On Fri, Oct 11, 2019 at 1:40 PM Thomas Kellerer wrote:
>
> I am a bit surprised by this (not because the jsonb sizes are generally
> bigger, but that the string value takes less space)
>
> Is this caused by the fact that a string value compresses better internally?

Those jsonb objects are quite small, so it could be that alignment kicks
in, since as far as I remember, the jsonb header and data should be
aligned to a 4-byte boundary.
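A quick way to see the effect for a small string value (exact sizes
will vary by version and platform, so this is just a sketch):

```sql
-- For tiny values, the jsonb header plus alignment padding can make
-- jsonb larger than the equivalent json text.
select pg_column_size('"some string"'::json)  as json_size,
       pg_column_size('"some string"'::jsonb) as jsonb_size;
```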