Re: Max connections reached without max connections reached

2021-12-03 Thread Dmitry Dolgov
> On Fri, Dec 03, 2021 at 10:32:03AM -0500, Tom Lane wrote:
> Dilip Kumar  writes:
> > On Thu, Dec 2, 2021 at 9:35 AM Dilip Kumar  wrote:
> >> I think there is no such view or anything which tells about which
> >> backend or transaction has more than 64 sub transaction.  But if we
> >> are ready to modify the code then we can LOG that information in
> >> GetNewTransactionId(), when first time we are marking it overflown.
>
> > I have prepared a small patch to log this information.
>
> Putting an elog call into GetNewTransactionId seems like a completely
> horrid idea from a performance standpoint.  Especially if you put it
> inside the XidGenLock hold, where it can block the entire system not just
> the one process.  But even without that, this seems like a performance
> penalty with basically no real-world benefit.  People who have issues
> like this are not going to want to trawl the postmaster log for such
> messages.

Maybe it's worth a trace point then, to allow collecting such info on
demand?




Re: Why is writing JSONB faster than just JSON?

2021-05-02 Thread Dmitry Dolgov
> On Fri, Apr 23, 2021 at 01:56:57AM -0700, Mitar wrote:
> Hi!
> 
> On Thu, Apr 15, 2021 at 12:11 PM Dmitry Dolgov <9erthali...@gmail.com> wrote:
> > > My point was that for JSON, after validating that the input is
> > > syntactically correct, we just store it as-received.  So in particular
> > > the amount of whitespace in the value would depend on how the client
> > > had chosen to format the JSON.  This'd affect the stored size of
> > > course, and I think it would have an effect on compression time too.
> >
> > Yes, I got it and just wanted to confirm you were right - this was the
> > reason for the slowdown I observed while trying to reproduce the report.
> 
> Thank you for trying to reproduce the report. I did a bit more digging
> myself and I am still confused.
>
> ...
>
> So I do not know what is happening and why you cannot reproduce it.

Could you maybe get a profile with perf for both cases? Since they're
executed within a single backend, you can profile just that one pid.
A reasonable sampling frequency, --call-graph dwarf, and probably
limiting events to user space with precise tagging (cycles:uppp)
should give an impression of what's going on.
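Something along these lines, where the pid and the sampling window are
placeholders you'd adapt (get the backend pid via pg_backend_pid() in the
session you're benchmarking):

```shell
# Sample one backend for ~30s with DWARF call graphs, user-space cycles
# only, highest precision level; all concrete values here are placeholders.
PID=12345
perf record -p "$PID" -F 997 --call-graph dwarf -e cycles:uppp -- sleep 30
perf report --no-children --percent-limit 1
```

An odd frequency like 997 just avoids sampling in lockstep with periodic
activity; dwarf unwinding is the safest bet if the binary was built with
frame pointers omitted.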




Re: Why is writing JSONB faster than just JSON?

2021-04-15 Thread Dmitry Dolgov
> On Thu, Apr 15, 2021 at 12:47:25PM -0400, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> > On Wed, Apr 14, 2021 at 10:26:23AM -0400, Tom Lane wrote:
> >> My own guess about this, without having tried to reproduce it, is that
> >> JSONB might end up physically smaller than JSON, resulting in less work
> >> to push the toasted datum out to disk.  This'd depend a lot on your
> >> formatting habits for JSON, of course.  But in any case, it'd be worth
> >> comparing pg_column_size() results to see what's up with that.
>
> > Oh, of course, I'd missed that the input I was using was indeed
> > formatted; without formatting, both cases perform equally well and I
> > can't reproduce the issue. Although, if I understand correctly, the
> > original code in question doesn't actually do any formatting.
>
> My point was that for JSON, after validating that the input is
> syntactically correct, we just store it as-received.  So in particular
> the amount of whitespace in the value would depend on how the client
> had chosen to format the JSON.  This'd affect the stored size of
> course, and I think it would have an effect on compression time too.

Yes, I got it and just wanted to confirm you were right - this was the
reason for the slowdown I observed while trying to reproduce the report.




Re: Why is writing JSONB faster than just JSON?

2021-04-15 Thread Dmitry Dolgov
> On Wed, Apr 14, 2021 at 10:26:23AM -0400, Tom Lane wrote:
> Dmitry Dolgov <9erthali...@gmail.com> writes:
> >> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
> >> ... Namely, it looks like writing into a jsonb typed
> >> column is 30% faster than writing into a json typed column. Why is
> >> that? Does not jsonb require parsing of JSON and conversion? That
> >> should be slower than just storing a blob as-is?
>
> > * Parsing is happening in both cases, for json it's done for validation
> >   purposes.
>
> Right.
>
> > * Compression of json data actually dominates the runtime for large json
> >   objects, making other factors less visible and reducing the difference in
> >   size (jsonb should also be a bit bigger, which is why I would have
> >   expected it to be slower).
> > * At the same time perf shows that json compression spends a bit more time
> >   in pglz_find_match (mostly around accessing and checking history
> >   entries), which is a compression optimization via a history table. So
> >   probably due to differences in data layout this optimization works
> >   slightly worse for plain json than for jsonb?
>
> Interesting.  I recall that we made some last-minute changes in the JSONB
> physical representation after finding that the original choices resulted
> in sucky compression behavior.  But I think we were only looking at the
> resultant compressed size, not time-to-compress.
>
> My own guess about this, without having tried to reproduce it, is that
> JSONB might end up physically smaller than JSON, resulting in less work
> to push the toasted datum out to disk.  This'd depend a lot on your
> formatting habits for JSON, of course.  But in any case, it'd be worth
> comparing pg_column_size() results to see what's up with that.

Oh, of course, I'd missed that the input I was using was indeed
formatted; without formatting, both cases perform equally well and I
can't reproduce the issue. Although, if I understand correctly, the
original code in question doesn't actually do any formatting.
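For what it's worth, the whitespace effect is easy to see directly with
pg_column_size(); a sketch (the literals are made up and exact numbers will
vary by version and input):

```sql
-- json keeps the client's formatting as-is, jsonb normalizes it, so only
-- the json column grows when the same document is pretty-printed.
select pg_column_size('{"a":1,"b":2}'::json)        as json_compact,
       pg_column_size('{ "a": 1,  "b": 2 }'::json)  as json_pretty,
       pg_column_size('{ "a": 1,  "b": 2 }'::jsonb) as jsonb_pretty;
```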




Re: Why is writing JSONB faster than just JSON?

2021-04-14 Thread Dmitry Dolgov
> On Tue, Apr 13, 2021 at 11:38:04PM -0700, Mitar wrote:
> Hi!
>
> I have a project where we among other data want to store static JSON
> objects which can get pretty large (10-100 KB). I was trying to
> evaluate how it would work if we simply store it as an additional
> column in a PostgreSQL database. So I made a benchmark [1]. The
> results surprised me a bit and I am writing here because I would like
> to understand them. Namely, it looks like writing into a jsonb typed
> column is 30% faster than writing into a json typed column. Why is
> that? Does not jsonb require parsing of JSON and conversion? That
> should be slower than just storing a blob as-is?
>
> [1] https://gitlab.com/mitar/benchmark-pg-json

Interesting. I've tried to reproduce the schema and queries from the repository
above (with different generated data though, and without the app itself) and
indeed got visibly slower (though not the full 30%) execution for a json column
than for a jsonb one. There are a couple of important points:

* Parsing is happening in both cases, for json it's done for validation
  purposes.

* Compression of json data actually dominates the runtime for large json
  objects, making other factors less visible and reducing the difference in
  size (jsonb should also be a bit bigger, which is why I would have expected
  it to be slower).

* At the same time perf shows that json compression spends a bit more time in
  pglz_find_match (mostly around accessing and checking history entries), which
  is a compression optimization via a history table. So probably due to
  differences in data layout this optimization works slightly worse for plain
  json than for jsonb?
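For reference, my reproduction was roughly along these lines; the table names
and generated payloads are my own stand-ins, not the ones from the linked
repository:

```sql
-- Two identical tables differing only in column type; payloads in the
-- 10-100 KB range built with generate_series, no pretty-printing.
create table t_json  (data json);
create table t_jsonb (data jsonb);

\timing on
insert into t_json
  select json_object_agg(i, md5(i::text))
  from generate_series(1, 2000) i;
insert into t_jsonb
  select json_object_agg(i, md5(i::text))::jsonb
  from generate_series(1, 2000) i;
```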




Re: jsonb_set() strictness considered harmful to data

2019-10-19 Thread Dmitry Dolgov
> On Sat, Oct 19, 2019 at 1:08 PM Tomas Vondra  wrote:
>
> >Here is how other implementations handle this case:
> >
> >MySQL/MariaDB:
> >
> >select json_set('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> >   {"a":null,"b":2,"c":3}
> >
> >Microsoft SQL Server:
> >
> >select json_modify('{"a":1,"b":2,"c":3}', '$.a', NULL) results in:
> >   {"b":2,"c":3}
> >
> >Both of these outcomes make sense, given the nature of JSON objects.
> >I am actually more in favor of what MSSQL does however, I think that
> >makes the most sense of all.
> >
>
> I do mostly agree with this. The json[b]_set behavior seems rather
> surprising, and I think I've seen a couple of cases running into exactly
> this issue. I've solved that with a simple CASE, but maybe changing the
> behavior would be better. That's unlikely to be back-patchable, though,
> so maybe a better option is to create a non-strict wrappers. But that
> does not work when the user is unaware of the behavior :-(

Agreed, that can be confusing. If I remember correctly, so far I've seen four
or five such complaints on the mailing lists, but of course the number of
people who didn't reach out to -hackers is probably bigger.

If we want to change it, the question is where to stop? Essentially we have:

update table set data = some_func(data, some_args_with_null);

where some_func happens to be jsonb_set, but could be any strict function.

I wonder if in this case it makes sense to think about an alternative. For
example, there is the generic type subscripting patch, which allows updating a
jsonb value in the following way:

update table set jsonb_data[key] = 'value';

It doesn't look like a function call, so it's not a big deal if it handles NULL
values differently. And at the same time one can argue that people who are not
aware of this caveat with jsonb_set and NULL values will most likely use it
anyway, due to the somewhat simpler syntax (more similar to some popular
programming languages).
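As for the non-strict wrapper idea mentioned upthread, a minimal sketch could
look like this; the function name and the choice to map SQL NULL to a JSON
null (the MySQL-style outcome quoted above) are mine:

```sql
-- Coalesce a NULL new value into a JSON null, instead of letting the
-- strict jsonb_set return NULL for the whole document.
create function jsonb_set_lax(target jsonb, path text[], new_value jsonb)
returns jsonb language sql immutable as $$
  select jsonb_set(target, path, coalesce(new_value, 'null'::jsonb))
$$;
```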




Re: JSON vs. JSONB storage size

2019-10-11 Thread Dmitry Dolgov
> On Fri, Oct 11, 2019 at 1:40 PM Thomas Kellerer  wrote:
>
> I am a bit surprised by this (not because the jsonb sizes are generally
> bigger, but that the string value takes less space)
>
> Is this caused by the fact that a string value compresses better internally?

Those jsonb objects are quite small, so it could be that alignment kicks in:
as far as I remember, the jsonb header and data are aligned to a 4-byte
boundary.
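A quick way to see it, sketched below; exact numbers depend on the server
version and platform:

```sql
-- For tiny values the jsonb header plus alignment padding outweigh any
-- savings, so jsonb can come out larger than the as-is json text.
select pg_column_size('"abc"'::json)  as json_size,
       pg_column_size('"abc"'::jsonb) as jsonb_size;
```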