On 3/17/19 3:13 AM, Alexander Korotkov wrote: > On Sat, Mar 16, 2019 at 9:39 PM Pavel Stehule <[email protected]> wrote: >> so 16. 3. 2019 v 10:36 odesÃlatel Alexander Korotkov >> <[email protected]> napsal: >>> >>> On Thu, Mar 14, 2019 at 12:07 PM Alexander Korotkov >>> <[email protected]> wrote: >>>> On Sun, Mar 10, 2019 at 1:51 PM Alexander Korotkov >>>> <[email protected]> wrote: >>>>> On Wed, Mar 6, 2019 at 12:40 AM Nikita Glukhov <[email protected]> >>>>> wrote: >>>>>> Attached 36th version of the patches. >>>>> >>>>> Thank yo for the revision! >>>>> >>>>> In the attached revision following changes are made: >>>>> >>>>>> "unknown" refers here to ordinary three-valued logical Unknown, which is >>>>>> represented in SQL by NULL. >>>>>> >>>>>> JSON path expressions return sequences of SQL/JSON items, which are >>>>>> defined by >>>>>> SQL/JSON data model. But JSON path predicates (logical expressions), >>>>>> which are >>>>>> used in filters, return three-valued logical values: False, True, or >>>>>> Unknown. >>>>> >>>>> * I've added short explanation of this to the documentation. >>>>> * Removed no longer present data structures from typedefs.list of the >>>>> first patch. >>>>> * Moved GIN support patch to number 3. Seems to be well-isolated and >>>>> not very complex patch. I propose to consider this to 12 too. I >>>>> added high-level comment there, commit message and made some code >>>>> beautification. >>>> >>>> I think patches 1 and 2 are in committable shape (I reached Tomas >>>> off-list, he doesn't have more notes regarding them). While patch 3 >>>> requires more review. >>>> >>>> I'm going to push 1 and 2 if no objections. >>> >>> So, pushed. Many thanks to reviewers and authors! >>> >>> Remaining part I'm proposing for 12 is attached. I appreciate review of it. >> >> >> I tested this patch and I didn't find any issue - just I tested basic >> functionality and regress tests. >> >> looks well > > Thank you for your feedback!
Like Pavel, I did some basic testing of the patch (on current HEAD
4178d8b91c) trying out various JSON path expressions, and yes, it all
worked. I had a brief scare while testing on 4178d8b91c where initdb was
failing on the bootstrapping step, but after doing a thorough wipe of
build files and my output directory, it seems to be initializing okay.
I also did some testing of the GIN patch upthread, as the quickness of
retrieval of the data using JSON path is of course important as well.
Using a schema roughly like this:
CREATE TABLE news_feed (
id int GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
data jsonb NOT NULL
);
CREATE INDEX news_feed_data_gin_idx ON news_feed USING gin(data);
I loaded in a data set of roughly 420,000 rows. Each row had all the
same keys but differing values (e.g. "length" and "content" as keys)
I tested a few different JSON path scenarios. Some of the index scans
performed way better than the equivalent sequential scans, for instance:
SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ == 200)';
SELECT *
FROM news_feed
WHERE data @? '$.id ? (@ == "22613cbc-d83e-4a29-8b59-3b9f5cd61825")';
Using the index outperformed the sequential scan (and parallel seq scan)
by ~10-100x based on my config + laptop hardware!
However, when I did something a little more complex, like the below:
SELECT count(*)
FROM news_feed
WHERE data @? '$.length ? (@ < 150)';
SELECT count(*)
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "^Start")';
SELECT id, jsonb_path_query(data, '$.content')
FROM news_feed
WHERE data @? '$.content ? (@ like_regex "risk" flag "i")';
I would find that the index scan performed as well as the sequential
scan. Additionally, on my laptop, the parallel sequential scan would
beat the index scan by ~2.5x in some cases.
Reading up on what the GIN patch does, this all makes sense: it's
optimized for equality, I understand there are challenges to be able to
handle inequality, regex exps, etc. And the cases where it really does
work well, it's _incredibly_ fast.
My suggestion would be adding some additional guidance in the user
documentation around how GIN works with the @@ and @? operators so they
can understand where GIN will work very well with JSON path + their data
and not be surprised when other types of JSON path queries are
performing on par with a sequential scan (or worse than a parallel seq
scan).
Thanks,
Jonathan
signature.asc
Description: OpenPGP digital signature
