okay, after messing around a bit more with the secondary k,v table, it seems
like this could be a good solution.

I created a keys table to hold the 63 key values, then dropped and
recreated the secondary table using a FK referencing the keys table. I'm
not really sure why, but a basic full text query on 44 million rows is
taking approx. 20ms (an example of the kind of query I've been running is
below the table definition).

my table structure is:

 Table "public.samples_lg_txt"
 Column |   Type   | Modifiers
--------+----------+-----------
 id     | integer  |
 key    | integer  |
 val    | text     |
 tsv    | tsvector |
Indexes:
    "idx_tsv_samples_lg_text" gin (tsv)
Foreign-key constraints:
    "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) ON
DELETE CASCADE
    "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)


how would I write an AND query that filters on 2 separate keys in the
samples_lg_txt table?

something like:

SELECT COUNT(*) FROM samples WHERE id IN (
    SELECT DISTINCT s.id
    FROM samples_lg_txt s
    JOIN keys k ON s.key = k.id
    WHERE (k.name = 'key1' AND s.tsv @@ to_tsquery('value1'))
      AND (k.name = 'key2' AND s.tsv @@ to_tsquery('value2'))
);
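
although, writing that out, I don't think it can ever match anything, since
a single row in samples_lg_txt only has one key. maybe it needs two
subqueries (or an INTERSECT / self-join)? something like this rough,
untested sketch:

SELECT COUNT(*) FROM samples
WHERE id IN (
    -- samples that have 'value1' under key1
    SELECT s.id
    FROM samples_lg_txt s
    JOIN keys k ON s.key = k.id
    WHERE k.name = 'key1' AND s.tsv @@ to_tsquery('value1')
)
AND id IN (
    -- ...and that also have 'value2' under key2
    SELECT s.id
    FROM samples_lg_txt s
    JOIN keys k ON s.key = k.id
    WHERE k.name = 'key2' AND s.tsv @@ to_tsquery('value2')
);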

On Wed, Apr 19, 2017 at 11:57 AM, Rj Ewing <ewing...@gmail.com> wrote:

> I did some testing using a secondary table with the key, value column.
> However I don't think this will provide the performance that we need.
> Queries were taking 60+ seconds just for a count.
>
> With 1 million rows in the primary table, this resulted in 44 million rows
> in the secondary k,v table for full text searching. The same query in ES
> takes ~50 ms on my local machine, with ES allocated 1/10th the RAM that
> was allocated to psql.
>
> I'm gonna test using trigram indexes on approx 10 json fields, and see if
> that gives us what we are looking for.
>
> any thoughts on getting sub-1-second queries on a table with 44 million rows?
>
> RJ
>
> On Tue, Apr 18, 2017 at 10:35 PM, George Neuner <gneun...@comcast.net>
> wrote:
>
>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing <ewing...@gmail.com>
>> wrote:
>>
>> >I am evaluating postgres as a datastore for our webapp. We are moving
>> >away from a triple store db due to performance issues.
>> >
>> >Our data model consists of sets of user-defined attributes. Approx 10% of
>> >the attributes tend to be 100% filled, while 50% of the attributes are only
>> >approx 25% filled. This is fairly sparse data, and it seems that jsonb or
>> >hstore will be best for us.
>> >
>> >Unfortunately, from my understanding, postgres doesn't support fulltext
>> >search across hstore or jsonb key:values or even the entire document.
>> While
>> >this is not a deal breaker, this would be a great feature to have. We
>> have
>> >been experimenting w/ elasticsearch a bit, and particularly enjoy this
>> >feature, however we don't really want to involve the complexity and
>> >overhead of adding elasticsearch in front of our datasource right now.
>>
>> hstore and JSON values all really are just formatted text with a
>> custom column type.  You can create tsvectors from the values if you
>> cast them to text.
>>
>> Note that a tsvector can only work on a /flat/ key:value structure: it
>> won't understand nesting, and even with a flat store it won't understand
>> the difference between keys/tags and the associated values.
>>
>> E.g., you will be able to see that a value contains both "foo" and
>> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
>> you either must check the token positions (from the tsvector) or *try*
>> to extract the key(s) you are interested in and check the associated
>> value(s).
>>
>> This might work ok if you search only for keys in a "document" ... but
>> trying to search values, I think would be far too complicated.
>>
>> It might help if you stored a 2D array instead of a flat structure,
>> but even that would be fairly complicated to work with.
>>
>>
>>
>> >An idea that has come up is to use a materialized view or secondary table
>> >with triggers, where we would have 3 columns (id, key, value).
>> >
>> >I think this would allow us to store a tsvector and gin index. Giving us
>> >the ability to use fulltext search on k:v pairs, then join the original
>> >data on the id field to return the entire record.
>>
>> This is a much better idea because it separates the key from the
>> value, and unlike the full "document" case [above], you will know that
>> the FTS index is covering only the values.
>>
>> If you need to preserve key order to reconstruct records, you will
>> need an additional column to maintain that ordering.
>>
>>
>> >is anyone currently doing this? Is there a better alternative? Any
>> >performance issues that immediately jump out ( I realize the writes will
>> >take longer)?
>> >
>> >the nature of our data is "relatively" static with bulk uploads (100 -
>> 1000
>> >records). So we can sacrifice some write performance.
>> >
>> >RJ
>>
>> Having to "reconstruct" records will make reads take longer as well,
>> but I think separating the keys and values is the best way to do it.
>>
>>
>> YMMV,
>> George
>>
>>
>>
>
>
