okay, messing around a bit more with the secondary k,v table, it seems like this could be a good solution.
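For context, a minimal sketch of that keys + secondary-table layout (column, table, and constraint names are taken from the \d output below; the serial/text type choices for the keys table are my assumptions, since its definition isn't shown):

```sql
-- Hypothetical DDL reconstructing the layout described below.
-- The keys table's id/name types are assumptions.
CREATE TABLE keys (
    id   serial PRIMARY KEY,
    name text NOT NULL UNIQUE    -- one row per each of the 63 keys
);

CREATE TABLE samples_lg_txt (
    id  integer REFERENCES samples_lg(id) ON DELETE CASCADE,
    key integer REFERENCES keys(id),
    val text,
    tsv tsvector
);

CREATE INDEX idx_tsv_samples_lg_text ON samples_lg_txt USING gin (tsv);
```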
I created a keys table to hold the 63 key values, then I dropped and recreated the secondary table using a FK referencing the keys table. I'm not really sure why, but a basic full text query on 44 million rows is taking approx. 20 ms. My table structure is:

         Table "public.samples_lg_txt"
 Column |   Type   | Modifiers
--------+----------+-----------
 id     | integer  |
 key    | integer  |
 val    | text     |
 tsv    | tsvector |
Indexes:
    "idx_tsv_samples_lg_text" gin (tsv)
Foreign-key constraints:
    "samples_lg_txt_id_fkey" FOREIGN KEY (id) REFERENCES samples_lg(id) ON DELETE CASCADE
    "samples_lg_txt_key_fkey" FOREIGN KEY (key) REFERENCES keys(id)

How would I write an AND query that filters on 2 separate keys from the samples_lg_txt table? Something like:

SELECT COUNT(*) FROM samples WHERE id IN (
    SELECT DISTINCT s.id
    FROM samples_lg_txt s
    JOIN keys k ON s.key = k.id
    WHERE (k.name = 'key1' AND tsv @@ to_tsquery('value1'))
      AND (k.name = 'key2' AND tsv @@ to_tsquery('value2'))
);

On Wed, Apr 19, 2017 at 11:57 AM, Rj Ewing <ewing...@gmail.com> wrote:

> I did some testing using a secondary table with the key, value columns.
> However, I don't think this will provide the performance that we need.
> Queries were taking 60+ seconds just for a count.
>
> With 1 million rows in the primary table, this resulted in 44 million rows
> in the secondary k,v table for full text searching. The same query in ES
> takes ~50 ms on my local machine, with 1/10th the RAM allocated to ES that
> was allocated to psql.
>
> I'm going to test using trigram indexes on approx. 10 json fields, and see
> if that gives us what we are looking for.
>
> Any thoughts on getting sub-1-second queries on a table with 44 million
> rows?
>
> RJ
>
> On Tue, Apr 18, 2017 at 10:35 PM, George Neuner <gneun...@comcast.net>
> wrote:
>
>> On Tue, 18 Apr 2017 14:38:15 -0700, Rj Ewing <ewing...@gmail.com>
>> wrote:
>>
>> >I am evaluating postgres as a datastore for our webapp. We are moving
>> >away from a triple store db due to performance issues.
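One way the two-key AND could be expressed (a sketch only, not tested against the real data; it assumes keys has a name column, as the query above implies) is a self-join, since a single samples_lg_txt row holds only one key's value and can never satisfy both predicates at once:

```sql
-- Sketch: the two predicates must match two *different* rows of
-- samples_lg_txt that share the same sample id, hence the self-join.
SELECT COUNT(*) FROM samples WHERE id IN (
    SELECT s1.id
    FROM samples_lg_txt s1
    JOIN keys k1 ON s1.key = k1.id
    JOIN samples_lg_txt s2 ON s2.id = s1.id
    JOIN keys k2 ON s2.key = k2.id
    WHERE k1.name = 'key1' AND s1.tsv @@ to_tsquery('value1')
      AND k2.name = 'key2' AND s2.tsv @@ to_tsquery('value2')
);
```

An equivalent formulation would INTERSECT the two single-key subqueries on id; which one performs better likely depends on the planner and the selectivity of each key.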
>> >
>> >Our data model consists of sets of user defined attributes. Approx. 10%
>> >of the attributes tend to be 100% filled, with 50% of the attributes
>> >being approx. 25% filled. This is fairly sparse data, and it seems that
>> >jsonb or hstore will be best for us.
>> >
>> >Unfortunately, from my understanding, postgres doesn't support fulltext
>> >search across hstore or jsonb key:values or even the entire document.
>> >While this is not a deal breaker, this would be a great feature to have.
>> >We have been experimenting w/ elasticsearch a bit, and particularly
>> >enjoy this feature, however we don't really want to involve the
>> >complexity and overhead of adding elasticsearch in front of our
>> >datasource right now.
>>
>> hstore and JSON values really are just formatted text with a custom
>> column type. You can create tsvectors from the values if you cast them
>> to text.
>>
>> Note that a tsvector can only work on a /flat/ key:value structure: it
>> won't understand nesting, and even with a flat store it won't
>> understand the difference between keys/tags and the associated values.
>>
>> E.g., you will be able to see that a value contains both "foo" and
>> "bar", but to distinguish 'foo:bar' from 'bar:foo' or 'foo:q,bar:z'
>> you either must check the token positions (from the tsvector) or *try*
>> to extract the key(s) you are interested in and check the associated
>> value(s).
>>
>> This might work ok if you search only for keys in a "document" ... but
>> trying to search values, I think, would be far too complicated.
>>
>> It might help if you stored a 2D array instead of a flat structure,
>> but even that would be fairly complicated to work with.
>>
>>
>> >An idea that has come up is to use a materialized view or secondary
>> >table with triggers, where we would have 3 columns (id, key, value).
>> >I think this would allow us to store a tsvector and gin index.
>> >Giving us the ability to use fulltext search on k:v pairs, then join
>> >the original data on the id field to return the entire record.
>>
>> This is a much better idea because it separates the key from the
>> value, and unlike the full "document" case [above], you will know that
>> the FTS index is covering only the values.
>>
>> If you need to preserve key order to reconstruct records, you will
>> need an additional column to maintain that ordering.
>>
>> >Is anyone currently doing this? Is there a better alternative? Any
>> >performance issues that immediately jump out (I realize the writes
>> >will take longer)?
>> >
>> >The nature of our data is "relatively" static, with bulk uploads
>> >(100 - 1000 records). So we can sacrifice some write performance.
>> >
>> >RJ
>>
>> Having to "reconstruct" records will make reads take longer as well,
>> but I think separating the keys and values is the best way to do it.
>>
>> YMMV,
>> George
>>
>> --
>> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-general