[GENERAL] select all rows where any column is NULL
Does anyone have a concise way of doing $SUBJECT? Best, Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] a question about row estimation in postgres
> Following the formula outlined > in http://www.postgresql.org/docs/8.3/static/row-estimation-examples.html > Both gradrate 10 and gradrate 11 would fall in the first bucket. > Shouldn't the row estimation be: > (11 - 8) / (33 - 8) / 10 * 1302 = 15.624 > and > (10 - 8) / (33 - 8) / 10 * 1302 = 10.416 > instead of 11 and 7? > Perhaps I am missing something. I'd appreciate if you can point it out. IIRC, you need to subtract the rows that are mcv's from the row count. ie, you're not working with the full 1302 rows, only the ones that are not mcv's. Best, Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] SFPUG: Video from "Statistics and Postgres -- How the Planner Sees Your Data" Now on Vimeo
> > There is reference to slides with some > frequency though, are they available somewhere? > Ya. http://encodestatistics.org/publications/statistics_and_postgres.pdf Is there a better place for this? -Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Q: data modeling with inheritance
> > variant association phenotype > --- --- - > variant_id - variant_id +--- phenotype_id > genome_id phenotype_id -+ short_descr > strand origin_id (i.e., who) long_descr > start_coord ts (timestamp) > stop_coord > Is an association, for example, an experiment that establishes a dependent relationship? So could there be multiple associations between variant and phenotype? > The problem that arises is the combinatorial nature of the schema design > coupled with the lack of inherited primary keys. In the current state > of PG, one must (I think) make joining tables (association subclasses) > for every combination of referenced foreign keys (variant and phenotype > subclasses). > Is your concern that the number of joins will grow exponentially in the number of variants and phenotypes? > So, how would you model this data? Do I ditch inheritance? I've put some thought into representing an ontology via table inheritance, and I've never been able to figure out a good way ( of course, that's not to say one doesn't exist... ) . If I understand your problem correctly, I would use composite keys ( ie ( variant type, id ) ) and then use an extra join to separate ontology tables to restrict searches to specific branches. So all variants would be stored in the variants table, all phenotypes are in the phenotypes table, and you join through association. It's not as elegant as inheritance, but it will be much more flexible in both the types of queries that you can write and in case your requirements change. -Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] Streaming PUG meeting: PostgreSQL Genetics!
>> Will there be a saved version of this (SFPUG talk) available for later >> viewing? Don't >> make me choose between steak and beer and postgres. 8( > > Yes! I'll announce it here when it's available. > Is there any update on this? -Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] full vacuum really slows down query
> I have a query that takes 2 sec if I run it from a freshly restored dump. > If I run a full vacuum on the database it then takes 30 seconds. What do the two plans look like? Can you post the explains? -Nathan -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Re: [GENERAL] inheritance. more.
Because people can be smarter about the data partitioning. Consider a table of users. Some are active, most are not. The active users account for nearly all of the users table access, but I still (occasionally) want to access info about the inactive users. Partitioning users into active_users and inactive_users allows me to tell the database (indirectly) that the active users index should stay in memory, while the inactive users can relegated to disk. -Nathan On Thu, May 1, 2008 at 6:02 AM, Jeremy Harris <[EMAIL PROTECTED]> wrote: > Gurjeet Singh wrote: > > > One of the advantages > > of breaking up your data into partitions, as professed by Simon (I think) > > (and I agree), is that you have smaller indexes, which improve > performance. > > And maybe having one huge index managing the uniqueness across partitioned > > data just defeats the idea of data partitioning! > > > > Isn't "large indexes are a performance problem" just saying > "we don't implement indexes very well"? And why are they > a problem - surely a tree-structured index is giving you > range-partitioned subsets as you traverse it? Why is this > different from manual partitioning into (inherited) tables? > > Thanks, > Jeremy > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
[GENERAL] Update Perrformance with nested loop vs. merge-join
Hi, I just created a new column on a relatively large table (~ 2 mil rows over 400 mb in tablesize) and am trying to populate it from another much smaller table by using an update query . The purpose of the query to set a bool flag if the user for the action on table1 is the last user, as determined from table 2. The large table has a two column primary key and one of the columns is the primary key on the smaller table. My original update query is as follows: update table1 set lastuser = (table1.user = table2.lastuser)::bool from table2 where table1.pkeycolumn1 = table2.pkey which has an explain output of Merge Join (cost=883182.60..1024622.03 rows=6873573 width=89) Merge Cond: ("outer".pkeycolumn1 = "inner".pkey) -> Sort (cost=93387.50..94711.01 rows=529404 width=18) Sort Key: table2.pkey -> Seq Scan on table2 (cost=0.00..15037.04 rows=529404 width=18) -> Sort (cost=789795.10..795605.39 rows=2324116 width=81) Sort Key: table1.pkeycolumn1 -> Seq Scan on table1 (cost=0.00..71900.16 rows=2324116 width=81) This seems like it *should* be the best query to me, but when I try and run it it takes forever. However, when I add a subselect clause to the end to force the planner to use a nested loop select on table2, the query runs MUCH faster (1 hour instead of > 10, I never let it finish) Here is the query: update table1 set lastuser = (table1.user = table2.lastuser)::bool from table2 where table1.pkeycolumn1 = table2.pkey and table1.pkeycolumn1 is in ( select pkey from table3 limit 100) where table1.pkeycolumn1 is a foreign key in table3.pkey. Also, note that the num of rows in table 3 is *much* less than 100. Also, the limit clause is necessary to force the planner into using the nested loop. (Aside: Is there a better way to do this inside of a query?) Here is the explain for the above query: Hash Join (cost=13863.09..109298.79 rows=51388 width=89) Hash Cond: ("outer".pkeycolumn1 = "inner".pkey) -> Seq Scan on table1 (cost=0.00..71900.16 rows=2324116 width=81) -> Hash (cost=13854.99..13854.99 rows=1621 width=26) -> Nested Loop (cost=8205.72..13854.99 rows=1621 width=26) -> HashAggregate (cost=8205.72..8207.72 rows=200 width=8) -> Limit (cost=0.00..5891.43 rows=185143 width=8) -> Seq Scan on table3 (cost=0.00..5891.43rows=185143 width=8) -> Index Scan using table2-pkey-index" on table2 (cost= 0.00..28.14 rows=8 width=18) Index Cond: ("outer".pkey = table2.pkey) My only theory up to this point is that it has something to do with how the tablespace is being cached. I notice that when I use plan 1, my computer goes through long periods of io and with bursts of processor activity every minute or so. When I use plan 2, the io sits at about 90% of my total resource usage while my normal processor usage sits at about 10%. Maybe it keeps trying to cache and resort table2? I don't really have any idea, but that is my only guess. If anyone knows why this may be happenning, I would really appreciate it. Thanks, Nathan