Re: [PERFORM] Performance issues

2015-03-18 Thread Felipe Santos
2015-03-18 14:31 GMT-03:00 Vivekanand Joshi vjo...@zetainteractive.com:

 So, here is the first taste of success, which gives me the confidence
 that, if properly worked out with good hardware and proper tuning,
 PostgreSQL could be a good replacement.

 Out of the 9 reports which need to be migrated to PostgreSQL, 3 are now
 running.

 Report 4 was giving an issue; I will look at it tomorrow.

 Just to inform you guys: the thing that helped most was setting
 enable_nestloop to false. Plans are no longer miscalculated.




 Regards,
 Vivek

 -Original Message-
 From: pgsql-performance-ow...@postgresql.org
 [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Tomas Vondra
 Sent: Tuesday, March 17, 2015 9:00 PM
 To: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Performance issues

 On 17.3.2015 16:24, Thomas Kellerer wrote:
  Tomas Vondra wrote on 17.03.2015 at 15:43:
  On 17.3.2015 15:19, Thomas Kellerer wrote:
  Tomas Vondra wrote on 17.03.2015 at 14:55:
   (2) using window functions, e.g. like this:
 
   SELECT * FROM (
 SELECT *,
  ROW_NUMBER() OVER (PARTITION BY touchpoint_execution_id
 ORDER BY max_creation_dt) AS rn
 FROM s_f_touchpoint_execution_status_history
   ) foo WHERE rn = 1
 
   But estimating this is also rather difficult ...
 
 
  From my experience rewriting something like the above using DISTINCT
  ON is usually faster.
 
  How do you get the last record (with respect to a timestamp column)
  using a DISTINCT ON?
 
  You need to use order by ... desc. See here:
  http://sqlfiddle.com/#!15/d4846/2

 Nice, thanks!

 
  Btw: your row_number() usage wouldn't return the latest row either.
  It would return the oldest row.

 Oh, right. I forgot the DESC in the window.
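 For the archives, the DISTINCT ON form would look roughly like this
 (untested sketch):

   SELECT DISTINCT ON (touchpoint_execution_id) *
   FROM s_f_touchpoint_execution_status_history
   ORDER BY touchpoint_execution_id, max_creation_dt DESC;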


 --
 Tomas Vondra                  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance







But this is not a production-suitable setting. So how do you think we can
work around this?

What about creating a read-only replica and applying this setting there?
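
If the setting has to stay on the primary, a gentler option might be to
scope it instead of changing it globally. A minimal sketch (the role name
is made up for illustration):

  -- only for the session that runs the reports
  SET enable_nestloop = off;
  -- ... run the reports ...
  RESET enable_nestloop;

  -- or persistently, but only for a dedicated reporting role
  ALTER ROLE reporting_user SET enable_nestloop = off;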


Re: [PERFORM] Replication Lag Causes

2014-11-03 Thread Felipe Santos
2014-11-02 19:16 GMT-02:00 Mike Wilson mfwil...@gmail.com:

 Thanks for the information Greg.

 Unfortunately modifying the application stack this close to the holiday
 season won’t be an option so I’m left with:
1) Trying to optimize the settings I have for the query mix I have.
2) Optimize any long running DML queries (if any) to prevent lag due to
 locks.
3) Getting a better understanding of “what” causes lag.

 #3 will probably be central to at least minimizing lag during heavy DML
 load.  If anyone has a good resource describing when a slave would
 potentially start to lag, that would help me hunt for the cause.  I know
 long running DML on the master may cause lag, but I’m uncertain as to the
 specifics of why.  During periods of lag we do have more DML than usual
 running against the master, but the queries themselves are very quick,
 although there might be 20-30 DML operations per second against some of our
 central tables that store user account information.  Even under heavy DML
 the queries still return in under a second.  Could a large volume of
 short-running DML cause replication lag issues for large tables (~20M rows)?
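
 For reference, lag on both ends can be measured with the standard 9.3 admin
 functions, roughly like this (a sketch, not a monitoring recipe):

   -- on the slave: wall-clock replay delay
   SELECT now() - pg_last_xact_replay_timestamp() AS replay_delay;

   -- on the master: bytes of WAL each standby still has to replay
   SELECT client_addr,
          pg_xlog_location_diff(pg_current_xlog_location(),
                                replay_location) AS replay_lag_bytes
   FROM pg_stat_replication;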

 Thanks again for your help.  BDR looks interesting but probably too
 cutting edge for my client.

 Mike Wilson




 On Nov 2, 2014, at 12:33 PM, Greg Spiegelberg gspiegelb...@gmail.com
 wrote:

 Hi Mike,

 Sounds very familiar.  Our master fans out to 16 slaves (cascading) and we
 had great success with segregating database queries to different slaves and
 some based on network latency.  I'd suggest, if possible, altering the
 application to use the slaves for simple SELECTs and functions performing
 SELECT-only work, while limiting those applications and queries that
 perform DML to the master (obviously).  If the load on the slave increases
 too much, spin up another slave.  I'd mention from experience that it could
 be the load on the slave that is giving the appearance of replication lag.
 This is what led us to having (1) slave per application.

 There is also the BDR multi-master available in 9.4beta if you're wanting
 to live on the edge.

 -Greg

 On Sat, Nov 1, 2014 at 4:33 PM, Mike Wilson mfwil...@gmail.com wrote:

 I have two 9.3.4 PG instances that back a large internet website that has
 very seasonal traffic and can generate large query loads.  My instances are
 in a master-slave streaming replication setup  and are stable and in
 general perform very well.  The only issues we have with the boxes is that
 when the master is busy the slave may start to lag excessively.  I can give
 specifics as to what heavily loaded means and additionally the
 postgresql.conf for both boxes but my basic questions are:
* What causes streaming replication lag to increase?
* What parameters can be tuned to reduce streaming replication lag?
* Can a loaded slave affect lag adversely?
* Can increasing max_wal_senders help reduce lag?

 The reason I ask this is that as mentioned above the servers are stable
 and are real troopers in general as they back a very popular web site that
 puts the master under heavy seasonal load at times.  At those times though
 we see an almost exponential growth in streaming replication lag compared
 to load on the master.

 For example, the master is a very beefy Solaris:
 * 4 Recent Intel Xeons (16 physical cores)
* 256 GB of ECC RAM
* 12 TB of ZFS (spindle and SSD internal storage)
* DB on disk size is 2TB
* ZFS ARC cache of roughly 250G.
 * ZFS L2ARC/ZIL configured for SSDs (this is awesome by the way)

 Basic PG Config:
shared_buffers = 2GB
work_mem = 128MB
max_connections = 1700 (supports roughly 100 web servers)
wal_keep_segments = 256 (roughly enough for 24 hours of operation
 under heavy load)
wal_sender_timeout = 60s
replication_timeout=(not set)
wal_receiver_status_interval=10s
max_wal_senders=6
* wal archiving is off
* 98% of the queries on the master complete in under 500ms.
* No hung or very long running queries in general.

 The master on a normal day maintains a load of about 0.5, during which
 replication lag to the slave is in the hundreds of milliseconds.  When the
 production db server is heavily hit though the load may go as high as 4 on
 the master and the streaming replication lag may increase to more than 2
 hours relatively quickly.  Load on the slave is generally below 1 even when
 the master is heavily loaded.  The traffic to the master is primarily read
 with about 10% DML (new users, purchase records, etc).  DML statements
 increase proportionally when under load though.  The master and slave are
 connected via dedicated 10G fiber link and even under heavy load the
 utilization of the link is nowhere near close to saturation.  BTW, the
 slave does run some report-related queries throughout the day that might
 take up to a minute to complete.

 I have the task of figuring out why this otherwise healthy DB starts to
 lag so badly under 

Re: [PERFORM] Query with large number of joins

2014-10-21 Thread Felipe Santos
2014-10-20 21:59 GMT-02:00 Tom Lane t...@sss.pgh.pa.us:

 Marco Di Cesare marco.dices...@pointclickcare.com writes:
  We are using a BI tool that generates a query with an unusually large
 number of joins. My understanding is that with this many joins the Postgres
 query planner can't possibly use an exhaustive search, so it drops into a
 heuristic algorithm. Unfortunately, the query runs quite slowly (~35
 seconds) and seems to ignore using primary keys and indexes where available.

  Query plan here (sorry had to anonymize):
  http://explain.depesz.com/s/Uml

 It's difficult to make any detailed comments when you've shown us only an
 allegedly-bad query plan, and not either the query itself or the table
 definitions.

 However, it appears to me that the query plan is aggregating over a rather
 large number of join rows, and there are very few constraints that would
 allow eliminating rows.  So I'm not at all sure there is a significantly
 better plan available.  Are you claiming this query was instantaneous
 on SQL Server?

 The only thing that jumps out at me as possibly improvable is that with
 a further increase in work_mem, you could probably get it to change the
 last aggregation step from Sort+GroupAggregate into HashAggregate,
 which'd likely run faster ... assuming you can spare some more memory.

 regards, tom lane


 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance




Hi,

As Tom said, work_mem seems like a nice place to start.

Here are other considerations you might take in account:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

There's also the opportunity to tune the query itself (if it's not
automatically generated by your BI tool). You can often speed up a query
by joining filtered sub-selects instead of joining the entire tables, as in
the sketch below.
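
A minimal sketch of both ideas (the table and column names are made up for
illustration, and the work_mem value is just an example per Tom's note):

  -- give the final aggregation enough memory to hash instead of sort
  SET work_mem = '256MB';

  -- join a pre-filtered sub-select instead of the whole dimension table
  SELECT f.customer_id, sum(f.amount) AS total
  FROM fact_sales f
  JOIN (SELECT id
        FROM dim_customer
        WHERE region = 'CA') c ON c.id = f.customer_id
  GROUP BY f.customer_id;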

BR

Felipe


Re: [PERFORM] Query Performance Problem

2014-10-21 Thread Felipe Santos
2014-10-21 10:57 GMT-02:00 j...@jpm-cola.com:



 Hi all,

 I'm experimenting with table partitioning through inheritance. I'm testing
 a query as follows:

 explain (analyze, buffers)
 select response.id
 from claim.response
 where response.account_id = 4766
 and response.expire_timestamp is null
 and response.create_timestamp = DATE '2014-08-01'
 order by create_timestamp;

 The response table looks like this:
 account_id;integer
 file_type_id;integer
 receiver_inbound_detail_id;integer
 processing_status_id;integer
 processing;boolean
 expire_timestamp;timestamp without time zone
 last_mod_timestamp;timestamp without time zone
 create_timestamp;timestamp without time zone
 response_trace_nbr;character varying
 posted_timestamp;timestamp without time zone
 need_to_post;boolean
 response_message;text
 worked;boolean
 response_status_id;integer
 response_type_id;integer
 outbound_claim_detail_id;bigint
 id;bigint

 Here are some rowcounts:

 SELECT count(*) from claim_response.response_201408;
   count
 -
  4585746
 (1 row)

 Time: 7271.054 ms
 SELECT count(*) from claim_response.response_201409;
   count
 -
  3523370
 (1 row)

 Time: 4341.116 ms
 SELECT count(*) from claim_response.response_201410;
  count
 ---
154
 (1 row)

 Time: 0.258 ms

 The entire table has 225,665,512 rows. I read that a partitioning rule of
 thumb is that benefits of partitioning occur starting around 100 million
 rows.

 SELECT count(*) from claim.response;
count
 ---
  225665512
 (1 row)

 Time: 685064.637 ms


 The partitioning is on the create_timestamp field.

 The server is Red Hat Enterprise Linux Server release 6.2 (Santiago) on a
 VM machine - 8 GB RAM with 2 CPUs:

 Architecture:  x86_64
 CPU op-mode(s):32-bit, 64-bit
 Byte Order:Little Endian
 CPU(s):2
 On-line CPU(s) list:   0,1
 Thread(s) per core:1
 Core(s) per socket:2
 CPU socket(s): 1
 NUMA node(s):  1
 Vendor ID: GenuineIntel
 CPU family:6
 Model: 44
 Stepping:  2
 CPU MHz:   2660.000
 BogoMIPS:  5320.00
 L1d cache: 32K
 L1i cache: 32K
 L2 cache:  256K
 L3 cache:  12288K
 NUMA node0 CPU(s): 0,1



 2 users,  load average: 0.00, 0.12, 0.37


 Please see the following for the explain analysis :

 http://explain.depesz.com/s/I3SL

 I'm trying to understand why I'm getting the yellow, orange, and red on
 the inclusive, and the yellow on the exclusive. (referring to the
 explain.depesz.com/s/I3SL page.)
 I'm relatively new to PostgreSQL, but I've been an Oracle DBA for some
 time. I suspect the I/O may be dragging but I don't know how to dig that
 information out from here. Please point out anything else you can decipher
 from this.

 Thanks,

 John



Hi John,

Don't know about the colors, but the Stats tab looks fine. You've got
yourself 5 Index Scans, which are a very fast way to retrieve data.

 I also noticed you've cast your filter value (create_timestamp =
'2014-08-01'::date). As far as I know, PostgreSQL doesn't need this kind
of explicit conversion; you would be fine with just (create_timestamp =
'2014-08-01').
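
In case it helps, here is a minimal sketch of how constraint exclusion ties
into this kind of inheritance partitioning (the CHECK range is illustrative
and your actual DDL may differ):

  CREATE TABLE claim_response.response_201408 (
    CHECK (create_timestamp >= DATE '2014-08-01'
       AND create_timestamp <  DATE '2014-09-01')
  ) INHERITS (claim.response);

  -- lets the planner skip children whose CHECK contradicts the WHERE clause
  SET constraint_exclusion = partition;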

Regards,

Felipe


Re: [PERFORM] Partitioned tables and SELECT ... ORDER BY ... LIMIT

2014-10-16 Thread Felipe Santos
2014-10-16 14:04 GMT-03:00 Jeff Janes jeff.ja...@gmail.com:

 On Thu, Oct 16, 2014 at 5:35 AM, Дмитрий Шалашов skau...@gmail.com
 wrote:

 Hi,

 let's imagine that we have some table, partitioned by a timestamp field, and
 we query it with a SELECT ordered by that field (DESC for example),
 with some modest LIMIT.
 Let's further say that the required number of rows is found in the first table
 that the query encounters (say, the latest one).
 I am just wondering why PostgreSQL nevertheless reads a couple of
 buffers from each of the older tables?


 The planner only does partition pruning statically, not dynamically.  The
 LIMIT has to be implemented dynamically--it cannot prove absolutely that
 the first partition will have enough rows, so it cannot eliminate the
 others.

 The Merge Append does a priority queue merge, and so needs to read the
 first row (according to the ORDER BY) from each partition in order to seed
 the priority queue.  I guess what it could be made to do in the case where
 there are suitable check constraints on a partition, is seed the priority
 queue with a dummy value constructed from the constraint.  If the merge
 never gets far enough to draw upon that dummy value, then that whole plan
 node never needs to get started up.

 In your case that would save very little, as reading a few blocks for each
 partition is not much of a burden.  Especially as it is the same few blocks
 every time, so they should be well cached.  There may be other case where
 this would be more helpful.  But it isn't clear to me how the planner could
 build such a feature into its cost estimates, and the whole thing would be
 a rather complex and esoteric optimization to make for uncertain gain.

 Cheers,

 Jeff


Like Jeff said, it shouldn't be much of a burden.

If you think it is, then you can query only the last partition directly
(since partitions are tables themselves), as in the sketch below.
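
A minimal sketch, assuming monthly children named with a date suffix (the
table and column names are illustrative):

  -- read only the newest partition instead of the parent
  SELECT *
  FROM events_201410
  ORDER BY created_at DESC
  LIMIT 50;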

It seems to me that your application is querying some recent sample data
from the last date to display something, and this approach would do for
that purpose.


Re: [PERFORM] idle issue?

2014-10-06 Thread Felipe Santos
2014-10-06 11:54 GMT-03:00 Emi Lu em...@encs.concordia.ca:

  Hello List,

 May I know whether idle connections will cause any potential performance
 issues for PostgreSQL 8.3, please?
 version (PostgreSQL 8.3.18 on x86_64-unknown-linux-gnu, compiled by GCC
 4.1.2)

 E.g., got 10 idle connections for 10 days.
  select current_query from pg_stat_activity  where usename ='test';
   current_query
 ---------------
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>
  <IDLE>

 Thanks a lot!
 Emi


Hi Emi,

As far as I know, it won't affect your performance.

It will affect the overall number of clients that can connect to the
database, though, since idle connections still count against the
max_connections limit you set in postgresql.conf.
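
If you want to see how long those sessions have been sitting there, here is
a minimal sketch using the 8.3 column names (procpid, current_query):

  SELECT procpid, usename, backend_start, query_start
  FROM pg_stat_activity
  WHERE current_query = '<IDLE>';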

BR,

Felipe


Re: [PERFORM] autocommit (true/false) for more than 1 million records

2014-08-27 Thread Felipe Santos
This might also help:
http://www.postgresql.org/docs/9.1/static/populate.html

Bulk loads from text files are minimally logged in almost all RDBMSs
(Postgres' COPY, under the right conditions, is one of them).

The reason is that the database doesn't need to waste resources writing
the log, because there's no risk of data loss: if the COPY operation fails,
your data will still live in the text files you're trying to bulk load from.
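
A minimal sketch of the WAL-skipping pattern that page describes (it only
applies with wal_level = minimal and WAL archiving off; the table and file
names are made up):

  BEGIN;
  CREATE TABLE bulk_target (LIKE final_table INCLUDING DEFAULTS);
  COPY bulk_target FROM '/tmp/data.csv' WITH (FORMAT csv);
  COMMIT;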



2014-08-27 5:42 GMT-03:00 Albe Laurenz laurenz.a...@wien.gv.at:

 Alex Goncharov wrote:
  Thank you, Kevin -- this is helpful.
 
  But it still leaves questions for me.

  Alex Goncharov alex.goncharov@gmail.com wrote:
 
  The whole thing is aborted then, and the good 99 records are not
  making it into the target table.
 
  Right.  This is one reason people often batch such copies or check
  the data very closely before copying in.
 
  How do I decide, before starting a COPY data load, whether such a load
  protection (complexity) makes sense (is necessary)?
 
  Clearly not needed for 1 MB of data in a realistic environment.
 
  Clearly is needed for loading 1 TB in a realistic environment.
 
  To put it differently: If I COPY 1 TB of data, what criteria should I
  use for choosing the size of the chunks to split the data into?
 
  For INSERT-loading, for the database client interfaces offering the
  array mode, the performance difference between loading 100 or 1000
  rows at a time is usually negligible if any.  Therefore 100- and
  1000-row's array sizes are both reasonable choices.
 
  But what is a reasonable size for a COPY chunk?  It can't even be
  measured in rows.
 
  Note, that if you have a 1 TB record-formatted file to load, you can't
  just split it in 1 MB chunks and feed them to COPY -- the file has to
  be split on the record boundaries.
 
  So, splitting the data for COPY is not a trivial operation, and if
  such splitting can be avoided, a reasonable operator will avoid it.
 
  But then again: when can it be avoided?

 You don't need to split the data at all if you make sure that they are
 correct.

 If you cannot be certain, and you want to avoid having to restart a huge
 load with corrected data, the batch size is pretty much a matter of taste:
 How much overhead does it generate to split the data in N parts?
 How much time are you ready to wait for (re)loading a single part?

 You'll probably have to experiment to find a solution that fits you.

  My question is: Where have these 99 records been living, on
  the database server, while the 100-th one hasn't come yet, and
  the need to throw the previous data accumulation away has not
  come yet?
 
  They will have been written into the table.  They do not become
  visible to any other transaction until and unless the inserting
  transaction successfully commits.  These slides may help:
 
  http://momjian.us/main/writings/pgsql/mvcc.pdf
 
  Yeah, I know about the MVCC model...  The question is about the huge
  data storage to be reserved without a commitment while the load is not
  completed, about the size constrains in effect here.

 I don't understand that question.

 You need the space anyway to complete the load.
 If the load fails, you simply reclaim the space (VACUUM) and reuse it.
 There is no extra storage needed.

  There have to be some limits to the space and/or counts taken by
  the new, uncommitted, data, while the COPY operation is still in
  progress.  What are they?
 
  Primarily disk space for the table.
 
  How can that be found?  Is df /mount/point the deciding factor? Or
  some 2^32 or 2^64 number?

 Disk space can be measured with df.

  If you are not taking advantage of the unlogged load optimization,
  you will have written Write Ahead Log (WAL) records, too -- which
  (depending on your configuration) you may be archiving.  In that
  case, you may need to be concerned about the archive space required.
 
  ... may need to be concerned ... if what?  Loading 1 MB? 1 GB? 1 TB?
 
  If I am always concerned, and check something before a COPY, what
  should I be checking?  What are the OK-to-proceed criteria?

 That means you should consider it, not that you should be worried.
 Unless you are loading into a table created in the same transaction,
 redo information will be generated and stored in WAL files, which
 end up in your WAL archive.

 This needs extra storage, proportional to the storage necessary
 for the data itself.

  If you have foreign keys defined for the table, you may get into
  trouble on the RAM used to track pending checks for those
  constraints.  I would recommend adding any FKs after you are done
  with the big bulk load.
 
  I am curious about the simplest case where only the data storage is to
  be worried about. (As an aside: the CHECK and NOT NULL constraints are
  not a storage factor, right?)

 Right.

  PostgreSQL does *not* have a rollback log which will impose a
  limit.
 
  Something will though, right?  What would that be? The available disk
  space on a file system? (I would be 

Re: [PERFORM] autocommit (true/false) for more than 1 million records

2014-08-25 Thread Felipe Santos
Hi Emi,

Databases that comply with the ACID standard (
http://en.wikipedia.org/wiki/ACID) ensure that there is no data loss by
first writing the data changes to the database log, as opposed to
updating the actual data on the filesystem first (in the datafiles).

Each database has its own way of doing it, but it basically consists of
writing the data to the logfile at each COMMIT and writing the data to the
datafile only when it's necessary.

So the COMMIT command is a way of telling the database to write the data
changes to the logfile.

Both logfiles and datafiles reside on the filesystem, so why is writing to
the logfile faster?

It is because the logfile is written sequentially, while writes to the
datafile are scattered all over it, and the file may even be fragmented.

Summing up: autocommit false is faster because you avoid going to the hard
disk to write the changes into the logfile for every single row; you keep
them in RAM until you decide to write them to the logfile (every 10K rows,
for instance).

Be aware that, eventually, you will need to write data to the logfile, so
you can't avoid that. But usually the performance is better if you write X
rows at a time to the logfile, rather than writing each and every row one
by one (because of the hard disk writing overhead).

The number of rows per commit that gives the best performance depends on
your environment and is pretty much found by testing. For millions of rows,
I usually commit every 10K or 50K rows.
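
A minimal sketch of the idea in plain SQL (the table is made up; with JDBC
the equivalent is setAutoCommit(false) plus a commit() every N rows):

  BEGIN;
  INSERT INTO load_target (id, name) VALUES (1, 'a');
  INSERT INTO load_target (id, name) VALUES (2, 'b');
  -- ... keep appending rows until the batch reaches ~10K, then:
  COMMIT;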

Regards,

Felipe




2014-08-25 10:40 GMT-03:00 Emi Lu em...@encs.concordia.ca:

 Good morning,

 Trying to insert 1 million records into one table through Java JDBC
 into PostgreSQL 8.3. May I know whether (1) or (2) is better, please?

 (1) set autocommit(true)
 (2) set autocommit(false)
   commit every n records (e.g., 100, 500, 1000, etc)

 It depends on what you need.

 Data will be available to concurrent processes earlier with (1), while
 (2) will go faster.

 No need to worry about locks/losing records because after data
 loading we will do a check. For now, I'd like the fastest way. Would
 you suggest committing every 1000 or 3000 records?

 The improvement drops off pretty quickly in my experience, but it
 depends on the size of the records and other things.

 The table is huge with almost 170 columns.

  Try it and see..?  It's almost certainly going to depend on your
 specific environment.

 Can you let me know what the specific environment factors are, please? Such as:
 ..

 By the way, could someone let me know why set autocommit(false) is for
 sure faster than true, please? Or point me to some online docs that discuss this.

 Thanks a lot!
 Emi



 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] PGSQL 9.3 - billion rows

2014-07-07 Thread Felipe Santos
Hi Nicolas,

I do believe Postgresql can handle that.

I've worked with tables that get 2 million rows per day, which adds up to
roughly 700 million rows per year.

It's hard to say how much hardware power you will need, but I would say
test it with a server in the cloud, since cloud servers are usually easy
to resize to your needs (both up and down).

Beside that, take a look at this link to fine tune your settings:
https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

My final words are about the table itself. I used to create partitions
for such large tables. The partitions were by day (I had a created_date
column), because that was the filtering field most used by the people
who queried the table. Partitioning makes PostgreSQL look at only the
subset of data that is being queried, thus improving query performance.

If you can do that, do it. But be sure you are partitioning on the right
column: partitioning on a column that most of the query filters don't use
may impact query performance negatively. A sketch of the setup is below.
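
A minimal sketch of a monthly child plus insert routing, assuming start_date
is the column most queries filter on (names and ranges are illustrative):

  CREATE TABLE observation_fact_2014_07 (
    CHECK (start_date >= DATE '2014-07-01' AND start_date < DATE '2014-08-01')
  ) INHERITS (observation_fact);

  -- route inserts on the parent into the matching child
  CREATE OR REPLACE FUNCTION observation_fact_insert() RETURNS trigger AS $$
  BEGIN
    IF NEW.start_date >= DATE '2014-07-01'
       AND NEW.start_date < DATE '2014-08-01' THEN
      INSERT INTO observation_fact_2014_07 VALUES (NEW.*);
    ELSE
      RAISE EXCEPTION 'no partition for start_date %', NEW.start_date;
    END IF;
    RETURN NULL;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER observation_fact_insert_trg
    BEFORE INSERT ON observation_fact
    FOR EACH ROW EXECUTE PROCEDURE observation_fact_insert();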

Good luck!


2014-07-07 10:59 GMT-03:00 Nicolas Paris nipari...@gmail.com:

 Hello,

 I have a fact table (table and indexes are below) that will probably
 get around 2 billion rows.

 - Can PostgreSQL support such a table (this table is the fact table of a
 data mart, with many join queries against dimension tables)?
 - If yes, I would like to test it (say, insert 2 billion test rows); what
 server configuration do I need? How much RAM?
 - If not, would it be better to think about a cluster or some other solution?
 - (Do you have any ideas for optimizing this table?)

 Thanks a lot !


 CREATE TABLE observation_fact
 (
   encounter_num integer NOT NULL,
   patient_num integer NOT NULL,
   concept_cd character varying(50) NOT NULL,
   provider_id character varying(50) NOT NULL,
   start_date timestamp without time zone NOT NULL,
   modifier_cd character varying(100) NOT NULL DEFAULT '@'::character
 varying,
   instance_num integer NOT NULL DEFAULT 1,
   valtype_cd character varying(50),
   tval_char character varying(255),
   nval_num numeric(18,5),
   valueflag_cd character varying(50),
   quantity_num numeric(18,5),
   units_cd character varying(50),
   end_date timestamp without time zone,
   location_cd character varying(50),
   observation_blob text,
   confidence_num numeric(18,5),
   update_date timestamp without time zone,
   download_date timestamp without time zone,
   import_date timestamp without time zone,
   sourcesystem_cd character varying(50),
   upload_id integer,
   text_search_index serial NOT NULL,
   CONSTRAINT observation_fact_pk PRIMARY KEY (patient_num, concept_cd,
 modifier_cd, start_date, encounter_num, instance_num, provider_id)
 )
 WITH (
   OIDS=FALSE
 );


 CREATE INDEX of_idx_allobservation_fact
   ON i2b2databeta.observation_fact
   USING btree
  (patient_num, encounter_num, concept_cd COLLATE pg_catalog."default",
 start_date, provider_id COLLATE pg_catalog."default", modifier_cd COLLATE
 pg_catalog."default", instance_num, valtype_cd COLLATE
 pg_catalog."default", tval_char COLLATE pg_catalog."default", nval_num,
 valueflag_cd COLLATE pg_catalog."default", quantity_num, units_cd COLLATE
 pg_catalog."default", end_date, location_cd COLLATE pg_catalog."default",
 confidence_num);


 CREATE INDEX of_idx_clusteredconcept
   ON i2b2databeta.observation_fact
   USING btree
  (concept_cd COLLATE pg_catalog."default");


 CREATE INDEX of_idx_encounter_patient
   ON i2b2databeta.observation_fact
   USING btree
   (encounter_num, patient_num, instance_num);


 CREATE INDEX of_idx_modifier
   ON i2b2databeta.observation_fact
   USING btree
  (modifier_cd COLLATE pg_catalog."default");

 CREATE INDEX of_idx_sourcesystem_cd
   ON i2b2databeta.observation_fact
   USING btree
  (sourcesystem_cd COLLATE pg_catalog."default");


 CREATE INDEX of_idx_start_date
   ON i2b2databeta.observation_fact
   USING btree
   (start_date, patient_num);


 CREATE INDEX of_idx_uploadid
   ON i2b2databeta.observation_fact
   USING btree
   (upload_id);


 CREATE UNIQUE INDEX of_text_search_unique
   ON i2b2databeta.observation_fact
   USING btree
   (text_search_index);
 ​