At the moment all guids are distinct; before I zapped the duplicates, however, there were 280 of them. Currently, there are over 2 million distinct guids.
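
For what it's worth, here is a minimal sketch of the kind of check I used to count them, written against the example schema quoted below (the real table and column names differ). It sticks to count(*) and so avoids building per-group arrays entirely:

-- list guids that appear more than once, plus how often, without array_agg
select guid, count(*) as dup_count
from things
group by guid
having count(*) > 1;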
-Robert

On Mon, Aug 19, 2013 at 11:12 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:

> 2013/8/19 Robert Sosinski <rsosin...@ticketevolution.com>
>
>> Hi Pavel,
>>
>> What kind of example do you need? I can't give you the actual data I have
>> in the table, but I can give you an example query and the schema, attached
>> below. From there, I would just put in 2 million rows, worth about 1.2 GB
>> of data. The average sizes of the extended columns (using the
>> pg_column_size function), in bytes, are:
>>
>> guid: 33
>> name: 2.41
>> currency: 4
>> fields: 120.32
>>
>> example query:
>>
>> -- find duplicate records using a guid
>> select guid, array_agg(id) from orders group by guid;
>>
>
> How many distinct guids are there, and how many duplicates?
>
> Regards
>
> Pavel
>
>> example schema:
>>
>>                                           Table "public.things"
>>    Column   |            Type             |                      Modifiers                       | Storage  | Stats target | Description
>> ------------+-----------------------------+------------------------------------------------------+----------+--------------+-------------
>>  id         | integer                     | not null default nextval('things_id_seq'::regclass)  | plain    |              |
>>  version    | integer                     | not null                                              | plain    |              |
>>  created_at | timestamp without time zone | not null                                              | plain    |              |
>>  updated_at | timestamp without time zone | not null                                              | plain    |              |
>>  foo_id     | integer                     | not null                                              | plain    |              |
>>  bar_id     | integer                     | not null                                              | plain    |              |
>>  baz_id     | integer                     | not null                                              | plain    |              |
>>  guid       | character varying           | not null                                              | extended |              |
>>  name       | character varying           | not null                                              | extended |              |
>>  price      | numeric(12,2)               | not null                                              | main     |              |
>>  currency   | character varying           | not null                                              | extended |              |
>>  amount     | integer                     | not null                                              | plain    |              |
>>  the_date   | date                        | not null                                              | plain    |              |
>>  fields     | hstore                      |                                                       | extended |              |
>> Indexes:
>>     "things_pkey" PRIMARY KEY, btree (id)
>>     "things_foo_id_idx" btree (foo_id)
>>     "things_bar_id_idx" btree (bar_id)
>>     "things_baz_id_idx" btree (baz_id)
>>     "things_guid_uidx" UNIQUE, btree (guid)
>>     "things_lpad_lower_name_eidx" btree (lpad(lower(name::text), 10, '0'::text))
>>     "things_price_idx" btree (price)
>> Foreign-key constraints:
>>     "things_foo_id_fkey" FOREIGN KEY (foo_id) REFERENCES foos(id)
>>     "things_bar_id_fkey" FOREIGN KEY (bar_id) REFERENCES bars(id)
>>     "things_baz_id_fkey" FOREIGN KEY (baz_id) REFERENCES bazs(id)
>> Triggers:
>>     timestamps_trig BEFORE INSERT OR UPDATE ON things FOR EACH ROW
>>     EXECUTE PROCEDURE timestamps_tfun()
>>
>> Let me know if you need anything else.
>>
>> Thanks,
>>
>> On Mon, Aug 19, 2013 at 3:29 AM, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>
>>> Hello
>>>
>>> Please, can you send some example or test?
>>>
>>> Regards
>>>
>>> Pavel Stehule
>>>
>>> 2013/8/19 Robert Sosinski <rsosin...@ticketevolution.com>
>>>
>>>> When using array_agg on a large table, memory usage seems to spike
>>>> until Postgres crashes with the following error:
>>>>
>>>> 2013-08-17 18:41:02 UTC [2716]: [2] WARNING: terminating connection
>>>> because of crash of another server process
>>>> 2013-08-17 18:41:02 UTC [2716]: [3] DETAIL: The postmaster has
>>>> commanded this server process to roll back the current transaction and
>>>> exit, because another server process exited abnormally and possibly
>>>> corrupted shared memory.
>>>> 2013-08-17 18:41:02 UTC [2716]: [4] HINT: In a moment you should be
>>>> able to reconnect to the database and repeat your command.
>>>>
>>>> I've definitely isolated it down to array_agg: when I changed the
>>>> query to use string_agg, it worked fine. I also tried array_agg in a
>>>> few different queries, all of which yielded the same issue, and
>>>> swapping in string_agg fixed it each time.
>>>>
>>>> This particular table has over 2 million rows and is 1.2 GB, and when
>>>> I ran the query while watching htop, the virtual size of the Postgres
>>>> process ballooned to 13.9 GB before crashing.
>>>>
>>>> The version of Postgres I am using is: PostgreSQL 9.2.4 on
>>>> x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu/Linaro 4.7.2-2ubuntu1)
>>>> 4.7.2, 64-bit
>>>>
>>>> Any help would be much appreciated, thanks!
>>>>
>>>> -Robert
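
For anyone who lands on this thread with the same problem, the string_agg swap described in the first message looks roughly like this (a sketch only, again using the obfuscated names from the example schema; string_agg takes text, hence the cast on id):

-- same grouping as the original query, with string_agg standing in for array_agg
select guid, string_agg(id::text, ',') as ids
from things
group by guid;

The ids come back as a comma-separated string instead of an integer array, which is enough for spotting duplicate guids.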