Re: [SQL] Multiple DB join
On 8/15/06, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> On Tue, Aug 15, 2006 at 10:11:41AM -0400, Sumeet Ambre wrote:
> > The design of the database is because our organization wants to split up
> > different datasets into different entities, and there might be a
> > possibility that they'll run different instances of postgres for each
> > dataset.
>
> It's this "different instances" thing that I'm having a tough time with.
> Is this because they want to be able to query local things when
> disconnected or something? I can think of applications for this, for
> sure; I'm just suggesting that you make sure you're not building an
> optimisation that is (1) premature and (2) possibly a pessimal operation.

The reason for splitting the data into different instances is that if one of the postgres instances on the server goes down for some reason, it doesn't affect the other instances running on the same server. Even I don't know the reasoning behind this approach, but I have to deal with it.

> > records, the query runs damn slow...below is the sample schema for
> > my base table and the query i try to run on it...it takes me more than
> > 2-3 minutes to run a query...Is there any way i could speed this up..
>
> The problem is not your design, nor even the size of the data exactly,
> but the query:
>
> > e.g. SELECT * FROM base WHERE ((title ILIKE '%something%') AND (authors
> > ~* '.*something.*') AND (db_name='something'));
>
> You have two initially-unbound search terms there: ILIKE '%' and ~* '.*'
> are automatically seqscans, because you have nowhere in the index to
> start. If you really want to do this kind of unbound-string query, you
> need to look into full text search. The above approach is never going
> to be fast.

I previously thought of doing the full text search indexing, but I had an intuition that full text search is meant for fields with very large strings; in my case the strings are not above 200 characters in length. Would full text search give me the performance we need? I also tried combined indexes on (title, authors) in my base table; would indexes of any kind help me in this case?

Thanks for your prompt replies,

--
Thanks,
Sumeet.
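To make the full text search suggestion concrete, here is a minimal tsearch2-style sketch for the base table described in this thread; it assumes the tsearch2 contrib module is installed, and the added column and index names are illustrative rather than anything prescribed on the list:

  ALTER TABLE base ADD COLUMN fti tsvector;
  UPDATE base SET fti = to_tsvector('default', coalesce(title, '') || ' ' || coalesce(authors, ''));
  CREATE INDEX base_fti_idx ON base USING gist (fti);

  -- a keyword search can then use the GiST index instead of a seqscan:
  SELECT * FROM base
  WHERE fti @@ to_tsquery('default', 'something')
    AND db_name = 'something';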
Re: [SQL] NULL becomes default
Hi Julian, you need to reset your sequence; I ran into the same problem.

On 8/17/06, Julian Scarfe <[EMAIL PROTECTED]> wrote:
> A surrogate key has been introduced on a table with modifiers:
>
>   Column |  Type   | Modifiers
>  --------+---------+-----------------------------------------------------------
>   new_id | integer | not null default nextval(('some_id_seq'::text)::regclass)
>
> However, some legacy software inserts into the table by first getting
> column names and then composing an INSERT for all the columns, filling in
> any undefined values with NULLs.
>
> Thus when it attempts to insert a row I get:
>
>   ERROR: null value in column "new_id" violates not-null constraint
To fix this, you need to do something like the following (this recipe is taken from Tom Lane):
SELECT setval('public.fdata_fid_seq', max(fid)+1) FROM fdata;
This will make sure that the next value your sequence generates is
greater than any key that already exists in the table.
--
Thanks,
Sumeet.
[SQL] About DBlink
Hi All, I'm trying to find documentation for the postgres module named "dblink"; can anyone point me to it? What I want to do is join across multiple databases instead of multiple tables.

Thanks,
Sumeet.
Re: [SQL] About DBlink
Thanks Michael. I don't have dblink installed; I need to find good documentation that will help me do this, and also the place where I can download the module.

Thanks,
Sumeet.

On 8/17/06, Michael Fuhr <[EMAIL PROTECTED]> wrote:
> On Thu, Aug 17, 2006 at 04:37:03PM -0400, Sumeet wrote:
> > Im trying to find out documentation for postgres module names "dblink", can
> > anyone point me out to it. What i want to do is join multiple databases
> > instead of multiple tables.
>
> If you've installed dblink then somewhere on your system should be
> a file named README.dblink. How to find that file depends on the
> platform: on many Unix-like systems you can use "locate" or "find",
> or if you installed dblink via a package then you could use the
> package management tool to display a list of files that it installed.
>
> I'd mention that doing joins between databases isn't a good idea,
> but it looks like you've already had that discussion in the recent
> "Multiple DB join" thread.
>
> http://archives.postgresql.org/pgsql-sql/2006-08/msg00097.php
>
> --
> Michael Fuhr

--
Thanks,
Sumeet Ambre
Masters of Information Science Candidate, Indiana University.
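For reference, once the dblink contrib module is installed, a cross-database query looks roughly like the sketch below; the connection string and table name are hypothetical, and the column types are guesses for illustration:

  SELECT *
  FROM dblink('dbname=other_db host=localhost',
              'SELECT doc_id, title FROM some_table')
       AS remote(doc_id integer, title text);

The result set behaves like any other row source, so it can be joined against local tables, at the cost of shipping the remote rows over a separate connection for every query.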
Re: [SQL] Multiple DB join
On 8/15/06, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> > I previously thought of doing the full text search indexing, but I had an
> > intuition that full text search is meant for fields with very large
> > strings; in my case the strings are not above 200 chars in length.
>
> No, it's for data where you're going to be searching in random pieces
> of the text. I think you should look at tsearch2, probably.

Will the full text search indexing help me achieve good speed when searching keywords?

> If you had any way to enforce bounded searches, it'd be a different
> matter: strings with initial matches but an unbound end are fast. (You
> can do it the other way, too, by some tricks with reversing the
> strings.)

Can someone please elaborate a little on ways to enforce bounded searches? I'm basically doing a simple search, i.e. matching the author names a user enters into an interface against the 20 million records in my db. Can anyone suggest a good way to perform this kind of search?

Thanks,
Sumeet.
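To illustrate what a "bounded" search looks like in practice, here is a sketch against the base table from this thread; the index name is made up, and text_pattern_ops is only needed when the database locale is not C (use varchar_pattern_ops if the column is varchar):

  CREATE INDEX base_authors_prefix_idx ON base (authors text_pattern_ops);

  -- a left-anchored pattern has a starting point in the index, so it can avoid a seqscan:
  SELECT * FROM base WHERE authors LIKE 'Ambre%';

  -- a pattern with a leading wildcard, e.g. LIKE '%Ambre%', still cannot use this index.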
Re: [SQL] All columns from table in a joined query
You need to create a custom aggregate for this
CREATE AGGREGATE array_accum (
sfunc = array_append,
basetype = anyelement,
stype = anyarray,
initcond = '{}'
);
then use the field names in your query like this:

select array_to_string(array_accum(field1 || '@' || field2), '#') as field_alias from table_name;

Records will be separated by '#'.

Hope this helps.
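On PostgreSQL 8.4 and later the built-in array_agg() aggregate does the same job, so the custom aggregate above is only needed on older releases; an equivalent query would look like this (same hypothetical field and table names as above):

  select array_to_string(array_agg(field1 || '@' || field2), '#') as field_alias from table_name;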
On 8/23/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> I've been trying to figure out how to do the following:
>
>   SELECT schedule.*
>   FROM schedule
>   JOIN meetingday ON schedule.id = meetingday.scheduleid
>   WHERE sessionnumber = 165
>   GROUP BY schedule.*
>   ORDER BY min(meetingday.date);
>
> Is this possible in any way, or do I need to list each field of the
> schedule table to get all of them?

--
Thanks,
Sumeet Ambre
Masters of Information Science Candidate, Indiana University.
[SQL] Importing data from csv
Hi Folks, sorry if this is a duplicate post. I've been trying to find a way to import data into postgres from a csv file. The problem is that my data contains columns with embedded newline characters (mac and unix), so when I export to csv format there are line breaks (mixed unix and mac) in the data, which break the COPY procedure.
I also tried using the script posted in one of the previous posts:

#!/usr/bin/perl
$inquotes = 0;
while (<>) {
    # Chop the crlf
    chop($_);
    chop($_);
    # This first bit goes through and replaces
    # all the commas that are not in quotes with tildes
    for ($i = 0; $i < length($_); $i++) {
        $char = substr($_, $i, 1);
        if ($char eq '"') {
            $inquotes = not($inquotes);
        } else {
            if ((!$inquotes) && ($char eq ",")) {
                substr($_, $i, 1) = "~";
            }
        }
    }
    # This replaces any quotes
    s/"//g;
    print "$_\n";
}

cat data_file | perl scriptname.pl > outputfile.dat

When I run the copy command I still get messages like "data missing for column xyz". Any possible hints?
--
Thanks,
Sumeet
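For what it's worth, COPY's CSV mode (available since 8.0, HEADER since 8.1) understands quoted fields, including embedded newlines, so a sketch along the following lines may avoid the pre-processing script entirely; the table and file names here are hypothetical, and it only works if the exporting tool quotes fields that contain newlines:

  COPY my_table FROM '/path/to/outputfile.csv' WITH CSV HEADER;

  -- or, for a client-side file, via psql (drop HEADER if the file has no header row):
  \copy my_table from 'outputfile.csv' with csv header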
[SQL] Temporary Views or tables
Hi all, I'm trying to use temporary views in my application. Before I go ahead and create the temporary view, I would like to know more about it.

1) When is a temporary view automatically deleted? Is it after we close the connection, or when the session ends? For example, my website doesn't use a persistent connection to my postgres server; it queries and then disconnects from the db server. If I create a temporary view in one of those connections, will the temp view still be visible to me after multiple disconnects and reconnects?

2) Is the view approach faster than creating temporary tables? (And the same question as above for temp tables.)

3) If multiple users connect to my website at the same time and create temporary views or tables with the same name, would there be any overlap or conflict?

--
Thanks,
Sumeet.
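As a minimal sketch of the behaviour being asked about (the view name is made up; the table and columns are borrowed from the earlier "Multiple DB join" schema): a temporary view or table belongs to the session that created it, is dropped automatically when that connection ends, and lives in a per-session temp schema, so two sessions can use the same name without colliding.

  CREATE TEMPORARY VIEW recent_docs AS
      SELECT doc_id, title FROM base WHERE pub_year >= 2000;

  -- visible only inside this connection; gone after a disconnect/reconnect
  SELECT count(*) FROM recent_docs;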
[SQL]
Hi All, I'm having a very strange problem with indexing. I have a large number of partial indexes; when I run some queries they are sometimes very fast and sometimes 100 times slower than before. I tried running VACUUM ANALYZE a couple of times and it didn't help at all. I also tried EXPLAIN ANALYZE, and what I found was that the query execution plan keeps changing. Is there a better way to gain control over this? I tried turning seqscan off and messing with some other parameters, but that didn't really help. Also, can somebody point me to links which explain more about query execution plans, i.e. what nodes like bitmap heap scan, index scan, etc. mean?

--
Thanks,
Sumeet
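One knob that often stabilises flip-flopping plans is the per-column statistics target; the sketch below uses placeholder table and column names and an illustrative value, not a recommendation tuned to this schema:

  ALTER TABLE my_table ALTER COLUMN my_column SET STATISTICS 200;
  ANALYZE my_table;

  -- then compare plans before and after:
  EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_column = 'some value';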
[SQL] selecting random row values in postgres
Hi all, I'm trying to write a query that selects a random value from each group of a GROUP BY. See the scenario below to understand the problem (the actual problem cannot be discussed here, so I'm using an example scenario). Assume there is a table:

  id | name | year_of_birth

Query: for each year_of_birth I want to select a random name. So I do a GROUP BY on year_of_birth; now I have a set of names per group. Is there any function to select just one name from each of these sets of names? The current approach I'm using to solve this problem is:

1) get the names into a single string using a custom 'group_concat' function;
2) convert the single string into an array;
3) use postgresql's random() function to generate a random number;
4) use the random number to select an element from the array previously created.

The solution works, but it's kind of a hack. Is there a better way of solving this problem?

Thanks,
Sumeet
Re: [SQL] selecting random row values in postgres
Thanks buddy, I really appreciate your help; problem solved. Is there any way this query can be optimized? I'm running it on a huge table with joins.

- Sumeet

On 2/23/07, Rajesh Kumar Mallah <[EMAIL PROTECTED]> wrote:
> On 2/24/07, Sumeet <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I'm trying to write a query to select random values from a set of 'GROUP BY'
> > see the scenario below to understand the problem here (the actual
> > problem cannot be discussed here so i'm taking an example scenario)
> >
> > Assume there is a table
> >
> > id | name | year_of_birth
> >
> > query: I want to select for each year_of_birth a random name.
>
> Dear Sumeet,
>
> postgresql's DISTINCT ON may be of help, but it's not standard sql.
>
> regds
> mallah.
>
> tradein_clients=> SELECT * from temp.test;
> +----+------+-----+
> | id | name | yob |
> +----+------+-----+
> |  1 | A    |   2 |
> |  2 | B    |   2 |
> |  3 | C    |   2 |
> |  4 | D    |   1 |
> |  5 | E    |   1 |
> |  6 | F    |   1 |
> +----+------+-----+
> (6 rows)
>
> tradein_clients=> SELECT distinct on (yob) id, name, yob from temp.test order by yob, random();
> +----+------+-----+
> | id | name | yob |
> +----+------+-----+
> |  5 | E    |   1 |
> |  1 | A    |   2 |
> +----+------+-----+
> (2 rows)
>
> tradein_clients=> SELECT distinct on (yob) id, name, yob from temp.test order by yob, random();
> +----+------+-----+
> | id | name | yob |
> +----+------+-----+
> |  4 | D    |   1 |
> |  1 | A    |   2 |
> +----+------+-----+
> (2 rows)
>
> > --> so i do a group by year_of_birth, now i have a set of names, is there
> > any function to select just one name from these set of names.
> > The current approach i'm using to solve this problem is
> >
> > 1) getting these names in a single string using a custom function 'group_concat'
> > 2) Convert the single string into an array
> > 3) use postgresql random function to generate a random number
> > 4) use the random number to select an element from the array previously created.
> >
> > The solution is there but it's kind of a hack, is there any other better way of
> > solving this problem.
> >
> > Thanks,
> > Sumeet

--
Thanks,
Sumeet Ambre
Master of Information Science Candidate, Indiana University.
[SQL] tsearch2 question
Hi All, I'm trying to update a table containing 13,149,741 records, and it's taking forever to complete. The update query I'm running is for full text indexing, similar to:

UPDATE tblMessages SET idxFTI = to_tsvector(strMessage);

Below are some stats which might be helpful for analysis:

$ top

 PID   USERNAME  LWP  PRI  NICE  SIZE  RES   STATE  TIME    CPU    COMMAND
 3091  postgres    1   43     0   46M  38M   cpu/1  200:06  3.20%  postgres
 5052  postgres    1   60     0  149M  134M  sleep  0:17    3.12%  postgres

<< running almost for a day and a half and it is still running >>
Re: [SQL] tsearch2 question
Thanks Oleg,
My string messages are abstracts of papers. I did:
$ select avg(len) from (select length(abstract) as len from master_table
limit 500) E;
avg
---
1355.5907859078590786
(1 row)
so the average length is approximately 1,400 characters.
I couldn't find any appropriate way to analyze the time for update queries,
but what I did was an EXPLAIN ANALYZE:
$ explain analyze select to_tsvector(article_title) from master_table limit
1000;
The total runtime was approx 500ms.
The server runs SunOS 5.10 with around 8 GB of RAM and 6 CPUs.
Thanks,
Sumeet.
On 3/7/07, Oleg Bartunov wrote:
On Wed, 7 Mar 2007, Sumeet wrote:
> Hi All,
>
> I'm trying to update a table containing 13,149,741 records. And it's
> taking forever to complete this process.
>
> The update query i'm trying to run is for full text indexing similar to
>
> UPDATE tblMessages SET idxFTI=to_tsvector(strMessage);
>
How big are your strMessage values? And what's your tsearch2 configuration?
Can you estimate how long it takes to update, for example, 1000 rows?
It looks like your system is I/O bound. What's your hardware?
>
> Below are some of the stats which might be helpful for analyzing this
>
> $top
>
>  PID   USERNAME  LWP  PRI  NICE  SIZE  RES   STATE  TIME    CPU    COMMAND
>  3091  postgres    1   43     0   46M  38M   cpu/1  200:06  3.20%  postgres
>  5052  postgres    1   60     0  149M  134M  sleep  0:17    3.12%  postgres
>
> << running almost for a day and a half and it is still running,
>
> This table which i'm trying to update has 10 indexes
>
> =
> "a_article_pk" PRIMARY KEY, btree (id)
> "a_article_uk_pmid" UNIQUE, btree (pmid)
> "a_article_idx_abstract" btree ("substring"(abstract::text, 0, 255))
> "a_article_idx_date_cr_year" btree (date_cr_year)
> "a_article_idx_ml_journal_info_medline_ta" btree
(ml_journal_info_a_ta)
> "a_article_idx_owner" btree ("owner")
> "a_article_idx_pmid" btree (pmid)
> "a_article_idx_status" btree (status)
> "a_article_idx_title" btree (article_title)
> "a_master_t_idx_year_published" btree (published_year)
>
> But no indexes on the field i'm trying to update. The field i'm trying
> to add is a new field.
> Can anyone help me out to figure out why it is taking so much time to
> update the table.
>
> Also as you see in the above indexes, I have some indexes on some varchar
> columns which I feel are totally useless unless you do an exact string
> match.
> But does that help in any sense for improving the speed of retrieving
> the string just normally, without any search on it?
>
> Thanks,
> Sumeet.
>
Regards,
Oleg
_____
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.
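Following Oleg's suggestion to estimate the cost of updating 1,000 rows, one rough way to time a sample is sketched below; the tsvector column name is a guess (the thread never names it), the other identifiers come from the index list quoted above, and \timing is a psql command:

  \timing
  BEGIN;
  UPDATE master_table
     SET idxfti = to_tsvector('default', abstract)
   WHERE id IN (SELECT id FROM master_table LIMIT 1000);
  ROLLBACK;   -- discard the sample update; only the elapsed time matters

Scaling the elapsed time up to 13 million rows gives a rough lower bound, and since every UPDATE writes a new row version, each of the ten existing indexes has to be maintained as well, which adds to the I/O.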
[SQL] postgres configuration
Hi All, sorry if this is the wrong list to ask this question. My vacuums are running very slowly, especially when vacuuming indexes in large tables of 5-10 gigabytes. Can someone help me determine the postgres.conf parameters that will help vacuum run faster? I'm running postgres on a server with 32 GB of RAM and 8 processors.

Thanks,
Sumeet.
Re: [SQL] postgres configuration
The only info I have is: an Apple Xserve RAID drive array with 14 400GB drives, for a total of 5 TB of storage.

I have around 10-15 indexes on each table; does the number of indexes slow down the vacuum process? The indexes are compound indexes on multiple fields.

-Sumeet

On 3/13/07, Andrej Ricnik-Bay <[EMAIL PROTECTED]> wrote:
> On 3/14/07, Sumeet <[EMAIL PROTECTED]> wrote:
> > Hi All,
> Hi,
>
> > Sorry if this is the wrong list to ask this question.
> General would have been better :)
>
> > My Vacuum's are running very very slow and expecially when vacuuming indexes
> > in large tables of size in 5-10 gigabytes,
> > can some one help me determine the parameters for postgres.conf which will
> > help me vacuum fast. I'm running postgres on a server with 32 gigs of ram
> > and 8 processors.
> And the disk subsystem is ?
>
> > Thanks,
> > Sumeet.
> Cheers,
> Andrej

--
Thanks,
Sumeet Ambre
Master of Information Science Candidate, Indiana University.
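For reference, the postgresql.conf settings that most directly affect manual VACUUM speed are sketched below; the values are illustrative rather than tuned to this server, and on 8.1 memory settings are specified in kilobytes:

  maintenance_work_mem = 1048576   # ~1 GB of memory for VACUUM's index cleanup passes
  vacuum_cost_delay = 0            # 0 (the default) disables cost-based throttling of manual vacuums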
[SQL] Track query status
Hi all,
I ran an update query on a dataset which has about 48 million records, and the query has already been running for the third day. I'm very tempted to kill this query now; is there a way to know whether the query is still running?

Here is the query I ran:
UPDATE ma SET av=to_tsvector('default',coalesce(name_first,'') ||' '||
coalesce(name_last,''));
Thanks,
Sumeet
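One way to check whether the backend is still working on the statement (a sketch; on 8.1 the current_query column is only populated when stats_command_string is enabled in postgresql.conf):

  SELECT procpid, query_start, current_query
    FROM pg_stat_activity;

The UPDATE should appear under current_query for as long as it is executing.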
[SQL] slow query
Hi all, I ran the simple query below and it takes a long time to execute on this table. The table has 48 million records, and I worry about it a lot because I need to join it with another table of around 13 million records. I've tried vacuuming this table many times, but it didn't help much. Is there any other trick to speed up this table? Once I tried dumping and restoring a table and it worked perfectly fine, but that was a small table of 5-6 million records.

sm=> explain analyze select * from ma limit 10;
                                                       QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.45 rows=10 width=76) (actual time=21985.292..22204.340 rows=10 loops=1)
   ->  Seq Scan on ma  (cost=0.00..2181956.92 rows=48235392 width=76) (actual time=21985.285..22204.308 rows=10 loops=1)
 Total runtime: 22204.476 ms
(3 rows)

--
Thanks,
Sumeet
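A seqscan that needs 22 seconds to return its first 10 rows usually points at heavy table bloat (a long run of dead row versions at the start of the heap), which matches the observation that a dump and restore fixed a smaller table. A sketch of the usual remedies, using the table name from this message; note that both take an exclusive lock on the table while they run:

  VACUUM FULL VERBOSE ma;   -- compacts the heap; can itself be slow on a 48-million-row table
  REINDEX TABLE ma;         -- rebuilds the indexes, which VACUUM FULL tends to leave bloated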
[SQL] auto vacuuming
Hi all, is there a way I can find out whether the pg_autovacuum module is installed on my server? The postgres version I'm using is 8.1.4. I searched the contrib modules and didn't see a pg_autovacuum directory there; I also checked my source tree.

Thanks,
Sumeet
Re: [SQL] auto vacuuming
Then do I still need to provide the autovacuum options in postgres.conf, or are these options taken care of automatically?

Thanks,
Sumeet.

On 4/5/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:
> Sumeet escribió:
> > Hi all is there a way i can find if the pg_autovacuum module is installed on
> > my server??
> > The postgres version im using is 8.1.4
> > I tried searching the contrib module and dint see the pg_autovacuum
> > directory there
>
> pg_autovacuum no longer exists. Autovacuum is now integrated into the
> core. So the way to find out if it's installed is "yes".
>
> --
> Alvaro Herrera                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

--
Thanks,
Sumeet Ambre
Master of Information Science Candidate, Indiana University.
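To make the follow-up question concrete: on 8.1 the integrated autovacuum is off by default and depends on the row-level statistics collector, so a minimal postgresql.conf sketch looks like this (the threshold and naptime settings can stay at their defaults to start with):

  stats_start_collector = on   # required for autovacuum
  stats_row_level = on         # required for autovacuum; off by default
  autovacuum = on              # off by default in 8.1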
[SQL] Multiple DB join
Hi All, I have a database which consists of 20 million records, and I've split it up into 6-7 databases. I have a base database which consists of the ids that link all the databases. I perform searches on this single base table. After searching I get back ids, which are ids of records in the other databases I split off; now I need to retrieve those records. Is there a way I can join tables from multiple databases, the way we can join multiple tables in a single database?

Thanks,
SA.
Re: [SQL] Multiple DB join
Andrew Sullivan wrote:
> On Mon, Aug 14, 2006 at 05:26:10PM -0400, Sumeet Ambre wrote:
> > Hi All,
> > I have a database which consists of 20 million records and I've split up
> > the db into 6-7 dbs.
>
> You can do this (as someone suggested with dblink), but I'm wondering
> why the split? 20 million records isn't very big.

The design of the database is because our organization wants to split up different datasets into different entities, and there might be a possibility that they'll run different instances of postgres for each dataset. The problem is that one of the databases consists of 14 million records, and when I query the base database, which consists of 20 million records, the query runs damn slow... Below is the sample schema for my base table and the query I try to run on it; it takes me more than 2-3 minutes to run a query. Is there any way I could speed this up?

Sample schema for my base table:

  doc_id | title | authors | pub_year | abs | db_name

In the above schema, the field db_name is the name of the other database which contains the whole record. I'm trying to run queries which search on title, authors, pub_year or abstract simultaneously, e.g.:

SELECT * FROM base WHERE ((title ILIKE '%something%') AND (authors ~* '.*something.*') AND (db_name='something'));

Any suggestions to speed up this query?

Thanks,
Sumeet.
