Re: [SQL] Multiple DB join

2006-08-15 Thread Sumeet
On 8/15/06, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

On Tue, Aug 15, 2006 at 10:11:41AM -0400, Sumeet Ambre wrote:
> The Design of the database is because our organization wants to split up
> different datasets into different entities, and there might be a
> possibility that they'll run different instances of postgres for each
> dataset.

It's this "different instances" thing that I'm having a tough time
with.  Is this because they want to be able to query local things
when disconnected or something?  I can think of applications for
this, for sure, I'm just suggesting that you make sure you're not
building an optimisation that is (1) premature and (2) possibly a
pessimal operation.

The reason for splitting up the dbs into different instances is that in case one of the postgres instances on the server
goes down for some reason, it doesn't affect the other instances which are running on the same server. Even I don't know
the reason for this kind of approach, but I have to deal with it.
> records, the query runs damn slow...below is the sample schema for
> my base table and the query i try to run on it...it takes me more than
> 2-3 minutes to run a query...Is there any way i could speed this up..

The problem is not your design, nor even the size of the data
exactly, but the query:

> e.g.  SELECT * FROM base WHERE ((title ILIKE '%something%') AND (authors
> ~*   '.*something.*')  AND (db_name='something'));

You have two initially-unbound search terms there: ILIKE '%...' and ~*
'.*' are automatically seqscans, because you have nowhere in the
index to start.  If you really want to do this kind of unbound-string
query, you need to look into full text search.  The above approach is
never going to be fast.
I previously thought of doing the full text search indexing, but I had an intuition that full text search is for fields which contain very large strings; in my case the strings are not above 200 chars in length. So would full text search give me the performance we need? Also, I tried combined indexes on title and authors in my base table; would indexes of any kind help me in this case?
Thanks for your prompt replies,

--
Thanks,
Sumeet.
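
A minimal tsearch2-style sketch (not from the thread) of what a full text setup for the "base" table described later in this digest might look like; the index name and the 'default' configuration are assumptions:

  ALTER TABLE base ADD COLUMN fti tsvector;
  UPDATE base SET fti = to_tsvector('default',
      coalesce(title,'') || ' ' || coalesce(authors,''));
  CREATE INDEX base_fti_idx ON base USING gist (fti);
  -- a bounded full text query instead of ILIKE '%...%':
  SELECT * FROM base
  WHERE fti @@ to_tsquery('default', 'something')
    AND db_name = 'something';

With the GiST index in place, the @@ search no longer has to fall back to a sequential scan.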


Re: [SQL] NULL becomes default

2006-08-17 Thread Sumeet
Hi Julian, you need to reset your sequence; I ran into the same problem.

On 8/17/06, Julian Scarfe <[EMAIL PROTECTED]> wrote:

A surrogate key has been introduced on a table with modifiers:

   Column  |  Type   |                         Modifiers
 ----------+---------+------------------------------------------------------------
  new_id   | integer | not null default nextval(('some_id_seq'::text)::regclass)

However, some legacy software inserts into the table by first getting column
names and then composing an INSERT for all the columns, filling in any
undefined values with NULLs.

Thus when it attempts to insert a row I get a:

ERROR:  null value in column "new_id" violates not-null constraint

To fix this, you need to do something like:

  SELECT setval('public.fdata_fid_seq', max(fid)+1) FROM fdata;

This will make sure that the next value your sequence generates is
greater than any key that already exists in the table.

>> taken from Tom Lane.

--
Thanks,
Sumeet.
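
The same idea applied to the column quoted above would look roughly like this (not from the thread; the table name "mytable" is a placeholder, while the column and sequence names come from the quoted default):

  SELECT setval('some_id_seq', max(new_id) + 1) FROM mytable;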


[SQL] About DBlink

2006-08-17 Thread Sumeet
Hi All,

I'm trying to find documentation for the postgres module named "dblink"; can anyone point me to it? What I want to do is join multiple databases instead of multiple tables.

Thanks,
Sumeet.



Re: [SQL] About DBlink

2006-08-17 Thread Sumeet
Thanks Michael. I don't have dblink installed; I need to find good documentation which will help me do this, and also the place where I can download this module.

Thanks,
Sumeet.
On 8/17/06, Michael Fuhr <[EMAIL PROTECTED]> wrote:
On Thu, Aug 17, 2006 at 04:37:03PM -0400, Sumeet wrote:
> Im trying to find out documentation for postgres module names "dblink", can
> anyone point me out to it. What i want to do is join multiple databases
> instead of multiple tables.

If you've installed dblink then somewhere on your system should be
a file named README.dblink.  How to find that file depends on the
platform: on many Unix-like systems you can use "locate" or "find",
or if you installed dblink via a package then you could use the
package management tool to display a list of files that it installed.

I'd mention that doing joins between databases isn't a good idea
but it looks like you've already had that discussion in the recent
"Multiple DB join" thread.

http://archives.postgresql.org/pgsql-sql/2006-08/msg00097.php

--
Michael Fuhr

--
Thanks,
Sumeet Ambre
Masters of Information Science Candidate,
Indiana University.
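
A minimal dblink sketch (not from the thread), assuming the module's functions have been loaded into the database; the remote database name, remote table, and its column list are placeholders, while base's columns come from the schema posted in the "Multiple DB join" thread:

  SELECT b.doc_id, b.title, r.abs
  FROM base b
  JOIN dblink('dbname=docs_db',
              'SELECT doc_id, abs FROM documents')
       AS r(doc_id integer, abs text)
    ON r.doc_id = b.doc_id
  WHERE b.db_name = 'docs_db';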


Re: [SQL] Multiple DB join

2006-08-18 Thread Sumeet
On 8/15/06, Andrew Sullivan <[EMAIL PROTECTED]> wrote:
> I previously thought of doing the full text search indexing thing...but i
> had a intution that the full text search thing is for fields which have very
> large strings...but in my case the strings are not above 200 chars in

No, it's for data where you're going to be searching in random pieces
of the text.  I think you should look at tsearch2, probably.

Will full text search indexing help me achieve good speed in searching keywords?

> If you had any way to enforce bounded searches, it'd be a different
> matter: strings with initial matches but an unbound end are fast.
> (You can do it the other way, too, by some tricks with reversing the
> strings.)

Can someone please elaborate a little on ways we can enforce bounded searches? I'm basically trying a simple search,
i.e. trying to find the names of authors a user enters into an interface against the 20 million records in my db. Can anyone suggest a good way to perform this kind of search?

Thanks,
Sumeet.
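
A sketch of the "bounded search" idea (not from the thread). An anchored prefix match can use a plain btree index (the text_pattern_ops opclass matters in non-C locales), and the reversed-string trick handles suffix matches; reverse() is not built in on 8.1, so it is assumed here to be a user-defined, IMMUTABLE helper function:

  CREATE INDEX base_authors_prefix ON base (authors text_pattern_ops);
  SELECT * FROM base WHERE authors LIKE 'smith%';   -- bounded, can use the index

  -- suffix match via reversal, assuming a reverse(text) helper exists:
  CREATE INDEX base_authors_rev ON base ((reverse(authors)) text_pattern_ops);
  SELECT * FROM base WHERE reverse(authors) LIKE reverse('smith') || '%';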



Re: [SQL] All columns from table in a joined query

2006-08-23 Thread Sumeet
You need to create a custom aggregate for this

CREATE AGGREGATE array_accum (
    sfunc = array_append,
    basetype = anyelement,
    stype = anyarray,
    initcond = '{}'
);


then use the field names in your query like this


select array_to_string(array_accum(field1 || '@' || field2),'#') as field_alias from table_name

records will be separated by '#'
hope this helps.





On 8/23/06, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:

I've been trying to figure out how to do the following:

Select schedule.* from schedule join meetingday on schedule.id =
meetingday.scheduleid where sessionnumber = 165 group by schedule.*
order by min(meetingday.date);

Is this possible in any way, or do I need to list each field of the schedule
table to get all of them?

--
Thanks,
Sumeet Ambre
Masters of Information Science Candidate,
Indiana University.


[SQL] Importing data from csv

2006-08-24 Thread Sumeet
Hi Folks,

Sorry if this is a duplicate post. I've been trying to find a solution for importing data into postgres from a csv file. The problem is, I have a database with columns which contain newline characters (mac and unix). Now when I export these to csv format, there are some line breaks (mixed unix and mac) in the data which break the copy procedure.
I also tried using the script posted in one of the previous posts:

#! /usr/bin/perl
$inquotes = 0;
while (<>) {
    # Chop the crlf
    chop($_);
    chop($_);
    # this first bit goes through and replaces
    # all the commas that are not in quotes with tildes
    for ($i = 0; $i < length($_); $i++) {
        $char = substr($_, $i, 1);
        if ($char eq '"') {
            $inquotes = not($inquotes);
        } else {
            if ((!$inquotes) && ($char eq ",")) {
                substr($_, $i, 1) = "~";
            }
        }
    }
    # this replaces any quotes
    s/"//g;
    print "$_\n";
}

cat data_file | perl scriptname.pl > outputfile.dat

and when I run the copy command I get messages like "data missing for column xyz". Any possible hints...
--
Thanks,
Sumeet
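
An aside, not from the thread: on 8.x, COPY's CSV mode preserves embedded newlines as long as the field is quoted, which can replace the pre-processing script above. The table and file names here are placeholders:

  COPY mytable FROM '/tmp/data_file.csv' WITH CSV;
  -- or, from psql without server-side file access:
  -- \copy mytable from 'data_file.csv' with csv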


[SQL] Temporary Views or tables

2006-08-28 Thread Sumeet
Hi all,

I'm trying to use temporary views in my application. Before I proceed and go ahead with creating the temporary view, I would like to know more about it.

1) When is the temporary view automatically deleted? Is it after we close the connection, or when the session ends? For example, my website doesn't use a persistent connection to my postgres server; it queries and then disconnects from the db server. If I create a temporary view in one of those connections, will this temp view be visible to me after multiple disconnects and connects?

2) Is the view approach faster than creating temporary tables? (And the same question as above for temp tables.)

3) If there are multiple users who connect to my website at the same time and create temporary views or tables with the same name, would there be any overlapping or such?
--
Thanks,
Sumeet.
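
A minimal sketch of the behaviour being asked about (not from the thread). Temporary objects live only in the session that created them and sit in a per-session temp schema, so concurrent sessions can reuse the same name without colliding; the view and table names below are placeholders, and "base" is the table described elsewhere in this digest:

  CREATE TEMP VIEW recent_docs AS
      SELECT * FROM base WHERE pub_year >= 2000;
  -- dropped automatically when this connection/session ends

  CREATE TEMP TABLE scratch (id integer) ON COMMIT DROP;
  -- temp tables can additionally be dropped at the end of the transaction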


[SQL]

2006-08-29 Thread Sumeet
Hi All,

I'm having very strange problems with indexing. I have a large number of partial indexes; when I run some queries, sometimes they run very fast and sometimes they are 100 times slower than the previous run. I tried running VACUUM ANALYZE a couple of times and it didn't help at all. I also tried EXPLAIN ANALYZE, and what I found was that the query execution plan keeps changing. Is there a better way to gain control over this? I tried turning seqscan off and messing with some other parameters, but it didn't really help. Also, can somebody point me to links which explain more about query execution plans, i.e. each of the plan nodes like bitmap heap scan, index scan, etc.?
--
Thanks,
Sumeet
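
Two knobs that often help when plans flip between runs (a sketch, not from the thread; the table, column, and query below are placeholders taken from the "base" schema elsewhere in this digest):

  -- give the planner a larger statistics sample for the filtered column
  ALTER TABLE base ALTER COLUMN title SET STATISTICS 100;
  ANALYZE base;
  -- then compare the plans actually chosen across runs
  EXPLAIN ANALYZE SELECT * FROM base WHERE title = 'something';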


[SQL] selecting random row values in postgres

2007-02-23 Thread Sumeet

Hi all,

I'm trying to write a query to select random values from a set of 'GROUP
BY' groups.
See the scenario below to understand the problem (the actual
problem cannot be discussed here, so I'm taking an example scenario).

Assume there is a table

id | name | year_of_birth

Query: I want to select, for each year_of_birth, a random name.

--> So I do a GROUP BY year_of_birth; now I have a set of names. Is there
any function to select just one name from this set of names?
The current approach I'm using to solve this problem is:

1) get these names into a single string using a custom function
   'group_concat'
2) convert the single string into an array
3) use postgresql's random function to generate a random number
4) use the random number to select an element from the array previously
   created.

The solution works, but it's kind of a hack; is there any other better way of
solving this problem?


Thanks,
Sumeet
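
A sketch of the four-step hack above collapsed into a single query (not from the thread), using the array_accum aggregate defined earlier in this digest in place of group_concat; the table name "people" is a placeholder:

  SELECT year_of_birth,
         (array_accum(name))[1 + floor(random() * count(*))::int] AS random_name
  FROM people
  GROUP BY year_of_birth;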


Re: [SQL] selecting random row values in postgres

2007-02-23 Thread Sumeet

Thanks buddy, I really appreciate your help on this.

Problem solved...

Is there any way this query can be optimized? I'm running it on a huge
table with joins.

- Sumeet


On 2/23/07, Rajesh Kumar Mallah <[EMAIL PROTECTED]> wrote:




On 2/24/07, Sumeet <[EMAIL PROTECTED]> wrote:
> Hi all,
>
> I'm trying to write  a query to select random values from a set of
'GROUP
> BY'
> see the scenario below to understand the problem here (the actual
> problem cannot be discussed here so i'm taking an example scenario)
>
> Assume there is a table
>
> id | name | year_of_birth
>
> query: I want  to select for each year_of_birth a random name.

Dear Sumeet

PostgreSQL's DISTINCT ON may be of help, but it's not standard SQL.

regds
mallah.

tradein_clients=> SELECT * from temp.test;
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  1 | A    |   2 |
|  2 | B    |   2 |
|  3 | C    |   2 |
|  4 | D    |   1 |
|  5 | E    |   1 |
|  6 | F    |   1 |
+----+------+-----+
(6 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob from temp.test order by
yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  5 | E    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)

tradein_clients=> SELECT distinct on (yob) id,name,yob from temp.test order by
yob,random();
+----+------+-----+
| id | name | yob |
+----+------+-----+
|  4 | D    |   1 |
|  1 | A    |   2 |
+----+------+-----+
(2 rows)




>
> --> so i do a group by year_of_birth, now i have a set of names, is
there
> any function to select just one name from these set of names.
> The current approach i'm using to solve this problem is
>
> 1) getting these names in a single string using a custom function
> 'group_concat'
> 2) Convert the single string into an array
> 3) use postgresql random function to generate  a random number
> 4) us the random number to select a element from the array previously
> created.
>
> The solution is there but it's kinda hack, is there any other better way
of
> solving this problem.
>
>
> Thanks,
> Sumeet





--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.


[SQL] tsearch2 question

2007-03-07 Thread Sumeet

Hi All,

I'm trying to update a table containing 13149741 records, and it's taking
forever to complete this process.

The update query I'm trying to run is for full text indexing, similar to:

UPDATE tblMessages SET idxFTI=to_tsvector(strMessage);


Below are some of the stats which might be helpful for analyzing this

$ top

  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME     CPU COMMAND
 3091 postgres   1  43    0   46M   38M cpu/1  200:06   3.20% postgres
 5052 postgres   1  60    0  149M  134M sleep    0:17   3.12% postgres

<<
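
For reference, the tsearch2-era steps usually look roughly like this (a sketch, not from the thread; it assumes the idxFTI column already exists and the tsearch2 module is installed in the database):

  UPDATE tblMessages SET idxFTI = to_tsvector('default', strMessage);
  CREATE INDEX tblmessages_idxfti_idx ON tblMessages USING gist (idxFTI);
  VACUUM ANALYZE tblMessages;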

Re: [SQL] tsearch2 question

2007-03-07 Thread Sumeet

Thanks Oleg,

My string messages are abstracts of papers. I did a:

$ select avg(len) from (select length(abstract) as len from master_table
limit 500) E;

          avg
------------------------
 1355.5907859078590786
(1 row)

so length is approx 1400.


I couldn't find an appropriate way to measure the time for update queries,
but what I did was an explain analyze:

$ explain analyze select to_tsvector(article_title) from master_table limit
1000;

The total runtime was approx 500ms.

The server is SunOS 5.10, with around 8 gigs of RAM and 6 CPUs.

Thanks,
Sumeet.



On 3/7/07, Oleg Bartunov  wrote:


On Wed, 7 Mar 2007, Sumeet wrote:

> Hi All,
>
> I'm trying to udpate a table containing  13149741 records. And its
taking
> forever to complete this process.
>
> The update query i'm trying to run is for full text indexing similiar to
>
> UPDATE tblMessages SET idxFTI=to_tsvector(strMessage);
>

How big are your strMessage ? and what's your tsearch2 configuration ?
Can you estimate how long takes updating, for example, 1000 rows ?
It looks like your system is IO bound. What's your hardware ?

>
> Below are some of the stats which might be helpful for analyzing this
>
> $ top
>
>   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME     CPU COMMAND
>  3091 postgres   1  43    0   46M   38M cpu/1  200:06   3.20% postgres
>  5052 postgres   1  60    0  149M  134M sleep    0:17   3.12% postgres
>
> << running almost for a day and a half and it is still running,
>
> This table which i'm trying to  update has 10 indexes
>
> =
> "a_article_pk" PRIMARY KEY, btree (id)
>   "a_article_uk_pmid" UNIQUE, btree (pmid)
>   "a_article_idx_abstract" btree ("substring"(abstract::text, 0, 255))
>   "a_article_idx_date_cr_year" btree (date_cr_year)
>   "a_article_idx_ml_journal_info_medline_ta" btree
(ml_journal_info_a_ta)
>   "a_article_idx_owner" btree ("owner")
>   "a_article_idx_pmid" btree (pmid)
>   "a_article_idx_status" btree (status)
>   "a_article_idx_title" btree (article_title)
>   "a_master_t_idx_year_published" btree (published_year)
> 
> But no indexes on the field i'm trying to update. The field i'm trying to
> add is a new field.
> Can anyone help me out to figure out why is it taking so much time to update
> the table.
>
> Also as u see in the above indexes, I have some indexes on some varchar
> column which i feel are totally useless unless u so a exact string match.
> But does that help in any sense for improving the speed of retreiving the
> string just normally without any search on it?
>
> Thanks,
> Sumeet.
>

Regards,
Oleg
_____
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [email protected], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83





--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.
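
A way to answer Oleg's "how long does 1000 rows take" question more directly (a sketch, not from the thread): time a small slice of the actual UPDATE rather than a SELECT, since the UPDATE also pays for writing the rows and maintaining the ten indexes. The tsvector column name "fti" below is a placeholder, "id" is the primary key from the quoted index list, and \timing is a psql meta-command:

  \timing
  UPDATE master_table
     SET fti = to_tsvector('default', abstract)
   WHERE id BETWEEN 1 AND 1000;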


[SQL] postgres configuration

2007-03-13 Thread Sumeet

Hi All,

Sorry if this is the wrong list to ask this question.

My vacuums are running very, very slowly, especially when vacuuming indexes
in large tables of 5-10 gigabytes.
Can someone help me determine the parameters in postgresql.conf which will
help me vacuum faster? I'm running postgres on a server with 32 gigs of RAM
and 8 processors.


Thanks,
Sumeet.
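
An illustrative postgresql.conf fragment (not from the thread) of the settings that most directly affect vacuum speed on 8.1; the values are assumptions to adapt, not recommendations:

  maintenance_work_mem = 524288   # in kB on 8.1 (512 MB); used by VACUUM's index cleanup
  vacuum_cost_delay = 0           # 0 = no throttling; raise it to reduce I/O impact instead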


Re: [SQL] postgres configuration

2007-03-13 Thread Sumeet

The only info I have is:

Apple xRaid drive array with 14 400GB drives for a total of 5 TB storage.

I have around 10-15 indexes for each table. Does the number of indexes slow
down the vacuum process? The indexes are compound indexes on multiple fields.

-Sumeet


On 3/13/07, Andrej Ricnik-Bay <[EMAIL PROTECTED]> wrote:


On 3/14/07, Sumeet <[EMAIL PROTECTED]> wrote:
> Hi All,
Hi,

> Sorry if this is the wrong list to ask this question.
General would have been better :)

> My Vacuum's are running very very slow and expecially when vacuuming
> indexes
> in large tables of size in 5-10 gigabytes,
> can some one help me determine the parameters for postgres.conf which
> will
> help me vacuum fast. I'm running postgres on a server with 32 gigs of
> ram
> and 8 processors.
And the disk subsystem is  ?



> Thanks,
> Sumeet.
Cheers,
Andrej





--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.


[SQL] Track query status

2007-03-28 Thread Sumeet

Hi all,

I have run an update query on a dataset which has about 48 million records,
and the query has already been running for the third day. I'm so tempted to
kill this query now. Is there a way to know whether the query is still running?
Here is the query I ran:

UPDATE ma SET av=to_tsvector('default',coalesce(name_first,'') ||' '||
coalesce(name_last,''));


Thanks,
Sumeet
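
One way to check on it (a sketch, not from the thread), assuming the statistics collector's command string collection is enabled: on 8.1 the pg_stat_activity view shows what each backend is doing and when its current query started:

  SELECT procpid, query_start, current_query
  FROM pg_stat_activity;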


[SQL] slow query

2007-04-04 Thread Sumeet

Hi all

I ran a simple query like the one displayed below, and it takes a lot of time
to execute on this table.
This table has 48 million records, and I worry about this table a lot because
I need to perform a join on this
table with another table having around 13 million records. I've tried
vacuuming this table many times,
but it didn't help me much. Is there any other trick to speed up this
table? Once I tried deleting and restoring a table,
and it worked perfectly fine for a small table of up to 5-6 million
records.

sm=> explain analyze select * from ma limit 10;
                                                       QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.45 rows=10 width=76) (actual time=21985.292..22204.340 rows=10 loops=1)
   ->  Seq Scan on ma  (cost=0.00..2181956.92 rows=48235392 width=76) (actual time=21985.285..22204.308 rows=10 loops=1)
 Total runtime: 22204.476 ms
(3 rows)


--
Thanks,
Sumeet
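
If the table is bloated from the earlier mass UPDATE, a plain VACUUM will not shrink it; a rewrite does (a sketch, not from the thread; the index name used by CLUSTER is an assumption):

  VACUUM FULL VERBOSE ma;
  -- or, often faster, rewrite the table in index order:
  CLUSTER ma_pkey ON ma;
  ANALYZE ma;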


[SQL] auto vacuuming

2007-04-05 Thread Sumeet

Hi all, is there a way I can find out if the pg_autovacuum module is installed on
my server?
The postgres version I'm using is 8.1.4.
I tried searching the contrib modules and didn't see the pg_autovacuum
directory there.
I checked my source.

Thanks,
Sumeet


Re: [SQL] auto vacuuming

2007-04-05 Thread Sumeet

Then do I still need to provide the autovacuum options in the
postgresql.conf file, or are these options automatically taken care of?

Thanks,
Sumeet.

On 4/5/07, Alvaro Herrera <[EMAIL PROTECTED]> wrote:


Sumeet wrote:
> Hi all is there a way i can find if the pg_autovacuum module is
> installed on
> my server??
> The postgres version im using is 8.1.4
> I tried searching the contrib module and dint see the pg_autovacuum
> directory there

pg_autovacuum no longer exists.  Autovacuum is now integrated into the
core.  So the way to find if it's installed is "yes".

--
Alvaro Herrera
http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.





--
Thanks,
Sumeet Ambre
Master of Information Science Candidate,
Indiana University.
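
On 8.1 the integrated autovacuum still has to be switched on in postgresql.conf; an illustrative fragment (not from the thread):

  autovacuum = on
  stats_start_collector = on   # required for autovacuum on 8.1
  stats_row_level = on         # required for autovacuum on 8.1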


[SQL] Multiple DB join

2006-08-14 Thread Sumeet Ambre

Hi All,

I have a database which consists of 20 million records, and I've split up
the db into 6-7 dbs. I have a base database which consists of
the ids which link all the databases. I'm performing searches on this
single base table. After searching I get some ids which are ids in the other
databases I split up. Now I need to retrieve those records. Is
there a way I can join tables from multiple databases, as we can join
multiple tables in a single database?

Thanks,
SA.



Re: [SQL] Multiple DB join

2006-08-15 Thread Sumeet Ambre

Andrew Sullivan wrote:

On Mon, Aug 14, 2006 at 05:26:10PM -0400, Sumeet Ambre wrote:
  

Hi All,

I have a database which consists of 20 million records and I've split up 
the db into 6-7 dbs. 



You can do this (as someone suggested with dblink), but I'm wondering
why the split?  20 million records isn't very big.

A
  
The design of the database is because our organization wants to split up
different datasets into different entities, and there might be a
possibility that they'll run different instances of postgres for each
dataset. The problem is that one of the databases consists of 14 million
records, and when I query the base database which consists of 20 million
records, the query runs damn slow. Below is the sample schema for
my base table and the query I try to run on it. It takes me more than
2-3 minutes to run a query. Is there any way I could speed this up?


sample schema for my base table
-

doc_id  |  title |  authors  |  pub_year  |   abs   |   db_name |

In the above schema, the field db_name is the name of the other database
which contains the whole record.
I'm trying to run a query which searches on title, authors' names, pub_year,
or abstract simultaneously...


e.g.  SELECT * FROM base WHERE ((title ILIKE '%something%') AND (authors 
~*   '.*something.*')  AND (db_name='something'));


Any suggestions to speed up this query.

Thanks,
Sumeet.
