Re: [PERFORM] tsearch2 seem very slow

2005-09-24 Thread Oleg Bartunov

Ahmad,

What about the number of unique words? I mean the stat() function; sometimes
it helps to identify garbage words.
How big are your articles (average length)?
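For the stat() check, something like this (just a sketch; it assumes the
tsvector column is named fti, as in your articles table) lists the most
frequent lexemes; garbage words usually show up near the top:

  SELECT word, ndoc, nentry
  FROM stat('SELECT fti FROM articles')
  ORDER BY ndoc DESC, nentry DESC
  LIMIT 20;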

Please cut'n'paste the queries and their output from psql! How fast are the
next queries?


Oleg
On Fri, 23 Sep 2005, Ahmad Fajar wrote:


Hi Oleg,

For single index I try this query:
explain analyze
select articleid, title, datee from articles
where fti @@ to_tsquery('bank&indonesia');

analyze result:

Index Scan using fti_idx on articles  (cost=0.00..862.97 rows=420 width=51)
(actual time=0.067..183761.324 rows=46186 loops=1)
  Index Cond: (fti @@ '\'bank\' & \'indonesia\''::tsquery)
Total runtime: 183837.826 ms

And for multicolumn index I try this query:
explain analyze
select articleid, title, datee from articles
where datee >= '2002-01-01' and datee <= current_date
  and fti @@ to_tsquery('bank&mega');

analyze result:

Index Scan using articles_x1 on articles  (cost=0.00..848.01 rows=410
width=51) (actual time=52.204..37914.135 rows=1841 loops=1)
  Index Cond: ((datee >= '2002-01-01'::date) AND (datee <=
('now'::text)::date) AND (fti @@ '\'bank\' & \'mega\''::tsquery))
Total runtime: 37933.757 ms

The table structure is as mentioned in the first message. If you want to know
how many tables are in my database, it's about 100 tables or maybe more. I am
now developing version 2 of my web application (you can take a look at
http://www.mediatrac.net), so it will hold a lot of data. The biggest table is
the articles table. For this version 2 I use only half of the articles table
data (about 419804 rows); if I import all of it, the table will have about 1
million rows. The articles table grows rapidly, about 10 rows per week. My
development database is 28 GB (not the real database, because I am still
developing version 2 and only use half of the data to play around with). I
just want to perform quick full-text searches on the articles table, not on
other tables. Version 1, the currently running version, uses the same
hardware specification as mentioned below, but has no full-text search. So I
am developing the new version with new features, a new interface, and
full-text search included.

I do know that when the application is finished I must use powerful hardware.
But how can I guarantee the application will run smoothly, if a full-text
search on 419804 rows in a table already takes this long to get the result?

Could you or friends on this mailing list help me, please?

Tsearch2 configuration:
------------------------
I use the default configuration: the english stop-word file as tsearch2
provides it, the default stem dictionary (because I don't know how to
configure the stem dictionary or add new data to it), and I have added some
words to the english stop-word file.
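For reference, the active setup can be inspected like this (just a sketch; it
assumes the stock tsearch2 contrib tables and functions are installed):

  SELECT * FROM pg_ts_cfg;                      -- installed configurations
  SELECT tok_alias, dict_name
  FROM pg_ts_cfgmap WHERE ts_name = 'default';  -- which dictionaries handle which token types
  SELECT lexize('en_stem', 'someword');         -- an empty array would mean the word is treated as a stop word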

Postgresql configuration:
-------------------------
max_connections = 32
shared_buffers = 32768
sort_mem = 8192
vacuum_mem = 65536
work_mem = 16384
maintenance_work_mem = 65536
max_fsm_pages = 3
max_fsm_relations = 1000
max_files_per_process = 10
checkpoint_segments = 15
effective_cache_size = 192000
random_page_cost = 2
geqo = true
geqo_threshold = 50
geqo_effort = 5
geqo_pool_size = 0
geqo_generations = 0
geqo_selection_bias = 2.0
from_collapse_limit = 10
join_collapse_limit = 15

OS configuration:
-----------------
I use Redhat 4 AS, kernel 2.6.9-11
kernel.shmmax=1073741824
kernel.sem=250 32000 100 128
fs.aio-max-nr=5242880
The server is configured only for postgresql; no other services are running
(www, samba, ftp, email, firewall).

hardware configuration:
-----------------------
Motherboard ASUS P5GD1
Processor P4 3.2 GHz
Memory 2 GB DDR 400
2 x 200 GB Serial ATA 7200 RPM (UltraATA/133), configured as RAID0 for the
postgresql data; the partition is EXT3
1 x 80 GB EIDE 7200 RPM, configured for the system and home directories; the
partition is EXT3

Did I miss something?

Regards,
ahmad fajar


-Original Message-
From: Oleg Bartunov [mailto:[EMAIL PROTECTED]
Sent: Friday, 23 September 2005 18:26
To: Ahmad Fajar
Cc: pgsql-performance@postgresql.org
Subject: RE: [PERFORM] tsearch2 seem very slow

On Fri, 23 Sep 2005, Ahmad Fajar wrote:


Hi Oleg,

I don't deny that on the third run or later it can drop below 600 msec, but
that is only because the result is still in the postgres cache. What about
the first run? I don't dare rely on that; those values are unacceptable. My
table will grow rapidly, about 10 rows per week, and visitors will search for
anything I cannot predict, whether it is a repeated search or a new one, and
whether it is in the postgres cache or not.


If you have enough shared memory, postgresql will keep the index pages there.
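A quick way to check (a sketch; it assumes the statistics collector is
enabled) is to compare index blocks read from disk with blocks found in the
buffer cache:

  SELECT indexrelname, idx_blks_read, idx_blks_hit
  FROM pg_statio_user_indexes
  WHERE relname = 'articles';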




I just compared with http://www.postgresql.org; the search there is quite
fast, and I don't know whether the site uses tsearch2 or something else. But
as far as I know, once the rows reach 100 million (I have tried 200 million
rows and it seems very slow), then even without tsearch2, using only a simple
query like:
select f1, f2 from table1 where f2='blabla',
where f2 is indexed, my postgres is still slow on the first

[PERFORM] Advice on RAID card

2005-09-24 Thread PFC


Hello fellow Postgresql'ers.

	I've stumbled on this RAID card which looks nice. It is a PCI-X SATA
RAID card with 6 channels, and does RAID 0, 1, 5, 10, 50.

It is an HP card with an Adaptec chip on it, and 64 MB of cache.

HP Part # : 372953-B21
Adaptec Part # : AAR-2610SA/64MB/HP

There's even a picture:
http://megbytes.free.fr/Sata/DSC05970.JPG

	I know it isn't as good as a full SCSI system. I just want to know if
some of you have had experience with these, and whether this card belongs to
the "slower than no RAID" camp, like some DELL cards we often see mentioned
here, or to the "decent performance for the price" camp. It is to run on
Linux.


Thanks in advance for your time and information.



Re: [HACKERS] [PERFORM] Releasing memory during External sorting?

2005-09-24 Thread Ron Peacetree
From: Dann Corbit [EMAIL PROTECTED]
Sent: Sep 23, 2005 5:38 PM
Subject: RE: [HACKERS] [PERFORM] Releasing memory during External sorting?

_C Unleashed_ also explains how to use a callback function to perform
arbitrary radix sorts (you simply need a method that returns the
[bucketsize] most significant bits for a given data type, for the length
of the key).

So you can sort fairly arbitrary data in linear time (of course if the
key is long then O(n*log(n)) will be better anyway.)

But in any case, if we are talking about external sorting, then disk
time will be so totally dominant that the choice of algorithm is
practically irrelevant.

Horsefeathers.  Jim Gray's sorting contest site:
http://research.microsoft.com/barc/SortBenchmark/

proves that the choice of algorithm can have a profound effect on
performance.  After all, the amount of IO done is the most
important of the things that you should be optimizing for in
choosing an external sorting algorithm.

Clearly, if we know or can assume the range of the data in question
the theoretical minimum amount of IO is one pass through all of the
data (otherwise, we are back in O(lg(n!)) land ).  Equally clearly, for
HD's that one pass should involve as few seeks as possible.

In fact, such a principle can be applied to _all_ forms of IO:  HD,
RAM, and CPU cache.  The absolute best that any sort can
possibly do is to make one pass through the data and deduce the
proper ordering of the data during that one pass.

It's usually also important that our algorithm be Stable, preferably
Wholly Stable.

Let's call such a sort Optimal External Sort (OES).  Just how much
faster would it be than current practice?

The short answer is the difference between how long it currently
takes to sort a file vs how long it would take to cat the contents
of the same file to a RAM buffer (_without_ displaying it). IOW, 
there's SIGNIFICANT room for improvement over current
standard practice in terms of sorting performance, particularly
external sorting performance.

Since sorting is a fundamental operation in many parts of a DBMS,
this is a Big Deal.
   
This discussion has gotten my creative juices flowing.  I'll post
some Straw Man algorithm sketches after I've given it some more
thought.

Ron

 -Original Message-
 From: Dann Corbit [EMAIL PROTECTED]
 Sent: Friday, September 23, 2005 2:21 PM
 Subject: Re: [HACKERS] [PERFORM] Releasing memory during ...
 
For the subfiles, load the top element of each subfile into a priority
queue.  Extract the min element and write it to disk.  If the next
value is the same, then the queue does not need to be adjusted.
If the next value in the subfile changes, then adjust it.
 
Then, when the lowest element in the priority queue changes, adjust
the queue.
 
Keep doing that until the queue is empty.
 
You can create all the subfiles in one pass over the data.
 
You can read all the subfiles, merge them, and write them out in a
second pass (no matter how many of them there are).
 
The Gotcha with Priority Queues is that their performance depends
entirely on implementation.  In naive implementations either Enqueue()
or Dequeue() takes O(n) time, which degrades sorting time to O(n^2).

The best implementations I know of need O(lg lg n) time for those
operations, allowing sorting to be done in O(n lg lg n) time.
Unfortunately, there's a lot of data manipulation going on in the 
process and two IO passes are required to sort any given file.
Priority Queues do not appear to be very IO friendly.

I know of no sorting performance benchmark contest winner based on
Priority Queues.


Replacement selection is not a good idea any more, since obvious
better ideas should take over.  Longer runs are of no value if you do not
have to do multiple merge passes.
 
Judging from the literature and the contest winners, Replacement
Selection is still a viable and important technique.  Besides Priority
Queues, what obvious better ideas have you heard of?


I have explained this general technique in the book C Unleashed,
chapter 13.
 
Sample code is available on the book's home page.

URL please?  



Re: [PERFORM] Advice on RAID card

2005-09-24 Thread Ron Peacetree
It looks like a rebranded low-end Adaptec 64MB PCI-X-to-SATA RAID card.
Looks like the 64MB buffer is not upgradable.
Looks like it's SATA, not SATA II

There are much better ways to spend your money.

These are the products with the current best price/performance ratio:
http://www.areca.us/products/html/pcix-sata.htm

Assuming you are not building 1U boxes, get one of the full height
cards and order it with the maximum size buffer you can afford.
The cards take 1 SODIMM, so that will be a max of 1GB or 2GB
depending on whether 2GB SODIMMs are available to you yet.

Ron

-Original Message-
From: PFC [EMAIL PROTECTED]
Sent: Sep 24, 2005 4:34 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Advice on RAID card


Hello fellow Postgresql'ers.

	I've stumbled on this RAID card which looks nice. It is a PCI-X SATA
RAID card with 6 channels, and does RAID 0, 1, 5, 10, 50.
It is an HP card with an Adaptec chip on it, and 64 MB of cache.

HP Part # : 372953-B21
Adaptec Part # : AAR-2610SA/64MB/HP

There's even a picture:
http://megbytes.free.fr/Sata/DSC05970.JPG

	I know it isn't as good as a full SCSI system. I just want to know if
some of you have had experience with these, and whether this card belongs to
the "slower than no RAID" camp, like some DELL cards we often see mentioned
here, or to the "decent performance for the price" camp. It is to run on
Linux.

Thanks in advance for your time and information.



Re: [PERFORM] Advice on RAID card

2005-09-24 Thread PFC



It looks like a rebranded low-end Adaptec 64MB PCI-X-to-SATA RAID card.
Looks like the 64MB buffer is not upgradable.
Looks like it's SATA, not SATA II


	Yeah, that's exactly what it is. I can get one for 150 Euro; the Areca is
at least 600. This is for a budget server, so while it would be nice to have
all the high-tech stuff, that's not the point. My question was rather: is it
one of the crap RAID5 cards which are actually SLOWER than plain IDE disks,
or is it decent, even though low-end (and cheap), and worth it compared to
software RAID5?



Assuming you are not building 1U boxes, get one of the full height
cards and order it with the maximum size buffer you can afford.
The cards take 1 SODIMM, so that will be a max of 1GB or 2GB
depending on whether 2GB SODIMMs are available to you yet.


	It's for a budget dev server which should have RAID5 for reliability, but  
not necessarily stellar performance (and price). I asked about this card  
because I can get one at a good price.


Thanks for taking the time to answer.



[PERFORM] Multiple insert performance trick or performance misunderstanding?

2005-09-24 Thread Ron Mayer

When I need to insert a few hundred or thousand things into
a table from a 3-tier application, it seems I'm much better
off creating a big string of semicolon-separated insert
statements rather than sending them one at a time - even
when I use the obvious things like wrapping the statements
in a transaction and using the library's prepared statements.



I tried both Ruby/DBI and C#/Npgsql; and in both cases
sets of inserts that took 3 seconds when run individually
took about 0.7 seconds when concatenated together.

Is it expected that I'd be better off sending big
concatenated strings like
  insert into tbl (c1,c2) values (v1,v2);insert into tbl (c1,c2) values 
(v3,v4);...
instead of sending them one at a time?





// wrap the whole batch in one transaction
db.ExecuteSQL("BEGIN");
sql = new System.Text.StringBuilder(1);
for ([a lot of data elements]) {
  // append each INSERT to one big semicolon-separated string
  sql.Append(
    "insert into user_point_features (col1,col2)" +
    " values (" + obj.val1 + "," + obj.val2 + ");"
  );
}
// send all the accumulated INSERTs to the server in a single call
db.ExecuteSQL(sql.ToString());
db.ExecuteSQL("COMMIT");
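
In SQL terms, the two variants being compared boil down to this (a sketch;
the table and values are just the illustrative ones from the snippet above):

  -- Variant 1: each statement sent to the server separately, inside one transaction
  BEGIN;
  INSERT INTO user_point_features (col1, col2) VALUES (1, 2);
  INSERT INTO user_point_features (col1, col2) VALUES (3, 4);
  COMMIT;

  -- Variant 2: the same statements concatenated and sent in a single call
  BEGIN; INSERT INTO user_point_features (col1, col2) VALUES (1, 2); INSERT INTO user_point_features (col1, col2) VALUES (3, 4); COMMIT;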



Re: [PERFORM] Multiple insert performance trick or performance misunderstanding?

2005-09-24 Thread Tom Lane
Ron Mayer [EMAIL PROTECTED] writes:
 Is it expected that I'd be better off sending big
 concatenated strings like
insert into tbl (c1,c2) values (v1,v2);insert into tbl (c1,c2) values 
 (v3,v4);...
 instead of sending them one at a time?

It's certainly possible, if the network round trip from client to server
is slow.  I do not think offhand that there is any material advantage
for the processing within the server (assuming you've wrapped the whole
thing into one transaction in both cases); if anything, the
concatenated-statement case is probably a bit worse inside the server
because it will transiently eat more memory.  But network latency or
client-side per-command overhead could well cause the results you see.

regards, tom lane
