[PERFORM] Hi!

2004-06-20 Thread Janio Rosa da Silva



Hi,

I am trying to install postgresql-7.4.3 (a simple
installation). I ran the ./configure command in the PostgreSQL source
directory. During the configure process I received the following message:

checking for tar... /bin/tar
checking for strip... strip
checking whether it is possible to strip libraries... yes
checking for bison... bison -y
*** The installed version of Bison is too old.  PostgreSQL needs
*** Bison version 1.875 or later.
checking for perl... /usr/bin/perl
checking for main in -lbsd... no
checking for setproctitle in -lutil... no
checking for main in -lm... yes

But after this message the configure process continues. The problem is
that it never finishes; I think the configure process is stuck in a loop.
Does this have anything to do with my hardware configuration? The computer
where I did this is an AMD K6-II 200 MHz with 64 MB of memory. I would
like to know why the configure process never finishes.

Regards,

Janio


[PERFORM] Indexing question

2004-06-20 Thread Stan Bielski
Hello all,

I have a rather large table (~20 GB) of network logs taken over the period
of a month and stored under postgres 7.3. I'm trying to create an indexing
scheme so that I can do a query with a where time > foo and time < bar
and get the results without too much waiting. With this in mind, I created
a new database ordering each log entry by the time values (seconds, and
microseconds), and then created an index on the time value for seconds. I
didn't use the clusterdb command, but I did do a select * into foo_tbl
order by time, which I understand to be the functional equivalent. The
result looks something like this: 

 Table public.allflow_tv_mydoom
   Column   |  Type   | Modifiers
------------+---------+-----------
 tv_s   | bigint  |
 tv_us  | bigint  |
 src| inet|
 dst| inet|
 sport  | integer |
 dport  | integer |
 bytes_in   | bigint  |
 bytes_out  | bigint  |
 isn_in | bigint  |
 isn_out| bigint  |
 num_starts | integer |
 result | integer |
 flags  | integer |
 age_s  | bigint  |
 age_us | bigint  |
 ttl_left   | bigint  |
 end_s  | bigint  |
 end_us | bigint  |
Indexes: allflow_tv_mydoom_x btree (tv_s) 

with tv_s, of course, being my value in seconds. I followed this up with
an ANALYZE so postgres could absorb this structure into its logic.
However, whenever I want to do my query, it *insists* on doing a
sequential scan, even when I explicitly turn sequential scan off in
postgres.conf...  here's an example: 

standb=# explain select * from allflow_tv_mydoom where tv_s > 1074200099
and tv_s < 107506499;
QUERY PLAN
---
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102303307.94 rows=1
width=132)
   Filter: ((tv_s > 1074200099) AND (tv_s < 107506499))
(2 rows)  

In this query I'm basically asking for a day's worth of data. It should be
straightforward: it's all laid out in order and indexed, but it still
insists on (what I believe to be) tearing through the entire DB and
filtering out the other time values as it sees them. Even if I want just
the data from a particular second, it does the same thing:

standb=# explain select * from allflow_tv_mydoom where tv_s = 1074200099;
 QUERY PLAN
-
 Seq Scan on allflow_tv_mydoom  (cost=1.00..102140682.45 rows=145
width=132)
   Filter: (tv_s = 1074200099)
(2 rows) 

Naturally, this is incredibly frustrating because it takes forever,
regardless of the query.

The funny thing though is that I have the same data in another table
where it is ordered and indexed by IP address, and the queries use the
index and work the way I want them to. Example:

standb=# explain select * from flow_ip_mydoom where src = '10.0.5.5';
 QUERY PLAN
-
 Index Scan using flow_ip_mydoom_x on flow_ip_mydoom  (cost=0.00..333.41
rows=376 width=132)
   Index Cond: (src = '10.0.5.5'::inet)
(2 rows)

Can someone please explain to me what's going on and how to fix it? If
there's an easier way to 'jump' to different time regions in the table
without indexing all the different seconds values, I'd like to know this
as well. 

Thanks a bunch,
-S





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

2004-06-20 Thread Aaron

The words for the keyword can be made up of a sentence, i.e. 10 or more
keywords to one entry.
Also, in case I didn't answer before, we are using TSearch2 and all tables
have been fully analyzed and indexed.
Any other suggestions?
How long do searches take when 10,000 rows are returned?
We cannot use a LIMIT of 100 because we need to analyze the entire data
set returned.
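(For reference, the kind of TSearch2 lookup being discussed here looks
roughly like the sketch below. The table name "items" and tsvector column
"title_fti" are hypothetical, not from this thread; the tsvector column is
assumed to be maintained by the usual tsearch2 trigger.)

SELECT id, title
FROM items
WHERE title_fti @@ to_tsquery('default', 'keyword1 & keyword2');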
Thanks,

  - Original Message -
  From: Jeremy Dunn
  To: 'borajetta'
  Cc: Postgresql Performance
  Sent: Tuesday, June 15, 2004 5:43 AM
  Subject: RE: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED
  
  One option that does not take advantage of any fancy indexing methods is
  to create a trigger on the table, on insert/update/delete, which extracts
  each individual word from the field you care about, and creates an entry
  in another 'keyword' table, id = 'word', value = pk of your original
  table. Then index the keyword table on the 'keyword' field, and do your
  searches from there. This should improve performance substantially, even
  on very large return sets, because the keyword table rows are very small
  and thus a lot of them fit in a disk block.

  - Jeremy
  

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of borajetta
Sent: Monday, June 07, 2004 5:47 PM
To: [EMAIL PROTECTED]
Subject: [PERFORM] 50 000 000 Table entries and I have to do a keyword search HELP NEEDED

So I have a table with about 50 million entries in it, and I have to do a
keyword search.

The keyword search is done on the title of the entry. For example, an
entry could be "This is a title string which could be searched".

I have tried a few ways to search but I get horrible search times. Some
keywords will come up with as many as 0.25 million matches, but most are
around 1000-5000.

I use an index which narrows the table down to about 1.5-2 million
entries.

I used 2 tables which had a 1:1 correspondence. One held a GiST index on
an int field which was searched for the keyword. Then I would join that
table to the other to retrieve the rest of the information about the items
it matched.

This was slow even for returning 100 entries: about 10 seconds, sometimes
5. But when I start getting 1xxx entries it's about 30-50 seconds. The
rest is just horrible.

How should I set up my indexes and/or tables?

We were thinking of putting the index inside one table so the join would
not have to be done, but this still returns rather slow results.

I have not fully tested this method, but it looks like when run for just
the keyword search on the title, with no joining, it can return in about
10 seconds or less.

This is a great improvement, but I am currently going to make the table
all-in-one and see how long it will take. I believe it will not be much
more, as there will be no join needed, only the returning of some
attribute fields.

This is still not the kind of time I would like to see; I wanted
something around 2 seconds or less.

I know there is a lot of information, especially if 0.25 million rows are
to be returned, but if there are only 1xxx-9xxx rows to be returned, I
believe 2 seconds seems about right.

How do search engines do it?
Any suggestions are welcome,

Thanks
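(Jeremy's trigger-maintained keyword table, from the quoted message above,
can be sketched as follows. All table and column names here are
hypothetical, and the trigger body that actually splits the title into
words is omitted; this is an illustration of the layout, not a tested
implementation.)

-- One small row per (word, item) pair, extracted from the title
-- by an insert/update/delete trigger on the original table.
CREATE TABLE keyword (
    word    text,      -- one word from the title
    item_id integer    -- pk of the original row
);
CREATE INDEX keyword_word_x ON keyword (word);

-- Search: find items whose title contains a given word.
SELECT i.*
FROM items i
JOIN keyword k ON k.item_id = i.id
WHERE k.word = 'searched';

Because each keyword row is tiny, many of them fit in a disk block, which
is where the suggested speedup comes from.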


Re: [PERFORM] Indexing question

2004-06-20 Thread Tom Lane
Stan Bielski <[EMAIL PROTECTED]> writes:
  Table public.allflow_tv_mydoom
Column   |  Type   | Modifiers
 ------------+---------+-----------
  tv_s   | bigint  |
           ^^^^^^
 Indexes: allflow_tv_mydoom_x btree (tv_s) 

 standb=# explain select * from allflow_tv_mydoom where tv_s > 1074200099
 and tv_s < 107506499;
 [ gives seqscan ]

This is a FAQ :-(.  Unadorned integer constants are taken to be int4
not int8 (unless they are too large for int4), and cross-data-type
comparisons are not indexable in existing releases.  So you have to
explicitly cast the comparison values to int8:

explain select * from allflow_tv_mydoom where tv_s > 1074200099::bigint
and tv_s < 107506499::bigint;

(or use the standard CAST syntax if you prefer).

7.5 will have a fix for this ancient annoyance.

BTW, is there a reason to be using tv_s+tv_us and not just a single
timestamptz column?
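(For what it's worth, a single timestamptz can be derived from the two
epoch columns with the standard 'epoch' idiom; column names are the ones
from the table above, and this is a sketch rather than something tested
against 7.3:)

SELECT TIMESTAMP WITH TIME ZONE 'epoch'
       + (tv_s + tv_us / 1000000.0) * INTERVAL '1 second' AS tv
FROM allflow_tv_mydoom;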

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend