Re: [PERFORM] Fastest char datatype

2009-07-20 Thread Peter Eisentraut
On Monday 20 July 2009 04:46:53 Robert James wrote:
 I'm storing a lot of words in a database.  What's the fastest format for
 finding them? I'm going to be doing a lot of WHERE w LIKE 'marsh%' and
 WHERE w IN ('m', 'ma').  All characters are lowercase a-z, no punctuation,
 no other alphabets.  By default I'm using varchar in utf-8 encoding, but
 was wondering if I could specify something else (perhaps 7bit ascii,
 perhaps lowercase only) that would speed things up even further.

If your data is only lowercase a-z, as you say, then the binary representation 
will be the same in all server-side encodings, because they are all supersets 
of ASCII.

These concerns will likely be dominated by the question of proper indexing and 
caching anyway.
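
On the indexing point: under a non-C locale a plain btree index cannot be used
for left-anchored LIKE at all, but an index declared with a pattern operator
class can. A sketch (table and index names are invented for illustration):

```sql
-- Hypothetical table of lowercase a-z words.
CREATE TABLE words (w varchar NOT NULL);

-- Supports WHERE w LIKE 'marsh%' even in a non-C locale:
CREATE INDEX words_w_like_idx ON words (w varchar_pattern_ops);

-- Supports equality and WHERE w IN ('m', 'ma') lookups:
CREATE INDEX words_w_idx ON words (w);
```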

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Віталій Тимчишин
2009/7/20 Robert James srobertja...@gmail.com


 Hi. I notice that when I do a WHERE x, Postgres uses an index, and when I
 do WHERE y, it does so as well, but when I do WHERE x OR y, it doesn't. Why
 is this so?


It's not clever enough.

And how can I shut this off?


Use UNION/UNION ALL if possible in your case.
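
For example, a condition of the form WHERE x OR y can often be rewritten so
that each branch can use its own index (a sketch; `t`, `x`, and `y` stand in
for your table and conditions):

```sql
-- Instead of:
--   SELECT * FROM t WHERE x OR y;
-- let each branch plan its own index scan:
SELECT * FROM t WHERE x
UNION
SELECT * FROM t WHERE y;
-- UNION ALL is cheaper when the two conditions cannot both be
-- true for the same row, since it skips duplicate elimination.
```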


Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Chris

Віталій Тимчишин wrote:



2009/7/20 Robert James srobertja...@gmail.com



Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so? 



It's not clever enough.


Of course it is.

I'm running 8.3.7.

create table t1(id int primary key);
insert into t1(id) select a from generate_series(1, 500000) as s(a);
analyze t1;

explain analyze select * from t1 where id=5000 or id=25937;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=8.60..16.44 rows=2 width=4) (actual time=0.077..0.083 rows=2 loops=1)
   Recheck Cond: ((id = 5000) OR (id = 25937))
   ->  BitmapOr  (cost=8.60..8.60 rows=2 width=0) (actual time=0.063..0.063 rows=0 loops=1)
         ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)
               Index Cond: (id = 5000)
         ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.30 rows=1 width=0) (actual time=0.021..0.021 rows=1 loops=1)
               Index Cond: (id = 25937)
 Total runtime: 0.153 ms
(8 rows)

What Robert didn't post was his query, see

http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php

which makes it a lot harder to 'optimize' since they aren't straightforward 
conditions.


--
Postgresql & php tutorials
http://www.designmagick.com/




[PERFORM] Trigger on column

2009-07-20 Thread ramasubramanian

Dear all,
   Can we create a trigger on a particular column of a table?

Regards,
Ram




Re: [PERFORM] Trigger on column

2009-07-20 Thread A. Kretschmer
In response to ramasubramanian :
 Dear all,
Can we create a trigger on particular column of a table?

No, but you can compare OLD.column and NEW.column and return from the
function if NEW.column = OLD.column.
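
A sketch of that workaround in PL/pgSQL (table, column, and trigger names are
invented for illustration):

```sql
CREATE OR REPLACE FUNCTION items_price_trigger() RETURNS trigger AS $$
BEGIN
    -- IS NOT DISTINCT FROM treats NULLs sanely, unlike a plain = test.
    IF NEW.price IS NOT DISTINCT FROM OLD.price THEN
        RETURN NEW;   -- column of interest unchanged: do nothing
    END IF;
    -- ... react to the change of "price" here ...
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER items_price_trig
    BEFORE UPDATE ON items
    FOR EACH ROW EXECUTE PROCEDURE items_price_trigger();
```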


Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: - Header)
GnuPG-ID:   0x3FFF606C, privat 0x7F4584DA   http://wwwkeys.de.pgp.net



[PERFORM] Performance of quer or procedure going down when we are taking the backup

2009-07-20 Thread ramasubramanian

   Dear all,
   The performance of a query or procedure goes down while we are taking a 
backup of that schema (which is expected), but how can we improve the 
performance during the backup?


Regards,
Ram. 






Re: [PERFORM] Calling conventions

2009-07-20 Thread Matthew Wakeling

On Fri, 17 Jul 2009, Kevin Grittner wrote:

I've seen the code in Java outperform the same code in optimized C,
because the just in time compiler can generate native code optimized
for the actual code paths being taken rather than a compile-time guess
at that, but a factor of 100?  Something else has to be going on here
beyond an interface layer.  Is this all in RAM with the Java code,
versus having disk access in PostgreSQL?


Yeah, it does seem a little excessive. The Java code runs all in RAM, 
versus Postgres running all from OS cache or Postgres shared buffer (bit 
hard to tell which of those two it is - there is no hard drive activity 
anyway). The Java code does no page locking, whereas Postgres does loads. 
The Java code is emulating just the index, whereas Postgres is fetching 
the whole row as well. However, I still struggle to accept the 100 times 
performance difference.


Matthew

--
What goes up must come down. Ask any system administrator.



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Oleg Bartunov

Krade,

On Sat, 18 Jul 2009, Krade wrote:


Here's a couple of queries:

archive=> explain analyze select * from a where comment_tsv @@ plainto_tsquery('love') order by timestamp desc limit 24 offset 0;

                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=453248.73..453248.79 rows=24 width=281) (actual time=188441.047..188441.148 rows=24 loops=1)
   ->  Sort  (cost=453248.73..453882.82 rows=253635 width=281) (actual time=188441.043..188441.079 rows=24 loops=1)
         Sort Key: timestamp
         Sort Method:  top-N heapsort  Memory: 42kB
         ->  Bitmap Heap Scan on a  (cost=17782.16..446166.02 rows=253635 width=281) (actual time=2198.930..187948.050 rows=256378 loops=1)
               Recheck Cond: (comment_tsv @@ plainto_tsquery('love'::text))
               ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..17718.75 rows=253635 width=0) (actual time=2113.664..2113.664 rows=259828 loops=1)
                     Index Cond: (comment_tsv @@ plainto_tsquery('love'::text))
 Total runtime: 188442.617 ms
(9 rows)

archive=> explain analyze select * from a where comment_tsv @@ plainto_tsquery('love') limit 24 offset 0;

                                               QUERY PLAN
---------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..66.34 rows=24 width=281) (actual time=14.632..53.647 rows=24 loops=1)
   ->  Seq Scan on a  (cost=0.00..701071.49 rows=253635 width=281) (actual time=14.629..53.588 rows=24 loops=1)
         Filter: (comment_tsv @@ plainto_tsquery('love'::text))
 Total runtime: 53.731 ms
(4 rows)

First one runs painfully slow.


Hmm, everything is already written in the explain output :) In the first query
253635 rows have to be read from disk and sorted, while in the second query
only 24 (random) rows are read from disk, so there is a difference of four
orders of magnitude, and in the worst case you should expect a time for the
1st query of about 53*10^4 ms.



Is there really no way to have efficient full text search results ordered by 
a separate field? I'm really open to all possibilities, at this point.


Thanks.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: o...@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Marcin Stępnicki
On Sun, Jul 19, 2009 at 12:07 AM, Krade kr...@krade.com wrote:
 archive=> explain analyze select * from a where comment_tsv @@
 plainto_tsquery('love') order by timestamp desc limit 24 offset 0;

What happens if you make it:


select * from (
    select * from a where comment_tsv @@ plainto_tsquery('love')
) xx
order by xx.timestamp desc
limit 24 offset 0;

?



Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Robert James
Query is:
select * from dict
where
 word in (select substr('moon', 0, generate_series(3, length('moon')))) -- this is my X above
 or word like 'moon%' -- this is my Y above

dict is indexed on word
2009/7/20 Chris dmag...@gmail.com

 2009/7/20 Robert James srobertja...@gmail.com


Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so?

 What Robert didn't post was his query, see

 http://archives.postgresql.org/pgsql-general/2009-07/msg00767.php

 which makes it a lot harder to 'optimize' since they aren't straight
 forward conditions.



Re: [PERFORM] Can Postgres use an INDEX over an OR?

2009-07-20 Thread Віталій Тимчишин
On 20 July 2009 at 11:02, Chris dmag...@gmail.com wrote:

 Віталій Тимчишин wrote:



 2009/7/20 Robert James srobertja...@gmail.com


Hi. I notice that when I do a WHERE x, Postgres uses an index, and
when I do WHERE y, it does so as well, but when I do WHERE x OR y,
it doesn't. Why is this so?

 It's not clever enough.


 Of course it is.


For simple cases



 I'm running 8.3.7.

 create table t1(id int primary key);
 insert into t1(id) select a from generate_series(1, 500000) as s(a);
 analyze t1;


explain analyze select * from t1 where
id < 10000

Index Scan using t1_pkey on t1  (cost=0.00..322.51 rows=9612 width=4) (actual time=0.030..3.700 rows=9999 loops=1)
  Index Cond: (id < 10000)
Total runtime: 4.835 ms

explain analyze select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))

Nested Loop  (cost=32.50..1341.49 rows=200 width=4) (actual time=15.353..67.014 rows=11 loops=1)
  ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.028..0.043 rows=11 loops=1)
        ->  Function Scan on generate_series  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.014..0.020 rows=11 loops=1)
  ->  Index Scan using t1_pkey on t1  (cost=0.00..6.52 rows=1 width=4) (actual time=6.083..6.084 rows=1 loops=11)
        Index Cond: (t1.id = (((random() * 500000::double precision))::integer))
Total runtime: 67.070 ms

explain analyze select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))
or
id < 10000

Seq Scan on t1  (cost=22.50..9735.50 rows=254806 width=4) (actual time=0.049..148.947 rows=10010 loops=1)
  Filter: ((hashed subplan) OR (id < 10000))
  SubPlan
    ->  Function Scan on generate_series  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.014..0.019 rows=11 loops=1)
Total runtime: 150.123 ms

explain analyze
select * from t1 where
id in (select (random() * 500000)::int4 from generate_series(0,10))
union
select * from t1 where
id < 10000

Unique  (cost=2412.68..2461.74 rows=9812 width=4) (actual time=89.190..95.014 rows=10010 loops=1)
  ->  Sort  (cost=2412.68..2437.21 rows=9812 width=4) (actual time=89.189..91.167 rows=10010 loops=1)
        Sort Key: public.t1.id
        Sort Method:  quicksort  Memory: 854kB
        ->  Append  (cost=32.50..1762.13 rows=9812 width=4) (actual time=16.641..76.338 rows=10010 loops=1)
              ->  Nested Loop  (cost=32.50..1341.49 rows=200 width=4) (actual time=16.641..70.051 rows=11 loops=1)
                    ->  HashAggregate  (cost=32.50..34.50 rows=200 width=4) (actual time=0.033..0.049 rows=11 loops=1)
                          ->  Function Scan on generate_series  (cost=0.00..20.00 rows=1000 width=0) (actual time=0.020..0.026 rows=11 loops=1)
                    ->  Index Scan using t1_pkey on t1  (cost=0.00..6.52 rows=1 width=4) (actual time=6.359..6.361 rows=1 loops=11)
                          Index Cond: (public.t1.id = (((random() * 500000::double precision))::integer))
              ->  Index Scan using t1_pkey on t1  (cost=0.00..322.51 rows=9612 width=4) (actual time=0.023..4.075 rows=9999 loops=1)
                    Index Cond: (id < 10000)
Total runtime: 112.694 ms

So, if it runs into anything complex, it sadly falls back to a sequential scan.


[PERFORM] Used computers?

2009-07-20 Thread Craig James

Apologies for a slightly off-topic question ... a friend is overseeing the demise 
of a company and has several computers that they need to get rid of.  She's an 
attorney and knows little about them except that they're IBM and cost $50K 
originally.  Where does one go to sell equipment like this, and/or get a rough 
idea of its worth?

Thanks,
Craig



Re: [PERFORM] Used computers?

2009-07-20 Thread Scott Marlowe
On Mon, Jul 20, 2009 at 8:29 AM, Craig James craig_ja...@emolecules.com wrote:
 Apologies for a slightly off-topic question ... a friend is overseeing the
 demise of a company and has several computers that they need to get rid of.
  She's an attorney and knows little about them except that they're IBM and
 cost $50K originally.  Where does one go to sell equipment like this,
 and/or get a rough idea of its worth?

I generally use eBay to get an idea, especially by searching completed
sales. You can sell them there, or on Craigslist.



Re: [PERFORM] Used computers?

2009-07-20 Thread Glyn Astill

Here in the UK, we have Waste electrical and electronic equipment (WEEE) 
companies that'll safely destroy or sell them on for a cut of the profits.

--- On Mon, 20/7/09, Craig James craig_ja...@emolecules.com wrote:

 From: Craig James craig_ja...@emolecules.com
 Subject: [PERFORM] Used computers?
 To: pgsql-performance@postgresql.org
 Date: Monday, 20 July, 2009, 3:29 PM
 Apologies for a slightly off-topic
 question ... a friend is overseeing the demise of a company
 and has several computers that they need to get rid
 of.  She's an attorney and knows little about them
 except that they're IBM and cost $50K originally. 
 Where does one go to sell equipment like this, and/or get a
 rough idea of its worth?
 
 Thanks,
 Craig
 
 






[PERFORM] XMLPARSE() evaluated multiple times?

2009-07-20 Thread Tim Landscheidt
Hi,

I have been playing around with PostgreSQL's XML support
lately (cf.
URI:news:m3ljmocolf@passepartout.tim-landscheidt.de)
and stumbled upon some performance issues related to
XMLPARSE(). In my application, the XML document is supplied
as a string constant via a DBI ? parameter; for testing
purposes I have put it into a separate table:

| tim=# \timing
| Timing is on.
| tim=# SELECT LENGTH(XMLasText) FROM tmpTestData;
|  length
| --------
|  364446
| (1 row)

| Time: 6.295 ms
| tim=# SELECT SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM tmpTestData;
|  substring
| -----------
|  <
| (1 row)

| Time: 40.072 ms
| tim=#

(The SUBSTRING()s above and following are for reasons of
brevity only; the results are comparable when the raw XML is
queried.)

| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 10) AS G(A), tmpTestData;
|  a  | substring
| ----+-----------
|   1 | <
| [...]
|  10 | <
| (10 rows)

| Time: 416.069 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 100) AS G(A), tmpTestData;
|   a  | substring
| -----+-----------
|    1 | <
| [...]
|  100 | <
| (100 rows)

| Time: 3029.196 ms
| tim=# SELECT G.A, SUBSTRING(XMLPARSE(DOCUMENT XMLasText)::TEXT FROM 1 FOR 1) FROM generate_series(1, 1000) AS G(A), tmpTestData;
|    a  | substring
| ------+-----------
|     1 | <
| [...]
|  1000 | <
| (1000 rows)

| Time: 30740.626 ms
| tim=#

It seems that XMLPARSE() is called for every row without
PostgreSQL realizing that it is IMMUTABLE. This even seems
to be the case if the XMLPARSE() is part of a WHERE clause:

| tim=# SELECT G.A FROM generate_series(1, 10) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
|  a
| ---
| (0 rows)

| Time: 240.626 ms
| tim=# SELECT G.A FROM generate_series(1, 100) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
|  a
| ---
| (0 rows)

| Time: 2441.135 ms
| tim=# SELECT G.A FROM generate_series(1, 1000) AS G(A) WHERE G.A::TEXT = XMLPARSE(DOCUMENT (SELECT XMLasText FROM tmpTestData))::TEXT;
|  a
| ---
| (0 rows)

| Time: 25228.180 ms
| tim=#

Obviously, the problem can be circumvented by caching
the results of the XMLPARSE() in a temporary table (or even
an IMMUTABLE function?), but I would assume that this should
be PostgreSQL's task.

  Any thoughts why this is not the case already? :-)

Tim




Re: [PERFORM] XMLPARSE() evaluated multiple times?

2009-07-20 Thread Tom Lane
Tim Landscheidt t...@tim-landscheidt.de writes:
 It seems that XMLPARSE() is called for every row without
 PostgreSQL realizing that it is IMMUTABLE.

Indeed, the system doesn't consider it immutable.  None of the examples
you show would benefit if it did, though.

I believe there are GUC-parameter dependencies that prevent us from
treating it as truly immutable, but if you want to ignore that
consideration and force constant-folding anyway, you could wrap it
in a SQL function that's marked as IMMUTABLE.
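
Such a wrapper might look like this (an untested sketch; by declaring it
IMMUTABLE you are asserting, against the system's judgment, that settings such
as GUC parameters will never change the result):

```sql
CREATE OR REPLACE FUNCTION xmlparse_immutable(t text) RETURNS xml AS $$
    SELECT XMLPARSE(DOCUMENT $1);
$$ LANGUAGE sql IMMUTABLE STRICT;

-- With a constant argument the planner can now fold the parse to a
-- single call at plan time instead of re-evaluating it per row, e.g.:
-- SELECT g.a FROM generate_series(1, 1000) AS g(a)
--  WHERE g.a::text = xmlparse_immutable('<doc>...</doc>')::text;
```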

regards, tom lane



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Krade

Hello, thanks for your replies.


On 7/20/2009 13:12, Oleg Bartunov wrote:

Hmm, everything is already written in the explain output :) In the first
query 253635 rows have to be read from disk and sorted, while in the second
query only 24 (random) rows are read from disk, so there is a difference of
four orders of magnitude, and in the worst case you should expect a time
for the 1st query of about 53*10^4 ms.
Yes, I do realize the first query is retrieving all the rows that match 
the full text search and sorting them; that's what I wanted to avoid. :) 
Since I only want 24 results at a time, I wanted to avoid having to get 
all the rows and sort them. I was wondering if there was any way to use, 
say, some index combination I'm not aware of, cluster the table 
according to an index, or use a different query to get the same results.


Well, to take advantage of the gin index on (timestamp, comment_tsv), I 
suppose I could do something like this:
archive=> explain analyze select * from a where comment_tsv @@ 
plainto_tsquery('love') and timestamp > cast(floor(extract(epoch from 
CURRENT_TIMESTAMP) - 864000) as integer) order by timestamp limit 24 
offset 0;


                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Limit  (cost=17326.69..17326.75 rows=24 width=281) (actual time=3249.192..3249.287 rows=24 loops=1)
   ->  Sort  (cost=17326.69..17337.93 rows=4499 width=281) (actual time=3249.188..3249.221 rows=24 loops=1)
         Sort Key: timestamp
         Sort Method:  top-N heapsort  Memory: 39kB
         ->  Bitmap Heap Scan on a  (cost=408.80..17201.05 rows=4499 width=281) (actual time=3223.890..3240.484 rows=5525 loops=1)
               Recheck Cond: ((timestamp > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text)))
               ->  Bitmap Index Scan on timestamp_comment_gin  (cost=0.00..407.67 rows=4499 width=0) (actual time=3222.769..3222.769 rows=11242 loops=1)
                     Index Cond: ((timestamp > (floor((date_part('epoch'::text, now()) - 864000::double precision)))::integer) AND (comment_tsv @@ plainto_tsquery('love'::text)))
 Total runtime: 3249.957 ms
(9 rows)

Which only looks at the last 10 days and is considerably faster. Not 
perfect, but a lot better. But this requires a lot of application logic, 
for example, if I didn't get 24 results in the first query, I'd have to 
reissue the query with a larger time interval and it gets worse pretty 
fast. It strikes me as a really dumb thing to do.


I'm really hitting a brick wall here, I can't seem to be able to provide 
reasonably efficient full text search that is ordered by date rather 
than random results from the database.


On 7/20/2009 13:22, Marcin Stępnicki wrote:

What happens if you make it:


select * from (
    select * from a where comment_tsv @@ plainto_tsquery('love')
) xx
order by xx.timestamp desc
limit 24 offset 0;

?

Same query plan, I'm afraid.



Re: [PERFORM] Calling conventions

2009-07-20 Thread Kevin Grittner
Matthew Wakeling matt...@flymine.org wrote:
 
 On Fri, 17 Jul 2009, Kevin Grittner wrote:
 
 but a factor of 100?
 
 The Java code runs all in RAM, versus Postgres running all from OS
 cache or Postgres shared buffer (bit  hard to tell which of those
 two it is - there is no hard drive activity anyway). The Java code
 does no page locking, whereas Postgres does loads.  The Java code is
 emulating just the index, whereas Postgres is fetching the whole row
 as well.
 
Oh, well, if you load all the data into Java's heap and are accessing
it through HashMap or similar, I guess a factor of 100 is about right.
I see the big difference as the fact that the Java implementation is
dealing with everything already set up in RAM, versus needing to deal
with a disk image format, even if it is cached.  Try serializing
those Java objects to disk and storing the file name in the HashMap,
retrieving and de-serializing the object for each reference.  Even if
it's all cached, I expect you'd be running about 100 times slower.
 
The Java heap isn't a very safe place to persist long-term data,
however.
 
-Kevin



Re: [PERFORM] Fastest char datatype

2009-07-20 Thread Merlin Moncure
On Sun, Jul 19, 2009 at 9:46 PM, Robert James srobertja...@gmail.com wrote:
 I'm storing a lot of words in a database.  What's the fastest format for
 finding them? I'm going to be doing a lot of WHERE w LIKE 'marsh%' and WHERE
 w IN ('m', 'ma').  All characters are lowercase a-z, no punctuation, no
 other alphabets.  By default I'm using varchar in utf-8 encoding, but was
 wondering if I could specify something else (perhaps 7bit ascii, perhaps
 lowercase only) that would speed things up even further.

All the character types are basically the same except for char(n), which
pads out the string on disk.  Reading downthread, [a-z] needs more
than 4 bits (4 bits can only represent 16 characters).   5 bits is a
very awkward number in computing, which may explain why this type of
encoding is rarely done.  Coming from the 'cobol' world, where there
were all kinds of zany bit compressed encodings, I can tell you that
the trend is definitely in the other direction...standard data layouts
coupled with well known algorithms.

Any type of simple bitwise encoding that would get you any space
benefit would mean converting your text fields to bytea.  This would
mean that any place you needed to deal with your text field as text
would require running your data through a decoder function...you would
encode going into the field and decode going out...ugh.

Better would be to use a functional index:
create index foo_idx on foo(compactify(myfield));

If you don't need index ordering, then you could swap a hash function
for compactify and have it return type 'int'.  This should give the
best possible performance (probably better than the built in hash
index).  You would probably only see a useful benefit if your average
string length was well over 10 characters though.  In the end though,
I bet you're best off using a vanilla text field/index unless you
expect your table to get really huge.  PostgreSQL's btree
implementation is really quite amazing.
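
The hash-based variant might be sketched like this (illustrative only:
hashtext() is an undocumented internal function whose results can change
across versions, so a custom hash function would be safer in practice):

```sql
-- Functional index over a 32-bit hash of the word: supports only
-- equality lookups, not ordering or LIKE prefix scans.
CREATE INDEX foo_hash_idx ON foo (hashtext(myfield));

-- Repeat the indexed expression in the query, and re-check the
-- actual value to guard against hash collisions:
SELECT * FROM foo
 WHERE hashtext(myfield) = hashtext('marsh')
   AND myfield = 'marsh';
```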

merlin



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Kevin Grittner
Krade kr...@krade.com wrote:
 SELECT * FROM a WHERE comment_tsv @@ plainto_tsquery('love')
 ORDER BY timestamp DESC LIMIT 24 OFFSET 0;
 
Have you considered keeping rows narrow until you've identified your
24 rows?  Something like:
 
SELECT * FROM a 
  WHERE id in
(
  SELECT id FROM a
WHERE comment_tsv @@ plainto_tsquery('love')
ORDER BY timestamp DESC
LIMIT 24 OFFSET 0
)
  ORDER BY timestamp DESC
;
 
-Kevin



Re: [PERFORM] Calling conventions

2009-07-20 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Oh, well, if you load all the data into Java's heap and are accessing
 it through HashMap or similar, I guess a factor of 100 is about right.
 I see the big difference as the fact that the Java implementation is
 dealing with everything already set up in RAM, versus needing to deal
 with a disk image format, even if it is cached.

Eliminating interprocess communication overhead might have something
to do with it, too ...

regards, tom lane



Re: [PERFORM] Used computers?

2009-07-20 Thread Andy Colson

Craig James wrote:
Apologies for a slightly off-topic question ... a friend is overseeing 
the demise of a company and has several computers that they need to get 
rid of.  She's an attorney and knows little about them except that 
they're IBM and cost $50K originally.  Where does one go to sell 
equipment like this, and/or get a rough idea of its worth?


Thanks,
Craig



When I was looking for Sun boxes I found several places that buy/sell used 
hardware, some took IBM too.

I googled buy sell used ibm and came up with several, like:

http://www.usedserversystems.com/used-ibm-servers.htm

You could check with them and see what they are selling for.  (And maybe what 
they'd buy for)

Also, there is always ebay.

-Andy



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Krade

On 7/21/2009 2:13, Devin Ben-Hur wrote:
Have you tried making the full-text condition a subselect with 
offset 0 to stop the planner from reordering it?


eg:

select *
from (
  select * from a where comment_tsv @@ plainto_tsquery('love')
  offset 0
) xx
order by timestamp DESC
limit 24
offset 0;


See http://blog.endpoint.com/2009/04/offset-0-ftw.html
Yes, that does force the planner to always pick the full text index 
first rather than the timestamp index. I managed to force that by doing 
something a lot more drastic: I just dropped my timestamp index 
altogether, since I never used it for anything else. (I mentioned this 
in my original post.)


Though, that comment did make me try re-adding it. I was pretty 
surprised: the planner was only doing backward searches on the timestamp 
index for very common words (thereby turning multi-minute queries into 
very fast ones), as opposed to trying to use the timestamp index for all 
queries. I wonder if this is related to tweaks to the planner in 8.4 or 
if it was just my statistics that got balanced out.


I'm not entirely happy, because I still easily get minute long queries 
on common words, but where the planner choses to not use the timestamp 
index. The planner can't guess right all the time.


But I think I might just do:
select * from a where comment_tsv @@ plainto_tsquery('query') and 
timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) 
as integer) order by timestamp desc limit 24 offset 0;


And if I get less than 24 rows, issue the regular query:

select * from a where comment_tsv @@ plainto_tsquery('query') order by 
timestamp desc limit 24 offset 0;


I pay the price of doing two queries when I could have done just one, 
and it does make almost all queries about 200 ms slower, but it does so 
at the expense of turning the few very slow queries into quick ones.


Thanks for all the help.



Re: [PERFORM] Full text search with ORDER BY performance issue

2009-07-20 Thread Scott Marlowe
On Mon, Jul 20, 2009 at 9:35 PM, Krade kr...@krade.com wrote:

 But I think I might just do:
 select * from a where comment_tsv @@ plainto_tsquery('query') and timestamp
 > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as integer)
 order by timestamp desc limit 24 offset 0;

 And if I get less than 24 rows, issue the regular query:

 select * from a where comment_tsv @@ plainto_tsquery('query') order by
 timestamp desc limit 24 offset 0;

Couldn't you do the second query as a WITH query, then run another
query to limit that result to everything greater than now() - x days?
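
A sketch of that idea in 8.4 syntax (a CTE is materialized once, so the
full-text result is computed a single time; table and column names are taken
from the thread, though note the CTE still scans every matching row, so it
does not by itself avoid the cost being fought here):

```sql
WITH matches AS (
    SELECT * FROM a
     WHERE comment_tsv @@ plainto_tsquery('query')
)
SELECT * FROM matches
 WHERE timestamp > cast(floor(extract(epoch from CURRENT_TIMESTAMP) - 864000) as integer)
 ORDER BY timestamp DESC
 LIMIT 24 OFFSET 0;
```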
