[GENERAL] delaying autovacuum freeze via storage params?

2015-05-26 Thread Steve Kehlet
Hello, I'd like to postpone an autovacuum: VACUUM public.mytable (to
prevent wraparound) and handle it manually at another time. I thought I
could set these storage parameters on the large table in question
(mytable) like this:

ALTER TABLE mytable SET (
  autovacuum_freeze_min_age=1000,
  autovacuum_freeze_table_age=8,
  autovacuum_freeze_max_age=10
);

then kill the running autovacuum process, but unfortunately it just keeps
coming back:

 mydb   | 22734 | 05/26/2015 23:29:10 | autovacuum: VACUUM public.mytable
(to prevent wraparound)

I also tried setting the toast.autovacuum* storage parameters, as well as a
postgres reload, but neither made any difference.

Here is the table's age:

mydb=> select age(relfrozenxid) from pg_class where oid =
'mytable'::regclass;
age
---
 203065951
(1 row)

So 203 million, which is just past the 200 million default (which is why I
started looking at this). But it's less than the 1 billion I set the
storage param to.

Here is \d+:

mydb=> \d+ mytable
  Table public.mytable
 Column  |Type |   Modifiers
| Storage  | Stats target | Description
-+-+---+--+--+-
 (removed all this)
Options: autovacuum_freeze_min_age=1000,
autovacuum_freeze_table_age=8,
autovacuum_freeze_max_age=10, toast.autovacuum_enabled=false,
toast.autovacuum_freeze_min_age=1000,
toast.autovacuum_freeze_table_age=8,
toast.autovacuum_freeze_max_age=10


I'm guessing I'm missing something here; can anyone offer suggestions or
ideas on what I should try next?

Here is my version:

mydb=> SELECT version();
version

---
 PostgreSQL 9.3.4 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-44), 64-bit
(1 row)

and I put my [custom settings in a gist](
https://gist.github.com/skehlet/bcdbc1ce1f212d3b7bb2).

Thanks!


Re: [GENERAL] delaying autovacuum freeze via storage params?

2015-05-26 Thread Alvaro Herrera
Steve Kehlet wrote:
 Hello, I'd like to postpone an autovacuum: VACUUM public.mytable (to
 prevent wraparound) and handle it manually at another time. I thought I
 could set these storage parameters on the large table in question
 (mytable) like this:
 
 ALTER TABLE mytable SET (
   autovacuum_freeze_min_age=1000,
   autovacuum_freeze_table_age=8,
   autovacuum_freeze_max_age=10
 );

See the docs about the freeze max age storage parameter -- the per-table
setting can decrease the global setting but not increase it.  You can
increase the global setting (postgresql.conf) to 2 billion AFAIR which
should give you plenty of room.  Needs a server restart though.
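
Roughly, that would look like this (the value is just an example, pick
your own; the RESET clears the per-table overrides that cannot take
effect anyway):

  -- Per-table autovacuum_freeze_max_age can only lower the limit, so the
  -- global one must be raised instead.  In postgresql.conf (restart needed):
  --   autovacuum_freeze_max_age = 1000000000
  -- Then clear the ineffective per-table settings:
  ALTER TABLE mytable RESET (
    autovacuum_freeze_min_age,
    autovacuum_freeze_table_age,
    autovacuum_freeze_max_age
  );
  -- and check the remaining headroom before the manual VACUUM FREEZE:
  SELECT age(relfrozenxid) FROM pg_class WHERE oid = 'mytable'::regclass;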

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Reg: BULK COLLECT

2015-05-26 Thread Medhavi Mahansaria
Dear Andy,

We are using BULK COLLECT to enhance performance, as the data is huge.

But, as you said, it is ideally just: insert into b select * from a;

So now I am using the looping through query result option as Adrian 
suggested.

http://www.postgresql.org/docs/9.4/interactive/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING
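
Something along these lines (a sketch only, with the tables a, b and
tracker from my example below; PL/pgSQL batches the fetching internally,
so there is no LIMIT clause to manage):

  -- PL/pgSQL has no BULK COLLECT; a FOR loop over the query replaces
  -- the OPEN/FETCH/LIMIT dance from the Oracle version.
  CREATE OR REPLACE FUNCTION copy_a_to_b() RETURNS void AS $body$
  DECLARE
      rec a%ROWTYPE;
  BEGIN
      FOR rec IN SELECT * FROM a LOOP
          INSERT INTO b SELECT (rec).*;
      END LOOP;
  EXCEPTION
      WHEN OTHERS THEN
          INSERT INTO tracker VALUES (SQLSTATE, SQLERRM, LOCALTIMESTAMP);
          RAISE NOTICE '%: %', SQLSTATE, SQLERRM;
  END;
  $body$ LANGUAGE plpgsql;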

Thank You Adrian.


Thanks & Regards
Medhavi Mahansaria
Tata Consultancy Services Limited
Unit-VI, No.78, 79 & 83,
L-Centre, EPIP Industrial Estate,
Whitefield
Bangalore - 560066,Karnataka
India
Ph:- +91 80 67253769
Cell:- +91 9620053040
Mailto: medhavi.mahansa...@tcs.com
Website: http://www.tcs.com

Experience certainty.   IT Services
Business Solutions
Consulting




From:   Andy Colson a...@squeakycode.net
To: Medhavi Mahansaria medhavi.mahansa...@tcs.com, 
pgsql-general@postgresql.org
Date:   05/25/2015 06:15 PM
Subject:Re: [GENERAL] Reg: BULK COLLECT



On 05/25/2015 07:24 AM, Medhavi Mahansaria wrote:
 Hello,

 I am porting my application from Oracle to PostgreSQL.

 We are using the BULK COLLECT functionality of Oracle.
 How can I change the 'BULK COLLECT' fetch of the data from the cursor to
make it compatible with PL/pgSQL?

 A small example is below (this is just an example; the real query is
much more complex and returns a huge amount of data)


 CREATE OR REPLACE FUNCTION abc() RETURNS VOID AS $body$

 DECLARE
 l_data b%ROWTYPE;

 POPULATE_STATS CURSOR IS
 (
 SELECT * from a
 )
 ; -- query returning a huge amount of data

 BEGIN
   OPEN POPULATE_STATS;
   LOOP
   FETCH POPULATE_STATS BULK COLLECT INTO l_data LIMIT 1000;
     IF POPULATE_STATS%ROWCOUNT > 0
     THEN
     FORALL i IN 1..l_data.COUNT
       INSERT INTO b VALUES l_data(i);
     END IF;
   IF NOT FOUND THEN EXIT; END IF;
   END LOOP;
   CLOSE POPULATE_STATS;
 EXCEPTION
   WHEN OTHERS THEN
   CODE := SQLSTATE;
   MSG := SQLERRM;
   INSERT INTO tracker VALUES (CODE,MSG,LOCALTIMESTAMP);
   RAISE NOTICE 'SQLERRM';
   RAISE NOTICE '%', SQLSTATE;
   RAISE NOTICE '%', MSG;
 END;
 $body$
 LANGUAGE PLPGSQL;

 How can I change the 'BULK COLLECT' fetch of the data from the cursor to
make it compatible with PL/pgSQL?


 Thanks & Regards
 Medhavi Mahansaria
 Mailto: medhavi.mahansa...@tcs.com


That seems like a pretty overcomplicated version of:

insert into b select * from a;

Which is all you'll need in PG.  If it does something else, then I failed
to understand the stored proc.

-Andy


=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you




Re: [GENERAL] FW: Constraint exclusion in partitions

2015-05-26 Thread Francisco Olarte
Hi Daniel:

On Mon, May 25, 2015 at 10:03 PM, Daniel Begin jfd...@hotmail.com wrote:
...
 Even after doing all this, I did not find any improvement in execution times
 between my original fat table and the partitioned version (sometimes even
 worse). If partitioning the table had significantly improved query running
 times, I could have partitioned the tables differently to accommodate other
 query types I will have to run later in my research (I have the same problem
 for half a dozen tables).
 Since it does not seem that partitioning will do the job, I will get back to
 the original table to run my queries...

Well, at least you've learnt some things about it, and you can expect
to repeat the measurements faster should you need to.

 However, just in case someone knows a magical trick that can significantly
 improve the speed of my queries (but hasn't told me yet!-), here are
 the details about the table/indexes concerned.

 Table size: 369GB
 Indexes size: 425GB
 I am running all this on my personal PC: Windows 64-bit, i7 chip, 16GB RAM.

Supposing you can dedicate about 12 GB to shared buffers / caches,
your caches are going to get trashed often under a real workload, which
is why we recommended repeating the queries.

Anyway, one last remark. Partitioning is not magic; how much it helps
depends on the access patterns. For workloads like mine it helps a lot:
call records, where I insert frequently ( so position correlates
strongly with the indexes ), nearly never update ( and I work with high
fill factors ), and query frequently for unindexed conditions plus
partition-related ranges. There partitions work great: a big table
forces index scans ( fast due to the correlation, but index scans
anyway, plus filters on the extra conditions ), while partitions
normally get sequential partition scans plus filters, and sequential
scans are way faster. Also, the usual queries target the most recently
inserted data, so partitions help keep it cached.

For queries like the ones you've timed/shown ( partitioning on an
integer PK of unknown origin, queries for single values ( IN queries are
normally just several repeated single lookups ) or small ranges ), a big
table is normally going to beat partitions hands down ( except if you
have locality, e.g. you are inserting serials and querying frequently in
the vicinity of the inserted ones; in that case partitioning keeps the
hot tables small and cacheable and may give you a small edge ).

 I am using PostgreSQL 9.3 and the database cluster is spread over 2X3TB 
 external drives with write caching.

Well, from your last measurements it seems your disk system is awful
for database work. I do not know what you mean by external drives (
eSATA? FireWire? USB2? USB3? Also, any kind of volume management? ) but
in your first query:


 db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
 IN(10005000,105000,205000,305000);
  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
 rows=6726 width=66) (actual time=52.226..288.700 rows=6 loops=1)
Index Cond: (id = ANY 
 ('{10005000,105000,205000,305000}'::bigint[]))
  Total runtime: 288.732 ms
 --Second attempt;
 db=# EXPLAIN ANALYSE SELECT * FROM old_nodes WHERE id 
 IN(10005000,105000,205000,305000);
  Index Scan using nodes_idversion_pk on old_nodes  (cost=0.71..17739.18 
 rows=6726 width=66) (actual time=0.014..0.035 rows=6 loops=1)
Index Cond: (id = ANY 
 ('{10005000,105000,205000,305000}'::bigint[]))
  Total runtime: 0.056 ms

288 ms for a query which should do 8-10 disk reads seems too slow to
me, even though the second, cached, run is nearly instant. I would
normally expect <100 ms for any reasonable drive, and <50 ms for any
db-tuned disks. I do not remember the exact parameters, but if your
samples represent your future workload you need to tune well for disk
access time. It's especially noticeable in the last example ( query
with IDs from a subquery ), where you got the times:

 --Select 
 ids-
 --Explain analyse on original table for a query that will look into one 
 partition on the new table but list of ids provided through a select statement
 --First attempt;
  Total runtime: 2290.122 ms
 --Second attempt;
  Total runtime: 26.005 ms

Warm caches help you a lot here.

 --Explain analyse on partitioned table for a query that will look into one 
 partition but list of ids provided through a select statement
 --First attempt;
  Total runtime: 19142.983 ms
 --Second attempt;
  Total runtime: 1383.929 ms

And here too. Also, as partitioning always means hitting more blocks (
seen in reverse: if you collapse partitions, maybe some data ends up in
the same block and you save some reads, or not, but you will never be
worse off ), slow disks hurt you more.
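
As an aside, it is worth double-checking that the partitioned version
actually prunes; a quick sketch ( table and partition names made up ):

  -- Constraint exclusion only helps when it is enabled and the WHERE
  -- clause matches the children's CHECK constraints:
  SET constraint_exclusion = partition;  -- the default
  EXPLAIN SELECT * FROM nodes WHERE id BETWEEN 10005000 AND 10006000;
  -- The plan should touch a single child ( plus the empty parent ); if
  -- every child shows up, the CHECK constraints do not match the
  -- query predicate and you scan everything.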

One last remark. I have not seen your data, but from what you've told
and the shown queries I would go for the 

Re: [GENERAL] Replacing uuid-ossp with uuid-freebsd

2015-05-26 Thread Piotr Gasidło
2015-05-25 16:28 GMT+02:00 Adrian Klaver adrian.kla...@aklaver.com:
 On 05/25/2015 07:17 AM, Piotr Gasidło wrote:
 (...)
 I've moved from Linux to FreeBSD. I've used uuid-ossp. Now I need to
 apply a patch to make it work under FreeBSD. This is a rather dirty hack.
 So I need to replace it once and for all with the uuid-freebsd module. (...)

 What version of Postgres?

 So did you see the section at the bottom of this page?:

 http://www.postgresql.org/docs/9.4/interactive/uuid-ossp.html

Thanks - this helps.

Currently I'm running 9.3.4. My plan was to upgrade to latest 9.3.x
and then to 9.4. With your hint it's now easier.
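
For reference, the section you point to is about building uuid-ossp
against other UUID libraries; on a 9.4 source tree that should be
roughly the following (untested on my side):

  # 9.4's uuid-ossp can be built against FreeBSD's built-in libc UUID
  # functions instead of the OSSP library:
  ./configure --with-uuid=bsd
  gmake -C contrib/uuid-ossp install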

-- 
Piotr Gasidło




Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Oleg Bartunov
You can ask http://snowball.tartarus.org/ for a stemmer fix. Meanwhile,
you can have a small personal dictionary (applied before the stemmer) with
such exceptions; for example, use the synonym template

system system
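
In SQL that looks roughly like this (configuration and file names are
just examples; the .syn file goes in $SHAREDIR/tsearch_data):

  -- Assuming a file german_exc.syn containing the line:  system system
  CREATE TEXT SEARCH DICTIONARY german_exc (
      TEMPLATE = synonym,
      SYNONYMS = german_exc
  );
  CREATE TEXT SEARCH CONFIGURATION german_fixed (COPY = german);
  ALTER TEXT SEARCH CONFIGURATION german_fixed
      ALTER MAPPING FOR asciiword, word
      WITH german_exc, german_stem;
  -- The synonym dictionary recognizes 'system' first, so the stemmer
  -- never sees it:
  SELECT to_tsvector('german_fixed', 'system');  -- 'system':1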

Oleg


On Tue, May 26, 2015 at 11:18 AM, Sven R. Kunze srku...@tbz-pariv.de
wrote:

 Hi everybody,

 the following stemming results made me curious:

 select to_tsvector('german', 'systeme');  'system':1
 select to_tsvector('german', 'systemes');  'system':1
 select to_tsvector('german', 'systems');  'system':1
 select to_tsvector('german', 'systemen');  'system':1
 select to_tsvector('german', 'system');   'syst':1


 First of all, this seems to be a bug in the German stemmer. Where can I
 fix it?

 Second, and more importantly, as I understand it, the stemmed version of a
 word should be considered normalized. That is, all other versions of that
 stem should be mapped to it as well. The interesting problem here is that
 PostgreSQL maps the stem itself ('system') to a completely different stem
 ('syst').

 Should a stem not remain stable even when to_tsvector is called on it
 multiple times?

 --
 Sven R. Kunze
 TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
 Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
 e-mail: srku...@tbz-pariv.de
 web: www.tbz-pariv.de

 Geschäftsführer: Dr. Reiner Wohlgemuth
 Sitz der Gesellschaft: Chemnitz
 Registergericht: Chemnitz HRB 8543






[GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze

Hi everybody,

the following stemming results made me curious:

select to_tsvector('german', 'systeme');  'system':1
select to_tsvector('german', 'systemes');  'system':1
select to_tsvector('german', 'systems');  'system':1
select to_tsvector('german', 'systemen');  'system':1
select to_tsvector('german', 'system');   'syst':1


First of all, this seems to be a bug in the German stemmer. Where can I 
fix it?


Second, and more importantly, as I understand it, the stemmed version of 
a word should be considered normalized. That is, all other versions of 
that stem should be mapped to it as well. The interesting problem here 
is that PostgreSQL maps the stem itself ('system') to a completely 
different stem ('syst').


Should a stem not remain stable even when to_tsvector is called on it 
multiple times?


--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543





Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Albe Laurenz
Sven R. Kunze wrote:
 the following stemming results made me curious:
 
 select to_tsvector('german', 'systeme');  'system':1
 select to_tsvector('german', 'systemes');  'system':1
 select to_tsvector('german', 'systems');  'system':1
 select to_tsvector('german', 'systemen');  'system':1
 select to_tsvector('german', 'system');   'syst':1
 
 
 First of all, this seems to be a bug in the German stemmer. Where can I
 fix it?

As far as I understand, the stemmer is not perfect; it is just a best
effort at German stemming.  It does not have a dictionary of valid German
words, but uses an algorithm based only on the letters that occur.

This web page describes the algorithm:
http://snowball.tartarus.org/algorithms/german/stemmer.html
I guess that the Snowball folks (and PostgreSQL) would be interested
if you could come up with a better algorithm.

In this specific case, the stemmer goes wrong because "System" is a
foreign word whose ending is atypical for German.  The algorithm cannot
distinguish between "System" and, say, "lautem" or "bestem".

 Second, and more importantly, as I understand it, the stemmed version of
 a word should be considered normalized. That is, all other versions of
 that stem should be mapped to it as well. The interesting problem here
 is that PostgreSQL maps the stem itself ('system') to a completely
 different stem ('syst').
 
 Should a stem not remain stable even when to_tsvector is called on it
 multiple times?

That's a possible position, but consider that a stem is not necessarily
a valid German word.  If you treat it as a German word (by stemming it),
the results might not be what you desire.

For example:

test=> select to_tsvector('german', 'linsen');
 to_tsvector
-
 'lins':1
(1 row)

test=> select to_tsvector('german', 'lins');
 to_tsvector
-
 'lin':1
(1 row)

I guess that your real problem here is that a search for "system"
will not find "systeme", which is indeed unfortunate.
But until somebody can come up with a better stemming algorithm, cases
like that can always occur.

Yours,
Laurenz Albe



Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze
Thanks. It seems as if I have to use snowball. So I will go ahead and
post my issue on GitHub.



Maybe I have difficulties understanding the relationship/dependencies
between all these 'maybe' available dictionary/parser/stemmer packages.


What happens if I install all packages for a single language? (hunspell, 
myspell, ispell, snowball)


Are they complementary? Do they replace each other?


 \dFd
 List of text search dictionaries
   Schema   |  Name   | Description
+-+---
 pg_catalog | danish_stem | snowball stemmer for danish language
 pg_catalog | dutch_stem  | snowball stemmer for dutch language
 pg_catalog | english_stem| snowball stemmer for english language
 pg_catalog | finnish_stem| snowball stemmer for finnish language
 pg_catalog | french_stem | snowball stemmer for french language
 pg_catalog | german_stem | snowball stemmer for german language
 pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
 pg_catalog | italian_stem| snowball stemmer for italian language
 pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
 pg_catalog | portuguese_stem | snowball stemmer for portuguese language
 pg_catalog | romanian_stem   | snowball stemmer for romanian language
 pg_catalog | russian_stem| snowball stemmer for russian language
 pg_catalog | simple  | simple dictionary: just lower case and 
check for stopword

 pg_catalog | spanish_stem| snowball stemmer for spanish language
 pg_catalog | swedish_stem| snowball stemmer for swedish language
 pg_catalog | turkish_stem| snowball stemmer for turkish language
(16 rows)


On 26.05.2015 12:09, Albe Laurenz wrote:

Sven R. Kunze wrote:

However, are you sure I am using snowball? Maybe I am reading the
documentation wrong:

test=> SELECT * FROM ts_debug('german', 'system');
alias   |   description   | token  | dictionaries  | dictionary  | lexemes
---+-++---+-+-
  asciiword | Word, all ASCII | system | {german_stem} | german_stem | {syst}
(1 row)

test=> \dFd german_stem
 List of text search dictionaries
Schema   |Name | Description
+-+--
  pg_catalog | german_stem | snowball stemmer for german language
(1 row)


http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
but it seems it depends on which packages (ispell, hunspell, myspell,
snowball + corresponding languages) my system has installed.

Is there an easy way to determine which of these packages PostgreSQL
uses AND what for?

If you use a standard PostgreSQL distribution, you will have no ispell
dictionary (as the documentation you quote says).
You can always list all dictionaries with \dFd in psql.


Sure. That might be the problem. It occurs to me that stems (if detected
as such) should be left alone.
In case a stem is a real German word, it would be stemmed to itself anyway.
If not, it might help not to stem, in order to avoid errors.

Yes, but that would mean that you have a way to determine from a string
whether it is a word or a stem or both, and the software does not do that.

Yours,
Laurenz Albe



Regards,

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543





Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze

For future reference: https://github.com/snowballstem/snowball/issues/19


On 26.05.2015 12:29, Sven R. Kunze wrote:
Thanks. It seems as if I have to use snowball. So I will go ahead and
post my issue on GitHub.



Maybe I have difficulties understanding the relationship/dependencies
between all these 'maybe' available dictionary/parser/stemmer packages.


What happens if I install all packages for a single language? 
(hunspell, myspell, ispell, snowball)


Are they complementary? Do they replace each other?


 \dFd
 List of text search dictionaries
   Schema   |  Name   | Description
+-+--- 


 pg_catalog | danish_stem | snowball stemmer for danish language
 pg_catalog | dutch_stem  | snowball stemmer for dutch language
 pg_catalog | english_stem| snowball stemmer for english language
 pg_catalog | finnish_stem| snowball stemmer for finnish language
 pg_catalog | french_stem | snowball stemmer for french language
 pg_catalog | german_stem | snowball stemmer for german language
 pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
 pg_catalog | italian_stem| snowball stemmer for italian language
 pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
 pg_catalog | portuguese_stem | snowball stemmer for portuguese language
 pg_catalog | romanian_stem   | snowball stemmer for romanian language
 pg_catalog | russian_stem| snowball stemmer for russian language
 pg_catalog | simple  | simple dictionary: just lower case and 
check for stopword

 pg_catalog | spanish_stem| snowball stemmer for spanish language
 pg_catalog | swedish_stem| snowball stemmer for swedish language
 pg_catalog | turkish_stem| snowball stemmer for turkish language
(16 rows)


On 26.05.2015 12:09, Albe Laurenz wrote:

Sven R. Kunze wrote:

However, are you sure I am using snowball? Maybe I am reading the
documentation wrong:

test=> SELECT * FROM ts_debug('german', 'system');
alias   |   description   | token  | dictionaries  | dictionary  
| lexemes
---+-++---+-+- 

  asciiword | Word, all ASCII | system | {german_stem} | german_stem 
| {syst}

(1 row)

test=> \dFd german_stem
 List of text search dictionaries
Schema   |Name | Description
+-+--
  pg_catalog | german_stem | snowball stemmer for german language
(1 row)


http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
but it seems it depends on which packages (ispell, hunspell, myspell,
snowball + corresponding languages) my system has installed.

Is there an easy way to determine which of these packages PostgreSQL
uses AND what for?

If you use a standard PostgreSQL distribution, you will have no ispell
dictionary (as the documentation you quote says).
You can always list all dictionaries with \dFd in psql.

Sure. That might be the problem. It occurs to me that stems (if detected
as such) should be left alone.
In case a stem is a real German word, it would be stemmed to itself anyway.
If not, it might help not to stem, in order to avoid errors.

Yes, but that would mean that you have a way to determine from a string
whether it is a word or a stem or both, and the software does not do 
that.


Yours,
Laurenz Albe



Regards,




--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543





Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze
Thanks, Oleg. Unfortunately, that does not work very well, as German
contains many compound nouns.


In fact, I discovered that anomaly by searching through a
domain-specific word table. For example: Waferhandlingsystem. There are
many '*system' words, but PostgreSQL does not allow me to use a suffix
wildcard; only a prefix, and only for to_tsquery
(http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-SYNONYM-DICTIONARY).


Is there another possibility?


On 26.05.2015 11:05, Oleg Bartunov wrote:

You can ask http://snowball.tartarus.org/ for a stemmer fix. Meanwhile,
you can have a small personal dictionary (applied before the stemmer) with
such exceptions; for example, use the synonym template


system system

Oleg


On Tue, May 26, 2015 at 11:18 AM, Sven R. Kunze srku...@tbz-pariv.de 
mailto:srku...@tbz-pariv.de wrote:


Hi everybody,

the following stemming results made me curious:

select to_tsvector('german', 'systeme');  'system':1
select to_tsvector('german', 'systemes');  'system':1
select to_tsvector('german', 'systems');  'system':1
select to_tsvector('german', 'systemen');  'system':1
select to_tsvector('german', 'system');   'syst':1


First of all, this seems to be a bug in the German stemmer. Where
can I fix it?

Second, and more importantly, as I understand it, the stemmed
version of a word should be considered normalized. That is, all
other versions of that stem should be mapped to it as well. The
interesting problem here is that PostgreSQL maps the stem itself
('system') to a completely different stem ('syst').

Should a stem not remain stable even when to_tsvector is called on
it multiple times?

-- 
Sven R. Kunze

TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de mailto:srku...@tbz-pariv.de
web: www.tbz-pariv.de http://www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543








--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543



Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze

Thanks Albe for that detailed answer.

On 26.05.2015 11:01, Albe Laurenz wrote:

Sven R. Kunze wrote:

the following stemming results made me curious:

select to_tsvector('german', 'systeme');  'system':1
select to_tsvector('german', 'systemes');  'system':1
select to_tsvector('german', 'systems');  'system':1
select to_tsvector('german', 'systemen');  'system':1
select to_tsvector('german', 'system');   'syst':1


First of all, this seems to be a bug in the German stemmer. Where can I
fix it?

As far as I understand, the stemmer is not perfect; it is just a best
effort at German stemming.  It does not have a dictionary of valid German
words, but uses an algorithm based only on the letters that occur.

This web page describes the algorithm:
http://snowball.tartarus.org/algorithms/german/stemmer.html
I guess that the Snowball folks (and PostgreSQL) would be interested
if you could come up with a better algorithm.


Thanks for that hint. I will go to 
https://github.com/snowballstem/snowball/issues and try to explain my 
problem there.


However, are you sure I am using snowball? Maybe I am reading the
documentation wrong:
http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
but it seems it depends on which packages (ispell, hunspell, myspell,
snowball + corresponding languages) my system has installed.


Is there an easy way to determine which of these packages PostgreSQL 
uses AND what for?



In this specific case, the stemmer goes wrong because "System" is a
foreign word whose ending is atypical for German.  The algorithm cannot
distinguish between "System" and, say, "lautem" or "bestem".


Second, and more importantly, as I understand it, the stemmed version of
a word should be considered normalized. That is, all other versions of
that stem should be mapped to it as well. The interesting problem here
is that PostgreSQL maps the stem itself ('system') to a completely
different stem ('syst').

Should a stem not remain stable even when to_tsvector is called on it
multiple times?

That's a possible position, but consider that a stem is not necessarily
a valid German word.  If you treat it as a German word (by stemming it),
the results might not be what you desire.

For example:

test=> select to_tsvector('german', 'linsen');
  to_tsvector
-
  'lins':1
(1 row)

test=> select to_tsvector('german', 'lins');
  to_tsvector
-
  'lin':1
(1 row)


Sure. That might be the problem. It occurs to me that stems (if detected
as such) should be left alone.

In case a stem is a real German word, it would be stemmed to itself anyway.
If not, it might help not to stem, in order to avoid errors.


I guess that your real problem here is that a search for "system"
will not find "systeme", which is indeed unfortunate.
But until somebody can come up with a better stemming algorithm, cases
like that can always occur.

Yours,
Laurenz Albe

This might pose a problem in the future, of course. Thanks for pointing
this out as well.


Regards,

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srku...@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543





Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Albe Laurenz
Sven R. Kunze wrote:
 However, are you sure I am using snowball? Maybe I am reading the
 documentation wrong:

test=> SELECT * FROM ts_debug('german', 'system');
   alias   |   description   | token  | dictionaries  | dictionary  | lexemes
---+-++---+-+-
 asciiword | Word, all ASCII | system | {german_stem} | german_stem | {syst}
(1 row)

test=> \dFd german_stem
List of text search dictionaries
   Schema   |Name | Description
+-+--
 pg_catalog | german_stem | snowball stemmer for german language
(1 row)

 http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
 but it seems it depends on which packages (ispell, hunspell, myspell,
 snowball + corresponding languages) my system has installed.
 
 Is there an easy way to determine which of these packages PostgreSQL
 uses AND what for?

If you use a standard PostgreSQL distribution, you will have no ispell
dictionary (as the documentation you quote says).
You can always list all dictionaries with \dFd in psql.

 Sure. That might be the problem. It occurs to me that stems (if detected
 as such) should be left alone.
 In case a stem is a real German word, it would be stemmed to itself anyway.
 If not, it might help not to stem, in order to avoid errors.

Yes, but that would mean that you have a way to determine from a string
whether it is a word or a stem or both, and the software does not do that.

Yours,
Laurenz Albe



Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Albe Laurenz
Sven R. Kunze wrote:
 Maybe, I have difficulties to understand the relationship/dependencies
 between all these 'maybe' available dictionary/parser/stemmer packages.
 
 What happens if I install all packages for a single language? (hunspell,
 myspell, ispell, snowball)
 
 Are they complementary? Do they replace each other?

They are all dictionaries.

The text search configuration determines which dictionaries get
applied to which kinds of words in which order.

So if you introduce a new dictionary, you either have to modify
an existing configuration or define a new one to use it.
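
A minimal sketch of the latter, assuming you have already created an
ispell dictionary (the name german_ispell below is illustrative):

```sql
-- Copy the built-in german configuration, then map word tokens so that
-- the ispell dictionary is consulted first; words it does not recognize
-- fall through to the snowball stemmer.
CREATE TEXT SEARCH CONFIGURATION german_hybrid ( COPY = pg_catalog.german );

ALTER TEXT SEARCH CONFIGURATION german_hybrid
    ALTER MAPPING FOR asciiword, word
    WITH german_ispell, german_stem;
```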

Yours,
Laurenz Albe



[GENERAL] really nice article about postgres popularity in startups on techrepublic

2015-05-26 Thread Merlin Moncure
have a read:

http://www.techrepublic.com/article/postgres-pushes-past-mysql-in-developer-hearts/

merlin




Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Sven R. Kunze

I think I understand now.

Thus, the issue at hand could (maybe) be solved by passing words first 
to one of those more elaborate dictionaries (myspell, hunspell or 
ispell) and, if still necessary, then to snowball.


Did I get this right?


On 26.05.2015 13:38, Albe Laurenz wrote:

Sven R. Kunze wrote:

Maybe, I have difficulties to understand the relationship/dependencies
between all these 'maybe' available dictionary/parser/stemmer packages.

What happens if I install all packages for a single language? (hunspell,
myspell, ispell, snowball)

Are they complementary? Do they replace each other?

They are all dictionaries.

The text search configuration determines which dictionaries get
applied to which kinds of words in which order.

So if you introduce a new dictionary, you either have to modify
an existing configuration or define a new one to use it.

Yours,
Laurenz Albe


Regards,

--
Sven R. Kunze





Re: [GENERAL] [tsvector] to_tsvector called multiple times

2015-05-26 Thread Albe Laurenz
Sven R. Kunze wrote:
 I think I understand now.
 
 Thus, the issue at hand could (maybe) be solved by passing words first
 to one of those more elaborate dictionaries (myspell, hunspell or
 ispell) and if still necessary then to snowball.
 
 Did I get this right?

I have never experimented with ispell dictionaries, so I don't know
if they replace a snowball dictionary or are used in addition to it.

Yours,
Laurenz Albe



[GENERAL] Fwd: Raster performance

2015-05-26 Thread David Haynes II
Hello,

I have a question about the query optimizer and its performance on spatial
datasets, specifically rasters. My use case is rather unique: the
application that I am developing allows users to request summarizations of
various geographic boundaries around the world. Therefore our raster
datasets are global. We are in the process of conducting some benchmarks
for our system and we noticed something unexpected.

The query is the same in both cases, except that the first is run on a
raster (46 gigs) stored out of database (outdb) and the second on the same
raster (46 gigs) stored in database (indb). The raster is multiband (13
bands), with each band
representing one entire MODIS global scene. A single year of MODIS is
approximately 3.6 gigs.

The outdb raster is being outperformed by indb because the query optimizer
gets smarter. What is also interesting is all the extra pieces that are
brought in with outdb.

with poly as (
  SELECT gid, label as name, ST_Transform(geom, 6842) as geom
  FROM us_counties
), rast_select as (
  SELECT r.rid as id, s.name, ST_CLIP(r.rast, s.geom) as rast
  FROM rasters.glc2000 as r
  INNER JOIN poly as s ON ST_Intersects(r.rast, s.geom)
)
select r.id, r.name, ST_Count(r.rast, 1, True)


   QUERY PLAN With Outdb
--
Sort  (cost=93911.29..93926.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
    ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
  CTE rast_select
    ->  Nested Loop  (cost=0.28..76131.41 rows=62033 width=1086)
          ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
          ->  Index Scan using modis_rast_gist on modis r_1  (cost=0.28..24.40 rows=2 width=836)
                Index Cond: ((rast)::geometry && s.geom)
                Filter: _st_intersects(s.geom, rast, NULL::integer)
  ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
        ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033 width=254)

QUERY PLAN With Indb

-
Sort  (cost=69547.29..69562.80 rows=6204 width=254)
  Sort Key: r.id, r.name
  CTE poly
    ->  Seq Scan on us_counties  (cost=0.00..112.86 rows=3109 width=62247)
  CTE rast_select
    ->  Nested Loop  (cost=0.28..51767.41 rows=62033 width=272)
          ->  CTE Scan on poly s  (cost=0.00..62.18 rows=3109 width=250)
          ->  Index Scan using modis_noout_rast_gist on modis_noout r_1  (cost=0.28..16.56 rows=2 width=22)
                Index Cond: ((rast)::geometry && s.geom)
                Filter: _st_intersects(s.geom, rast, NULL::integer)
  ->  HashAggregate  (cost=17214.16..17276.20 rows=6204 width=254)
        ->  CTE Scan on rast_select r  (cost=0.00..1240.66 rows=62033 width=254)

-- 
David Haynes, Ph.D.
Research Associate Terra Populus
Minnesota Population Center
www.terrapop.org





Re: [GENERAL] Can we simulate Oracle Flashback with pg_export_snapshot()?

2015-05-26 Thread William Dunn
Hello Laurenz,

Thanks for your feedback. Actually, when I said that the same overhead
occurs in Oracle I was referring to bloat in the UNDO logs, which, similar
to PostgreSQL dead tuples, has some impact on size/performance and is the
majority of the trade-off considered when deciding to implement Flashback.

Thank you for pointing out HOT Updates! I had not read about it before as it
doesn't seem to appear in the official documentation. I found info on it in
a readme in the source code (
http://git.postgresql.org/gitweb/?p=postgresql.git;a=blob;f=src/backend/access/heap/README.HOT;hb=HEAD)
and hope to read through it soon.

In terms of benefit over a lagging replica Flashback has the benefit of
being transparent to the user (the user can query over the same database
connection, etc.), and it does not incur the full cost of having a replica...
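
To make the mechanism concrete, here is a sketch (the snapshot id is
illustrative, and mytable stands in for any table):

```sql
-- Session A: hold a transaction open and export its snapshot.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();  -- returns e.g. '00000003-0000001B-1'

-- Session B: query the database as it looked at session A's snapshot.
-- SET TRANSACTION SNAPSHOT must be the first statement of a
-- REPEATABLE READ or SERIALIZABLE transaction.
BEGIN ISOLATION LEVEL REPEATABLE READ;
SET TRANSACTION SNAPSHOT '00000003-0000001B-1';
SELECT count(*) FROM mytable;  -- sees the data as of A's snapshot
COMMIT;
```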

*Will J. Dunn*
*willjdunn.com http://willjdunn.com*

On Fri, May 22, 2015 at 4:50 AM, Albe Laurenz laurenz.a...@wien.gv.at
wrote:

 William Dunn wrote:
  Just had an idea and could use some feedback. If we start a transaction,
 leave it idle, and use
  pg_export_snapshot() to get its snapshot_id MVCC will hold all the
 tuples as of that transaction's
  start and any other transaction can see the state of the database as of
 that time using SET
  TRANSACTION SNAPSHOT snapshot_id?
 
 
 http://www.postgresql.org/docs/devel/static/functions-admin.html#FUNCTIONS-SNAPSHOT-SYNCHRONIZATION
 
 
  I'm thinking of setting up automation to ~every half hour open a
 transaction as such, close any that
  have been open over an hour, and store the snapshot_id. However, I don't
 have a busy system that I can
  test it on.
 
  Of course this would cause some extra bloat because those tuples cannot
 get autovacuumed until the
  transaction closes but that is also the case in Oracle. Is there
 anything else I am missing or a
  reason that this would not be possible?

 Oracle does not have that issue because modifications cannot bloat the
 table (the bloat is in
 what is called the UNDO tablespace).

 What you suggest would allow you to look at the data as they were at
 specific times (of the snapshots).
 But the price on a busy system where data are modified would be
 considerable; your tables might
 become quite bloated, and you could not use HOT updates.

 If you want to look into the past, wouldn't it be much more useful to have
 a standby server
 that is lagging behind?  There is an option for that
 (recovery_min_apply_delay) from
 PostgreSQL 9.4 on.

 Yours,
 Laurenz Albe



Re: [GENERAL] Queries for unused/useless indexes

2015-05-26 Thread William Dunn
The query I previously sent was table level. Here is an index level one:
SELECT pg_stat_user_indexes.schemaname,
   pg_stat_user_indexes.relname,
   pg_stat_user_indexes.indexrelid,
   pg_stat_user_indexes.indexrelname,
   pg_stat_user_indexes.idx_scan,
   pg_stat_user_tables.seq_scan,
   (100 * pg_stat_user_indexes.idx_scan / (pg_stat_user_tables.seq_scan
+ pg_stat_user_indexes.idx_scan)) AS perc_idx_used
FROM pg_stat_user_indexes INNER JOIN pg_stat_user_tables ON
pg_stat_user_indexes.relid = pg_stat_user_tables.relid
WHERE pg_relation_size(pg_stat_user_indexes.relid) > (5*8192)
  AND NOT ((pg_stat_user_indexes.idx_scan=0
OR pg_stat_user_indexes.idx_scan IS NULL)
   AND pg_stat_user_tables.seq_scan=0)
ORDER BY perc_idx_used;

*Will J. Dunn*
*willjdunn.com http://willjdunn.com*

On Tue, May 26, 2015 at 10:31 AM, William Dunn dunn...@gmail.com wrote:

 Melvin - thanks for sharing.

 Here is the query I use which lists the percent of queries against the
 table which use the index ordered by least used first.

 The 'pg_relation_size(relid) > (5*8192)' is used to remove any tables that
 would be so small the optimizer would just choose a table scan.

 SELECT schemaname,
relname,
idx_scan,
seq_scan,
(100 * idx_scan / (seq_scan + idx_scan)) AS perc_idx_used
 FROM pg_stat_user_tables
 WHERE pg_relation_size(relid) > (5*8192)
   AND NOT ((idx_scan=0
 OR idx_scan IS NULL)
AND seq_scan=0)
 ORDER BY perc_idx_used;

 *Will J. Dunn*
 *willjdunn.com http://willjdunn.com*

 On Mon, May 25, 2015 at 12:39 PM, Peter J. Holzer h...@hjp.at wrote:

 On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
  I'm not sure why you are using pg_stat_user_indexes.

 Because you did. I didn't change that.

  My original query below
  uses pg_stat_all_indexes and the schema names are joined and it does
 work.

 I'm not sure what you mean by original, but this:

  SELECT n.nspname as schema,
 i.relname as table,
 i.indexrelname as index,
 i.idx_scan,
 i.idx_tup_read,
 i.idx_tup_fetch,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.relname))) AS table_size,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.indexrelname))) AS index_size,
 pg_get_indexdef(idx.indexrelid) as idx_definition
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
   WHERE i.idx_scan < 200
 AND NOT idx.indisprimary
 AND NOT idx.indisunique
   ORDER BY 1, 2, 3;

 is not the query you posted in your original message.

 Here is what you posted:

  On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer hjp-pg...@hjp.at
 wrote:
 
  On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
   I'd like to share those queries with the community, as I know
 there must
  be
   others out there with the same problem.
  
   /* useless_indexes.sql */
   SELECT
  idstat.schemaname AS schema,
  idstat.relname AS table_name,
  indexrelname AS index_name,
  idstat.idx_scan AS times_used,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(idstat.relname))) AS table_size,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(indexrelname))) AS index_size,
  n_tup_upd + n_tup_ins + n_tup_del as num_writes,
  indexdef AS definition
   FROM pg_stat_user_indexes AS idstat
   JOIN pg_indexes ON indexrelname = indexname
   JOIN pg_stat_user_tables AS tabstat ON idstat.relname =
 tabstat.relname
   WHERE idstat.idx_scan < 200
   AND indexdef !~* 'unique'
   ORDER BY idstat.schemaname,
idstat.relname,
indexrelname;

 --
_  | Peter J. Holzer| I want to forget all about both belts and
 |_|_) || suspenders; instead, I want to buy pants
 | |   | h...@hjp.at | that actually fit.
 __/   | http://www.hjp.at/ |   -- http://noncombatant.org/





Re: [GENERAL] dba scripts queries

2015-05-26 Thread John McKown
With Melvin's kind permission, I have created a repository on github which
has all of these files in it. Hopefully that will allow them to stick
around a bit longer. You can peruse them at:
https://github.com/JohnArchieMckown/PostgreSQL-DBA-tools
Also, this allows you to preview individual files and fetch only the ones
you want right now. I will also add, as Melvin has in his
README, that these are as is and there is __NO__ support for them. If
they work, you're golden. If they don't, ... .

On Tue, May 26, 2015 at 9:08 AM, Melvin Davidson melvin6...@gmail.com
wrote:


 Based on several positive feedback's from my previous submission for
 unused index queries, I've decided to contribute most all of the scripts
 and queries I've written or collected over my career. Therefore please find
 attached:

 dba_scripts.tar.gz - scripts to automate specific queries. Note that
 there are two potentially dangerous scripts.

 flip_database_connect.sh - Changes the state of datallowconn for a
 specific database
 logins_block.sh - prevents all non-superusers from future logins. MUST be
 run as user postgres

 queries.tgz - sql only queries (but more than dba_scripts)
 queries.zip - Same as above, but for Windows O/S, even though I do not
 advise running production on Windows.

 Please remember to read the README.txt provided with each for a general
 explanation.

 If for some reason the attachments did not work, you can email me directly
 and I will send to your email.

 CAVEAT EMPTOR!
 Please DO NOT write to me asking me why I did things one way when you
think it should be done another way, or why I chose specific values for a
 query. There are several available editors you can use to make your own
 changes.
 IOW: Do not peer into the oral cavity of an equestrian present!


 --
 *Melvin Davidson*







-- 
My sister opened a computer store in Hawaii. She sells C shells down by the
seashore.

If someone tells you that nothing is impossible:
Ask him to dribble a football.

He's about as useful as a wax frying pan.

10 to the 12th power microphones = 1 Megaphone

Maranatha! 
John McKown


Re: [GENERAL] Queries for unused/useless indexes

2015-05-26 Thread William Dunn
Melvin - thanks for sharing.

Here is the query I use, which lists the percentage of accesses against each
table that use an index, ordered by least used first.

The 'pg_relation_size(relid) > (5*8192)' is used to remove any tables that
would be so small the optimizer would just choose a table scan.

SELECT schemaname,
   relname,
   idx_scan,
   seq_scan,
   (100 * idx_scan / (seq_scan + idx_scan)) AS perc_idx_used
FROM pg_stat_user_tables
WHERE pg_relation_size(relid) > (5*8192)
  AND NOT ((idx_scan=0
OR idx_scan IS NULL)
   AND seq_scan=0)
ORDER BY perc_idx_used;
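
A defensive variant of the same query, sketched here for reference: casting
to numeric avoids integer truncation in the percentage, and NULLIF turns a
zero denominator into NULL instead of a division-by-zero error:

```sql
-- Sketch of a hardened version of the query above.
SELECT schemaname,
       relname,
       idx_scan,
       seq_scan,
       round(100.0 * idx_scan / NULLIF(idx_scan + seq_scan, 0), 2)
           AS perc_idx_used
FROM pg_stat_user_tables
WHERE pg_relation_size(relid) > 5 * 8192
ORDER BY perc_idx_used NULLS FIRST;
```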

*Will J. Dunn*
*willjdunn.com http://willjdunn.com*

On Mon, May 25, 2015 at 12:39 PM, Peter J. Holzer h...@hjp.at wrote:

 On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
  I'm not sure why you are using pg_stat_user_indexes.

 Because you did. I didn't change that.

  My original query below
  uses pg_stat_all_indexes and the schema names are joined and it does
 work.

 I'm not sure what you mean by original, but this:

  SELECT n.nspname as schema,
 i.relname as table,
 i.indexrelname as index,
 i.idx_scan,
 i.idx_tup_read,
 i.idx_tup_fetch,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.relname))) AS table_size,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.indexrelname))) AS index_size,
 pg_get_indexdef(idx.indexrelid) as idx_definition
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
   WHERE i.idx_scan < 200
 AND NOT idx.indisprimary
 AND NOT idx.indisunique
   ORDER BY 1, 2, 3;

 is not the query you posted in your original message.

 Here is what you posted:

  On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer hjp-pg...@hjp.at
 wrote:
 
  On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
   I'd like to share those queries with the community, as I know
 there must
  be
   others out there with the same problem.
  
   /* useless_indexes.sql */
   SELECT
  idstat.schemaname AS schema,
  idstat.relname AS table_name,
  indexrelname AS index_name,
  idstat.idx_scan AS times_used,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(idstat.relname))) AS table_size,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(indexrelname))) AS index_size,
  n_tup_upd + n_tup_ins + n_tup_del as num_writes,
  indexdef AS definition
   FROM pg_stat_user_indexes AS idstat
   JOIN pg_indexes ON indexrelname = indexname
   JOIN pg_stat_user_tables AS tabstat ON idstat.relname =
 tabstat.relname
   WHERE idstat.idx_scan < 200
   AND indexdef !~* 'unique'
   ORDER BY idstat.schemaname,
idstat.relname,
indexrelname;

 --
_  | Peter J. Holzer| I want to forget all about both belts and
 |_|_) || suspenders; instead, I want to buy pants
 | |   | h...@hjp.at | that actually fit.
 __/   | http://www.hjp.at/ |   -- http://noncombatant.org/



Re: [GENERAL] MD5 password storage - should be the same everywhere?

2015-05-26 Thread Francisco Reyes

On 05/25/2015 07:58 PM, Adrian Klaver wrote:

On 05/25/2015 01:41 PM, Francisco Reyes wrote:
I understood that it is just an md5 hash of the password and the username
with the string md5 prepended, so it should be the same.


Mystery solved.
Because I usually script most of my work for audits, I was able
to track down what happened. The original request was to get data from
'machine 2', but it was later changed. I found something in the output of
my script that led me to believe maybe the roles did not come from
'machine 2'.


I was able to find the source machine by using:
select rolname, rolpassword, rolcanlogin from pg_catalog.pg_authid
where rolname = 'SomeUser';


I ran it against the machines I thought could have been the source and
found a matching one.
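
As background for that comparison (a sketch; 'secret' and 'SomeUser' are
placeholder values): a stored md5 role password is just the md5 of the
password concatenated with the role name, prefixed with the string 'md5',
which is why the same user/password pair hashes identically on every server:

```sql
-- Compute what pg_authid.rolpassword would contain for role 'SomeUser'
-- with password 'secret' (both are placeholders).
SELECT 'md5' || md5('secret' || 'SomeUser') AS expected_rolpassword;
```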






Re: [GENERAL] Queries for unused/useless indexes

2015-05-26 Thread Melvin Davidson
Will,

Much thanks. Let's keep up the sharing with the community.

On Tue, May 26, 2015 at 11:32 AM, William Dunn dunn...@gmail.com wrote:

 The query I previously sent was table level. Here is an index level one:
 SELECT pg_stat_user_indexes.schemaname,
pg_stat_user_indexes.relname,
pg_stat_user_indexes.indexrelid,
pg_stat_user_indexes.indexrelname,
pg_stat_user_indexes.idx_scan,
pg_stat_user_tables.seq_scan,
(100 * pg_stat_user_indexes.idx_scan /
 (pg_stat_user_tables.seq_scan + pg_stat_user_indexes.idx_scan)) AS
 perc_idx_used
 FROM pg_stat_user_indexes INNER JOIN pg_stat_user_tables ON
 pg_stat_user_indexes.relid = pg_stat_user_tables.relid
 WHERE pg_relation_size(pg_stat_user_indexes.relid) > (5*8192)
   AND NOT ((pg_stat_user_indexes.idx_scan=0
 OR pg_stat_user_indexes.idx_scan IS NULL)
AND pg_stat_user_tables.seq_scan=0)
 ORDER BY perc_idx_used;

 *Will J. Dunn*
 *willjdunn.com http://willjdunn.com*

 On Tue, May 26, 2015 at 10:31 AM, William Dunn dunn...@gmail.com wrote:

 Melvin - thanks for sharing.

 Here is the query I use which lists the percent of queries against the
 table which use the index ordered by least used first.

  The 'pg_relation_size(relid) > (5*8192)' is used to remove any tables that
 would be so small the optimizer would just choose a table scan.

 SELECT schemaname,
relname,
idx_scan,
seq_scan,
(100 * idx_scan / (seq_scan + idx_scan)) AS perc_idx_used
 FROM pg_stat_user_tables
  WHERE pg_relation_size(relid) > (5*8192)
    AND NOT ((idx_scan=0
  OR idx_scan IS NULL)
AND seq_scan=0)
 ORDER BY perc_idx_used;

 *Will J. Dunn*
 *willjdunn.com http://willjdunn.com*

 On Mon, May 25, 2015 at 12:39 PM, Peter J. Holzer h...@hjp.at wrote:

 On 2015-05-25 12:25:01 -0400, Melvin Davidson wrote:
  I'm not sure why you are using pg_stat_user_indexes.

 Because you did. I didn't change that.

  My original query below
  uses pg_stat_all_indexes and the schema names are joined and it does
 work.

 I'm not sure what you mean by original, but this:

  SELECT n.nspname as schema,
 i.relname as table,
 i.indexrelname as index,
 i.idx_scan,
 i.idx_tup_read,
 i.idx_tup_fetch,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.relname))) AS table_size,
 pg_size_pretty(pg_relation_size(quote_ident(n.nspname) || '.' ||
  quote_ident(i.indexrelname))) AS index_size,
 pg_get_indexdef(idx.indexrelid) as idx_definition
FROM pg_stat_all_indexes i
JOIN pg_class c ON (c.oid = i.relid)
JOIN pg_namespace n ON (n.oid = c.relnamespace)
JOIN pg_index idx ON (idx.indexrelid =  i.indexrelid )
   WHERE i.idx_scan < 200
 AND NOT idx.indisprimary
 AND NOT idx.indisunique
   ORDER BY 1, 2, 3;

 is not the query you posted in your original message.

 Here is what you posted:

  On Mon, May 25, 2015 at 10:41 AM, Peter J. Holzer hjp-pg...@hjp.at
 wrote:
 
  On 2015-05-22 09:41:57 -0400, Melvin Davidson wrote:
   I'd like to share those queries with the community, as I know
 there must
  be
   others out there with the same problem.
  
   /* useless_indexes.sql */
   SELECT
  idstat.schemaname AS schema,
  idstat.relname AS table_name,
  indexrelname AS index_name,
  idstat.idx_scan AS times_used,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(idstat.relname))) AS table_size,
  
 pg_size_pretty(pg_relation_size(quote_ident(idstat.schemaname) ||
  '.' ||
   quote_ident(indexrelname))) AS index_size,
  n_tup_upd + n_tup_ins + n_tup_del as num_writes,
  indexdef AS definition
   FROM pg_stat_user_indexes AS idstat
   JOIN pg_indexes ON indexrelname = indexname
   JOIN pg_stat_user_tables AS tabstat ON idstat.relname =
 tabstat.relname
   WHERE idstat.idx_scan < 200
   AND indexdef !~* 'unique'
   ORDER BY idstat.schemaname,
idstat.relname,
indexrelname;

 --
_  | Peter J. Holzer| I want to forget all about both belts and
 |_|_) || suspenders; instead, I want to buy pants
 | |   | h...@hjp.at | that actually fit.
 __/   | http://www.hjp.at/ |   -- http://noncombatant.org/






-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.