Re: [GENERAL] Stats update difference between VACUUM ANALYZE and ANALYZE in 9.2?

2016-09-26 Thread pbj
On Monday, September 26, 2016 9:44 AM, Tom Lane  wrote:

 >> Paul Jones  writes:
 >> For a freshly pg_restore'd 9.2 database, would VACUUM ANALYZE update
 >> statistics any better than just an ANALYZE?

 >
 > VACUUM would have caused the page-all-visible flags to get set for all
 > pages of unchanging tables.  I don't recall whether ANALYZE has any side
 > effects on those flags at all, but it certainly would not have set them
 > for pages it didn't even visit, which would be most.
 >
 > Net result is that the pg_class.relallvisible fractions didn't get high
 > enough to persuade the planner that index-only scans would be effective.
 > I guess you could call that a statistic, but it's really about the
 > contents of the tables' free space maps.
  
 > 
 > regards, tom lane

This is good to know.  I think we will be running VACUUM ANALYZE from 
now on after restore instead of just ANALYZE.
 
I do note that sect. 49.11 claims that ANALYZE updates 
pg_class.relallvisible.  I don't know if this is a documentation problem 
in light of what you explained.
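
For the record, here is roughly what we plan to run after each restore, along
with a quick way to check whether the visibility information got populated
(the table name below is only an example):

VACUUM ANALYZE some_restored_table;

-- relallvisible should now be close to relpages for unchanging tables,
-- which is what lets the planner consider index-only scans:
SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'some_restored_table';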

PJ



Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-07-19 Thread pbj
 

On Tuesday, July 19, 2016 6:19 AM, Teodor Sigaev  wrote:
 

 >> CREATE INDEX json_tables_idx ON json_tables USING GIN (data jsonb_path_ops);
 >>
 >> Bitmap Heap Scan on json_tables  (cost=113.50..37914.64 rows=1 width=1261)
 >> (actual time=2157.118..1259550.327 rows=909091 loops=1)
 >>    Recheck Cond: (data @> '{"name": "AC3 Case Red"}'::jsonb)
 >>    Rows Removed by Index Recheck: 4360296
 >>    Heap Blocks: exact=37031 lossy=872059
 >
 > Hmm, looks like too small work_mem because lossy heap block count is too big.

Ok, thanks.
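
For anyone finding this in the archives, the kind of adjustment being
suggested looks something like this (the setting value is only a guess on
my part, not something from this thread):

SET work_mem = '256MB';

-- With enough work_mem the bitmap can stay exact instead of going lossy,
-- which avoids the huge "Rows Removed by Index Recheck" count above:
EXPLAIN (ANALYZE, BUFFERS)
SELECT data FROM json_tables
WHERE data @> '{"name": "AC3 Case Red"}';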

 > -- 
 > Teodor Sigaev                                  E-mail: teo...@sigaev.ru
 >                                                WWW: http://www.sigaev.ru/




  

Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-07-19 Thread pbj
 

On Monday, July 18, 2016 10:14 PM, Kisung Kim  wrote:
 

 > Hi, I recently tested the YCSB benchmark too.  But contrary to my
 > expectation, PG (9.5) is slower than MongoDB 3.2.  Paul said that making
 > the table with the no-logging option improved the performance, and that
 > it might be equivalent to MongoDB's behavior.  But according to the
 > MongoDB documentation, it writes a journal log too.  So I think turning
 > off logging is not a fair comparison.  Am I wrong about MongoDB's behavior?
My understanding is that, even with Mongo journaling, it is not as reliable as
Postgres.  So I felt that using unlogged tables leveled the playing field for
Postgres.
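
Concretely, what I mean is something along these lines (a sketch only;
'usertable' and its columns are just the stock YCSB JDBC schema, not anything
specific to my setup):

-- Create the YCSB table as UNLOGGED so it skips WAL entirely:
CREATE UNLOGGED TABLE usertable (
    ycsb_key VARCHAR(255) PRIMARY KEY,
    field0 TEXT, field1 TEXT, field2 TEXT, field3 TEXT, field4 TEXT,
    field5 TEXT, field6 TEXT, field7 TEXT, field8 TEXT, field9 TEXT
);

-- Or, for an existing table on 9.5, it can be flipped in place:
ALTER TABLE usertable SET UNLOGGED;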
PJ

                                                                                
                                                                       

 > (C)Bitnine, Kisung Kim, Ph.D
 > https://sites.google.com/site/kisungresearch/
 > E-mail : ks...@bitnine.net
 > Office phone : 070-4800-5890, 408-606-8602
 > US Mobile phone : 408-805-2192



2016-03-19 5:05 GMT+09:00 :

 

On Tuesday, March 15, 2016 7:39 PM, "p...@cmicdo.com"  
wrote:
 
 > Your results are close enough to mine, I think, to prove the point.
 > And, I agree that the EDB benchmark is not necessarily reflective of a
 > real-world scenario.
 > 
 > However, the cache I'm referring to is PG's shared_buffer cache.
 > You can see the first run of the select causing a lot of disk reads.
 > The second identical run reads purely from shared_buffers.
 > 
 > What I don't understand is, why does a slightly different select from
 > the *same* table during the same session cause shared_buffers to be
 > blown out and re-read??
 > 
 > I will see if I can try YCSB next week (I'm in workshops all week...)
 > 
 > Thanks!

I was able to try YCSB today on both PG 9.5.1 and Mongo 3.2.  At first, PG
was running 4 times slower than Mongo.  Then I remembered unlogged tables
(which, I think, is effectively how Mongo runs all the time), and remade
the PG table as UNLOGGED.  In a 50/50 read/update test over 1M records,
PG ran in 0.62 of the time of Mongo.

PG Load:

[OVERALL], RunTime(ms), 104507.0
[OVERALL], Throughput(ops/sec), 9568.737022400413
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 293.0
[CLEANUP], MinLatency(us), 293.0
[CLEANUP], MaxLatency(us), 293.0
[CLEANUP], 95thPercentileLatency(us), 293.0
[CLEANUP], 99thPercentileLatency(us), 293.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 101.329235
[INSERT], MinLatency(us), 88.0
[INSERT], MaxLatency(us), 252543.0
[INSERT], 95thPercentileLatency(us), 121.0
[INSERT], 99thPercentileLatency(us), 141.0
[INSERT], Return=OK, 100
 
PG Run:
---
[OVERALL], RunTime(ms), 92763.0
[OVERALL], Throughput(ops/sec), 10780.16019318047
[READ], Operations, 499922.0
[READ], AverageLatency(us), 79.1722428698877
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 19935.0
[READ], 95thPercentileLatency(us), 94.0
[READ], 99thPercentileLatency(us), 112.0
[READ], Return=OK, 499922
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 222.0
[CLEANUP], MinLatency(us), 222.0
[CLEANUP], MaxLatency(us), 222.0
[CLEANUP], 95thPercentileLatency(us), 222.0
[CLEANUP], 99thPercentileLatency(us), 222.0
[UPDATE], Operations, 500078.0
[UPDATE], AverageLatency(us), 98.96430156895525
[UPDATE], MinLatency(us), 83.0
[UPDATE], MaxLatency(us), 26655.0
[UPDATE], 95thPercentileLatency(us), 127.0
[UPDATE], 99thPercentileLatency(us), 158.0
[UPDATE], Return=OK, 500078
 
Mongo Load:
---
[OVERALL], RunTime(ms), 133308.0
[OVERALL], Throughput(ops/sec), 7501.425270801452
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 1822.0
[CLEANUP], MinLatency(us), 1822.0
[CLEANUP], MaxLatency(us), 1822.0
[CLEANUP], 95thPercentileLatency(us), 1822.0
[CLEANUP], 99thPercentileLatency(us), 1822.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 130.830678
[INSERT], MinLatency(us), 90.0
[INSERT], MaxLatency(us), 7147519.0
[INSERT], 95thPercentileLatency(us), 159.0
[INSERT], 99thPercentileLatency(us), 226.0
[INSERT], Return=OK, 100
 
Mongo Run:
-
[OVERALL], RunTime(ms), 149150.0
[OVERALL], Throughput(ops/sec), 6704.65973851827
[READ], Operations, 500837.0
[READ], AverageLatency(us), 98.13153980237084
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 28271.0
[READ], 95thPercentileLatency(us), 166.0
[READ], 99thPercentileLatency(us), 186.0
[READ], Return=OK, 500837
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 2387.0
[CLEANUP], MinLatency(us), 2386.0
[CLEANUP], MaxLatency(us), 2387.0
[CLEANUP], 95thPercentileLatency(us), 2387.0
[CLEANUP], 99thPercentileLatency(us), 2387.0
[UPDATE], Operations, 499163.0
[UPDATE], AverageLatency(us), 195.21505600375028
[UPDATE], MinLatency(us), 118.0
[UPDATE], MaxLatency(us), 4513791.0
[UPDATE], 95thPercentileLatency(us), 211.0
[UPDATE], 99thPercentileLatency(us), 252.0
[UPDATE], Return=OK, 499163


 > 
 > 
 > On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> 
wrote:
 > 
 > 
 > Hi, Paul
 > 
 > I agree with Oleg, EDB benchmarks are strange som

Re: [GENERAL] Question about shared_buffer cache behavior

2016-03-19 Thread pbj
On Friday, March 18, 2016 4:54 PM, Andreas Kretschmer 
wrote:
 >
 >
 >> Paul Jones wrote on 18 March 2016 at 21:24:
 >>
 >>  
 >> In Postgres 9.5.1 with a shared_buffer cache of 7Gb, a SELECT from 

 >
 > the first query reads from the heap only the tuples that match the
 > where-condition.
 > The 2nd query, with a different where-condition, reads other rows than the
 > first query.
 >
 > Keep in mind: an index scan reads the index and pulls from the heap only
 > the rows that matched the condition, no more.
  
Ok, thanks!  I understand now!
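
To make that concrete for myself, comparing the buffer counts of two
different selects against the same table shows exactly what you describe
(the table and conditions below are made up):

-- First condition pulls one set of rows, and therefore one set of heap pages:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE id BETWEEN 1 AND 10000;

-- A different condition pulls different rows, hence different heap pages,
-- so "shared read" climbs again even though it is the same table:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM big_table WHERE id BETWEEN 5000000 AND 5010000;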
   
 > 
 > Regards
 > --
 > Andreas Kretschmer
 > http://www.2ndQuadrant.com/
 > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-03-19 Thread pbj
 

On Tuesday, March 15, 2016 7:39 PM, "p...@cmicdo.com"  
wrote:
 
 > Your results are close enough to mine, I think, to prove the point.
 > And, I agree that the EDB benchmark is not necessarily reflective of a
 > real-world scenario.
 > 
 > However, the cache I'm referring to is PG's shared_buffer cache.
 > You can see the first run of the select causing a lot of disk reads.
 > The second identical run reads purely from shared_buffers.
 > 
 > What I don't understand is, why does a slightly different select from
 > the *same* table during the same session cause shared_buffers to be
 > blown out and re-read??
 > 
 > I will see if I can try YCSB next week (I'm in workshops all week...)
 > 
 > Thanks!

I was able to try YCSB today on both PG 9.5.1 and Mongo 3.2.  At first, PG
was running 4 times slower than Mongo.  Then I remembered unlogged tables
(which, I think, is effectively how Mongo runs all the time), and remade
the PG table as UNLOGGED.  In a 50/50 read/update test over 1M records,
PG ran in 0.62 of the time of Mongo.
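
As a sanity check that the table really was unlogged before the run
('usertable' is just YCSB's default table name, so adjust as needed):

SELECT relname, relpersistence   -- 'u' means unlogged, 'p' permanent
FROM pg_class
WHERE relname = 'usertable';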

PG Load:

[OVERALL], RunTime(ms), 104507.0
[OVERALL], Throughput(ops/sec), 9568.737022400413
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 293.0
[CLEANUP], MinLatency(us), 293.0
[CLEANUP], MaxLatency(us), 293.0
[CLEANUP], 95thPercentileLatency(us), 293.0
[CLEANUP], 99thPercentileLatency(us), 293.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 101.329235
[INSERT], MinLatency(us), 88.0
[INSERT], MaxLatency(us), 252543.0
[INSERT], 95thPercentileLatency(us), 121.0
[INSERT], 99thPercentileLatency(us), 141.0
[INSERT], Return=OK, 100
 
PG Run:
---
[OVERALL], RunTime(ms), 92763.0
[OVERALL], Throughput(ops/sec), 10780.16019318047
[READ], Operations, 499922.0
[READ], AverageLatency(us), 79.1722428698877
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 19935.0
[READ], 95thPercentileLatency(us), 94.0
[READ], 99thPercentileLatency(us), 112.0
[READ], Return=OK, 499922
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 222.0
[CLEANUP], MinLatency(us), 222.0
[CLEANUP], MaxLatency(us), 222.0
[CLEANUP], 95thPercentileLatency(us), 222.0
[CLEANUP], 99thPercentileLatency(us), 222.0
[UPDATE], Operations, 500078.0
[UPDATE], AverageLatency(us), 98.96430156895525
[UPDATE], MinLatency(us), 83.0
[UPDATE], MaxLatency(us), 26655.0
[UPDATE], 95thPercentileLatency(us), 127.0
[UPDATE], 99thPercentileLatency(us), 158.0
[UPDATE], Return=OK, 500078
 
Mongo Load:
---
[OVERALL], RunTime(ms), 133308.0
[OVERALL], Throughput(ops/sec), 7501.425270801452
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 1822.0
[CLEANUP], MinLatency(us), 1822.0
[CLEANUP], MaxLatency(us), 1822.0
[CLEANUP], 95thPercentileLatency(us), 1822.0
[CLEANUP], 99thPercentileLatency(us), 1822.0
[INSERT], Operations, 100.0
[INSERT], AverageLatency(us), 130.830678
[INSERT], MinLatency(us), 90.0
[INSERT], MaxLatency(us), 7147519.0
[INSERT], 95thPercentileLatency(us), 159.0
[INSERT], 99thPercentileLatency(us), 226.0
[INSERT], Return=OK, 100
 
Mongo Run:
-
[OVERALL], RunTime(ms), 149150.0
[OVERALL], Throughput(ops/sec), 6704.65973851827
[READ], Operations, 500837.0
[READ], AverageLatency(us), 98.13153980237084
[READ], MinLatency(us), 69.0
[READ], MaxLatency(us), 28271.0
[READ], 95thPercentileLatency(us), 166.0
[READ], 99thPercentileLatency(us), 186.0
[READ], Return=OK, 500837
[CLEANUP], Operations, 1.0
[CLEANUP], AverageLatency(us), 2387.0
[CLEANUP], MinLatency(us), 2386.0
[CLEANUP], MaxLatency(us), 2387.0
[CLEANUP], 95thPercentileLatency(us), 2387.0
[CLEANUP], 99thPercentileLatency(us), 2387.0
[UPDATE], Operations, 499163.0
[UPDATE], AverageLatency(us), 195.21505600375028
[UPDATE], MinLatency(us), 118.0
[UPDATE], MaxLatency(us), 4513791.0
[UPDATE], 95thPercentileLatency(us), 211.0
[UPDATE], 99thPercentileLatency(us), 252.0
[UPDATE], Return=OK, 499163


 > 
 > 
 > On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> 
wrote:
 > 
 > 
 > Hi, Paul
 > 
 > I agree with Oleg, EDB benchmarks are strange sometimes. I did the same
 > benchmarks several months ago. I never noticed the cache influence back
 > then, so I tried to reproduce your situation now (on 5*10^6 records,
 > though). I started to play with the OS cache (using
 > `echo 3 > /proc/sys/vm/drop_caches`), and I see a difference in execution
 > time for the two subsequent queries, but the `explain` info is almost
 > identical, e.g. `shared hit & read`:
 > 
 > 


  

Re: [GENERAL] MongoDB 3.2 beating Postgres 9.5.1?

2016-03-15 Thread pbj
Your results are close enough to mine, I think, to prove the point.  And I
agree that the EDB benchmark is not necessarily reflective of a real-world
scenario.

However, the cache I'm referring to is PG's shared_buffer cache.  You can see
the first run of the select causing a lot of disk reads.  The second identical
run reads purely from shared_buffers.

What I don't understand is: why does a slightly different select from the
*same* table during the same session cause shared_buffers to be blown out and
re-read??

I will see if I can try YCSB next week (I'm in workshops all week...)

Thanks!
 

On Monday, March 14, 2016 3:34 AM, Dmitry Dolgov <9erthali...@gmail.com> 
wrote:
 

 Hi, Paul

I agree with Oleg, EDB benchmarks are strange sometimes. I did the same
benchmarks several months ago. I never noticed the cache influence back then,
so I tried to reproduce your situation now (on 5*10^6 records, though). I
started to play with the OS cache (using `echo 3 > /proc/sys/vm/drop_caches`),
and I see a difference in execution time for the two subsequent queries, but
the `explain` info is almost identical, e.g. `shared hit & read`:
```
benchmark=# explain (buffers, analyze, verbose) select data from json_tables
            where data @> '{"name": "AC3 Case Red"}';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 width=1257) (actual time=740.556..215956.655 rows=454546 loops=1)
   Output: data
   Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 2114606
   Heap Blocks: exact=31624 lossy=422922
   Buffers: shared hit=1371 read=41
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..101.49 rows=4999 width=0) (actual time=731.010..731.010 rows=454547 loops=1)
         Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
         Buffers: shared hit=1371 read=1005
 Planning time: 6.352 ms
 Execution time: 216075.830 ms
(11 rows)

benchmark=# explain (buffers, analyze, verbose) select data from json_tables
            where data @> '{"name": "AC3 Case Red"}';
                                  QUERY PLAN
------------------------------------------------------------------------------
 Bitmap Heap Scan on public.json_tables  (cost=102.74..19001.47 rows=4999 width=1257) (actual time=222.476..10692.703 rows=454546 loops=1)
   Output: data
   Recheck Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
   Rows Removed by Index Recheck: 2114606
   Heap Blocks: exact=31624 lossy=422922
   Buffers: shared hit=1371 read=41
   ->  Bitmap Index Scan on json_tables_idx  (cost=0.00..101.49 rows=4999 width=0) (actual time=214.736..214.736 rows=454547 loops=1)
         Index Cond: (json_tables.data @> '{"name": "AC3 Case Red"}'::jsonb)
         Buffers: shared hit=1371 read=1005
 Planning time: 0.089 ms
 Execution time: 10767.739 ms
(11 rows)
```
But I see almost the same execution time from mongodb `explain` (216075ms for 
pg and 177784ms for mongo, which isn't so much I think):
```
DBQuery.shellBatchSize = 100;
db.json_tables.find({"name": "AC3 Case Red"}).explain(true)
{
    "queryPlanner" : {
        "plannerVersion" : 1,
        "namespace" : "benchmark.json_tables",
        "indexFilterSet" : false,
        "parsedQuery" : {
            "name" : {
                "$eq" : "AC3 Case Red"
            }
        },
        "winningPlan" : {
            "stage" : "FETCH",
            "inputStage" : {
                "stage" : "IXSCAN",
                "keyPattern" : {
                    "name" : 1
                },
                "indexName" : "name_1",
                "isMultiKey" : false,
                "isUnique" : false,
                "isSparse" : false,
                "isPartial" : false,
                "indexVersion" : 1,
                "direction" : "forward",
                "indexBounds" : {
                    "name" : [
                        "[\"AC3 Case Red\", \"AC3 Case Red\"]"
                    ]
                }
            }
        },
        "rejectedPlans" : [ ]
    },
    "executionStats" : {
        "executionSuccess" : true,
        "nReturned" : 454546,

Re: [GENERAL] Anyone compare PG 9.5 and MongoDB 3.2?

2016-03-01 Thread pbj
Very helpful!!  Thanks!!
 

On Tuesday, March 1, 2016 9:32 AM, Peter Devoy  wrote:
 

 > MongoDB has released 3.2 with their WiredTiger storage.  Has anyone
> benchmarked 9.5 against it, and for JSONB elements several MB in size?
>
> PJ

Hi Paul

I do not have an answer for you but there is a great talk here in
which someone explains why they moved from a NoSQL stack to Postgres:
https://www.youtube.com/watch?v=UgcC_bY4rPg

If I recall correctly JSON functionality was touched upon but, if you
have not seen it, the whole talk is worth a watch.

Hope this helps in some way.

Kind regards


Peter Devoy




  

Re: [GENERAL] ERROR: could not read block 3 in file "base/12511/12270"

2015-12-23 Thread pbj


 > On 12/23/2015 04:17 PM, Paul Jones wrote:
 > >
 > > I have been having disk errors that have corrupted something in
 > > my postgres database.  Other databases work ok:
 > 
 > This isn't the best characterization...the "postgres" database is not a "system" 
 > database but rather a convenient default user database.  Maybe I'm being 
 > overly picky here but seeing "system" in this context does have a 
 > connotation that we really don't want to impart onto the "postgres" database.
 > 
 > It is named such because the default user is likewise "postgres" and most 
 > utilities when not provided with a database name will use the O/S user's 
 > name which, for administrative tasks, is likely to be "postgres" (you really 
 > shouldn't use root for DB-admin stuff) and thus those commands will be able 
 > to connect without much, if any, additional options supplied.
 > 
 > Its presence, absence, or modification in no way alters the fundamental 
 > operation of PostgreSQL; though its lack may frustrate users acclimated to 
 > using said defaults.
 > 

This was one of the big lessons I learned from this.  All this time I was
under the mistaken impression that it was special.

 > David J.




Re: [GENERAL] ERROR: could not read block 3 in file "base/12511/12270"

2015-12-23 Thread pbj
On Wednesday, December 23, 2015 6:45 PM, Tom Lane  wrote:

 > Paul Jones  writes:
 > > I have been having disk errors that have corrupted something in
 > > my postgres database.  Other databases work ok:
 > 
 > > postgres=# SELECT pg_catalog.pg_is_in_recovery();
 > > ERROR:  could not read block 3 in file "base/12511/12270": read only 4096 
 > > of 8192 bytes
 > 
 > Hm.  Evidently you've got a partially truncated file for some system
 > catalog or index.  It's fairly hard to estimate the consequences of
 > that without knowing which one it is.  Please see if this works:
 > 
 > $ export PGOPTIONS="-c ignore_system_indexes=true"
 > $ psql -U postgres
 > 
 > # show ignore_system_indexes;
 > (should say "on")
 > 
 > # select relname, relkind from pg_class where pg_relation_filenode(oid) = 
 > 12270;

paul@kitanglad:~$ export PGOPTIONS="-c ignore_system_indexes=true"
paul@kitanglad:~$ psql -U postgres
psql (9.4.5)
Type "help" for help.

postgres=# show ignore_system_indexes;
 ignore_system_indexes 
---
 on
(1 row)

postgres=# select relname, relkind from pg_class where 
pg_relation_filenode(oid) = 12270;
  relname  | relkind 
---+-
 pg_proc_oid_index | i
(1 row)

postgres=# reindex index pg_proc_oid_index;
REINDEX
postgres=# \q
paul@kitanglad:~$ unset PGOPTIONS
paul@kitanglad:~$ psql -U postgres
psql (9.4.5)
Type "help" for help.

postgres=# SELECT pg_catalog.pg_is_in_recovery();
 pg_is_in_recovery 
---
 f
(1 row)


So, it was an index and was quickly fixed.

Thanks!

>
 > 
 > If that works, and it tells you filenode 12270 is an index, you're in
 > luck: just REINDEX that index and you're done (at least with this problem,
 > there might be more lurking behind it).  Don't forget to unset PGOPTIONS
 > afterwards.
 > 
 > 
 > > Since this is the "postgres" database, dropping and re-creating it
 > > doesn't seem possible.
 > 
 > Sure it is, as long as you issue the commands from a non-broken database:
 > 
 > # drop database postgres;
 > DROP DATABASE
 > # create database postgres with template template0;
 > CREATE DATABASE
 > 
 > If you don't have any custom objects in the postgres database, this would
 > be by far the easiest way out.

Good to know!  I thought there was something special about "postgres".
I have not modified it from what initdb put there.

 > 
 > regards, tom lane




Re: [GENERAL] Problem building both Python 2 and 3 into Postgres 9.4.4?

2015-07-20 Thread pbj
That worked, thank you.

The Tip in section 43.1 did not explain it in that much detail.  I searched
but could not find it explained anywhere else in the docs.  Your paragraph
would be a nice enhancement to the tip.

PJ


On Sun, 7/19/15, Tom Lane  wrote:

 Subject: Re: [GENERAL] Problem building both Python 2 and 3 into Postgres 
9.4.4?
 To: p...@cmicdo.com
 Cc: pgsql-general@postgresql.org
 Date: Sunday, July 19, 2015, 4:54 PM
 
 p...@cmicdo.com writes:
 > Has anyone successfully built Python 2 and 3 into the same installation
 > of Postgres 9.4.4?
 
 I do not think you can do it like this:
 
 > make distclean
 > ./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python
 > make
 > ./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python3
 > make
 > make install
 
 You need to configure and make one way, then install, then make distclean,
 reconfigure and remake the other way, then make install in only the
 src/pl/plpython directory.  The recipe as you have it will end up with
 only one installed copy of plpython, not two, and I'm not real sure that
 that copy will be correctly built for either case.
 
             regards, tom lane
 
 





[GENERAL] Problem building both Python 2 and 3 into Postgres 9.4.4?

2015-07-19 Thread pbj
Has anyone successfully built Python 2 and 3 into the same installation
of Postgres 9.4.4?  I tried it today on Ubuntu 10.04, Python 2.6.5,
Python 3.1.2 and got an error about undefined symbol: PyString_AsString.
The Python docs say that PyString_* have been renamed to PyBytes_*
and I find references to both under ./src/pl/plpython.  Is this a bug,
something I'm doing wrong, or perhaps my system is too old?

make distclean

./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python

make

./configure --with-openssl --with-libxml --with-libxslt --with-python PYTHON=/usr/bin/python3

make

make install
service postgresql restart

psql -U postgres
postgres=# create database py2;
CREATE DATABASE
postgres=# create database py3;
CREATE DATABASE
postgres=# \c py2
You are now connected to database "py2" as user "postgres".
py2=# create extension plpython2u;
CREATE EXTENSION
py2=# \c py3
You are now connected to database "py3" as user "postgres".
py3=# create extension plpython3u;
ERROR:  could not load library "/usr/local/pgsql/lib/plpython3.so": /usr/local/pgsql/lib/plpython3.so: undefined symbol: PyString_AsString





Re: [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

2014-11-07 Thread pbj

On Thu, Nov 06, 2014 at 02:55:20PM +, Shaun Thomas wrote:
> 
> These updates aren't equivalent. It's very important you know this, because 
> you're also inflating your table with a lot of extra updated rows.
> 
> Take the first UPDATE:
> 
> > UPDATE second SET time1 = orig.time1
> > FROM orig
> > WHERE second.key1 = orig.key1;
> 
> If you wrote this as a SELECT, it would look like this:
> 
> SELECT second.time1, orig.time1
>   FROM second
>   JOIN ORIG ON (second.key1 = orig.key1)
> 
> Since second is a many to one subset of orig, you now have several 
> simultaneous updates. Your second UPDATE:
> 
> > UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
> > WHERE orig.key1 = second.key1 LIMIT 1);
> 
> Is equivalent to this SELECT:
> 
> SELECT second.time1,
>(SELECT orig.time1 FROM orig,second
>  WHERE orig.key1 = second.key1 LIMIT 1)
>   FROM second;
> 
> Meaning you'd only get as many updates as there are rows in second. The 
> difference is your LIMIT 1. However, since you're not using an ORDER BY 
> clause, the actual value you get for time1 will be indeterminate. Something 
> like this would remove the row inflation and fix the random time1 behavior, 
> but I'm not sure it was your intent:
> 
> UPDATE second
>SET time1 = orig.time1
>   FROM (SELECT DISTINCT ON (key1) key1, time1
>   FROM orig
>  ORDER BY key1, time1 DESC) sub
>  WHERE second.key1 = sub.key1;

I see now that I made more than one mistake.

1) I forgot to use INCLUDING INDEXES when creating second.  I would have
   seen the duplicate keys when filling it.

CREATE TABLE second (LIKE orig INCLUDING INDEXES);

2) I should have used something like this to fill second:

INSERT INTO second (key1)
SELECT key1 FROM orig
ORDER BY random()
LIMIT 40;

3) I then incorrectly remembered the query I had written at work.  It
   should have been:

EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);


Once the second table is filled with unique keys, both UPDATEs should
produce the same results, but the UPDATE ... FROM is faster than the
UPDATE = (SELECT ...), as the documentation notes.

My original intent was to find out what the performance differences
between the two are.

Thanks for pointing these things out!
PJ




Re: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

2014-11-03 Thread pbj

  >
 > On Mon, 11/3/14, Igor Neyman  wrote:
 >
 >  -Original Message-
 >  From: pgsql-general-ow...@postgresql.org
 >  [mailto:pgsql-general-ow...@postgresql.org]  
 >  On Behalf Of p...@cmicdo.com
 >  Sent: Monday, November 03, 2014 11:34 AM
 >  To: pgsql-general@postgresql.org
 >  Subject: [SPAM] - [GENERAL] Performance of UPDATE SET = FROM
 >  vs UPDATE SET = (SELECT ...)
 >  
 >  Why does the UPDATE SET = FROM choose a more poorly
 >  performing plan than the UPDATE SET = (SELECT ...)?  It
 >  seems to me that it is the same join. 
 >
 >  I'm using 9.3.5.
 >
 >  CREATE TABLE orig
 >  (
 >      key1    VARCHAR(11) PRIMARY KEY,
 >      time1   TIME
 >  );
 >
 >  INSERT INTO orig (key1, time1)
 >  SELECT
 >      a::TEXT,
 >      (((random()*100)::INT % 24)::TEXT || ':' ||
 >       ((random()*100)::INT % 60)::TEXT)::TIME
 >  FROM generate_series(800, 8000200) a;
 >
 >  CREATE INDEX odx ON orig(key1);
 >
 >  CREATE TABLE second (LIKE orig);
 >
 >  INSERT INTO second (key1)
 >  SELECT (800+(((random()*100)::INT) % 100))::TEXT
 >  FROM generate_series(1,40);
 >
 >  EXPLAIN ANALYZE
 >  UPDATE second SET time1 = orig.time1
 >  FROM orig
 >  WHERE second.key1 = orig.key1;
 >
 [.]
 >
 >  UPDATE second SET time1 = NULL;
 >
 >  EXPLAIN ANALYZE
 >  UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
 >                             WHERE orig.key1 = second.key1 LIMIT 1);
 >
 [.]
 >  
 >  These 2 queries are not the same.
 >
 >  The first query updates rows in the "second" table with the
 >  orig.time1 values based on a key1 column match.
 >  The second query finds the first possible match (based on the key1
 >  column) and assigns the orig.time1 value from that matched row to
 >  every record in the "second" table.
 >
 >  Regards,
 >  Igor Neyman
 
I see that now.  I was trying to reproduce something from work from
memory and got tripped up on a subtlety of UPDATE ... SELECT.  The query
I ran at work was like this:
 
EXPLAIN ANALYZE
UPDATE second se SET time1 = (SELECT time1 FROM orig
WHERE orig.key1 = se.key1);

  QUERY PLAN
--
 Update on second se  (cost=0.00..3390627.00 rows=40 width=18) (actual 
time=18698.795..18698.795 rows=0 loops=1)
   ->  Seq Scan on second se  (cost=0.00..3390627.00 rows=40 width=18) 
(actual time=7.558..16694.600 rows=40 loops=1)
 SubPlan 1
   ->  Index Scan using odx on orig  (cost=0.43..8.45 rows=1 width=8) 
(actual time=0.033..0.035 rows=1 loops=40)
 Index Cond: ((key1)::text = (se.key1)::text)
 Total runtime: 18698.865 ms
(6 rows)

This does correctly match and update all of the second table entries.
The plan actually runs longer than the UPDATE ... FROM, which squares
with a comment in the fine manual.

Thanks!
PJ






[GENERAL] Performance of UPDATE SET = FROM vs UPDATE SET = (SELECT ...)

2014-11-03 Thread pbj
Why does the UPDATE SET = FROM choose a more poorly performing plan than
the UPDATE SET = (SELECT ...)?  It seems to me that it is the same join.
 
I'm using 9.3.5.  

CREATE TABLE orig
(
    key1    VARCHAR(11) PRIMARY KEY,
    time1   TIME
);

INSERT INTO orig (key1, time1)
SELECT 
a::TEXT,
(((random()*100)::INT % 24)::TEXT || ':' ||
((random()*100)::INT % 60)::TEXT)::TIME
FROM generate_series(800, 8000200) a;

CREATE INDEX odx ON orig(key1);

CREATE TABLE second (LIKE orig);
 
INSERT INTO second (key1) 
SELECT (800+(((random()*100)::INT) % 100))::TEXT
FROM generate_series(1,40);

EXPLAIN ANALYZE  
UPDATE second SET time1 = orig.time1
FROM orig
WHERE second.key1 = orig.key1;

 QUERY PLAN 

 Update on second  (cost=69461.02..106082.02 rows=40 width=32) (actual 
time=16033.023..16033.023 rows=0 loops=1)
   ->  Hash Join  (cost=69461.02..106082.02 rows=40 width=32) (actual 
time=7698.445..12992.039 rows=40 loops=1)
 Hash Cond: ((second.key1)::text = (orig.key1)::text)
 ->  Seq Scan on second  (cost=0.00..12627.00 rows=40 width=18) 
(actual time=49.820..791.397 rows=40 loops=1)
 ->  Hash  (cost=31765.01..31765.01 rows=201 width=26) (actual 
time=7648.540..7648.540 rows=201 loops=1)
   Buckets: 4096  Batches: 128  Memory Usage: 717kB
   ->  Seq Scan on orig  (cost=0.00..31765.01 rows=201 
width=26) (actual time=0.014..3655.844 rows=201 loops=1)
 Total runtime: 16033.193 ms
(8 rows)

UPDATE second SET time1 = NULL;

EXPLAIN ANALYZE
UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
WHERE orig.key1 = second.key1 LIMIT 1);


   QUERY PLAN

 Update on second  (cost=3.60..19078.19 rows=1279959 width=18) (actual 
time=4642.453..4642.453 rows=0 loops=1)
   InitPlan 1 (returns $1)
 ->  Limit  (cost=0.43..3.60 rows=1 width=8) (actual time=2.611..2.613 
rows=1 loops=1)
   ->  Nested Loop  (cost=0.43..4056331.83 rows=1279959 width=8) 
(actual time=2.606..2.606 rows=1 loops=1)
 ->  Seq Scan on second second_1  (cost=0.00..19074.59 
rows=1279959 width=12) (actual time=2.487..2.487 rows=1 loops=1)
 ->  Index Scan using odx on orig  (cost=0.43..3.14 rows=1 
width=20) (actual time=0.098..0.098 rows=1 loops=1)
   Index Cond: ((key1)::text = (second_1.key1)::text)
   ->  Seq Scan on second  (cost=0.00..19074.59 rows=1279959 width=18) (actual 
time=6.420..817.739 rows=40 loops=1)
 Total runtime: 4642.561 ms
(9 rows)






Re: [GENERAL] BDR Global Sequences

2014-09-30 Thread pbj
Hi Andres,

 > Hi,
 >
 > On 2014-09-29 13:52:52 -0700, p...@cmicdo.com wrote:
 >> I have a question about BDR Global Sequences.
 >>
[deleted]
 >> Is there way to increase a global sequence's reservation block for each
 >> node so that I can tell the nodes, "I'm going to load 100M rows now so
 >> you should get ready for that."?
 >
 >
 > Not yet, but we're planning to add that.

Good to hear.  In the meantime, is there something I can hack to force
the nodes to make a sequence allocation of my choosing (even if just
hardwired?)  I was playing with start_elections_sql where it says:

generate_series(\n"
current_max,\n"
-- 1000 is the chunk size, -1 is to get < instead <= out of generate_series\n"
current_max + 1000 * (5 - open_seq_chunks) - 1,\n"
1000) chunk_start\n"

and hoping that bumping up the 1000 would help, but apparently not.

PJ
   
 > 
 > Greetings,
 > 
 > Andres Freund
 > 
 > --
 > Andres Freund   http://www.2ndQuadrant.com/
 > PostgreSQL Development, 24x7 Support, Training & Services
 >





[GENERAL] BDR Global Sequences

2014-09-29 Thread pbj
I have a question about BDR Global Sequences.

I've been playing with BDR on PG 9.4beta2, built from source from the
2nd Quadrant GIT page (git://git.postgresql.org/git/2ndquadrant_bdr.git).

When trying a 100 row \copy-in, letting PG choose the global sequence
values, I get "ERROR:  could not find free sequence value for global
sequence public.myseq", as documented...no surprise there.  However, the
number of rows I can load before the error varies wildly with each trial.

Is there way to increase a global sequence's reservation block for each
node so that I can tell the nodes, "I'm going to load 100M rows now so
you should get ready for that."?

PJ


