Re: [GENERAL] Allow disabling folding of unquoted identifiers to lowercase

2016-05-06 Thread Guillaume Lelarge
On 3 May 2016, 7:01 PM, "Evgeny Morozov"
wrote:
>
> That's an interesting idea! The client users would use is probably
pgAdmin. I don't know whether pgAdmin parses the query, though. If it does
then it should be relatively easy to add this. If not, I'd imagine it's not
going to happen.
>

The pgAdmin query tool doesn't parse the query. It sends it to the server
as-is, without any parsing of its own.

> On 2 May 2016 at 13:59, John McKown  wrote:
>>
>> On Mon, May 2, 2016 at 3:03 AM, Evgeny Morozov <
evgeny.morozov+list+pg...@shift-technology.com> wrote:
>>>
>>> On 30 April 2016 at 01:31, Tom Lane  wrote:

 Yeah, this isn't going to happen.  Years ago we looked into what it
would
 take to be able to flip a switch and have the standard-compliant
behavior
 (fold to upper not lower).  It was impractical then and no doubt is far
 more so now.  I do not remember all the details, but there were
multiple
 pain points even in terms of server-side implementation, never mind all
 the applications we'd break.
>>>
>>>
>>> Alright, thanks to everyone for looking into this. Not knowing the
code, I naively assumed it should be easy to add an option to not do the
case folding.
>>>


 What the OP is asking for doesn't even have the argument "but it's
 standards compliant!" going for it.

>>> Indeed, my argument was it would allow people to choose their own
naming convention to (easily) use with Postgres, which would in turn make
it easier to migrate from other RDBMSes to Postgres. Although, if you want
a standard compliance argument, that can easily be added. :) Just have 3
options: fold to lowercase (current behaviour, default), fold to uppercase
(standards compliant), do not fold (most flexible, compatible with MSSQL
and MySQL).
>>>
>>> > So I doubt we'd accept such a patch even if someone managed to create
one.
>>>
>>> Well, I was going to ask if paying someone to fix this was an option,
but this preempts that!
>>
>>
>> I have a silly idea. IIRC, your original problem is that your users are
in the habit of entering something like:
>>
>> select ... from SomeTable ...
>>
>> And MySQL would actually use the name "SomeTable" (case preserving) and
not "sometable" (PostgreSQL) or "SOMETABLE" (SQL standard). What program
are the users actually using to do the select? If it is something like
"psql", perhaps it would actually be easier to create a modified version
which automatically inserts the " marks for them. Of course, you are now
doing double parsing of the SQL. First in the client, to modify it before
sending to the server. Then again in the server. OK, maybe it is going too
far. I guess this might be a "quote everything which is not a keyword"
option for a psql replacement. Or whatever the front end is that the users
use.
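>>
>> For illustration, the folding behaviour in question (a minimal sketch;
>> the table name is just an example):
>>
>>     CREATE TABLE SomeTable (id int);   -- unquoted, so stored as "sometable"
>>     SELECT * FROM SOMETABLE;           -- works: also folded to "sometable"
>>     SELECT * FROM "SomeTable";         -- ERROR: relation "SomeTable" does not exist
>>     SELECT * FROM "sometable";         -- works: matches the stored name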
>>
>>
>> --
>> The unfacts, did we have them, are too imprecisely few to warrant our
certitude.
>>
>> Maranatha! <><
>> John McKown
>
>


Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Kurt Roeckx
On Fri, May 06, 2016 at 10:25:34AM -0700, Jeff Janes wrote:
> 
> OK, so it sounds like what is happening is that your update cannot do
> a "Heap-Only Tuple" (HOT) update, because there is not enough room in
> each data page for the new copy of rows being updated.  So it is
> forced to put the new copy on a different page, which means it has to
> update all the indexes so they know where to find the new version.

That makes total sense now.

> If this is not a one-time event, then one thing you could do is lower the
> table's fillfactor, so that the table is more loosely packed and
> future updates are more likely to be able to do HOT updates.  If the
> rows being updated are randomly scattered, it wouldn't take much
> lowering to make this happen (maybe 90).  But if the rows being
> updated in a single transaction are co-located with each other, then
> you might have to lower it to below 50 before it would solve the
> problem, which might be a solution worse than the problem.  When you
> change the parameter, it won't take full effect until the table has
> been completely rewritten, either due to natural churn, or running a
> VACUUM FULL or CLUSTER.

I will probably want to run this a few times.  The data being
updated comes from an external tool and once I add new things or
fix a bug in it I would like to update the old rows.  It's normally an
insert/select only table.

But there are only about 20M of the 133M current rows (about 15%)
that I'm really interested in.  So I guess something like an 85%
fillfactor might actually help.
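
Something like this, presumably (the table name is just a placeholder,
and as you say the new setting only takes full effect once the table is
rewritten):

    ALTER TABLE my_table SET (fillfactor = 85);
    -- force a rewrite so existing pages get the extra free space:
    VACUUM FULL my_table;   -- or: CLUSTER my_table USING my_table_pkey;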

> If this giant update does not have to occur atomically in order for
> your application to behave correctly, then I would probably break it
> up into a series of smaller transactions.  Then you could even run
> them in parallel, which would be a big help if you have a RAID (which
> can efficiently work on multiple random IO read requests in parallel)
> but not help so much if you have a single disk.

I don't care about it being atomic or not.  I actually tried to do
it in smaller batches before and I ended up calculating that it
would take 2 weeks to do the update.



Kurt





Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Jeff Janes
On Fri, May 6, 2016 at 3:21 AM, Kurt Roeckx  wrote:
> On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
>> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
>> > On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx  wrote:
>> > > Hi,
>> > >
>> > > I have an update query that's been running for 48 hours now.
>> > > Since it started it used about 2.5% CPU, and is writing to the
>> > > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
>> > > waiting for the disks.
>> >
>> > The easiest way to figure out what is going on is to identify the
>> > process, and then trace it with something like:
>> >
>> > strace -T -ttt -y -p 
>> >
>> > That should make it obvious which file it is waiting for IO on.  Then
>> > you can look up that relfilenode in pg_class to see what table/index
>> > it is.
>>
>> Thanks for the hint, I hadn't thought of that.
>>
>> So it's busy reading all the index files including the primary
>> key, and only writing to the table I'm updating.

It is probably dirtying the indexes as well, but the writes are being
buffered in shared_buffers.  The dirty blocks will eventually get
written by the background writer or the checkpoint writer (which you
won't see if you aren't tracing them).


>> > What version of PostgreSQL are you using?  Have you tried dropping the
>> > foreign keys?
>>
>> I'm using 9.5.2.
>>
>> So I think the foreign keys are unrelated now.  They all obviously
>> point to the primary key that's not changing, and it's reading all
>> the indexes on the table itself, not those on the other tables.
>>
>> It's kind of annoying that I would need to drop the indexes that
>> aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key.  It was still
> as slow when it started, but then I forced the primary key into
> the filesystem cache and it seems to be much happier now, average
> reading at about 10 MB/s, writing at 30 MB/s.


OK, so it sounds like what is happening is that your update cannot do
a "Heap-Only Tuple" (HOT) update, because there is not enough room in
each data page for the new copy of rows being updated.  So it is
forced to put the new copy on a different page, which means it has to
update all the indexes so they know where to find the new version.

If this is not a one-time event, then one thing you could do is lower the
table's fillfactor, so that the table is more loosely packed and
future updates are more likely to be able to do HOT updates.  If the
rows being updated are randomly scattered, it wouldn't take much
lowering to make this happen (maybe 90).  But if the rows being
updated in a single transaction are co-located with each other, then
you might have to lower it to below 50 before it would solve the
problem, which might be a solution worse than the problem.  When you
change the parameter, it won't take full effect until the table has
been completely rewritten, either due to natural churn, or running a
VACUUM FULL or CLUSTER.

If lowering the fillfactor isn't a good solution, then pre-warming
your indexes (using pg_prewarm or just doing it yourself from the
filesystem) might be your best bet.  If not all of the relevant
indexes can fit in cache at the same time, then you might still have
to drop some of them, or just be patient.
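
For example, something along these lines (a sketch; the index name is
hypothetical):

    CREATE EXTENSION IF NOT EXISTS pg_prewarm;
    SELECT pg_prewarm('my_table_pkey');              -- load the PK index into shared_buffers
    SELECT pg_prewarm('my_table_pkey', 'prefetch');  -- or just hint it into the OS cache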

If this giant update does not have to occur atomically in order for
your application to behave correctly, then I would probably break it
up into a series of smaller transactions.  Then you could even run
them in parallel, which would be a big help if you have a RAID (which
can efficiently work on multiple random IO read requests in parallel)
but not help so much if you have a single disk.
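
A sketch of what that might look like (table, column, and key names are
made up; run in autocommit mode, each statement is its own transaction):

    UPDATE my_table SET some_col = 'new value' WHERE id BETWEEN 1 AND 1000000;
    UPDATE my_table SET some_col = 'new value' WHERE id BETWEEN 1000001 AND 2000000;
    -- ... and so on, possibly from several sessions in parallel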

Cheers,

Jeff




Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Kurt Roeckx
On Fri, May 06, 2016 at 09:13:09AM -0500, Steven Lembark wrote:
> 
> > > It's kind of annoying that I would need to drop the indexes that
> > > aren't modified just to run an update query.  
> > 
> > I dropped all the indexes except for the primary key.  It was still
> > as slow when it started, but then I forced the primary key into
> > the filesystem cache and it seems to be much happier now, average
> > reading at about 10 MB/s, writing at 30 MB/s.

It finished in 2 hours.

> Look at the PG tuning docs.
> It seems as if you have too little index cache assigned for the
> size of your database (and its indexes). Dropping indexes isn't
> the real answer; tuning the database to accommodate them is a
> better bet.

As far as I know, the only way to improve this is to make the
shared_buffers larger.  But the database is several times the size of
the total RAM; even the table itself is, and so are all its indexes
combined.

And I do agree that dropping the indexes isn't a good idea, but it
seems to be the only way to get it done in a reasonable time.

> It would also be worth checking whether the I/O was entirely due
> to sequential reads or whether some of it was swapping. procinfo,
> vmstat, or just top can tell you about that.

There was no swapping.

The point is that it wasn't doing sequential reads but was
randomly accessing indexes for what looks like no good reason to
me.


Kurt





Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Steven Lembark

> > It's kind of annoying that I would need to drop the indexes that
> > aren't modified just to run an update query.  
> 
> I dropped all the indexes except for the primary key.  It was still
> as slow when it started, but then I forced the primary key into
> the filesystem cache and it seems to be much happier now, average
> reading at about 10 MB/s, writing at 30 MB/s.

Look at the PG tuning docs.
It seems as if you have too little index cache assigned for the
size of your database (and its indexes). Dropping indexes isn't
the real answer; tuning the database to accommodate them is a
better bet.

It would also be worth checking whether the I/O was entirely due
to sequential reads or whether some of it was swapping. procinfo,
vmstat, or just top can tell you about that.

-- 
Steven Lembark 3646 Flora Pl
Workhorse Computing   St Louis, MO 63110
lemb...@wrkhors.com  +1 888 359 3508




Re: [GENERAL] Debian and Postgres

2016-05-06 Thread Adrian Klaver

On 05/05/2016 07:29 PM, rob stone wrote:

Hello Adrian,

On Thu, 2016-05-05 at 13:47 -0700, Adrian Klaver wrote:


Exactly. Showing the list the error you get when you cannot connect may
help with solving that problem and save you a great deal of time. What
have you got to lose?



I have nothing to "lose".
There is NO error, per se. The progress bars just keep churning and
absolutely nothing happens. All you can do is cancel.
Nothing in the log files. No exceptions thrown. A black hole. I've


That would be the Postgres logs or something else?


waited minutes to see if it can connect, but no. I tried running on the
9.4 cluster, but got the same thing.


So there is more than one cluster on the machine?

How were they installed?

Leaving aside JDBC/the application, can you connect to each using psql?

If you use psql with the parameters that fail for the application, can it
connect?




I appreciate all your suggestions.
Tomorrow I'll talk to a friend of mine who is a Java guru to see if it
is possible to force some kind of stack trace or something that will
provide a clue as to what is happening. If it can't find the driver
you'd expect a DriverManager exception to be thrown, or maybe a
ClassNotFoundException. Not even that. It's the lack of any error message
anywhere that is frustrating.

Cheers,
Rob




--
Adrian Klaver
adrian.kla...@aklaver.com




Re: [GENERAL] Function PostgreSQL 9.2

2016-05-06 Thread Berend Tober

drum.lu...@gmail.com wrote:

It's working now...

Final code:

ALTER TABLE public.companies ADD COLUMN client_code_increment integer;
ALTER TABLE public.companies ALTER COLUMN client_code_increment SET NOT NULL;
ALTER TABLE public.companies ALTER COLUMN client_code_increment SET DEFAULT 1000;
COMMIT TRANSACTION;

BEGIN;
-- Creating the function
CREATE OR REPLACE FUNCTION users_code_seq()
RETURNS "trigger" AS $$
DECLARE code character varying;
BEGIN
-- if it's an insert, then we update the client_code_increment column value to +1
IF (TG_OP = 'INSERT') THEN  ...



Think about it: will (TG_OP = 'INSERT') ever be untrue?
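
If the trigger is defined only for inserts, for instance something like
this (trigger and table names are made up):

    CREATE TRIGGER users_code_seq_trg
        BEFORE INSERT ON public.users
        FOR EACH ROW EXECUTE PROCEDURE users_code_seq();

then TG_OP can never be anything but 'INSERT' inside that function.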








Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Kurt Roeckx
On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> > On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx  wrote:
> > > Hi,
> > >
> > > I have an update query that's been running for 48 hours now.
> > > Since it started it used about 2.5% CPU, and is writing to the
> > > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
> > > waiting for the disks.
> > 
> > The easiest way to figure out what is going on is to identify the
> > process, and then trace it with something like:
> > 
> > strace -T -ttt -y -p 
> > 
> > That should make it obvious which file it is waiting for IO on.  Then
> > you can look up that relfilenode in pg_class to see what table/index
> > it is.
> 
> Thanks for the hint, I hadn't thought of that.
> 
> So it's busy reading all the index files including the primary
> key, and only writing to the table I'm updating.
> 
> > What version of PostgreSQL are you using?  Have you tried dropping the
> > foreign keys?
> 
> I'm using 9.5.2.
> 
> So I think the foreign keys are unrelated now.  They all obviously
> point to the primary key that's not changing, and it's reading all
> the indexes on the table itself, not those on the other tables.
> 
> It's kind of annoying that I would need to drop the indexes that
> aren't modified just to run an update query.

I dropped all the indexes except for the primary key.  It was still
as slow when it started, but then I forced the primary key into
the filesystem cache and it seems to be much happier now, average
reading at about 10 MB/s, writing at 30 MB/s.


Kurt





Re: [GENERAL] Very slow update / hash join

2016-05-06 Thread Kurt Roeckx
On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx  wrote:
> > Hi,
> >
> > I have an update query that's been running for 48 hours now.
> > Since it started it used about 2.5% CPU, and is writing to the
> > disk at about 3 MB/s, and reading at about 2 MB/s.  It's mostly
> > waiting for the disks.
> 
> The easiest way to figure out what is going on is to identify the
> process, and then trace it with something like:
> 
> strace -T -ttt -y -p 
> 
> That should make it obvious which file it is waiting for IO on.  Then
> you can look up that relfilenode in pg_class to see what table/index
> it is.

Thanks for the hint, I hadn't thought of that.

So it's busy reading all the index files including the primary
key, and only writing to the table I'm updating.
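
For reference, mapping the file name from strace back to a relation can
be done like this (the filenode number here is made up):

    SELECT pg_filenode_relation(0, 16443);   -- 0 = the database's default tablespace
    -- or directly:
    SELECT relname, relkind FROM pg_class WHERE relfilenode = 16443;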

> What version of PostgreSQL are you using?  Have you tried dropping the
> foreign keys?

I'm using 9.5.2.

So I think the foreign keys are unrelated now.  They all obviously
point to the primary key that's not changing, and it's reading all
the indexes on the table itself, not those on the other tables.

It's kind of annoying that I would need to drop the indexes that
aren't modified just to run an update query.


Kurt





Re: [GENERAL] xml-file as foreign table?

2016-05-06 Thread Arjen Nienhuis
On Thu, May 5, 2016 at 2:13 PM, Johann Spies  wrote:
> Thank you, Arjen,
>
> On 29 April 2016 at 07:01, Arjen Nienhuis  wrote:
>
>>
>> > The options I am considering is :
>> >
>> > 1. Unpack the individual records (will be more than 50 million) using
>> > something like python with lxml and psycopg2 and insert them after dropping
>> > all indexes and triggers on the table
>> >
>> > 2. Unpack the individual records and write a (very) large tsv-file and
>> > then insert it using 'copy'
>> >
>>
>> The fastest way I found is to combine these two. Using iterparse from lxml
>> combined with load_rows and COPY from py-postgresql:
>>
>> http://python.projects.pgfoundry.org/docs/1.1/driver.html#copy-statements
>>
>> That way you can stream the data.
>
>
> I did not know about py-postgresql as I am a Python-2.7 user.
>
> I am excited about the possibilities python-postgresql is offering. After a
> bit of experimenting with the streaming-copy option I ran into the
> same problem that I had while using python2: PostgreSQL rejecting the
> xml-part of the record.
>
> How do you handle the conversion from string to bytes and back and the
> presence of quotes within the xml?
>
> I have tried this to experiment with just 10 records for a start:
>
> ut = element.xpath('.//t:UID/text()', namespaces=namespaces)[0]
> x = etree.tostring(element)
> y = x.decode(encoding='UTF-8').replace("'", "''")
> s = '\t'.join([gzipf, filename, ut, y])
> t = s.encode(encoding='UTF-8')
> rows.append(t)
> count += 1
> element.clear()
> gc.collect()
> if count == 10:
>     import pdb; pdb.set_trace()
>     xmlin.load_rows(rows)
>     rows = []
>     f.close()
>     exit(0)
>
> Which ends up with an error:
>
> postgresql.exceptions.XMLContentError: invalid XML content
>   CODE: 2200N
>   LOCATION: File 'xml.c', line 1551, in xml_ereport from SERVER
  CONTEXT: COPY annual, line 1, column xml: "<REC xmlns="http://scientific.thomsonreuters.com/schema/wok5.4/public/FullRecord"
r_id_disclaimer="R..."
>   DETAIL: line 1: Premature end of data in tag REC line 1

If I had to guess, it's because of \n in the xml. I used "WITH (FORMAT
csv)" and quoted the fields:

def q(v):
    return b'"' + v.replace(b'"', b'""') + b'"'

# ...and, inside the function that builds one row from the field values:
return b','.join(q(f) for f in fields) + b'\n'
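
The COPY itself then looks something like this (a sketch; "annual" is the
table from your error message, the column names other than "xml" are
guesses):

    COPY annual (gzip_file, file_name, uid, xml) FROM STDIN WITH (FORMAT csv);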

In the end I also had some other problems with the XML (namespaces), so I used:

etree.tostring(element, method='c14n', exclusive=True)

>
> With python2.7 I tried to use the same technique using subprocess and a call
> to psql to pipe the data to PostgreSQL -- ending with the same error.
>

Maybe you can show a few lines of the output.

> Thanks again for the pointer to python-postgresql.
>
> Have a nice day.
>
> Johann
>

Regards, Arjen

