Re: [GENERAL] Allow disabling folding of unquoted identifiers to lowercase
On 3 May 2016 at 7:01 PM, Evgeny Morozov wrote:
> That's an interesting idea! The client users would use is probably
> pgAdmin. I don't know whether pgAdmin parses the query, though. If it
> does, then it should be relatively easy to add this. If not, I'd
> imagine it's not going to happen.

The pgAdmin query tool doesn't parse the query. It sends it to the
server without parsing it itself.

> On 2 May 2016 at 13:59, John McKown wrote:
>> On Mon, May 2, 2016 at 3:03 AM, Evgeny Morozov
>> <evgeny.morozov+list+pg...@shift-technology.com> wrote:
>>> On 30 April 2016 at 01:31, Tom Lane wrote:
>>>> Yeah, this isn't going to happen. Years ago we looked into what it
>>>> would take to be able to flip a switch and have the
>>>> standard-compliant behavior (fold to upper, not lower). It was
>>>> impractical then and is no doubt far more so now. I do not remember
>>>> all the details, but there were multiple pain points even in terms
>>>> of server-side implementation, never mind all the applications we'd
>>>> break.
>>>
>>> Alright, thanks to everyone for looking into this. Not knowing the
>>> code, I naively assumed it would be easy to add an option to not do
>>> the case folding.
>>>
>>>> What the OP is asking for doesn't even have the argument "but it's
>>>> standards compliant!" going for it.
>>>
>>> Indeed, my argument was that it would allow people to choose their
>>> own naming convention to (easily) use with Postgres, which would in
>>> turn make it easier to migrate from other RDBMSes to Postgres.
>>> Although, if you want a standards-compliance argument, that can
>>> easily be added. :) Just have 3 options: fold to lowercase (current
>>> behaviour, default), fold to uppercase (standards compliant), do not
>>> fold (most flexible, compatible with MSSQL and MySQL).
>>>
>>>> So I doubt we'd accept such a patch even if someone managed to
>>>> create one.
>>>
>>> Well, I was going to ask if paying someone to fix this was an
>>> option, but this preempts that!
>>
>> I have a silly idea. IIRC, your original problem is that your users
>> are in the habit of entering something like:
>>
>>     select ... from SomeTable ...
>>
>> and MySQL would actually use the name "SomeTable" (case preserving),
>> not "sometable" (PostgreSQL) or "SOMETABLE" (SQL standard). What
>> program are the users actually using to do the select? If it is
>> something like "psql", perhaps it would actually be easier to create
>> a modified version which automatically inserts the " marks for them.
>> Of course, you are now doing double parsing of the SQL: first in the
>> client, to modify it before sending it to the server, then again in
>> the server. OK, maybe it is going too far. I guess this might be a
>> "quote everything which is not a keyword" option for a psql
>> replacement, or whatever the front end is that the users use.
>>
>> --
>> The unfacts, did we have them, are too imprecisely few to warrant our
>> certitude.
>>
>> Maranatha! <><
>> John McKown
Re: [GENERAL] Very slow update / hash join
On Fri, May 06, 2016 at 10:25:34AM -0700, Jeff Janes wrote:
> OK, so it sounds like what is happening is that your update cannot do
> a "Heap-Only Tuple" (HOT) update, because there is not enough room in
> each data page for the new copy of rows being updated. So it is
> forced to put the new copy on a different page, which means it has to
> update all the indexes so they know where to find the new version.

That makes total sense now.

> If this is not a one-time event, then one thing you could do is lower
> the table's fillfactor, so that the table is more loosely packed and
> future updates are more likely to be able to do HOT updates. If the
> rows being updated are randomly scattered, it wouldn't take much
> lowering to make this happen (maybe 90). But if the rows being
> updated in a single transaction are co-located with each other, then
> you might have to lower it to below 50 before it would solve the
> problem, which might be a solution worse than the problem. When you
> change the parameter, it won't take full effect until the table has
> been completely rewritten, either due to natural churn, or by running
> a VACUUM FULL or CLUSTER.

I will probably want to run this a few times. The data being updated
comes from an external tool, and once I add new things or fix bugs in
it I would like to update the old rows. It's normally an
insert/select-only table. But there are only about 20M of the 133M
current rows (about 15%) that I'm really interested in. So I guess
something like an 85% fillfactor might actually help.

> If this giant update does not have to occur atomically in order for
> your application to behave correctly, then I would probably break it
> up into a series of smaller transactions. Then you could even run
> them in parallel, which would be a big help if you have a RAID (which
> can efficiently work on multiple random IO read requests in parallel)
> but not help so much if you have a single disk.

I don't care about it being atomic or not. I actually tried to do it
in smaller batches before, and I ended up calculating that it would
take 2 weeks to do the update.

Kurt

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
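[Editorial note: the batching approach Jeff suggests above can be sketched as plain Python that generates one UPDATE per id range. This is a hypothetical illustration — the table name, column, and id-based batching scheme are assumptions, not the actual query from this thread; in practice each statement would be run in its own transaction (e.g. under autocommit) so a failure only loses one batch.]

```python
def batch_ranges(min_id, max_id, batch_size):
    """Split [min_id, max_id] into inclusive batches of at most batch_size ids."""
    ranges = []
    start = min_id
    while start <= max_id:
        end = min(start + batch_size - 1, max_id)
        ranges.append((start, end))
        start = end + 1
    return ranges

def batch_statements(table, set_clause, min_id, max_id, batch_size):
    """One UPDATE per batch; run each in its own transaction so a
    failure rolls back only that batch, not the whole 133M-row update."""
    return [
        "UPDATE %s SET %s WHERE id BETWEEN %d AND %d;"
        % (table, set_clause, lo, hi)
        for lo, hi in batch_ranges(min_id, max_id, batch_size)
    ]

# Hypothetical example: 2500 rows in batches of 1000 gives 3 statements.
stmts = batch_statements("mytable", "col = 'new'", 1, 2500, 1000)
for s in stmts:
    print(s)
```

Batches over a ranged primary-key scan also keep each transaction's writes co-located, which interacts with the fillfactor discussion above: clustered updates need a lower fillfactor for HOT to help.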
Re: [GENERAL] Very slow update / hash join
On Fri, May 6, 2016 at 3:21 AM, Kurt Roeckx wrote:
> On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
>> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
>>> On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx wrote:
>>>> Hi,
>>>>
>>>> I have an update query that's been running for 48 hours now.
>>>> Since it started it has used about 2.5% CPU, and is writing to
>>>> the disk at about 3 MB/s, and reading at about 2 MB/s. It's
>>>> mostly waiting for the disks.
>>>
>>> The easiest way to figure out what is going on is to identify the
>>> process, and then trace it with something like:
>>>
>>>     strace -T -ttt -y -p <pid>
>>>
>>> That should make it obvious which file it is waiting for IO on.
>>> Then you can look up that relfilenode in pg_class to see what
>>> table/index it is.
>>
>> Thanks for the hint; I didn't think of that.
>>
>> So it's busy reading all the index files, including the primary
>> key, and only writing to the table I'm updating.

It is probably dirtying the indexes as well, but the writes are being
buffered in shared_buffers. The dirty blocks will eventually get
written by the background writer or the checkpoint writer (which you
won't see if you aren't tracing them).

>>> What version of PostgreSQL are you using? Have you tried dropping
>>> the foreign keys?
>>
>> I'm using 9.5.2.
>>
>> So I think the foreign keys are unrelated now. They all obviously
>> point to the primary key that's not changing, and it's reading all
>> the indexes on the table itself, not those on the other tables.
>>
>> It's kind of annoying that I would need to drop the indexes that
>> aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key. It was still
> as slow when it started, but then I forced the primary key into the
> filesystem cache and it seems to be much happier now, on average
> reading at about 10 MB/s, writing at 30 MB/s.

OK, so it sounds like what is happening is that your update cannot do
a "Heap-Only Tuple" (HOT) update, because there is not enough room in
each data page for the new copy of rows being updated. So it is
forced to put the new copy on a different page, which means it has to
update all the indexes so they know where to find the new version.

If this is not a one-time event, then one thing you could do is lower
the table's fillfactor, so that the table is more loosely packed and
future updates are more likely to be able to do HOT updates. If the
rows being updated are randomly scattered, it wouldn't take much
lowering to make this happen (maybe 90). But if the rows being
updated in a single transaction are co-located with each other, then
you might have to lower it to below 50 before it would solve the
problem, which might be a solution worse than the problem. When you
change the parameter, it won't take full effect until the table has
been completely rewritten, either due to natural churn, or by running
a VACUUM FULL or CLUSTER.

If lowering the fillfactor isn't a good solution, then pre-warming
your indexes (using pg_prewarm, or just doing it yourself from the
filesystem) might be your best bet. If not all of the relevant
indexes can fit in cache at the same time, then you might still have
to drop some of them, or just be patient.

If this giant update does not have to occur atomically in order for
your application to behave correctly, then I would probably break it
up into a series of smaller transactions. Then you could even run
them in parallel, which would be a big help if you have a RAID (which
can efficiently work on multiple random IO read requests in parallel)
but not help so much if you have a single disk.

Cheers,

Jeff
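[Editorial note: a back-of-the-envelope check of the fillfactor numbers above. This is a sketch under assumed row sizes (not measurements from this thread), and it ignores per-tuple and per-page header overhead, so the counts are upper bounds: with PostgreSQL's default 8 kB pages, fillfactor F reserves (100 - F)% of each heap page for new row versions that HOT updates can use.]

```python
PAGE_SIZE = 8192  # PostgreSQL default block size in bytes

def free_bytes_per_page(fillfactor):
    """Bytes left unfilled on each heap page at the given fillfactor (0-100)."""
    return PAGE_SIZE * (100 - fillfactor) // 100

def hot_slack_rows(row_bytes, fillfactor):
    """How many new row versions of the given size fit in the reserved
    space; ignores tuple/page header overhead, so it's an upper bound."""
    return free_bytes_per_page(fillfactor) // row_bytes

# Assumed 100-byte rows: fillfactor 90 leaves room for about 8 new
# versions per page (enough when updated rows are scattered), while
# fillfactor 50 leaves room for about 40 (needed when they cluster).
print(hot_slack_rows(100, 90), hot_slack_rows(100, 50))
```

This is why scattered updates tolerate a high fillfactor: with roughly one updated row per page, even a little slack per page suffices for HOT.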
Re: [GENERAL] Very slow update / hash join
On Fri, May 06, 2016 at 09:13:09AM -0500, Steven Lembark wrote:
>>> It's kind of annoying that I would need to drop the indexes that
>>> aren't modified just to run an update query.
>>
>> I dropped all the indexes except for the primary key. It was still
>> as slow when it started, but then I forced the primary key into the
>> filesystem cache and it seems to be much happier now, on average
>> reading at about 10 MB/s, writing at 30 MB/s.

It finished in 2 hours.

> Look at the PG tuning docs. It seems as if you have too little index
> cache assigned for the size of your database (and its indexes).
> Dropping indexes isn't the real answer; tuning the database to
> accommodate them is a better bet.

As far as I know, the only way to improve this is to make
shared_buffers larger. But the database is multiple times the total
RAM. Even the table itself is, and all the combined indexes are too.

And I do agree that dropping the indexes isn't a good idea, but it
seems to be the only way to get it done in a reasonable time.

> It would also be worth checking whether the I/O was entirely due to
> sequential reads or may have been swapping. procinfo, vmstat, or
> just top can tell you about that.

There was no swapping. The point is that it wasn't doing sequential
reads but was randomly accessing indexes for what looks like no good
reason to me.

Kurt
Re: [GENERAL] Very slow update / hash join
>> It's kind of annoying that I would need to drop the indexes that
>> aren't modified just to run an update query.
>
> I dropped all the indexes except for the primary key. It was still
> as slow when it started, but then I forced the primary key into the
> filesystem cache and it seems to be much happier now, on average
> reading at about 10 MB/s, writing at 30 MB/s.

Look at the PG tuning docs. It seems as if you have too little index
cache assigned for the size of your database (and its indexes).
Dropping indexes isn't the real answer; tuning the database to
accommodate them is a better bet.

It would also be worth checking whether the I/O was entirely due to
sequential reads or may have been swapping. procinfo, vmstat, or just
top can tell you about that.

--
Steven Lembark                                      3646 Flora Pl
Workhorse Computing                            St Louis, MO 63110
lemb...@wrkhors.com                                +1 888 359 3508
Re: [GENERAL] Debian and Postgres
On 05/05/2016 07:29 PM, rob stone wrote:
> Hello Adrian,
>
> On Thu, 2016-05-05 at 13:47 -0700, Adrian Klaver wrote:
>> Exactly. Showing the list the error you get when you cannot connect
>> may help with solving that problem and save you a great deal of
>> time. What have you got to lose?
>
> I have nothing to "lose". There is NO error, per se. The progress
> bars just keep churning and absolutely nothing happens. All you can
> do is cancel. Nothing in the log files. No exceptions thrown. A
> black hole.

That would be the Postgres logs or something else?

> I've waited minutes to see if it can connect, but no. I tried
> running on the 9.4 cluster but the same thing.

So there is more than one cluster on the machine? How were they
installed?

Leaving aside JDBC/the application, can you connect to each cluster
using psql? If you use psql with the parameters that fail for the
application, can it connect?

> I appreciate all your suggestions. Tomorrow I'll talk to a friend of
> mine who is a Java guru to see if it is possible to force some kind
> of stack trace or something that will provide a clue as to what is
> happening. If you can't find the driver you'd expect a DriverManager
> exception to be thrown, or maybe a ClassNotFound. Not even this.
> It's the lack of any error message anywhere that is frustrating.
>
> Cheers,
> Rob

--
Adrian Klaver
adrian.kla...@aklaver.com
Re: [GENERAL] Function PostgreSQL 9.2
drum.lu...@gmail.com wrote:
> It's working now... Final code:
>
> ALTER TABLE public.companies ADD COLUMN client_code_increment integer;
> ALTER TABLE public.companies ALTER COLUMN client_code_increment SET NOT NULL;
> ALTER TABLE public.companies ALTER COLUMN client_code_increment SET DEFAULT 1000;
> COMMIT TRANSACTION;
>
> BEGIN;
> -- Creating the function
> CREATE OR REPLACE FUNCTION users_code_seq()
> RETURNS "trigger" AS $$
> DECLARE code character varying;
> BEGIN
>     -- if it's an insert, then we update the client_code_increment
>     -- column value to +1
>     IF (TG_OP = 'INSERT') THEN
>     ...

Think about it: will (TG_OP = 'INSERT') ever be untrue?
Re: [GENERAL] Very slow update / hash join
On Fri, May 06, 2016 at 11:38:27AM +0200, Kurt Roeckx wrote:
> On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
>> On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx wrote:
>>> Hi,
>>>
>>> I have an update query that's been running for 48 hours now.
>>> Since it started it has used about 2.5% CPU, and is writing to the
>>> disk at about 3 MB/s, and reading at about 2 MB/s. It's mostly
>>> waiting for the disks.
>>
>> The easiest way to figure out what is going on is to identify the
>> process, and then trace it with something like:
>>
>>     strace -T -ttt -y -p <pid>
>>
>> That should make it obvious which file it is waiting for IO on.
>> Then you can look up that relfilenode in pg_class to see what
>> table/index it is.
>
> Thanks for the hint; I didn't think of that.
>
> So it's busy reading all the index files, including the primary key,
> and only writing to the table I'm updating.
>
>> What version of PostgreSQL are you using? Have you tried dropping
>> the foreign keys?
>
> I'm using 9.5.2.
>
> So I think the foreign keys are unrelated now. They all obviously
> point to the primary key that's not changing, and it's reading all
> the indexes on the table itself, not those on the other tables.
>
> It's kind of annoying that I would need to drop the indexes that
> aren't modified just to run an update query.

I dropped all the indexes except for the primary key. It was still as
slow when it started, but then I forced the primary key into the
filesystem cache and it seems to be much happier now, on average
reading at about 10 MB/s, writing at 30 MB/s.

Kurt
Re: [GENERAL] Very slow update / hash join
On Thu, May 05, 2016 at 09:32:28PM -0700, Jeff Janes wrote:
> On Wed, May 4, 2016 at 3:22 PM, Kurt Roeckx wrote:
>> Hi,
>>
>> I have an update query that's been running for 48 hours now.
>> Since it started it has used about 2.5% CPU, and is writing to the
>> disk at about 3 MB/s, and reading at about 2 MB/s. It's mostly
>> waiting for the disks.
>
> The easiest way to figure out what is going on is to identify the
> process, and then trace it with something like:
>
>     strace -T -ttt -y -p <pid>
>
> That should make it obvious which file it is waiting for IO on. Then
> you can look up that relfilenode in pg_class to see what table/index
> it is.

Thanks for the hint; I didn't think of that.

So it's busy reading all the index files, including the primary key,
and only writing to the table I'm updating.

> What version of PostgreSQL are you using? Have you tried dropping
> the foreign keys?

I'm using 9.5.2.

So I think the foreign keys are unrelated now. They all obviously
point to the primary key that's not changing, and it's reading all
the indexes on the table itself, not those on the other tables.

It's kind of annoying that I would need to drop the indexes that
aren't modified just to run an update query.

Kurt
Re: [GENERAL] xml-file as foreign table?
On Thu, May 5, 2016 at 2:13 PM, Johann Spies wrote:
> Thank you, Arjen.
>
> On 29 April 2016 at 07:01, Arjen Nienhuis wrote:
>>> The options I am considering are:
>>>
>>> 1. Unpack the individual records (there will be more than 50
>>> million) using something like python with lxml and psycopg2, and
>>> insert them after dropping all indexes and triggers on the table.
>>>
>>> 2. Unpack the individual records and write a (very) large
>>> tsv-file, and then insert it using 'copy'.
>>
>> The fastest way I found is to combine these two: using iterparse
>> from lxml combined with load_rows and COPY from py-postgresql:
>>
>> http://python.projects.pgfoundry.org/docs/1.1/driver.html#copy-statements
>>
>> That way you can stream the data.
>
> I did not know about py-postgresql, as I am a Python 2.7 user.
>
> I am excited by the possibilities python-postgresql is offering.
> After a bit of experimenting to use the streaming-copy option, I
> landed in the same problem that I had while using python2:
> PostgreSQL rejecting the xml part of the record.
>
> How do you handle the conversion from string to bytes and back, and
> the presence of quotes within the xml?
>
> I have tried this to experiment with just 10 records for a start:
>
>     ut = element.xpath('.//t:UID/text()', namespaces=namespaces)[0]
>     x = etree.tostring(element)
>     y = x.decode(encoding='UTF-8').replace("'", "''")
>     s = '\t'.join([gzipf, filename, ut, y])
>     t = s.encode(encoding='UTF-8')
>     rows.append(t)
>     count += 1
>     element.clear()
>     gc.collect()
>     if count == 10:
>         import pdb; pdb.set_trace()
>         xmlin.load_rows(rows)
>         rows = []
>         f.close()
>         exit(0)
>
> Which ends up with an error:
>
>     postgresql.exceptions.XMLContentError: invalid XML content
>     CODE: 2200N
>     LOCATION: File 'xml.c', line 1551, in xml_ereport from SERVER
>     CONTEXT: COPY annual, line 1, column xml: " xmlns="http://scientific.thomsonreuters.com/schema/wok5.4/public/FullRecord; r_id_disclaimer="R..."
>     DETAIL: line 1: Premature end of data in tag REC line 1

If I need to guess, it's because of \n in the xml. I used
"WITH (FORMAT csv)" and quoted the fields:

    def q(v):
        return b'"' + v.replace(b'"', b'""') + b'"'

    return b','.join(q(f) for f in fields) + b'\n'

In the end I also had some other problems with the XML (namespaces),
so I used:

    etree.tostring(element, method='c14n', exclusive=True)

> With python2.7 I tried to use the same technique using subprocess
> and a call to psql to pipe the data to Postgresql -- ending with the
> same error.

Maybe you can show a few lines of the output.

> Thanks again for the reference to python-postgresql.
>
> Have a good day.
>
> Johann

Regards,
Arjen
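[Editorial note: Arjen's quoting fragment can be made self-contained. A sketch — the file name and record id fields are made-up examples — showing why CSV quoting fixes the "Premature end of data" error: doubling embedded quotes and wrapping each field keeps an embedded newline inside one logical field, so COPY ... WITH (FORMAT csv) still sees one row.]

```python
def q(v):
    # CSV-quote a bytes field: double embedded quotes and wrap the field,
    # so embedded newlines and commas stay inside one logical field.
    return b'"' + v.replace(b'"', b'""') + b'"'

def csv_row(fields):
    # One COPY-ready CSV line built from a list of bytes fields.
    return b','.join(q(f) for f in fields) + b'\n'

# XML containing a newline and a double quote survives as a single row.
xml = b'<REC attr="x">\nline2</REC>'
row = csv_row([b'records.gz', b'rec1', xml])
print(row)
```

With tab-separated text format (as in Johann's attempt), that embedded `\n` would instead terminate the row early, which matches the "Premature end of data in tag REC" error above.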