Re: dsa_allocate() faliure

2019-02-18 Thread Jakub Glapa
Hi, I just checked dmesg. The segfault I wrote about is the only one I see, dated Nov 24 last year. Since then no other segfaults have happened, although dsa_allocate failures happen daily. I'll report if anything occurs. I have the core dumping setup in place. -- regards, pozdrawiam, Jakub Glapa

Re: dsa_allocate() faliure

2019-02-17 Thread Justin Pryzby
Hi, On Mon, Nov 26, 2018 at 09:52:07AM -0600, Justin Pryzby wrote: > Hi, thanks for following through. > > On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote: > > I had a look at dmesg and indeed I see something like: > > > > postgres[30667]: segfault at 0 ip 557834264b16 sp 7ff

Re: dsa_allocate() faliure

2019-02-04 Thread Justin Pryzby
On Mon, Feb 04, 2019 at 08:31:47PM +, Arne Roland wrote: > I could take a backup and restore the relevant tables on a throwaway system. > You are just suggesting to replace line 728 > elog(FATAL, > "dsa_allocate could not find %zu free > pages", npages); > by
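
Since the message text is fixed by the format string quoted above ("dsa_allocate could not find %zu free pages"), occurrences can be tallied from captured log or client output. The following is only a minimal sketch: the helper name `count_dsa_failures` is hypothetical, and the sample lines are error strings quoted elsewhere in this thread, not a real server log.

```python
import re
from collections import Counter

# Matches the error text produced by the elog() format string quoted above.
DSA_ERROR_RE = re.compile(r"dsa_allocate could not find (\d+) free pages")


def count_dsa_failures(lines):
    """Count dsa_allocate failures, keyed by the number of pages requested."""
    counts = Counter()
    for line in lines:
        match = DSA_ERROR_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Sample input assembled from messages quoted in this thread (illustrative only).
sample = [
    "psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free pages",
    "CONTEXT: parallel worker",
    "REMOTE FATAL: dsa_allocate could not find 7 free pages",
    "dsa_allocate could not find 13 free pages",
]

counts = count_dsa_failures(sample)
for npages, hits in sorted(counts.items()):
    print(f"{npages} pages: {hits} occurrence(s)")
```

Grouping by the requested page count is a design choice made here for illustration; grouping by date or by query would need the log's own timestamp and statement fields, which are not shown in the thread.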

RE: dsa_allocate() faliure

2019-02-04 Thread Arne Roland
It's definitely quite a complex pattern. The query I sent you last time was minimal with respect to predicates (so removing any single one of the predicates converted that one into a working query). > Huh. Ok well that's a lot more frequent than I thought. Is it always the > same q

Re: dsa_allocate() faliure

2019-02-04 Thread Thomas Munro
On Mon, Feb 4, 2019 at 6:52 PM Jakub Glapa wrote: > I see the error showing up every night on 2 different servers. But it's a bit > of a heisenbug because if I go there now it won't be reproducible. Huh. Ok well that's a lot more frequent than I thought. Is it always the same query? Any chanc

Re: dsa_allocate() faliure

2019-02-03 Thread Jakub Glapa
Hi Thomas, I was one of the reporters in early December last year. I somehow dropped the ball and forgot about the issue. Anyhow, I upgraded the clusters to pg11.1 and nothing changed. I also have a rule to coredump but a segfault does not happen while this is occurring. I see the error showing up eve

Re: dsa_allocate() faliure

2019-02-01 Thread Justin Pryzby
On Thu, Jan 31, 2019 at 06:19:54PM +, Arne Roland wrote: > this is reproducible, while it's highly sensitive to the change of plans > (i.e. the precise queries that do break change with every new analyze). > Disabling parallel query seems to solve the problem (as expected). > At some point eve

Re: dsa_allocate() faliure

2019-01-30 Thread Fabio Isabettini
Hi Thomas, it is a production system and we don’t have permanent access to it. Also, having auto_explain always on is not an option in production. I will ask the customer to notify us as soon as the error presents itself, so that we can connect immediately and try to get a query plan. Regards, Fabio

Re: dsa_allocate() faliure

2019-01-29 Thread Thomas Munro
On Tue, Jan 29, 2019 at 10:32 PM Fabio Isabettini wrote: > we are facing a similar issue on a Production system using a Postgresql 10.6: > > org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: > EXCEPTION on getstatistics_media ; ID: uidatareader. > run_query_media(2): [a1

Re: dsa_allocate() faliure

2019-01-29 Thread Fabio Isabettini
Hello, we are facing a similar issue on a Production system using a Postgresql 10.6: org.postgresql.util.PSQLException: ERROR: EXCEPTION on getstatistics ; ID: EXCEPTION on getstatistics_media ; ID: uidatareader. run_query_media(2): [a1] REMOTE FATAL: dsa_allocate could not find 7 free pages T

Re: dsa_allocate() faliure

2019-01-28 Thread Thomas Munro
On Tue, Jan 29, 2019 at 2:50 AM Arne Roland wrote: > does anybody have any idea what goes wrong here? Is there some additional > information that could be helpful? Hi Arne, This seems to be a bug; that error should not be reached. I wonder if it is a different manifestation of the bug reported

RE: dsa_allocate() faliure

2019-01-28 Thread Arne Roland
Hello, does anybody have any idea what goes wrong here? Is there some additional information that could be helpful? All the best Arne Roland

Re: dsa_allocate() faliure

2018-11-26 Thread Alvaro Herrera
On 2018-Nov-26, Jakub Glapa wrote: > Justin thanks for the information! > I'm running Ubuntu 16.04. > I'll try to prepare for the next crash. > Couldn't find anything this time. As I recall, the apport stuff in Ubuntu is terrible ... I've seen it take 40 minutes to write the crash dump to disk,

Re: dsa_allocate() faliure

2018-11-26 Thread Jakub Glapa
Justin thanks for the information! I'm running Ubuntu 16.04. I'll try to prepare for the next crash. Couldn't find anything this time. -- regards, Jakub Glapa On Mon, Nov 26, 2018 at 4:52 PM Justin Pryzby wrote: > Hi, thanks for following through. > > On Mon, Nov 26, 2018 at 04:38:35PM +0100,

Re: dsa_allocate() faliure

2018-11-26 Thread Justin Pryzby
Hi, thanks for following through. On Mon, Nov 26, 2018 at 04:38:35PM +0100, Jakub Glapa wrote: > I had a look at dmesg and indeed I see something like: > > postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030 > error 4 in postgres[557833db7000+6d5000] That's useful, I think "at
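
The dmesg line quoted above can be decoded mechanically. The sketch below assumes the usual x86 Linux format for user-space segfault reports (hexadecimal fields; error-code bit 0 = page was present, bit 1 = write access, bit 2 = user mode); the function name `decode_segfault` is hypothetical. A fault address of 0 with error 4 reads as a user-mode read through a null pointer.

```python
import re

# x86 Linux user-space segfault report, as quoted from dmesg in this thread.
# All numeric fields except the PID are hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    err = int(m["err"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    return {
        "command": m["comm"],
        "pid": int(m["pid"]),
        "fault_addr": int(m["addr"], 16),
        # Instruction pointer relative to where the object was mapped.
        "offset_in_object": ip - base,
        "page_present": bool(err & 1),  # 0: page not mapped at all
        "write": bool(err & 2),         # 0: the faulting access was a read
        "user_mode": bool(err & 4),     # 1: fault raised from user space
    }


line = ("postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030 "
        "error 4 in postgres[557833db7000+6d5000]")
info = decode_segfault(line)
print(hex(info["offset_in_object"]))
```

For a position-independent executable, the resulting offset is the kind of value one would hand to a symbolizer such as `addr2line -e` against a postgres binary with debug symbols. Whether that applies to this particular build is an assumption.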

Re: dsa_allocate() faliure

2018-11-26 Thread Jakub Glapa
Sorry, the message was sent out too early. So, the issue occurs only on the production db and right now I cannot reproduce it. I had a look at dmesg and indeed I see something like: postgres[30667]: segfault at 0 ip 557834264b16 sp 7ffc2ce1e030 error 4 in postgres[557833db7000+6d5000] and AFAI

Re: dsa_allocate() faliure

2018-11-26 Thread Jakub Glapa
So, the issue occurs only on the production db and right now I cannot reproduce it. I had a look at dmesg and indeed I see something like: -- regards, pozdrawiam, Jakub Glapa On Fri, Nov 23, 2018 at 5:10 PM Justin Pryzby wrote: > On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote: > > Hi

Re: dsa_allocate() faliure

2018-11-23 Thread Justin Pryzby
On Fri, Nov 23, 2018 at 03:31:41PM +0100, Jakub Glapa wrote: > Hi Justin, I've upgraded to 10.6 but the error still shows up: > > If I set it to max_parallel_workers=0 I also get and my connection is being > closed (but the server is alive): > > psql db@host as user => set max_parallel_workers=0;

Re: dsa_allocate() faliure

2018-11-23 Thread Jakub Glapa
Hi Justin, I've upgraded to 10.6 but the error still shows up: psql db@host as user => select version(); version ──

Re: dsa_allocate() faliure

2018-11-22 Thread Justin Pryzby
On Wed, Nov 21, 2018 at 03:26:42PM +0100, Jakub Glapa wrote: > Looks like my email didn't match the right thread: > https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com > Any chance to get some feedback on this? In the related thread

Re: dsa_allocate() faliure

2018-11-21 Thread Jakub Glapa
Looks like my email didn't match the right thread: https://www.postgresql.org/message-id/flat/CAMAYy4%2Bw3NTBM5JLWFi8twhWK4%3Dk_5L4nV5%2BbYDSPu8r4b97Zg%40mail.gmail.com Any chance to get some feedback on this? -- regards, Jakub Glapa On Tue, Nov 13, 2018 at 2:08 PM Jakub Glapa wrote: > Hi, I'm

Re: dsa_allocate() faliure

2018-11-13 Thread Jakub Glapa
Hi, I'm also experiencing the problem: dsa_allocate could not find 7 free pages CONTEXT: parallel worker I'm running: PostgreSQL 10.5 (Ubuntu 10.5-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit query plan: (select statement over pare

Re: dsa_allocate() faliure

2018-10-04 Thread Thomas Munro
On Wed, Aug 29, 2018 at 5:48 PM Sand Stone wrote: > I attached a query (and its query plan) that caused the crash: "dsa_allocate > could not find 13 free pages" on one of the worker nodes. I anonymised the > query text a bit. Interestingly, this time only one (same one) of the nodes > is crash

Re: dsa_allocate() faliure

2018-08-28 Thread Sand Stone
I attached a query (and its query plan) that caused the crash: "dsa_allocate could not find 13 free pages" on one of the worker nodes. I anonymised the query text a bit. Interestingly, this time only one (same one) of the nodes is crashing. Since this is a production environment, I cannot get the

Re: dsa_allocate() faliure

2018-08-25 Thread Sand Stone
>Can you still see the problem with Citus 7.4? Hi, Thomas. I actually went back to the cluster with Citus 7.4 and PG 10.4, and modified the parallel param. So far, I haven't seen any server crash. The main difference between the runs with crashes and those without is the set of Linux TCP time out parameters

Re: dsa_allocate() faliure

2018-08-15 Thread Thomas Munro
On Thu, Aug 16, 2018 at 8:32 AM, Sand Stone wrote: > Just as a follow up. I tried the parallel execution again (in a stress > test environment). Now the crash seems gone. I will keep an eye on > this for the next few weeks. Thanks for the report. That's great news, but it'd be good to understand

Re: dsa_allocate() faliure

2018-08-15 Thread Sand Stone
Just as a follow up. I tried the parallel execution again (in a stress test environment). Now the crash seems gone. I will keep an eye on this for the next few weeks. My theory is that the Citus cluster created and shut down a lot of TCP connections between coordinator and workers. If running on u

Re: dsa_allocate() faliure

2018-05-23 Thread Sand Stone
>> At which commit ID? 83fcc615020647268bb129cbf86f7661feee6412 (5/6) >>do you mean that these were separate PostgreSQL clusters, and they were all >>running the same query and they all crashed like this? A few worker nodes, a table is hash partitioned by "aTable.did" by Citus, and further partit

Re: dsa_allocate() faliure

2018-05-22 Thread Thomas Munro
On Wed, May 23, 2018 at 4:10 PM, Sand Stone wrote: >>>dsa_allocate could not find 7 free pages > I just got this error message again on all of my worker nodes (I am using > Citus 7.4 rel). The PG core is my own build of release_10_stable > (10.4) out of GitHub on Ubuntu. At which commit ID? All of y

Re: dsa_allocate() faliure

2018-05-22 Thread Sand Stone
>>dsa_allocate could not find 7 free pages I just got this error message again on all of my worker nodes (I am using Citus 7.4 rel). The PG core is my own build of release_10_stable (10.4) out of GitHub on Ubuntu. What's the best way to debug this? I am running pre-production tests for the next few da

Re: dsa_allocate() faliure

2018-01-29 Thread Rick Otten
If I do a "set max_parallel_workers_per_gather=0;" before I run the query in that session, it runs just fine. If I set it to 2, the query dies with the dsa_allocate error. I'll use that as a work around until 10.2 comes out. Thanks! I have something that will help. On Mon, Jan 29, 2018 at 3:52

Re: dsa_allocate() faliure

2018-01-29 Thread Thomas Munro
On Tue, Jan 30, 2018 at 5:37 AM, Tom Lane wrote: > Rick Otten writes: >> I'm wondering if there is anything I can tune in my PG 10.1 database to >> avoid these errors: > >> $ psql -f failing_query.sql >> psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free pages >> CONTEXT: par

Re: dsa_allocate() faliure

2018-01-29 Thread Tom Lane
Rick Otten writes: > I'm wondering if there is anything I can tune in my PG 10.1 database to > avoid these errors: > $ psql -f failing_query.sql > psql:failing_query.sql:46: ERROR: dsa_allocate could not find 7 free pages > CONTEXT: parallel worker Hmm. There's only one place in the source c