Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-09 Thread 'George (Cloud Platform Support)' via Google App Engine
You assume that we have more access rights and are somehow able to see more 
logs or other relevant information. This is not the case: our 
investigation is based on the same logging that is accessible to you. This 
also applies to the VM being shut down on Monday 26th Feb at ~06:12 due to 
maintenance. 

When you deploy to the flexible environment, you define an image and 
configure it, before installing the app, so you control all deployment 
steps, in practice. 

-- 
You received this message because you are subscribed to the Google Groups 
"Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to google-appengine+unsubscr...@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/google-appengine/95a76d5b-73e6-4913-9175-54e77a98633a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-09 Thread Karl Tinawi
Hi George,

No. I don't agree that this is a valid conclusion without any investigation.

It's not the app that does not start properly, it's the VM that does not 
boot (!) Unless I'm missing something here?

The very first log entry in StackDriver as stated above is:

A  The instance aef-default-29--1-k42h with Debian based image was created for 
version 29-1 
A  == Unexpected error during VM startup == 


At this point no app code has been executed. The first line, "The instance 
... was created for version ...", is what is logged when one clicks 'Start' 
on a stopped GAE instance. The VM dies immediately afterwards; this is *not* a 
code issue.

Not only was this behaviour intermittent, but changing to the 
"new health checks" *without* changing any code resolved the issue 
instantly. That requires further explanation, please. I don't accept that you 
continue to blame the code.

Please may I request once again that you look at the aforementioned issue 
of the VM being shut down on Monday 26th Feb at ~06:12 due to maintenance 
without being live migrated first? We need to understand if this is 
"normal" behaviour or not.



On Friday, March 9, 2018 at 2:42:54 PM UTC, George (Cloud Platform Support) 
wrote:
>
> From the logging information provided, one is forced to conclude that your 
> app does not start properly, which leads to failed health checks. It 
> is advisable to thoroughly review your code and identify the original 
> cause of the start-up failures. 
>
> As previously mentioned, for coding-related issues, stackoverflow or 
> similar forums offer the important advantage of getting you replies from 
> expert programmers. 
>



Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-09 Thread 'George (Cloud Platform Support)' via Google App Engine
From the logging information provided, one is forced to conclude that your 
app does not start properly, which leads to failed health checks. It 
is advisable to thoroughly review your code and identify the original 
cause of the start-up failures. 

As previously mentioned, for coding-related issues, stackoverflow or 
similar forums offer the important advantage of getting you replies from 
expert programmers. 



Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-09 Thread Karl Tinawi
I must add that after using the updated / revised healthcheck configuration 
in our yaml files we have not experienced the same issues in deployment.

This tells me that behaviour has changed outside of our application with 
undesired consequences.
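
For reference, here is a rough arithmetic sketch in Python of what the liveness_check and readiness_check values quoted elsewhere in this thread imply. It assumes that checks are evenly spaced at check_interval_sec and that failure_threshold consecutive failures mark the instance as failing; the Check class and seconds_to_fail helper are hypothetical names used only for illustration, not part of any Google tooling.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """Subset of the health check settings quoted in this thread."""
    check_interval_sec: int
    timeout_sec: int
    failure_threshold: int
    success_threshold: int


def seconds_to_fail(check):
    """Approximate time needed for failure_threshold consecutive failed
    checks, assuming one check every check_interval_sec (an assumption)."""
    return check.failure_threshold * check.check_interval_sec


# Values copied from the liveness_check / readiness_check blocks in the thread.
liveness = Check(check_interval_sec=5, timeout_sec=5,
                 failure_threshold=3, success_threshold=1)
readiness = Check(check_interval_sec=30, timeout_sec=5,
                  failure_threshold=3, success_threshold=1)

print(seconds_to_fail(liveness))   # 15
print(seconds_to_fail(readiness))  # 90
```

Under these assumptions the liveness check could flag a failure after roughly 15 seconds of consecutive failures once checking begins, and the readiness check after roughly 90 seconds. Neither figure explains a VM that fails before the container starts.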

Thanks,

Karl



Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-09 Thread Karl Tinawi
Hi George,

Yes that's correct - I mentioned both issues as they were (in my view) 
related somewhat.

The relevant snippet in the log file pertaining to the timed-out deployment 
is below (following successful build):

2018-02-27 23:12:20,489 DEBUG root Received operation: [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542]
2018-02-27 23:12:20,489 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:20,490 INFO ___FILE_ONLY___ Updating service [default] (this may take several minutes)...
2018-02-27 23:12:21,669 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:27,827 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:33,253 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:39,161 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:44,540 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:50,597 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:12:56,205 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:02,243 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:07,426 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:12,910 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:18,292 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:24,153 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:29,641 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:35,097 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:41,127 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:47,087 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:52,316 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:13:58,013 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:03,792 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:09,905 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:15,471 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:21,112 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:27,082 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:32,376 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:38,369 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:14:44,091 DEBUG root Operation [apps/onkho-web-app-live/operations/550b30eb-403c-411a-999b-fa1599657542] not complete. Waiting to retry.
2018-02-27 23:
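
A polling log like the one above can be summarised with a few lines of Python. This is only a sketch: it assumes each entry sits on a single line of the form "date time,ms LEVEL logger message" (the archive wraps and compresses the entries), and both the polling_summary helper and the project and operation names in the sample are hypothetical.

```python
import re
from datetime import datetime

# Assumed single-line entry shape: "<date> <time>,<ms> <LEVEL> <logger> <message>".
ENTRY = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) ")
RETRY_TEXT = "not complete. Waiting to retry."


def polling_summary(lines):
    """Return (retry_count, seconds between the first and last retry entry)."""
    stamps = []
    for line in lines:
        match = ENTRY.match(line)
        if match and RETRY_TEXT in line:
            moment = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            # Add the millisecond part that follows the comma.
            stamps.append(moment.replace(microsecond=int(match.group(2)) * 1000))
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()


# Hypothetical project and operation names, timestamps in the log's format.
sample = [
    "2018-02-27 23:12:20,489 DEBUG root Operation "
    "[apps/example-project/operations/example-id] not complete. Waiting to retry.",
    "2018-02-27 23:12:20,490 INFO ___FILE_ONLY___ Updating service [default]",
    "2018-02-27 23:12:27,827 DEBUG root Operation "
    "[apps/example-project/operations/example-id] not complete. Waiting to retry.",
]
print(polling_summary(sample))
```

Run against the full debug log, the two numbers show how many times the deploy command polled and how long it waited before giving up.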

Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-03 Thread 'George (Cloud Platform Support)' via Google App Engine
Hi Karl, 

In your initial post on this thread, you wrote that you "experienced delays 
in deploying our PHP flexible app whereby "gcloud deploy" times-out waiting 
for the new instance to start". A proper start-up process is essential to 
instance management. You also mention making use of PHP extensions that are 
not on the list of extensions implemented for App Engine. What is the 
output of the gcloud app deploy command with the --verbosity debug 
parameter? This is significant when the deployment actually fails, as more 
insight may be gained into the reasons for the failure. 

For coding-related issues, stackoverflow or similar forums offer the 
important advantage of getting you replies from expert programmers. 



Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-03-02 Thread Karl Tinawi
Hi George,

On a successful boot of the VM we see the following:

A  --- 
A  The instance aef-default-30-sp8t with Debian based image was created for version 30 
A  Feb 28 23:07:05 aef-default-30-sp8t kernel: imklog 5.8.11, log source = /proc/kmsg started. 
A  Feb 28 23:07:05 aef-default-30-sp8t rsyslogd: [origin software="rsyslogd" swVersion="5.8.11" x-pid="2965" x-info="http://www.rsyslog.com"] start 
A  Feb 28 23:07:05 aef-default-30-sp8t kernel: [0.00] Initializing cgroup subsys cpuset 
A  Feb 28 23:07:05 aef-default-30-sp8t kernel: [0.00] Initializing cgroup subsys cpu 
A  Feb 28 23:07:05 aef-default-30-sp8t kernel: [0.00] Initializing cgroup subsys cpuacct
...





Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-02-28 Thread Karl Tinawi
Hi George,

I was celebrating prematurely :( We're still facing the issue it seems. I'm 
trying to deploy the same health check change to our production project.

Please could you take a look at this log output? :

I  App Engine CreateVersion default:30-1 k...@onkho.com 
{"@type":"type.googleapis.com/google.cloud.audit.AuditLog","status":{},"authenticationInfo":{"principalEmail":"k...@onkho.com"},"requestMetadata":{"callerIp":"217.45.176.239","callerSuppliedUserAgent":"google-cloud-sdk
 x_Tw5K8nnjoRAqULM9PFAC2b gcloud/190.0.1 command/gcloud.app.deploy 
invocation-id/7… App Engine CreateVersion default:30-1 k...@onkho.com 
A  The instance aef-default-30--1-m6mp with Debian based image was created for 
version 30-1 
A  == Unexpected error during VM startup == 
A   Dump of VM runtime system logs follows  
A  == Output of 'docker ps -a' == 
A  CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES 
A  = rebooting.  
A  VM shutdown initiated--- 
A  Current app health: 0 
A  Beginning service lameduck. 
A  Triggering app shutdown handlers. 
A  Sending SIGUSR1 to fluentd to trigger a log flush. 
A  
{"textPayload":"","insertId":"5kdt4ig2zxfdhb","resource":{"type":"gae_app","labels":{"project_id":"onkho-web-app-live","version_id":"30-1","module_id":"default"}},"timestamp":"2018-02-28T22:36:35.183183909Z","labels":{"compute.googleapis.com/resource_type":"instance","compute.googleapis.com/resource…
 
A  --- 
A  --App was unhealthy, grabbing debug logs--- 
A  Container CID file not found. 
A  --- 
A  -Tail of app logs-- 
A  App logs not found. 
A  --- 
A  -Additional debug info-- 
A  cat: /home/vmagent/onkho-web-app-live_30-1-407983590104907782.env: No such file or directory 
A  Instance has machine type of custom-4-8192 has 4 vCPU and 8192 Mb memory. App can use  Mb memory. 
A  VM memory consumption: 
A                      total   used   free   shared   buffers   cached 
A  Mem:                 8005    276   7728        0        14      117 
A  -/+ buffers/cache:           144   7861 
A  Swap:                   0      0      0 
A  VM disk usage: 
A  Filesystem     Type     1K-blocks    Used Available Use% Mounted on 
A  rootfs         rootfs    10188088 4544604   5102916  48% / 
A  udev           devtmpfs     10240       0     10240   0% /dev 
A  /dev/sda1      ext4      10188088 4544604   5102916  48% / 
A  /dev/sda1      ext4      10188088 4544604   5102916  48% /var/lib/docker/aufs 
A  Processes running on the VM: 
A  top - 22:36:35 up 0 min,  0 users,  load average: 0.48, 0.13, 0.04 
A  Tasks:  95 total,   1 running,  94 sleeping,   0 stopped,   0 zombie 
A  %Cpu(s):  2.1 us,  3.1 sy,  0.0 ni, 85.6 id,  9.1 wa,  0.0 hi,  0.0 si,  0.0 st 
A  KiB Mem:   8197596 total,   287356 used,  7910240 free,    15352 buffers 
A  KiB Swap:        0 total,        0 used,        0 free,   120448 cached 
A  
{"textPayload":"","insertId":"5kdt4ig2zxfdi3","resource":{"type":"gae_app","labels":{"module_id":"default","project_id":"onkho-web-app-live","version_id":"30-1"}},"timestamp":"2018-02-28T22:36:35.183183937Z","labels":{"compute.googleapis.com/resource_type":"instance","compute.googleapis.com/resource…
 
A    PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND 
A      1 root      20   0 10664 1644 1508 S   0.0  0.0   0:00.82 init 
A      2 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kthreadd 
A      3 root      20   0     0    0    0 S   0.0  0.0   0:00.04 ksoftirqd/0 
A      4 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kworker/0:0 
A      5 root       0 -20     0    0    0 S   0.0  0.0   0:00.00 kworker/0:0H 
A      6 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kworker/u8:0 
A      7 root      20   0     0    0    0 S   0.0  0.0   0:00.09 rcu_sched 
A      8 root      20   0     0    0    0 S   0.0  0.0   0:00.00 rcu_bh 
A      9 root      rt   0     0    0    0 S   0.0  0.0   0:00.00 migration/0 
A     10 root      rt   0     0    0    0 S   0.0  0.0   0:00.00 watchdog/0 
A     11 root      rt   0     0    0    0 S   0.0  0.0   0:00.00 watchdog/1 
A     12 root      rt   0     0    0    0 S   0.0  0.0   0:00.00 migration/1 
A     13 root      20   0     0    0    0 S   0.0  0.0   0:00.03 ksoftirqd/1 
A     14 root      20   0     0    0    0 S   0.0  0.0   0:00.00 kworker/1:0 
A     15 root       0 -20     0    0    0 S   0.0  0.0   0:00.00 kworker/1:0H 
A     16 root      rt   0     0    0    0 S   0.0  0.0   0:00.00 watchdog/2 
A     17 root      rt   0     0    0    0 S   0.0  0.0   0

Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-02-28 Thread Karl Tinawi
Hi George,

It's definitely not the code or our app's startup process as far as I'm 
aware because this issue is before our docker image is even pulled or any 
code of ours executed. The logs should show that clearly.
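
One way to check this claim against a log dump of the kind posted in this thread is sketched below in Python. The marker strings are copied from that dump; treating them as proof that no app container ever existed is an assumption of the sketch, and failed_before_app_start is a hypothetical helper rather than any Google tooling.

```python
# Marker strings copied from the log dump in this thread. Treating them as
# evidence is an assumption of this sketch, not documented behaviour.
STARTUP_ERROR = "Unexpected error during VM startup"
NO_CONTAINER_MARKERS = (
    "Container CID file not found.",
    "App logs not found.",
)


def failed_before_app_start(log_lines):
    """Return True if the dump reports a VM startup error and also shows
    that no app container or app log existed on the instance."""
    text = "\n".join(log_lines)
    if STARTUP_ERROR not in text:
        return False
    return all(marker in text for marker in NO_CONTAINER_MARKERS)


sample = [
    "A  == Unexpected error during VM startup ==",
    "A  Container CID file not found.",
    "A  App logs not found.",
]
print(failed_before_app_start(sample))
```

A True result is consistent with the failure happening before any application code ran, although it does not identify the cause.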

The change I applied to the health checks, as per your suggestion, 
did not involve any code change whatsoever, and yet the VM starts normally 
now. The only difference is that I now see the new health check requests 
in the logs during the startup process.

I'll continue testing and reply with any additional findings.

I'd be grateful if you could shed some more light on the incident in 
question as I still don't have an answer as to what happened.




Re: [google-appengine] Re: Flexible environment "Unexpected error during VM startup" in Production following "periodic restart of VM version."

2018-02-28 Thread 'George Suceveanu' via Google App Engine
Hi Karl,

If your app starts up without issues, the health checks would not matter.
There are issues with your app's start up process, that you need to
address. For coding-related issues, you are at an advantage posting to
stackoverflow or similar forums, where you can get expert help from active
programmers.

On 28 February 2018 at 16:23, Karl Tinawi wrote:

> Hi George,
>
> I was able to do a test deployment tonight by defining the new health
> checks as you recommended.
>
> Before I continue - my test for this is based on performing a deployment, as
> we see the exact same behaviour there, with the VM starting up and crashing,
> as we did with the incident on Monday.
>
> The good news is this has seemingly completely resolved our deployment
> issue - in that they are once again successful in a reasonable amount of
> time, rather than timing out and failing because of the aforementioned. So
> at this point I'm semi confident that it has also resolved the issue we
> experienced on Monday when the VM was restarted and couldn't start up
> again. Difficult to prove this one currently from my side.
>
> I replaced the legacy health_check block in our .yaml file with the
> following:-
>
> liveness_check:
>   path: "/_ah/health"
>   initial_delay_sec: 300
>   check_interval_sec: 5
>   timeout_sec: 5
>   failure_threshold: 3
>   success_threshold: 1
>
> readiness_check:
>   path: "/login"
>   app_start_timeout_sec: 300
>   check_interval_sec: 30
>   timeout_sec: 5
>   failure_threshold: 3
>   success_threshold: 1
>
>
> The most obvious question I have at this point is: why? Why would this
> resolve the issue? I can only guess that this could be related to the
> new style health/liveness checks being enabled by default but we had not
> executed:
>
> gcloud beta app update --split-health-checks --project [YOUR_PROJECT_ID]
>
>
> or provided the liveness_check/readiness_check blocks in our yaml file?
> I've only just learnt about these new updated health checks, as
> it's not something we keep up to date with once we have a desired
> configuration, so I am concerned that there was a backwards compatibility
> issue here.
>
> I'm performing a couple more deployments to satisfy myself that this is
> not a fluke.
>
> As a side question I see these entries in our logs now since activating
> the new health checks:
>
>
> 
>
>
> 
>
> These don't seem to be obeying the configuration I had defined (as per
> above code snippets). Most notably the path and interval?
>
> I'd like to learn if I'm doing anything wrong here or if there is an
> explanation.
>
> Many thanks again and looking forward to hearing from you.
>
> Karl
>
>
>
> On Wednesday, February 28, 2018 at 8:12:09 PM UTC, Karl Tinawi wrote:
>>
>> Hi George,
>>
>> Yes that's correct - it's happened once outside of deployments.
>>
>> To answer your questions sir:
>>
>> - We require a custom PHP installation in order to make use of
>>   modules that are missing from Google's offering. I've not checked the
>>   latest list of extensions, but it may be that we are able to move back to
>>   using the standard PHP image, so I'll check this for sure.
>> - Scaling is another challenge that we're looking at, and we're certainly
>>   aware that we need to move to auto scaling for contingency etc...
>> - I'll test configuring the readiness check and report back if we
>>   notice any difference in behaviour.
>>
>> Were the logs helpful? I'd be grateful if you could shed some light on
>> the investigation at your end. This is the first time we've noticed an issue
>> such as this during the maintenance process, which should be innocuous and
>> invisible to us.
>>
>> At this point I'm unsure if the issues we face during deployments, which
>> continue to occur daily, are related to the incident that happened with our
>> running app. It's worth noting that the behaviour of the VM is identical
>> (in the way of the abrupt restarts as it's trying to boot). I may look at
>> trying a test deployment using another image and seeing if that helps.
>>
>>
>> Many thanks again,
>>
>> Karl
>>
>>
>> On Wednesday, February 28, 2018 at 12:52:06 AM UTC, George (Cloud
>> Platform Support) wrote:
>>>
>>> Hello Karl,
>>>
>>> You seem to indicate that the outage is a one-time event, and that there
>>> is no other similar occurrence as yet. If this is so, to prevent similar
>>> unwanted events in future, you may configure your app for health checks, in
>>> detail. For reference, the "Configuring your App with app.yaml" should
>>> prove of great help. In your app.yaml, you