Re: Hawkular metrics returns Forbidden

2017-09-28 Thread Andrew Lau
Did you find any solution for this?
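For anyone who finds this thread later: a quick way to check whether Hawkular
itself is rejecting the request (rather than the route or the console) is to
hit the API directly. This is only a sketch; the hostname matches the
inventory below and the project name is a placeholder:

  # no auth needed, just confirms the service is up
  curl -k https://hawkular-metrics.example.com.br/hawkular/metrics/status

  # authenticated query for one project (tenant)
  curl -k -H "Authorization: Bearer $(oc whoami -t)" \
       -H "Hawkular-Tenant: my-project" \
       https://hawkular-metrics.example.com.br/hawkular/metrics/metrics

If the second call returns 403, the problem is usually the token or the
Hawkular-Tenant header rather than the route or certificates.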

On Fri, 15 Sep 2017 at 01:34 Mateus Caruccio 
wrote:

> Yep, there it is:
>
> [OSEv3:children]
> masters
> etcd
> nodes
>
> [OSEv3:vars]
> deployment_type=origin
> openshift_release=v3.6
> debug_level=1
> openshift_debug_level=1
> openshift_node_debug_level=1
> openshift_master_debug_level=1
> openshift_master_access_token_max_seconds=2419200
> osm_cluster_network_cidr=172.16.0.0/16
> openshift_registry_selector="docker-registry=true"
> openshift_hosted_registry_replicas=1
>
> openshift_master_cluster_hostname=api-cluster.example.com.br
> openshift_master_cluster_public_hostname=api-cluster.example.com.br
> osm_default_subdomain=example.com.br
> openshift_master_default_subdomain=example.com.br
> osm_default_node_selector="role=app"
> os_sdn_network_plugin_name=redhat/openshift-ovs-multitenant
> openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login':
> 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider',
> 'filename': '/etc/origin/master/htpasswd'}]
> osm_use_cockpit=false
> containerized=False
>
> openshift_master_cluster_method=native
> openshift_master_console_port=443
> openshift_master_api_port=443
>
> openshift_master_overwrite_named_certificates=true
> openshift_master_named_certificates=[{"certfile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.crt","keyfile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.key",
> "cafile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.int.crt"}]
>
> openshift_master_session_auth_secrets=['F71uoyI/Tkv/LiDH2PiFKK1o76bLoH10+uE2a']
>
> openshift_master_session_encryption_secrets=['bjDwQfiy4ksB/3qph87BGulYb/GUho6K']
> openshift_master_audit_config={"enabled": true, "auditFilePath":
> "/var/log/openshift-audit/openshift-audit.log", "maximumFileRetentionDays":
> 30, "maximumFileSizeMegabytes": 500, "maximumRetainedFiles": 10}
>
> openshift_ca_cert_expire_days=1825
> openshift_node_cert_expire_days=730
> openshift_master_cert_expire_days=730
> etcd_ca_default_days=1825
>
> openshift_hosted_router_create_certificate=false
> openshift_hosted_manage_router=true
> openshift_router_selector="role=infra"
> openshift_hosted_router_replicas=2
> openshift_hosted_router_certificate={"certfile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.crt","keyfile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.key",
> "cafile":"{{lookup('env','PWD')}}/certs/wildcard.example.com.br.int.crt"}
>
> openshift_hosted_metrics_deploy=true
> openshift_hosted_metrics_public_url=
> https://hawkular-metrics.example.com.br/hawkular/metrics
>
> openshift_hosted_logging_deploy=true
> openshift_hosted_logging_hostname=kibana.example.com.br
>
> openshift_install_examples=true
>
> openshift_node_kubelet_args={'pods-per-core': ['20'], 'max-pods': ['100'],
> 'image-gc-high-threshold': ['80'], 'image-gc-low-threshold':
> ['50'],'minimum-container-ttl-duration': ['60s'],
> 'maximum-dead-containers-per-container': ['1'], 'maximum-dead-containers':
> ['15']}
>
> logrotate_scripts=[{"name": "syslog", "path":
> "/var/log/cron\n/var/log/maillog\n/var/log/messages\n/var/log/secure\n/var/log/spooler\n",
> "options": ["daily", "rotate 7", "compress", "sharedscripts", "missingok"],
> "scripts": {"postrotate": "/bin/kill -HUP `cat /var/run/syslogd.pid 2>
> /dev/null` 2> /dev/null || true"}}]
>
> openshift_builddefaults_image_labels=[{'name':'builder','value':'true'}]
> openshift_builddefaults_nodeselectors={'builder':'true'}
> openshift_builddefaults_annotations={'builder':'true'}
> openshift_builddefaults_resources_requests_cpu=10m
> openshift_builddefaults_resources_requests_memory=128Mi
> openshift_builddefaults_resources_limits_cpu=500m
> openshift_builddefaults_resources_limits_memory=2Gi
>
> openshift_upgrade_nodes_serial=1
> openshift_upgrade_nodes_max_fail_percentage=0
> openshift_upgrade_control_plane_nodes_serial=1
> openshift_upgrade_control_plane_nodes_max_fail_percentage=0
>
> openshift_disable_check=disk_availability,memory_availability
>
> [masters]
> e001vmov40p42
> e001vmov40p51
> e001vmov40p52
>
> [etcd]
> e001vmov40p42
> e001vmov40p51
> e001vmov40p52
>
> [nodes]
> e001vmov40p42 openshift_node_labels="{'role': 'master'}"
> e001vmov40p51 openshift_node_labels="{'role': 'master'}"
> e001vmov40p52 openshift_node_labels="{'role': 'master'}"
>
> e001vmov40p45 openshift_node_labels="{'role': 'infra',
> 'docker-registry':'true', 'logging':'true'}"
> e001vmov40p46 openshift_node_labels="{'role': 'infra', 'metrics': 'true'}"
>
> e001vmov40p47 openshift_node_labels="{'role': 'app', 'builder': 'true'}"
> e001vmov40p48 openshift_node_labels="{'role': 'app', 'builder': 'true'}"
> e001vmov40p49 openshift_node_labels="{'role': 'app', 'builder': 'true'}"
>
>
>
>
>
> --
> Mateus Caruccio / Master of Puppets
> GetupCloud.com
> We make the infrastructure invisible
> Gartner Cool Vendor 2017
>
> 2017-09-14 10:13 GMT-03:00 Matthew Wringe :
>
>> We had an issue 

Re: OpenShift Origin v3.6.0 is released

2017-07-31 Thread Andrew Lau
Does that not prevent pods from booting? At least in my case, when I was
building some test RPMs, pods were unable to start up with the error "cni
config uninitialized".

On Tue, 1 Aug 2017 at 13:21 Clayton Coleman <ccole...@redhat.com> wrote:

> Yes, that'll probably have to be a point change.
>
> On Mon, Jul 31, 2017 at 11:02 PM, Andrew Lau <and...@andrewklau.com>
> wrote:
>
>> I think the node images are still missing the sdn-ovs package.
>>
>> On Tue, 1 Aug 2017 at 07:45 Clayton Coleman <ccole...@redhat.com> wrote:
>>
>>> This has been fixed and images were repushed.
>>>
>>> On Mon, Jul 31, 2017 at 12:13 PM, Clayton Coleman <ccole...@redhat.com>
>>> wrote:
>>>
>>>> Hrm, so they do.  Looking.
>>>>
>>>> On Mon, Jul 31, 2017 at 11:57 AM, Andrew Lau <and...@andrewklau.com>
>>>> wrote:
>>>>
>>>>> The images still seem to be using rc0 packages.
>>>>>
>>>>> rpm -qa | grep origin
>>>>> origin-clients-3.6.0-0.rc.0.359.de23676.x86_64
>>>>> origin-3.6.0-0.rc.0.359.de23676.x86_64
>>>>>
>>>>> I also had a PR which just got merged for a missing package in the
>>>>> node image https://github.com/openshift/origin/pull/15542
>>>>>
>>>>> On Tue, 1 Aug 2017 at 01:34 Clayton Coleman <ccole...@redhat.com>
>>>>> wrote:
>>>>>
>>>>>> https://github.com/openshift/origin/releases/tag/v3.6.0
>>>>>>
>>>>>> Images are pushed to the hub.  Thanks to everyone for their hard work
>>>>>> this release.  Expect official RPMs in a few days.  Remember to use the
>>>>>> Ansible release-3.6 branch for your installs.
>>>>>> ___
>>>>>> dev mailing list
>>>>>> dev@lists.openshift.redhat.com
>>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>>>
>>>>>
>>>>
>>>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: OpenShift Origin v3.6.0 is released

2017-07-31 Thread Andrew Lau
I think the node images are still missing the sdn-ovs package.

On Tue, 1 Aug 2017 at 07:45 Clayton Coleman <ccole...@redhat.com> wrote:

> This has been fixed and images were repushed.
>
> On Mon, Jul 31, 2017 at 12:13 PM, Clayton Coleman <ccole...@redhat.com>
> wrote:
>
>> Hrm, so they do.  Looking.
>>
>> On Mon, Jul 31, 2017 at 11:57 AM, Andrew Lau <and...@andrewklau.com>
>> wrote:
>>
>>> The images still seem to be using rc0 packages.
>>>
>>> rpm -qa | grep origin
>>> origin-clients-3.6.0-0.rc.0.359.de23676.x86_64
>>> origin-3.6.0-0.rc.0.359.de23676.x86_64
>>>
>>> I also had a PR which just got merged for a missing package in the node
>>> image https://github.com/openshift/origin/pull/15542
>>>
>>> On Tue, 1 Aug 2017 at 01:34 Clayton Coleman <ccole...@redhat.com> wrote:
>>>
>>>> https://github.com/openshift/origin/releases/tag/v3.6.0
>>>>
>>>> Images are pushed to the hub.  Thanks to everyone for their hard work
>>>> this release.  Expect official RPMs in a few days.  Remember to use the
>>>> Ansible release-3.6 branch for your installs.
>>>> ___
>>>> dev mailing list
>>>> dev@lists.openshift.redhat.com
>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>
>>>
>>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: OpenShift Origin v3.6.0 is released

2017-07-31 Thread Andrew Lau
The images still seem to be using rc0 packages.

rpm -qa | grep origin
origin-clients-3.6.0-0.rc.0.359.de23676.x86_64
origin-3.6.0-0.rc.0.359.de23676.x86_64

I also had a PR which just got merged for a missing package in the node
image https://github.com/openshift/origin/pull/15542
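For what it's worth, a quick way to check whether a given node image already
contains the package before rolling it out (the image tag and package name
below are how Origin ships them; adjust if yours differ):

  docker pull openshift/node:v3.6.0
  docker run --rm --entrypoint rpm openshift/node:v3.6.0 -qa | grep -i sdn-ovs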

On Tue, 1 Aug 2017 at 01:34 Clayton Coleman  wrote:

> https://github.com/openshift/origin/releases/tag/v3.6.0
>
> Images are pushed to the hub.  Thanks to everyone for their hard work this
> release.  Expect official RPMs in a few days.  Remember to use the Ansible
> release-3.6 branch for your installs.
> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Origin v1.5.1 has been released

2017-05-17 Thread Andrew Lau
Cheers

On Thu, 18 May 2017 at 15:00 Clayton Coleman <ccole...@redhat.com> wrote:

> Check now
>
> On Thu, May 18, 2017 at 12:10 AM, Andrew Lau <and...@andrewklau.com>
> wrote:
>
>> Will metrics get their tags bumped to v1.5.1?
>>
>> On Wed, 17 May 2017 at 07:48 Clayton Coleman <ccole...@redhat.com> wrote:
>>
>>> Two bugs were addressed in this release, please see the GitHub release
>>> page for more info:
>>>
>>> https://github.com/openshift/origin/releases/tag/v1.5.1
>>> ___
>>> dev mailing list
>>> dev@lists.openshift.redhat.com
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>
>>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Origin v1.5.1 has been released

2017-05-17 Thread Andrew Lau
Will metrics get their tags bumped to v1.5.1?

On Wed, 17 May 2017 at 07:48 Clayton Coleman  wrote:

> Two bugs were addressed in this release, please see the GitHub release
> page for more info:
>
> https://github.com/openshift/origin/releases/tag/v1.5.1
> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Pods sometimes deployed without internal network connectivity

2017-05-16 Thread Andrew Lau
I have an issue open here https://github.com/openshift/origin/issues/14092
about pods sometimes getting deployed without internal network connectivity
(using multinenant plugin). They still seem to have outbound network
connectivity.

I believe these may be the relevant logs; does anyone have any ideas about
why this might be happening?

Thanks
ovs-vsctl[127682]: ovs|1|vsctl|INFO|Called as ovs-vsctl --if-exists 
del-port veth00735fc5
device veth00735fc5 left promiscuous mode
DHCPDISCOVER on veth58a37e85 to 255.255.255.255 port 67 interval 11 
(xid=0x268e1510)
ovs-vsctl[127712]: ovs|1|db_ctl_base|ERR|no row "veth00735fc5" in table Port
ovs-vsctl[127717]: ovs|1|vsctl|INFO|Called as ovs-vsctl --if-exists 
del-port veth00735fc5
dockerd-current[6361]: W0513 09:23:06.195544   14242 multitenant.go:156] 
refcounting error on vnid 14177148
origin-node[14164]: W0513 09:23:06.195544   14242 multitenant.go:156] 
refcounting error on vnid 14177148




dockerd-current[6361]: E0513 09:01:27.370479   14242 docker_manager.go:378] 
NetworkPlugin cni failed on the status hook for pod 'auth-139-ljc8r' - 
Unexpected command output Device "eth0" does not exist.
dockerd-current[6361]:  with error: exit status 1
origin-node[14164]: E0513 09:01:27.370479   14242 docker_manager.go:378] 
NetworkPlugin cni failed on the status hook for pod 'auth-139-ljc8r' - 
Unexpected command output Device "eth0" does not exist.
origin-node[14164]: with error: exit status 1
kernel: IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
NetworkManager[753]:   [1494630087.4540] manager: (vethf0daeab9): new 
Veth device (/org/freedesktop/NetworkManager/Devices/4664)
NetworkManager[753]:   [1494630087.4562] device (vethf0daeab9): link 
connected
NetworkManager[753]:   [1494630087.4563] device (vethf0daeab9): state 
change: unmanaged -> unavailable (reason 'managed') [10 20 2]
NetworkManager[753]:   [1494630087.4600] ifcfg-rh: add connection 
in-memory (c9cf445a-4141-3e70-a65c-87a3588f6c49,"Wired connection 11")
NetworkManager[753]:   [1494630087.4608] settings: (vethf0daeab9): 
created default wired connection 'Wired connection 11'
NetworkManager[753]:   [1494630087.4637] device (vethf0daeab9): state 
change: unavailable -> disconnected (reason 'none') [20 30 0]
NetworkManager[753]:   [1494630087.4677] policy: auto-activating 
connection 'Wired connection 11'
NetworkManager[753]:   [1494630087.4704] device (vethf0daeab9): 
Activation: starting connection 'Wired connection 11' 
(c9cf445a-4141-3e70-a65c-87a3588f6c49)
NetworkManager[753]:   [1494630087.4706] device (vethf0daeab9): state 
change: disconnected -> prepare (reason 'none') [30 40 0]
NetworkManager[753]:   [1494630087.4714] device (vethf0daeab9): state 
change: prepare -> config (reason 'none') [40 50 0]
ovs-vsctl[101727]: ovs|1|vsctl|INFO|Called as ovs-vsctl add-port br0 
vethf0daeab9
NetworkManager[753]:   [1494630087.4724] device (vethf0daeab9): state 
change: config -> ip-config (reason 'none') [50 70 0]
NetworkManager[753]:   [1494630087.4727] dhcp4 (vethf0daeab9): 
activation: beginning transaction (timeout in 45 seconds)
kernel: device vethf0daeab9 entered promiscuous mode
NetworkManager[753]:   [1494630087.5142] dhcp4 (vethf0daeab9): dhclient 
started with pid 101729
NetworkManager[753]:   [1494630087.5176] device (vethf0daeab9): enslaved 
to non-master-type device ovs-system; ignoring
dhclient[101729]: DHCPDISCOVER on vethf0daeab9 to 255.255.255.255 port 67 
interval 5 (xid=0xe7661df)
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Critical Routing Bug v1.5.0

2017-04-22 Thread Andrew Lau
I believe this is a significant bug that needs attention
https://github.com/openshift/origin/issues/13862
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Node report OK but every pod marked unready

2017-04-20 Thread Andrew Lau
Thanks! Hopefully we don't hit this too much until 1.5.0 is released

On Fri, 21 Apr 2017 at 01:26 Patrick Tescher <patr...@outtherelabs.com>
wrote:

> We upgraded to 1.5.0 and that error went away.
>
> --
> Patrick Tescher
>
> On Apr 19, 2017, at 10:59 PM, Andrew Lau <and...@andrewklau.com> wrote:
>
> thin_ls has been happening for quite some time
> https://github.com/openshift/origin/issues/10940
>
> On Thu, 20 Apr 2017 at 15:55 Tero Ahonen <taho...@redhat.com> wrote:
>
>> It seems that error is related to docker storage on that vm
>>
>> .t
>>
>> Sent from my iPhone
>>
>> On 20 Apr 2017, at 8.53, Andrew Lau <and...@andrewklau.com> wrote:
>>
>> Unfortunately I did not. I dumped the logs and just removed the node in
>> order to quickly restore the current containers on another node.
>>
>> At the exact time it failed I saw a lot of the following:
>>
>> ===
>> thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
>> error performing thin_ls on metadata device
>> /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
>> --no-headers -m -o DEV,
>> EXCLUSIVE_BYTES /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127
>>
>> failed (failure): rpc error: code = 2 desc = shim error: context deadline
>> exceeded#015
>>
>> Error running exec in container: rpc error: code = 2 desc = shim error:
>> context deadline exceeded
>> ===
>>
>> Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
>>
>>
>> On Thu, 20 Apr 2017 at 15:41 Tero Ahonen <taho...@redhat.com> wrote:
>>
>>> Hi
>>>
>>> Did u try to ssh to that node and execute sudo docker run to some
>>> container?
>>>
>>> .t
>>>
>>> Sent from my iPhone
>>>
>>> > On 20 Apr 2017, at 8.18, Andrew Lau <and...@andrewklau.com> wrote:
>>> >
>>> > I'm trying to debug a weird scenario where a node has had every pod
>>> crash with the error:
>>> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
>>> >
>>> > The pods stayed in the state Ready 0/1
>>> > The docker daemon was responding and the kubelet and all its services
>>> were running. The node was reporting with the OK status.
>>> >
>>> > No resource limits were hit with CPU almost idle and memory at 25%
>>> utilisation.
>>> >
>>> >
>>> >
>>> >
>>> > ___
>>> > users mailing list
>>> > us...@lists.openshift.redhat.com
>>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>>
>> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Node report OK but every pod marked unready

2017-04-20 Thread Andrew Lau
thin_ls has been happening for quite some time
https://github.com/openshift/origin/issues/10940

On Thu, 20 Apr 2017 at 15:55 Tero Ahonen <taho...@redhat.com> wrote:

> It seems that error is related to docker storage on that vm
>
> .t
>
> Sent from my iPhone
>
> On 20 Apr 2017, at 8.53, Andrew Lau <and...@andrewklau.com> wrote:
>
> Unfortunately I did not. I dumped the logs and just removed the node in
> order to quickly restore the current containers on another node.
>
> At the exact time it failed I saw a lot of the following:
>
> ===
> thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
> error performing thin_ls on metadata device
> /dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
> --no-headers -m -o DEV,
> EXCLUSIVE_BYTES /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127
>
> failed (failure): rpc error: code = 2 desc = shim error: context deadline
> exceeded#015
>
> Error running exec in container: rpc error: code = 2 desc = shim error:
> context deadline exceeded
> ===
>
> Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
>
>
> On Thu, 20 Apr 2017 at 15:41 Tero Ahonen <taho...@redhat.com> wrote:
>
>> Hi
>>
>> Did u try to ssh to that node and execute sudo docker run to some
>> container?
>>
>> .t
>>
>> Sent from my iPhone
>>
>> > On 20 Apr 2017, at 8.18, Andrew Lau <and...@andrewklau.com> wrote:
>> >
>> > I'm trying to debug a weird scenario where a node has had every pod
>> crash with the error:
>> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
>> >
>> > The pods stayed in the state Ready 0/1
>> > The docker daemon was responding and the kubelet and all its services
>> were running. The node was reporting with the OK status.
>> >
>> > No resource limits were hit with CPU almost idle and memory at 25%
>> utilisation.
>> >
>> >
>> >
>> >
>> > ___
>> > users mailing list
>> > us...@lists.openshift.redhat.com
>> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Node report OK but every pod marked unready

2017-04-19 Thread Andrew Lau
Unfortunately I did not. I dumped the logs and just removed the node in
order to quickly restore the current containers on another node.

At the exact time it failed I saw a lot of the following:

===
thin_pool_watcher.go:72] encountered error refreshing thin pool watcher:
error performing thin_ls on metadata device
/dev/mapper/docker_vg-docker--pool_tmeta: Error running command `thin_ls
--no-headers -m -o DEV,
EXCLUSIVE_BYTES /dev/mapper/docker_vg-docker--pool_tmeta`: exit status 127

failed (failure): rpc error: code = 2 desc = shim error: context deadline
exceeded#015

Error running exec in container: rpc error: code = 2 desc = shim error:
context deadline exceeded
===

Seems to match https://bugzilla.redhat.com/show_bug.cgi?id=1427212
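Exit status 127 normally means the binary itself wasn't found, so a quick
sanity check on an affected node (assuming thin_ls is expected to come from
device-mapper-persistent-data) would be:

  which thin_ls || echo "thin_ls not installed"
  rpm -q device-mapper-persistent-data
  lvs -a docker_vg    # confirm the thin pool LVs still look healthy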


On Thu, 20 Apr 2017 at 15:41 Tero Ahonen <taho...@redhat.com> wrote:

> Hi
>
> Did u try to ssh to that node and execute sudo docker run to some
> container?
>
> .t
>
> Sent from my iPhone
>
> > On 20 Apr 2017, at 8.18, Andrew Lau <and...@andrewklau.com> wrote:
> >
> > I'm trying to debug a weird scenario where a node has had every pod
> crash with the error:
> > "rpc error: code = 2 desc = shim error: context deadline exceeded"
> >
> > The pods stayed in the state Ready 0/1
> > The docker daemon was responding and the kubelet and all its services
> were running. The node was reporting with the OK status.
> >
> > No resource limits were hit with CPU almost idle and memory at 25%
> utilisation.
> >
> >
> >
> >
> > ___
> > users mailing list
> > us...@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/users
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Node report OK but every pod marked unready

2017-04-19 Thread Andrew Lau
I'm trying to debug a weird scenario where a node has had every pod crash
with the error:
"rpc error: code = 2 desc = shim error: context deadline exceeded"

The pods stayed in the state Ready 0/1
The docker daemon was responding and the kubelet and all its services were
running. The node was reporting with the OK status.

No resource limits were hit with CPU almost idle and memory at 25%
utilisation.
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Backport requests?

2017-04-17 Thread Andrew Lau
Hi Brad,

Yeah, 1.5. Somehow we ended up going off-list, and we found out it was
already backported.

On the topic of AWS-related issues, I do have an issue open about EBS
volumes not unmounting when their pod is evicted:
https://github.com/openshift/origin/issues/13611
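As a stop-gap while that issue is open, it helps to confirm whether the
volume is really still attached on the AWS side before touching anything
(the volume ID below is a placeholder):

  aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
      --query 'Volumes[0].Attachments[].{Instance:InstanceId,State:State,Device:Device}'

If it shows as attached to the old node even though nothing there has it
mounted, that points at the detach path described in the issue above.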


On Tue, 18 Apr 2017 at 03:20 Brad Childs <bchi...@redhat.com> wrote:

> Andrew,
>
> The backport request is for Origin 1.5?
>
> There are a couple AWS related issues we're working on right now for OSE
> which may include this fix.  I will respond with more info when we have
> it.  Adding Hemant as he's running the AWS patches.
>
> -bc
>
>
>
> On Fri, Apr 14, 2017 at 11:37 AM, Clayton Coleman <ccole...@redhat.com>
> wrote:
>
>> I am surprised that hasn't already been backported.  Copying some folks
>> who may know.
>>
>> On Apr 13, 2017, at 9:48 PM, Andrew Lau <and...@andrewklau.com> wrote:
>>
>> Hi,
>>
>> What are the chances for having something like this backported to 1.5?
>>
>> https://github.com/kubernetes/kubernetes/commit/9992bd23c2aa12db432696fd324e900770251dc0
>>
>> I've been seeing this happen every few weeks where one EBS volume gets
>> stuck forever in a pending state. Usually after a node has been evacuated.
>>
>> ___
>> dev mailing list
>> dev@lists.openshift.redhat.com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>
>>
>> ___
>> dev mailing list
>> dev@lists.openshift.redhat.com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>
>>
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Erased pvc disk

2017-04-10 Thread Andrew Lau
Had a similar thing happen with an EBS volume last year. Haven't been able
to replicate it since.
It happened when a node was overloaded and couldn't report back its status;
my best guess is it tried to mount onto another node and some sort of race
condition wiped the contents.

On Fri, 7 Apr 2017 at 22:38 Mateus Caruccio 
wrote:

> Is it possible for this line to run while a PVC is still mounted?
>
>
> https://github.com/openshift/origin/blob/7558d75e1b677c019259136a73abbd625591f5ed/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go#L2123
>
> I got an entire disk erased, with no FS/ceph corruption indications, and
> tons of the following messages:
>
> I0401 19:05:03.8045641422 kubelet.go:2117] Failed to remove orphaned
> pod "2b46c157-16e5-11e7-9f74-000d3ac02da0" dir; err: remove
> /var/lib/docker/openshift.local.volumes/pods/2b46c157-16e5-11e7-9f74-000d3ac02da0/volumes/
> kubernetes.io~rbd/ceph-6704: device or resource busy
>
>
> Regards,
> --
> Mateus Caruccio / Master of Puppets
> GetupCloud.com
> We make the infrastructure invisible
> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Important: hold on installing/upgrading to Origin 1.4.0

2017-01-24 Thread Andrew Lau
Worth noting: the cluster policy resource name for egressnetworkpolicy
became egressnetworkpolicies (we had to update our policies).
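For anyone carrying custom roles that referenced the old name, the change is
just the plural in the resources list; a minimal sketch of an updated rule
(the role name is illustrative) looks like:

  apiVersion: v1
  kind: ClusterRole
  metadata:
    name: manage-egress
  rules:
  - apiGroups: [""]
    resources: ["egressnetworkpolicies"]
    verbs: ["get", "list", "watch", "create", "update", "delete"]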

On Wed, 25 Jan 2017 at 10:12 Jordan Liggitt  wrote:

> Origin 1.4.1 is now available and fixes this issue.
>
> https://github.com/openshift/origin/releases/tag/v1.4.1
>
> On Mon, Jan 23, 2017 at 11:04 AM, Jordan Liggitt 
> wrote:
>
> An issue[1] was reported that upgrading to 1.4.0 causes existing user
> logins to stop working.
>
> The location in etcd where user identities are stored was inadvertently
> changed during the 1.4 dev cycle, causing existing identities to appear
> missing, and new identities to be stored in an incorrect location.
>
> We're working to expedite a hot-fix restoring the previous behavior. Until
> that hot-fix is available, installing/upgrading to 1.4.0 is not
> recommended. A follow-up announcement will be made when the fix is
> available.
>
> Thanks to the community for quickly reporting the issue. Additional tests
> will be added to ensure storage locations remain stable in the future.
>
>
> Jordan
>
>
> [1] https://github.com/openshift/origin/issues/12598
>
>
>
> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Starting container fails with system error

2016-06-08 Thread Andrew Lau
Hi,

Has anyone hit this issue where a pod is not able to start after it's been
successfully built:

Starting container fails with "System error: read parent: connection reset
by peer"

This issue (https://github.com/docker/docker/issues/14203) seems to say
it's fixed in docker 1.10, but Atomic Host images are still only shipping
with 1.9.
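A quick way to confirm what a host is actually running (this works on Atomic
Host too, since rpm can still query the read-only tree):

  rpm -q docker
  docker version --format '{{.Server.Version}}'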

Regards,
Andrew.
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


S3 docker registry performance

2016-06-03 Thread Andrew Lau
Hi,

Does anyone have any comparisons of S3 registry performance? We've found it
to be quite slow, at least 2-3 times longer than using something like an
EBS volume. Here's the config being used:

  encrypt: true
  secure: true
  v4auth: true
  chunksize: 26214400
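For context, those settings sit under the s3 driver in the registry's
config.yml; a fuller sketch (bucket, region and credentials are placeholders,
assuming the stock docker/distribution s3 driver) looks roughly like:

  storage:
    s3:
      accesskey: AKIA...
      secretkey: ...
      region: us-east-1
      bucket: my-registry-bucket
      encrypt: true
      secure: true
      v4auth: true
      chunksize: 26214400
      rootdirectory: /registry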

I stumbled across BZ 1314381
(https://bugzilla.redhat.com/show_bug.cgi?id=1314381); perhaps that could be
one reason for the slow performance.

Is it possible to migrate between different storage backends or is it
something that needs to be decided initially?

Thanks!
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Excluding replacement pods from quota?

2016-04-30 Thread Andrew Lau
Hi,

Is there a way to have the old pod moved into the terminating scope? Or is
there an alternative solution for the following use case:

User has the following quota:
1 pod in terminating scope
1 pod in non-terminating scope

For new builds, the build will complete in the terminating scope but the
replacement pod will not be able to start due to the quota.
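For reference, this is roughly what the two quotas described above look like
(a sketch; the names and single-pod counts just mirror the use case):

  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: terminating-quota
  spec:
    hard:
      pods: "1"
    scopes:
    - Terminating
  ---
  apiVersion: v1
  kind: ResourceQuota
  metadata:
    name: not-terminating-quota
  spec:
    hard:
      pods: "1"
    scopes:
    - NotTerminating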
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Project limits

2016-04-13 Thread Andrew Lau
Not sure how I missed that, thanks!

On Thu, 14 Apr 2016 at 11:37 Clayton Coleman <ccole...@redhat.com> wrote:

> The docs here:
> https://docs.openshift.org/latest/admin_guide/managing_projects.html#limit-projects-per-user
>
> Cover that.
>
> > On Apr 13, 2016, at 7:17 PM, Andrew Lau <and...@andrewklau.com> wrote:
> >
> > Hi,
> >
> > There seems to be documentation on project quotas and disabling self
> provisioning; however, is there a facility that lets us set a limit on
> the number of projects a user can have?
> >
> > Thanks.
> > ___
> > dev mailing list
> > dev@lists.openshift.redhat.com
> > http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Project limits

2016-04-13 Thread Andrew Lau
Hi,

There seems to be documentation on project quotas and disabling self
provisioning; however, is there a facility that lets us set a limit on
the number of projects a user can have?

Thanks.
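Following up for the archive: the docs link in the reply above covers this
via the ProjectRequestLimit admission plugin. A minimal master-config.yaml
sketch, with the limit value purely illustrative, would be:

  admissionConfig:
    pluginConfig:
      ProjectRequestLimit:
        configuration:
          apiVersion: v1
          kind: ProjectRequestLimitConfig
          limits:
          - maxProjects: 3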
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


AWS Dynamic Provision EBS automated annotation?

2016-04-08 Thread Andrew Lau
Hi,

When creating a PVC via the console or a template/quickstart, the volume
isn't dynamically created in AWS and the PVC stays pending indefinitely
until the following annotation is added to the claim:

  annotations:
volume.alpha.kubernetes.io/storage-class: foo

Is there any way this can be included by default on every PVC?

Thanks.
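For completeness, a full claim that does provision on this setup (size and
the storage-class value are placeholders; this is the alpha annotation from
before StorageClass objects existed) looks like:

  apiVersion: v1
  kind: PersistentVolumeClaim
  metadata:
    name: my-claim
    annotations:
      volume.alpha.kubernetes.io/storage-class: foo
  spec:
    accessModes:
    - ReadWriteOnce
    resources:
      requests:
        storage: 5Gi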
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Origin updated to v1.1.5

2016-03-31 Thread Andrew Lau
On Fri, 1 Apr 2016 at 07:49 Jason DeTiberus <jdeti...@redhat.com> wrote:

> On Thu, Mar 31, 2016 at 4:47 PM, Andrew Lau <and...@andrewklau.com> wrote:
>
>> The repos there are just for the standard CentOS RPM installs? I was
>> hoping to update my CentOS Atomic hosts.
>>
>
> The docker images are already up on dockerhub, so a containerized install
> should go ahead and use the latest available now.
>

However, the docker and kubernetes packages that currently ship with CentOS
Atomic don't meet the new kubernetes and docker version requirements.


>
>
>>
>> On Fri, 1 Apr 2016 at 05:53 Jason DeTiberus <jdeti...@redhat.com> wrote:
>>
>>> If you use the following PR:
>>> https://github.com/openshift/openshift-ansible/pull/1685 you can grab
>>> them now. We are slowly but surely migrating the builds to the CentOS PaaS
>>> SIG repositories.
>>>
>>> On Wed, Mar 30, 2016 at 12:06 AM, Andrew Lau <and...@andrewklau.com>
>>> wrote:
>>>
>>>> Is there any timeline on when the centos atomic packages will be
>>>> updated to meet the new kubernetes+docker version requirements?
>>>>
>>>> On Wed, 30 Mar 2016 at 08:12 Clayton Coleman <ccole...@redhat.com>
>>>> wrote:
>>>>
>>>>> Release notes are here
>>>>> https://github.com/openshift/origin/releases/tag/v1.1.5
>>>>>
>>>>> Note that v1.1.5 resolves an issue with Docker 1.9.1-23 and cgroups
>>>>> and is a recommended upgrade for all users.
>>>>>
>>>>> ___
>>>>> dev mailing list
>>>>> dev@lists.openshift.redhat.com
>>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>>
>>>>
>>>> ___
>>>> dev mailing list
>>>> dev@lists.openshift.redhat.com
>>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>>
>>>>
>>>
>>>
>>> --
>>> Jason DeTiberus
>>>
>>
>
>
> --
> Jason DeTiberus
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Origin updated to v1.1.5

2016-03-31 Thread Andrew Lau
The repos there are just for the standard CentOS RPM installs? I was
hoping to update my CentOS Atomic hosts.

On Fri, 1 Apr 2016 at 05:53 Jason DeTiberus <jdeti...@redhat.com> wrote:

> If you use the following PR:
> https://github.com/openshift/openshift-ansible/pull/1685 you can grab
> them now. We are slowly but surely migrating the builds to the CentOS PaaS
> SIG repositories.
>
> On Wed, Mar 30, 2016 at 12:06 AM, Andrew Lau <and...@andrewklau.com>
> wrote:
>
>> Is there any timeline on when the centos atomic packages will be updated
>> to meet the new kubernetes+docker version requirements?
>>
>> On Wed, 30 Mar 2016 at 08:12 Clayton Coleman <ccole...@redhat.com> wrote:
>>
>>> Release notes are here
>>> https://github.com/openshift/origin/releases/tag/v1.1.5
>>>
>>> Note that v1.1.5 resolves an issue with Docker 1.9.1-23 and cgroups
>>> and is a recommended upgrade for all users.
>>>
>>> ___
>>> dev mailing list
>>> dev@lists.openshift.redhat.com
>>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>>
>>
>> ___
>> dev mailing list
>> dev@lists.openshift.redhat.com
>> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>>
>>
>
>
> --
> Jason DeTiberus
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev


Re: Origin updated to v1.1.5

2016-03-29 Thread Andrew Lau
Is there any timeline on when the CentOS Atomic packages will be updated to
meet the new kubernetes+docker version requirements?

On Wed, 30 Mar 2016 at 08:12 Clayton Coleman  wrote:

> Release notes are here
> https://github.com/openshift/origin/releases/tag/v1.1.5
>
> Note that v1.1.5 resolves an issue with Docker 1.9.1-23 and cgroups
> and is a recommended upgrade for all users.
>
> ___
> dev mailing list
> dev@lists.openshift.redhat.com
> http://lists.openshift.redhat.com/openshiftmm/listinfo/dev
>
___
dev mailing list
dev@lists.openshift.redhat.com
http://lists.openshift.redhat.com/openshiftmm/listinfo/dev