----- Original Message -----
> From: "Alejandro Nieto Boza" <[email protected]>
> To: "Clayton Coleman" <[email protected]>, [email protected]
> Cc: "users" <[email protected]>
> Sent: Thursday, February 11, 2016 3:02:57 AM
> Subject: Re: Enabling Cluster Metrics
>
> I'm running without persistent storage.
>
> When the pods have been "turned on" for more than 20 minutes they change to state
> "Running" and work. Is it possible that it is due to insufficient memory?

20 minutes seems like a long time here. The containers are running a full web server and database which will take up some memory resources.

> I've watched their state for a day and the pods are working. I will try in
> bigger scenarios when I can and I will post if the error appears again.
>
>
> Now, I've got this problem (also I've got it previously):
>
> I've launched a pod specifying requests and limits for cpu and memory but
> when I watch on the pod overview pages, the value of metrics graphs is 0
> (with any pod, not only with this).

So before you launch a pod with limits, you can get graphs, and once you deploy a pod with limits you don't see any graphs for anything anymore?

I have seen the issue where pods with limits have zero values, but that has been due to pods with limits being configured to run on a separate node and the certificates for that node were not configured properly.

>
>
> Heapster logs:
>
> W0210 18:11:48.637940 1 reflector.go:224] /tmp/gopath/src/
> k8s.io/heapster/sources/pods.go:173: watch of *api.Pod ended with: 401: The
> event in requested index is outdated and cleared (the requested history has
> been cleared [2322574/2322059]) [2323573]
>
>
> # curl -X GET https://hawkular-metrics.example.com/hawkular/metrics/status
> -k
>
> {"MetricsService":"STARTED","Implementation-Version":"0.12.0.Final"....

Ok, good, this means that the Hawkular Metrics and Cassandra containers are running properly.

>
> Here curl doesn't get a JSON object:
>
> # curl -H "Authorization: Bearer XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX"
>
> -H "Hawkular-tenant: test"
>
> -X GET https://hawkular-metrics.example.com/hawkular/metrics/metrics
> -k| python -m json.tool
>
> % Total % Received % Xferd Average Speed Time Time Time
> Current
>
> Dload Upload Total Spent Left
> Speed
>
> 100 68 100 68 0 0 110 0 --:--:-- --:--:-- --:--:--
> 110
>
> No JSON object could be decoded

Are you sure the 'test' project exists and it has pods running within it?
If you are running the metrics components in openshift-infra can you please set the Hawkular-tenant value to 'openshift-infra'? If the metric components are running here then there should be something showing up at that url.

>
> It seems like the problem isn't due to certificate issues, the node IP
> appears correctly in certificates.
>
>
>
>
> On Wed, Feb 10, 2016 at 17:56, Clayton Coleman <[email protected]> wrote:
>
> > I don't know what unconfigured table means (beyond maybe your tables need
> > to be recreated because you have an old version) but I bet Matt does.
> >
> > On Feb 10, 2016, at 10:50 AM, Alejandro Nieto Boza <[email protected]>
> > wrote:
> >
> > Thanks, the Openshift DNS wasn't running correctly. Now the error doesn't
> > appear but...
> >
> > Now I've got an error (this error has already appeared for me in other
> > scenarios).
> >
> > This is the state of my metrics pods:
> >
> > # oc get pods
> > NAME                         READY     STATUS             RESTARTS   AGE
> > hawkular-cassandra-1-j09f6   1/1       Running            0          10m
> > hawkular-metrics-xpa33       0/1       Error              1          10m
> > heapster-42vyz               0/1       Error              2          10m
> > metrics-deployer-e5e3v       0/1       Completed          0          12m
> >
> >
> > # oc get pods
> > NAME                         READY     STATUS             RESTARTS   AGE
> > hawkular-cassandra-1-j09f6   1/1       Running            0          12m
> > hawkular-metrics-xpa33       0/1       Completed          2          12m
> > heapster-42vyz               0/1       CrashLoopBackOff   4          12m
> > metrics-deployer-e5e3v       0/1       Completed          0          15m
> >
> >
> > The pod hawkular-metrics changes its state between completed and error (?)
> >
> >
> > These are some logs of hawkular-metrics pod:
> >
> > # oc logs hawkular-metrics-xpa33
> > 15:22:08,104 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1)
> > MSC000001: Failed to start service
> > jboss.deployment.unit."hawkular-metrics-api-jaxrs.war":
> > org.jboss.msc.service.StartException in service
> > jboss.deployment.unit."hawkular-metrics-api-jaxrs.war": Failed to start
> > service
> > at
> > org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1904)
> > at
> > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at
> > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.IllegalStateException: Container is down
> > ...............
> >
> > 15:22:08,211 ERROR [org.jboss.msc.service.fail] (MSC service thread 1-1)
> > MSC000001: Failed to start service
> > jboss.serverManagement.controller.management.http:
> > org.jboss.msc.service.StartException in service
> > jboss.serverManagement.controller.management.http: Failed to start service
> > at
> > org.jboss.msc.service.ServiceControllerImpl$StartTask.run(ServiceControllerImpl.java:1904)
> > ...............
> >
> >
> > 15:29:35,416 FATAL
> > [org.hawkular.metrics.api.jaxrs.MetricsServiceLifecycle]
> > (metricsservice-lifecycle-thread) HAWKMETRICS200006: An error occurred
> > trying to connect to the Cassandra cluster:
> > com.datastax.driver.core.exceptions.InvalidQueryException: unconfigured
> > table retentions_idx
> > at
> > com.datastax.driver.core.exceptions.InvalidQueryException.copy(InvalidQueryException.java:35)
> > .................
> >
> >
> >
> > And obviously heapster cannot connect to hawkular-metrics:
> >
> > # oc logs heapster-42vyz
> > Could not connect to https://hawkular-metrics:443/hawkular/metrics/status.
> > Curl exit code: 7. Status Code 000
> > 'https://hawkular-metrics:443/hawkular/metrics/status' is not accessible
> > [HTTP status code: 000. Curl exit code 7]. Retrying.
> >
> >
> > hawkular-cassandra logs don't show errors.
> >
> >
> >
> > 2016-02-10 14:50 GMT+01:00 Clayton Coleman <[email protected]>:
> >
> >> Can you try from one of your nodes to reach the nameserver directly and
> >> via the proxy?
> >>
> >> dig @<your master ip> kubernetes.default.svc.cluster.local
> >> dig @172.30.0.1 kubernetes.default.svc.cluster.local
> >>
> >>
> >>
> >> On Feb 10, 2016, at 8:40 AM, Alejandro Nieto Boza <[email protected]>
> >> wrote:
> >>
> >> It's like you said.
> >>
> >> Test logs:
> >> # oc logs test
> >> % Total % Received % Xferd Average Speed Time Time Time
> >> Current
> >> Dload Upload Total Spent Left
> >> Speed
> >> 0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:--
> >> 0curl: (6) Could not resolve host: kubernetes; Unknown error
> >>
> >>
> >>
> >>
> >> Test2 logs:
> >> # oc logs test2
> >> nameserver "172.30.0.1"
> >> nameserver "another-ip"
> >>
> >>
> >>
> >>
> >> # oc get svc/kubernetes -n default
> >> NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR
> >> AGE
> >> kubernetes "172.30.0.1" <none> 443/TCP,53/UDP,53/TCP <none>
> >> 92d
> >> search test.svc.cluster.local svc.cluster.local cluster.local test.es
> >> options ndots:5
> >>
> >>
> >>
> >>
> >>
> >>
> >> 2016-02-10 14:01 GMT+01:00 Clayton Coleman <[email protected]>:
> >>
> >>> That seems to indicate that inside the deployment container DNS is not
> >>> working. Can you do the following to check:
> >>>
> >>> oc run --image centos:7 test --generator=run-pod/v1 --restart=Never
> >>> -- curl https://kubernetes
> >>> oc logs test
> >>>
> >>> And then
> >>>
> >>> oc run --image centos:7 test2 --generator=run-pod/v1 --restart=Never
> >>> -- cat /etc/resolv.conf
> >>> oc logs test2
> >>>
> >>> The latter should have a nameserver pointing to the master by its
> >>> service IP - the command:
> >>>
> >>> oc get svc/kubernetes -n default
> >>>
> >>> Should show that same IP
> >>>
> >>> On Feb 10, 2016, at 7:39 AM, Alejandro Nieto Boza <[email protected]>
> >>> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I've been following the following steps to deploy metrics:
> >>>
> >>> https://docs.openshift.org/latest/install_config/cluster_metrics.html
> >>>
> >>> When I run the following command:
> >>>
> >>>
> >>> oc process -f metrics.yaml -v \
> >>> HAWKULAR_METRICS_HOSTNAME=hawkular-metrics.example.com,USE_PERSISTENT_STORAGE=false
> >>> \
> >>> | oc create -f -
> >>>
> >>>
> >>> I get the following error:
> >>>
> >>> Creating the Cassandra Certificate Secrets configuration json file
> >>> +++ base64
> >>> ++++ echo hawkular-cassandra
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.truststore
> >>> +++ base64
> >>> ++++ echo RjR--747mUzmTS-
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.pem
> >>> ++ echo
> >>> ++ echo 'Creating the Cassandra Certificate Secrets configuration json
> >>> file'
> >>> ++ cat
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra.cert
> >>> +++ base64 -w 0 /etc/deploy/_output/hawkular-cassandra-ca.cert
> >>> Creating Hawkular Metrics & Cassandra Secrets
> >>> ++ echo 'Creating Hawkular Metrics & Cassandra Secrets'
> >>> ++ oc create -f /etc/deploy/_output/hawkular-metrics-secrets.json
> >>> unable to connect to a server to handle "secrets": Get
> >>> https://kubernetes.default.svc:443/api: dial tcp: lookup
> >>> kubernetes.default.svc: no such host
> >>>
> >>>
> >>>
> >>>
> >>> # oc get pods
> >>> NAME                     READY     STATUS    RESTARTS   AGE
> >>> metrics-deployer-7gcpd   0/1       Error     0          39m
> >>>
> >>>
> >>> How can I know if my kubernetes master URL is
> >>> https://kubernetes.default.svc:443 or is another URL?
> >>>
> >>> My Openshift installation isn't an update.
> >>>
> >>> _______________________________________________
> >>> users mailing list
> >>> [email protected]
> >>> http://lists.openshift.redhat.com/openshiftmm/listinfo/users
> >>>
> >>>
> >>
> >
>
_______________________________________________
users mailing list
[email protected]
http://lists.openshift.redhat.com/openshiftmm/listinfo/users
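
For reference, the tenant check suggested at the top of this message could be sketched roughly as below. This is only an illustration, assuming the metrics components run in the openshift-infra project. The helper name hawkular_metrics_cmd and the TOKEN variable are made up for this sketch and are not part of oc or Hawkular; the curl flags and the URL path are the ones already used earlier in the thread. The function only prints the command so it can be reviewed before it is run.

```shell
#!/bin/sh
# Hypothetical helper (not part of oc or Hawkular): print the curl command
# that queries the metrics endpoint for a given tenant (project name).
hawkular_metrics_cmd() {
    host="$1"     # metrics route hostname, e.g. hawkular-metrics.example.com
    tenant="$2"   # project to query, e.g. openshift-infra
    token="$3"    # bearer token (placeholder value in this sketch)
    printf 'curl -k -H "Authorization: Bearer %s" -H "Hawkular-tenant: %s" -X GET https://%s/hawkular/metrics/metrics\n' \
        "$token" "$tenant" "$host"
}

# Print the command for the openshift-infra tenant; TOKEN is an assumed
# environment variable, with a dummy fallback so the sketch runs as-is.
hawkular_metrics_cmd hawkular-metrics.example.com openshift-infra "${TOKEN:-XXXXXXXX}"
```

Running the printed command with 'test' swapped for 'openshift-infra' as the tenant would show whether the empty result is specific to the 'test' project.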
