Hiya all!

In recent weeks, folks on the list have given me some pointers on a couple of 
issues I had: one where tasks were disappearing (hasn’t happened since I fired 
up a script to log status, of course ;d) and one where kafka-mesos refuses to 
launch brokers.

Quick overview: I have two clusters, a test cluster and a fresh prod cluster.  
Both are built using the same chef code, with the only differences in 
configuration being the zookeeper / mesos master hostnames.  An unfortunate 
difference, which would be difficult to reverse at this point, is that the test 
cluster runs Ubuntu, and prod runs CentOS.  Our IT dept handed me VMware keys 
to start the project and I did not know they’d be mandating CentOS for prod, 
but it also seems unlikely this is the cause of my troubles.

After weeks of everything working fine in test, everything started failing 
there recently.  I’m now fairly confident this was because my password change 
hadn’t gone through on one host: the fabric tasks I use to clear zookeeper when 
everything gets funky were failing on that host, and thus zookeeper never 
actually got cleared.  So, test is working again, but prod is still dead and 
has been consistently.

I have sniffed the traffic between the kafka-mesos scheduler and the mesos 
master (currently running on the same host) and am not really sure what to make 
of it:

—
POST /master/mesos.scheduler.Call HTTP/1.1
User-Agent: 
libprocess/[email protected]:45472
Libprocess-From: 
[email protected]:45472
Connection: Keep-Alive
Host:
Transfer-Encoding: chunked

3b
...7
5
.juryan..kafka!......CA(.2.*:.zk01.something.com
0

POST /master/mesos.scheduler.Call HTTP/1.1
User-Agent: 
libprocess/[email protected]:45472
Libprocess-From: 
[email protected]:45472
Connection: Keep-Alive
Host:
Transfer-Encoding: chunked

3b
...7
5
.juryan..kafka!......CA(.2.*:.zk01.aur.ziprealty.com
0

POST /master/mesos.scheduler.Call HTTP/1.1
User-Agent: 
libprocess/[email protected]:45472
Libprocess-From: 
[email protected]:45472
Connection: Keep-Alive
Host:
Transfer-Encoding: chunked

3b
...7
5
.juryan..kafka!......CA(.2.*:.zk01.something.com
0

POST /master/mesos.scheduler.Call HTTP/1.1
User-Agent: 
libprocess/[email protected]:45472
Libprocess-From: 
[email protected]:45472
Connection: Keep-Alive
Host:
Transfer-Encoding: chunked

3b
...7
5
.juryan..kafka!......CA(.2.*:.zk01.something.com
0
—
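
For anyone reading the capture above: the body framing is HTTP chunked transfer encoding, so `3b` is a hex chunk length (59 bytes) and the lone `0` ends the body.  The payload itself is binary (presumably a serialized scheduler call, though the capture doesn't show a Content-Type to confirm that).  Below is a minimal Python sketch for de-chunking a saved body so the payload can be inspected on its own.  The function name and the sample bytes are made up for illustration; it does not handle trailers.

```python
def dechunk(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body into the plain payload bytes."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # Chunk-size line: hex length, optionally followed by ";extensions".
        size = int(raw[pos:eol].split(b";")[0], 16)
        if size == 0:
            break  # zero-length chunk terminates the body
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return bytes(body)


# Made-up sample with the same framing as the capture: one 0x3b-byte chunk.
sample = b"3b\r\n" + b"x" * 0x3B + b"\r\n0\r\n\r\n"
decoded = dechunk(sample)
print(len(decoded))
```

All four requests in the capture carry the same chunk length, which fits the scheduler re-sending the same call rather than making progress.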

I’m still a bit unclear on where to go from here.  My work_dir is no longer in 
/tmp, as suggested, but the brokers never start and there is no /kafka in zk.

Part of me is inclined to rebuild the test cluster with CentOS for consistency, 
but the Ubuntu cluster is the only one I have working.  I know it’s possible 
that some inadvertent difference in the defaults between the two is at play, 
which might be worth understanding, but I also see how I’ve gotten myself into 
an odd pickle. ;)
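
One way to chase a difference in defaults would be to compare the effective flags each master reports.  The sketch below is Python and rests on assumptions: that the master's state endpoint returns JSON with a top-level `flags` object, and that the exact URL (for example a `/state` or `/master/state.json` path, depending on Mesos version) is supplied by the caller.  The function names and the sample flag values are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_flags(state_url: str) -> dict:
    """Fetch a master's state JSON and return its "flags" object.

    Assumption: the response has a top-level "flags" key; the URL is
    whatever state endpoint your Mesos version exposes.
    """
    with urlopen(state_url, timeout=10) as resp:
        return json.load(resp).get("flags", {})


def diff_flags(a: dict, b: dict) -> dict:
    """Return {flag: (value_in_a, value_in_b)} for flags that differ
    or are present on only one side."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Made-up flag sets standing in for the two clusters.
test_flags = {"work_dir": "/var/lib/mesos", "quorum": "1",
              "zk": "zk://zk-test:2181/mesos"}
prod_flags = {"work_dir": "/var/lib/mesos", "quorum": "2",
              "zk": "zk://zk-prod:2181/mesos", "hostname": "m1"}

for flag, (t, p) in diff_flags(test_flags, prod_flags).items():
    print(f"{flag}: test={t!r} prod={p!r}")
```

With real clusters, the two dicts would come from `fetch_flags(...)` against each master instead of the hard-coded samples.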

Anyway, as always, any input would be appreciated.  A successful test phase 
isn’t much good if we can’t actually launch!

Cheers,

Justin Alan Ryan
Sr. Systems Engineer
ZipRealty / Realogy