>> The issue I have is in short that whenever I fire up hadoop data
>> node on the new osol system and start writing files to the HDFS on
>> other nodes, the system will work for a few minutes writing files as

> Other nodes? Is that systems, or something else?
Systems. As I said just below, their configuration is slightly different, as they are about a year old and used previous models of motherboard, CPU and disks.

>> So this
>> makes me believe that there are a few options that could be bad
>> here:
>>
>> 1) a bug in OpenSolaris kernel (driver or not)
>> 2) bad motherboard (hmm.... doubt a bit)
>> 3) bad Areca controller
>> 4) bad disks
>> 5) bad memory

> If you're writing to other machines, then shouldn't you add:
>
> 6) bad nic.
>
> I've seen problems with the e1000g on one of my systems also; I just
> haven't had time to do any diagnosis on it.

I did consider a bad NIC, but the reason I discarded it is that I would expect a bad NIC to cause hangs for other applications transferring data as well. I have transferred well over 14TB of data in a row from 3 other machines in parallel without a single hiccup, as well as taking part in the SC09 bandwidth challenge with varying transfers from the US -> this machine. Also, the other disk virtualization software, dCache, is working just fine on the machine, and its concept is similar to hadoop's.

I did think that maybe using the quad ethernet card is causing the issue, as this is a difference from the other nodes, but I haven't yet tried without it. The onboard port that I had used for internal networking is somehow stuck at 100Mbit/s, and there is also an fmadm fault report on a PCI-E device causing lowered performance due to shared interrupt 19. However, that NIC is currently not even plumbed.
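
For anyone who wants to look at the same things on their own box, something along these lines should do it (a sketch only, typed from memory; "e1000g0" is just a placeholder for whatever the onboard port is called):

  # dladm show-dev                                 (link state, speed and duplex per NIC)
  # dladm show-linkprop -p speed,duplex e1000g0    (check if the port negotiated 100Mbit)
  # fmadm faulty                                   (lists the PCI-E fault report)
  # echo "::interrupts" | mdb -k                   (see which devices share IRQ 19)
  # intrstat 5                                     (per-interrupt CPU load, 5s intervals)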
