>> The issue I have is in short that whenever I fire up hadoop data
>> node on the new osol system and start writing files to the HDFS on
>> other nodes, the system will work for a few minutes writing files as
>
> Other nodes? Is that systems, or something else?

Systems. As I said just below, their configuration is slightly different: they are about a year 
old and use previous models of motherboard, CPU and disks. 

>> So this
>> makes me believe that there are a few options that could be bad
>> here:
>>
>> 1) a bug in OpenSolaris kernel (driver or not)
>> 2) bad motherboard (hmm.... doubt a bit)
>> 3) bad Areca controller
>> 4) bad disks
>> 5) bad memory

> If you're writing to other machines, then shouldn't you add:
>
>6) bad nic.
>
>I've seen problems with the e1000g on one of my systems also; I just
>haven't had time to do any diagnosis on it.

I did consider a bad NIC; however, I discarded that possibility because I would expect a bad 
NIC to cause hangs for other applications transferring data as well. I have transferred well 
over 14 TB of data in a row from 3 other machines in parallel without a single hiccup, and the 
machine also took part in the SC09 bandwidth challenge with varying transfers from the US to 
this machine. The other disk virtualization software, dCache, is also working just fine on the 
machine, and its concept is similar to hadoop's. I did think that maybe using the quad Ethernet 
card causes the issue, as this is a difference from the other nodes, but I haven't yet tried 
without it. The onboard port that I had used for internal networking is somehow stuck at 
100 Mbit/s, and there is also an fmadm fault report on a PCI-E device causing lowered 
performance due to shared interrupt 19. However, that NIC is currently not even plumbed.
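
For anyone wanting to reproduce the checks, something along these lines should show the link 
speed and the fault report (a rough sketch; the interface name e1000g0 is just an example, 
adjust to whatever the onboard port is called on your box):

    # show negotiated link state, speed and duplex of the onboard port
    dladm show-phys e1000g0

    # list active fault reports; the shared interrupt 19 complaint shows up here
    fmadm faulty

    # unplumb the interface so it carries no traffic at all
    ifconfig e1000g0 unplumb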