Re: HDFS vs software RAID like md(adm)

2011-09-15 Thread Per Steffensen
Norman Maurer wrote: You should keep in mind that HDFS is not POSIX-compliant, so you will have a hard time using it as a "real fs". I know there is a FUSE driver. I guess there are a few solutions: http://wiki.apache.org/hadoop/MountableHDFS An alternative would be to write the file-accessing code di

Re: HDFS vs software RAID like md(adm)

2011-09-15 Thread Віталій Тимчишин
The main con for me is that all the metadata is kept in the RAM of a single node, so if you have a lot of files, you need a lot of RAM on the main (name) node. This limits scalability. Another thing is that it does not like a lot of directories. It starts checking all the directories now and then, locking data

Re: HDFS vs software RAID like md(adm)

2011-09-15 Thread Kanghua151
Hi Masters: I want to develop a log-structured filesystem based on HDFS. This filesystem will be used to host virtual machine image files. On HDFS I can implement snapshots and data redundancy; as a log-structured FS, it would support random access. I also hope to use MapReduce to do segment

Re: Regarding design of HDFS

2011-09-15 Thread Kanghua151
I get it. Thanks. Sent from my iPhone. On 2011-9-13, at 19:20, Ted Dunning wrote: > > > 2011/9/13 kang hua > Hi Master: > can you explain more detail --- "The only way to avoid this is to make > the data much more cacheable and to have a viable cache coherency strategy. > Cache coherency at the meta-data

Need help regarding HDFS-RAID

2011-09-15 Thread Ajit Ratnaparkhi
Hi, We want to use HDFS-RAID in our production cluster (http://wiki.apache.org/hadoop/HDFS-RAID). I am not able to find source/binaries/configs for this in the official Hadoop distribution from Apache Hadoop (checked in 0.20.1 and 0.20.2). Can somebody please tell me where I can find that? and inst

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Harsh J
Hey Ajit, HDFS-RAID was never part of the 0.20 release. It made its debut in the 0.21 release [1]. I know that Facebook uses it (and also developed it), but I am unsure of users beyond Facebook. While 0.21 overall is not entirely deemed production-usable yet (and is in fact, possibly abandoned fo

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Ajit Ratnaparkhi
Hi, We were planning to use it for archiving past data (instead of moving it to an archival store). Archiving it in HDFS gives the advantage of making it easily available for processing whenever required. Is there any archival solution in the Hadoop ecosystem? thanks, Ajit. On Thu, Sep 15, 2011 at 5:05 PM,

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Dhruba Borthakur
We use HDFS RAID in a big way. Data older than 12 days is RAIDed using XOR encoding (effective replication of 2.5). Data older than a few months is RAIDed using Reed-Solomon (effective observed replication factor of 1.5). This has been running on our 60 PB cluster for about a year now. thanks dhr
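
[Editor's sketch] The "effective replication" figures above are a ratio of raw blocks stored to logical data blocks. The following is a minimal Python illustration of that arithmetic, not the HDFS-RAID implementation. The function name and all stripe parameters (10 data blocks per stripe, 1 XOR parity block, 4 Reed-Solomon parity blocks, and the copy counts) are assumptions chosen for illustration; none of them appear in the thread. The ideal full-stripe values it computes are lower than the 2.5 and 1.5 reported in the message, which are observed cluster-wide numbers; the thread does not say why they differ, though files that do not fill a whole stripe would be one plausible contributor.

```python
def effective_replication(data_blocks, parity_blocks, data_copies, parity_copies):
    """Raw blocks stored per logical data block, for one full stripe.

    All four parameters are hypothetical inputs for illustration only.
    """
    if data_blocks <= 0:
        raise ValueError("data_blocks must be positive")
    stored = data_blocks * data_copies + parity_blocks * parity_copies
    return stored / data_blocks


# Plain 3x replication: no parity blocks, three copies of every data block.
plain = effective_replication(data_blocks=10, parity_blocks=0,
                              data_copies=3, parity_copies=0)

# Assumed XOR layout: 1 parity block per 10 data blocks, 2 copies of each.
xor = effective_replication(data_blocks=10, parity_blocks=1,
                            data_copies=2, parity_copies=2)

# Assumed Reed-Solomon layout: 4 parity blocks per 10 data blocks, 1 copy each.
rs = effective_replication(data_blocks=10, parity_blocks=4,
                           data_copies=1, parity_copies=1)

print(plain, xor, rs)
```

Under these assumed parameters the ratio drops as parity replaces extra copies, which is the storage saving the message describes.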

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Andrew Purtell
But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba? Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) > >From: Dhruba Borthakur >To: hdfs-user@hadoop.apache.org >Sent: Th

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Dhruba Borthakur
That's right Andy. 0.22+. We are running an HDFS-RAID code base that is pretty close to what is available in Apache hdfs trunk. -dhruba On Thu, Sep 15, 2011 at 10:08 AM, Andrew Purtell wrote: > But that is the HDFS RAID effectively in 0.22+, not 0.21, right Dhruba? > > Best regards, > > - And

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Ajit Ratnaparkhi
Thanks for the info! So can I use HDFS-RAID taken from Apache hdfs trunk as it is with hadoop-0.20.1/hadoop-0.20.2? It seems to be under branch 0.21; will it work with 0.20.*? thanks, -Ajit. On Thu, Sep 15, 2011 at 10:44 PM, Dhruba Borthakur wrote: > That's right Andy. 0.22+. We are running a

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Andrew Purtell
HDFS RAID from 0.21 will work if backported to 0.20. Only a minor fixup is needed. HDFS RAID from 0.22 relies on new HDFS APIs not available in 0.20. Best regards, - Andy Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White) >__

While starting HDFS process getting stucked.

2011-09-15 Thread kiranprasad
Hi, I am new to Hadoop and Pig. For the cluster I have 3 VMs (10.0.0.61 - master; 10.0.0.62, 10.0.0.63 - slaves). I've installed Pig on the 10.0.0.61 VM. Hadoop version: hadoop-0.20.2 and Pig: pig-0.8.1. I've updated the XMLs; please find them below. mapred site.xml -- mapred.j

Re: Need help regarding HDFS-RAID

2011-09-15 Thread Ajit Ratnaparkhi
Thanks! On Thu, Sep 15, 2011 at 11:31 PM, Andrew Purtell wrote: > HDFS RAID from 0.21 will work if back ported to 0.20. Only a minor fixup is > needed. > > HDFS RAID from 0.22 relies on new HDFS APIs not available in 0.20. > > Best regards, > > - Andy > > Problems worthy of attack prove their

Re: While starting HDFS process getting stucked.

2011-09-15 Thread Harsh J
Hey Kiran, What does the tail of your NameNode log say? Can you pastebin your whole NameNode logs here or at some site and link? On Fri, Sep 16, 2011 at 11:01 AM, kiranprasad wrote: > Hi > > I am new to Hadoop and PIG, > > For Cluster I have 3 VMs(10.0.0.61-master, 10.0.0.62,10.0.0.63 - Slaves)