Re: Hadoop Namenode Failover

2012-04-02 Thread Joey Echeverria
>>> My question is: what is the best solution to make the master (namenode)
>>> fail over? I read a lot, but I don't know what is best.
>>> I found this howto:
>>> http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/ , but if it is
>>> possible, I do not want to use DRBD.
>>> I hope somebody can help me. Sorry for my English :)
>>>
>>> Thanks.
>>> Tibi

--
Joey Echeverria
Solutions Architect
Cloudera, Inc.

Re: Questions about HDFS's placement policy

2012-03-16 Thread Joey Echeverria
The current replica placement policy is not aware of multiple levels in your topology. So, in your example, it would pick any of the other three racks: /rackA/rack1, /rackB/rack3, or /rackB/rack4 with equal probability. The only way to get the behavior you desire is to specify only one level of
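The equal-probability choice described above can be sketched as a small simulation (the rack names come from the example in this thread; this is an illustration of the behavior, not the actual HDFS BlockPlacementPolicy code):

```python
import random

# Racks from the example. The default policy treats the topology as flat:
# it only distinguishes "the writer's rack" from "any other rack".
racks = ["/rackA/rack1", "/rackA/rack2", "/rackB/rack3", "/rackB/rack4"]
writer_rack = "/rackA/rack2"

def pick_remote_rack(racks, writer_rack, rng):
    # Every rack other than the writer's is equally likely, regardless
    # of how the names group into /rackA vs /rackB.
    candidates = [r for r in racks if r != writer_rack]
    return rng.choice(candidates)

rng = random.Random(42)
counts = {r: 0 for r in racks if r != writer_rack}
for _ in range(30000):
    counts[pick_remote_rack(racks, writer_rack, rng)] += 1
# Each of the three remote racks ends up with roughly a third of the picks,
# even though two of them share the /rackB prefix.
```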

Re: modifying

2012-02-14 Thread Joey Echeverria
You can append to a file in later versions of HDFS (0.21+), but there is no support for modifying a file in place. What's your use case? -Joey On Tue, Feb 14, 2012 at 4:20 AM, Alieh Saeedi wrote: > Hi > Is there a way to modify a file? > Thanks:-) > -- Joseph Echeverria Cloudera, Inc. 443.3
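The append-only model Joey describes (HDFS 0.21+ lets you add bytes at the end via `FileSystem.append()`, but never rewrite bytes in the middle) can be illustrated with plain local files standing in for HDFS — this is purely an analogy, not the HDFS API:

```python
import os
import tempfile

# Stand-in for an HDFS file: we may only add bytes at the end,
# never update bytes in place.
path = os.path.join(tempfile.mkdtemp(), "events.log")

with open(path, "wb") as f:   # initial write (like an HDFS create)
    f.write(b"record-1\n")

with open(path, "ab") as f:   # append (like FileSystem.append)
    f.write(b"record-2\n")

with open(path, "rb") as f:
    data = f.read()
# data now holds both records in write order; there is no operation
# that rewrites record-1 without rewriting the whole file.
```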

Re: Security in Hadoop-1.0.0

2012-02-14 Thread Joey Echeverria
Hey Stuti, Hadoop doesn't get configured for LDAP per se, you just need to configure your nodes to do LDAP authentication via pam. Here's a guide: http://www.howtoforge.com/linux_ldap_authentication Assuming you already have an LDAP server setup, you probably only care about the "Client configur

Re: Adding dfs.name.dir

2012-01-31 Thread Joey Echeverria
You need to restart the namenode for the new setting to take effect. The namenode will be unavailable for a while during the restart. -Joey On Tue, Jan 31, 2012 at 5:26 PM, Jain, Prem wrote: > Team, > > I would like to add additional directory under “dfs.name.dir” property in > order to have one

Re: Hardware/Software JBOD vs *.data.dir "JBOD"

2012-01-30 Thread Joey Echeverria
Three disks each mounted separately. What you say is true, it will handle failures better and generally perform better. You'll need to configure the dfs.datanode.failed.volumes.tolerated parameter in hdfs-site.xml to make sure that it handles a single failed volume gracefully. -Joey On Mon, Jan 3
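The parameter Joey names goes in hdfs-site.xml; a sketch for a three-disk node (the value of 1 is an assumption for this example):

```xml
<!-- hdfs-site.xml: keep the datanode running after 1 of its
     dfs.data.dir volumes fails, instead of shutting the whole node down -->
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>1</value>
</property>
```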

Re: Data processing in DFSClient

2012-01-19 Thread Joey Echeverria
Personally I would just use Har :) It sounds like an interesting project. You might find this document helpful: http://kazman.shidler.hawaii.edu/ArchDoc.html It was designed to help contributors navigate the HDFS source tree. -Joey On Thu, Jan 19, 2012 at 11:52 AM, Sesha Kumar wrote: >  I'm cu

Re: Apply ACL on file level in Hadoop Cluster

2012-01-19 Thread Joey Echeverria
ve access. > > I do not want that users which have access on a directory level can see all > the inner content even if they do not have access permission on them. > > I thought of attaining it using ACL's . Is there any other way through which > I can achieve this goa

Re: Issues in fuse_dfs (Hadoop-1.0.0)

2012-01-19 Thread Joey Echeverria
What classpath is fuse_dfs using? It looks like you're missing some jars. -Joey On Jan 19, 2012, at 5:40, Stuti Awasthi wrote: > No, I have not set any security in conf/sites file. My sites file has just > basic entries to start the hdfs cluster. > > -Original Message- > From: alo a

Re: Apply ACL on file level in Hadoop Cluster

2012-01-18 Thread Joey Echeverria
HDFS only supports Unix style read, write execute permissions. What style of ACLs do you want to apply? -Joey On Wed, Jan 18, 2012 at 7:55 AM, Stuti Awasthi wrote: > Thanks Alex, > Yes, I wanted to apply ACL's on every file/directory created on HDFS. Is > there absolutely no way to achieve that
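The Unix-style bits Joey mentions are the same encoding set by `hadoop fs -chmod`; a sketch of how an octal mode decodes into the familiar rwx triplets, using Python's stat constants (local semantics, same bit layout HDFS uses):

```python
import stat

def rwx(mode):
    """Render an octal mode as rwx triplets, e.g. 0o750 -> 'rwxr-x---'."""
    bits = [
        (stat.S_IRUSR, "r"), (stat.S_IWUSR, "w"), (stat.S_IXUSR, "x"),
        (stat.S_IRGRP, "r"), (stat.S_IWGRP, "w"), (stat.S_IXGRP, "x"),
        (stat.S_IROTH, "r"), (stat.S_IWOTH, "w"), (stat.S_IXOTH, "x"),
    ]
    return "".join(ch if mode & bit else "-" for bit, ch in bits)

# e.g. what `hadoop fs -chmod 750 /path` would grant:
owner_group_other = rwx(0o750)
```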

Re: Hadoop-Hbase latest releases compatibility

2012-01-18 Thread Joey Echeverria
Yes. On Jan 18, 2012, at 3:39, Stuti Awasthi wrote: > Ok. Thanks Arun > So is Hadoop-1.0.0 is compatible with Hbase stable release with append > support ? > > -Original Message- > From: Arun C Murthy [mailto:a...@hortonworks.com] > Sent: Wednesday, January 18, 2012 1:30 PM > To: hd

Re: Data processing in DFSClient

2012-01-16 Thread Joey Echeverria
Sesha, What kind of processing are you attempting to do? Maybe it makes more sense to just implement a MapReduce job rather than modifying the datanodes? -Joey On Mon, Jan 16, 2012 at 9:20 AM, Sesha Kumar wrote: > Hey guys, > > Sorry for the typo in my last message.I have corrected it. > > I w

Re: How-to use DFSClient's BlockReader from Java

2012-01-11 Thread Joey Echeverria
Yup, just start reading from wherever the block starts and stop at the end of the block to do local reads. -Joey On Wed, Jan 11, 2012 at 11:31 AM, David Pavlis wrote: > Hi Todd, > > If I use the FileSystem API and I am on local node - how do I get/read just > that particular block residing local
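Reading "just that block" boils down to seeking to the block's start offset and stopping after block-length bytes; a local-file sketch of that byte-range pattern (the tiny 8-byte "blocks" are made up for illustration; real offsets come from the namenode's block locations):

```python
import os
import tempfile

# Stand-in file; imagine block boundaries every 8 bytes instead of 64 MB.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"AAAAAAAABBBBBBBBCCCCCCCC")

block_size = 8
block_index = 1  # hypothetically, the block that resides on the local node

with open(path, "rb") as f:
    f.seek(block_index * block_size)  # start wherever the block starts
    block = f.read(block_size)        # stop at the end of the block
# block now holds exactly the second "block" of the file.
```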

Re: Re : Custom web app

2012-01-11 Thread Joey Echeverria
Take a look at Hue, it's a web app that does exactly what you're talking about. It uses a combination of RPC calls and other public APIs as well as a custom plugin that adds additional APIs via thrift. Hue can be deployed on any node and is mostly written in Python. -Joey On Jan 11, 2012, at

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Joey Echeverria
> Frank Grimes > > > On 2012-01-06, at 1:05 PM, Joey Echeverria wrote: > >> I would do it by staging the machine data into a temporary directory >> and then renaming the directory when it's been verified. So, data >> would be written into directories like this:

Re: Combining AVRO files efficiently within HDFS

2012-01-06 Thread Joey Echeverria
I would do it by staging the machine data into a temporary directory and then renaming the directory when it's been verified. So, data would be written into directories like this: 2012-01/02/00/stage/machine1.log.avro 2012-01/02/00/stage/machine2.log.avro 2012-01/02/00/stage/machine3.log.avro Aft
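The stage-then-rename pattern above can be sketched with local directories (the directory layout follows the email; `os.rename` stands in for `hadoop fs -mv`, which is likewise a cheap metadata-only move in HDFS):

```python
import os
import tempfile

root = tempfile.mkdtemp()
stage = os.path.join(root, "2012-01", "02", "00", "stage")
final = os.path.join(root, "2012-01", "02", "00", "verified")
os.makedirs(stage)

# Writers land their files in the staging directory first.
for machine in ("machine1", "machine2", "machine3"):
    with open(os.path.join(stage, machine + ".log.avro"), "wb") as f:
        f.write(b"avro-bytes")

# Once the hour's data is verified, a single rename publishes all of it
# at once; readers never see a half-written directory.
os.rename(stage, final)

published = sorted(os.listdir(final))
```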

Re: hadoop cluster for querying data on mongodb

2011-12-25 Thread Joey Echeverria
Don't start your daemons as root. They should be started as a system account. Typically hdfs for the HDFS services and mapred for the MapReduce ones. -Joey On Fri, Dec 23, 2011 at 4:04 AM, Martinus Martinus wrote: > Hi Ayon, > > I tried to setup the hadoop-cluster using hadoop-0.20.2 and it seem

Re: hdfs-nfs - through chokepoint or balanced?

2011-12-16 Thread Joey Echeverria
HDFS doesn't natively support NFS. In order to export HDFS via NFS you'd have to mount it to the local file system with fuse and then export that directory. In that case, all traffic would go through the host acting as the NFS server. -Joey On Dec 16, 2011, at 19:07, Mark Hedges wrote: > >

Re: Copying file is not working properly via fuse-dfs

2011-12-16 Thread Joey Echeverria
How long did you wait after copying? I've seen this behavior before and it's due to the semantics of close in fuse and not easily fixed in fuse-dfs. In a minute or so though the copy should have the right size. -Joey On Dec 16, 2011, at 1:55, Stuti Awasthi wrote: > Hi All, > > I installed a

Re: Hadoop and Hbase compatibility

2011-12-09 Thread Joey Echeverria
Those versions should work fine together. Did you get Hadoop configured for pseudo-distributed mode correctly, or are you having trouble with both? -Joey On Fri, Dec 9, 2011 at 4:57 AM, Mohammad Tariq wrote: > Is there any specific combination of Hadoop and Hbase in order to use > Hbase in atlea

Re: problem of large distributed system access hdfs

2011-11-30 Thread Joey Echeverria
You could check out Hoop[1], a REST interface for accessing HDFS. Since it's REST based, you can easily load balance clients across multiple servers. You'll have to write the C/C++ code for communicating with Hoop, but that shouldn't require too much more than a thin wrapper around an HTTP client l

Re: Best option for mounting HDFS

2011-11-29 Thread Joey Echeverria
Hey Stuti, Fuse is probably the most commonly used solution. It has some limitations because HDFS isn't POSIX-compliant, but it works for a lot of use cases. You can try out both the contrib driver and the google code version. I'm not sure which will work better for your Hadoop version. Newer H

Re: What does hdfs balancer do after adding more disks to existing datanode.

2011-11-22 Thread Joey Echeverria
The balancer only balances between datanodes. This means the new drives won't get used until you start writing new data to them. If you want to balance the drives on a node, you need to 1) copy a bunch of block files from the old drives to the new drives 2) shutdown the datanode 3) delete the old
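The four manual steps above can be sketched with plain directories standing in for the drives (stopping and restarting the daemon, steps 2 and 4, are only comments here; `blk_*` file names mimic HDFS block files):

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
old_drive = os.path.join(root, "disk1", "dfs", "data")
new_drive = os.path.join(root, "disk2", "dfs", "data")
os.makedirs(old_drive)
os.makedirs(new_drive)

# Four block files, all on the old (full) drive.
for i in range(4):
    with open(os.path.join(old_drive, "blk_%d" % i), "wb") as f:
        f.write(b"x" * 10)

# 1) Copy a bunch of block files from the old drive to the new drive.
to_move = sorted(os.listdir(old_drive))[:2]
for name in to_move:
    shutil.copy2(os.path.join(old_drive, name),
                 os.path.join(new_drive, name))

# 2) (Shut down the datanode here in real life.)

# 3) Delete the old copies of the blocks that were moved.
for name in to_move:
    os.remove(os.path.join(old_drive, name))

# 4) (Restart the datanode; it rescans every dfs.data.dir volume.)
layout = (sorted(os.listdir(old_drive)), sorted(os.listdir(new_drive)))
```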

Re: Input path does not exist ERROR 2118:

2011-09-20 Thread Joey Echeverria
What is the output of the following: hadoop fs -ls hdfs://10.0.0.61/user/kiranprasad.g/pig-0.8.1/tutorial/data/excite-small.log -Joey On Tue, Sep 20, 2011 at 1:44 AM, kiranprasad wrote: > Hi > > When I have run the same from local mode it is working fine and I got the > result, but on Hadoop f

Re: While starting HDFS process getting stucked.

2011-09-16 Thread Joey Echeverria
On the NN: rm -rf ${dfs.name.dir}/* On the DN: rm -rf ${dfs.data.dir}/* -Joey On Fri, Sep 16, 2011 at 7:21 AM, kiranprasad wrote: > What do I need to clear from the hadoop directory. > > -Original Message- From: Stephan Gammeter > Sent: Friday, September 16, 2011 3:57 PM > To: hdfs-us

Re: Hadoop Namenode problem

2011-08-03 Thread Joey Echeverria
e > /user/hdfs/files/d954x328-85x8-4dfe-b73c-34a7a2c1xb0f is closed by > DFSClient_1277823200 > > Is there any way I can find out from the log when the safe mode gets over. > > Regards, > Rahul > > On Thu, Jul 28, 2011 at 6:16 PM, Joey Echeverria wrote: >> >>

Re: Hadoop Namenode problem

2011-07-28 Thread Joey Echeverria
3_15838442 reported from xx.xx.xx.xx:50010 > current size is 1950720 reported size is 2448907 > > I think the edit file size was too huge thats why it took long time. > > Regards, > Rahul > > On Fri, Jul 22, 2011 at 9:33 PM, Joey Echeverria wrote: > The lon

Re: Merging Files in HDFS

2011-07-22 Thread Joey Echeverria
You could do it with streaming and a single reducer: bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -Dmapred.num.reduce.tasks=1 -reducer cat -input /hdfs/directory/allsource* -output mergefile -verbose -Joey On Fri, Jul 22, 2011 at 1:26 PM, Time Less wrote: > Hello, List! > > I have s

Re: Hadoop Namenode problem

2011-07-22 Thread Joey Echeverria
ption: Call to xx.xx.xx.xx:9000 failed on local exception: > java.io.IOException: Connection reset by peer > > Regards, > Rahul > > > On Fri, Jul 22, 2011 at 5:40 PM, Joey Echeverria wrote: > >> Do you have an instance of the SecondaryNamenode in your cluster? >> >>

Re: Hadoop Namenode problem

2011-07-22 Thread Joey Echeverria
Do you have an instance of the SecondaryNamenode in your cluster? -Joey On Fri, Jul 22, 2011 at 3:15 AM, Rahul Das wrote: > Hi, > > I am running a Hadoop cluster with 20 Data node. Yesterday I found that the > Namenode was not responding ( No write/read to HDFS is happening). It got > stuck for

Re: replicate data in HDFS with smarter encoding

2011-07-18 Thread Joey Echeverria
Facebook contributed some code to do something similar called HDFS RAID: http://wiki.apache.org/hadoop/HDFS-RAID -Joey On Jul 18, 2011, at 3:41, Da Zheng wrote: > Hello, > > It seems that data replication in HDFS is simply data copy among nodes. Has > anyone considered to use a better encodi

Re: Any other uses of hdfs?

2011-07-16 Thread Joey Echeverria
HBase does not require MapReduce. -Joey On Jul 16, 2011, at 11:46, Rita wrote: > So, I use hdfs to store very large files and access them thru various client > (100 clients) using FS utils. Are there any other tools or projects that > solely use hdfs as its storage for fast access? I know

Re: Datanode Shows Way Too Much Space Available

2011-06-13 Thread Joey Echeverria
By any chance, do you have 3 directories set in dfs.data.dir all of which are on /dev/hda1? -Joey On Mon, Jun 13, 2011 at 3:01 PM, Time Less wrote: > I have a datanode with a ~900GB hard drive in it: > > Filesystem    Size  Used Avail Use% Mounted on > /dev/hda1     878G  384G
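The failure mode Joey is probing for: if dfs.data.dir lists three directories that all live on the same device, the datanode sums each directory's free space and over-reports capacity by the directory count. The arithmetic, with the drive size from the quoted `df` output:

```python
# Hypothetical setup matching the thread: one ~878 GB device,
# e.g. /data/1, /data/2, /data/3 all mounted on /dev/hda1.
device_capacity_gb = 878
data_dirs_on_device = 3

# Each dfs.data.dir entry contributes the whole device's capacity,
# so the datanode reports roughly triple the real space.
reported_gb = device_capacity_gb * data_dirs_on_device
```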

Re: Persistent small number of Blocks with corrupt replicas / Under replicated blocks

2011-06-10 Thread Joey Echeverria
any negative consequences of running the fsck -move just to > try it? > > On Jun 10, 2011, at 3:33 PM, Joey Echeverria wrote: > >> Good question. I didn't pick up on the fact that fsck disagrees with >> dfsadmin. Have you tried a full restart? Maybe somebody's infor

Re: Persistent small number of Blocks with corrupt replicas / Under replicated blocks

2011-06-10 Thread Joey Echeverria
d tell me if there were issues. > > So will running hadoop fsck -move just move the corrupted replicas and leave > the good ones? Will this work even though fsck does not report any corruption? > > On Jun 9, 2011, at 3:20 PM, Joey Echeverria wrote: > >> hadoop fsck -move will mo

Re: NameNode heapsize

2011-06-10 Thread Joey Echeverria
'll cut your usage by another 1/3. This becomes very significant very quickly. -Joey On Fri, Jun 10, 2011 at 12:36 PM, Anh Nguyen wrote: > On 06/10/2011 04:57 AM, Joey Echeverria wrote: >> >> Hi On, >> >> The namenode stores the full filesystem image in memory. Lo

Re: NameNode heapsize

2011-06-10 Thread Joey Echeverria
Hi On, The namenode stores the full filesystem image in memory. Looking at your stats, you have ~30 million files/directories and ~47 million blocks. That means that on average, each of your files is only ~1.4 blocks in size. One way to lower the pressure on the namenode would be to store fewer,
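A common rule of thumb (an assumption here, not stated in the email) is roughly 150 bytes of namenode heap per namespace object, i.e. per file, directory, or block. Applying it to the numbers in the thread gives a rough lower bound on the heap needed:

```python
files_and_dirs = 30_000_000   # ~30 million, from the thread
blocks = 47_000_000           # ~47 million, from the thread
bytes_per_object = 150        # rule-of-thumb heap cost per object (assumed)

heap_bytes = (files_and_dirs + blocks) * bytes_per_object
heap_gb = heap_bytes / 2**30
# Roughly 10.8 GB of heap for the namespace alone, before any other
# overhead; packing small files into fewer, larger files shrinks both terms.
```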

Re: Persistent small number of Blocks with corrupt replicas / Under replicated blocks

2011-06-09 Thread Joey Echeverria
hadoop fsck -move will move the corrupt files to /lost+found, which will "fix" the report. Do you know what created the corrupt files? -Joey On Thu, Jun 9, 2011 at 3:04 PM, Robert J Berger wrote: > I'm still having this problem and am kind of paralyzed until I figure out how > to eliminate the

Re: Query regarding internal/working of hadoop fs -copyFromLocal and fs.write()

2011-05-31 Thread Joey Echeverria
They write directly to HDFS, there's no additional buffering on the local file system of the client. -Joey On Tue, May 31, 2011 at 7:56 PM, Mapred Learn wrote: > Hi guys, > I asked this question earlier but did not get any response. So, posting > again. Hope somebody can point to the right descr

Re: What's datanode doing when logging 'Verification succeeded for blk_.' ?

2011-05-31 Thread Joey Echeverria
How much memory do you have on your DataNode? Is it possible that you're swapping? -Joey On Mon, May 30, 2011 at 11:09 PM, ccxixicc wrote: > > Hi,all > I found NameNode often lost heartbeat from DataNodes: > org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.heartbeatCheck: lost > heartbeat f

Re: replicating existing blocks?

2011-05-19 Thread Joey Echeverria
fic conf and/or in mapred-site.xml. >> >> >> Friso >> >> >> >> On 19 mei 2011, at 03:42, Steve Cohen wrote: >> >>> Where is the default replication factor on job files set? Is it different >>> then the dfs.replication setting i

Re: hdfs competitor?

2011-05-18 Thread Joey Echeverria
KFS On May 18, 2011 7:03 PM, "Thanh Do" wrote: > hi hdfs users, > > Is anybody aware of a system > that is similar to HDFS, in the sense > that it has single master architecture, > and the master also keeps an operation log. > > Thanks, > > Thanh

Re: replicating existing blocks?

2011-05-18 Thread Joey Echeverria
Did you run a map reduce job? I think the default replication factor on job files is 10, which obviously doesn't work well on a pseudo-distributed cluster. -Joey On Wed, May 18, 2011 at 5:07 PM, Steve Cohen wrote: > Thanks for the answer. Earlier, I asked about why I get occasional not > repli
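The job-file replication Joey refers to is controlled by `mapred.submit.replication` (default 10). A sketch of lowering it for a pseudo-distributed cluster, in mapred-site.xml:

```xml
<!-- mapred-site.xml: replication for job files (jars, splits) written to
     HDFS at submit time; the default of 10 can never be satisfied on a
     single-node cluster, so blocks show up as under-replicated -->
<property>
  <name>mapred.submit.replication</name>
  <value>1</value>
</property>
```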

Re: Stability issue - dead DN's

2011-05-11 Thread Joey Echeverria
Which version of hadoop are you running? I'm pretty sure the problem is you're overcommitting your RAM. Hadoop really doesn't like swapping. I would try setting your mapred.child.java.opts to -Xmx1024m. -Joey On Wed, May 11, 2011 at 2:23 AM, Evert Lammerts wrote: > Hi list, > > I notice that w
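The setting Joey suggests lives in mapred-site.xml; a sketch:

```xml
<!-- mapred-site.xml: cap each task JVM at 1 GB so the sum of concurrent
     task heaps stays under the node's physical RAM and avoids swapping -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```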

Re: Namenode restart giving IllegalArgumentException

2011-05-04 Thread Joey Echeverria
How much data do you have? It takes some time for all of the datanodes to report that all blocks are accounted for. -Joey On Wed, May 4, 2011 at 4:05 PM, Himanshu Vashishtha wrote: > Hey, > Every thing comes up for good. > Why this delay of 6 minutes I wonder? And I see that this delay has nothi