Re: Performance question

2009-04-20 Thread Jean-Daniel Cryans
Mark, there is a setup cost when using Hadoop: for each task a new JVM must be spawned. On such a small scale, you won't see any benefit from using MR. J-D On Mon, Apr 20, 2009 at 12:26 AM, Mark Kerzner markkerz...@gmail.com wrote: Hi, I ran a Hadoop MapReduce task in the local mode, reading and
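For context, the per-task JVM cost described above can be amortized on 0.19+ releases by letting a JVM run several tasks of the same job. A minimal sketch of the relevant hadoop-site.xml property; the -1 "unlimited reuse" value is an assumption to check against your release's defaults:

  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <!-- -1 means reuse the JVM for any number of tasks of the same job; the default is 1 (no reuse) -->
    <value>-1</value>
  </property>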

Re: Performance question

2009-04-20 Thread Jean-Daniel Cryans
that this is the guideline - each task should take minutes. Thank you, Mark On Mon, Apr 20, 2009 at 7:42 AM, Jean-Daniel Cryans jdcry...@apache.orgwrote: Mark, There is a setup price when using Hadoop, for each task a new JVM must be spawned. On such a small scale, you won't see any good using MR

Re: Hadoop data nodes failing to start

2009-04-08 Thread Jean-Daniel Cryans
Kevin, I'm glad it worked for you. We talked a bit about 5114 yesterday, any chance of trying 0.18 branch on that same cluster without the socket timeout thing? Thx, J-D On Wed, Apr 8, 2009 at 9:24 AM, Kevin Eppinger keppin...@adknowledge.com wrote: FYI:  Problem fixed.  It was apparently a

Re: datanode but not tasktracker

2009-04-01 Thread Jean-Daniel Cryans
Sandhya, you can specify which file to use for slaves, so instead of start-all you can run start-dfs with the normal slaves file and start-mapred with a different file given on the command line. J-D On Wed, Apr 1, 2009 at 3:58 AM, Sandhya E sandhyabhas...@gmail.com wrote: Hi When the host is listed in
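A sketch of the two-step start J-D describes, assuming the stock start scripts and a hypothetical conf/mapred-slaves file; HADOOP_SLAVES is the variable read by the slave-launching scripts in releases of this era, but verify it against your version:

  # HDFS daemons use the regular conf/slaves file
  bin/start-dfs.sh
  # MapReduce daemons use a different, hypothetical slaves file
  HADOOP_SLAVES=conf/mapred-slaves bin/start-mapred.sh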

Warning when using 2.6.27 (Was: Datanode goes missing, results in Too many open files in DFSClient)

2009-03-11 Thread Jean-Daniel Cryans
I found the solution here: http://pero.blogs.aprilmayjune.org/2009/01/22/hadoop-and-linux-kernel-2627-epoll-limits/ J-D On Fri, Mar 6, 2009 at 6:08 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: I know this one may be weird, but I'll give it a try. Thanks to anyone reading this through
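The linked fix, reconstructed here as an assumption since the post itself is not quoted, raises the per-user epoll limit that kernel 2.6.27 introduced (the low default of 128 instances is easily exhausted by a DataNode plus HBase):

  # /etc/sysctl.conf -- the value is illustrative, size it to your workload
  fs.epoll.max_user_instances = 4096
  # then apply without rebooting:
  sysctl -p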

Datanode goes missing, results in Too many open files in DFSClient

2009-03-06 Thread Jean-Daniel Cryans
I know this one may be weird, but I'll give it a try. Thanks to anyone reading this through. Setup: hadoop-0.19.0 with hbase-0.19.0 on 10 nodes, quad-cores with 8GB RAM and 2 disks. The nofile limit is set at 30,000, xceivers at 1023, dfs.datanode.socket.write.timeout at 0, dfs.datanode.handler.count at 9.
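For reference, the settings listed above map onto hadoop-site.xml properties roughly as follows (a sketch mirroring the quoted values; note the historically misspelled xcievers property name, and that the nofile limit itself is raised in /etc/security/limits.conf rather than in Hadoop's config):

  <property><name>dfs.datanode.max.xcievers</name><value>1023</value></property>
  <property><name>dfs.datanode.socket.write.timeout</name><value>0</value></property>
  <property><name>dfs.datanode.handler.count</name><value>9</value></property>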

Re: Running RowCounter as Standalone

2009-02-12 Thread Jean-Daniel Cryans
Philipp, For HBase-related questions, please post to hbase-u...@hadoop.apache.org. Try importing commons-cli-2.0-SNAPSHOT.jar as well as any other jar in the lib folder, just to be sure you won't get any other missing class def errors. J-D On Thu, Feb 12, 2009 at 6:32 PM, Philipp Dobrigkeit
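One hedged way to get those jars onto the classpath when running RowCounter standalone; HBASE_HOME, the jar name and the install path are hypothetical, adjust them to your layout:

  # make Hadoop's launcher see HBase and everything in its lib folder
  export HBASE_HOME=/opt/hbase-0.19.0
  export HADOOP_CLASSPATH=$HBASE_HOME/hbase-0.19.0.jar:$(echo $HBASE_HOME/lib/*.jar | tr ' ' ':')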

Re: HDFS losing blocks or connection error

2009-01-23 Thread Jean-Daniel Cryans
not using EBS, just HDFS between the machines. As for tasks, there are 4 mappers and 0 reducers. Richard J. Zak -Original Message- From: jdcry...@gmail.com [mailto:jdcry...@gmail.com] On Behalf Of Jean-Daniel Cryans Sent: Friday, January 23, 2009 13:24 To: core-user@hadoop.apache.org

Re: 5 node Hadoop Cluster!!! Some Doubts...

2008-12-15 Thread Jean-Daniel Cryans
Sid, For such a small cluster, just put the JobTracker and NameNode on the same machine and the TaskTrackers and DataNodes in pairs on the other machines. I can't think of anything else that would have an impact on performance for you. J-D On Thu, Dec 11, 2008 at 6:20 PM, Siddharth Malhotra
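Concretely, with hypothetical hostnames node1 through node5, the layout J-D suggests looks like this:

  # conf/slaves on node1 (the NameNode + JobTracker machine) -- hostnames are hypothetical
  node2
  node3
  node4
  node5

  # then, on node1:
  bin/start-dfs.sh     # NameNode here, DataNodes on the slaves
  bin/start-mapred.sh  # JobTracker here, TaskTrackers on the slaves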

Re: How to read mapreduce output in HDFS directory from Web Application

2008-11-02 Thread Jean-Daniel Cryans
Alex, it's an HBase design goal to be able to answer live non-relational queries. True, up to 0.18 performance was not the priority, but 0.19 will be MUCH faster. Also, more and more websites use HBase in a production environment; see http://wiki.apache.org/hadoop/Hbase/PoweredBy Regards the

Re: SecondaryNameNode on separate machine

2008-10-29 Thread Jean-Daniel Cryans
to do to eliminate NN SPOF? Thanks, Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jean-Daniel Cryans [EMAIL PROTECTED] To: core-user@hadoop.apache.org Sent: Tuesday, October 28, 2008 8:14:44 PM Subject: Re: SecondaryNameNode

Re: SecondaryNameNode on separate machine

2008-10-28 Thread Jean-Daniel Cryans
Tomislav, contrary to popular belief the secondary namenode does not provide failover; it's only used to do what is described here: http://hadoop.apache.org/core/docs/r0.18.1/hdfs_user_guide.html#Secondary+NameNode So the term secondary does not mean a second one but is more like a second part
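A hedged sketch of the checkpoint-related settings involved when the secondary runs on its own machine; the property names are from the 0.18-era configuration, and the hostname, path and period are illustrative:

  <!-- on the secondary namenode's hadoop-site.xml -->
  <property><name>dfs.http.address</name><value>namenode.example.com:50070</value></property>
  <property><name>fs.checkpoint.dir</name><value>/hadoop/dfs/namesecondary</value></property>
  <property><name>fs.checkpoint.period</name><value>3600</value></property>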

Re: Use of 'dfs.replication'

2008-10-12 Thread Jean-Daniel Cryans
Amit, dfs.replication defines how many times each block of data will be replicated. In your setup, if you're planning on keeping only one datanode, a value of 1 will reduce the overhead since keeping 2 or more copies of each block would be useless if you lose your node. More info on how
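The matching hadoop-site.xml entry, for reference (the value 1 corresponds to the single-datanode case discussed above):

  <property>
    <name>dfs.replication</name>
    <!-- one copy per block; with a single datanode a higher value cannot be honoured anyway -->
    <value>1</value>
  </property>

Note that this only applies to files written after the change; existing files keep their replication factor unless it is changed with hadoop dfs -setrep.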

Re: How to GET row name/column name in HBase using JAVA API

2008-10-06 Thread Jean-Daniel Cryans
Please use the HBase mailing list for HBase-related questions: http://hadoop.apache.org/hbase/mailing_lists.html#Users Regarding your question, have you looked at http://wiki.apache.org/hadoop/Hbase/HbaseRest ? J-D On Mon, Oct 6, 2008 at 12:05 AM, Trinh Tuan Cuong [EMAIL PROTECTED] wrote:

Re: Master Recommended Hardware

2008-09-05 Thread Jean-Daniel Cryans
Camilo, See http://wiki.apache.org/hadoop/NameNode and see the discussion "NameNode Hardware specs" started here: http://www.mail-archive.com/core-user@hadoop.apache.org/msg04109.html This should give you the basics. Regards, J-D On Fri, Sep 5, 2008 at 10:31 PM, Camilo Gonzalez [EMAIL

Re: Hadoop + Elastic Block Stores

2008-09-05 Thread Jean-Daniel Cryans
Ryan, I currently have a Hadoop/HBase setup that uses EBS. It works, but using EBS implies an additional configuration overhead (too bad you can't spawn instances with volumes already attached to them, though I'm sure that'll come). Shutting down instances and bringing others up also requires more
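For readers unfamiliar with the workflow being described, attaching and mounting a volume with the EC2 API tools of that period looked roughly like this; volume and instance IDs, device name and mount point are all hypothetical:

  # attach an existing EBS volume to a running instance
  ec2-attach-volume vol-12345678 -i i-87654321 -d /dev/sdh
  # on the instance: mount it and point dfs.data.dir (hadoop-site.xml) at the mount point
  mkdir -p /mnt/ebs && mount /dev/sdh /mnt/ebs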

Re: How do specify certain IP to be used by datanode/namenode

2008-09-05 Thread Jean-Daniel Cryans
Kevin, Did you try changing the dfs.datanode.dns.interface/dfs.datanode.dns.nameserver/mapred.tasktracker.dns.interface/mapred.tasktracker.dns.nameserver parameters? J-D On Fri, Sep 5, 2008 at 8:14 PM, Kevin [EMAIL PROTECTED] wrote: Hi, The machines I am using each has multiple network
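A sketch of how those parameters are set, assuming (hypothetically) that the desired traffic should go through eth1; "default" is the stock value for the nameserver properties:

  <property><name>dfs.datanode.dns.interface</name><value>eth1</value></property>
  <property><name>dfs.datanode.dns.nameserver</name><value>default</value></property>
  <property><name>mapred.tasktracker.dns.interface</name><value>eth1</value></property>
  <property><name>mapred.tasktracker.dns.nameserver</name><value>default</value></property>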

Re: help! how can i control special data to specific datanode?

2008-09-05 Thread Jean-Daniel Cryans
Hi, I suggest that you read how data is stored in HDFS, see http://hadoop.apache.org/core/docs/r0.18.0/hdfs_design.html J-D On Sat, Sep 6, 2008 at 12:11 AM, ZhiHong Fu [EMAIL PROTECTED] wrote: hello. I'm a new user to hadoop. and Now I hava a problem in understanding Hdfs. In such a scene.

Re: why it doesn`t run?

2008-05-19 Thread Jean-Daniel Cryans
Hi wangxiaowei, just chmod the file to get execution rights. You should also use hadoop-0.16.4, because otherwise that will be your next problem. Finally, problems regarding HBase should be sent to its mailing list: http://hadoop.apache.org/hbase/mailing_lists.html Regards, Jean-Daniel 2008/5/19
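For completeness, granting execute permission is a one-liner; the script path is hypothetical since the original message does not name the file:

  chmod +x bin/start-hbase.sh   # or whichever script fails with a Permission denied error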

Re: Does any one tried to build Hadoop..

2008-04-10 Thread Jean-Daniel Cryans
At the root of the source and it's called build.xml Jean-Daniel 2008/4/9, Khalil Honsali [EMAIL PROTECTED]: Mr. Jean-Daniel, where is the ant script please? On 10/04/2008, Jean-Daniel Cryans [EMAIL PROTECTED] wrote: The ANT script works well also. Jean-Daniel 2008/4/9, Khalil
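For anyone following along, building from a source checkout of that era was just a matter of running Ant from the top-level directory; the targets below are the commonly used ones and are assumptions to verify against the checked-out build.xml:

  cd hadoop-trunk   # wherever the source was checked out
  ant               # compile
  ant jar           # build the hadoop core jar under build/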

Re: Does any one tried to build Hadoop..

2008-04-09 Thread Jean-Daniel Cryans
The ANT script works well also. Jean-Daniel 2008/4/9, Khalil Honsali [EMAIL PROTECTED]: Hi, With Eclipse it's easy: you just have to add it as a new project, make sure you add all the libraries in the lib folder, and it should compile fine. There is also an Eclipse plugin for running Hadoop jobs directly

Re: specifying Hadoop disk space

2008-02-15 Thread Jean-Daniel Cryans
Hi, Have you read: http://wiki.apache.org/hadoop/QuickStart Stage 3, second bullet? Regards, jdcryans 2008/2/15, Chandran, Sathish [EMAIL PROTECTED]: Hi all, Can you help me out with the following? Normally Hadoop takes whatever free disk space is available on the machine. But I want to
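If the goal is to cap how much space HDFS may use, the usual knobs of that era were dfs.data.dir (point it at a dedicated partition) and dfs.datanode.du.reserved; a hedged sketch with a hypothetical path and size:

  <property>
    <name>dfs.data.dir</name>
    <!-- hypothetical dedicated partition for HDFS blocks -->
    <value>/data1/hdfs</value>
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <!-- keep 10 GB per volume free for non-HDFS use (value is in bytes) -->
    <value>10737418240</value>
  </property>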

Re: Confusing connection issue with client

2008-02-03 Thread Jean-Daniel Cryans
Hi, I guess you're using the SVN version. Are you running your test on the master node or remotely? jdcryans 2008/2/3, Cass Costello [EMAIL PROTECTED]: Hey all, I'm just starting with both Hadoop and HBase. I've created a 3-node cluster - 1 master and 2 slaves. I've had some fun in the