Re: Hadoop cluster hardware details for big data

2011-07-07 Thread Karthik Kumar
Hi,

Thanks a lot for your timely help. Your answers helped us understand what
kind of hardware to use when dealing with huge volumes of data.

With Regards,
Karthik

On 7/6/11, Steve Loughran ste...@apache.org wrote:
 On 06/07/11 13:18, Michel Segel wrote:
 Wasn't the answer 42?  ;-P


 42 = 40 + NN + 2ary NN, assuming the JT runs on the 2ary NN or on one of
 the worker nodes

 Looking at your calc...
 You forgot to factor in the number of slots per node.
 So the number is only a fraction. Assume 10 slots per node. (10 because it
 makes the math easier.)
   
 I thought something was wrong. Then I thought of the server revenue and
 decided not to look that hard.
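
(Reading the arithmetic above: the 42 machines are 40 workers plus the
NameNode and secondary NameNode, and with the assumed 10 map/reduce slots per
worker those 40 workers provide roughly 400 concurrent task slots, so if you
size by task slots rather than by machines, the machine count is only a
fraction of the slot count.)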



-- 
With Regards,
Karthik


Hadoop cluster hardware details for big data

2011-07-06 Thread Karthik Kumar
Hi,

Has anyone here used Hadoop to process more than 3 TB of data? If so, we
would like to know how many machines you used in your cluster and what
hardware configuration they had. The objective is to learn how to handle
huge data sets in a Hadoop cluster.
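
A rough illustration of the sizing arithmetic involved, with assumed numbers
only, not taken from any particular cluster:

  3 TB of input x 3-way HDFS replication          ~ 9 TB of raw HDFS capacity
  plus headroom for intermediate/temporary data   ~ 12 TB of raw capacity
  at, say, 4 x 1 TB disks per worker node         ~ 12 or more worker nodes

In practice clusters are usually sized for throughput (spindles, cores,
network) rather than raw capacity, so real deployments tend to be larger.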

-- 
With Regards,
Karthik


Re: Hadoop cluster hardware details for big data

2011-07-06 Thread Karthik Kumar
Hi,

I wanted to know the time required to process huge data sets and the number
of machines used to process them.

On 7/6/11, Harsh J ha...@cloudera.com wrote:
 Have you taken a look at http://wiki.apache.org/hadoop/PoweredBy? It
 contains information relevant to your question, if not a detailed
 answer.

 On Wed, Jul 6, 2011 at 4:13 PM, Karthik Kumar karthik84ku...@gmail.com
 wrote:
 Hi,

 Has anyone here used Hadoop to process more than 3 TB of data? If so, we
 would like to know how many machines you used in your cluster and what
 hardware configuration they had. The objective is to learn how to handle
 huge data sets in a Hadoop cluster.

 --
 With Regards,
 Karthik




 --
 Harsh J



-- 
With Regards,
Karthik


Log files expanding at an alarming rate

2011-05-23 Thread Karthik Kumar
Hi,

I am using a small cluster of one master node and two slaves. The TaskTracker
logs on the slaves are growing by approximately 1 MB per second, so disk space
runs low over time. This happens even when no job is running. Please suggest
ways to slow down the growth of the log files. I would also like to know why
the log files are being written when no jobs are running.
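
A minimal sketch of the kind of change that usually caps this, assuming the
stock conf/log4j.properties shipped with 0.20 (where a size-based
RollingFileAppender named RFA is already defined; names and defaults may
differ between versions):

  # conf/log4j.properties: bound the size and number of rolled log files
  log4j.appender.RFA.MaxFileSize=10MB
  log4j.appender.RFA.MaxBackupIndex=10

  # route daemon logging through RFA instead of the date-based DRFA;
  # if bin/hadoop-daemon.sh hard-codes HADOOP_ROOT_LOGGER, change it there
  export HADOOP_ROOT_LOGGER=INFO,RFA

It is also worth checking what the TaskTracker is actually writing: roughly
1 MB per second on an idle cluster usually means some class is logging at
DEBUG level or an error is being retried in a tight loop, and raising that
particular logger's level in log4j.properties is a more targeted fix.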

-- 
With Regards,
Karthik


Re: Hadoop in Real time applications

2011-02-17 Thread Karthik Kumar
Hi,

Thanks for the clarification.

On Thu, Feb 17, 2011 at 2:09 PM, Niels Basjes ni...@basjes.nl wrote:

 2011/2/17 Karthik Kumar karthik84ku...@gmail.com:
 Can Hadoop be used for real-time applications such as banking
 solutions?

 Hadoop consists of several components.
 Components like HDFS and HBase are quite suitable for interactive
 solutions (as in: "I usually get an answer within 0.x seconds").
 If you really need realtime (as in: "I want a guarantee that I have
 an answer within 0.x seconds"), the answer is no: HDFS/HBase cannot
 guarantee that.
 Other components like MapReduce (and Hive, which runs on top of
 MapReduce) are purely batch-oriented.

 --
 Met vriendelijke groeten,

 Niels Basjes




-- 
With Regards,
Karthik


Hadoop in Real time applications

2011-02-16 Thread Karthik Kumar
Can Hadoop be used for real-time applications such as banking solutions?

-- 
With Regards,
Karthik


Cannot copy files to HDFS

2011-01-26 Thread Karthik Kumar
Hi,

I am new to Hadoop and am using version 0.20.2. I tried to copy a 300 MB file
from the local file system to HDFS, and it failed with the error below. Please
help me solve this issue.

11/01/26 13:01:52 WARN hdfs.DFSClient: DataStreamer Exception:
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
    at sun.nio.ch.IOUtil.write(IOUtil.java:75)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)

11/01/26 13:01:52 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 bad datanode[0] 160.110.184.114:50010
11/01/26 13:01:52 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 in pipeline 160.110.184.114:50010, 160.110.184.111:50010: bad datanode 160.110.184.114:50010
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 failed because recovery from primary datanode 160.110.184.111:50010 failed 1 times. Pipeline was 160.110.184.114:50010, 160.110.184.111:50010. Will retry...
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 bad datanode[0] 160.110.184.114:50010
11/01/26 13:01:55 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1012 in pipeline 160.110.184.114:50010, 160.110.184.111:50010: bad datanode 160.110.184.114:50010
11/01/26 13:02:28 WARN hdfs.DFSClient: DataStreamer Exception:
java.io.IOException: An existing connection was forcibly closed by the remote host
    at sun.nio.ch.SocketDispatcher.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:33)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:104)
    at sun.nio.ch.IOUtil.write(IOUtil.java:75)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:334)
    at org.apache.hadoop.net.SocketOutputStream$Writer.performIO(SocketOutputStream.java:55)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:146)
    at org.apache.hadoop.net.SocketOutputStream.write(SocketOutputStream.java:107)
    at java.io.BufferedOutputStream.write(BufferedOutputStream.java:105)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2314)

11/01/26 13:02:28 WARN hdfs.DFSClient: Error Recovery for block blk_4184614741505116937_1013 bad datanode[0] 160.110.184.111:50010
copyFromLocal: All datanodes 160.110.184.111:50010 are bad. Aborting...
11/01/26 13:02:28 ERROR hdfs.DFSClient: Exception closing file /hdfs/data/input/cdr10M.csv : java.io.IOException: All datanodes 160.110.184.111:50010 are bad. Aborting...
java.io.IOException: All datanodes 160.110.184.111:50010 are bad. Aborting...
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2556)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2102)
    at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2265)
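
A few generic checks that usually narrow this kind of failure down; none of
them are specific to this trace:

  hadoop dfsadmin -report                    # are both datanodes live, with free space?
  hadoop fsck /                              # overall HDFS health
  tail -100 logs/hadoop-*-datanode-*.log     # what the datanodes themselves logged

The "existing connection was forcibly closed" text is the Windows form of a
connection reset, so a firewall or antivirus resetting traffic on the datanode
transfer port (50010 by default) is also worth ruling out.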


-- 
With Regards,
Karthik


Re: Task tracker and Data node not stopping

2010-07-19 Thread Karthik Kumar
Hi Ken,

Thank you for your quick reply. I don't know how to find the process that
was overwriting those files. Anyhow, I re-installed Cygwin from scratch and
the problem is solved.

On Thu, Jul 15, 2010 at 9:49 PM, Ken Goodhope kengoodh...@gmail.com wrote:

 Inside hadoop-env.sh, you will see a property that sets the directory for
 pids to be written to. Check which directory it is and then investigate
 the possibility that some other process is deleting or overwriting those
 files. If you are using NFS, with all nodes pointing at the same directory,
 then it might be a matter of each node overwriting the same file.

 Either way, the stop scripts look for those pid files and use them to stop
 the correct daemon. If they are not found, or if a file contains the wrong
 pid, the script will echo that there is no process to stop.
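
(For reference, a minimal sketch of that setting: the variable is
HADOOP_PID_DIR in conf/hadoop-env.sh, and its default is under /tmp, where
periodic cleanup can silently remove the pid files. The path below is only
an example.

  # conf/hadoop-env.sh on every node
  export HADOOP_PID_DIR=/var/hadoop/pids
)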

 On Thu, Jul 15, 2010 at 4:51 AM, Karthik Kumar karthik84ku...@gmail.com
 wrote:

  Hi,
 
  I am using a cluster of two machines, one master and one slave. When I
  try to stop the cluster using stop-all.sh, it displays the output below.
  The TaskTracker and DataNode on the slave are also not stopped. Please
  help me in solving this.
 
  stopping jobtracker
  160.110.150.29: no tasktracker to stop
  stopping namenode
  160.110.150.29: no datanode to stop
  localhost: stopping secondarynamenode
 
 
  --
  With Regards,
  Karthik
 




-- 
With Regards,
Karthik


Task tracker and Data node not stopping

2010-07-15 Thread Karthik Kumar
Hi,

  I am using a cluster of two machines, one master and one slave. When I
try to stop the cluster using stop-all.sh, it displays the output below. The
TaskTracker and DataNode on the slave are also not stopped. Please help me
in solving this.

stopping jobtracker
160.110.150.29: no tasktracker to stop
stopping namenode
160.110.150.29: no datanode to stop
localhost: stopping secondarynamenode
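
(In the meantime, the stray daemons on the slave can be found and stopped by
hand; the pids below are examples only:

  jps               # lists the running Hadoop JVMs, e.g. "2345 TaskTracker" and "2398 DataNode"
  kill 2345 2398    # substitute the pids that jps actually prints
)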


-- 
With Regards,
Karthik