unable to restart namenode on hadoop 1.0.4

2013-09-30 Thread Ravi Shetye
Can someone please help me with how to go about debugging this issue? The NN log has the following error stack: 2013-09-30 07:28:42,768 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started 2013-09-30 07:28:42,967 INFO

File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Wolfgang Wyremba
Hello, the file format topic is still confusing me and I would appreciate it if you could share your thoughts and experience with me. From reading different books/articles/websites I understand that - Sequence files (used frequently but not only for binary data), - AVRO, - RC (was developed to work

cmsg cancel c73cc320-20bf-48b3-baa4-61597d7e5...@xplosion.de

2013-09-30 Thread Fabian Zimmermann
sorry, just trying to cancel my mail

NullPointerException when start datanode

2013-09-30 Thread lei liu
I use CDH-4.3.1. When I start the datanode, I get the error below: 2013-09-26 17:57:07,803 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at 0.0.0.0:40075 2013-09-26 17:57:07,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false 2013-09-26

Add machine with bigger storage to cluster

2013-09-30 Thread Amit Sela
I would like to add new machines to my existing cluster, but they won't be similar to the current nodes. I have two scenarios I'm thinking of: 1. What are the implications (besides initial load balancing) of adding a new node to the cluster, if this node runs on a machine similar to all other nodes

Hadoop Fault Injection example

2013-09-30 Thread Felipe Gutierrez
Is there a build.xml available to use fault injection with Hadoop as this tutorial says? http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html#Aspect_Example I cannot find the jar file for org.apache.hadoop.fi.ProbabilityModel and

Re: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread Rahul Bhattacharjee
Sequence files are language neutral like Avro. Yes, but I'm not sure about the support in other languages' libraries for processing sequence files. Thanks, Rahul On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian mohaj...@gmail.com wrote: It is not recommended to keep the data at rest in sequence format,

Cluster config: Mapper:Reducer Task Capacity

2013-09-30 Thread Himanshu Vijay
Hi, our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map Task Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in a ratio of about 2.7. We run a wide variety of jobs and want to increase throughput. My manual observation was that we hit the Mapper

Re: Cluster config: Mapper:Reducer Task Capacity

2013-09-30 Thread Sandy Ryza
Hi Himanshu, Changing the ratio is definitely a reasonable thing to do. The capacities come from the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum tasktracker configurations. You can tweak these on your nodes to get your desired ratio. -Sandy On Mon, Sep
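
For reference (not quoted from Sandy's reply), these are per-node settings in mapred-site.xml on each TaskTracker, picked up after the TaskTracker is restarted; the slot counts below are placeholder examples, not recommendations:

<!-- mapred-site.xml on each TaskTracker node -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>   <!-- example: map slots per node -->
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>3</value>   <!-- example: reduce slots per node; adjust to reach the desired map:reduce ratio -->
</property>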

RE: All datanodes are bad IOException when trying to implement multithreading serialization

2013-09-30 Thread java8964 java8964
I don't know exactly what you are trying to do, but it seems like memory is your bottleneck, and you think you have enough CPU resources, so you want to use multiple threads to utilize the CPU? You can start multiple threads in your mapper if you think your mapper logic is very CPU
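
As one hedged illustration of the multi-threaded-mapper idea (a generic sketch, not the code discussed in the thread): Hadoop ships org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper, which runs several instances of a CPU-heavy, thread-safe mapper inside one map task. The CpuHeavyMapper class and the thread count of 4 below are made-up placeholders.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedMapExample {

    // Placeholder mapper standing in for CPU-intensive per-record work; it must be thread safe.
    public static class CpuHeavyMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // ... expensive computation on the record would go here ...
            context.write(value, new LongWritable(1L));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "cpu-heavy-map");
        job.setMapperClass(MultithreadedMapper.class);               // wrapper that spawns the threads
        MultithreadedMapper.setMapperClass(job, CpuHeavyMapper.class);
        MultithreadedMapper.setNumberOfThreads(job, 4);              // example thread count per map task
        // ... input/output formats, paths, reducer setup and job submission omitted ...
    }
}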

RE: File formats in Hadoop: Sequence files vs AVRO vs RC vs ORC

2013-09-30 Thread java8964 java8964
I am also thinking about this for my current project, so here I share some of my thoughts, though some of them may not be correct. 1) In previous projects years ago, we stored a lot of data as plain text, as at that time people thought big data could store all the data, no need to worry

When to use DFSInputStream and HdfsDataInputStream

2013-09-30 Thread Rob Blah
Hi, what is the use-case difference between: - DFSInputStream and HdfsDataInputStream - DFSOutputStream and HdfsDataOutputStream? When should one be preferred over the other? From the sources I see they have similar functionality, only HdfsData*Stream follows Data*Stream instead of *Stream. Also is

Question on BytesWritable

2013-09-30 Thread Chandra Mohan, Ananda Vel Murugan
Hi, I am using Hadoop 1.0.2. I have written a MapReduce job. I have a requirement to process the whole file without splitting, so I have written a new input format to process the file as a whole by overriding the isSplitable() method. I have also created a new RecordReader implementation to
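
For illustration only (a generic sketch along the lines of the well-known whole-file example, not the poster's actual code), a non-splittable input format whose RecordReader hands each file to the mapper as a single BytesWritable can look roughly like this:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Presents each input file as one record: (NullWritable, BytesWritable holding the whole file).
public class WholeFileInputFormat extends FileInputFormat<NullWritable, BytesWritable> {

    @Override
    protected boolean isSplitable(JobContext context, Path file) {
        return false; // never split, so a single mapper sees the complete file
    }

    @Override
    public RecordReader<NullWritable, BytesWritable> createRecordReader(
            InputSplit split, TaskAttemptContext context) {
        return new WholeFileRecordReader();
    }

    public static class WholeFileRecordReader extends RecordReader<NullWritable, BytesWritable> {
        private FileSplit split;
        private Configuration conf;
        private final BytesWritable value = new BytesWritable();
        private boolean processed = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            this.split = (FileSplit) split;
            this.conf = context.getConfiguration();
        }

        @Override
        public boolean nextKeyValue() throws IOException {
            if (processed) {
                return false;
            }
            // Read the entire file into one byte array and wrap it in the BytesWritable.
            byte[] contents = new byte[(int) split.getLength()];
            Path file = split.getPath();
            FSDataInputStream in = null;
            try {
                in = file.getFileSystem(conf).open(file);
                IOUtils.readFully(in, contents, 0, contents.length);
                value.set(contents, 0, contents.length);
            } finally {
                IOUtils.closeStream(in);
            }
            processed = true;
            return true;
        }

        @Override public NullWritable getCurrentKey() { return NullWritable.get(); }
        @Override public BytesWritable getCurrentValue() { return value; }
        @Override public float getProgress() { return processed ? 1.0f : 0.0f; }
        @Override public void close() { }
    }
}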

RE: When to use DFSInputStream and HdfsDataInputStream

2013-09-30 Thread Uma Maheswara Rao G
Hi Rob, DFSInputStream: the InterfaceAudience for this class is private, so you should not use this class directly. This class mainly implements the actual core read functionality, and it is a DFS-specific implementation only. HdfsDataInputStream: the InterfaceAudience for this class is public and
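
A minimal sketch of the public-API route (method names as recalled from the HDFS client API, and the file path is a placeholder): open the file through FileSystem as usual, and only cast to HdfsDataInputStream when the HDFS-specific extras are needed.

import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;
import org.apache.hadoop.hdfs.protocol.LocatedBlock;

public class ReadViaPublicApi {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path path = new Path("/tmp/example.txt");   // placeholder path
        FSDataInputStream in = fs.open(path);       // the stable, public way to read a file

        // On HDFS the returned stream is backed by the HDFS-specific subclass, which exposes
        // a few extra read-side details without touching the private DFSInputStream.
        if (in instanceof HdfsDataInputStream) {
            HdfsDataInputStream hdfsIn = (HdfsDataInputStream) in;
            long visible = hdfsIn.getVisibleLength();            // bytes currently readable
            List<LocatedBlock> blocks = hdfsIn.getAllBlocks();   // block locations of the file
            System.out.println("visible length: " + visible + ", blocks: " + blocks.size());
        }
        in.close();
    }
}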