Can someone please help me with how to go about debugging this issue? The NN
log has the following error stack:
2013-09-30 07:28:42,768 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system
started
2013-09-30 07:28:42,967 INFO
Hello,
the file format topic is still confusing me, and I would appreciate it if you
could share your thoughts and experience.
From reading different books/articles/websites I understand that
- Sequence files (used frequently but not only for binary data),
- AVRO,
- RC (was developed to work
Sorry, just trying to cancel my mail.
I use CDH 4.3.1. When I start the datanode, I get the error below:
2013-09-26 17:57:07,803 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Opened info server at
0.0.0.0:40075
2013-09-26 17:57:07,814 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: dfs.webhdfs.enabled = false
2013-09-26
I would like to add new machines to my existing cluster, but they won't be
similar to the current nodes. I have two scenarios in mind:
1. What are the implications (besides initial load balancing) of adding a
new node to the cluster, if this node runs on a machine similar to all the
other nodes?
Is there a build.xml available for using fault injection with Hadoop, as this
tutorial describes?
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/FaultInjectFramework.html#Aspect_Example
I cannot find the jar file for org.apache.hadoop.fi.ProbabilityModel and
Sequence files are language neutral like Avro? Yes, but I'm not sure about
library support in other languages for processing seq files.
Thanks,
Rahul
On Mon, Sep 30, 2013 at 11:10 PM, Peyman Mohajerian <mohaj...@gmail.com> wrote:
It is not recommended to keep data at rest in sequence format,
Hi,
Our Hadoop cluster is running 0.20.203. The cluster currently has a 'Map Task
Capacity' of 8900+ and a 'Reduce Task Capacity' of 3300+, resulting in a ratio
of 2.7. We run a wide variety of jobs and want to increase the
throughput.
My manual observation was that we hit the Mapper
Hi Himanshu,
Changing the ratio is definitely a reasonable thing to do. The capacities
come from the mapred.tasktracker.map.tasks.maximum
and mapred.tasktracker.reduce.tasks.maximum TaskTracker configuration properties.
You can tweak these on your nodes to get your desired ratio.
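For reference, these two properties live in mapred-site.xml on each TaskTracker. A minimal illustrative fragment (the values below are made-up examples for a hypothetical node, not recommendations; pick them from your own slot math):

```xml
<!-- mapred-site.xml: per-TaskTracker slot counts (illustrative values) -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>
```

A restart of the TaskTracker is needed for the new slot counts to take effect.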
-Sandy
On Mon, Sep
I don't know exactly what you are trying to do, but it seems like memory is
your bottleneck and you think you have enough CPU resources, so you want to
use multiple threads to utilize the CPU?
You can start multiple threads in your mapper, if you think your mapper logic
is very CPU
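As a sketch of that idea (plain Java only, no Hadoop classes; the class and the workload below are made up for illustration): a CPU-bound per-record computation can be fanned out over an ExecutorService, which is the same pattern you would embed inside a mapper's run() method. Hadoop also ships org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper for exactly this case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: the fan-out pattern a CPU-bound mapper
// could use internally. All names here are hypothetical.
public class ParallelMapSketch {

    // Stand-in for a CPU-heavy per-record computation.
    static int expensiveTransform(int record) {
        return record * record;
    }

    public static void main(String[] args) throws Exception {
        List<Integer> records = List.of(1, 2, 3, 4);
        ExecutorService pool = Executors.newFixedThreadPool(4);

        // Submit each record to the pool; collecting futures in input
        // order keeps the emitted results deterministic.
        List<Future<Integer>> futures = new ArrayList<>();
        for (Integer r : records) {
            futures.add(pool.submit(() -> expensiveTransform(r)));
        }
        for (Future<Integer> f : futures) {
            System.out.println(f.get());
        }
        pool.shutdown();
    }
}
```

Run as-is, this prints 1, 4, 9, 16 on separate lines, in input order. Note that threading only helps if the per-record work dominates; for I/O-bound mappers it usually just adds contention.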
I am also thinking about this for my current project, so here I share some of
my thoughts, but maybe some of them are not correct.
1) In my previous projects years ago, we stored a lot of data as plain text,
since at that time people thought big data could store all the data, no need to
worry
Hi
What is the use case difference between:
- DFSInputStream and HdfsDataInputStream
- DFSOutputStream and HdfsDataOutputStream
When should one be preferred over the other? From the sources, I see they have
similar functionality; only HdfsData*Stream extends Data*Stream instead
of *Stream. Also, is
Hi,
I am using Hadoop 1.0.2 and have written a MapReduce job. I have a requirement
to process the whole file without splitting, so I have written a new input
format that processes the file as a whole by overriding the isSplitable() method.
I have also created a new RecordReader implementation to
Hi Rob,
DFSInputStream: the InterfaceAudience of this class is private, so you should
not use it directly. It mainly implements the actual core read
functionality, and it is a DFS-specific implementation only.
HdfsDataInputStream: the InterfaceAudience of this class is public, and