Ben,
that's defined in ReplicationTargetChooser: first local, second same rack, then random.
You're right - it's 50/50 if cases one and two don't match.
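Purely as illustration (the class and method below are mine, not the actual HDFS code), the tie-break among equally distant replicas amounts to something like:

import java.util.List;
import java.util.Random;

// Illustrative sketch only (not the actual ReplicationTargetChooser code):
// prefer a local replica, otherwise pick uniformly at random among the
// remaining equally distant replicas - hence the 50/50 split between two.
class ReplicaPick {
    private static final Random RANDOM = new Random();

    static String pickReplica(String clientHost, List<String> replicaHosts) {
        if (replicaHosts.contains(clientHost)) {
            return clientHost;  // case 1: a replica sits on the reader's own node
        }
        // case 2 (same rack) omitted here; with no rack match we fall through
        // to a uniformly random pick among whatever is left.
        return replicaHosts.get(RANDOM.nextInt(replicaHosts.size()));
    }
}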
- Alex
--
Alexander Lorenz
http://mapredit.blogspot.com
On Jan 6, 2012, at 11:56 AM, Ben Clay wrote:
> Alex-
>
> Understood. We do not have a situation
I would use a MapReduce job to merge them.
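A rough sketch of such a job, assuming a recent avro-mapred (the org.apache.avro.mapreduce API), a GenericRecord schema available as a file, and a single reducer so everything funnels into one output file - note the records come out re-sorted by the Avro ordering rather than in arrival order, and all class names below are mine:

import java.io.File;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyInputFormat;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AvroMergeJob {

  // Pass every record through unchanged.
  public static class PassMapper
      extends Mapper<AvroKey<GenericRecord>, NullWritable, AvroKey<GenericRecord>, NullWritable> {
    @Override
    protected void map(AvroKey<GenericRecord> key, NullWritable value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(key, value);
    }
  }

  // A single reducer funnels everything into one output file.
  public static class PassReducer
      extends Reducer<AvroKey<GenericRecord>, NullWritable, AvroKey<GenericRecord>, NullWritable> {
    @Override
    protected void reduce(AvroKey<GenericRecord> key, Iterable<NullWritable> values, Context ctx)
        throws IOException, InterruptedException {
      for (NullWritable v : values) {   // keep duplicates, one write per occurrence
        ctx.write(key, v);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // args: <schema file> <input glob> <output dir>
    Schema schema = new Schema.Parser().parse(new File(args[0]));
    Job job = new Job(new Configuration(), "avro-merge");
    job.setJarByClass(AvroMergeJob.class);
    job.setInputFormatClass(AvroKeyInputFormat.class);
    job.setOutputFormatClass(AvroKeyOutputFormat.class);
    AvroJob.setInputKeySchema(job, schema);
    AvroJob.setMapOutputKeySchema(job, schema);   // makes AvroKey shuffle-able
    AvroJob.setOutputKeySchema(job, schema);
    job.setMapOutputValueClass(NullWritable.class);
    job.setOutputValueClass(NullWritable.class);
    job.setMapperClass(PassMapper.class);
    job.setReducerClass(PassReducer.class);
    job.setNumReduceTasks(1);                     // one merged .avro file out
    FileInputFormat.addInputPath(job, new Path(args[1]));
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

With real volumes you'd raise the reducer count and live with a handful of output files instead of exactly one.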
-Joey
On Fri, Jan 6, 2012 at 11:55 AM, Frank Grimes wrote:
> Hi Joey,
>
> That's a very good suggestion and might suit us just fine.
>
> However, many of the files will be much smaller than the HDFS block size.
> That could affect the performance of the
I was exploring .har based hadoop archive files for a similar small log file
scenario I have. I have millions of log files which are less than 64MB each
and I want to put them into HDFS and run analysis. Still exploring whether HDFS
is a good option. Traditionally what I have learnt is that HDFS isn't g
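For what it's worth, this is roughly how I've been poking at an archive from Java - the path is made up, and I'm assuming the har:// scheme resolves against the default filesystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// List the files packed into a .har archive through the har:// filesystem.
public class HarList {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Made-up path: an archive of one month of logs, browsing one hour inside it.
    Path inside = new Path("har:///user/logs/2012-01.har/01/00");
    FileSystem fs = inside.getFileSystem(conf);
    for (FileStatus status : fs.listStatus(inside)) {
      System.out.println(status.getPath() + "\t" + status.getLen());
    }
  }
}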
Frank,
We have a very serious small file problem. I created an M/R job that combines
files, as it seemed best to use all the resources of the cluster rather than
opening a stream and combining files single-threaded or trying to do something
via the command line.
Dave
-Original Message-
Fr
Hi Joey,
That's a very good suggestion and might suit us just fine.
However, many of the files will be much smaller than the HDFS block size.
That could affect the performance of the MapReduce jobs, correct?
Also, from my understanding it would put more burden on the name node (memory
usage) tha
Alex-
Understood. We do not have a situation that extreme; I was just looking for
conceptual verification that reads are balanced across replicas of equal
distance. From the PDF you linked:
"For reading, the name node first checks if the client's computer is located
in the cluster. If yes, block
I would do it by staging the machine data into a temporary directory
and then renaming the directory when it's been verified. So, data
would be written into directories like this:
2012-01/02/00/stage/machine1.log.avro
2012-01/02/00/stage/machine2.log.avro
2012-01/02/00/stage/machine3.log.avro
Aft
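With the FileSystem API the promotion step is only a couple of calls; a sketch (the absolute paths, the "ready" name, and the verification check are placeholders, not a real implementation):

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Stage-then-rename: downstream jobs only see the hour once it has been
// promoted, so they never read a partially written directory.
public class PromoteHour {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path stage = new Path("/logs/2012-01/02/00/stage");  // made-up absolute version of the layout above
    Path ready = new Path("/logs/2012-01/02/00/ready");  // "ready" is a placeholder name
    if (hourLooksComplete(fs, stage) && !fs.rename(stage, ready)) {
      throw new IOException("rename failed for " + stage);
    }
  }

  // Placeholder check - e.g. compare file counts/sizes against what the machines report.
  static boolean hourLooksComplete(FileSystem fs, Path dir) throws IOException {
    return fs.exists(dir) && fs.listStatus(dir).length > 0;
  }
}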
Ben,
that scenario should not happen; if one DN has 20 clients and the other zero
(for the same block), the cluster (or DN) has another problem. Rack awareness is
described here:
https://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf
- Alex
--
Alexander Lorenz
http://mapr
Hi Bobby,
Actually, the problem we're trying to solve is one of completeness.
Say we have 3 machines generating log events and putting them into HDFS on an
hourly basis.
e.g.
2012-01/01/00/machine1.log.avro
2012-01/01/00/machine2.log.avro
2012-01/01/00/machine3.log.avro
Sometime after the hour,
Stuti, define in CLASSPATH="…." only the jars you really need. Exporting all
the jars in a given directory (done with *.jar) is a red flag.
- Alex
On Jan 6, 2012, at 7:23 AM, M. C. Srivas wrote:
>
> unique: 1, error: 0 (Success), outsize: 40
> unique: 2, opcode: GETATTR (3), nodeid: 1, i
Frank,
That depends on what you mean by combining. It sounds like you are trying to
aggregate data from several days, which may involve doing a join, so I would say
a MapReduce job is your best bet. If you are not going to do any processing at
all, then why are you trying to combine them? Is th
Hi All,
I was wondering if there was an easy way to combine multiple .avro files
efficiently.
e.g. combining multiple hours of logs into a daily aggregate
Note that our Avro schema might evolve to have new (nullable) fields added but
no fields will be removed.
I'd like to avoid needing to pull
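For reference, the single-process version of what I'm after would look roughly like the sketch below, assuming we always write with the newest schema and any added nullable fields have defaults so older files still resolve:

import java.io.File;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

// Concatenate several .avro files into one, re-encoding every record with the
// newest (reader) schema so files written before a nullable field was added
// still resolve cleanly.
public class AvroConcat {
  public static void main(String[] args) throws Exception {
    // args: <schema file> <output file> <input file> [<input file> ...]
    Schema readerSchema = new Schema.Parser().parse(new File(args[0]));
    DataFileWriter<GenericRecord> writer = new DataFileWriter<GenericRecord>(
        new GenericDatumWriter<GenericRecord>(readerSchema));
    writer.create(readerSchema, new File(args[1]));
    for (int i = 2; i < args.length; i++) {
      DataFileReader<GenericRecord> reader = new DataFileReader<GenericRecord>(
          new File(args[i]), new GenericDatumReader<GenericRecord>(readerSchema));
      for (GenericRecord record : reader) {
        writer.append(record);
      }
      reader.close();
    }
    writer.close();
  }
}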
> unique: 1, error: 0 (Success), outsize: 40
> unique: 2, opcode: GETATTR (3), nodeid: 1, insize: 56
> Error occurred during initialization of VM
> java/lang/NoClassDefFoundError: java/lang/Object
>
> Exported Environment Variable:
>
>
> CLASSPATH="/root/FreshMount/hadoop-0.20.2/lib/*.jar:/root/F
Hi,
I am relatively new to Hadoop and I am trying to utilize HDFS for my own
application, where I want to take advantage of the data partitioning HDFS
performs.
The idea is that I get the list of individual blocks - the BlockLocations of a
particular file - and then read those directly (go to the individual DataNodes).
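Concretely, I was picturing something along the lines of the sketch below (the argument is made up, and I realize the regular client API picks the DataNode for me rather than letting me connect to one directly):

import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Ask the NameNode where each block of a file lives, then read one block's
// byte range through the regular client API (seek + bounded read).
public class BlockInfo {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path(args[0]);
    FileStatus status = fs.getFileStatus(file);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
      System.out.println("offset=" + block.getOffset()
          + " length=" + block.getLength()
          + " hosts=" + Arrays.toString(block.getHosts()));
    }
    // Read just the first block's range; the client library chooses the
    // closest DataNode for us.
    FSDataInputStream in = fs.open(file);
    in.seek(blocks[0].getOffset());
    byte[] buffer = new byte[(int) Math.min(blocks[0].getLength(), 4096)];
    in.readFully(buffer);
    in.close();
  }
}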
Hi Guys,
Badly stuck with fuse-dfs for the last 3 days. Following are the errors I am
facing:
[root@slave fuse-dfs]# ./fuse_dfs dfs://slave:54310 /root/FreshMount/mnt1/ -d
port=54310,server=slave
fuse-dfs didn't recognize /root/FreshMount/mnt1/,-2
fuse-dfs ignoring option -d
unique: 1, o