Hi,
Is there a tutorial that shows how to configure MapReduce to run the MapReduce unit
tests?
Thanks,
> Hi Pedro,
>
> You can find it here
> http://wiki.apache.org/hadoop/HadoopStreaming
>
> Thanks
>
> On Sat, Jun 16, 2012 at 2:46 AM, Pedro Costa wrote:
>> Hi,
>>
>> Hadoop MapReduce can be used for streaming. But what is streaming from the
>> point of view
Hi,
Hadoop MapReduce can be used for streaming. But what is streaming from the
point of view of MapReduce? To me, streaming means video and audio data.
Why does MapReduce support streaming?
Can anyone give me an example of why to use streaming in MapReduce?
Thanks,
Pedro
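For context: in Hadoop, "streaming" has nothing to do with audio or video.
It means running the mapper and reducer as external programs that read
records from stdin and write them to stdout. A minimal sketch of an
invocation (the jar path varies with the Hadoop version, and the
input/output paths are placeholders):
[code]
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
  -input myInputDir \
  -output myOutputDir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc
[/code]
Any executable, e.g. a shell or Python script, can play the mapper or
reducer role.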
I'm trying to open a local file with the FileSystem class:
FileSystem fs = FileSystem.get(conf);
FSDataInputStream in = fs.open(p);
but I get file not found. The path is correct, but I think that my class is
accessing HDFS instead of my local filesystem. Can I use the FileSystem to
access l
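If the file really is local, one fix (a sketch: when fs.default.name
points at HDFS, FileSystem.get returns an HDFS client, so ask for the
local filesystem explicitly) is:
[code]
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// getLocal ignores fs.default.name and always returns the local filesystem.
FileSystem localFs = FileSystem.getLocal(conf);
FSDataInputStream in = localFs.open(new Path("/tmp/myfile")); // placeholder path
[/code]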
Has anyone answered this question? I can't find a reply.
-- Forwarded message --
From: Pedro Costa
Date: 30 March 2012 18:19
Subject: Read key and values from HDFS
To: mapreduce-user
The ReduceTask can save the file using several output f
Hi,
I've been trying to debug map and reduce tasks for quite a long time, and
it seems that it's impossible. MR tasks are launched in new processes and
there's no way to attach to them. Even with the IsolationRunner class it's
impossible. This isn't good, because I really need to debug the class to
understand some chan
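One common workaround, not mentioned in this thread and only a sketch
(the port is arbitrary, and it only works cleanly when a single task runs
at a time, since every child JVM would try to bind the same port): pass
JDWP options to the task JVMs and attach a remote debugger.
[code]
// Driver-side configuration; suspend=y makes each task JVM
// wait until a debugger attaches on port 5005.
conf.set("mapred.child.java.opts",
    "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005");
[/code]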
the first job, and not all jobs.
On 28 February 2012 15:40, Jie Li wrote:
> Try "hadoop job -list" :)
>
> Jie
>
>
> On Tue, Feb 28, 2012 at 8:37 AM, Pedro Costa wrote:
>
>> Hi,
>>
>> In MapReduce, the command bin/hadoop job -history only li
Hi,
How can I know on which datanodes a file is replicated in HDFS? Is there a
command for that?
--
Thanks,
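One way to see the replica locations (assuming a standard install) is
fsck:
[code]
bin/hadoop fsck /path/to/file -files -blocks -locations
[/code]
The -locations flag prints, for each block of the file, the datanodes
holding its replicas.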
Hi,
In Hadoop MapReduce, I've executed the webdatascan example, and the
reduce output is in a SequenceFile. The result is shown here (
http://paste.lisp.org/display/126572). What is the trash (random
characters), like "u 265
100 330 320 252 " \n # ; 374 5 211 V ' 340 376" in the output? Is t
Hi,
I want to read a file that is 100MB in size and is in HDFS. How
should I do it? Is it with IOUtils.readFully?
Can anyone give me an example?
--
Thanks,
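A minimal sketch (the path argument is a placeholder): instead of
IOUtils.readFully, which needs a buffer sized for the whole file,
IOUtils.copyBytes streams the file in small chunks:
[code]
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatHdfsFile {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
            // Stream in 4 KB chunks; no need to hold 100MB in memory.
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
[/code]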
Hi,
I'm running an MR application that produces output that is saved in
HDFS. My application runs on 5 slave nodes (so there are also 5 datanodes).
The HDFS file replication factor is 1. From my application, or
from the Hadoop MR source code, I want to choose on which datanode my
result should be placed. For exampl
, 2011 at 9:05 AM, Laurent Hatier wrote:
> Oh, I didn't see that it was in HDFS. Aaron has answered, I think.
>
> 2011/6/9 Laurent Hatier
>>
>> Have you tried restarting your Hadoop node? (or all Hadoop nodes). When you
>> go to restart, the namenode will format th
Hi,
After I run the command "bin/hadoop job -history /temp/history/", I got
these 2 task summaries. In one of them a cleanup task ran, and in the
other it didn't. This means that a cleanup task doesn't always run. So,
when should a cleanup task run?
Task Summary
Hi,
I'm using the command "bin/hadoop job -history /temp/history/" to list
the details of a job. I ran several examples before running this
command. Now, when I run this command, I only get information about the
first job; the information about the other jobs isn't displayed. How
can I show inform
What I meant in this question is to put the processed result of the
reduce task into something like /dev/null. How can I do that?
On Thu, Jun 2, 2011 at 11:07 AM, Pedro Costa wrote:
> Hi,
>
> I'm running a MapReduce example, and I want to run all the map reduce
> phases but I don'
Hi,
I'm running a MapReduce example, and I want to run all the map reduce
phases, but I don't want to save the result of the reduce tasks to
disk. Is there a way to tell Hadoop not to save the output of the
reduce tasks?
Thanks,
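One option, a sketch with the old mapred API used elsewhere in these
threads: NullOutputFormat consumes every record and writes nothing, so
it acts as a /dev/null for the job's output.
[code]
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

JobConf conf = new JobConf();
// All reduce output is discarded instead of being written to HDFS.
conf.setOutputFormat(NullOutputFormat.class);
[/code]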
I found the solution. The problem was that I had misspelled the
parameter "mapred.tasktracker.map.tasks.maximum".
On Tue, May 24, 2011 at 11:06 AM, Pedro Costa wrote:
> I think it's important to say that there are 2 CPUs per node and 12
> cores per CPU.
>
>
I think it's important to say that there are 2 CPUs per node and 12
cores per CPU.
On Tue, May 24, 2011 at 11:02 AM, Pedro Costa wrote:
> And all the nodes have the same configuration. A job has 5000 map tasks.
>
> On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote:
>
And all the nodes have the same configuration. A job has 5000 map tasks.
On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote:
> The values are:
> #map tasks: 8
> #reduce tasks: 10
> Map task capacity:10
> Reduce task capacity:10
>
>
> On Tue, May 24, 2011 at 8:01 AM, Har
>
> On Mon, May 23, 2011 at 10:28 PM, Pedro Costa wrote:
>> I think I've to rephrase the question.
>>
>> I set the "mapred.tasktracker.map.tasks.maximum" to 8, hoping that it
>> would run 8*10 map tasks across the whole cluster. But it only runs 8 tasks
>>
I think I have to rephrase the question.
I set the "mapred.tasktracker.map.tasks.maximum" to 8, hoping that it
would run 8*10 map tasks across the whole cluster. But it only runs 8
tasks simultaneously. Why does this happen?
On Mon, May 23, 2011 at 5:45 PM, Pedro Costa wrote:
> H
Hi,
I'm running Hadoop MapReduce on a cluster with 10 machines. I would
like to set in the configuration that each TaskTracker can run 8 map
tasks and 4 reduce tasks simultaneously. Which parameters should I
configure?
Thanks,
PSC
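The per-TaskTracker knobs, a sketch to be set in mapred-site.xml on each
node (values match the numbers asked for above):
[code]
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
[/code]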
Hi,
I would like to know how much time a task took in the shuffle and
sort phases. How can I get that?
Thanks
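One place these times show up (assuming the job history files are
retained; the directory argument is a placeholder) is the detailed
history dump, which lists shuffle and sort finish times per reduce
attempt:
[code]
bin/hadoop job -history all <job output dir>
[/code]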
Hi,
1 - I'm looking for the map and reduce functions of the several
examples in the Gridmix2 suite (Webdatasort, Webdatascan,
Monsterquery, javasort and combiner), and I can't find them. Where can I
find these functions?
2 - Does anyone know what the Webdatasort, Webdatascan, Monsterquery,
javas
Hi,
the Hadoop MapReduce counters include the parameters FILE_BYTES_READ,
FILE_BYTES_WRITTEN, HDFS_BYTES_READ and HDFS_BYTES_WRITTEN.
What do the FILE and HDFS counters represent?
Thanks,
--
Pedro
Where are the map and reduce functions of all the GridMix2 examples?
--
Pedro
Hi,
Gridmix2 contains several tests: Combiner, StreamingSort,
Webdatasort, Webdatascan and Monsterquery.
I would like to know what these examples do. Which example uses
more CPU in the map and reduce computation, and which of them
exchanges more messages between maps and reduces?
Wh
Hi,
1 - I would like to run Hadoop MR on my laptop, but with the cluster
mode configuration. I've created a slaves file with the following content:
[code]
127.0.0.1
localhost
192.168.10.1
mylaptop
[/code]
192.168.10.1 is the IP of the machine and "mylaptop" is the
machine's hostname.
What I mean is: does the HDFS protocol use HTTP, or is it built over RPC?
On Sat, Apr 9, 2011 at 5:57 PM, Pedro Costa wrote:
> Hi,
>
> I was wondering what the communication protocol between MapReduce
> and HDFS is. Does MapReduce fetch and save data blocks from HDFS by
Hi,
I was wondering what the communication protocol between MapReduce
and HDFS is. Does MapReduce fetch and save data blocks from HDFS over
HTTP or over RPC?
Thanks,
--
PSC
Hi,
Where can I get the shuffle and sort time of a reduce task?
--
Pedro
I don't know if this is what I want. I want to set the number of slots
that are available for map and reduce tasks to run; I don't want to
define the number of tasks.
On Fri, Mar 25, 2011 at 6:44 PM, David Rosenstrauch wrote:
> On 03/25/2011 02:26 PM, Pedro Costa wr
Hi,
is it possible to configure the total number of slots that a
TaskTracker has for running map and reduce tasks?
Thanks,
--
Pedro
Hi,
during the setup and cleanup phases of a job, Hadoop MR uses map tasks
to do the work. Do these tasks appear in the counters shown at the end
of an example?
For example, the counters below show that my example ran 9 map tasks
and 2 reduce tasks, but Launched map tasks has the value
Hi,
this MR example contains the fields "Reduce input groups" and
"Reduce input records". What's the difference between these 2 fields?
$ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount
data/bible+shakes.nopunc wc 1
10/07/11 22:25:42 INFO simple.DemoWordCount: Tool: DemoWor
1 - So what's the reason for having 2 groups of phases?
2 - Do a JobTracker and a TaskTracker also have phases?
On Thu, Mar 24, 2011 at 6:42 PM, Arun C Murthy wrote:
>
> On Mar 24, 2011, at 11:37 AM, Pedro Costa wrote:
>
>> Hi,
>>
>> 1 - A Task is composed by several
Hi,
1 - A Task is composed of several phases:
STARTING, MAP/REDUCE, SHUFFLE, SORT, CLEANUP.
Do a JobTracker and a TaskTracker also have phases?
2 - The following phases also exist:
RUNNING,
SUCCEEDED,
FAILED,
UNASSIGNED,
KILLED,
[/code]
On Wed, Mar 23, 2011 at 12:03 PM, Pedro Costa wrote:
> Hi,
>
> when I'm running the Gridmix2 examples, the tests halt during
> execution and the following error is displayed:
>
> [code]
> 11/03/23 12:52:06 WARN mapred.JobClient:544 Use GenericOptio
Hi,
when I'm running the Gridmix2 examples, the tests halt during
execution and the following error is displayed:
[code]
11/03/23 12:52:06 WARN mapred.JobClient:544 Use GenericOptionsParser
for parsing the arguments. Applications should implement Tool for the
same.
11/03/23 12:52:06 DEBUG map
of size, but in reality, it contains 2 HDFS blocks. Is this
right?
On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz wrote:
> El 3/18/2011 3:54 PM, Pedro Costa escribió:
>>
>> Hi
>>
>> What's the purpose of the parameter "mapred.min.split.size"?
>>
Hi
What's the purpose of the parameter "mapred.min.split.size"?
Thanks,
--
Pedro
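For reference, a sketch (the 128 MB value is arbitrary): raising
mapred.min.split.size forces each InputSplit, and hence each map task,
to cover at least that many bytes, even if that spans several HDFS
blocks.
[code]
// In the job driver: each split will be at least 128 MB.
conf.set("mapred.min.split.size", String.valueOf(128L * 1024 * 1024));
[/code]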
Hi,
I don't know what the Gridmix examples do. Where can I find an
explanation of them?
Thanks,
--
Pedro
On Fri, Mar 18, 2011 at 5:04 PM, Pedro Costa wrote:
> Hi,
>
> I would like to define the number of map tasks to use in GridMix2.
>
> For example, I would like to run the GridMixMonsterQuery at GridMix2
> with 5 maps, another with 10 and another with 20 maps.
>
> How can I d
Hi,
I would like to define the number of map tasks to use in GridMix2.
For example, I would like to run the GridMixMonsterQuery at GridMix2
with 5 maps, another with 10 and another with 20 maps.
How can I do that?
Thanks,
--
Pedro
Hi,
I'm running Hadoop MapReduce on a cluster, and I have a Reduce Task
that remains in the COMMIT_PENDING state and doesn't finish.
This is happening because I've made some changes to Hadoop MR. I'm
trying to solve my problem, but I don't understand what happens
after the COMMIT_PEND
remove file
hadoop dfs -rmr
list file
hadoop dfs -ls
On Mon, Feb 28, 2011 at 10:13 AM, Eric wrote:
> Note that Hadoop will put your files in a trash bin. You can use the
> -skipTrash option to really delete the data and free up space. See the
> command "hadoop dfs" for more details.
>
> 2011
Does a map task finish after generating the map intermediate file?
Thanks,
--
Pedro
Hi,
I'm trying to run Gridmix2 tests (rungridmix_2), but all the tests
remain in the waiting state and none of them runs. I don't see any
exceptions in the logs. Why does this happen?
Thanks,
--
Pedro
Hi,
I'd like to know, depending on my problem, when I should or shouldn't
use Hadoop MapReduce. Does there exist any list that advises when to
use MapReduce and when not to?
Thanks,
--
Pedro
Hi,
1 - I'm trying to read parts of a compressed file to generate message
digests, but I can't fetch the right parts. I searched for an example
that reads compressed files, but I can't find one.
As I have 3 partitions in my example, below are the indexes of the file:
raw bytes: 54632 / offset: 0 / par
Hi,
When compression is on, are the compressed map intermediate files
transferred to the reduce side as compressed data?
Thanks,
--
Pedro
's the merged files that
are compressed?
Thanks,
On Tue, Feb 15, 2011 at 10:35 AM, Pedro Costa wrote:
> Hi,
>
> I run two examples of a MR execution with the same input files and
> with 3 Reduce tasks defined. One example has the map-intermediate
> files compressed, a
Hi,
I ran two examples of an MR execution with the same input files and
with 3 Reduce tasks defined. One example has the map intermediate
files compressed, and the other has uncompressed data. Below
are some debug lines that I added to the code.
1 - On the uncompressed data, the raw l
And when the data of the map intermediate files is compressed, is it
still an IFile?
On Mon, Feb 14, 2011 at 4:44 PM, Harsh J wrote:
> Hello,
>
> On Mon, Feb 14, 2011 at 8:51 PM, Pedro Costa wrote:
>> Hi,
>>
>> 1 - The map output files are always of the type Seq
Hi,
1 - Are the map output files always of the type SequenceFileFormat?
2 - Does that mean they contain a header with the following fields?
# version - A byte array: 3 bytes of magic header 'SEQ', followed by 1
byte of actual version no. (e.g. SEQ4 or SEQ6)
# keyClassName - String
# valueClassName
Hi,
1 - How do I get the names of the map tasks that ran, from the command line?
2 - How do I get the start time and the end time of a map task from the
command line?
--
Pedro
Hi,
I would like to get, from the command line, the duration each Map and
Reduce task took to run. How is this possible?
Thanks,
--
Pedro
Hi,
I'm running the GridMix2 examples and I would like to retrieve all the
results produced by the tests and save the files locally, to read them
later, offline. Does there exist a command for that?
Thanks
--
Pedro
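One option, assuming the results are written to HDFS (both paths are
placeholders): copy them out with dfs -get.
[code]
bin/hadoop dfs -get <hdfs results dir> <local dir>
[/code]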
Hi,
1 - In GridMix2, is it possible to configure the number of maps to
execute? For example, the small set of GridMix examples always
executes 3 maps, and the medium set uses 30 maps. Is it possible to
configure the number of maps?
2 - In GridMix2 the small sets use 3 maps, but the directory of
Hi,
1 - When a map task is taking too long to finish its processing, the JT
launches another map task for the same input (speculative execution).
Does this mean that the task that was replaced is killed?
2 - Does Hadoop MR allow the same input split to be processed by 2
different mappers at the same time?
Thanks,
--
Pedro
Hi,
A map task saves as output the map output file and an index file. What's
the purpose of the index file?
Thanks,
--
Pedro
Hi,
Hadoop MR has the property "mapred.system.dir" to set the directory
where shared files are stored during a job run. What are these shared
files?
--
Pedro
Hi,
When Hadoop is running on a cluster, the output of the reducers is
saved in HDFS. Does MapReduce also have location awareness of where
the data is saved?
For example, we have TT1 running on Machine1, and TT2 running on
Machine2. The replication factor of HDFS is 3. The Reduce Task RT1 is
running on TT1.
Hi,
Are the map outputs always saved on the local file system, or can they
be saved in HDFS?
--
Pedro
Hi,
I'm running the wordcount example, but I would like to compress the map
output. I set the following properties in mapred-site.xml:
[code]
mapred.compress.map.output
true
mapred.map.output.compression.codec
gzip
[/code]
but I still got the error:
java.la
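A plausible cause, guessing from the truncated error: "gzip" is not a
codec class name. The codec property expects a fully qualified class,
for example:
[code]
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
[/code]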
Hi,
I would really like to know how a map output packet is formed on the
local disk on the Map side and on the Reduce side. For example, on the
Reduce side, the map output is also copied to local disk.
The problem is that I'm computing a message digest of the map output on
the map side and on the reduce side, and I
2011 at 10:30 AM, Pedro Costa wrote:
> Hi,
>
> I'm trying to read the map output on the reduce side with the LocalFileSystem
> class. The map output is on the local file system. The problem is
> that it throws a ChecksumException.
>
> I know that checksums are
Hi,
I'm trying to read the map output on the reduce side with the
LocalFileSystem class. The map output is on the local file system. The
problem is that it throws a ChecksumException.
I know that checksums are verified when the file is read, and since
it throws a ChecksumException, somethi
Hi,
1 - In the example that I'm executing, during the phase of copying the
map output from the Map task to the Reduce task, the RT saved the map
output to disk because it's too big.
I've got the file path of the saved map output, and now I would like to
retrieve the following information:
- checksum
-
When the RT fetches a map output, it can save the map output in
memory or on disk.
1 - Is the map output packet different when it's saved to memory versus
to disk? Do the header or the body of the saved map output packet
contain different fields?
2 - What are the fields of the packet?
Thanks,
of 14 bytes?
-- Forwarded message --
From: Pedro Costa
Date: Tue, Feb 1, 2011 at 11:43 AM
Subject: raw length vs part length
To: mapreduce-user@hadoop.apache.org
Hi,
Hadoop uses the compressed length and the raw length.
1 - In my example, the RT is fetching a map output that
Hi,
Hadoop uses the compressed length and the raw length.
1 - In my example, the RT is fetching a map output that shows a raw
length of 14 bytes and a partLength of 10 bytes. The map output doesn't
use any compression.
When I'm dealing with uncompressed data, the raw length should
Hi,
When the reduce fetches from the mappers a map output of 1GB in size
and does the merge, is it possible that part of the map output is saved
on disk and another part in memory?
Or must a map output be saved entirely on disk, or entirely in memory?
Thanks,
--
Pedro
I said file status, but what I would like to know is the size of the file.
On Mon, Jan 31, 2011 at 5:56 PM, Pedro Costa wrote:
> Hi,
>
> On the reduce side, after the RT had passed the merge phase (before
> the reduce phase starts), I've got the path of the map_0.out file. I
Hi,
On the reduce side, after the RT has passed the merge phase (before
the reduce phase starts), I've got the path of the map_0.out file. I'm
opening this file with
[code]
FSDataInputStream in = fs.open(file);
[/code]
But I've only got the path. Is it possible to obtain the file status of
this fi
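For the size specifically, a sketch using the standard FileSystem API:
[code]
import org.apache.hadoop.fs.FileStatus;

// Metadata lookup on the same FileSystem used for open().
FileStatus status = fs.getFileStatus(file);
long sizeInBytes = status.getLen(); // file length in bytes
[/code]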
(of one/more intermediate
> files, a.k.a. IFiles).
>
> On Fri, Jan 28, 2011 at 11:21 PM, Pedro Costa wrote:
>> Hi,
>>
>> I'm running the Terasort problem in cluster mode, and I've got a
>> RunTimeException in a Reduce Task.
>>
>> java.lang.Ru
Hi,
I'm running the Terasort problem in cluster mode, and I got a
RuntimeException in a Reduce Task.
java.lang.RuntimeException: problem advancing post rec#499959
(Please see the attachment.)
What does this error mean? Is it a problem with wrong KEYOUT and
VALUEOUT in the Reduce Task?
Thanks,
--
ugh,
> based on this section of the exception's call stack:
>
> at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
>
>
fine, but mapreduce.Mapper.map has this signature:
>
> map(K key, V value, Context)
>
> Your PiEstimator map signature doesn't match, so it's not overriding
> the proper function and is never getting called by the framework.
>
> Could you paste your complete
the code that
could give this error.
On Thu, Jan 27, 2011 at 4:29 PM, Pedro Costa wrote:
> Yes, that's the one that's being used ( o.a.h.mapreduce.Mapper ). Is this
> not the right one to use?
>
>
>
> On Thu, Jan 27, 2011 at 3:40 PM, Chase Bradford
> wrote:
>
Map class in the job setup? It sounds a
> bit like the base o.a.h.mapreduce.Mapper map implementation is being used
> instead.
>
>
> On Jan 27, 2011, at 2:36 AM, Pedro Costa wrote:
>
>> The map output classes are well defined:
>> keyClass: class org
BooleanWritable(true), new LongWritable(numInside));
out.collect(new BooleanWritable(false), new LongWritable(numOutside));
}
[/code]
I'm really confused right now. How can this be happening?
On Thu, Jan 27, 2011 at 10:19 AM, Pedro Costa wrote:
> Thanks Nicholas, but it didn't work
> the map-reduce tutorial at
> http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html
>
> On Jan 26, 2011, at 10:27 AM, Pedro Costa wrote:
>
>> Hadoop 20.1
>>
>> On Wed, Jan 26, 2011 at 6:26 PM, Tsz Wo (Nicholas), Sze
>> wrote:
>>> Hi Srihari,
>
>
> I got a similar error before in one of my projects. I had to set the values
> for "mapred.output.key.class" and "mapred.output.value.class".
> That resolved the issue for me.
> Srihari
> On Jan 26, 2011, at 10:09 AM, Pedro Costa wrote:
>
> Yes, I can repro
you able to reproduce it
> deterministically?
> Nicholas
>
> ____
> From: Pedro Costa
> To: mapreduce-user@hadoop.apache.org
> Sent: Wed, January 26, 2011 5:47:01 AM
> Subject: PiEstimator error - Type mismatch in key from map
>
> Hi,
>
> I ru
Hi,
I ran the Pi example of Hadoop, and I got the following error:
[code]
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.BooleanWritable, recieved
org.apache.hadoop.io.LongWritable
at
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.j
[/code]
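The usual fix for this error, a sketch with the old mapred API (it
matches the "mapred.output.key.class" suggestion quoted earlier in the
thread): declare the map output types explicitly, since they default to
the job's final output types.
[code]
import org.apache.hadoop.io.BooleanWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapred.JobConf;

JobConf conf = new JobConf();
// The mapper emits (BooleanWritable, LongWritable), which differs
// from the job's final output types, so declare it explicitly.
conf.setMapOutputKeyClass(BooleanWritable.class);
conf.setMapOutputValueClass(LongWritable.class);
[/code]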
, a logical MapReduce InputSplit is very
> different from a physical HDFS Block.
>
> On Fri, Jan 14, 2011 at 5:10 PM, Pedro Costa wrote:
>> I think that the answer is, each location of the split file
>> corresponds to a replica.
>>
>> On Fri, Jan 14, 2011 at
I think the answer is: each location of the split file
corresponds to a replica.
On Fri, Jan 14, 2011 at 11:09 AM, Pedro Costa wrote:
> Hi,
>
> If a split location contains more than one location, does it mean that
> this split file is replicated through all locations, or it me
Hi,
If a split's location list contains more than one location, does it
mean that this split file is replicated across all those locations, or
does it mean that a split is divided into several blocks, with each
block in one location?
Thanks,
--
Pedro
Hi,
I would like to understand how the JT chooses a TT to launch a map task
in a cluster setup, so I pose 4 different questions about how the JT
makes that choice. All these questions assume a scenario where Hadoop
MR is installed in cluster mode.
1 - For exampl
Hi,
I have Hadoop installed on a cluster, and I would like the JT to be able
to work out, from the network topology, which input files in HDFS are
closer to it and which are further away.
So, how can a JT know whether an input file is located at node level,
at rack level, or beyond?
Thanks,
--
Pedro
Hi,
I would like to run a sequence of jobs, where the output of one job is
the input of the next one. Does Hadoop Pig help to do this?
Thanks,
--
Pedro
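Pig can express such pipelines, but jobs can also be chained by hand. A
sketch with the new mapreduce API (paths are placeholders and the
mapper/reducer setup is omitted): run the first job to completion, then
point the second job's input at the first job's output directory.
[code]
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Path intermediate = new Path("/tmp/step1-out"); // placeholder

Job first = new Job(new Configuration(), "step1");
FileInputFormat.addInputPath(first, new Path("/data/in")); // placeholder
FileOutputFormat.setOutputPath(first, intermediate);
first.waitForCompletion(true);

Job second = new Job(new Configuration(), "step2");
FileInputFormat.addInputPath(second, intermediate); // step 1 feeds step 2
FileOutputFormat.setOutputPath(second, new Path("/data/out")); // placeholder
second.waitForCompletion(true);
[/code]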
The ReduceTask's fetchOutputs method contains 2 variables:
numInFlight
numCopied
What is the purpose of these variables?
Thanks,
--
Pedro
Hi,
I'm trying to understand how I can execute the MapReduce unit tests
in the Eclipse IDE.
I'm trying to execute the unit test class TestMapReduceLocal, but I
get the errors that I pasted below.
How can I run the MR unit tests in the Eclipse IDE?
[code]
2010-12-30 17:39:19,703 WARN conf.Configuration
(C
Hi,
1 - When Hadoop MR is running in cluster mode, is it possible to have
several TTs, one TT per machine, running simultaneously, with the JT
communicating with every TT?
For example, running Hadoop MR on 5 machines, does each machine (slave
node) have a TT communicating with the JT on the master node?
Th
rec.startOffset = segmentStart;
> rec.rawLength = writer.getRawLength();
> rec.partLength = writer.getCompressedLength();
>
> -Ravi
>
> On 12/23/10 3:52 AM, "Pedro Costa" wrote:
>
> An index record contains 3 variables:
> startO
-Ravi
>
> On 12/23/10 3:24 AM, "Pedro Costa" wrote:
>
> So, I conclude that a partition is defined by the offset.
> But, for example, a Map Task produces 5 partitions. How does the reduce
> know that it must fetch the 5 partitions? Where is this information?
> This informa
le where the input data for the
> particular reducer starts and length is the size of the data starting from
> the offset.
>
> -Ravi
>
>
> On 12/23/10 2:17 AM, "Pedro Costa" wrote:
>
> Hi,
>
> 1 - I would like to understand how a partition works in the
Hi,
1 - I would like to understand how a partition works in MapReduce. I
know that MapReduce contains the IndexRecord class, which indicates the
length of something. Is it the length of a partition or of a spill?
2 - In a large map output, can a partition be a set of spills, or is a
spill s
1 - Should a reduce task start only when a map task ends?
--
Pedro