Tutorial for MRUnit tests

2012-12-14 Thread Pedro Costa
Hi, Is there a tutorial that shows how to configure MapReduce to use MRUnit tests? Thanks,

Re: Streaming in mapreduce

2012-06-16 Thread Pedro Costa
> Hi Pedro, > > You can find it here > http://wiki.apache.org/hadoop/HadoopStreaming > > Thanks > > On Sat, Jun 16, 2012 at 2:46 AM, Pedro Costa wrote: >> Hi, >> >> Hadoop mapreduce can be used for streaming. But what is streaming from the >> point of view

Streaming in mapreduce

2012-06-15 Thread Pedro Costa
Hi, Hadoop mapreduce can be used for streaming. But what is streaming from the point of view of mapreduce? For me, streaming means video and audio data. Why does mapreduce support streaming? Can anyone give me an example of why to use streaming in mapreduce? Thanks, Pedro

Accessing local filesystem with org.apache.hadoop.fs.FileSystem

2012-04-04 Thread Pedro Costa
I'm trying to open a local file with the FileSystem class. FileSystem rfs = FileSystem.get(conf); FSDataInputStream i = srcFs.open(p); but I get file not found. The path is correct, but I think that my class is accessing hdfs, instead of my local filesystem. Can I use the FileSystem to access l
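The likely cause of the "file not found" above is that `FileSystem.get(conf)` resolves whatever `fs.default.name` points at (usually HDFS). A minimal sketch of opening the local filesystem explicitly, assuming the Hadoop `FileSystem` API of that era (the file path argument is hypothetical):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LocalRead {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // FileSystem.get(conf) returns the default filesystem (often HDFS);
        // FileSystem.getLocal(conf) always returns the local filesystem.
        FileSystem localFs = FileSystem.getLocal(conf);
        FSDataInputStream in = localFs.open(new Path(args[0]));
        System.out.println("first byte: " + in.read());
        in.close();
    }
}
```

Alternatively, passing a `file:///` URI to `FileSystem.get(URI, Configuration)` selects the local filesystem without changing the default.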

Fwd: Read key and values from HDFS

2012-04-01 Thread Pedro Costa
Has anyone answered this question? Because I can't find it. -- Forwarded message -- From: Pedro Costa Date: 30 March 2012 18:19 Subject: Read key and values from HDFS To: mapreduce-user The ReduceTask can save the file using several output f

Debug MR tasks impossible.

2012-03-29 Thread Pedro Costa
Hi, I've been trying to debug map and reduce tasks for quite a long time, and it seems that it's impossible. MR tasks are launched in new processes and there's no way to debug them. Even with the IsolationRunner class it's impossible. This isn't good because I really need to debug the class, to understand some chan

Re: How can I list all jobs history?

2012-02-28 Thread Pedro Costa
the first job, and not all jobs. On 28 February 2012 15:40, Jie Li wrote: > Try "hadoop job -list" :) > > Jie > > > On Tue, Feb 28, 2012 at 8:37 AM, Pedro Costa wrote: > >> Hi, >> >> In MapReduce the command bin/hadoop -job history only li

How can I know in which datanodes a file is replicated in HDFS?

2012-01-11 Thread Pedro Costa
Hi, How can I know in which datanodes a file is replicated in HDFS? Is there a command for that? -- Thanks,
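One way to answer this (assuming the Hadoop CLI of that era; the path below is a hypothetical example) is `fsck`, which reports, for every block of a file, the datanodes holding a replica:

```shell
# -files lists the files checked, -blocks their blocks,
# -locations the datanodes that hold each block replica
hadoop fsck /user/pedro/myfile -files -blocks -locations
```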

Fwd: Reduce output is strange

2011-12-19 Thread Pedro Costa
Hi, In Hadoop MapReduce, I've executed the webdatascan example, and the reduce output is in a SequenceFile. The result is shown here ( http://paste.lisp.org/display/126572). What's the trash (random characters), like "u 265 100 330 320 252 " \n # ; 374 5 211 V ' 340 376" in the output? Is t

Fwd: how read a file in HDFS?

2011-12-16 Thread Pedro Costa
Hi, I want to read a file that has 100MB of size and it is in the HDFS. How should I do it? Is it with IOUtils.readFully? Can anyone give me an example? -- Thanks, -- Thanks,
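A minimal sketch of one way to do this, assuming the Hadoop `FileSystem`/`IOUtils` API of that era (the URI argument is hypothetical); streaming in chunks avoids holding the whole 100 MB in memory:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class CatHdfsFile {
    public static void main(String[] args) throws Exception {
        String uri = args[0]; // e.g. hdfs://namenode:9000/user/pedro/big.file
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create(uri), conf);
        FSDataInputStream in = fs.open(new Path(uri));
        try {
            // copy the stream to stdout in 4 KB chunks; the last argument
            // (false) leaves the streams open so we can close them ourselves
            IOUtils.copyBytes(in, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(in);
        }
    }
}
```

`IOUtils.readFully` works too, but it requires a buffer as large as the read, so copying in chunks is usually preferable for big files.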

define a specific data node from a reduce task.

2011-06-15 Thread Pedro Costa
Hi, I'm running an MR application that produces an output that is saved in HDFS. My application has 5 slave nodes (so it also has 5 data nodes). The HDFS file replication factor is 1. From my application, or from the Hadoop MR source code, I want to tell which data node my result should be on. For exampl

Re: cleanup task doesn't run always

2011-06-09 Thread Pedro Costa
, 2011 at 9:05 AM, Laurent Hatier wrote: > Oh i don't see that it was in the HDFS. Aaron has answered i think > > 2011/6/9 Laurent Hatier >> >> Have you try to restart your hadoop node ? (or all hadoop node). When you >> go to restart, the namenode go to format th

cleanup task doesn't run always

2011-06-08 Thread Pedro Costa
Hi, After I ran the command "bin/hadoop job -history /temp/history/", I got these 2 task summaries. One of them ran a cleanup task and the other didn't. This means that a cleanup task doesn't always run. So, when should a cleanup task run? Task Summary

bin/hadoop job -history doesn't show all job information

2011-06-08 Thread Pedro Costa
Hi, I'm using the command "bin/hadoop job -history /temp/history/" to list the details of the job. I ran several examples before running this command. Now, when I run it I only get information about the first job; information about the other jobs isn't displayed. How can I show inform

Re: How discard reduce results in mapreduce?

2011-06-02 Thread Pedro Costa
What I meant in this question is put the processed result of the reduce task in something like /dev/null. How can I do that? On Thu, Jun 2, 2011 at 11:07 AM, Pedro Costa wrote: > Hi, > > I'm running an mapreduce example, and I want to run all the map reduce > phases but I don&

How discard reduce results in mapreduce?

2011-06-02 Thread Pedro Costa
Hi, I'm running an mapreduce example, and I want to run all the map reduce phases but I don't want to save in the disk the result of the reduce tasks. Is there a way to tell hadoop not to save the output of the reduce tasks? Thanks,
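One way to get a /dev/null-style sink for reduce output, assuming the old `mapred` API of that era, is `NullOutputFormat`, which discards everything the reducers emit; a minimal configuration sketch:

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class DiscardOutput {
    public static void configure(JobConf job) {
        // NullOutputFormat's RecordWriter swallows every (key, value)
        // pair, so all map and reduce phases still run but nothing is
        // written to HDFS or the local disk.
        job.setOutputFormat(NullOutputFormat.class);
    }
}
```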

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
I found the solution. The problem was that I've misspelled the parameter "mapred.tasktracker.map.tasks.maximum". On Tue, May 24, 2011 at 11:06 AM, Pedro Costa wrote: > I think it's important to say that it exists 2 cpus per node and 12 > core(s) per cpu. > >

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
I think it's important to say that it exists 2 cpus per node and 12 core(s) per cpu. On Tue, May 24, 2011 at 11:02 AM, Pedro Costa wrote: > And all the nodes have the same configuration. A job has 5000 map tasks. > > On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote: >

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
And all the nodes have the same configuration. A job has 5000 map tasks. On Tue, May 24, 2011 at 10:57 AM, Pedro Costa wrote: > The values are: > #map tasks: 8 > #reduce tasks: 10 > Map task capacity:10 > Reduce task capacity:10 > > > On Tue, May 24, 2011 at 8:01 AM, Har

Re: How set number of map and reduce can run simultaneously

2011-05-24 Thread Pedro Costa
> > On Mon, May 23, 2011 at 10:28 PM, Pedro Costa wrote: >> I think I've to rephrase the question. >> >> I set the "mapred.tasktracker.map.tasks.maximum" to 8, hoping that it >> will run 8*10 map tasks in the whole cluster. But, it only run 8 tasks >&g

Re: How set number of map and reduce can run simultaneously

2011-05-23 Thread Pedro Costa
I think I have to rephrase the question. I set "mapred.tasktracker.map.tasks.maximum" to 8, hoping that it would run 8*10 map tasks in the whole cluster. But it only runs 8 tasks simultaneously. Why does this happen? On Mon, May 23, 2011 at 5:45 PM, Pedro Costa wrote: > H

How set number of map and reduce can run simultaneously

2011-05-23 Thread Pedro Costa
Hi, I'm running hadoop map-reduce in a cluster with 10 machines. I would like to set in the configuration that each tasktracker can run 8 map tasks and 4 reduce tasks simultaneously. Which parameters should I configure? Thanks, PSC
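The per-tasktracker slot counts asked about here are the `mapred.tasktracker.*.tasks.maximum` properties (this is the configuration mechanism of the pre-YARN releases discussed on this list); a sketch of the relevant fragment:

```xml
<!-- mapred-site.xml on every TaskTracker node; a TT restart is needed -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>8</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>4</value>
</property>
```

Note these are per-node limits, which is why the thread above observes only 8 concurrent map tasks per machine rather than 8 times the cluster size overall.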

get shuffle and sort duration of a task

2011-05-02 Thread Pedro Costa
Hi, I would like to know how much time it took for a task to do the shuffle and sort phases. How can I get that? Thanks

Where are map and reduce functions from the Gridmix2 examples

2011-05-02 Thread Pedro Costa
Hi, 1 - I'm looking for the map and reduce functions of the several examples of the Gridmix2 platform (Webdatasort, Webdatascan, Monsterquery, javasort and combiner) and I can't find them. Where can I find these functions? 2 - Does anyone know what the Webdatasort, Webdatascan, Monsterquery, javas

FILE and HDFS counters

2011-04-24 Thread Pedro Costa
Hi, the Hadoop MapReduce counters include FILE_BYTES_READ, FILE_BYTES_WRITTEN, HDFS_BYTES_READ and HDFS_BYTES_WRITTEN. What do the FILE and HDFS counters represent? Thanks, -- Pedro

Where are map, red functions in Gridmix2

2011-04-18 Thread Pedro Costa
Where are the map and the reduce functions of all examples of GridMix2? -- Pedro

Gridmix2 tests

2011-04-18 Thread Pedro Costa
Hi, gridmix2 contains several tests, like Combiner, StreamingSort, Webdatasort, webdatascan and monsterquery. I would like to know what these examples do. Which example uses more CPU for the map and reduce calculation, and which of them exchanges more messages between maps and reduces? Wh

hadoop mr cluster mode on my laptop?

2011-04-18 Thread Pedro Costa
Hi, 1 - I would like to run Hadoop MR on my laptop, but with the cluster-mode configuration. I've put a slaves file with the following content: [code] 127.0.0.1 localhost 192.168.10.1 mylaptop [/code] 192.168.10.1 is the IP of the machine and "mylaptop" is the logical name of the address.

Re: Communication protocol between MR and HDFS

2011-04-09 Thread Pedro Costa
What I mean is: is the HDFS protocol built on HTTP, or over RPC? On Sat, Apr 9, 2011 at 5:57 PM, Pedro Costa wrote: > Hi, > > I  was wondering what's the communication protocol between MapReduce > and the HDFS. The MapReduce fetch and saves data blocks from HDFS by

Communication protocol between MR and HDFS

2011-04-09 Thread Pedro Costa
Hi, I was wondering what the communication protocol between MapReduce and HDFS is. Does MapReduce fetch and save data blocks from HDFS over HTTP or over RPC? Thanks, -- PSC

Get Shuffle and sort time

2011-04-04 Thread Pedro Costa
Hi, Where can I get the shuffle and sort time of a reduce task? -- Pedro

Re: change number of slots in MR

2011-03-25 Thread Pedro Costa
I don't know if this is what I want. I want to set the number of slots that are available for the map and the reduce tasks to run. I don't want to define the number of tasks. On Fri, Mar 25, 2011 at 6:44 PM, David Rosenstrauch wrote: > On 03/25/2011 02:26 PM, Pedro Costa wr

change number of slots in MR

2011-03-25 Thread Pedro Costa
Hi, is it possible to configure the total number of slots that a TaskTracker has, to run the map and reduce tasks? Thanks, -- Pedro

map tasks vs launched map tasks

2011-03-25 Thread Pedro Costa
Hi, during the setup and cleanup phases of a job, Hadoop MR uses map tasks to do the work. Do these tasks appear in the counters shown at the end of an example? For example, the counter below shows that my example ran 9 map tasks and 2 reduce tasks, but the Launched map tasks counter has the value

"Reduce input groups" vs "Reduce input records"

2011-03-25 Thread Pedro Costa
Hi, in this MR example, it exists the field "Reduce input groups" and "Reduce input records". What's the difference between these 2 fields? $ hadoop jar cloud9.jar edu.umd.cloud9.example.simple.DemoWordCount data/bible+shakes.nopunc wc 1 10/07/11 22:25:42 INFO simple.DemoWordCount: Tool: DemoWor

Re: phases in tasks

2011-03-24 Thread Pedro Costa
1 - So what's the reason for having 2 groups of phases? 2 - Do a JobTracker and a TaskTracker also have phases? On Thu, Mar 24, 2011 at 6:42 PM, Arun C Murthy wrote: > > On Mar 24, 2011, at 11:37 AM, Pedro Costa wrote: > >> Hi, >> >> 1 - A Task is composed by several

phases in tasks

2011-03-24 Thread Pedro Costa
Hi, 1 - A Task is composed of several phases: STARTING, MAP/REDUCE, SHUFFLE, SORT, CLEANUP. Do a JobTracker and a TaskTracker also have phases? 2 - There are also the following phases: RUNNING, SUCCEEDED, FAILED, UNASSIGNED, KILLED,

Re: java.lang.RuntimeException: null is null in Gridmix2

2011-03-23 Thread Pedro Costa
[/code] On Wed, Mar 23, 2011 at 12:03 PM, Pedro Costa wrote: > Hi, > > when I'm running the Gridmix2 examples, during the execution the tests > halt and the following error is displayed: > > [code] > 11/03/23 12:52:06 WARN mapred.JobClient:544 Use GenericOptio

java.lang.RuntimeException: null is null in Gridmix2

2011-03-23 Thread Pedro Costa
Hi, when I'm running the Gridmix2 examples, during the execution the tests halt and the following error is displayed: [code] 11/03/23 12:52:06 WARN mapred.JobClient:544 Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 11/03/23 12:52:06 DEBUG map

Re: mapred.min.split.size

2011-03-18 Thread Pedro Costa
of size, but in reality, it contains 2 HDFS blocks. Is this right? On Fri, Mar 18, 2011 at 8:12 PM, Marcos Ortiz wrote: > El 3/18/2011 3:54 PM, Pedro Costa escribió: >> >> Hi >> >> What's the purpose of the parameter "mapred.min.split.size"? >>

mapred.min.split.size

2011-03-18 Thread Pedro Costa
Hi What's the purpose of the parameter "mapred.min.split.size"? Thanks, -- Pedro

What the examples of Gridmix2 do?

2011-03-18 Thread Pedro Costa
Hi, I don't know what the examples of the Gridmix do. Where can I find an explanation of that? Thank -- Pedro

Re: set number of map tasks in GridMix2

2011-03-18 Thread Pedro Costa
ri, Mar 18, 2011 at 5:04 PM, Pedro Costa wrote: > Hi, > > I would like define the number of map tasks to use in the GridMix2. > > For example, I would like to run the GridMixMonsterQuery at GridMix2 > with 5 maps, another with 10 and another with 20 maps. > > How can I d

set number of map tasks in GridMix2

2011-03-18 Thread Pedro Costa
Hi, I would like to define the number of map tasks to use in GridMix2. For example, I would like to run the GridMixMonsterQuery at GridMix2 with 5 maps, another run with 10 and another with 20 maps. How can I do that? Thanks, -- Pedro

What happens after COMMIT_PENDING?

2011-03-09 Thread Pedro Costa
Hi, I'm running hadoop map-reduce in a cluster, and I have a Reduce Task that remains in the state COMMIT_PENDING and doesn't finish. This is happening because I've made some changes to the Hadoop MR. I'm trying to solve my problem, but I don't understand what happens after the COMMIT_PEND

Re: command to delete a directory in hadoop

2011-02-28 Thread Pedro Costa
Remove a file: hadoop dfs -rmr. List files: hadoop dfs -ls. On Mon, Feb 28, 2011 at 10:13 AM, Eric wrote: > Note that Hadoop will put your files in a trash bin. You can use the > -skipTrash option to really delete the data and free up space. See the > command "hadoop dfs" for more details. > > 2011

When a map task finishes?

2011-02-23 Thread Pedro Costa
Does a map task finish after generating the map intermediate file? Thanks, -- Pedro

Can't run Gridmix2 tests

2011-02-22 Thread Pedro Costa
Hi, I'm trying to run the Gridmix2 tests (rungridmix_2), but all the tests remain in the waiting state and none of them runs. I don't see any exception in the logs. Why does this happen? Thanks, -- Pedro

When use hadoop mapreduce?

2011-02-17 Thread Pedro Costa
Hi, I'd like to know, depending on my problem, when I should or shouldn't use Hadoop MapReduce. Is there any list that advises when to use or not to use MapReduce? Thanks, -- Pedro

How read compressed files?

2011-02-16 Thread Pedro Costa
Hi, 1 - I'm trying to read parts of a compressed file to generate message digests, but I can't fetch the right parts. I searched for an example that reads compressed files, but I can't find one. As I have 3 partitions in my example, below are the indexes of the file: raw bytes: 54632 / offset: 0 / par

reduce input is compressed?

2011-02-15 Thread Pedro Costa
Hi, When compression is on, are the compressed map intermediate files transferred to the reduce side as compressed data? Thanks, -- Pedro

Re: compressed map intermediate files

2011-02-15 Thread Pedro Costa
's the merged files that are compressed? Thanks, On Tue, Feb 15, 2011 at 10:35 AM, Pedro Costa wrote: > Hi, > > I run two examples of a MR execution with the same input files and > with 3 Reduce tasks defined. One example has the map-intermediate > files compressed, a

compressed map intermediate files

2011-02-15 Thread Pedro Costa
Hi, I ran two examples of an MR execution with the same input files and with 3 Reduce tasks defined. One example has the map-intermediate files compressed, and the other example has uncompressed data. Below, I've put some debug lines that I added to the code. 1 - On the uncompressed data, the raw l

Re: Map output files are SequenceFileFormat

2011-02-14 Thread Pedro Costa
And when the data of the map-intermediate files is compressed, is it still an IFile? On Mon, Feb 14, 2011 at 4:44 PM, Harsh J wrote: > Hello, > > On Mon, Feb 14, 2011 at 8:51 PM, Pedro Costa wrote: >> Hi, >> >> 1 - The map output files are always of the type Seq

Map output files are SequenceFileFormat

2011-02-14 Thread Pedro Costa
Hi, 1 - Are the map output files always of the type SequenceFileFormat? 2 - Does this mean that they contain a header with the following fields? # version - A byte array: 3 bytes of magic header 'SEQ', followed by 1 byte of actual version no. (e.g. SEQ4 or SEQ6) # keyClassName - String # valueClassName

get Map tasks info in command line

2011-02-13 Thread Pedro Costa
Hi, 1 - How do I get the names of the map tasks that ran, from the command line? 2 - How do I get the start time and the end time of a map task from the command line? -- Pedro

get duration of MR tasks by command line

2011-02-13 Thread Pedro Costa
Hi, I would like to get the duration each Map and Reduce task took to run, from the command line. How is this possible? Thanks, -- Pedro

Save Hadoop examples results locally?

2011-02-13 Thread Pedro Costa
Hi, I'm running GridMix2 examples and I would like to retrieve all the results produced by the tests and save the files locally, to read them later and offline. Is there a command for that? Thanks -- Pedro

set number of maps in GridMix2

2011-02-12 Thread Pedro Costa
Hi, 1 - In GridMix2, is it possible to configure the number of maps to execute? For example, the small set of GridMix examples always executes 3 maps. The medium set uses 30 maps. Is it possible to configure the number of maps? 2 - In GridMix2 the small sets use 3 maps, but the directory of

Map tasks execution

2011-02-12 Thread Pedro Costa
Hi, 1 - When a Map task is taking too long to finish its processing, the JT launches another Map task to process the same input. Does this mean that the task that was replaced is killed? 2 - Does Hadoop MR allow the same input split to be processed by 2 different mappers at the same time? Thanks, -- Pedro

What are index files in the hadoop MR

2011-02-08 Thread Pedro Costa
Hi, A map task saves as output the map output and an index file. What's the purpose of the index file? Thanks, -- Pedro

Shared files in Hadoop MR cluster

2011-02-08 Thread Pedro Costa
Hi, Hadoop MR has the property "mapred.system.dir" to set a relative directory where shared files are stored during a job run. What are these shared files? -- Pedro

location awareness on RT tasks?

2011-02-04 Thread Pedro Costa
Hi, When hadoop is running in a cluster, the output of the Reducers is saved in HDFS. Does MapReduce also have location awareness of where the data is saved? For example, we have TT1 running on Machine1, and TT2 running on Machine2. The replication factor of HDFS is 3. The Reduce Task RT1 is running in TT1.

map output always locally

2011-02-04 Thread Pedro Costa
Hi, Is the map output always saved in the local file system, or can it be saved in HDFS? -- Pedro

how set compression in the map output?

2011-02-02 Thread Pedro Costa
Hi, I'm running the wordcount example, but I would like to compress the map output. I set the following properties in the mapred-site.xml [code] mapred.compress.map.output true mapred.map.output.compression.codec gzip [/code] but I still got the error: java.la
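A likely cause of the error above is the codec value: `mapred.map.output.compression.codec` expects a fully qualified codec class name, not the short name "gzip". A sketch of the corrected fragment (using the Gzip codec class that ships with Hadoop):

```xml
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.GzipCodec</value>
</property>
```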

Map output packet form

2011-02-02 Thread Pedro Costa
Hi, I'd really like to know how a map output packet is formed on the local disk on the Reduce and the Map sides. For example, on the Reduce side, the map output is also copied to local disk. The problem is that I'm computing a message digest of the map output on the map side and on the reduce side, and I

Re: ChecksumException at LocalFileSytem

2011-02-02 Thread Pedro Costa
2011 at 10:30 AM, Pedro Costa wrote: > Hi, > > I'm trying to read the map output on the reduce side LocalFileSytem > class. The map output is on the local file system. The problem with > that, it's because it throws a ChecksumException. > > I know that checksums are

ChecksumException at LocalFileSytem

2011-02-02 Thread Pedro Costa
Hi, I'm trying to read the map output on the reduce side with the LocalFileSystem class. The map output is on the local file system. The problem is that it throws a ChecksumException. I know that checksums are verified when the file is read, and since it throws a ChecksumException, somethi

Retrieve the segments as bytes at RT

2011-02-01 Thread Pedro Costa
Hi, 1 - In the example that I'm executing, during the phase of copying the map output from the Map task to the Reduce task, the RT saved the map output to disk, because it's too big. I've got the file path to the saved map output, and now I would like to retrieve the following information: - checksum -

map Output packet in Reduce side

2011-02-01 Thread Pedro Costa
When the RT fetches a map output, it can save the map output in memory or on disk. 1 - Is the map output packet different when it's saved to memory versus to disk? Does the header or the body of the saved map output packet contain different fields? 2 - What are the fields of the packet? Thanks,

Fwd: raw length vs part length

2011-02-01 Thread Pedro Costa
of 14 bytes? -- Forwarded message -- From: Pedro Costa Date: Tue, Feb 1, 2011 at 11:43 AM Subject: raw length vs part length To: mapreduce-user@hadoop.apache.org Hi, Hadoop uses the compressed length and the raw length. 1 - In my example, the RT is fetching a map output that

raw length vs part length

2011-02-01 Thread Pedro Costa
Hi, Hadoop uses the compressed length and the raw length. 1 - In my example, the RT is fetching a map output that shows a raw length of 14 bytes and a partLength of 10 bytes. The map output doesn't use any compression. When I'm dealing with uncompressed data, the raw length should

Map output in disk and memory at same time?

2011-01-31 Thread Pedro Costa
Hi, When the reduce fetches from the mappers a map output of the size of 1GB and does the merge, is it possible that part of the map output is saved on disk and another part in memory? Or must a map output be saved entirely on disk, or entirely in memory? Thanks, -- Pedro

Re: Retrieve FileStatus given a file path?

2011-01-31 Thread Pedro Costa
I said file status, but what I would like to know is the size of the file. On Mon, Jan 31, 2011 at 5:56 PM, Pedro Costa wrote: > Hi, > > On the reduce side, after the RT had passed the merge phase (before > the reduce phase starts), I've got the path of the map_0.out file. I

Retrieve FileStatus given a file path?

2011-01-31 Thread Pedro Costa
Hi, On the reduce side, after the RT has passed the merge phase (before the reduce phase starts), I've got the path of the map_0.out file. I'm opening this file with [code] FSDataInputStream in = fs.open(file); [/code] But I only have the path. Is it possible to obtain the file status of this fi

Re: java.lang.RuntimeException: problem advancing post rec#499959

2011-01-30 Thread Pedro Costa
(of one/more intermediate > files, a.k.a. IFiles). > > On Fri, Jan 28, 2011 at 11:21 PM, Pedro Costa wrote: >> Hi, >> >> I'm running the Terasort problem in cluster mode, and I've got a >> RunTimeException in a Reduce Task. >> >> java.lang.Ru

java.lang.RuntimeException: problem advancing post rec#499959

2011-01-28 Thread Pedro Costa
Hi, I'm running the Terasort problem in cluster mode, and I've got a RuntimeException in a Reduce Task. java.lang.RuntimeException: problem advancing post rec#499959 (Please see the attachment.) What does this error mean? Is it a problem with wrong KEYOUT and VALUEOUT in the Reduce Task? Thanks, --

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
ugh, > based on this section of the exception's call stack: > >       at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) >       at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:637) >       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343) > >

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
fine, but mapreduce.Mapper.map has this signature: > > map(K key, V value, Context) > > Your PiEstimator map signature doesn't match, so it's not overriding > the proper function and is never getting called by the framework. > > Could you paste your complete

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
the code that could give this error. On Thu, Jan 27, 2011 at 4:29 PM, Pedro Costa wrote: > Yes, that's the one that's being used ( o.a.h.mapreduce.Mapper ). This > is not the right one to use? > > > > On Thu, Jan 27, 2011 at 3:40 PM, Chase Bradford > wrote: >

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
Map class in the job setup?  It sounds a > bit like the base o.a.h.mapreduce.Mapper map implementation is being used > instead. > > > On Jan 27, 2011, at 2:36 AM, Pedro Costa wrote: > >> The map output class are well defined: >> keyClass: class org

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
BooleanWritable(true), new LongWritable(numInside)); out.collect(new BooleanWritable(false), new LongWritable(numOutside)); } [/code] I'm really confused, right now. How can this be happening? On Thu, Jan 27, 2011 at 10:19 AM, Pedro Costa wrote: > Thanks Nicholas, but it didn't work

Re: PiEstimator error - Type mismatch in key from map

2011-01-27 Thread Pedro Costa
; the map-reduce tutorial at > http://hadoop.apache.org/common/docs/r0.20.2/mapred_tutorial.html > > On Jan 26, 2011, at 10:27 AM, Pedro Costa wrote: > >> Hadoop 20.1 >> >> On Wed, Jan 26, 2011 at 6:26 PM, Tsz Wo (Nicholas), Sze >> wrote: >>> Hi Srihari, &g

Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Pedro Costa
t; > I got a similar error before in one of my projects. I had to set the values > for "mapred.output.key.class" and "mapred.output.value.class". > That resolved the issue for me. > Srihari > On Jan 26, 2011, at 10:09 AM, Pedro Costa wrote: > > Yes, I can repro

Re: PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Pedro Costa
you able to reproduce it > deterministically? > Nicholas > > ____ > From: Pedro Costa > To: mapreduce-user@hadoop.apache.org > Sent: Wed, January 26, 2011 5:47:01 AM > Subject: PiEstimator error - Type mismatch in key from map > > Hi, > > I ru

PiEstimator error - Type mismatch in key from map

2011-01-26 Thread Pedro Costa
Hi, I run the PI example of hadoop, and I've got the following error: [code] java.io.IOException: Type mismatch in key from map: expected org.apache.hadoop.io.BooleanWritable, recieved org.apache.hadoop.io.LongWritable at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.j

Re: split locations

2011-01-14 Thread Pedro Costa
, a logical MapReduce InputSplit is very > different from a physical HDFS Block. > > On Fri, Jan 14, 2011 at 5:10 PM, Pedro Costa wrote: >> I think that the answer is, each location of the split file >> corresponds to a replica. >> >> On Fri, Jan 14, 2011 at

Re: split locations

2011-01-14 Thread Pedro Costa
I think that the answer is: each location of the split file corresponds to a replica. On Fri, Jan 14, 2011 at 11:09 AM, Pedro Costa wrote: > Hi, > > If a split contains more than one location, it means that > this split file is replicated through all locations, or it me

split locations

2011-01-14 Thread Pedro Costa
Hi, If a split contains more than one location, does it mean that this split file is replicated through all the locations, or that a split is divided into several blocks, and each block is in one location? Thanks, -- Pedro

how the JT choose a TT to launch a map task?

2011-01-13 Thread Pedro Costa
Hi, I would like to understand how the JT chooses a TT to launch a map task in a cluster setup. So I pose 4 different questions, to understand how the choice of TT is made by the JT. All these questions are posed in a scenario where Hadoop MR is installed in cluster mode. 1 - For exampl

know a location of an input file in a network

2011-01-13 Thread Pedro Costa
Hi, I have hadoop installed in a cluster and I would like the JT to be able to work out, in the network topology, which input files in HDFS are closer to it and which are farther. So, how can a JT know if an input file is located at the local level, the rack level, or another level? Thanks, -- Pedro

Run jobs in sequence

2011-01-06 Thread Pedro Costa
Hi, I would like to run a sequence of jobs, where the output of one job is the input of the next one. Does Hadoop Pig help to do this? Thanks, -- Pedro

What's the purpose of the variables numInFlight and numCopied?

2011-01-02 Thread Pedro Costa
The reduce task contains, in the method fetchOutputs, two variables: numInFlight numCopied What is the purpose of these variables? Thanks, -- Pedro

Run MR unit tests in eclipse

2010-12-30 Thread Pedro Costa
Hi, I'm trying to understand how can I execute the unit tests of MapReduce in eclipse IDE. I'm trying to execute the unit test class TestMapReduceLocal, but I get errors that I pasted below. How can I run the MR unit tests in eclipse IDE? [code] 2010-12-30 17:39:19,703 WARN conf.Configuration (C

Several TT in hadoop MR in clustering?

2010-12-29 Thread Pedro Costa
Hi, 1 - When Hadoop MR is running in cluster mode, is it possible to have several TTs, one TT per machine, running simultaneously, with the JT communicating with every TT? For example, running hadoop MR on 5 machines, each machine (slave node) has a TT communicating with the JT on the master node? Th

Re: Spill and Map Output

2010-12-22 Thread Pedro Costa
  rec.startOffset = segmentStart; > rec.rawLength = writer.getRawLength(); > rec.partLength = writer.getCompressedLength(); > > -Ravi > > On 12/23/10 3:52 AM, "Pedro Costa" wrote: > > A index record contains 3 variables: > startO

Re: Spill and Map Output

2010-12-22 Thread Pedro Costa
-Ravi > > On 12/23/10 3:24 AM, "Pedro Costa" wrote: > > So, I conclude that a partition is defined by the offset. > But, for example, a Map Tasks produces 5 partitions. How the reduce > knows that it must fetch the 5 partitions? Where's this information? > This informa

Re: Spill and Map Output

2010-12-22 Thread Pedro Costa
le where the input data for the > particular reducer starts and length is the size of the data starting from > the offset. > > -Ravi > > > On 12/23/10 2:17 AM, "Pedro Costa" wrote: > > Hi, > > 1 - I would like to understand how a partition works in the

Spill and Map Output

2010-12-22 Thread Pedro Costa
Hi, 1 - I would like to understand how a partition works in Map Reduce. I know that Map Reduce contains the IndexRecord class that indicates the length of something. Is it the length of a partition or of a spill? 2 - In a large map output, can a partition be a set of spills, or is a spill s

When a Reduce Task starts?

2010-12-20 Thread Pedro Costa
1 - A reduce task should start only when a map task ends ? -- Pedro
