directly from HDFS, and since it would be a sorted file you could just walk it and merge the counts in a single pass in the reduce function.
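A minimal sketch of that single-pass merge, using the old (org.apache.hadoop.mapred) API; the class name is illustrative, not from the thread:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class MergeCounts extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  // Reduce input arrives sorted and grouped by key, so all partial counts
  // for a word can be merged in one pass with no extra lookups.
  public void reduce(Text word, Iterator<IntWritable> counts,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (counts.hasNext()) {
      sum += counts.next().get();
    }
    output.collect(word, new IntWritable(sum));
  }
}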
- Sharad
--
Aayush Garg,
Phone: +41 764822440
Burger norbert.bur...@gmail.com wrote:
Aayush, out of curiosity, why do you want to model wordcount this way?
What benefit do you see?
Norbert
On 4/6/09, Aayush Garg aayush.g...@gmail.com wrote:
Hi,
I want to experiment with the wordcount example in a different way. Suppose we have very large data. Instead of splitting all the data at once, we want to feed some splits to the map-reduce job at a time. I want to model the Hadoop job like this:
Suppose a batch of input splits arrives in
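However the batches are formed, one way to approximate this is to run one job per batch of input files. A sketch against the old API; WordCountMap, WordCountReduce, batchId, and chooseNextBatch() are hypothetical names:

import java.util.List;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

// One wordcount job per batch instead of one job over the whole dataset.
List<Path> batchOfFiles = chooseNextBatch();        // hypothetical helper
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount-batch-" + batchId);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
conf.setMapperClass(WordCountMap.class);            // hypothetical mapper
conf.setReducerClass(WordCountReduce.class);        // hypothetical reducer
for (Path p : batchOfFiles) {
  FileInputFormat.addInputPath(conf, p);
}
FileOutputFormat.setOutputPath(conf, new Path("/out/batch-" + batchId));
JobClient.runJob(conf);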
Hi,
I have a 5-node cluster for Hadoop use. All nodes are multi-core.
I am running a shell command in the map function of my program, and this shell command takes one file as input. Many such files have been copied into HDFS.
So, in summary, the map function will run a command like ./run file1
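A sketch of what that map function might look like, assuming each input record carries one file name and that ./run has been made available on every node (e.g. via the DistributedCache):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class RunCommandMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  public void map(LongWritable key, Text fileName,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // Launch the external command against one file per input record.
    Process p = Runtime.getRuntime()
        .exec(new String[] { "./run", fileName.toString() });
    try {
      int exit = p.waitFor();
      output.collect(fileName, new Text("exit=" + exit));
    } catch (InterruptedException e) {
      throw new IOException("interrupted waiting for ./run: " + e);
    }
  }
}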
I set my username to R61neptun as you suggested, but I am still getting that error:
error:
localhost: starting datanode, logging to
/home/garga/Documents/hadoop-0.15.3/bin/../logs/hadoop-garga-datanode-R61neptun.out
localhost: starting secondarynamenode, logging to
Could anyone please tell?
On Sat, Apr 19, 2008 at 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote:
Hi,
I have written the following code for writing my key/value pairs to a file, which is then read by another MR job.
Path pth = new Path("./dir1/dir2/filename");
FileSystem fs
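The snippet above is cut off; one way it might continue is to write a SequenceFile, so the next MR job can read the pairs back directly. A sketch only: conf is assumed to be the JobConf in scope, and the key/value classes and the appended pair are illustrative:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

Path pth = new Path("./dir1/dir2/filename");
FileSystem fs = FileSystem.get(conf);
SequenceFile.Writer writer =
    SequenceFile.createWriter(fs, conf, pth, Text.class, IntWritable.class);
writer.append(new Text("someword"), new IntWritable(42));  // illustrative
writer.close();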
I just tried the same thing (mapred.task.id) as you told me, but I am getting one file named null in my directory.
On Mon, Apr 21, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED] wrote:
Aayush Garg wrote:
Could anyone please tell?
On Sat, Apr 19, 2008 at 1:33 PM, Aayush Garg [EMAIL PROTECTED
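A likely cause of the file named null: "mapred.task.id" is only set in the JobConf the framework hands to each task, so it has to be read inside configure(); reading it from a freshly constructed JobConf returns null. A sketch, with the helper and path illustrative:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class TaskFileBase extends MapReduceBase {
  private String taskId;

  public void configure(JobConf job) {
    // Set by the framework for each task attempt; null outside a running task,
    // which is what produces a file literally named "null".
    taskId = job.get("mapred.task.id");
  }

  protected Path perTaskPath() {
    // One file per task keeps parallel tasks from clobbering each other.
    return new Path("./dir1/dir2/part-" + taskId);
  }
}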
Could anyone please help me with this error below? I am not able to start HDFS because of it.
Thanks,
On Sat, Apr 19, 2008 at 7:25 PM, Aayush Garg [EMAIL PROTECTED] wrote:
I have my hadoop-site.xml set up correctly!! But it still produces this error:
On Sat, Apr 19, 2008 at 6:35 PM, Stuart Sierra
Hi,
I am getting the following error on starting up Hadoop in pseudo-distributed mode:
bin/start-all.sh
localhost: starting datanode, logging to
/home/garga/Documents/hadoop-0.15.3/bin/../logs/hadoop-root-datanode-R61-neptun.out
localhost: starting secondarynamenode, logging to
I have my hadoop-site.xml set up correctly!! But it still produces this error:
On Sat, Apr 19, 2008 at 6:35 PM, Stuart Sierra [EMAIL PROTECTED]
wrote:
On Sat, Apr 19, 2008 at 9:53 AM, Aayush Garg [EMAIL PROTECTED]
wrote:
I am getting the following error on starting up Hadoop in pseudo-distributed mode
a global picture? I guess not.
In general, it is better not to try to communicate between map and reduce except via the expected mechanisms.
On 4/16/08 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote:
We cannot read the HashMap in the configure method of the reducer because it is called before the reduce job. The file can be very big, so can I write it in such a manner that the file is distributed and I can read it easily in the next MapReduce phase? Alternatively, can I split the file when it becomes greater than a certain size?
Thanks,
Aayush
On Thu, Apr 17, 2008 at 1:01 PM, Aayush Garg [EMAIL PROTECTED] wrote
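For handing a large file from one MR phase to the next, the usual pattern is to let the first job write SequenceFile output (already split into one part file per reducer, so size is not a problem) and point the second job's input at that directory. A sketch; the paths and driver class are illustrative, and mapper/reducer setup is omitted:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

Path between = new Path("/tmp/phase1-out");          // illustrative

JobConf job1 = new JobConf(MyDriver.class);          // hypothetical driver
job1.setOutputKeyClass(Text.class);
job1.setOutputValueClass(IntWritable.class);
job1.setOutputFormat(SequenceFileOutputFormat.class);
FileInputFormat.addInputPath(job1, new Path("/input"));
FileOutputFormat.setOutputPath(job1, between);
JobClient.runJob(job1);

JobConf job2 = new JobConf(MyDriver.class);
job2.setInputFormat(SequenceFileInputFormat.class);  // reads job1's pairs
FileInputFormat.addInputPath(job2, between);
FileOutputFormat.setOutputPath(job2, new Path("/output"));
JobClient.runJob(job2);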
getting a FileSystem create error?
Thanks,
On Thu, Apr 17, 2008 at 5:54 PM, Ted Dunning [EMAIL PROTECTED] wrote:
Don't assume that any variables are shared between reducers, between maps, or between maps and reducers.
If you want to share data, put it into HDFS.
On 4/17/08 4:01 AM, Aayush Garg
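A sketch of Ted's suggestion: one job writes whatever needs sharing to a known HDFS path, and the next job's tasks load it in configure(). The path and tab-separated format here are illustrative:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class SharedDataBase extends MapReduceBase {
  protected Map<String, Integer> shared = new HashMap<String, Integer>();

  public void configure(JobConf job) {
    try {
      FileSystem fs = FileSystem.get(job);
      BufferedReader in = new BufferedReader(
          new InputStreamReader(fs.open(new Path("/shared/freqs.txt"))));
      String line;
      while ((line = in.readLine()) != null) {
        String[] parts = line.split("\t");     // word <TAB> frequency
        shared.put(parts[0], Integer.valueOf(parts[1]));
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("could not load shared data", e);
    }
  }
}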
should I choose??? Is this design and approach OK?
}
public static void main() {}
}
I hope you have understood my question.
Thanks,
On Wed, Apr 16, 2008 at 8:33 AM, Amar Kamat [EMAIL PROTECTED] wrote:
Aayush Garg wrote:
Hi,
Are you sure that another MR is required for eliminating some rows
format you like.
On 4/16/08 12:41 PM, Aayush Garg [EMAIL PROTECTED] wrote:
Hi,
The current structure of my program is:
Upper class {
  class Reduce {
    reduce function(K1, V1, K2, V2) {
      // I count the frequency for each key
      // Add output in HashMap(key, value) instead
...
On 4/16/08 1:33 PM, Aayush Garg [EMAIL PROTECTED] wrote:
We cannot read the HashMap in the configure method of the reducer because it is called before the reduce job.
I need to eliminate rows from the HashMap once all the keys have been read.
Also, my concern is: if the dataset is large, will this HashMap thing
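It likely will not scale: a HashMap holding every key lives in a single reducer's heap. A common alternative is to emit each aggregate as soon as it is complete and do the elimination as a filter, either in the same reduce or in a follow-up job. A sketch; MIN_FREQ is an illustrative cutoff:

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class CountAndFilter extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {
  private static final int MIN_FREQ = 5;   // illustrative threshold

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int freq = 0;
    while (values.hasNext()) {
      freq += values.next().get();
    }
    if (freq >= MIN_FREQ) {                // "eliminate rows" as a filter
      output.collect(key, new IntWritable(freq));
    }
  }
}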
But the problem is that I need to sort according to freq, which is part of my value field...
Any inputs?? Could you provide a small piece of code for your idea?
On Wed, Apr 9, 2008 at 9:45 AM, Owen O'Malley [EMAIL PROTECTED] wrote:
On Apr 8, 2008, at 4:54 AM, Aayush Garg wrote:
I construct
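The framework sorts only on keys, so the usual trick for sorting by freq is a second pass whose map swaps the pair; negating freq gives descending order under the default ascending comparator. A sketch, assuming freq has already been pulled out of the custom value into an IntWritable:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class SwapForSort extends MapReduceBase
    implements Mapper<Text, IntWritable, IntWritable, Text> {
  public void map(Text word, IntWritable freq,
                  OutputCollector<IntWritable, Text> output, Reporter reporter)
      throws IOException {
    // Negative frequency sorts largest-first with the ascending default.
    output.collect(new IntWritable(-freq.get()), word);
  }
}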
Hi,
I have implemented Key and value pairs in the following way:
Key (Text class)   Value (Custom class)
word1
word2
class Custom {
  int freq;
  TreeMap<String, ArrayList<String>>
}
I construct this type of key/value pair in the OutputCollector of the reduce phase. Now I want to SORT this
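For the Custom value above to travel through MapReduce it must implement Writable. A sketch of the field-by-field (de)serialization; the field name docs is assumed, since the original snippet leaves it out:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Map;
import java.util.TreeMap;
import org.apache.hadoop.io.Writable;

public class Custom implements Writable {
  int freq;
  TreeMap<String, ArrayList<String>> docs =
      new TreeMap<String, ArrayList<String>>();

  public void write(DataOutput out) throws IOException {
    out.writeInt(freq);
    out.writeInt(docs.size());
    for (Map.Entry<String, ArrayList<String>> e : docs.entrySet()) {
      out.writeUTF(e.getKey());
      out.writeInt(e.getValue().size());
      for (String s : e.getValue()) {
        out.writeUTF(s);
      }
    }
  }

  public void readFields(DataInput in) throws IOException {
    freq = in.readInt();
    docs.clear();
    int entries = in.readInt();
    for (int i = 0; i < entries; i++) {
      String key = in.readUTF();
      int n = in.readInt();
      ArrayList<String> list = new ArrayList<String>();
      for (int j = 0; j < n; j++) {
        list.add(in.readUTF());
      }
      docs.put(key, list);
    }
  }
}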
Dempsey (new to the list)
On Apr 4, 2008, at 5:36 PM, Ted Dunning wrote:
See Nutch. See Nutch run.
http://en.wikipedia.org/wiki/Nutch
http://lucene.apache.org/nutch/
--
Aayush Garg,
Phone: +41 76 482 240
[EMAIL PROTECTED] wrote:
On Wed, 26 Mar 2008, Aayush Garg wrote:
Hi,
I am developing a simple inverted index program with Hadoop. My map function has the output:
word, doc
and the reducer has:
word, list(docs)
Now I want to use one more MapReduce pass to remove stop and scrub words associated with every word. How should I design my program from this stage? I mean, how would I apply multiple MapReduce passes to this? What would be the best way to perform this?
Thanks,
Regards,
-
Aayush Garg,
Phone: +41 76 482 240
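For the stop-word question above, the second job's map can read the (word, list(docs)) pairs produced by the first job and simply drop the unwanted words; everything else passes through unchanged. A sketch with an illustrative stop list:

import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class StopWordFilter extends MapReduceBase
    implements Mapper<Text, Text, Text, Text> {
  private static final Set<String> STOP_WORDS =
      new HashSet<String>(Arrays.asList("a", "an", "the", "of"));

  public void map(Text word, Text docList,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    if (!STOP_WORDS.contains(word.toString())) {
      output.collect(word, docList);   // identity map for kept words
    }
  }
}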