Hi,
So, I have two different directories.. which I want to process
differently...
For which I have two mappers for the job:
Data1
Data2
and in my driver.. I add the following:
MultipleInputs.addInputPath(job, new Path( args[0]),
TextInputFormat.class,
Data1.class);
MultipleInpu
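For reference, a minimal sketch of how the full driver might look, assuming args[0] and args[1] are the two input directories, args[2] is the output path, and Data1/Data2 are stand-ins for the two mappers:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoInputDriver {

    // Placeholder mappers standing in for the real Data1 / Data2 classes.
    public static class Data1 extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            // parse records from the first directory here
        }
    }

    public static class Data2 extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context) {
            // parse records from the second directory here
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "two-input-job");
        job.setJarByClass(TwoInputDriver.class);
        // Each input directory gets its own mapper class.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Data1.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Data2.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}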
Hi,
I have a very simple use case...
Basically I have an edge list and I am trying to convert it into an adjacency
list..
Basically:
src target
a b
a c
b d
b e
and so on..
What I am trying to build is
a [b,c]
b [d,e]
.. and so on..
But every now and then.. I hit a super node..which h
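For what it's worth, a minimal sketch of the basic edge-list-to-adjacency-list job (whitespace-separated edges assumed; it does not yet deal with the super-node case, where the value list for a single key can get very large):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class EdgeToAdjacency {

    // Mapper: one edge "src target" per line -> emit (src, target).
    public static class EdgeMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().trim().split("\\s+");
            if (parts.length == 2) {
                context.write(new Text(parts[0]), new Text(parts[1]));
            }
        }
    }

    // Reducer: concatenate all targets for a source into one adjacency line.
    public static class AdjacencyReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder("[");
            for (Text target : values) {
                if (sb.length() > 1) sb.append(",");
                sb.append(target.toString());
            }
            sb.append("]");
            context.write(key, new Text(sb.toString()));
        }
    }
}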
Hi,
I have been searching in the MRUnit documentation but haven't been able to find
it so far..
How do I pass configuration parameters in MRUnit?
So for example, if I take the wordcount example.
Let's say, in my driver code I am setting this parameter...
conf.set("delimiter",args[2])
And in my ma
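In case it helps, a minimal sketch of one way to do this with MRUnit's MapDriver; the mapper, delimiter value and output types here are assumptions, not the code from the question. The driver exposes a Configuration via getConfiguration(), so the parameter can be set there before runTest():

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class DelimiterMapperTest {

    // Stand-in mapper that reads the "delimiter" parameter, like the wordcount variant in question.
    public static class DelimiterMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String delimiter = context.getConfiguration().get("delimiter", ",");
            for (String token : value.toString().split(delimiter)) {
                context.write(new Text(token), ONE);
            }
        }
    }

    private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

    @Before
    public void setUp() {
        mapDriver = MapDriver.newMapDriver(new DelimiterMapper());
        // Set the parameter on the Configuration the test driver hands to the mapper.
        mapDriver.getConfiguration().set("delimiter", ";");
    }

    @Test
    public void splitsOnConfiguredDelimiter() throws IOException {
        mapDriver.withInput(new LongWritable(0), new Text("hello;world"))
                 .withOutput(new Text("hello"), new IntWritable(1))
                 .withOutput(new Text("world"), new IntWritable(1))
                 .runTest();
    }
}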
Hi,
I am trying to join two datasets.. one of which is JSON..
I am relying on the json-simple library to parse that JSON..
I am trying to use -libjars.. So far, for simple data processing, the
approach has worked.. but now I am getting the following error
Exception in thread "main" java.lang.NoClass
Oops.. forgot the code:
http://pastebin.com/7XnyVnkv
On Thu, Oct 24, 2013 at 10:54 AM, jamal sasha wrote:
> Hi,
>
> I am trying to join two datasets.. One of which is json..
> I am relying on json-simple library to parse that json..
> I am trying to use libjars.. So far ..
Hi,
I am trying to separate my reducer output into different folders..
My driver has the following code:
FileOutputFormat.setOutputPath(job, new Path(output));
//MultipleOutputs.addNamedOutput(job, namedOutput,
outputFormatClass, keyClass, valueClass)
//MultipleOutputs
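A minimal sketch of the reducer side, assuming the goal is sub-folders under the job output path; the write(key, value, baseOutputPath) variant of MultipleOutputs creates a sub-directory whenever the base path contains a '/'. (Optionally, LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class) in the driver avoids the empty default part files.) The routing rule below is made up for the sketch:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Reducer that routes records to sub-directories under the job output path.
public class RoutingReducer extends Reducer<Text, Text, Text, Text> {

    private MultipleOutputs<Text, Text> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, Text>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            // A '/' in the base output path creates a sub-folder, e.g. output/typeA/part-r-00000.
            String folder = key.toString().startsWith("a") ? "typeA" : "typeB";
            mos.write(key, value, folder + "/part");
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        mos.close();
    }
}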
never mind.. found a bug :D
On Fri, Oct 11, 2013 at 12:54 PM, jamal sasha wrote:
> Hi..
>
> In my mapper function..
> Can i have multiple context.write()...
>
> So...
>
> public void map(LongWritable key, Text value, Context context) throws
> IOExce
Hi..
In my mapper function..
Can I have multiple context.write() calls...
So...
public void map(LongWritable key, Text value, Context context) throws
IOException, InterruptedException, NullPointerException {
.. //processing/...
context.write(k1,v1);
context.write(k2,v2);
}
I thought we could do th
Hi,
I have data in this one folder like the following:
data
|-- shard1
|   |-- d1_1
|   `-- d2_1
|-- shard2
|   |-- d1_1
|   `-- d2_2
|-- shard3
|   |-- d1_1
|   `-- d2_3
`-- shard4
    |-- d1_1
    `-- d2_4
Now, I want to
Hi,
So in native Hadoop streaming, how do I send a helper file?
Like in core Hadoop, you can write your code in multiple files and then jar
it up...
But if I am using Hadoop streaming, does all my code have to be in a single file?
Is that so?
Oops.. wrong email thread :D Please ignore the previous email
On Fri, Sep 20, 2013 at 1:49 PM, jamal sasha wrote:
> Hi,
> So in native hadoop streaming, is there no way to send a helper file.. ?
> Like in core hadoop, you can write your code in multiple files and then
> jar it out
wrote:
> LMGTFY:
> http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script-guide
>
>
> On Wed, Sep 18, 2013 at 6:01 PM, jamal sasha wrote:
>
>> Hi,
>> How do I implement (say ) in wordcount a combiner functionality if i am
>> using python hadoop streaming?
>> Thanks
>>
>
>
Hi,
How do I implement (say, in wordcount) combiner functionality if I am
using Python Hadoop streaming?
Thanks
= FileSystem.open(p);
> String str;
> while((str = iStream.readLine())!=null)
> {
> System.out.println(str);
>
> }
> Regards,
> Som Shekhar Sharma
> +91-8197243810
>
Hi,
Probably a very stupid question.
I have this data in binary format... and the following piece of code works
for me in plain Java.
public class Parser {
public static void main(String [] args) throws Exception{
String filename = "sample.txt";
File file = new File(filename);
FileInputStrea
Hi,
A very weird question.
I have data in format 1...
And there is this conversion utility to convert data to format 2 which
works like this:
restore input > output
and then I want to copy the output to HDFS at /user/output
Is there a way that I can merge all these commands into a single line like
c
args); // calls your
run() method.
System.exit(ret);
}
}
On Wed, Aug 28, 2013 at 10:03 AM, Shahab Yunus wrote:
> Just google it.
>
> For HBaseStorage
> http://blog.whitepages.com/2011/10/27/hbase-storage-and-pig/
>
> For M/R:
> http://wiki.apache.org/hadoop/Hbase/MapRe
t;
> Or you can use Pig to store it in HBase using HBaseStorage.
>
> There are many ways (and resources available on the web) and the question
> that you have asked is very high level.
>
> Regards,
> Shahab
>
>
> On Wed, Aug 28, 2013 at 12:49 PM, jamal sasha wrote:
Hi,
I have data in the form:
source, destination, connection
This data is saved in HDFS.
I want to read this data and put it in an HBase table, something like:
Column1 (Source) | Column2 (Destination) | Column3 (Connection Type)
Row: vertex A    | vertex B              | connection
Ho
Hi,
I am new to HBase and am trying to achieve the following.
I am reading data from HDFS in the mapper and parsing it..
So, in the reducer I want my output written to HBase instead of HDFS.
But here is the thing.
public static class MyTableReducer extends TableReducer {
public void reduce(Text key
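A minimal sketch of how that reducer could look, assuming the mapper emits (source, "destination,connectionType") and the driver wires the table with TableMapReduceUtil.initTableReducerJob("mytable", MyTableReducer.class, job); the table name and the column family "cf" are made up for the sketch (Put.add is the HBase 0.9x-era call; newer releases use addColumn):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class MyTableReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {

    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        byte[] row = Bytes.toBytes(key.toString());   // source vertex as the row key
        for (Text value : values) {
            String[] parts = value.toString().split(",", 2);
            Put put = new Put(row);
            // One column per destination: qualifier = destination, cell value = connection type.
            put.add(Bytes.toBytes("cf"), Bytes.toBytes(parts[0]),
                    Bytes.toBytes(parts.length > 1 ? parts[1] : ""));
            // Writing a Put sends the record to HBase instead of HDFS.
            context.write(new ImmutableBytesWritable(row), put);
        }
    }
}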
I have a bunch of jars which I want to pass. I am using the -libjars option to do
so. But to do that I have to implement Tool??
So I changed my code to the following but I am still getting this warning:
13/08/27 11:32:37 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications s
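For reference, a minimal Tool/ToolRunner skeleton; the generic options (-libjars, -D, ...) must appear on the command line before the application's own arguments for GenericOptionsParser to pick them up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already carries whatever -libjars / -D options ToolRunner parsed.
        Job job = new Job(getConf(), "my-job");
        job.setJarByClass(MyDriver.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options before handing the rest to run().
        int ret = ToolRunner.run(new Configuration(), new MyDriver(), args);
        System.exit(ret);
    }
}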
> <exclusions>
>   <exclusion>
>     <artifactId>log4j</artifactId>
>     <groupId>log4j</groupId>
>   </exclusion>
>   <exclusion>
>     <artifactId>slf4j-log4j12</artifactId>
>     <groupId>org.slf4j</groupId>
>   </exclusion>
> </exclusions>
>
> Regards,
> Shahab
>
>
> On Tue, Aug 27, 2013 at 11:47 AM, jamal sasha wrote:
>
>> Hi,
>> For one of my map reduce code I want to use a different version of
>> slf4
Hi,
For one of my map reduce jobs I want to use a different version of the slf4j
jar (1.6.4).
But I guess Hadoop has a different version of the jar on the Hadoop
classpath: lib/slf4j-log4j12-1.4.3.jar
And when I am trying to run my code, I am getting this error:
Exception in thread "main" java.lang.NoSuchMetho
Hi,
I am wondering if there is any tutorial to look at.
What are the challenges of reading from and/or writing to a database?
Is there a common flavor across all the databases?
For example, the DBs start a server on some host:port,
you establish a connection to that host:port,
it can be across a proxy?
Which
Hi,
Let's say that I have data which interacts with a REST API like
% curl hostname data
Now, I have the following script:
#!/usr/bin/env python
import sys,os
cmd = """curl http://localhost --data '"""
string = " "
for line in sys.stdin:
    line = line.rstrip(os.linesep)
    string += line
s as key value pair. This configuration object will be set in the
> Job Object. The same properties can be accessed in the mapper/reducer using
> the Context Object -> getConfiguration() -> get(propertyName).
>
> Hope this helps.
>
> Regards,
&g
Hi,
I am initializing an object in the driver code.
For the sake of argument, let's say I want to save data to some database..
say:
Connection con = new Connection(host, db);
Now, in the reducer I want to do something like
con.write(key, value)
So, how do I pass this object from the driver to the mapper / reducer?
An
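For what it's worth, the driver-side object is not automatically available in the tasks; the usual pattern is to put its parameters into the Configuration in the driver and rebuild the object in the reducer's setup(). A minimal sketch, with a stub standing in for the hypothetical Connection class above:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DbWritingReducer extends Reducer<Text, Text, Text, Text> {

    // Stub standing in for the question's hypothetical Connection class.
    static class Connection {
        Connection(String host, String db) {}
        void write(String key, String value) { /* send to the database */ }
        void close() {}
    }

    private Connection con;

    @Override
    protected void setup(Context context) {
        // Driver side would have done: conf.set("db.host", host); conf.set("db.name", db);
        String host = context.getConfiguration().get("db.host");
        String db = context.getConfiguration().get("db.name");
        con = new Connection(host, db);
    }

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            con.write(key.toString(), value.toString());
        }
    }

    @Override
    protected void cleanup(Context context) {
        con.close();
    }
}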
Hi,
I am trying to understand how to write my own Writable.
So basically I am trying to understand how to process records spanning multiple
lines.
Can someone break down for me what things need to be
considered in each method??
I am trying to understand this example:
https://github
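On the Writable side, the contract is mostly write()/readFields() symmetry plus a no-arg constructor; records spanning multiple lines are usually handled by a custom InputFormat/RecordReader rather than by the Writable itself. A minimal sketch of a custom Writable:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

// Minimal custom Writable: every field must be written and read back in the same order.
public class RecordWritable implements Writable {

    private String name;
    private long count;

    // Hadoop needs a no-arg constructor so it can instantiate the object before readFields().
    public RecordWritable() {}

    public RecordWritable(String name, long count) {
        this.name = name;
        this.count = count;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeLong(count);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        count = in.readLong();
    }
}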
Never mind guys.
I had a typo when I was trying to set configuration param.
Sorry.
On Tue, Aug 6, 2013 at 4:46 PM, jamal sasha wrote:
> Hi,
> I am trying to pass a parameter to multiple mappers
>
> So, I do this in my driver
>
> conf.set("delimiter", ar
Hi,
I am trying to pass a parameter to multiple mappers
So, I do this in my driver
conf.set("delimiter", args[3]);
In mapper1, I am retrieving this as:
Configuration conf = context.getConfiguration();
String[] values = value.toString().split(conf.get("delimiter"));
and same is my mapper2
B
code to get these directly instead of iterating multiple times.
>
>
> Thanks
>
> Devaraj k
>
> From: jamal sasha [mailto:jamalsha...@gmail.com]
> Sent: 31 July 2013 23:40
> To: user@hadoop.apache.org
> Subject:
Hi,
I am getting this error:
13/07/31 09:29:41 INFO mapred.JobClient: Task Id :
attempt_201307102216_0270_m_02_2, Status : FAILED
java.util.NoSuchElementException
at java.util.StringTokenizer.nextToken(StringTokenizer.java:332)
at java.util.StringTokenizer.nextElement(StringTokenizer.java:39
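That stack trace usually means nextToken() was called on a line with fewer tokens than expected (a blank or short input line, for instance). A minimal defensive pattern, assuming the tokenizing happens in a mapper:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SafeTokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        // Guard every nextToken() with hasMoreTokens(): blank or short lines are
        // what typically trigger java.util.NoSuchElementException.
        while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
        }
    }
}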
Ok.
A very basic (stupid) question.
I am trying to compute the mean using Hadoop.
So my implementation is like this:
public class Mean
public static class Pair{
//simple class to create object
}
public class MeanMapper
emit(text,pair) //where pair is (local sum, count)
public class MeanRed
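A minimal sketch of that (sum, count) idea, with the pair encoded as "sum,count" text so no custom Writable is needed yet; the combiner re-emits partial pairs and only the reducer divides. (The driver would set setCombinerClass(SumCountCombiner.class) and setReducerClass(MeanReducer.class).) The "id,value" input layout is assumed:

import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class Mean {

    // Mapper: each "id,value" line becomes (id, "value,1").
    public static class MeanMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split(",");
            if (parts.length == 2) {
                context.write(new Text(parts[0].trim()), new Text(parts[1].trim() + ",1"));
            }
        }
    }

    // Combiner: fold many "sum,count" pairs into one partial pair per key.
    public static class SumCountCombiner extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            long count = 0;
            for (Text v : values) {
                String[] p = v.toString().split(",");
                sum += Double.parseDouble(p[0]);
                count += Long.parseLong(p[1]);
            }
            context.write(key, new Text(sum + "," + count));
        }
    }

    // Reducer: only here is the division done, so the combiner stays optional.
    public static class MeanReducer extends Reducer<Text, Text, Text, DoubleWritable> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            long count = 0;
            for (Text v : values) {
                String[] p = v.toString().split(",");
                sum += Double.parseDouble(p[0]);
                count += Long.parseLong(p[1]);
            }
            context.write(key, new DoubleWritable(sum / count));
        }
    }
}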
Hi,
I am getting a weird error?
13/07/29 10:50:58 INFO mapred.JobClient: Task Id :
attempt_201307102216_0145_r_16_0, Status : FAILED
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
/wordcount_raw/_temporary/_attempt_201307102216
Then how should I approach this issue?
On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes wrote:
> If you try to hammer in a nail (json file) with a screwdriver (
> XMLInputReader) then perhaps the reason it won't work may be that you are
> using the wrong tool?
> On Jun 21, 2013
Hi,
I am using one of the libraries which relies on an InputFormat.
Right now, it is reading XML files spanning multiple lines.
So currently the input format looks like:
public class XMLInputReader extends FileInputFormat {
public static final String START_TAG = "";
public static final Strin
Hi,
I have deployed Hadoop on a small cluster..
Now the issue is that at job launch it does say:
Tracking URL: http://foobar:50030/jobdetails.jsp?jobid=job_201305241622_0047
but I cannot open this URL to look at the job status (maybe it's the
firewall?? proxy??)
I can only look at my l
Ok, got this thing working..
Turns out that -libjars should be specified before the HDFS input
and output paths.. rather than after them..
:-/
Thanks everyone.
On Thu, May 30, 2013 at 1:35 PM, jamal sasha wrote:
> Hi,
> I did that but still same exception error.
> I did:
the -libjars parameter when you
> kick off your M/R job. This way the jars will be copied to all TTs.
>
> Regards,
> Shahab
>
>
> On Thu, May 30, 2013 at 2:43 PM, jamal sasha wrote:
>
>> Hi Thanks guys.
>> I figured out the issue. Hence i have another question.
&g
>
> On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee <
> rahul.rec@gmail.com> wrote:
>
>> Whatever you have mentioned, Jamal, should work. You can debug this.
>>
>> Thanks,
>> Rahul
>>
>>
>> On Thu, May 30, 2013 at 5:14 AM, jamal sasha wro
aracters.
>
> Thanks and Regards,
>
> Rishi Yadav
>
> On Wed, May 29, 2013 at 2:54 PM, jamal sasha wrote:
>
>> Hi,
>>I am stuck again. :(
>> My input data is in hdfs. I am again trying to do wordcount but there is
>> slight difference.
>> The
group AS word,
> COUNT_STAR(words) AS word_count;
> STORE word_counts INTO '/tmp/word_counts.txt';
>
> It will be faster than the Java you'll likely write.
>
>
> On Wed, May 29, 2013 at 2:54 PM, jamal sasha wrote:
>
>> Hi,
>>I am stuck again. :(
Hi,
I am stuck again. :(
My input data is in HDFS. I am again trying to do wordcount but there is a
slight difference.
The data is in json format.
So each line of data is:
{"author":"foo", "text": "hello"}
{"author":"foo123", "text": "hello world"}
{"author":"foo234", "text": "hello this world"}
Hi,
Is it possible to save data in a database (HBase, Cassandra??) directly
from Hadoop,
so that there is no output in HDFS but it directly writes data into
this DB?
If I want to modify the wordcount example to achieve this, what/where should I
make these modifications?
Any help/ suggestions.
Tha
Hi,
I want to process some text files and then save the output in a DB.
I am using Python (Hadoop streaming).
I am using Mongo as the backend server.
Is it possible to run Hadoop streaming jobs without specifying any output?
What is the best way to deal with this?
Hi,
I am trying to understand the difference between combiner and aggregator.
Based on my readings:
Wordcount example (mapper)
aggregator
class Mapper
  method MAP
    H <-- Associative array
    for all term t in document:
      H{t} = H{t} + 1
    for all term t in H do
      EMIT(term t, count H{t
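For comparison, a minimal Java sketch of the in-mapper ("aggregator") variant for wordcount: the HashMap plays the role of the associative array H, and everything is emitted once per mapper in cleanup(), which is what cuts the shuffle volume relative to a plain mapper plus combiner:

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: accumulate counts in memory, emit once per mapper.
public class InMapperCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

    private final Map<String, Long> counts = new HashMap<String, Long>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                Long c = counts.get(token);
                counts.put(token, c == null ? 1L : c + 1);
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flush the per-mapper totals at the end of the input split.
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            context.write(new Text(e.getKey()), new LongWritable(e.getValue()));
        }
    }
}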
uce the ultimate result.
>
> BR
> Yanbo
>
>
> 2013/4/2 jamal sasha
>
>> pinging again.
>> Let me rephrase the question.
>> If my data is like:
>> id, value
>>
>> And I want to find average "value" for each id, how can i do that using
>
Hi,
I have a quick question. I am trying to write MR code using Python.
In the word count example:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
The reducer..
Why can't I declare, in the reducer, a dictionary (hashmap) whose key is the
word and value is a list o
t right.
I would really appreciate it if someone could help me with this query.
Thanks
On Mon, Apr 1, 2013 at 2:27 PM, jamal sasha wrote:
> data_dict is declared globably as
> data_dict = defaultdict(list)
>
>
> On Mon, Apr 1, 2013 at 2:25 PM, jamal sasha wrote:
>
>> Very
data_dict is declared globably as
data_dict = defaultdict(list)
On Mon, Apr 1, 2013 at 2:25 PM, jamal sasha wrote:
> Very dumb question..
> I have data as following
> id1, value
> 1, 20.2
> 1,20.4
>
>
> I want to find the mean and median of id1?
> I am us
Very dumb question..
I have data as follows:
id1, value
1, 20.2
1, 20.4
I want to find the mean and median for each id1?
I am using Python Hadoop streaming..
mapper.py
for line in sys.stdin:
    try:
        # remove leading and trailing whitespace
        line = line.rstrip(os.linesep)
        tokens = line.split(",")
        print
oops never mind guys..
figured out the issue.
sorry for spamming.
On Thu, Mar 28, 2013 at 5:15 PM, jamal sasha wrote:
> Very much like this:
>
> http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file
>
>
> On Thu,
Very much like this:
http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file
On Thu, Mar 28, 2013 at 5:10 PM, jamal sasha wrote:
> Hi,
>I am facing a weird problem.
> My python scripts were working just fine.
>
Hi,
I am facing a weird problem.
My Python scripts were working just fine.
I made a few modifications..
tested via:
cat input.txt | python mapper.py | sort | python reducer.py
which runs just fine.
Ran it on my local machine (pseudo-distributed mode);
that also runs just fine.
Deployed on the cluster..
Now,
Though it copies, it gives this error:
On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha wrote:
> When I try this.. I get an error
> cat: Unable to write to output stream.
>
> Are these permissions issue
> How do i resolve this?
> THanks
>
>
> On Wed, Feb 20, 2013
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid:
> >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870,
> >> duration: 19207000
> >>
> >> I don't see how this is anymore dangerous than doing a
> >>
Awesome thanks :)
On Tue, Feb 19, 2013 at 2:14 PM, Harsh J wrote:
> You can instead use 'fs -cat' and the 'head' coreutil, as one example:
>
> hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file
>
> On Wed, Feb 20, 2013 at 3:38 AM, jam
Hi,
I was wondering, in the following command:
bin/hadoop dfs -copyToLocal hdfspath localpath
can we specify to copy not the full file but, say, only x MB of it to the local drive?
Is something like this possible?
Thanks
Jamal
Hi,
This might be more of a Python-centric question but I was wondering if
anyone has tried it out...
I am trying to run a few Hadoop commands from a Python program...
For example, if from the command line you do:
bin/hadoop dfs -ls /hdfs/query/path
it returns all the files in the hdfs query pat
Hi.
A very, very lame question.
Does the number of mappers depend on the number of nodes I have?
This is how I imagine map-reduce works.
For example, in the word count example,
I have a bunch of slave nodes.
The documents are distributed across these slave nodes.
Now depending on how big the data is, it will sprea
orked fine. Can you show the code of your
driver program (i.e. where you have main) ?
>
> Thanks
> hemanth
>
>
>
> On Tue, Jan 22, 2013 at 5:22 AM, jamal sasha
wrote:
>>
>> Hi,
>> Lets say I have the standard helloworld program
>>
http://hadoop.apache.o
in reducer by
> adding a logger or syso.
>
> Also consider removing static in declaration of baseSum as it would add
> counts of previous keys.
> On Jan 22, 2013 7:17 AM, "jamal sasha" wrote:
>
>> The second one.
>> If the word hello appears once, its count is
HELLO appears once it's count is 201.
>
> Please clarify
> On Jan 22, 2013 5:22 AM, "jamal sasha" wrote:
>
>> Hi,
>> Lets say I have the standard helloworld program
>>
>> http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html#Example%3A
rm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Fri, Jan 18, 2013 at 5:48 AM, jamal sasha wrote:
>
>>
>> On Thu, Jan 17, 2013 at 4:14 PM, Mohammad Tariq wrote:
>>
>>> hdfs://your_namenode:9000/user/hduser/data/input1.t
On Thu, Jan 17, 2013 at 4:14 PM, Mohammad Tariq wrote:
> hdfs://your_namenode:9000/user/hduser/data/input1.txt
>
It runs :D
But I am very curious: if I run the sample wordcount example normally, it
automatically reads from the HDFS location..
but here it didn't seem to respect that?
uration.addResource(new Path("PATH_TO_YOUR_hdfs-site.xml"));
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Fri, Jan 18, 2013 at 5:26 AM, jamal sasha wrote:
>
>> Hi,
>> I am not sure what I am doing wrong.
Hi,
I am not sure what I am doing wrong.
I copied my input files from local to HDFS at
/user/hduser/data/input1.txt
/user/hduser/data/input2.txt
In my driver code: I have
MultipleInputs.addInputPath(conf, new Path(args[0]),
TextInputFormat.class, UserFileMapper.class);
Multip
tple Inputs here and process the new input file into
> 'word 1' and the previous output file as 'word $count' in the mapper and do
> its aggregation in the reducer.
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> -
;
> On Wed, Jan 16, 2013 at 9:07 PM, jamal sasha wrote:
>
>> Hi,
>> In the wordcount example:
>> http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html
>> Lets say I run the above example and save the the output.
>> But lets say that I have now a new inp
Hi,
In the wordcount example:
http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html
Let's say I run the above example and save the output.
But let's say that I now have a new input file. What I want to do is..
basically do the wordcount again, but modifying the previous
counts.
Fo
I am inside a network where I need proxy settings to access the internet.
I have a weird problem.
The internet is working fine.
But there is one particular instance when I get this error:
Network Error (tcp_error)
A communication error occurred: "Operation timed out"
The Web Server may
Hi,
The relevant code snippet posted here
http://pastebin.com/DRPXUm62
On Tue, Jan 15, 2013 at 5:31 PM, jamal sasha wrote:
> My bad. Sorry I fixed it. It is BuildGraph.class
>
>
> On Tue, Jan 15, 2013 at 5:30 PM, Serge Blazhiyevskyy <
> serge.blazhiyevs...@nice.com> wrote
SERGE BLAZHIYEVSKY
> Architect
> (T) +1 (650) 226-0511
> (M) +1 (408) 772-2615
> se...@nice.com<mailto:se...@nice.com>
> www.nice.com<http://www.nice.com>
>
>
> On Jan 15, 2013, at 5:24 PM, jamal sasha jamalsha...@gmail.com>> wrote:
>
> I have a mappe
I have a mapper
public class BuildGraph{
public void config(JobConf job){ // <== this block doesn't seem to be
// executing at all :(
super.configure(job);
this.currentId = job.getInt("currentId", 0);
if (this.currentId != 0){
// I call a method from a different c
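If this is the old mapred API, the hook the framework calls is configure(JobConf), not config(JobConf), so a method named config() is never invoked. A minimal sketch of that override, with the class and field names assumed from the snippet above:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Old "mapred" API mapper: the framework invokes configure(JobConf) before any map() calls.
public class BuildGraphMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    private int currentId;

    @Override
    public void configure(JobConf job) {
        currentId = job.getInt("currentId", 0);
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        // ... use currentId while processing the record ...
        output.collect(new Text(String.valueOf(currentId)), value);
    }
}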
Hi,
Probably a very lame question.
I have two documents and I want to find the overlap of both documents in a
map reduce fashion and then compare the overlap (let's say I have some
measure to do that).
So this is what I am thinking:
1) Run the normal wordcount job on one document (
https://site
king for but that wont be
> map reduce.
>
> Thanks,
>
> Abhishek
>
> From: jamal sasha [mailto:jamalsha...@gmail.com]
> Sent: Monday, January 07, 2013 3:21 PM
> To: user@hadoop.apache.org
> Subject: Binary Search in ma
mory
> and you must perform a non-trivial graph traversal for each change record,
> you have something much harder to do.
>
> FYI top google results for joins in Hadoop here:
> https://www.google.com/search?q=joins+in+hadoop&aq=f&oq=joins+in+hadoop&aqs=c
better.
>
> john
>
> From: jamal sasha [mailto:jamalsha...@gmail.com]
> Sent: Monday, January 07, 2013 4:21 PM
> To: user@hadoop.apache.org
> Subject: Binary Search in map reduce
>
>
> Hi,
>
>
erly then there is no problem
> with the third party libraries which you are using. It looks like to me
> that your code doesn't have the proper info about the intermediate path.
> Please make sure you have told your code the exact location of intermediate
> output.
>
>
> B
Hi,
So I am still in the process of learning Hadoop.
I tried to run wordcount.java (by writing my own mapper and reducer.. creating a
jar and then running it in pseudo-distributed mode).
At that time I got an error, something like
ERROR security.UserGroupInformation: PriviledgedActionException as:mhdus
Hi,
I have been using the Python Hadoop streaming framework to write the code and
now I am slowly moving towards the core Java APIs.
And I am getting comfortable with it, but what is the quickest way to debug
native map reduce code..
like in Hadoop streaming this worked great:
% cat input.txt | p
I am new to Hadoop, but I think the data from the completed map
tasks is transferred (copied, shuffled and sorted) to the reducer nodes
even though some of the mappers are still running, but the reduce code execution
starts only when all the mapper phases have finished.
That's why you see some
Hi,
Lately, I have been writing a lot of algorithms in the map reduce abstraction
in Python (Hadoop streaming).
I have got the hang of it (I think)...
I have a couple of questions:
1) By not using the Java libraries, what power of Hadoop am I missing?
2) I know that this is just the tip of the iceberg, can so
the reducer takes the input as a key and a
> collection of values only. The reduce method signature defines it.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
> ------
> *From: * jamal sasha
> *Date: *Wed, 21 Nov 2012 14:50:51 -0
Hi..
I guess I am asking a lot of fundamental questions, but I thank you guys for
taking out time to explain my doubts.
So I am able to write map reduce jobs, but here is my doubt:
As of now I am writing mappers which emit a key and a value.
This key-value pair is then captured at the reducer end and then I proc
then you need lesser volume of data
per reducer for better performance results.
>
> In general it is better to have the number of reduce tasks slightly less
than the number of available reduce slots in the cluster.
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
>
>
> From: jamal sasha
>
> Date: Wed, 21 Nov 2012 11:38:38 -0500
>
> To: user@hadoop.apache.org
By default the number of reducers is set to 1..
Is there a good way to guess the optimal number of reducers?
Or let's say I have TBs worth of data... mappers are of the order of 5000 or so...
But ultimately I am calculating, let's say, some average over the whole data...
say the average transaction occurring...
Now
re being reattempted by the framework (default
>>> behavior, attempts 4 times to avoid transient failure scenario).
>>>
>>> Visit your job's logs in the JobTracker web UI, to find more
>>> information on why your tasks fail.
>>>
>&g
; Bejoy KS
>
> Sent from handheld, please excuse typos.
> ____
> From: jamal sasha
> Date: Tue, 20 Nov 2012 14:38:54 -0500
> To:
> ReplyTo: user@hadoop.apache.org
> Subject: number of reducers
>
>
> Hi,
>
> I wrote a simple map reduc
Hi,
I wrote a simple map reduce job in Hadoop streaming.
I am wondering if I am doing something wrong..
While the number of mappers is projected to be around 1700.. reducers.. just 1?
It's a couple of TBs worth of data.
What can I do to address this?
Basically the mapper looks like this:
For l
your Python code (Map unit & reduce unit) locally on your input data and
> see whether your logic has any issues.
>
> Best,
> Mahesh Balija,
> Calsoft Labs.
>
>
> On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha wrote:
>
>>
>>
>>
>> Hi,
>>
Hi,
This is my first attempt to learn the map reduce abstraction.
My problem is as follows.
I have a text file as follows:
id1, id2, date, time, mrps, code, code2
3710100022400, 1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
Now w