MultipleInputs.addInputPath

2013-11-21 Thread jamal sasha
Hi, So, I have two different directories.. which I want to process differently... For which I have two mappers for the job.. Data1 Data2 and in my driver.. I add the following: MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Data1.class); MultipleInpu
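For reference, a minimal driver sketch of the pattern being described (class names Data1Mapper, Data2Mapper, JoinReducer and the output types are placeholders, not the poster's code):

```java
// Hypothetical driver wiring two input directories to two different mappers.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TwoInputDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "two inputs");
    job.setJarByClass(TwoInputDriver.class);
    // Each directory gets its own mapper; both must emit the same key/value types.
    MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, Data1Mapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, Data2Mapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```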

Dealing with stragglers in hadoop

2013-11-15 Thread jamal sasha
Hi, I have a very simple use case... Basically I have an edge list and I am trying to convert it into an adjacency list.. Basically (src, target) pairs: a b, a c, b d, b e and so on.. What I am trying to build is a [b,c], b [d,e] .. and so on.. But every now and then.. I hit a super node..which h

MRUNIT basic question

2013-10-26 Thread jamal sasha
Hi, I have been searching in the MRUnit documentation but haven't been able to find it so far.. How do I pass configuration parameters in my MRUnit test? So for example, if I take the wordcount example. Let's say, in my driver code I am setting this parameter... conf.set("delimiter", args[2]) And in my ma
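A hedged sketch of one way to do this with MRUnit (assuming MRUnit 1.x and a hypothetical WordCountMapper that reads the "delimiter" property in setup()); the test driver exposes the job Configuration directly:

```java
// Sketch only: set the same property on the MRUnit driver's Configuration
// that the real driver would set with conf.set("delimiter", args[2]).
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Before;
import org.junit.Test;

public class WordCountMapperTest {
  private MapDriver<LongWritable, Text, Text, IntWritable> mapDriver;

  @Before
  public void setUp() {
    mapDriver = MapDriver.newMapDriver(new WordCountMapper());
    // Equivalent of the driver-side conf.set(...):
    mapDriver.getConfiguration().set("delimiter", ",");
  }

  @Test
  public void splitsOnConfiguredDelimiter() throws Exception {
    mapDriver.withInput(new LongWritable(0), new Text("hello,world"))
             .withOutput(new Text("hello"), new IntWritable(1))
             .withOutput(new Text("world"), new IntWritable(1))
             .runTest();
  }
}
```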

Unable to use third party jar

2013-10-24 Thread jamal sasha
Hi, I am trying to join two datasets.. One of which is json.. I am relying on json-simple library to parse that json.. I am trying to use libjars.. So far .. for simple data processing.. the approach has worked.. but now i am getting the following error Exception in thread "main" java.lang.NoClass

Re: Unable to use third party jar

2013-10-24 Thread jamal sasha
OOps..forgot the code: http://pastebin.com/7XnyVnkv On Thu, Oct 24, 2013 at 10:54 AM, jamal sasha wrote: > Hi, > > I am trying to join two datasets.. One of which is json.. > I am relying on json-simple library to parse that json.. > I am trying to use libjars.. So far ..

Writing to multiple directories in hadoop

2013-10-11 Thread jamal sasha
Hi, I am trying to separate my output from the reducer into different folders.. My driver has the following code: FileOutputFormat.setOutputPath(job, new Path(output)); //MultipleOutputs.addNamedOutput(job, namedOutput, outputFormatClass, keyClass, valueClass) //MultipleOutputs
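A minimal sketch (not the poster's code) of the reducer side of MultipleOutputs in the new API, where the extra write() argument is a base path under the job output directory:

```java
// Sketch: route reducer output into sub-directories of the job output path.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

public class SplitReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private MultipleOutputs<Text, IntWritable> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<Text, IntWritable>(context);
  }

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) sum += v.get();
    // The third argument is a base path relative to the job output directory,
    // so records land under <output>/dirA/... or <output>/dirB/...
    String dir = key.toString().startsWith("a") ? "dirA/part" : "dirB/part";
    mos.write(key, new IntWritable(sum), dir);
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    mos.close();  // required, otherwise output may be missing
  }
}
```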

Re: Multiple context.write inmapper

2013-10-11 Thread jamal sasha
never mind.. found a bug :D On Fri, Oct 11, 2013 at 12:54 PM, jamal sasha wrote: > Hi.. > > In my mapper function.. > Can i have multiple context.write()... > > So... > > public void map(LongWritable key, Text value, Context context) throws > IOExce

Multiple context.write inmapper

2013-10-11 Thread jamal sasha
Hi.. In my mapper function.. Can i have multiple context.write()... So... public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException ,NullPointerException{ .. //processing/... context.write(k1,v1); context.write(k2,v2); } I thought we could do th
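Yes, a single map() call may emit any number of pairs; a small sketch (the edge-splitting logic is made up) of a mapper that writes twice per input record:

```java
// Sketch: emitting two key/value pairs from one map() call is perfectly legal.
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class EdgeMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split("\\s+");
    if (parts.length < 2) return;          // skip malformed lines instead of throwing
    context.write(new Text(parts[0]), new Text(parts[1]));
    context.write(new Text(parts[1]), new Text(parts[0]));
  }
}
```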

Accessing only particular folder using hadoop streaming

2013-10-02 Thread jamal sasha
Hi, I have data in this one folder, laid out like the following:
data --- shard1 --- d1_1, d2_1
     --- shard2 --- d1_1, d2_2
     --- shard3 --- d1_1, d2_3
     --- shard4 --- d1_1, d2_4
Now, I want to
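If the intent is to read only the d1_* files, one option (an assumption about the layout above, not something stated in the thread) is a glob pattern in the input path; streaming's -input flag should accept the same glob syntax as the Java API sketched here:

```java
// Hedged sketch: FileInputFormat expands glob patterns at submission time, so this
// picks up only the d1_* files under every shard directory (paths are assumptions).
import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInput {
  public static void addShardInputs(Job job) throws IOException {
    FileInputFormat.addInputPath(job, new Path("/data/shard*/d1_*"));
  }
}
```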

[no subject]

2013-09-20 Thread jamal sasha
Hi, So in native hadoop streaming, how do I send a helper file.. ? Like in core hadoop, you can write your code in multiple files and then jar it up... But if I am using hadoop streaming, does all my code have to be in a single file?? Is that so?

Re: Using combiners in python hadoop streaming

2013-09-20 Thread jamal sasha
Oops.. wrong email thread :D Please ignore the previous email On Fri, Sep 20, 2013 at 1:49 PM, jamal sasha wrote: > Hi, > So in native hadoop streaming, is there no way to send a helper file.. ? > Like in core hadoop, you can write your code in multiple files and then > jar it out

Re: Using combiners in python hadoop streaming

2013-09-20 Thread jamal sasha
wrote: > LMGTFY: > http://pydoop.sourceforge.net/docs/pydoop_script.html#pydoop-script-guide > > > On Wed, Sep 18, 2013 at 6:01 PM, jamal sasha wrote: > >> Hi, >> How do I implement (say ) in wordcount a combiner functionality if i am >> using python hadoop streaming? >> Thanks >> > >

Using combiners in python hadoop streaming

2013-09-18 Thread jamal sasha
Hi, How do I implement combiner functionality (say, in wordcount) if I am using python hadoop streaming? Thanks

Re: reading input stream

2013-08-29 Thread jamal sasha
= FileSystem.open(p); > String str; > while((str = iStream.readLine())!=null) > { > System.out.println(str); > > } > Regards, > Som Shekhar Sharma > +91-8197243810 >

reading input stream

2013-08-28 Thread jamal sasha
Hi, Probably a very stupid question. I have this data in binary format... and the following piece of code works for me in normal Java. public class Parser { public static void main(String[] args) throws Exception { String filename = "sample.txt"; File file = new File(filename); FileInputStrea
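The HDFS-side equivalent of that local snippet opens the file through FileSystem rather than java.io.File; a hedged sketch with a made-up path:

```java
// Sketch: open a file in HDFS via FileSystem instead of java.io.File.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReader {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path p = new Path("/user/hduser/sample.txt");   // hypothetical path
    FSDataInputStream in = fs.open(p);
    // For text; for truly binary data, read bytes from 'in' directly.
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    while ((line = reader.readLine()) != null) {
      System.out.println(line);
    }
    reader.close();
  }
}
```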

Quick question

2013-08-28 Thread jamal sasha
Hi, A very weird question. I have data in format 1... And there is this conversion utility to convert data to format 2 which works like this: restore input > output and then I want to copy output to hdfs at /user/output Is there a way that I can merge all these commands into a single line like c

Re: Writing data to hbase from reducer

2013-08-28 Thread jamal sasha
args); // calls your run() method. System.exit(ret); } } On Wed, Aug 28, 2013 at 10:03 AM, Shahab Yunus wrote: > Just google it. > > For HBaseStorage > http://blog.whitepages.com/2011/10/27/hbase-storage-and-pig/ > > For M/R: > http://wiki.apache.org/hadoop/Hbase/MapRe

Re: Writing data to hbase from reducer

2013-08-28 Thread jamal sasha
> Or you can use Pig to store it in HBase using HBaseStorage. > > There are many ways (and resources available on the web) and the question > that you have asked is very high level. > > Regards, > Shahab > > > On Wed, Aug 28, 2013 at 12:49 PM, jamal sasha wrote:

Writing data to hbase from reducer

2013-08-28 Thread jamal sasha
Hi, I have data in the form: source, destination, connection. This data is saved in hdfs. I want to read this data and put it in an hbase table, something like: Column1 (source) | Column2 (Destination) | Column3 (Connection Type), with a row like: vertex A | vertex B | connection. Ho
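A hedged sketch of the usual pattern (HBase 0.94-era API; table, family and qualifier names are made up): a TableReducer that turns each (source, destination) pair into a Put, with the driver calling TableMapReduceUtil.initTableReducerJob to wire the output table:

```java
// Sketch: write one HBase row per source vertex from the reducer.
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;

public class EdgeTableReducer extends TableReducer<Text, Text, ImmutableBytesWritable> {
  @Override
  public void reduce(Text source, Iterable<Text> destinations, Context context)
      throws IOException, InterruptedException {
    for (Text dest : destinations) {
      byte[] row = Bytes.toBytes(source.toString());          // row key = source vertex
      Put put = new Put(row);
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("destination"), Bytes.toBytes(dest.toString()));
      context.write(new ImmutableBytesWritable(row), put);
    }
  }
}
```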

Writing multiple tables from reducer

2013-08-27 Thread jamal sasha
Hi, I am new to hbase and am trying to achieve the following. I am reading data from hdfs in mapper and parsing it.. So, in reducer I want my output to write to hbase instead of hdfs But here is the thing. public static class MyTableReducer extends TableReducer { public void reduce(Text key
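For the multiple-tables case specifically, a hedged sketch using MultiTableOutputFormat, where the output key names the target table (table, family and qualifier names are placeholders):

```java
// Sketch: route each Put to one of two HBase tables from a single reducer.
// Driver side (also a sketch): job.setOutputFormatClass(MultiTableOutputFormat.class);
import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MultiTableReducer extends Reducer<Text, Text, ImmutableBytesWritable, Put> {
  private static final ImmutableBytesWritable TABLE_A =
      new ImmutableBytesWritable(Bytes.toBytes("tableA"));
  private static final ImmutableBytesWritable TABLE_B =
      new ImmutableBytesWritable(Bytes.toBytes("tableB"));

  @Override
  public void reduce(Text key, Iterable<Text> values, Context context)
      throws IOException, InterruptedException {
    for (Text v : values) {
      Put put = new Put(Bytes.toBytes(key.toString()));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("val"), Bytes.toBytes(v.toString()));
      // The output key is interpreted as the name of the table to write to.
      context.write(v.toString().startsWith("a") ? TABLE_A : TABLE_B, put);
    }
  }
}
```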

libjars error

2013-08-27 Thread jamal sasha
I have a bunch of jars which I want to pass. I am using the libjars option to do so. But to do that I have to implement Tool?? So I changed my code to the following but I am still getting this warning: 13/08/27 11:32:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications s
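For reference, a minimal sketch of the Tool/ToolRunner pattern that warning refers to; ToolRunner lets GenericOptionsParser strip -libjars (and -D, -files) out of the arguments before run() sees them:

```java
// Sketch of a driver implementing Tool so that -libjars is handled by ToolRunner.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // Use getConf(), not a fresh Configuration, or the -libjars setting is lost.
    Job job = new Job(getConf(), "my job");
    job.setJarByClass(MyDriver.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    // ToolRunner parses and removes the generic options before calling run().
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}
```

If the warning persists after implementing Tool, it is usually because the Job was built from a fresh Configuration instead of getConf().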

Re: Jar issue

2013-08-27 Thread jamal sasha
t; log4j > > > > > > slf4j-log4j12 > > org.slf4j > > > > > > > > Regards, > Shahab > > > On Tue, Aug 27, 2013 at 11:47 AM, jamal sasha wrote: > >> Hi, >> For one of my map reduce code I want to use a different version of >> slf4

Jar issue

2013-08-27 Thread jamal sasha
Hi, For one of my map reduce jobs I want to use a different version of the slf4j jar (1.6.4). But I guess hadoop has a different version of the jar in the hadoop classpath: lib/slf4j-log4j12-1.4.3.jar. And when I am trying to run my code, I am getting this error: Exception in thread "main" java.lang.NoSuchMetho

Things to keep in mind when writing to a db

2013-08-16 Thread jamal sasha
Hi, I am wondering if there is any tutorial to look at. What are the challenges for reading from and/or writing to a database? Is there a common flavor across all the databases? For example, the dbs start a server on some host:port; you establish a connection to that host:port; it can be across a proxy? Which

executing linux command from hadoop (python)

2013-08-15 Thread jamal sasha
Hi, Let's say that I have data which interacts with a REST API, like %curl hostname data. Now, I have the following script: #!/usr/bin/env python import sys,os cmd = """curl http://localhost --data '""" string = " " for line in sys.stdin: line = line.rstrip(os.linesep) string += line

Re: Reply: Passing an object in mapper

2013-08-14 Thread jamal sasha
s as key value pair. This configuration object will be set in the > Job Object. The same properties can be accessed in the mapper/reducer using > the Context Object -> getConfiguration() -> get(propertyName). > > Hope this helps. > > Regards,

Passing an object in mapper

2013-08-14 Thread jamal sasha
Hi, I am initializing an object in the driver code. For the sake of argument let's say I want to save data to some database.. say: Connection con = new Connection(host, db); Now, in the reducer I want to do something like con.write(key, value). So, how do I pass this object from the driver to the mapper / reducer? An
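A live connection object cannot be shipped from the driver to the tasks; the usual pattern, sketched here with the thread's hypothetical Connection class, is to put its parameters in the Configuration and rebuild the connection in setup() on each task:

```java
// Sketch: driver side does conf.set("db.host", host); conf.set("db.name", db);
// before creating the Job, and each task rebuilds its own connection.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class DbWritingReducer extends Reducer<Text, Text, Text, Text> {
  private Connection con;   // hypothetical client class from the thread, not a real API

  @Override
  protected void setup(Context context) {
    Configuration conf = context.getConfiguration();
    con = new Connection(conf.get("db.host"), conf.get("db.name"));
  }

  @Override
  protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException {
    for (Text value : values) {
      con.write(key.toString(), value.toString());   // hypothetical write call
    }
  }

  @Override
  protected void cleanup(Context context) {
    con.close();   // hypothetical close call
  }
}
```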

Not able to understand writing custom writable

2013-08-09 Thread jamal sasha
Hi, I am trying to understand how to write my own Writable. So basically trying to understand how to process records spanning multiple lines. Can someone break down for me what needs to be considered in each method?? I am trying to understand this example: https://github
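A minimal custom Writable, for orientation (field names are made up); the essential contract is that write() and readFields() serialize the fields in the same order, and that a no-argument constructor exists:

```java
// Sketch of a custom value type implementing Writable.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class SumCountWritable implements Writable {
  private double sum;
  private long count;

  public SumCountWritable() { }                      // required no-arg constructor
  public SumCountWritable(double sum, long count) { this.sum = sum; this.count = count; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeDouble(sum);                            // serialization order...
    out.writeLong(count);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    sum = in.readDouble();                           // ...must match deserialization order
    count = in.readLong();
  }

  public double getSum() { return sum; }
  public long getCount() { return count; }
}
// A type used as a key would implement WritableComparable and add compareTo().
```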

Re: Passing arguments in hadoop

2013-08-06 Thread jamal sasha
Never mind guys. I had a typo when I was trying to set configuration param. Sorry. On Tue, Aug 6, 2013 at 4:46 PM, jamal sasha wrote: > Hi, > I am trying to pass a parameter to multiple mappers > > So, I do this in my driver > > conf.set("delimiter", ar

Passing arguments in hadoop

2013-08-06 Thread jamal sasha
Hi, I am trying to pass a parameter to multiple mappers. So, I do this in my driver: conf.set("delimiter", args[3]); In mapper1, I am retrieving this as: Configuration conf = context.getConfiguration(); String[] values = value.toString().split(conf.get("delimiter")); and same in my mapper2 B

Re: java.util.NoSuchElementException

2013-07-31 Thread jamal sasha
code to get these directly instead of iterating multiple times. > > Thanks > > Devaraj k > > *From:* jamal sasha [mailto:jamalsha...@gmail.com] > *Sent:* 31 July 2013 23:40 > *To:* user@hadoop.apache.org > *Subject:*

java.util.NoSuchElementException

2013-07-31 Thread jamal sasha
Hi, I am getting this error: 13/07/31 09:29:41 INFO mapred.JobClient: Task Id : attempt_201307102216_0270_m_02_2, Status : FAILED java.util.NoSuchElementException at java.util.StringTokenizer.nextToken(StringTokenizer.java:332) at java.util.StringTokenizer.nextElement(StringTokenizer.java:39
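A guess at the usual cause, since only the trace is shown: nextToken() was called on a line with fewer tokens than expected. Guarding with hasMoreTokens() (or checking the split length) lets the mapper skip malformed records instead of failing the task:

```java
// Sketch: defensively extract the second whitespace-separated field of a line.
import java.util.StringTokenizer;

public class TokenGuard {
  public static String secondField(String line) {
    StringTokenizer tok = new StringTokenizer(line);
    if (!tok.hasMoreTokens()) return null;   // empty line
    tok.nextToken();
    if (!tok.hasMoreTokens()) return null;   // malformed record: skip instead of throwing
    return tok.nextToken();
  }
}
```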

objects as key/values

2013-07-29 Thread jamal sasha
Ok. A very basic (stupid) question. I am trying to compute mean using hadoop. So my implementation is like this: public class Mean public static class Pair{ //simple class to create object } public class MeanMapper emit(text,pair) //where pair is (local sum, count) public class MeanRed

Error on running a hadoop job

2013-07-29 Thread jamal sasha
Hi, I am getting a weird error? 13/07/29 10:50:58 INFO mapred.JobClient: Task Id : attempt_201307102216_0145_r_16_0, Status : FAILED org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on /wordcount_raw/_temporary/_attempt_201307102216

Re: Inputformat

2013-06-22 Thread jamal sasha
Then how should I approach this issue? On Fri, Jun 21, 2013 at 4:25 PM, Niels Basjes wrote: > If you try to hammer in a nail (json file) with a screwdriver ( > XMLInputReader) then perhaps the reason it won't work may be that you are > using the wrong tool? > On Jun 21, 2013

Inputformat

2013-06-21 Thread jamal sasha
Hi, I am using one of the libraries which rely on InputFormat. Right now, it is reading xml files spanning across mutiple lines. So currently the input format is like: public class XMLInputReader extends FileInputFormat { public static final String START_TAG = ""; public static final Strin

Exposing hadoop web interface

2013-06-03 Thread jamal sasha
Hi, I have deployed hadoop into a small cluster.. Now the issue is while on job launch it does says Tracking URL: http://foobar:50030/jobdetails.jsp?jobid=job_201305241622_0047 but I cannot look at this url and look at the job status (maybe its the firewall?? proxy?? ) I can only look at my l

Re: Reading json format input

2013-05-30 Thread jamal sasha
Ok got this thing working.. Turns out that -libjars should be mentioned before specifying the hdfs input and output, rather than after them (generic options like -libjars, -files and -D are parsed by GenericOptionsParser and must precede the job-specific arguments). :-/ Thanks everyone. On Thu, May 30, 2013 at 1:35 PM, jamal sasha wrote: > Hi, > I did that but still same exception error. > I did:

Re: Reading json format input

2013-05-30 Thread jamal sasha
the -libjars parameter when you > kick off your M/R job. This way the jars will be copied to all TTs. > > Regards, > Shahab > > > On Thu, May 30, 2013 at 2:43 PM, jamal sasha wrote: > >> Hi Thanks guys. >> I figured out the issue. Hence i have another question. &g

Re: Reading json format input

2013-05-30 Thread jamal sasha
> > On Thu, May 30, 2013 at 8:42 AM, Rahul Bhattacharjee < > rahul.rec@gmail.com> wrote: > >> Whatever you have mentioned Jamal should work.you can debug this. >> >> Thanks, >> Rahul >> >> >> On Thu, May 30, 2013 at 5:14 AM, jamal sasha wro

Re: Reading json format input

2013-05-29 Thread jamal sasha
aracters. > > Thanks and Regards, > > Rishi Yadav > > On Wed, May 29, 2013 at 2:54 PM, jamal sasha wrote: > >> Hi, >>I am stuck again. :( >> My input data is in hdfs. I am again trying to do wordcount but there is >> slight difference. >> The

Re: Reading json format input

2013-05-29 Thread jamal sasha
group AS word, > COUNT_STAR(words) AS word_count; > STORE word_counts INTO '/tmp/word_counts.txt'; > > It will be faster than the Java you'll likely write. > > > On Wed, May 29, 2013 at 2:54 PM, jamal sasha wrote: > >> Hi, >>I am stuck again. :(

Reading json format input

2013-05-29 Thread jamal sasha
Hi, I am stuck again. :( My input data is in hdfs. I am again trying to do wordcount but there is slight difference. The data is in json format. So each line of data is: {"author":"foo", "text": "hello"} {"author":"foo123", "text": "hello world"} {"author":"foo234", "text": "hello this world"}
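A hedged sketch of a mapper for this input using json-simple (the library mentioned in the later libjars thread); class and counter names are made up:

```java
// Sketch: wordcount over the "text" field of one JSON object per line.
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class JsonWordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();
  private final JSONParser parser = new JSONParser();

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    try {
      JSONObject record = (JSONObject) parser.parse(value.toString());
      String text = (String) record.get("text");
      if (text == null) return;
      StringTokenizer tok = new StringTokenizer(text);
      while (tok.hasMoreTokens()) {
        word.set(tok.nextToken());
        context.write(word, ONE);
      }
    } catch (ParseException e) {
      context.getCounter("wordcount", "bad_json").increment(1);  // skip malformed lines
    }
  }
}
```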

Writing data in db instead of hdfs

2013-05-29 Thread jamal sasha
Hi, Is it possible to save data in a database (HBase, Cassandra??) directly from hadoop, so that there is no output in hdfs but it directly writes data into this db? If I want to modify the wordcount example to achieve this, what/where should I make these modifications? Any help/suggestions. Tha

Not saving any output

2013-05-28 Thread jamal sasha
Hi, I want to process some text files and then save the output in a db. I am using python (hadoop streaming). I am using mongo as backend server. Is it possible to run hadoop streaming jobs without specifying any output? What is the best way to deal with this.

Difference between combiner and aggregator

2013-04-05 Thread jamal sasha
Hi, I am trying to understand the difference between combiner and aggregator. Based on my readings, the wordcount example (mapper-side aggregator) looks like:
class Mapper
  method MAP
    H <-- associative array
    for all term t in document: H{t} = H{t} + 1
    for all term t in H do EMIT(term t, count H{t
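The pseudocode above is the in-mapper combining ("aggregator") pattern; a Java sketch of it, as opposed to a separate combiner class registered with job.setCombinerClass():

```java
// Sketch: aggregate counts inside the mapper and emit them once in cleanup().
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class InMapperCombiningMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private final Map<String, Integer> counts = new HashMap<String, Integer>();

  @Override
  public void map(LongWritable key, Text value, Context context) {
    for (String term : value.toString().split("\\s+")) {
      Integer c = counts.get(term);
      counts.put(term, c == null ? 1 : c + 1);     // aggregate inside the mapper
    }
  }

  @Override
  protected void cleanup(Context context) throws IOException, InterruptedException {
    // Emit once per distinct term, after all map() calls for this task.
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
      context.write(new Text(e.getKey()), new IntWritable(e.getValue()));
    }
  }
}
```

The difference in a sentence: a combiner is a reducer-like class that the framework may run zero or more times on map output, while in-mapper aggregation is done by your own code and always runs, but holds its partial counts in task memory.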

Re: Finding mean and median python streaming

2013-04-04 Thread jamal sasha
uce the ultimate result. > > BR > Yanbo > > > 2013/4/2 jamal sasha > >> pinging again. >> Let me rephrase the question. >> If my data is like: >> id, value >> >> And I want to find average "value" for each id, how can i do that using >

Basic hadoop MR question

2013-04-02 Thread jamal sasha
Hi, I have a quick question. I am trying to write MR code using python. In the word count example: http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/ The reducer.. Why can't I declare, in the reducer, a dictionary (hashmap) whose key is word and value is a list o

Re: Finding mean and median python streaming

2013-04-01 Thread jamal sasha
t right. I would really appreciate if someone can help me with this query. THanks On Mon, Apr 1, 2013 at 2:27 PM, jamal sasha wrote: > data_dict is declared globably as > data_dict = defaultdict(list) > > > On Mon, Apr 1, 2013 at 2:25 PM, jamal sasha wrote: > >> Very

Re: Finding mean and median python streaming

2013-04-01 Thread jamal sasha
data_dict is declared globably as data_dict = defaultdict(list) On Mon, Apr 1, 2013 at 2:25 PM, jamal sasha wrote: > Very dumb question.. > I have data as following > id1, value > 1, 20.2 > 1,20.4 > > > I want to find the mean and median of id1? > I am us

Finding mean and median python streaming

2013-04-01 Thread jamal sasha
Very dumb question.. I have data as following id1, value 1, 20.2 1,20.4 I want to find the mean and median of id1? I am using python hadoop streaming.. mapper.py for line in sys.stdin: try: # remove leading and trailing whitespace line = line.rstrip(os.linesep) tokens = line.split(",") print

Re: Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
oops never mind guys.. figured out the issue. sorry for spamming. On Thu, Mar 28, 2013 at 5:15 PM, jamal sasha wrote: > Very much like this: > > http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file > > > On Thu,

Re: Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
Very much like this: http://stackoverflow.com/questions/13445126/python-code-is-valid-but-hadoop-streaming-produces-part-0-empty-file On Thu, Mar 28, 2013 at 5:10 PM, jamal sasha wrote: > Hi, >I am facing a weird problem. > My python scripts were working just fine. >

Hadoop streaming weird problem

2013-03-28 Thread jamal sasha
Hi, I am facing a weird problem. My python scripts were working just fine. I made a few modifications.. tested via: cat input.txt | python mapper.py | sort | python reducer.py, which runs just fine. Ran on my local machine (pseudo-distributed mode); that also runs just fine. Deployed on the clusters.. Now,

Re: copy chunk of hadoop output

2013-03-01 Thread jamal sasha
Though it copies, it gives this error? On Fri, Mar 1, 2013 at 3:21 PM, jamal sasha wrote: > When I try this.. I get an error > cat: Unable to write to output stream. > > Is this a permissions issue? > How do I resolve this? > Thanks > > > On Wed, Feb 20, 2013

Re: copy chunk of hadoop output

2013-03-01 Thread jamal sasha
> >> srvID: DS-1092147940-192.168.2.1-50010-1349279636946, blockid: > >> BP-1461691939-192.168.2.1-1349279623549:blk_2568668834545125596_73870, > >> duration: 19207000 > >> > >> I don't see how this is anymore dangerous than doing a > >>

Re: copy chunk of hadoop output

2013-02-19 Thread jamal sasha
Awesome thanks :) On Tue, Feb 19, 2013 at 2:14 PM, Harsh J wrote: > You can instead use 'fs -cat' and the 'head' coreutil, as one example: > > hadoop fs -cat 100-byte-dfs-file | head -c 5 > 5-byte-local-file > > On Wed, Feb 20, 2013 at 3:38 AM, jam

copy chunk of hadoop output

2013-02-19 Thread jamal sasha
Hi, I was wondering, in the following command: bin/hadoop dfs -copyToLocal hdfspath localpath can we specify to copy not the full file but, say, only x MB of it to the local drive? Is something like this possible? Thanks Jamal

executing hadoop commands from python?

2013-02-16 Thread jamal sasha
Hi, This might be more of a python centric question but was wondering if anyone has tried it out... I am trying to run few hadoop commands from python program... For example if from command line, you do: bin/hadoop dfs -ls /hdfs/query/path it returns all the files in the hdfs query pat

mappers-node relationship

2013-01-24 Thread jamal sasha
Hi. A very very lame question. Does the number of mappers depend on the number of nodes I have? How I imagine map-reduce is this: for example, in the word count example I have a bunch of slave nodes. The documents are distributed across these slave nodes. Now depending on how big the data is, it will sprea

Re: passing arguments to hadoop job

2013-01-21 Thread jamal sasha
orked fine. Can you show the code of your driver program (i.e. where you have main) ? > > Thanks > hemanth > > > > On Tue, Jan 22, 2013 at 5:22 AM, jamal sasha wrote: >> >> Hi, >> Lets say I have the standard helloworld program >> http://hadoop.apache.o

Re: passing arguments to hadoop job

2013-01-21 Thread jamal sasha
in reducer by > adding a logger or syso. > > Also consider removing static in declaration of baseSum as it would add > counts of previous keys. > On Jan 22, 2013 7:17 AM, "jamal sasha" wrote: > >> The second one. >> If the word hello appears once, its count is

Re: passing arguments to hadoop job

2013-01-21 Thread jamal sasha
HELLO appears once it's count is 201. > > Please clarify > On Jan 22, 2013 5:22 AM, "jamal sasha" wrote: > >> Hi, >> Lets say I have the standard helloworld program >> >> http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html#Example%3A

Re: Program trying to read from local instead of hdfs

2013-01-17 Thread jamal sasha
rm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Fri, Jan 18, 2013 at 5:48 AM, jamal sasha wrote: > >> >> On Thu, Jan 17, 2013 at 4:14 PM, Mohammad Tariq wrote: >> >>> hdfs://your_namenode:9000/user/hduser/data/input1.t

Re: Program trying to read from local instead of hdfs

2013-01-17 Thread jamal sasha
On Thu, Jan 17, 2013 at 4:14 PM, Mohammad Tariq wrote: > hdfs://your_namenode:9000/user/hduser/data/input1.txt > It runs :D But I am very curious. If I run the sample wordcount example normally.. it automatically reads from the hdfs location.. but here.. it didn't seem to respect that?

Re: Program trying to read from local instead of hdfs

2013-01-17 Thread jamal sasha
uration.addResource(new Path("PATH_TO_YOUR_hdfs-site.xml")); > > Warm Regards, > Tariq > https://mtariq.jux.com/ > cloudfront.blogspot.com > > > On Fri, Jan 18, 2013 at 5:26 AM, jamal sasha wrote: > >> Hi, >> I am not sure what I am doing wrong.

Program trying to read from local instead of hdfs

2013-01-17 Thread jamal sasha
Hi, I am not sure what I am doing wrong. I copy my input files from local to hdfs at local /user/hduser/data/input1.txt /user/hduser/data/input2.txt In my driver code: I have MultipleInputs.addInputPath(conf, new Path(args[0]), TextInputFormat.class, UserFileMapper.class); Multip
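A sketch of the usual fix suggested in the replies: if the cluster's *-site.xml files are not on the client classpath, add them as resources (or set the default filesystem explicitly); paths and the namenode address are placeholders:

```java
// Sketch: make a client-side Configuration aware of HDFS.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class HdfsAwareConf {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));   // placeholder path
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));
    // Or, equivalently for Hadoop 1.x:
    // conf.set("fs.default.name", "hdfs://your_namenode:9000");
    return conf;
  }
}
```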

Re: modifying existing wordcount example

2013-01-16 Thread jamal sasha
tple Inputs here and process the new input file into > 'word 1' and the previous output file as 'word $count' in the mapper and do > its aggregation in the reducer. > Regards > Bejoy KS > > Sent from remote device, Please excuse typos > -

Re: modifying existing wordcount example

2013-01-16 Thread jamal sasha
; > On Wed, Jan 16, 2013 at 9:07 PM, jamal sasha wrote: > >> Hi, >> In the wordcount example: >> http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html >> Lets say I run the above example and save the the output. >> But lets say that I have now a new inp

modifying existing wordcount example

2013-01-16 Thread jamal sasha
Hi, In the wordcount example: http://hadoop.apache.org/docs/r0.17.0/mapred_tutorial.html Lets say I run the above example and save the the output. But lets say that I have now a new input file. What I want to do is.. basically again do the wordcount but basically modifying the previous counts. Fo

tcp error

2013-01-16 Thread jamal sasha
I am inside a network where I need proxy settings to access the internet. I have a weird problem. The internet is working fine. But there is one particular instance when I get this error: Network Error (tcp_error) A communication error occurred: "Operation timed out" The Web Server may

Re: newbie question

2013-01-15 Thread jamal sasha
Hi, The relevant code snippet posted here http://pastebin.com/DRPXUm62 On Tue, Jan 15, 2013 at 5:31 PM, jamal sasha wrote: > My bad. Sorry I fixed it. It is BuildGraph.class > > > On Tue, Jan 15, 2013 at 5:30 PM, Serge Blazhiyevskyy < > serge.blazhiyevs...@nice.com> wrote

Re: newbie question

2013-01-15 Thread jamal sasha
SERGE BLAZHIYEVSKY > Architect > (T) +1 (650) 226-0511 > (M) +1 (408) 772-2615 > se...@nice.com<mailto:se...@nice.com> > www.nice.com<http://www.nice.com> > > > On Jan 15, 2013, at 5:24 PM, jamal sasha jamalsha...@gmail.com>> wrote: > > I have a mappe

newbie question

2013-01-15 Thread jamal sasha
I have a mapper: public class BuildGraph{ public void config(JobConf job){ *<== this block doesn't seem to be executing at all :(* super.configure(job); this.currentId = job.getInt("currentId",0); if (this.currentId!=0){ // I call a method from a different c
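One thing worth checking, as a guess from the snippet above: with the old mapred API the framework only calls a method named exactly configure(JobConf), so a method named config() is never invoked. A minimal sketch of the expected shape:

```java
// Sketch of an old-API mapper whose configure() runs once per task, before map().
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class BuildGraphMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {
  private int currentId;

  @Override
  public void configure(JobConf job) {   // must be spelled exactly "configure"
    this.currentId = job.getInt("currentId", 0);
  }

  @Override
  public void map(LongWritable key, Text value, OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    out.collect(new Text(String.valueOf(currentId)), value);   // illustrative only
  }
}
```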

probably very stupid question

2013-01-14 Thread jamal sasha
Hi, Probably a very lame question. I have two documents and I want to find the overlap of both documents in map reduce fashion and then compare the overlap (lets say I have some measure to do that) SO this is what I am thinking: 1) Run the normal wordcount job on one document ( https://site

Re: Binary Search in map reduce

2013-01-08 Thread jamal sasha
king for but that wont be > map reduce. > > Thanks, > > Abhishek > > *From:* jamal sasha [mailto:jamalsha...@gmail.com] > *Sent:* Monday, January 07, 2013 3:21 PM > *To:* user@hadoop.apache.org > *Subject:* Binary Search in ma

Re: Binary Search in map reduce

2013-01-07 Thread jamal sasha
mory > and you must perform a non-trivial graph traversal for each change record, > you have something much harder to do. > > FYI top google results for joins in Hadoop here: > https://www.google.com/search?q=joins+in+hadoop&aq=f&oq=joins+in+hadoop&aqs=c

Re: Binary Search in map reduce

2013-01-07 Thread jamal sasha
better. > > john > > *From:* jamal sasha [mailto:jamalsha...@gmail.com] > *Sent:* Monday, January 07, 2013 4:21 PM > *To:* user@hadoop.apache.org > *Subject:* Binary Search in map reduce > > Hi, > >

Re: setting hadoop for pseudo distributed mode.

2012-12-27 Thread jamal sasha
erly then there is no problem > with the third party libraries which you are using. It looks like to me > that your code doesn't have the proper info about the intermediate path. > Please make sure you have told your code the exact location of intermediate > output. > > > B

setting hadoop for pseudo distributed mode.

2012-12-27 Thread jamal sasha
Hi, So I am still in process of learning hadoop. I tried to run wordcount.java (by writing my own mapper reducer.. creating jar and then running it in a pseudo distributed mode). At that time I got an error, something like ERROR security.UserGroupInformation: PriviledgedActionException as:mhdus

good way to debug map reduce code

2012-12-25 Thread jamal sasha
Hi, I have been using python hadoop streaming framework to write the code and now I am slowly moving towards the core java api's. And I am getting comfortable with it but what is the quickest way to debug the map reduce native code.. like in hadoop streaming this worked great. % cat input.txt | p
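A hedged sketch of one quick option for the Java API: force the local job runner and local filesystem (Hadoop 1.x property names), so the whole job runs in one JVM and can be stepped through in a debugger; MRUnit, from the earlier thread, is the other common choice:

```java
// Sketch: a Configuration that makes the job run entirely in-process for debugging.
import org.apache.hadoop.conf.Configuration;

public class LocalDebugConf {
  public static Configuration build() {
    Configuration conf = new Configuration();
    conf.set("mapred.job.tracker", "local");   // run map and reduce in a single JVM
    conf.set("fs.default.name", "file:///");   // read input from the local filesystem
    return conf;
  }
}
```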

Re: can local disk of reduce task cause the job to fail?

2012-12-09 Thread jamal sasha
I am new to hadoop but I think the data from the completed map tasks is transferred (copied, shuffled and sorted) to the reducer nodes even though some of the mappers are still running, but the reduce code execution starts only when all the mapper phases have finished. That's why you see some

advice

2012-11-27 Thread jamal sasha
Hi, Lately, I have been writing alot of algorithms in map reduce abstraction in python (hadoop streaming). I have got a hang of it (I think)... I have couple of questions: 1) By not using java libraries, what power of hadoop am I missing? 2) I know that this is just the tip of the iceberg, can so

Re: fundamental doubt

2012-11-21 Thread jamal sasha
the reducer takes the input as a key and a > collection of values only. The reduce method signature defines it. > > Regards > Bejoy KS > > Sent from handheld, please excuse typos. > ------ > *From: * jamal sasha > *Date: *Wed, 21 Nov 2012 14:50:51 -0

fundamental doubt

2012-11-21 Thread jamal sasha
Hi.. I guess I am asking a lot of fundamental questions but I thank you guys for taking the time to explain my doubts. So I am able to write map reduce jobs but here is my doubt. As of now I am writing mappers which emit a key and a value. This key value is then captured at the reducer end and then I proc
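As the reply above says, the framework groups map output by key before reduce() is called, so the reducer receives one key plus an Iterable of all its values; a minimal sketch of that signature:

```java
// Sketch: the reduce method receives every value emitted for a given key, already grouped.
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    context.write(key, new IntWritable(sum));
  }
}
```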

Re: guessing number of reducers.

2012-11-21 Thread jamal sasha
then you need lesser volume of data per reducer for better performance results. > > In general it is better to have the number of reduce tasks slightly less than the number of available reduce slots in the cluster. > > Regards > Bejoy KS > > Sent from handheld, please excuse typos. > > > > From: jamal sasha > > Date: Wed, 21 Nov 2012 11:38:38 -0500 > > To: user@hadoop.apache.org

guessing number of reducers.

2012-11-21 Thread jamal sasha
By default the number of reducers is set to 1.. Is there a good way to guess the optimal number of reducers? Or let's say I have TBs worth of data... mappers are of the order of 5000 or so... But ultimately I am calculating, let's say, some average of the whole data... say the average transaction occurring... Now
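For reference, a sketch encoding the rule of thumb from the reply above (the 0.95 factor and the slot-count argument are illustrative, not from the thread); for a single global average the count can stay at 1 provided a combiner or in-mapper aggregation shrinks the map output first:

```java
// Sketch: the reducer count is just a job setting.
import org.apache.hadoop.mapreduce.Job;

public class ReducerCount {
  public static void configure(Job job, int clusterReduceSlots) {
    // Rule of thumb from the reply: slightly fewer reducers than available reduce slots.
    job.setNumReduceTasks(Math.max(1, (int) (clusterReduceSlots * 0.95)));
  }
}
```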

Re: reducer not starting

2012-11-21 Thread jamal sasha
re being reattempted by the framework (default >>> behavior, attempts 4 times to avoid transient failure scenario). >>> >>> Visit your job's logs in the JobTracker web UI, to find more >>> information on why your tasks fail. >>> >&g

Re: number of reducers

2012-11-20 Thread jamal sasha
; Bejoy KS > > Sent from handheld, please excuse typos. > ____ > From: jamal sasha > Date: Tue, 20 Nov 2012 14:38:54 -0500 > To: > ReplyTo: user@hadoop.apache.org > Subject: number of reducers > > > Hi, > > I wrote a simple map reduc

number of reducers

2012-11-20 Thread jamal sasha
Hi, I wrote a simple map reduce job in hadoop streaming. I am wondering if I am doing something wrong.. While the number of mappers is projected to be around 1700.. reducers.. just 1? It’s a couple of TB’s worth of data. What can I do to address this? Basically the mapper looks like this: For l

Re: debugging hadoop streaming programs (first code)

2012-11-20 Thread jamal sasha
your Python code (Map unit & reduce unit) locally on your input data and > see whether your logic has any issues. > > Best, > Mahesh Balija, > Calsoft Labs. > > > On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha wrote: > >> >> >> >> Hi, >>

Fwd: debugging hadoop streaming programs (first code)

2012-11-19 Thread jamal sasha
Hi, This is my first attempt to learn the map reduce abstraction. My problem is as follows. I have a text file like this:
id1, id2, date, time, mrps, code, code2
3710100022400, 1350219887, 2011-09-10, 12:39:38.000, 99.00, 1, 0
3710100022400, 5045462785, 2011-09-06, 13:23:00.000, 70.63, 1, 0
Now w