Thanks!! Is there a place where I can see whether a task was rescheduled?

On Sat, Mar 24, 2012 at 6:28 PM, Prashant Kommireddi <[email protected]> wrote:
> Read about it here: http://developer.yahoo.com/hadoop/tutorial/module4.html
>
> A task can get rescheduled and run in parallel with the original attempt.
> This happens when Hadoop "thinks" the task is running slowly relative to the
> other tasks in the job (speculative execution). The point is to use free
> slots in the cluster to rerun tasks that (Hadoop thinks) have slowed down
> because of a problem on a particular node (slow disk, bad memory, ...). If
> both attempts write to the DB, the records that task handles can end up
> inserted twice.
>
> In your case, my guess is that one of the parts is larger than the others
> and the corresponding task is being rescheduled. It's a guess and I might be
> wrong, but it's worth trying.
>
> Depending on which phase writes to the DB, you can set
> "mapred.map.tasks.speculative.execution" or
> "mapred.reduce.tasks.speculative.execution" to false.
>
> Thanks,
> Prashant
>
> On Sat, Mar 24, 2012 at 6:00 PM, Mohit Anchlia <[email protected]> wrote:
>
> > No, I don't have it turned off. Can you please explain what might be
> > happening because of that? And how can I debug whether that is indeed the
> > problem?
> >
> > On Sat, Mar 24, 2012 at 5:30 PM, Prashant Kommireddi <[email protected]> wrote:
> >
> > > Do you have speculative execution turned off?
> > >
> > > On Sat, Mar 24, 2012 at 5:25 PM, Mohit Anchlia <[email protected]> wrote:
> > >
> > > > I don't have my script handy, but all I am doing is something like:
> > > >
> > > > A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);
> > > > STORE A INTO '{Table}' USING
> > > >     com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
> > > >
> > > > When I run it as
> > > >
> > > > pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"
> > > >
> > > > it creates 2 rows for every record, but if I run it individually on
> > > > each file, giving the actual file names, there aren't any duplicates.
> > > >
> > > > On Sat, Mar 24, 2012 at 1:36 PM, Bill Graham <[email protected]> wrote:
> > > >
> > > > > Can you provide the script you're running? That will help people
> > > > > better understand what you're doing.
> > > > >
> > > > > On Saturday, March 24, 2012, Mohit Anchlia <[email protected]> wrote:
> > > > > > Could someone please help me understand this, or give me some pointers?
> > > > > >
> > > > > > On Fri, Mar 23, 2012 at 4:57 PM, Mohit Anchlia <[email protected]> wrote:
> > > > > >
> > > > > > > I am running a script to load data into the database. When I use
> > > > > > > [0-4], I see 2 rows being created for every record that I process.
> > > > > > > But when I run the files individually, it works. Could someone
> > > > > > > please help me understand or troubleshoot this behaviour?
> > > > > > >
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"  # creates 2 rows
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-00000"      # works
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-00001"      # works
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-00002"      # works
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-00003"      # works
> > > > > > > pig -f script6.pig -p in="/examples/2/part-m-00004"      # works
> > > > >
> > > > > --
> > > > > Note that I'm no longer using my Yahoo! email address. Please email
> > > > > me at [email protected] going forward.
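A minimal sketch of Prashant's suggestion applied to the script from this thread. It assumes the Vertica write happens in map tasks: the script has no GROUP, JOIN, or other reduce-side operator, so Pig should compile it into a map-only job, which makes the map-side property the one that matters (the reduce-side property is set too, which should be harmless). `SET` passes the properties into the job configuration; `'{Table}'` and the `VerticaStorer` arguments are copied unchanged from the thread and are placeholders, not verified connection details.

```pig
-- Disable speculative execution so Hadoop does not launch a second, parallel
-- attempt of a task that writes to the database (which could duplicate rows).
SET mapred.map.tasks.speculative.execution 'false';
SET mapred.reduce.tasks.speculative.execution 'false';

-- Same load/store as in the thread; $in is substituted from "-p in=...".
A = LOAD '$in' USING PigStorage('\t') AS (col:chararray, col2:chararray);

-- Placeholder table name and connection arguments, copied from the thread.
STORE A INTO '{Table}' USING
    com.vertica.pig.VerticaStorer('localhost', 'verticadb502', '5935', 'user');
```

It can be run exactly as before, e.g. `pig -f script6.pig -p in="/examples/2/part-m-0000[0-4]"`. If the duplicate rows disappear, speculative execution was the likely cause.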
