What is your record source? Files, or Hive, or something else?
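For instance, a file-backed source and a Hive-backed one would typically be loaded along these lines (the path, table name, and schema here are invented for illustration, not taken from your script):

-- hypothetical file-based load: schema declared by hand
final = LOAD '/user/joel/tweets.tsv' USING PigStorage('\t')
        AS (text:chararray, language:chararray, screen_name:chararray);

-- hypothetical Hive-backed load via HCatalog: schema comes from the metastore
final = LOAD 'tweets' USING org.apache.hive.hcatalog.pig.HCatLoader();

Knowing which of these you use helps narrow down where records could be getting dropped on the way in.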
On Nov 17, 2015 6:29 PM, "Sam Joe" <games2013....@gmail.com> wrote:

> Hi Arvind,
>
> You are right. It works fine in local mode. No records are eliminated.
>
> I now need to find out why some records are getting eliminated in
> mapreduce mode.
>
> Any suggestions on troubleshooting steps for finding the root cause in
> mapreduce mode? Which logs should be checked, etc.?
>
> Appreciate any help!
>
> Thanks,
> Joel
>
> On Mon, Nov 16, 2015 at 11:32 PM, Arvind S <arvind18...@gmail.com> wrote:
>
> > Tested on Pig 0.15 using your data and in local mode .. could not
> > reproduce the issue ..
> > ==================================================
> > final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> >
> > (Ian_hoch,{(en,Ian_hoch)})
> > (gwenshap,{(en,gwenshap)})
> > (p2people,{(en,p2people)})
> > (DoThisBest,{(en,DoThisBest)})
> > (wesleyyuhn1,{(en,wesleyyuhn1)})
> > (GuitartJosep,{(en,GuitartJosep)})
> > (Komalmittal91,{(en,Komalmittal91)})
> > (LornaGreenNWC,{(en,LornaGreenNWC)})
> > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ)})
> > (innovatesocialm,{(en,innovatesocialm)})
> > ==================================================
> > final_by_lsn_g = GROUP final_by_lsn BY language;
> >
> > (en,{(en,DoThisBest),(en,wesleyyuhn1),(en,W4_Jobs_in_ARZ),(en,p2people),(en,Ian_hoch),(en,Komalmittal91),(en,innovatesocialm),(en,gwenshap),(en,GuitartJosep),(en,LornaGreenNWC)})
> > ==================================================
> >
> > Suggestions ..
> > > try in local mode to reproduce the issue .. (if you have not already done so)
> > > close all old sessions and open a new one ... (I know it's dumb .. but it has helped me sometimes)
> >
> > *Cheers !!*
> > Arvind
> >
> > On Tue, Nov 17, 2015 at 8:09 AM, Sam Joe <games2013....@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I reproduced the issue with fewer columns as well.
> > >
> > > dump final_by_lsn;
> > >
> > > (en,LornaGreenNWC)
> > > (en,GuitartJosep)
> > > (en,gwenshap)
> > > (en,innovatesocialm)
> > > (en,Komalmittal91)
> > > (en,Ian_hoch)
> > > (en,p2people)
> > > (en,W4_Jobs_in_ARZ)
> > > (en,wesleyyuhn1)
> > > (en,DoThisBest)
> > >
> > > grunt> final_by_lsn_g = GROUP final_by_lsn BY screen_name;
> > >
> > > grunt> dump final_by_lsn_g;
> > >
> > > (gwenshap,{(en,gwenshap)})
> > > (p2people,{(en,p2people),(en,p2people),(en,p2people)})
> > > (GuitartJosep,{(en,GuitartJosep),(en,GuitartJosep),(en,GuitartJosep)})
> > > (W4_Jobs_in_ARZ,{(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ),(en,W4_Jobs_in_ARZ)})
> > >
> > > Steps I tried to find the root cause:
> > > - Removing special characters from the data
> > > - Setting the log level to 'Debug'
> > > However, I couldn't find a clue about the problem.
> > >
> > > Can someone please help me troubleshoot the issue?
> > >
> > > Thanks,
> > > Joel
> > >
> > > On Fri, Nov 13, 2015 at 12:18 PM, Steve Terrell <sterr...@oculus360.us> wrote:
> > >
> > > > Please try reproducing the problem with the smallest amount of data
> > > > possible. Use as few rows and the smallest strings possible that
> > > > still demonstrate the discrepancy, and then repost your problem. In
> > > > doing so, you will make your request easier for the readers of the
> > > > group to digest, and you might even discover a problem in your
> > > > original data if you cannot reproduce it on a smaller scale.
> > > >
> > > > Thanks,
> > > > Steve
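A self-contained repro in that spirit can be tiny. A minimal sketch, assuming a made-up file tiny.tsv holding three tab-separated rows (en/p2people, en/p2people, en/gwenshap):

tiny = LOAD 'tiny.tsv' USING PigStorage('\t')
       AS (language:chararray, screen_name:chararray);
tiny_g = GROUP tiny BY screen_name;
DUMP tiny_g;

-- expected in both local and mapreduce mode: one tuple per distinct key,
-- with every input row landing in exactly one bag:
-- (gwenshap,{(en,gwenshap)})
-- (p2people,{(en,p2people),(en,p2people)})

If local and mapreduce mode disagree even on input this small, that points at the environment rather than the data.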
> > > > On Fri, Nov 13, 2015 at 10:28 AM, Sam Joe <games2013....@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > I am trying to group a table (final) containing 10 records by the
> > > > > column screen_name, using the following command:
> > > > >
> > > > > final_by_sn = GROUP final BY screen_name;
> > > > >
> > > > > When I dump the final_by_sn table, only 4 records are returned, as
> > > > > shown below:
> > > > >
> > > > > grunt> dump final_by_sn;
> > > > >
> > > > > (gwenshap,{(.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,2943)})
> > > > > (p2people,{(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437),(6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,2437)})
> > > > > (GuitartJosep,{(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140),(#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,140)})
> > > > > (W4_Jobs_in_ARZ,{(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433),(Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,433)})
> > > > >
> > > > > dump final;
> > > > >
> > > > > (RT @lordlancaster: Absolutely blown away by @SciTecDaresbury! 'Proper' Big Data, Smart Cities, Internet of Things & more! #TechNorth http:/…,en,LornaGreenNWC,8,166,188,Mon May 12 10:19:39 +0000 2014,654395184428515332)
> > > > > (#BigData: What it can and can't do! http://t.co/LrO4NBZE4J,en,GuitartJosep,,61,218,Thu Jun 18 10:20:02 +0000 2015,654395189595869184)
> > > > > (.@bigdata used this photo in his blog post and made me realize how much I miss Japan: https://t.co/XdglxbLBhN,en,gwenshap,,4992,1887,Mon Oct 15 20:49:39 +0000 2007,654395195581009920)
> > > > > ("Global Release [Big Data Book] Profit From Science" on @LinkedIn http://t.co/WnJ2HwthYF Congrats to George Danner!,en,innovatesocialm,,1517,1712,Wed Sep 12 13:46:43 +0000 2012,654395207065034752)
> > > > > (Hi, BesPardon Don't Forget to follow -->> http://t.co/Dahu964w5U Thanks.. http://t.co/9kKXJ0GQcT,en,Komalmittal91,,51,0,Thu Feb 12 16:44:50 +0000 2015,654395216208752641)
> > > > > (On Google Books, language, and the possible limits of big data https://t.co/OEebZSK952,en,Ian_hoch,,63,107,Fri Aug 31 16:25:09 +0000 2012,654395216057659392)
> > > > > (6 new @p2pLanguages jobs w/ #BigData #Hadoop skills http://t.co/UBAni5DPrw http://t.co/IhKNWMc5fy,en,p2people,,1899,1916,Wed Mar 04 06:17:09 +0000 2009,654395220373729280)
> > > > > (Big #Data #Lead Phoenix AZ (#job) wanted in #Arizona. #TechFetch http://t.co/v82R4WmWMC,en,W4_Jobs_in_ARZ,,7,9,Fri Aug 29 09:32:31 +0000 2014,654395236718911488)
> > > > > (#Appboy expands suite of #mobile #analytics @venturebeat @wesleyyuhn1 http://t.co/85P6vEJg08 #MarTech #automation http://t.co/rWqzNNt1vW,en,wesleyyuhn1,,1531,1927,Mon Jul 21 12:35:12 +0000 2014,654395243975065600)
> > > > > (Best Cloud Hosting and CDN services for Web Developers http://t.co/9uf6IaUIlM #cdn #cloudcomputing #cloudhosting #webmasters #websites,en,DoThisBest,,816,1092,Mon Nov 26 18:34:20 +0000 2012,654395246025904128)
> > > > > grunt>
> > > > >
> > > > > Could you please help me understand why 6 records are eliminated
> > > > > while doing a GROUP BY?
> > > > >
> > > > > Thanks,
> > > > > Joel
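For reference, GROUP by itself never discards rows: grouping 10 input rows should produce one output tuple per distinct screen_name, with all 10 rows distributed across the bags. A quick sanity check for lost rows, reusing the aliases above (group_sizes, all_rows, and total are made-up names):

group_sizes = FOREACH final_by_sn GENERATE group, COUNT(final) AS n;
all_rows = GROUP group_sizes ALL;
total = FOREACH all_rows GENERATE SUM(group_sizes.n);
DUMP total;
-- should print (10) if no rows were lost during the GROUP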