I guess that if you use newlines as the row separator, then Pig will split the input on ALL the newlines, including the ones inside a column. I don't think it can distinguish between them, so you end up with too many rows. I think this type of input should be considered corrupted. If you need the newlines inside the rows themselves, I suggest using a different separator for the rows instead of the newline.

Thanks
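
To illustrate the point, here is a minimal sketch in plain Python (not Pig) using the two sample rows from the quoted message. The helper `split_rows` and the choice of the ASCII record separator (0x1E) as the alternative row separator are my own assumptions for the example. How you would configure Pig's loader to use such a separator is not shown here.

```python
# Sample data from the thread: two logical rows, each with newlines
# embedded inside the column value.
rows = ["Hello I \n am \n here", "Hello\n I am here"]


def split_rows(raw, sep):
    """Split raw text into rows on sep, dropping the empty piece left
    after the final separator. (Hypothetical helper for illustration.)"""
    parts = raw.split(sep)
    if parts and parts[-1] == "":
        parts.pop()
    return parts


# Case 1: newline is the row separator. The embedded newlines cannot be
# told apart from the real row endings, so 2 rows become 5.
newline_file = "".join(r + "\n" for r in rows)
broken = split_rows(newline_file, "\n")

# Case 2: an assumed alternative row separator (ASCII 0x1E). The rows
# stay intact, and only then can the embedded newlines be replaced.
RS = "\x1e"
rs_file = "".join(r + RS for r in rows)
intact = split_rows(rs_file, RS)
cleaned = [r.replace("\n", " ") for r in intact]

print(len(broken))  # 5 rows: the input is already split incorrectly
print(len(intact))  # 2 rows: the original row boundaries survive
```

In this sketch, replacing the newlines only works in the second case, because in the first case the rows have already been split before any replacement could run.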

On Wed, Jun 26, 2013 at 8:27 AM, Mohit Anchlia <[email protected]> wrote:

> We use newline as row seprater, however we are getting some newlines in a
> column. So data looks like this
>
> Hello I \n am \n here
> Hello\n I am here
>
> Those are 2 lines however it gets broken down as 5 lines because of \n in
> between and the real line ends. I tried to use foreach generate
> REPLACE('\n',''); . Is that the right thing to do? Does it replace all \n
> or only the first one?
>
> On Tue, Jun 25, 2013 at 3:13 AM, Ruslan Al-Fakikh <[email protected]> wrote:
>
> > Hi Mohit,
> >
> > I don't clearly understand your use case. It depends on how you read the
> > input, how you use the newlines... As the row separator, or just inside a
> > row as a normal character.
> > Can you put a simple example of input and output that you need?
> >
> > Thanks
> >
> >
> > On Mon, Jun 24, 2013 at 10:18 PM, Mohit Anchlia <[email protected]> wrote:
> >
> > > Is there a way to remove line feeds from a bag in foreach?
> > >
> > > We today just do:
> > >
> > >
> > > page = foreach B generate p;
> > >
> > >
> > > Is there a way to remove line from above foreach? I see you can do
> > > DISTINCT, SUM but can I also replace newline with a space?
