Hmm, thanks for the reply.  Anyone have a Pig way of doing this?  I'd rather 
not write a UDF to look for comment lines, but I can do so if I have to.  This 
seems like something PigStorage or the like should handle.
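
For reference, the grep -v '^#' filter suggested earlier in the thread can be tried on a small local copy before wiring it into anything larger. The following is only a sketch: the strip_comments helper name, the temporary file, and the tab delimiter are assumptions for illustration, and the sample rows are modeled on the data in the original question.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): drop every line whose
# first character is '#', and pass all other lines through unchanged.
strip_comments() {
    grep -v '^#'
}

# Sample input modeled on the data in the original question.
# The tab delimiter is an assumption; the filter does not depend on it.
sample=$(mktemp)
printf '# Data Source: Project A\n# Contact MMoore with Questions\n# SenderId\tRecipientId\n1\t2\n3\t5\n6\t7\n#2\t1\n3\t6\n11\t7\n' > "$sample"

# Print only the data lines; the commented-out "#2 1" row is dropped too.
strip_comments < "$sample"
```

On a cluster, the same filter could presumably sit between "hadoop fs -cat" and "hadoop fs -put -" (or run as a Hadoop Streaming mapper) so that PigStorage only ever sees data lines. The paths would be your own, and none of that is tested here.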
______________________________________
Michael Moore :: [email protected]
The Johns Hopkins University Applied Physics Laboratory
JHUAPL/AISD/VES analytics section
240-228-6768 phone
202-370-7993 mobile

0B7B17EE1AE2A80B pgp
BC31 A861 9726 8211 F79F 7E21 0B7B 17EE 1AE2 A80B pgp fingerprint
 

On Jun 7, 2011, at 3:17 PM, <[email protected]> wrote:

> I do that kind of streaming on HDFS files using Hadoop Streaming, outside of 
> Pig. I assume you could do it from inside Pig too, but I haven't tested that.
> 
> 
> 
> William F Dowling
> 
> Sr Technical Specialist, Software Engineering
> 
> Thomson Reuters
> 
> 0 +1 215 823 3853
> 
> 
> 
> From: Moore, Michael A. [mailto:[email protected]] 
> Sent: Tuesday, June 07, 2011 3:14 PM
> To: [email protected]
> Subject: Re: Loading Files with Comment Lines
> 
> 
> 
> Possibly.  Can I do that if the file is already in HDFS?
> 
> 
> On Jun 7, 2011, at 3:12 PM, <[email protected]> wrote:
> 
> 
> 
> 
> 
> Can you stream it through grep -v '^#'?
> 
> 
> 
> 
> From: Moore, Michael A. [mailto:[email protected]] 
> Sent: Tuesday, June 07, 2011 3:04 PM
> To: [email protected]
> Subject: Loading Files with Comment Lines
> 
> 
> 
> Hello all-
> 
> 
> 
> I've got a quick question and Google isn't proving to be much help.
> 
> 
> 
> I've got a big file that has a few lines prefixed with a pound sign (#) to 
> indicate that they should be ignored.  I would like to LOAD this file using 
> PigStorage.  Is there a way to skip those lines, or is that handled automatically?
> 
> 
> 
> The data might look something like this:
> 
> 
> 
> # Data Source: Project A
> # Contact MMoore with Questions
> # SenderId      RecipientId
> 1          2
> 3          5
> 6          7
> #2        1
> 3          6
> 11        7
> 
> 
> 
> Thanks!
> 
> -Michael
> 
> 
> 
