Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigFaq

------------------------------------------------------------------------------
- '''1. I'm using PigStorage to parse my input files. Can I make it use control 
characters as delimiters?''' 
+ '''1. I'm using !PigStorage to parse my input files. Can I make it use 
control characters as delimiters?''' 
  
- Yes. The first parameter to PigStorage is the dataset name, the second is a 
regular expression to describe the delimiter. We use String.split(regex, -1) 
to extract fields from lines. See 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more 
information on the way to use special characters in regex. For example "load 
'input.dat' using PigStorage('\u0001');" will use ^A as a delimiter.
+ Yes. The first parameter to !PigStorage is the dataset name, the second is a 
regular expression to describe the delimiter. We use String.split(regex, -1) 
to extract fields from lines. See 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more 
information on the way to use special characters in regex. For example "load 
'input.dat' using !PigStorage('\u0001');" will use ^A as a delimiter.
  
  '''2. Can I do a numerical comparison while filtering?'''
  
@@ -14, +14 @@

  
  '''4. I would like to use Pig to read a list of .gz files that use '\u0001' 
as a delimiter. How do I do that?'''
  
- You can use the following load command: Load 'INPUT_FILE' USING 
<nop>PigStorage('\u0001');
+ You can use the following load command: Load 'INPUT_FILE' USING 
<nop>!PigStorage('\u0001');
  
  '''5. Does Pig support NULLs?'''
  
@@ -29, +29 @@

  You can filter away those records by including the following in your Pig 
program:
  
  <verbatim>
- A = load 'foo' using PigStorage('\t');
+ A = load 'foo' using !PigStorage('\t');
  B = FILTER A BY ARITY(*) < 5;
  .....
  </verbatim>
@@ -76, +76 @@

  
  Three (3) nodes is the minimum.
  
- '''14. How can I load data using "PigStorage()" that requires Unicode 
specification for separators?'''
+ '''14. How can I load data using "!PigStorage()" that requires Unicode 
specification for separators?'''
  
  
- Old version of Pig using '\t':<verbatim>a = load '/homes/yahooid/tmp/a.txt' 
using PigStorage('\t');</verbatim>
+ Old version of Pig using '\t':<verbatim>a = load '/homes/yahooid/tmp/a.txt' 
using !PigStorage('\t');</verbatim>
  
- New version of Pig using Unicode:<verbatim>a = load 
'/homes/yahooid/tmp/a.txt' using PigStorage('\u0000B');</verbatim>
+ New version of Pig using Unicode:<verbatim>a = load 
'/homes/yahooid/tmp/a.txt' using !PigStorage('\u0000B');</verbatim>
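
The answer to question 1 says fields are extracted with String.split(regex, -1). 
As a rough illustration of what that implies for a ^A ('\u0001') delimiter, here 
is a minimal standalone Java sketch. The class name SplitOnControlA, the helper 
splitFields, and the sample line are hypothetical and are not taken from the Pig 
source; the sketch only shows the standard behaviour of String.split with a 
negative limit, under the assumption that the loader works as the FAQ describes.

```java
import java.util.Arrays;

// Hypothetical demo class; not part of Pig.
class SplitOnControlA {

    // Split one input line on a regex delimiter.
    // A negative limit keeps trailing empty fields instead of dropping them.
    static String[] splitFields(String line, String delimiterRegex) {
        return line.split(delimiterRegex, -1);
    }

    public static void main(String[] args) {
        // Sample record with four ^A-separated fields, the last two empty.
        String line = "alice\u000142\u0001\u0001";

        String[] fields = splitFields(line, "\u0001");

        System.out.println(fields.length);           // number of fields found
        System.out.println(Arrays.toString(fields)); // the fields themselves
    }
}
```

With a limit of -1 the two trailing empty fields are retained, so every record 
with the same number of delimiters yields the same number of fields. That 
matters when later steps depend on the field count, as in the 
"FILTER A BY ARITY(*) < 5" example in the diff above.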
  
