Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigFaq

------------------------------------------------------------------------------
- 1. I'm using PigStorage to parse my input files. Can I make it use control 
characters as delimiters? 
+ '''1. I'm using PigStorage to parse my input files. Can I make it use control 
characters as delimiters?''' 
  
- Ans. Yes. The first parameter to PigStorage is the dataset name, the second 
is a regular expression to describe the delimiter. We used String.split(regex, 
-1) to extract fields from lines. See 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more 
information on the way to use special characters in regex. For example "load 
'input.dat' using PigStorage('\u0001');" will use ^A as a delimiter.
+ Yes. The first parameter to PigStorage is the dataset name, the second is a 
regular expression to describe the delimiter. We used String.split(regex, -1) 
to extract fields from lines. See 
http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more 
information on the way to use special characters in regex. For example "load 
'input.dat' using PigStorage('\u0001');" will use ^A as a delimiter.
  
- 2. Can I do a numerical comparison while filtering?
+ '''2. Can I do a numerical comparison while filtering?'''
  
- Ans. Yes, you can choose between numerical and string comparison. For 
numerical comparison use the operators =, <>, <  etc. and for string 
comparisons use eq, neq etc. See the format of [#CondS Conditions].
+ Yes, you can choose between numerical and string comparison. For numerical 
comparison use the operators =, <>, <  etc. and for string comparisons use eq, 
neq etc. See the format of [#CondS Conditions].
  
- 3. How do I make my jobs run on multiple machines?
+ '''3. How do I make my jobs run on multiple machines?'''
  
- Ans. Use the PARALLEL clause. For example =C = JOIN A by url, B by url 
PARALLEL 50
+ Use the PARALLEL clause. For example =C = JOIN A by url, B by url PARALLEL 50
  
- 4. I would like to use Pig to read a list of .gz files that use '\u0001' as a 
delimiter. How do I do that?
+ '''4. I would like to use Pig to read a list of .gz files that use '\u0001' 
as a delimiter. How do I do that?'''
  
- Ans. You can use the following load command: Load 'INPUT_FILE' USING 
<nop>PigStorage(‘\u0001’);
+ You can use the following load command: Load 'INPUT_FILE' USING 
<nop>PigStorage(‘\u0001’);
  
- 5. Does Pig support NULLs?
+ '''5. Does Pig support NULLs?'''
  
- Ans. Pig currently has no support for NULL values but it is on the roadmap.
+ Pig currently has no support for NULL values but it is on the roadmap.
  
- 6. Does pig support regular expressions?
+ '''6. Does Pig support regular expressions?'''
  
- Ans. Pig does support regular expression matching via `matches` keyword. It 
uses java.util.regexp matches which means your pattern has to match the entire 
string (ie if your string is "hi fred" and you want to find "fred" you have to 
give a pattern of ".*fred" not "fred").
+ Pig does support regular expression matching via `matches` keyword. It uses 
java.util.regexp matches which means your pattern has to match the entire 
string (ie if your string is "hi fred" and you want to find "fred" you have to 
give a pattern of ".*fred" not "fred").
  
- 7. How to prevent failure if some records don't have the needed number of 
columns.
+ '''7. How to prevent failure if some records don't have the needed number of 
columns.'''
  
- Ans. You can filter away those records by including the following in your Pig 
program:
+ You can filter away those records by including the following in your Pig 
program:
  
  <verbatim>
  A = load 'foo' using PigStorage('\t');
@@ -36, +36 @@

  
  This code would drop all the records that has less than 5 columns.
  
- 8. Is there any difference between == and eq for numeric comparisons?
+ '''8. Is there any difference between == and eq for numeric comparisons?'''
  
- Ans. For equality, there is no difference while you stay in integers. However 
11.0 and 11 will be equal with == but not with eq. 
+ For equality, there is no difference while you stay in integers. However 11.0 
and 11 will be equal with == but not with eq. 
  
- 9. Is it possible to use PIG with a regular Hadoop cluster (not HOD) ?
+ '''9. Is it possible to use PIG with a regular Hadoop cluster (not HOD) ?'''
  
- Ans. You can set this property using the empty string.
+ You can set this property using the empty string.
  
  hod.server=”” 
  
  
- 10. Is there an easy way for me to figure out how many rows exists in a 
dataset from its alias?
+ '''10. Is there an easy way for me to figure out how many rows exists in a 
dataset from its alias?'''
  
- Ans. You can run the following set of commands:
+ You can run the following set of commands:
  
  <verbatim>
  a = load 'bla' ... ;
@@ -60, +60 @@

  This is equivalent to select count(*) in SQL.
  
  
- 11. Does Pig allow grouping on expressions
+ '''11. Does Pig allow grouping on expressions?'''
  
  Ans. Currently, Pig only allows to group on data fields rather than 
expressions. Allowing grouping on expressions is on our road map. Stay tuned!
  
  
- 12. Is there a way to check if a map is empty
+ '''12. Is there a way to check if a map is empty?'''
  
- Ans. Currently, there is no way to do that.
+ Currently, there is no way to do that.
  
  
- 13. Can I specify the number of nodes Pig allocates?
+ '''13. How can I specify the number of nodes Pig allocates?'''
  
- Ans. Yes. Three (3) nodes is the minimum.
  > pig -Dhod.param='-m 3' my_script.pig
  
+ Three (3) nodes is the minimum.
  
- 14. Can I load data using "PigStorage()" that requires Unicode specification 
for separators?
+ '''14. How can I load data using "PigStorage()" that requires Unicode 
specification for separators?'''
  
- Ans. Yes
  
  Old version of Pig using '\t':<verbatim>a = load '/homes/yahooid/tmp/a.txt' 
using PigStorage('\t');</verbatim>
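
Answers 1 and 6 above both rest on Java regex behaviour: fields are said to be extracted with String.split(regex, -1), and `matches` is said to require the pattern to cover the entire string. The sketch below shows only that underlying Java behaviour, not Pig's own code. The class name PigFaqRegexDemo and the helper names splitFields and fullMatch are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical demo class; it is not part of Pig.
class PigFaqRegexDemo {

    // Splits a line with String.split(regex, -1), the call the FAQ says is used.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static String[] splitFields(String line, String delimiterRegex) {
        return line.split(delimiterRegex, -1);
    }

    // String.matches succeeds only if the pattern matches the whole string,
    // which is the behaviour the FAQ describes for the `matches` keyword.
    static boolean fullMatch(String value, String pattern) {
        return value.matches(pattern);
    }

    public static void main(String[] args) {
        // Three ^A (U+0001) delimiters, the last two fields empty.
        String line = "a\u0001b\u0001\u0001";
        String[] fields = splitFields(line, "\u0001");
        System.out.println(fields.length + " fields: " + Arrays.toString(fields));

        System.out.println(fullMatch("hi fred", "fred"));    // pattern covers only part of the string
        System.out.println(fullMatch("hi fred", ".*fred"));  // pattern covers the whole string
    }
}
```

With a limit of -1 the sample line yields four fields, two of them empty. That matters for the column-count filter in answer 7, which would otherwise see short records.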
  
