Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by CorinneC:
http://wiki.apache.org/pig/PigFaq

------------------------------------------------------------------------------
- 1. I'm using PigStorage to parse my input files. Can I make it use control characters as delimiters?
+ '''1. I'm using PigStorage to parse my input files. Can I make it use control characters as delimiters?'''

- Ans. Yes. The first parameter to PigStorage is the dataset name, the second is a regular expression to describe the delimiter. We used String.split(regex, -1) to extract fields from lines. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more information on the way to use special characters in regex. For example "load 'input.dat' using PigStorage('\u0001');" will use ^A as a delimiter.
+ Yes. The first parameter to PigStorage is the dataset name, the second is a regular expression to describe the delimiter. We used String.split(regex, -1) to extract fields from lines. See http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html for more information on the way to use special characters in regex. For example "load 'input.dat' using PigStorage('\u0001');" will use ^A as a delimiter.

- 2. Can I do a numerical comparison while filtering?
+ '''2. Can I do a numerical comparison while filtering?'''

- Ans. Yes, you can choose between numerical and string comparison. For numerical comparison use the operators =, <>, < etc. and for string comparisons use eq, neq etc. See the format of [#CondS Conditions].
+ Yes, you can choose between numerical and string comparison. For numerical comparison use the operators =, <>, < etc. and for string comparisons use eq, neq etc. See the format of [#CondS Conditions].

- 3. How do I make my jobs run on multiple machines?
+ '''3. How do I make my jobs run on multiple machines?'''

- Ans. Use the PARALLEL clause. For example =C = JOIN A by url, B by url PARALLEL 50
+ Use the PARALLEL clause.
For example =C = JOIN A by url, B by url PARALLEL 50

- 4. I would like to use Pig to read a list of .gz files that use '\u0001' as a delimiter. How do I do that?
+ '''4. I would like to use Pig to read a list of .gz files that use '\u0001' as a delimiter. How do I do that?'''

- Ans. You can use the following load command: Load 'INPUT_FILE' USING <nop>PigStorage('\u0001');
+ You can use the following load command: Load 'INPUT_FILE' USING <nop>PigStorage('\u0001');

- 5. Does Pig support NULLs?
+ '''5. Does Pig support NULLs?'''

- Ans. Pig currently has no support for NULL values but it is on the roadmap.
+ Pig currently has no support for NULL values but it is on the roadmap.

- 6. Does pig support regular expressions?
+ '''6. Does Pig support regular expressions?'''

- Ans. Pig does support regular expression matching via `matches` keyword. It uses java.util.regexp matches which means your pattern has to match the entire string (ie if your string is "hi fred" and you want to find "fred" you have to give a pattern of ".*fred" not "fred").
+ Pig does support regular expression matching via `matches` keyword. It uses java.util.regexp matches which means your pattern has to match the entire string (ie if your string is "hi fred" and you want to find "fred" you have to give a pattern of ".*fred" not "fred").

- 7. How to prevent failure if some records don't have the needed number of columns.
+ '''7. How to prevent failure if some records don't have the needed number of columns.'''

- Ans. You can filter away those records by including the following in your Pig program:
+ You can filter away those records by including the following in your Pig program:

  <verbatim>
  A = load 'foo' using PigStorage('\t');
@@ -36, +36 @@

  This code would drop all the records that has less than 5 columns.

- 8. Is there any difference between == and eq for numeric comparisons?
+ '''8. Is there any difference between == and eq for numeric comparisons?'''

- Ans.
For equality, there is no difference while you stay in integers. However 11.0 and 11 will be equal with == but not with eq.
+ For equality, there is no difference while you stay in integers. However 11.0 and 11 will be equal with == but not with eq.

- 9. Is it possible to use PIG with a regular Hadoop cluster (not HOD) ?
+ '''9. Is it possible to use PIG with a regular Hadoop cluster (not HOD) ?'''

- Ans. You can set this property using the empty string.
+ You can set this property using the empty string.

  hod.server=""

- 10. Is there an easy way for me to figure out how many rows exists in a dataset from its alias?
+ '''10. Is there an easy way for me to figure out how many rows exists in a dataset from its alias?'''

- Ans. You can run the following set of commands:
+ You can run the following set of commands:

  <verbatim>
  a = load 'bla' ... ;
@@ -60, +60 @@

  This is equivalent to select count(*) in SQL.

- 11. Does Pig allow grouping on expressions
+ '''11. Does Pig allow grouping on expressions?'''

  Ans. Currently, Pig only allows to group on data fields rather than expressions. Allowing grouping on expressions is on our road map. Stay tuned!

- 12. Is there a way to check if a map is empty
+ '''12. Is there a way to check if a map is empty?'''

- Ans. Currently, there is no way to do that.
+ Currently, there is no way to do that.

- 13. Can I specify the number of nodes Pig allocates?
+ '''13. How can I specify the number of nodes Pig allocates?'''

- Ans. Yes. Three (3) nodes is the minimum. > pig -Dhod.param='-m 3' my_script.pig
+ Three (3) nodes is the minimum.

- 14. Can I load data using "PigStorage()" that requires Unicode specification for separators?
+ '''14. How can I load data using "PigStorage()" that requires Unicode specification for separators?'''

- Ans. Yes Old version of Pig using '\t':<verbatim>a = load '/homes/yahooid/tmp/a.txt' using PigStorage('\t');</verbatim>