[Pig Wiki] Trivial Update of FrontPage by CorinneC

2009-03-04 Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Pig Wiki for change 
notification.

The following page has been changed by CorinneC:
http://wiki.apache.org/pig/FrontPage

--
  
   * Pig Language
  
-   * (./) [attachment:plrm.htm Pig Latin Reference Manual] - Includes Pig 
Latin, built-in functions, and shell commands (''updated 2/27/09 ...'')
+   * [attachment:plrm.htm Pig Latin Reference Manual] - Includes Pig Latin, 
built-in functions, and shell commands (''updated 2/27/09 ...'')
  
   * Pig Functions
* PiggyBank - User-defined functions (UDFs) contributed by Pig users!
-   * (./) [http://wiki.apache.org/pig/UDFManual UDF Manual] - Write your own 
UDFs
+   * [http://wiki.apache.org/pig/UDFManual UDF Manual] - Write your own UDFs
  
-  * Pig Latin Editors
+  * (./) Pig Latin Editors
* PigPen - A plugin for Eclipse that provides syntax highlighting, 
graphical script construction, example result generation, schema descriptions, 
and enables running your pig scripts locally and on a hadoop cluster.
* A !TextMate bundle for Pig Latin - 
[http://github.com/kevinweil/pig.tmbundle/tree/master]
* A Vim plugin for Pig Latin - 
[http://www.vim.org/scripts/script.php?script_id=2186]
  
   * More Pig
-   * (./) PigUserCookbook - Want Pig to fly? Tips and tricks on how to write 
efficient Pig scripts
+   * PigUserCookbook - Want Pig to fly? Tips and tricks on how to write 
efficient Pig scripts
* [http://hadoop.apache.org/pig/javadoc/docs/api/ Javadocs] - Refer to the 
Javadocs for embedded Pig and UDFs
-   * PigFaq - The answer to your question may be here 
+   * [http://wiki.apache.org/pig/FAQ FAQ] - The answer to your question may be 
here 
  
   * Old Pig
* [Grunt] Shell and PigLatin Manual
@@ -81, +81 @@

* PigStreamingFunctionalSpec
* ParameterSubstitution
* PigOptimizationWishList
-   * NestedLogicalPlan (still draft version)
+   * NestedLogicalPlan (''draft version'')
* PigErrorHandling
* PigMultiQueryPerformanceSpecification
   * Performance


[Pig Wiki] Trivial Update of FAQ by CorinneC

2009-03-04 Apache Wiki
The following page has been changed by CorinneC:
http://wiki.apache.org/pig/FAQ

--
- Pig FAQ
+ '''1. I'm using `PigStorage` to parse my input files. Can I make it use 
control characters as delimiters?''' 
  
- 1. I'm using PigStorage to parse my input files. Can I make it use control 
characters as delimiters?
+ Yes. The first parameter to `LOAD` is the dataset name; the argument to 
`PigStorage` is a regular expression that describes the delimiter. `PigStorage` 
uses `String.split(regex, -1)` to extract fields from lines. See 
[http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html 
java.util.regex.Pattern] for more information on using special characters in 
a regular expression. For example,
  
- A. Yes. Examples: PigStorage('\u0001') for Ctrl+A or '\u007C' for this 
character: |
+ {{{
+ LOAD 'input.dat' USING PigStorage('\u0001');
+ }}}
  
- 2. Can I do a numerical comparison while filtering?
+ will use `^A` as a delimiter.
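Here is a standalone Java sketch of what `String.split(regex, -1)` does with a Ctrl+A delimiter. It is not `PigStorage`'s actual source. The class name `CtrlASplit` and the sample line are hypothetical. The `-1` limit keeps trailing empty fields, so a line that ends in the delimiter still yields the full field count.

```java
import java.util.Arrays;

// Hypothetical illustration of String.split(regex, -1); not PigStorage's code.
final class CtrlASplit {
    // Ctrl+A (U+0001), written as a Java Unicode escape inside the literal.
    static final String CTRL_A = "\u0001";

    // Split one line on a delimiter regex; limit -1 keeps trailing empty fields.
    static String[] fields(String line, String delimiterRegex) {
        return line.split(delimiterRegex, -1);
    }

    public static void main(String[] args) {
        // The line ends with a delimiter, so its last field is empty.
        String line = "alice" + CTRL_A + "42" + CTRL_A;
        System.out.println(Arrays.toString(fields(line, CTRL_A)));   // [alice, 42, ]
        // Without the -1 limit, trailing empty fields are dropped.
        System.out.println(Arrays.toString(line.split(CTRL_A)));     // [alice, 42]
        // Regex metacharacters such as | must be escaped.
        System.out.println(Arrays.toString(fields("a|b", "\\|")));   // [a, b]
    }
}
```

The delimiter is treated as a regular expression, so metacharacters need escaping in plain `String.split`. A bare `|` is an empty alternation that matches at every position and splits between every character. It has to be written as `"\\|"` in a Java string literal.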
  
- A. Yes, you can choose between numerical and string comparison. For numerical 
comparison use the operators ==, <, >, etc. and for string comparisons use eq, 
neq etc. 
+ '''2. Can I do a numerical comparison while filtering?'''
  
- 3. How do I make my jobs run on multiple machines?
+ Yes, you can choose between numerical and string comparison. For numerical 
comparison use operators such as `==`, `<`, and `>`; for string comparison use 
`eq`, `neq`, etc. See the format of [#CondS Conditions].
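In this plain Java analogy, the same pair of values orders differently as numbers than as strings. It is not Pig code, and `CompareModes` is a hypothetical name. The sketch assumes that Pig's string operators order values lexicographically, as `String.compareTo` does.

```java
// Hypothetical analogy: numeric versus lexicographic ordering of the same values.
final class CompareModes {
    // Compare the parsed numeric values.
    static int compareNumeric(String a, String b) {
        return Double.compare(Double.parseDouble(a), Double.parseDouble(b));
    }

    // Compare the characters lexicographically.
    static int compareString(String a, String b) {
        return a.compareTo(b);
    }

    public static void main(String[] args) {
        System.out.println(compareNumeric("9", "10") < 0); // true: 9 < 10
        System.out.println(compareString("9", "10") < 0);  // false: "9" sorts after "10"
    }
}
```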
  
- A. Use the PARALLEL clause. For example =C = JOIN A by url, B by url PARALLEL 
50=
+ '''3. How do I make my jobs run on multiple machines?'''
  
- 4. Does Pig support NULLs?
+ Use the `PARALLEL` clause to set the number of reduce tasks for an operation:
  
- A. Pig currently has no support for NULL values but it is on the roadmap.
+ {{{
+ C = JOIN A BY url, B BY url PARALLEL 50;
+ }}}
  
- 5. Does pig support regular expressions?
+ '''4. I would like to use Pig to read a list of `.gz` files that use 
`'\u0001'` as a delimiter. How do I do that?'''
  
- A. Pig does support regular expression matching via =matches= keyward. Tt 
uses java.util.regexp matches which means your pattern has to match the entire 
string (ie if your string is hi fred and you want to find fred you have to 
give a pattern of .*fred not fred).
+ You can use the following load command:
  
+ {{{
+ LOAD 'input_file' USING PigStorage('\u0001');
+ }}}
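The load command above assumes the `.gz` input is decompressed before `PigStorage` splits each line on `'\u0001'`. This sketch makes that same assumption and imitates the combined effect in plain Java with `java.util.zip`. The class `GzCtrlA` is hypothetical, and an in-memory buffer stands in for one `.gz` file.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: decompress gzip data, then split each line on Ctrl+A.
final class GzCtrlA {
    // Ctrl+A (U+0001).
    static final String CTRL_A = "\u0001";

    // Compress text in memory; stands in for one .gz input file.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    // Decompress, read line by line, and split each line into fields.
    static List<String[]> readRecords(byte[] gzData) throws IOException {
        List<String[]> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gzData)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(line.split(CTRL_A, -1));
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = gzip("a" + CTRL_A + "1\nb" + CTRL_A + "2\n");
        for (String[] record : readRecords(data)) {
            System.out.println(Arrays.toString(record)); // [a, 1] then [b, 2]
        }
    }
}
```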
+ 
+ '''5. Does Pig support NULLs?'''
+ 
+ Pig currently has no support for NULL values but it is on the roadmap.
+ 
+ '''6. Does Pig support regular expressions?'''
+ 
+ Pig supports regular expression matching via the `matches` keyword. It uses 
[http://java.sun.com/javase/6/docs/api/java/util/regex/package-summary.html 
java.util.regex] matching, which means your pattern has to match the entire 
string (e.g., if your string is `hi fred` and you want to find `fred`, you 
have to give a pattern of `.*fred`, not `fred`).
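The whole-string requirement is easy to check in plain Java. `Pattern.matches` anchors the pattern to the entire input, and `Matcher.find` searches for a substring. The sketch assumes that Pig's `matches` behaves like the former, as the answer states. `FullMatch` is a hypothetical class name.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of whole-string matching versus substring search.
final class FullMatch {
    // True only if the regex matches the entire string.
    static boolean matchesWhole(String s, String regex) {
        return Pattern.matches(regex, s);
    }

    // True if the regex matches anywhere in the string (shown for contrast).
    static boolean containsMatch(String s, String regex) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(matchesWhole("hi fred", "fred"));   // false
        System.out.println(matchesWhole("hi fred", ".*fred")); // true
        System.out.println(containsMatch("hi fred", "fred"));  // true
    }
}
```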
+ 
- 6. How to prevent failure if some records don't have the needed number of 
columns.
+ '''7. How do I prevent failure if some records don't have the needed number 
of columns?'''
  
  You can filter out those records by including the following in your Pig 
program:
  
- 
+ {{{
- A = load 'foo' using PigStorage('\t');
+ A = LOAD 'foo' USING PigStorage('\t');
  B = FILTER A BY ARITY(*) >= 5;
  .
+ }}}
  
+ This code drops all records that have fewer than five columns.
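The same filter can be sketched in plain Java: split each line on tabs and keep only the records that have at least five fields. This mirrors the intent of the Pig snippet, not Pig's implementation of `ARITY`. `ArityFilter` and the sample lines are hypothetical.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical sketch: drop records with too few tab-separated fields.
final class ArityFilter {
    // Split each line on tabs (keeping trailing empty fields) and keep
    // only records with at least minFields fields.
    static List<String[]> keepWide(List<String> lines, int minFields) {
        return lines.stream()
                .map(line -> line.split("\t", -1))
                .filter(fields -> fields.length >= minFields)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a\tb\tc\td\te", "a\tb\tc", "1\t2\t3\t4\t5\t6");
        System.out.println(keepWide(lines, 5).size()); // 2
    }
}
```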
  
- This code would drop all the records that has less than 5 columns.
+ '''8. Is there any difference between `==` and `eq` for numeric 
comparisons?'''
  
- 7. Is there any difference between == and eq for numeric comparisons?
+ There is no difference when both values are integers. However, because `eq` 
compares values as strings, `11.0` and `11` are equal with `==` but not with 
`eq`.
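This plain Java analogy shows the difference. It assumes that `==` compares parsed numeric values and that `eq` compares the textual form. `EqVsDoubleEquals` is a hypothetical name.

```java
// Hypothetical analogy: numeric equality versus string equality.
final class EqVsDoubleEquals {
    // Compare the parsed numeric values.
    static boolean numericEquals(String a, String b) {
        return Double.parseDouble(a) == Double.parseDouble(b);
    }

    // Compare the characters.
    static boolean stringEquals(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        System.out.println(numericEquals("11.0", "11")); // true
        System.out.println(stringEquals("11.0", "11"));  // false
        System.out.println(stringEquals("11", "11"));    // true
    }
}
```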
  
- For equality, there is no difference while you stay in integers. However 11.0 
and 11 will be equal with == but not with eq. 
+ '''9. Is it possible to use Pig with a regular Hadoop cluster (not HOD)?'''
  
+ Yes. Set the `hod.server` property to the empty string:
+ 
+ {{{
+ hod.server=
+ }}}
+ 
- 8. Is there an easy way for me to figure out how many rows exists in a 
dataset from its alias?
+ '''10. Is there an easy way for me to figure out how many rows exist in a 
dataset from its alias?'''
  
  You can run the following set of commands:
  
+ {{{
+ a = LOAD 'bla' ... ;
+ b = GROUP a ALL;
+ c = FOREACH b GENERATE COUNT(a.$0);
+ }}}
  
- a = load 'bla' ... ;
+ This is equivalent to `SELECT COUNT(*)` in SQL.
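The same two steps can be written with plain Java streams: put every record into a single group (like `GROUP a ALL`), then count the group's members (like `COUNT`). This is an analogy, not Pig's execution model. `GroupAllCount` is a hypothetical name.

```java
import java.util.*;
import java.util.stream.Collectors;

// Hypothetical analogy: one group containing every record, then a count.
final class GroupAllCount {
    // Key every record to the same group "all" and count members per group.
    static long countRows(List<String> records) {
        Map<String, Long> groups = records.stream()
                .collect(Collectors.groupingBy(r -> "all", Collectors.counting()));
        return groups.getOrDefault("all", 0L);
    }

    public static void main(String[] args) {
        System.out.println(countRows(Arrays.asList("x", "y", "z"))); // 3
    }
}
```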
  
- b = group a all;
+ '''11. Does Pig allow grouping on expressions?'''
  
- c = foreach b generate COUNT(a.$0);
+ Currently, Pig only allows grouping on data fields rather than expressions. 
Allowing grouping on expressions is on our roadmap. Stay tuned!
  
+ '''12. Is there a way to check if a map is empty?'''
  
- This is equivalent to select count(*) in SQL.
+ Currently, there is no way to do that.
  
- 9. Does Pig allow 

svn commit: r750271 - in /hadoop/pig/trunk: CHANGES.txt src/org/apache/pig/Main.java

2009-03-04 gates
Author: gates
Date: Thu Mar  5 01:13:52 2009
New Revision: 750271

URL: http://svn.apache.org/viewvc?rev=750271&view=rev
Log:
PIG-692 When running a job from a script, use that script name as the default 
job name.


Modified:
hadoop/pig/trunk/CHANGES.txt
hadoop/pig/trunk/src/org/apache/pig/Main.java

Modified: hadoop/pig/trunk/CHANGES.txt
URL: 
http://svn.apache.org/viewvc/hadoop/pig/trunk/CHANGES.txt?rev=750271&r1=750270&r2=750271&view=diff
==
--- hadoop/pig/trunk/CHANGES.txt (original)
+++ hadoop/pig/trunk/CHANGES.txt Thu Mar  5 01:13:52 2009
@@ -10,6 +10,9 @@
 
PIG-620: Added MaxTupleBy1stField UDF to piggybank (vzaliva via gates)
 
+   PIG-692: When running a job from a script, use the name of that script as
+   the default name for the job (vzaliva via gates)
+
   OPTIMIZATIONS
 
   BUG FIXES

Modified: hadoop/pig/trunk/src/org/apache/pig/Main.java
URL: 
http://svn.apache.org/viewvc/hadoop/pig/trunk/src/org/apache/pig/Main.java?rev=750271&r1=750270&r2=750271&view=diff
==
--- hadoop/pig/trunk/src/org/apache/pig/Main.java (original)
+++ hadoop/pig/trunk/src/org/apache/pig/Main.java Thu Mar  5 01:13:52 2009
@@ -264,6 +264,11 @@
 
 logFileName = validateLogFile(logFileName, file);
 pigContext.getProperties().setProperty("pig.logfile", logFileName);
+
+// Set job name based on name of the script
+pigContext.getProperties().setProperty(PigContext.JOB_NAME, 
+   "PigLatin:" + new File(file).getName()
+);
 
 if (!debug)
 new File(substFile).deleteOnExit();
@@ -339,6 +344,11 @@
 if (!debug)
 new File(substFile).deleteOnExit();
 
+// Set job name based on name of the script
+pigContext.getProperties().setProperty(PigContext.JOB_NAME, 
+   "PigLatin:" + new File(remainders[0]).getName()
+);
+
 grunt = new Grunt(pin, pigContext);
 gruntCalled = true;
 grunt.exec();
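The string quotes in this diff were lost in the archive. This Java sketch reconstructs the apparent naming rule: a `PigLatin:` prefix followed by the script's file name, with any directories stripped by `File.getName()`. `JOB_NAME_KEY` is a hypothetical stand-in for `PigContext.JOB_NAME`, whose string value this commit does not show. The script path is made up.

```java
import java.io.File;
import java.util.Properties;

// Hypothetical sketch of the default job name rule from PIG-692.
final class JobNameFromScript {
    // Hypothetical key; the real code uses the PigContext.JOB_NAME constant.
    static final String JOB_NAME_KEY = "jobName";

    // "PigLatin:" prefix plus the script's file name without its directories.
    static String defaultJobName(String scriptPath) {
        return "PigLatin:" + new File(scriptPath).getName();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(JOB_NAME_KEY, defaultJobName("/home/alice/scripts/wordcount.pig"));
        System.out.println(props.getProperty(JOB_NAME_KEY)); // PigLatin:wordcount.pig
    }
}
```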