[jira] Commented: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886647#action_12886647
 ] 

Hadoop QA commented on PIG-1472:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12449033/PIG-1472.3.patch
  against trunk revision 960062.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 69 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 395 release audit warnings 
(more than the trunk's current 394 warnings).

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/343/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/343/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/343/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/343/console

This message is automatically generated.

 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.patch


 In certain types of Pig queries, most of the execution time is spent
 serializing/deserializing (sedes) records between Map and Reduce and between
 MR jobs.
 For example, if PigMix queries are modified to specify types for all the
 fields in the load statement schema, some of the queries (L2, L3, L9, L10 in
 PigMix v1) that have records with bags and maps being transmitted across map
 or reduce boundaries run a lot longer (a runtime increase of a few times has
 been seen).
 There are a few optimizations that have been shown to improve the performance
 of sedes in my tests:
 1. Use a smaller number of bytes to store the length of the column. For
 example, if a bytearray is smaller than 255 bytes, a byte can be used to
 store the length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and
 DataInput.readUTF. This reduces the cost of serialization by more than 1/2.
 Zebra and BinStorage are known to use DefaultTuple sedes functionality. The
 serialization format that these loaders use cannot change, so after the
 optimization their format is going to be different from the format used
 between M/R boundaries.
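
The two optimizations listed above can be sketched in plain Java. This is only an illustration under stated assumptions, not the format used by the attached patch: the class name, the tag values, and the helper methods are all hypothetical. A tag byte tells the reader whether the length that follows is one byte or a four-byte int, and strings go through DataOutput.writeUTF / DataInput.readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; not Pig's actual serialization format.
final class CompactSedesSketch {
    // Assumed tag values that tell the reader how wide the length field is.
    static final byte TINY_BYTES = 1; // length stored in one unsigned byte
    static final byte FULL_BYTES = 2; // length stored as a 4-byte int

    static void writeBytes(DataOutputStream out, byte[] data) throws IOException {
        if (data.length < 255) {
            out.writeByte(TINY_BYTES);
            out.writeByte(data.length); // 1 byte instead of 4
        } else {
            out.writeByte(FULL_BYTES);
            out.writeInt(data.length);
        }
        out.write(data);
    }

    static byte[] readBytes(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        int length;
        if (tag == TINY_BYTES) {
            length = in.readUnsignedByte();
        } else if (tag == FULL_BYTES) {
            length = in.readInt();
        } else {
            throw new IOException("unknown length tag: " + tag);
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return data;
    }

    // Writes one bytearray field followed by one string field.
    static byte[] encode(byte[] data, String text) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            writeBytes(out, data);
            out.writeUTF(text); // 2-byte length prefix plus modified UTF-8
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Skips the bytearray field and returns the string field.
    static String decodeText(byte[] encoded) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
            readBytes(in);
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

One caveat with this approach: writeUTF throws UTFDataFormatException when the encoded string needs more than 65535 bytes, so longer strings would still need a separate code path.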

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12886772#action_12886772
 ] 

Daniel Dai commented on PIG-1434:
-

We may also add some sanity check, instead of just doing a limit.
{code}
C = foreach C generate CheckSingular(*);
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

CheckSingular will check that C has only one record.
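
The kind of check described here can be sketched in plain Java. This is not a Pig UDF and does not use Pig's EvalFunc API; the class and method names are hypothetical, and the relation is modeled as a simple Iterator. The point is only that the check should fail for an empty relation as well as for one with more than one record.

```java
import java.util.Iterator;

// Hypothetical illustration of a "singular" check; not Pig code.
final class SingularCheckSketch {
    // Returns the only record, or fails if there are zero or several records.
    static <T> T requireSingle(Iterator<T> records) {
        if (!records.hasNext()) {
            throw new IllegalStateException("relation is empty; expected exactly one record");
        }
        T only = records.next();
        if (records.hasNext()) {
            throw new IllegalStateException("relation has more than one record; cannot use it as a scalar");
        }
        return only;
    }
}
```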

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 A couple of additional comments:
 (1) You can only cast relations that contain a single value; otherwise an
 error will be reported.
 (2) Name resolution is needed, since relation X might have a field named C,
 in which case that field takes precedence.
 (3) Y will look for the C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF




[jira] Updated: (PIG-19) A=load causes parse error

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-19:
--

Fix Version/s: 0.9.0

 A=load causes parse error
 -

 Key: PIG-19
 URL: https://issues.apache.org/jira/browse/PIG-19
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Olga Natkovich
Priority: Minor
 Fix For: 0.9.0


 Parser expects spaces around =. This should be a minor change in 
 src/org/apache/pig/tools/grunt/GruntParser.jj




[jira] Updated: (PIG-103) Shared Job /tmp location should be configurable

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-103:
---

Fix Version/s: 0.8.0

 Shared Job /tmp location should be configurable
 ---

 Key: PIG-103
 URL: https://issues.apache.org/jira/browse/PIG-103
 Project: Pig
  Issue Type: Improvement
  Components: impl
 Environment: Partially shared file:// filesystem (eg NFS)
Reporter: Craig Macdonald
 Fix For: 0.8.0


 Hello,
 I'm investigating running Pig in an environment where various parts of the
 file:// filesystem are available on all nodes. I can tell Hadoop to use a
 file:// filesystem location for its default by setting
 fs.default.name=file://path/to/shared/folder
 However, this creates issues for Pig, as Pig writes its job information in a
 folder that it assumes is on a shared FS (e.g. DFS). In this scenario,
 /tmp is not shared across machines.
 So /tmp should either be configurable, or Hadoop should tell you the actual
 full location set in fs.default.name?
 A straightforward solution is to make /tmp/ a property in
 src/org/apache/pig/impl/io/FileLocalizer.java init(PigContext)
 Any suggestions of property names?
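
A minimal sketch of the proposal, assuming a java.util.Properties object is available where the directory is chosen. The property name below is a placeholder made up for illustration (the issue leaves the name open), and the class is hypothetical rather than a change to FileLocalizer.

```java
import java.util.Properties;

// Hypothetical illustration; the property name is an assumption, not an agreed one.
final class TempDirConfigSketch {
    static final String TEMP_DIR_KEY = "pig.temp.dir"; // placeholder name
    static final String DEFAULT_TEMP_DIR = "/tmp";      // the currently hardcoded value

    // Returns the configured directory, falling back to /tmp when unset or blank.
    static String resolveTempDir(Properties props) {
        String dir = props.getProperty(TEMP_DIR_KEY, DEFAULT_TEMP_DIR).trim();
        return dir.isEmpty() ? DEFAULT_TEMP_DIR : dir;
    }
}
```

Keeping /tmp as the default means existing installations would behave as before unless the property is set.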




[jira] Resolved: (PIG-1488) Make HDFS temp dir configurable

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1488.
-

Resolution: Duplicate

This is a duplicate of PIG-103, which is now linked to the 0.8.0 release.

 Make HDFS temp dir configurable
 ---

 Key: PIG-1488
 URL: https://issues.apache.org/jira/browse/PIG-1488
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
 Fix For: 0.8.0


 Currently it is hardcoded to /tmp. It should be made into a property.




[jira] Resolved: (PIG-126) custom storage ( LOAD with USING) doesn't work with inner classes

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-126.


Resolution: Won't Fix

We have completely redone the Load/Store UDFs. Please create a new JIRA if you
still see this issue with the latest code.

 custom storage ( LOAD with USING) doesn't work with inner classes
 -

 Key: PIG-126
 URL: https://issues.apache.org/jira/browse/PIG-126
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.1.0
Reporter: Pi Song
Priority: Minor
 Attachments: PIG-126-test.patch, pig126_UnitTest2.patch


 It might be trivial but this has held me up for quite a while.




[jira] Updated: (PIG-1309) Map-side Cogroup

2010-07-09 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1309:
--

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch checked-in to 0.7 branch as well.

 Map-side Cogroup
 

 Key: PIG-1309
 URL: https://issues.apache.org/jira/browse/PIG-1309
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Fix For: 0.8.0

 Attachments: mapsideCogrp.patch, pig-1309_1.patch, pig-1309_2.patch, 
 PIG_1309_7.patch


 In the never-ending quest to make Pig go faster, we want to parallelize as
 many relational operations as possible. It's already possible to do Group-by
 (PIG-984) and Joins (PIG-845, PIG-554) purely map-side in Pig. This jira
 is to add a map-side implementation of Cogroup in Pig. Details to follow.




[jira] Updated: (PIG-128) Incorporate CheckStyle into Pig build.xml (experimental)

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-128:
---

Status: Resolved  (was: Patch Available)
Resolution: Won't Fix

We are currently using findbugs for a similar purpose. findbugs is used across
Hadoop.

 Incorporate CheckStyle into Pig build.xml  (experimental)
 -

 Key: PIG-128
 URL: https://issues.apache.org/jira/browse/PIG-128
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.1.0
Reporter: Pi Song
 Attachments: checkstyle-all-4.4.jar, PIG-128-v02.patch, 
 pig_checkstyle1.patch


 As discussed on the mailing list, I have now included CheckStyle as a part of
 the build process. Some might agree and some might not agree. Please note
 that initially *this is only for experimental purposes*.
 In my opinion, this is a systematic way to control coding style: as more and
 more people come to help, you will need a good system to support them.
 *Proposal*
 +Stage1+
 - Checkstyle will run as a part of build process. The output file will be 
 created at build/checkstyle/checkstyle-report.txt. This only took a few more 
 seconds in my slow development box.
 - At the moment sun's guideline is used with special exceptions Indentation=4 
 and neglecting package.html requirement.
 - Failures on Checkstyle will not cause the build to be broken at this stage 
 as this will only provide guideline for developers and for committers to make 
 decisions whether the patch is ready to be committed. Basically new patches 
 should not introduce more violations.
 - From time to time, we should spend some time cleaning up code to reduce the 
 number of violations. Before, people just did clean-up and check-in believing 
 the code would be cleaner. Now you will have a good indicator to showcase 
 your achievement.
 +Stage2+ (don't know when yet)
 - It's interesting that some checks in Checkstyle can help us eliminate
 unforeseen bugs, such as DoubleCheckedLocking, EqualsHashCode, MagicNumber, or
 StringLiteralEquality. These checks should be enforced as errors and break
 the build. We all need to decide on the set of such hard checks. (see
 http://checkstyle.sourceforge.net/config_coding.html)
 From my test, currently we have around 1 violations. 
 Awaiting suggestions!




[jira] Resolved: (PIG-138) Support for Multiple tmp directories and clean up

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-138.


Resolution: Won't Fix

Not clear that we need it and it will add complexity to the code

 Support for Multiple tmp directories and clean up
 -

 Key: PIG-138
 URL: https://issues.apache.org/jira/browse/PIG-138
 Project: Pig
  Issue Type: Improvement
 Environment: Pig Local
Reporter: Amir Youssefi
Assignee: Pi Song

 This is to separate additional improvements and original requirement on 
 PIG-129 issue.
 Pasting comments of Pi Song. 
 I think the concept of a multi-dir temp file creator (LocalDirAllocator in
 Hadoop) should be adopted in Pig. What it does is:
 * You can set up a set of tmp file dirs in configuration (they can be on
 different physical drives so you can utilize more disk space)
 * When a temp file is being created, the system will probe the given temp
 dirs in round-robin fashion
 * For a selected temp dir, if it exists and you have permission to write,
 the temp file will be created
 * For a selected temp dir, if it doesn't exist or you don't have
 permission to write, the temp dir will be kept in the black list, thus not
 being used later on.
 * For the next temp file, move on to the next temp dir
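
The steps listed above can be sketched as follows. This is an illustrative class written for this note, not Hadoop's LocalDirAllocator and not code from any attached patch; all names are hypothetical, and the list of directories is assumed to come from configuration.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a hypothetical class, not Hadoop's LocalDirAllocator or Pig code.
final class RoundRobinTempDirsSketch {
    private final List<File> dirs;
    private final Set<File> blacklist = new HashSet<>();
    private int next = 0; // index of the directory to probe next

    RoundRobinTempDirsSketch(List<File> configuredDirs) {
        this.dirs = new ArrayList<>(configuredDirs);
    }

    // Probes directories in round-robin order. The prefix needs at least
    // 3 characters because File.createTempFile requires that.
    synchronized File createTempFile(String prefix) {
        for (int probed = 0; probed < dirs.size(); probed++) {
            File dir = dirs.get(next);
            next = (next + 1) % dirs.size();
            if (blacklist.contains(dir)) {
                continue;
            }
            if (dir.isDirectory() && dir.canWrite()) {
                try {
                    return File.createTempFile(prefix, ".tmp", dir);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
            blacklist.add(dir); // missing or unwritable: never probe it again
        }
        throw new IllegalStateException("no usable temp directory is configured");
    }

    synchronized boolean isBlacklisted(File dir) {
        return blacklist.contains(dir);
    }
}
```

A directory that is blacklisted here stays blacklisted for the lifetime of the object, which matches the description above but means a directory that becomes writable later is not picked up again.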
 




[jira] Updated: (PIG-206) Right granularity for a pig script

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-206:
---

Fix Version/s: 0.9.0

 Right granularity for a pig script
 --

 Key: PIG-206
 URL: https://issues.apache.org/jira/browse/PIG-206
 Project: Pig
  Issue Type: Wish
Reporter: Mathieu Poumeyrol
 Fix For: 0.9.0


 I'd like to understand what people have in mind when they picture pig 
 scripts...




[jira] Resolved: (PIG-166) Disk Full

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-166.


Resolution: Won't Fix

I think this is an issue for Hadoop to resolve. Also, I don't think we have
really run into the problem as long as I can remember.

 Disk Full
 -

 Key: PIG-166
 URL: https://issues.apache.org/jira/browse/PIG-166
 Project: Pig
  Issue Type: Bug
Reporter: Amir Youssefi
 Attachments: PIG-166_v1.patch


 Occasionally spilling fills up (all) hard drive(s) on a Data Node and crashes 
 Task Tracker (and other processes) on that node. We need to have a safety net 
 and fail the task before crashing happens (and more). 
 In a Pig + Hadoop setting, Task Trackers get blacklisted, and the Pig console
 gets stuck at a percentage without returning nodes to the cluster. I talked to
 the Hadoop team to explore the Max Percentage idea. Nodes running into this
 problem get into permanent problems, and manual cleaning by an administrator
 is necessary.
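
One way to read the "Max Percentage" idea is a guard that refuses to spill once a volume is fuller than a configured fraction, so the task fails before the disk does. The sketch below is an assumption about what such a guard could look like, not the content of PIG-166_v1.patch; the class, method names, and threshold parameter are hypothetical.

```java
import java.io.File;

// Hypothetical guard for illustration; not taken from the attached patch.
final class SpillGuardSketch {
    // True when the used fraction of the volume exceeds maxUsedFraction.
    static boolean overLimit(long totalBytes, long usableBytes, double maxUsedFraction) {
        if (totalBytes <= 0) {
            return true; // unknown or missing volume: refuse to spill
        }
        double usedFraction = 1.0 - (double) usableBytes / totalBytes;
        return usedFraction > maxUsedFraction;
    }

    // Fails fast before a spill instead of letting the disk fill up.
    static void checkBeforeSpill(File spillDir, double maxUsedFraction) {
        if (overLimit(spillDir.getTotalSpace(), spillDir.getUsableSpace(), maxUsedFraction)) {
            throw new IllegalStateException("spill directory is over the allowed usage: " + spillDir);
        }
    }
}
```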




[jira] Updated: (PIG-205) Add a -n / -namenode parameter to pig command line

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-205:
---

Status: Resolved  (was: Patch Available)
Resolution: Won't Fix

No activity on this issue for a while. Please, re-open if we still want to come 
to the resolution

 Add a -n / -namenode parameter to pig command line
 --

 Key: PIG-205
 URL: https://issues.apache.org/jira/browse/PIG-205
 Project: Pig
  Issue Type: Improvement
Reporter: Mathieu Poumeyrol
Priority: Minor
 Attachments: NameNodeArg.patch, NameNodeArg.v2.patch


 -c allows specifying the cluster job tracker location from the command line.
 For this to be useful in most cases, I expect users to need to specify the
 dfs location too.




[jira] Updated: (PIG-217) Syntax Errors

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-217:
---

Fix Version/s: 0.9.0

We want to address error handling as part of parser re-work

 Syntax Errors
 -

 Key: PIG-217
 URL: https://issues.apache.org/jira/browse/PIG-217
 Project: Pig
  Issue Type: Sub-task
Reporter: Amir Youssefi
 Fix For: 0.9.0


 This is a sub-task for Syntax Errors and use cases for it. 
 Having PARALLEL in wrong places is confusing for many users. I just saw 
 somebody putting it after STORE. Adding it to FILTER is very common as well.




[jira] Updated: (PIG-239) illustrate followed by dump gives a runtime exception

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-239:
---

Fix Version/s: 0.9.0

 illustrate followed by dump gives a runtime exception
 -

 Key: PIG-239
 URL: https://issues.apache.org/jira/browse/PIG-239
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Pradeep Kamath
Assignee: Shubham Chopra
 Fix For: 0.9.0


 Here is a session which outlines the issue:
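 grunt> a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, 
 age,gpa);
 grunt> b = filter a by name lt 'b';
 grunt> c = foreach b generate TOKENIZE(name);
 grunt> illustrate c;
 ----------------------------------
 | a | name          | age | gpa  |
 ----------------------------------
 |   | tom xylophone | 69  | 0.04 |
 |   | alice ovid    | 75  | 3.89 |
 ----------------------------------
 -------------------------------
 | b | name       | age | gpa  |
 -------------------------------
 |   | alice ovid | 75  | 3.89 |
 -------------------------------
 -------------------------
 | c | (token )          |
 -------------------------
 |   | {(alice), (ovid)} |
 -------------------------
 grunt> dump c;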
 2008-05-15 14:35:54,476 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
 java.lang.RuntimeException: java.io.IOException: Serialization error: 
 org.apache.pig.impl.util.LineageTracer
 at 
 org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:242)
 at 
 org.apache.pig.backend.hadoop.executionengine.MapreducePlanCompiler.compile(MapreducePlanCompiler.java:115)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:232)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:209)
 at org.apache.pig.PigServer.optimizeAndRunQuery(PigServer.java:410)
 at org.apache.pig.PigServer.openIterator(PigServer.java:332)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:265)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:162)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:73)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:54)
 at org.apache.pig.Main.main(Main.java:270)
 Caused by: java.io.IOException: Serialization error: 
 org.apache.pig.impl.util.LineageTracer
 at 
 org.apache.pig.impl.util.WrappedIOException.wrap(WrappedIOException.java:16)
 at 
 org.apache.pig.impl.util.ObjectSerializer.serialize(ObjectSerializer.java:44)
 at 
 org.apache.pig.backend.hadoop.executionengine.POMapreduce.copy(POMapreduce.java:233)
 ... 10 more
 Caused by: java.io.NotSerializableException: 
 org.apache.pig.impl.util.LineageTracer
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1081)
 at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
 at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
 at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
 at java.util.ArrayList.writeObject(ArrayList.java:569)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:585)
 at 
 java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:917)
 at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1339)
 at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
 at 
 java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1375)
 at 
 java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1347)
 at 
 java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1290)
 at 
 java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1079)
 at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:302)
 at java.util.ArrayList.writeObject(ArrayList.java:569)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 

[jira] Resolved: (PIG-326) Read properties file from classpath

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-326.


Fix Version/s: 0.7.0
   Resolution: Fixed

 Read properties file from classpath
 ---

 Key: PIG-326
 URL: https://issues.apache.org/jira/browse/PIG-326
 Project: Pig
  Issue Type: Improvement
Reporter: Amir Youssefi
 Fix For: 0.7.0


 Some users need to change properties file frequently. Looking for 
 pig.properties in classpath and merging that with existing one in pig.jar 
 (giving it precedence to one in classpath) will be helpful.
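
The merge described above can be sketched with java.util.Properties. This is a hedged illustration, not the committed change: the class and method names are hypothetical, and the bundled defaults are passed in as an already loaded Properties object.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of the merge order; not the committed Pig code.
final class PropertiesMergeSketch {
    // Entries found on the classpath override the bundled defaults.
    static Properties merge(Properties bundled, Properties fromClasspath) {
        Properties merged = new Properties();
        merged.putAll(bundled);
        merged.putAll(fromClasspath); // later putAll wins on duplicate keys
        return merged;
    }

    // Loads a properties resource from the classpath; empty if it is absent.
    static Properties loadFromClasspath(String resourceName) {
        Properties props = new Properties();
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resourceName)) {
            if (in != null) {
                props.load(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

With this shape, a caller could do something like merge(bundledDefaults, loadFromClasspath("pig.properties")), where bundledDefaults stands for whatever was read from the jar.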




[jira] Resolved: (PIG-254) Allowing user defined counters for streaming

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-254?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-254.


Resolution: Won't Fix

Hadoop is discouraging users from using counters since they are expensive.
Also, we have not seen recent requests/use cases for this.

 Allowing user defined counters for streaming
 

 Key: PIG-254
 URL: https://issues.apache.org/jira/browse/PIG-254
 Project: Pig
  Issue Type: New Feature
Reporter: Olga Natkovich

 Streaming in hadoop 18 will allow users to define and harvest counters. It 
 would be nice if streaming in pig does the same.




[jira] Resolved: (PIG-322) Same job name used for a series of map-reduce jobs

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-322.


Resolution: Won't Fix

It is not possible for users to know (especially with multiquery) where the job 
boundaries will be. Allowing this functionality would be more confusing than 
helpful. 

 Same job name used for a series of map-reduce jobs
 --

 Key: PIG-322
 URL: https://issues.apache.org/jira/browse/PIG-322
 Project: Pig
  Issue Type: Improvement
Reporter: Laukik Chitnis
Priority: Minor

 The only job name used for a series of map-reduce jobs is the one before
 STORE, even if SET job.name is used multiple times. Though it is
 known that there exists no direct mapping between (a set of) Pig statements
 and the map-reduce jobs, and Pig tries to optimize the number of map-reduce
 jobs, having the ability to have different names for the map-reduce jobs
 triggered by Pig is a useful feature that allows better tracking.
 If no job.name is SET, maybe Pig can tag along a count in the name, instead
 of the default PigLatin:DefaultJobName
 The issue of associating a name explicitly SET by the user with a map-reduce
 job can be more tricky, though, when the name is set multiple times.




[jira] Updated: (PIG-1484) BinStorage should support comma seperated path

2010-07-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1484:


  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to both trunk and 0.7 branch.

 BinStorage should support comma seperated path
 --

 Key: PIG-1484
 URL: https://issues.apache.org/jira/browse/PIG-1484
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0, 0.7.0

 Attachments: PIG-1484-1.patch, PIG-1484-2.patch, PIG-1484-3.patch


 BinStorage does not take a comma-separated path. The following script fails:
 a = load '1.bin,2.bin' using BinStorage();
 dump a;




[jira] Updated: (PIG-347) Pig (help) Commands

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-347:
---

Fix Version/s: 0.8.0

 Pig (help) Commands
 ---

 Key: PIG-347
 URL: https://issues.apache.org/jira/browse/PIG-347
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Priority: Minor
 Fix For: 0.8.0


 Pig help can be specified 2 ways: $pig -help and $pig -h
 I. $pig -help (seen by external/internal users)
 (1) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default
 (2) change 
 -x, -exectype local|mapreduce, mapreduce is default 
  change mapreduce to hadoop (maintain backward compatibility)
 II. $pig -h (seen by internal users only)
 (1) fix typos
 -l, --latest   use latest, untested, unsupported version of pig.jar instaed 
 of relased, tested, supported version.
instead of released 
 (2) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default 
 (same as above)
 (3) change:  -x, -exectype local|mapreduce, mapreduce is default ... 
  change mapreduce to hadoop (maintain backward compatibility)
 (same as above)
  




[jira] Updated: (PIG-313) Error handling aggregate of a computation

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-313:
---

Fix Version/s: 0.9.0

 Error handling aggregate of a computation
 -

 Key: PIG-313
 URL: https://issues.apache.org/jira/browse/PIG-313
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Priority: Minor
 Fix For: 0.9.0


 Query which fails:
 {code}
 a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, 
 gpa:double);
 b = group a by name;
 c = foreach b generate group, SUM(a.age*a.gpa);
 store c into ':OUTPATH:';
 {code}
 Error output:
 {quote}
 2008-07-14 16:34:08,684 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: testhost.com:8020
 2008-07-14 16:34:08,741 [main] WARN  org.apache.hadoop.fs.FileSystem - 
 testhost.com:8020 is a deprecated filesystem name. Use 
 hdfs://testhost:8020/ instead.
 2008-07-14 16:34:08,995 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: testhost.com:50020
 2008-07-14 16:34:09,251 [main] WARN  org.apache.hadoop.fs.FileSystem - 
 testhost.com:8020 is a deprecated filesystem name. Use 
 hdfs://testhost:8020/ instead.
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot 
 evaluate output type of Mul/Div Operator
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem 
 resolving LOForEach schema
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe 
 problem found during validation 
 org.apache.pig.impl.plan.PlanValidationException: An unexpected exception 
 caused the validation to stop 
 2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.io.IOException: Unable to store for alias: c
 2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - 
 java.io.IOException: Unable to store for alias: c
 {quote}




[jira] Updated: (PIG-333) MIN on strings (undeclared) gives strange error in store

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-333:
---

Fix Version/s: 0.9.0

 MIN on strings (undeclared) gives strange error in store
 

 Key: PIG-333
 URL: https://issues.apache.org/jira/browse/PIG-333
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.9.0


 Script which causes error:
 {code}
 a = load '/user/pig/tests/data/singlefile/votertab10k' as (name, age, 
 registration, contribution);
 b = group a all;
 c = foreach b generate MIN(a.name), MAX(a.name);
 store c into '/tmp';
 {code}
 Error:
 {noformat}
 2008-07-23 11:31:15,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 0.0% 
 complete
 2008-07-23 11:31:19,167 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 50.0% 
 complete
 2008-07-23 11:31:43,431 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 100.0% complete
 2008-07-23 11:31:45,956 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 Unsuccessful attempt. Completed 0.0% of the job
 2008-07-23 11:31:45,969 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (map) tip_20080723_0002_m_00
 2008-07-23 11:31:45,974 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (reduce) tip_20080723_0002_r_00 
 java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at 

[jira] Commented: (PIG-1468) DataByteArray.compareTo() does not compare in lexicographic order

2010-07-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886819#action_12886819
 ] 

Daniel Dai commented on PIG-1468:
-

The other concern is that we only change DataByteArray, not byte. So the 
comparators for DataType.BYTEARRAY and DataType.BYTE are different, which will 
cause confusion.

 DataByteArray.compareTo() does not compare in lexicographic order
 -

 Key: PIG-1468
 URL: https://issues.apache.org/jira/browse/PIG-1468
 Project: Pig
  Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
 Attachments: PIG-1468.patch


 The compareTo() method of org.apache.pig.data.DataByteArray does not compare 
 items in lexicographic order.
 Instead, it takes into account the sign of the bytes that compose the 
 DataByteArray.
 So, for example, 0xff compares as less than 0x00.
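
The difference between the two orderings can be sketched in a few lines. This is a Python model only, not Pig's code: Java bytes are signed, so the sketch reinterprets each byte as a value in -128..127 to imitate the reported behaviour. The function names are hypothetical and chosen for illustration.

```python
def to_signed(b: int) -> int:
    """Reinterpret an unsigned byte value (0..255) as a Java-style signed byte."""
    return b - 256 if b > 127 else b


def _cmp(x: int, y: int) -> int:
    """Three-way comparison returning -1, 0 or 1."""
    return (x > y) - (x < y)


def compare_signed(a: bytes, b: bytes) -> int:
    """Compare byte by byte using signed values (models the reported behaviour)."""
    for x, y in zip(a, b):
        c = _cmp(to_signed(x), to_signed(y))
        if c:
            return c
    return _cmp(len(a), len(b))


def compare_lexicographic(a: bytes, b: bytes) -> int:
    """Compare byte by byte using unsigned values (lexicographic order)."""
    for x, y in zip(a, b):
        c = _cmp(x, y)
        if c:
            return c
    return _cmp(len(a), len(b))
```

With these definitions, `compare_signed(b"\xff", b"\x00")` is negative while `compare_lexicographic(b"\xff", b"\x00")` is positive, which is the discrepancy the issue describes.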

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-346) Grunt (help) commands

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-346:
---

Fix Version/s: 0.8.0

 Grunt (help) commands 
 --

 Key: PIG-346
 URL: https://issues.apache.org/jira/browse/PIG-346
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
 Fix For: 0.8.0


 I think there are 22 grunt commands  and 2 different lists of the 
 commands can be displayed.
 I. Grunt commands displayed with grunt help
 (1) put 22 grunt commands in alphabetical order
 (2) fix double entry for cd ... cd path and cd dir  keep cd path
 (3) fix notation for set key value ... set key 'value'
 (4) add explain
 (5) add illustrate
 (6) add help
 II. Grunt commands displayed with grunt asdf 
 The asdf is a mistake and generates the message "Was expecting one of:" and 
 a list of grunt commands
 (1) put 22 grunt commands in alphabetical order
 (2) add define
 (3) add du
 
 22 Grunt commands in alphabetical order:
 cat src
 cd path
 copyFromLocal localsrc dst
 copyToLocal src localdst
 cp src dst
 define functionAlias functionSpec
 describe alias
 dump alias
 du path
 explain
 help
 illustrate
 kill job_id
 ls path
 mkdir path
 mv src dst
 pwd
 quit
 register udfJar
 rm src
 set key 'value'
 store alias into filename [using functionSpec]




[jira] Updated: (PIG-348) -j command line option doesn't work

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-348:
---

Fix Version/s: 0.8.0
  Description: 
According to:

$ pig --help 

...
-j, -jar jarfile load jarfile
...

yet 

$pig -j my.jar

doesn't work in place of:

register my.jar 

in Pig script. 


 -j command line option doesn't work
 ---

 Key: PIG-348
 URL: https://issues.apache.org/jira/browse/PIG-348
 Project: Pig
  Issue Type: Bug
Reporter: Amir Youssefi
 Fix For: 0.8.0


 According to:
 $ pig --help 
 ...
 -j, -jar jarfile load jarfile
 ...
 yet 
 $pig -j my.jar
 doesn't work in place of:
 register my.jar 
 in Pig script. 




[jira] Resolved: (PIG-360) Java Datalog Library

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-360.


Resolution: Won't Fix

There has not been any activity on this issue for 2 years. Please re-open if 
you are still planning to make it work.

 Java  Datalog Library
 -

 Key: PIG-360
 URL: https://issues.apache.org/jira/browse/PIG-360
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.2.0
Reporter: Tyson Condie
Priority: Trivial
 Attachments: datalog.patch, wrapper.patch


 Java Datalog Library Patch




[jira] Updated: (PIG-356) map lookup on empty key should be disallowed at parse time

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-356:
---

Fix Version/s: 0.9.0

 map lookup on empty key should be disallowed at parse time
 --

 Key: PIG-356
 URL: https://issues.apache.org/jira/browse/PIG-356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Priority: Minor
 Fix For: 0.9.0


 Currently the following is allowed:
 {code}
 a = load 'testfile';
 b = foreach a generate $0#'apple', $0#'mango', $0#'', flatten($1#'orange');
 {code}
 Looking up an empty key ($0#'') should not be allowed at parse time




[jira] Commented: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-09 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886820#action_12886820
 ] 

Thejas M Nair commented on PIG-1472:


The audit warning diff looks bogus. The contrib tests passed when I ran them on 
my machine; the failures seem to be caused by the Hudson environment.

The changes in PIG-1295 will need to be ported to work with this new 
serialization format. For that patch, I think we should introduce a new 
function in InterSedes that can compare two serialized tuples, and also add a 
function to BinSedesTuple that returns the corresponding InterSedes class. 
Then, while selecting the comparator, add a check to see whether the default 
tuple type is BinSedesTuple; if so, use the corresponding InterSedes function 
as the comparator class.


 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.patch


 In certain types of pig queries most of the execution time is spent in 
 serializing/deserializing (sedes) records between Map and Reduce and between 
 MR jobs. 
 For example, if PigMix queries are modified to specify types for all the 
 fields in the load statement schema, some of the queries (L2, L3, L9, L10 in 
 pigmix v1) that have records with bags and maps being transmitted across map 
 or reduce boundaries run a lot longer (a runtime increase of a few times has 
 been seen).
 There are a few optimizations that have been shown to improve the performance 
 of sedes in my tests -
 1. Use a smaller number of bytes to store the length of the column. For 
 example, if a bytearray is smaller than 255 bytes, a byte can be used to 
 store the length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
 DataInput.readUTF.  This reduces the cost of serialization by more than 1/2. 
 Zebra and BinStorage are known to use DefaultTuple sedes functionality. The 
 serialization format that these loaders use cannot change, so after the 
 optimization their format is going to be different from the format used 
 between M/R boundaries.
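
The first optimization can be illustrated with a small sketch. This is a Python model under stated assumptions, not the format implemented in the patch: it assumes a one-byte marker precedes each field, and the marker values `TINY` and `FULL` and both function names are hypothetical.

```python
import struct

TINY = 0  # hypothetical marker: length stored in one unsigned byte
FULL = 1  # hypothetical marker: length stored as a 4-byte big-endian int


def write_bytearray(payload: bytes) -> bytes:
    """Serialize payload with the shortest length prefix that can hold its size."""
    n = len(payload)
    if n <= 0xFF:  # one unsigned byte can hold lengths 0..255
        return struct.pack(">BB", TINY, n) + payload
    return struct.pack(">Bi", FULL, n) + payload


def read_bytearray(buf: bytes) -> bytes:
    """Inverse of write_bytearray: read the marker, then the length, then the data."""
    marker = buf[0]
    if marker == TINY:
        n, start = buf[1], 2
    elif marker == FULL:
        (n,) = struct.unpack_from(">i", buf, 1)
        start = 5
    else:
        raise ValueError("unknown marker: %d" % marker)
    return buf[start:start + n]
```

In this model a 3-byte value costs 5 bytes on the wire instead of the 8 that a marker plus a fixed 4-byte length would need; the saving is 3 bytes per short field, which adds up when records contain many small columns.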




[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886825#action_12886825
 ] 

Daniel Dai commented on PIG-1434:
-

We also need to ensure that C has only one part file to do the check (use 
limit to achieve it).
{code}
C = limit C 2;
C = foreach C generate CheckSingular(*);
Z = join X by 1, C by 1 using 'replicated';
Y = foreach Z generate X::$1/(long) C.count, X::$2-(long) C.max;
{code}

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale4.patch

Fixed @@@ related stuff...
Parsing of schema from decorators is postponed until the constructor.
Fixed some test related changes. 

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-394) Syntax for ?: requires parens in FOREACH

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-394:
---

Fix Version/s: 0.9.0
  Description: 
This fails

clean = FOREACH log {
ev = eventType eq '/rate/video'?'none':eventType;
GENERATE ev as event, 1 as cnt;
}


but this works

clean = FOREACH log {
ev = (eventType eq '/rate/video'?'none':eventType);
GENERATE ev as event, 1 as cnt;
}

The requirement for parens is bogus.  Also, this fails with very misleading 
messages:

clean = FOREACH log {
ev = (eventType eq '/rate/video')?'none':eventType;
GENERATE ev as event, 1 as cnt;
}

I think that the parser needs to be completely revamped to avoid this sort of 
strangeness.


 Syntax for ?: requires parens in FOREACH
 

 Key: PIG-394
 URL: https://issues.apache.org/jira/browse/PIG-394
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.1.0
Reporter: Ted Dunning
 Fix For: 0.9.0


 This fails
 clean = FOREACH log {
 ev = eventType eq '/rate/video'?'none':eventType;
 GENERATE ev as event, 1 as cnt;
 }
 but this works
 clean = FOREACH log {
 ev = (eventType eq '/rate/video'?'none':eventType);
 GENERATE ev as event, 1 as cnt;
 }
 The requirement for parens is bogus.  Also, this fails with very misleading 
 messages:
 clean = FOREACH log {
 ev = (eventType eq '/rate/video')?'none':eventType;
 GENERATE ev as event, 1 as cnt;
 }
 I think that the parser needs to be completely revamped to avoid this sort of 
 strangeness.




[jira] Resolved: (PIG-406) The implementation of IndexedTuple might break in cases where a custom TupleFactory is being used

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-406.


Resolution: Fixed

We no longer use IndexedTuple

 The implementation of IndexedTuple might break in cases where a custom 
 TupleFactory is being used
 -

 Key: PIG-406
 URL: https://issues.apache.org/jira/browse/PIG-406
 Project: Pig
  Issue Type: Bug
  Components: data
Affects Versions: 0.2.0
Reporter: Shubham Chopra

 IndexedTuple extends DefaultTuple. This implementation might break when we 
 are trying to use a custom TupleFactory. Any special data in the custom 
 tuples might get lost when we convert them to an IndexedTuple. The 
 constructor does a super(t.getAll()), so other non-datum attributes will be 
 lost after this conversion. IndexedTuple should instead hold a Tuple as an 
 attribute, implement the Tuple interface, and delegate all the methods. The 
 implementation should be by composition.
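
The composition approach proposed above can be sketched as follows. This is a Python model, not Pig's Java classes: `TaggedTuple` is a hypothetical custom tuple type, and the method names only loosely mirror a tuple interface.

```python
class TaggedTuple:
    """Hypothetical custom tuple that carries extra, non-field data."""

    def __init__(self, fields, source):
        self._fields = list(fields)
        self.source = source  # special data that a copy of the fields would drop

    def get(self, i):
        return self._fields[i]

    def size(self):
        return len(self._fields)

    def get_all(self):
        return list(self._fields)


class IndexedTuple:
    """Wraps any tuple-like object and delegates to it instead of copying fields."""

    def __init__(self, wrapped, index):
        self.wrapped = wrapped  # the original tuple is kept intact
        self.index = index

    def get(self, i):
        return self.wrapped.get(i)

    def size(self):
        return self.wrapped.size()

    def get_all(self):
        return self.wrapped.get_all()
```

Because the wrapper keeps a reference to the original object, `IndexedTuple(TaggedTuple(["a", 1], "part-0"), 3).wrapped.source` is still `"part-0"`, whereas a constructor that copies only the field list would lose it.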




[jira] Resolved: (PIG-408) Display hadoop task name and link on the client side

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-408.


Resolution: Fixed

Pig displays jobid as it executes the jobs

 Display hadoop task name and link on the client side
 

 Key: PIG-408
 URL: https://issues.apache.org/jira/browse/PIG-408
 Project: Pig
  Issue Type: Improvement
Reporter: Yiping Han
Priority: Minor

 For pig jobs that contain a lot of mapred tasks, it would be easier to debug 
 if pig could display the task name (and maybe a link) on the client console. 




[jira] Resolved: (PIG-405) describe does not display map types in the schema.

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-405.


Fix Version/s: 0.7.0
   Resolution: Fixed

 describe does not display map types in the schema.
 

 Key: PIG-405
 URL: https://issues.apache.org/jira/browse/PIG-405
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: Linux 2.6.9-55.ELsmp #1 SMP Fri Apr 20 16:36:54 EDT 2007 
 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Araceli Henley
Priority: Minor
 Fix For: 0.7.0


 In a load statement, if the type in the as clause is a map, the describe 
 statement does not show a type of map in the schema.  
 A= load ':INPATH:/singlefile/studentcomplex10k' using PigStorage() as 
 (s:map[],m,l);
 describe A;
 A: {s: ,m:bytearray,l:bytearray}
 But it should be:
 A: {s: map,m:bytearray,l:bytearray}




[jira] Updated: (PIG-408) Display hadoop task name and link on the client side

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-408:
---

Fix Version/s: 0.7.0

 Display hadoop task name and link on the client side
 

 Key: PIG-408
 URL: https://issues.apache.org/jira/browse/PIG-408
 Project: Pig
  Issue Type: Improvement
Reporter: Yiping Han
Priority: Minor
 Fix For: 0.7.0


 For pig jobs that contain a lot of mapred tasks, it would be easier to debug 
 if pig could display the task name (and maybe a link) on the client console. 




[jira] Updated: (PIG-435) wrong columns produced if incomplete definition provided during load

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-435:
---

Fix Version/s: 0.9.0

 wrong columns produced if incomplete definition provided during load
 

 Key: PIG-435
 URL: https://issues.apache.org/jira/browse/PIG-435
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Pradeep Kamath
Priority: Minor
 Fix For: 0.9.0


 Script:
 A = load 'studenttab10k' as (name); -- note that data has more than 1 column
 B = load 'votertab10k' as (name, age, reg, contrib);
 D = COGROUP A by name, B by name;  
 E = foreach D generate flatten(A), flatten(B); 
 F = foreach E generate registration, contr;
 dump F;
 The dump produces the wrong columns. This is because even though we declared 
 only one column, we actually load all columns of A. So any place where we 
 explicitly or implicitly use A.*, as is the case in flatten, we would produce 
 the wrong results.
 The long-term solution is actually to push projections into the load. In the 
 shorter term, the proposal is to notice if the script uses A.* and stick a 
 project after the load. Note that we don't need to do that if types are 
 declared, because there will already be a casting foreach there.
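
The effect described above can be modelled in a few lines. This is a Python illustration only, with hypothetical helper names and made-up sample rows; it does not reflect how Pig's operators are implemented.

```python
def project(rows, declared_columns):
    """Keep only the first declared_columns fields of each row."""
    return [row[:declared_columns] for row in rows]


def flatten_join(a_rows, b_rows):
    """Concatenate every pair of rows whose first field (the name) matches."""
    return [a + b for a in a_rows for b in b_rows if a[0] == b[0]]


# A is declared with one column (name) but the data holds three.
a_rows = [("alice", 20, 3.5)]
b_rows = [("alice", 30, "democrat", 10.0)]

without_project = flatten_join(a_rows, b_rows)
with_project = flatten_join(project(a_rows, 1), b_rows)
```

With one declared column in A, position 3 of the flattened row is expected to be B's third field. Without the projection that position holds B's name instead, because A's two undeclared fields shift every later column; with the projection it holds `"democrat"` as expected.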




[jira] Updated: (PIG-498) Pig does not error out while trying to use a input file to which the user does not have access permissions

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-498:
---

Fix Version/s: 0.8.0

 Pig does not error out while trying to use a input file to which the user 
 does not have access permissions
 --

 Key: PIG-498
 URL: https://issues.apache.org/jira/browse/PIG-498
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
 Fix For: 0.8.0


 Session illustrating the issue.
 {code}
 bash-3.00$ hadoop fs -ls /data/statistics.txt
 ls: org.apache.hadoop.fs.permission.AccessControlException: Permission 
 denied: user=username, access=READ_EXECUTE, inode=inodepermissions-
 bash-3.00$ pig -latest 
 2008-10-16 23:31:25,134 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to HOD...
 ...
 2008-10-16 23:34:45,810 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: local
 grunt> a = load '/data/statistics.txt';  
 grunt> dump a;
 2008-10-16 23:39:05,624 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 100% complete
 2008-10-16 23:39:05,624 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Success!
 grunt> 
 {code}




[jira] Updated: (PIG-523) help in grunt should show all commands

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-523:
---

Fix Version/s: 0.8.0

 help in grunt should show all commands
 --

 Key: PIG-523
 URL: https://issues.apache.org/jira/browse/PIG-523
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 Currently, it only shows commands directly supported by the grunt parser and 
 not commands supported by the pig parser.




[jira] Updated: (PIG-466) PERFORMANCE: dropping the columns as soon as possible

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-466:
---

Fix Version/s: 0.8.0

 PERFORMANCE: dropping the columns as soon as possible
 -

 Key: PIG-466
 URL: https://issues.apache.org/jira/browse/PIG-466
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.8.0


 Currently, each operator carries all the data until foreach is encountered. 
 This can cause significant performance degradation.




[jira] Updated: (PIG-525) UDF that requires a cast for its input arguments causes an exception

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-525:
---

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.6.0
   Resolution: Fixed

I checked that the code was committed - we just forgot to close it

 UDF that requires a cast for its input arguments causes an exception
 

 Key: PIG-525
 URL: https://issues.apache.org/jira/browse/PIG-525
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-525.patch


 Script:
 register /homes/olgan/piggybank.jar   
 A = load 'studenttab10k' as (name, age, gpa); 
 B = filter A by name is not null; 
 C = foreach A generate org.apache.pig.piggybank.evaluation.string.UPPER(name);
 D = limit C 10;   
 dump D; 
 Stack:
  (reduce) task_200809241441_20466_r_00 java.io.IOException: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string.
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:266)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:224)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:136)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:318)
 at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)




[jira] Updated: (PIG-548) ParseException involving as keyword

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-548:
---

Fix Version/s: 0.9.0

 ParseException involving as  keyword
 --

 Key: PIG-548
 URL: https://issues.apache.org/jira/browse/PIG-548
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Priority: Minor
 Fix For: 0.9.0

 Attachments: assyntax.pig


 The enclosed Pig script throws the following error:
 =
 org.apache.pig.tools.pigscript.parser.ParseException: Encountered as at 
 line 13, column 11.
 Was expecting one of:
 EOF 
 cat ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:688)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:515)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:356)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 =
 But the error seems to disappear if a few lines are moved around the 
 foreach and as keywords. 




[jira] Resolved: (PIG-547) minor clarification of log messages in Hadoop backend

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-547.


Resolution: Fixed

The code is no longer in use.

 minor clarification of log messages in Hadoop backend
 -

 Key: PIG-547
 URL: https://issues.apache.org/jira/browse/PIG-547
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
 Environment: HOD
Reporter: Craig Macdonald
 Attachments: improvedHodErrors.patch


 Minor patch to clarify two logging messages:
 1. When using HOD, HExecutionEngine misleads by saying it is connecting to 
 the local filesystem.
 2. The HDataStorage error message is improved by saying which file system 
 it failed to connect to.




[jira] Updated: (PIG-542) pig gets confused about schema, when joining a table that has a known schema with one that doesn't

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-542:
---

Fix Version/s: 0.9.0

 pig gets confused about schema, when joining a table that has a known schema 
 with one that doesn't
 --

 Key: PIG-542
 URL: https://issues.apache.org/jira/browse/PIG-542
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
 Environment: types branch, running in local mode
Reporter: Christopher Olston
 Fix For: 0.9.0


 query:
 A = load '/data/A' using myLoadFunc('...');
 A1 = foreach (group A by ($8)) generate group, COUNT($1);
 B = load '/data/B';
 J = join A1 by $0, B by $0;
 J1 = foreach J generate $0, $1, $3; -- crashes on attempt to parse this line
 problem:
 It knows the schema of A1 but not of B -- but it seems to think B has only
 one field.
 error message (on parsing J1=... line):
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Out of bound access. Trying to access non-existent column: 3. Schema {ID10::group: bytearray,long,bytearray} has 3 column(s).
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.DollarVar(QueryParser.java:5764)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5713)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4018)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3915)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3869)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3778)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3704)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3670)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3596)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3519)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3463)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2939)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2342)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:979)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:755)
 at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:550)
 at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
 at org.apache.pig.PigServer.parseQuery(PigServer.java:295)
 ... 16 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-555) ORDER by requires write access to /tmp

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-555.


Resolution: Duplicate

 ORDER by requires write access to /tmp
 --

 Key: PIG-555
 URL: https://issues.apache.org/jira/browse/PIG-555
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.2.0
Reporter: Ian Holsman
Priority: Minor

 When doing an ORDER BY in Pig, it relies on you having write access to /tmp/ 
 in HDFS (and, I think, on /tmp/ being present in the first place).
 This comes down to FileLocalizer creating the default 
 relativeRoot = pigContext.getDfs().asContainer("/tmp/temp" + r.nextInt())
 I'm not sure if this is due to someone on our side deleting the /tmp dir 
 accidentally, so apologies for the spam if it is.
 2008-12-03 22:36:41,716 [main] ERROR org.apache.pig.tools.grunt.Grunt - You 
 don't have permission to perform the operation. Error from the server: Unable 
 to store for alias: 7 
 [org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
 user=jholsma, access=WRITE, inode=:hadoop:supergroup:rwxr-xr-x]
 java.io.IOException: Unable to store for alias: 7 
 [org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
 user=jholsma, access=WRITE, inode=:hadoop:supergroup:rwxr-xr-x]
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:255)
 at 
 org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:647)
 at org.apache.pig.PigServer.execute(PigServer.java:638)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:278)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:242)
 Caused by: org.apache.pig.backend.executionengine.ExecException: 
 org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
 user=jholsma, access=WRITE, inode=:hadoop:supergroup:rwxr-xr-x
 ... 9 more
 Caused by: org.apache.pig.impl.plan.VisitorException: 
 org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
 user=jholsma, access=WRITE, inode=:hadoop:supergroup:rwxr-xr-x
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitSort(MRCompiler.java:858)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POSort.visit(POSort.java:304)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:266)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:251)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:193)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:134)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:63)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:245)
 ... 8 more
 Caused by: org.apache.hadoop.fs.permission.AccessControlException: 
 org.apache.hadoop.fs.permission.AccessControlException: Permission denied: 
 user=jholsma, access=WRITE, inode=:hadoop:supergroup:rwxr-xr-x
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
 at 
 org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:52)
 at org.apache.hadoop.dfs.DFSClient.mkdirs(DFSClient.java:704)
 at 
 org.apache.hadoop.dfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:236)
 at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1116)
 at 
 org.apache.pig.backend.hadoop.datastorage.HDirectory.create(HDirectory.java:64)
 at 
 org.apache.pig.backend.hadoop.datastorage.HPath.create(HPath.java:155)
 at 
 org.apache.pig.impl.io.FileLocalizer.getTemporaryPath(FileLocalizer.java:391)
 at 
 

[jira] Updated: (PIG-579) Adding newlines to format foreach statement with constants causes parse errors

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-579:
---

Fix Version/s: 0.9.0

 Adding newlines to format foreach statement with constants causes parse errors
 --

 Key: PIG-579
 URL: https://issues.apache.org/jira/browse/PIG-579
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: David Ciemiewicz
 Fix For: 0.9.0


 The following code example fails with parse errors on step D:
 {code}
 A = LOAD 'student_data' AS (name: chararray, age: int, gpa: float);
 B = LOAD 'voter_data' AS (name: chararray, age: int, registration: chararray, 
 contributions: float);
 C = COGROUP A BY name, B BY name;
 D = FOREACH C GENERATE
 group,
 flatten((not IsEmpty(A) ? A : (bag{tuple(chararray, int, 
 float)}){(null, null, null)})),
 flatten((not IsEmpty(B) ? B : (bag{tuple(chararray, int, chararray, 
 float)}){(null,null,null, null)}));
 dump D;
 {code}
 I get the parse error:
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: 
 Encountered not IsEmpty ( A ) ? A : ( bag { tuple ( chararray , int , float 
 ) } ; at line 9, column 18.
 Was expecting one of:
 ( ...
 - ...
 tuple ...
 bag ...
 map ...
 int ...
 long ...
 ...
 However, the statement parses if I simply remove the newlines from statement D and make it:
 {code}
 D = FOREACH C GENERATE group, flatten((not IsEmpty(A) ? A : 
 (bag{tuple(chararray, int, float)}){(null, null, null)})), flatten((not 
 IsEmpty(B) ? B : (bag{tuple(chararray, int, chararray, 
 float)}){(null,null,null, null)}));
 {code}




[jira] Updated: (PIG-583) Bag constants used in non foreach statements cause lexical errors

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-583:
---

Fix Version/s: 0.9.0

 Bag constants used in non foreach statements cause lexical errors
 -

 Key: PIG-583
 URL: https://issues.apache.org/jira/browse/PIG-583
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.9.0


 Use of bag constants in non-foreach statements causes lexical errors in Pig. 
 The root cause is the inability of grunt to distinguish between a nested block 
 and a bag constant in non-foreach statements.
 {code}
 grunt> a = load 'input'; 
 grunt> b = filter a by ($0 eq {(1)});
 2008-12-29 14:12:15,306 [main] ERROR org.apache.pig.tools.grunt.GruntParser - 
 java.io.IOException: Encountered  FILTEROP eq  at line 1, column 21.
 Was expecting one of:
 * ...
 ) ...
 . ...
 + ...
 - ...
 / ...
 % ...
 # ...
 ...
 org.apache.pig.tools.pigscript.parser.TokenMgrError: Lexical error at line 2, 
 column 29.  Encountered: ) (41), after : 
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParserTokenManager.getNextToken(PigScriptParserTokenManager.java:2608)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.jj_ntk(PigScriptParser.java:658)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:84)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseContOnError(GruntParser.java:94)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:58)
 at org.apache.pig.Main.main(Main.java:282)
 {code}




[jira] Updated: (PIG-596) Anonymous tuples in bags create ParseExceptions

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-596:
---

Fix Version/s: 0.9.0

 Anonymous tuples in bags create ParseExceptions
 ---

 Key: PIG-596
 URL: https://issues.apache.org/jira/browse/PIG-596
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: David Ciemiewicz
 Fix For: 0.9.0


 {code}
 One = load 'one.txt' using PigStorage() as ( one: int );
 LabelledTupleInBag = foreach One generate { ( 1, 2 ) } as mybag { tuplelabel: 
 tuple ( a, b ) };
 AnonymousTupleInBag = foreach One generate { ( 2, 3 ) } as mybag { tuple ( a, 
 b ) }; -- Anonymous tuple creates bug
 Tuples = union LabelledTupleInBag, AnonymousTupleInBag;
 dump Tuples;
 {code}
 java.io.IOException: Encountered { tuple at line 6, column 66.
 Was expecting one of:
 parallel ...
 ; ...
 , ...
 : ...
 ( ...
 { IDENTIFIER ...
 { } ...
 [ ...
 
 at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:263)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: 
 Encountered { tuple at line 6, column 66.
 Why can't there be an anonymous tuple at the top level of a bag?




[jira] Updated: (PIG-555) ORDER by requires write access to /tmp

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-555:
---


This will be addressed in Pig 0.8.0 by allowing you to configure your temp location.
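
As a rough illustration of what a configurable temp location could look like, the sketch below resolves the scratch directory from a property instead of hard-coding /tmp. The property name `pig.temp.dir`, the class `TempRootSketch`, and the method `resolveTempRoot` are assumptions made for this example; they are not taken from the fix itself.

```java
import java.util.Properties;
import java.util.Random;

// Hypothetical sketch: choose the temp root from configuration, falling back to /tmp.
class TempRootSketch {
    // Assumed property name, used here for illustration only.
    static final String TEMP_DIR_KEY = "pig.temp.dir";

    static String resolveTempRoot(Properties props, Random r) {
        String base = props.getProperty(TEMP_DIR_KEY, "/tmp");
        // Drop trailing slashes so the joined path has exactly one separator.
        while (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // Mirrors the "/tmp/temp" + r.nextInt() naming quoted in the report.
        return base + "/temp" + r.nextInt();
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty(TEMP_DIR_KEY, "/user/jholsma/scratch");
        System.out.println(resolveTempRoot(props, new Random()));
    }
}
```

With no property set, the sketch keeps the old /tmp behaviour, so existing installations would not change unless they opt in.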

 ORDER by requires write access to /tmp
 --

 Key: PIG-555
 URL: https://issues.apache.org/jira/browse/PIG-555
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.2.0
Reporter: Ian Holsman
Priority: Minor


[jira] Resolved: (PIG-608) Compile or validate the whole script before execution

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-608.


Resolution: Fixed

 Compile or validate the whole script before execution
 -

 Key: PIG-608
 URL: https://issues.apache.org/jira/browse/PIG-608
 Project: Pig
  Issue Type: Improvement
  Components: grunt
Reporter: Yiping Han

 This is a very common scenario: 
 We are running a big Pig job that contains several Hadoop jobs. It has been 
 running for a long time and the first Hadoop job succeeds, then suddenly Pig 
 reports that it found a syntax error in the script after the first Hadoop job, 
 and we have to repeat from the beginning.
 It would be nice if Pig could compile to the end of the script and find all the 
 syntax errors, type mismatches, etc., before it really starts execution.




[jira] Updated: (PIG-618) Bad error message when period rather than comma appears as separator in UDF parameter list

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-618:
---

Fix Version/s: 0.9.0

 Bad error message when period rather than comma appears as separator in UDF 
 parameter list 
 ---

 Key: PIG-618
 URL: https://issues.apache.org/jira/browse/PIG-618
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat
 Fix For: 0.9.0


 This Pig script generates the following compile-time error because it contains a 
 period instead of a comma between 0.8 and 0.9 in the MYUDF parameter list. The 
 "Invalid alias: MYUDF" message should be changed to something more meaningful 
 that helps the user trace the problem.
 {code}
 register 'MYUDF.jar';
 A = load 'mydata.txt' using PigStorage() as (
 col1:   int,
 col2:   chararray,
 col3:   long,
 col4:   int
 );
 B =  group A by (
 col1,
 col2
 );
 C = foreach B generate
 group,
 MYUDF(A.col3, 0.0, 0.8. 0.9) as stat: (min, max);
 describe C;
 {code}
 
 java.io.IOException: Invalid alias: MYUDF in {group: (col1: int,col2: chararray),A: {col1: int,col2: chararray,col3: long,col4: int}}
 at org.apache.pig.PigServer.parseQuery(PigServer.java:301)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:266)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:439)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid alias: MYUDF in {group: (col1: int,col2: chararray),A: {col1: int,col2: chararray,col3: long,col4: int}}
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6005)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5863)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4049)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3946)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3900)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3809)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3735)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3701)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3627)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3550)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3494)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2969)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2384)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1019)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:795)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:590)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
 at org.apache.pig.PigServer.parseQuery(PigServer.java:298)
 




[jira] Updated: (PIG-565) Several builting functions no longer support bytearray

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-565:
---

Fix Version/s: 0.8.0

 Several builting functions no longer support bytearray
 --

 Key: PIG-565
 URL: https://issues.apache.org/jira/browse/PIG-565
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0


 ARITY
 DIFF
 TOKENIZE
 All we need to do is to add lookup tables.




[jira] Updated: (PIG-621) Casts swallow exceptions when there are issues with conversion of bytes to Pig types

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-621:
---

Fix Version/s: 0.8.0

 Casts swallow exceptions when there are issues with conversion of bytes to 
 Pig types
 

 Key: PIG-621
 URL: https://issues.apache.org/jira/browse/PIG-621
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
 Fix For: 0.8.0


 In the current implementation of casts, exceptions thrown while converting 
 bytes to Pig types are swallowed. Pig needs to either return NULL or rethrow 
 the exception.
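
The two options named above can be sketched as follows. This is a minimal illustration and not Pig's cast code: the class `CastPolicySketch` and both method names are hypothetical, and the input is assumed to be UTF-8 text holding a decimal integer.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the two cast policies; not Pig's actual cast implementation.
class CastPolicySketch {
    // Option 1: a failed conversion yields null, and the failure is reported instead of dropped.
    static Integer bytesToIntegerOrNull(byte[] bytes) {
        try {
            return Integer.valueOf(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            System.err.println("Unable to interpret value as int: " + e.getMessage());
            return null;
        }
    }

    // Option 2: a failed conversion is rethrown with context so the caller can decide.
    static Integer bytesToIntegerOrThrow(byte[] bytes) {
        try {
            return Integer.valueOf(new String(bytes, StandardCharsets.UTF_8).trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Cannot convert bytes to int", e);
        }
    }
}
```

Either policy keeps the original `NumberFormatException` visible, in the log for option 1 or as the cause for option 2, which is the point of the report.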




[jira] Resolved: (PIG-624) Pig help usage is missing some options

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-624.


Fix Version/s: 0.7.0
   (was: 0.8.0)
   Resolution: Fixed

The options listed on this bug are already in pig 0.7

 Pig help usage is missing some options
 --

 Key: PIG-624
 URL: https://issues.apache.org/jira/browse/PIG-624
 Project: Pig
  Issue Type: Bug
Reporter: Tom White
 Fix For: 0.7.0


 Running pig -help doesn't show -param, -param_file, or -dryrun options.




[jira] Updated: (PIG-625) Add global -explain, -illustrate, -describe mode to PIG

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-625:
---

Fix Version/s: 0.9.0

 Add global -explain, -illustrate, -describe mode to PIG
 ---

 Key: PIG-625
 URL: https://issues.apache.org/jira/browse/PIG-625
 Project: Pig
  Issue Type: New Feature
Reporter: Yiping Han
 Fix For: 0.9.0


 Currently PIG has the commands EXPLAIN, ILLUSTRATE and DESCRIBE, but users need 
 to manually add or remove these lines in the script when they want to debug or 
 see details of the job. I think there should be a way to enable these 
 globally. 
 What I suggest is to add -explain, -illustrate and -describe options to the PIG 
 command line. When any of these is present, all the DUMP and STORE 
 commands in the script are converted into EXPLAIN, ILLUSTRATE or DESCRIBE 
 correspondingly. This makes debugging easier.
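
The proposed conversion could be prototyped roughly as below. This is a deliberately naive, text-level sketch: the class `DebugModeRewriteSketch` and the method `rewrite` are hypothetical, statements are assumed to start at the beginning of a line, and a semicolon inside a quoted path would confuse the pattern. A real implementation would more likely work in the parser than on raw text.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: turn DUMP/STORE statements into a chosen debug command.
class DebugModeRewriteSketch {
    // Matches "DUMP alias;" or "STORE alias INTO ...;" at the start of a line,
    // capturing the alias. Case-insensitive, multi-line.
    private static final Pattern OUTPUT_STMT =
        Pattern.compile("(?im)^[ \\t]*(?:dump|store)\\s+(\\w+)[^;]*;");

    // mode is expected to be "explain", "illustrate" or "describe".
    static String rewrite(String script, String mode) {
        return OUTPUT_STMT.matcher(script).replaceAll(mode.toUpperCase() + " $1;");
    }

    public static void main(String[] args) {
        String script = "A = load 'in';\nDUMP A;\nstore A into 'out' using PigStorage();";
        System.out.println(rewrite(script, "explain"));
    }
}
```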




[jira] Updated: (PIG-624) Pig help usage is missing some options

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-624:
---

Fix Version/s: 0.8.0

 Pig help usage is missing some options
 --

 Key: PIG-624
 URL: https://issues.apache.org/jira/browse/PIG-624
 Project: Pig
  Issue Type: Bug
Reporter: Tom White
 Fix For: 0.8.0


 Running pig -help doesn't show -param, -param_file, or -dryrun options.




[jira] Updated: (PIG-638) error handling - enforce error codes

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-638:
---

Fix Version/s: 0.9.0

 error handling - enforce error codes
 

 Key: PIG-638
 URL: https://issues.apache.org/jira/browse/PIG-638
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan
 Fix For: 0.9.0


 We should not allow exceptions that don't set an error code, since such 
 exceptions are not helpful for users.
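
One way to enforce this at compile time is to offer no constructor that omits the code. The sketch below assumes a hypothetical class `CodedException` and assumes that valid codes are positive integers; neither is taken from Pig's exception hierarchy.

```java
// Hypothetical sketch: an exception type that cannot be built without an error code.
class CodedException extends Exception {
    private final int errorCode;

    // The only constructor requires a code, so callers cannot leave it out.
    CodedException(int errorCode, String message, Throwable cause) {
        super(message, cause);
        if (errorCode <= 0) {
            throw new IllegalArgumentException("A positive error code is required");
        }
        this.errorCode = errorCode;
    }

    int getErrorCode() {
        return errorCode;
    }

    // Prefix the message with the code so it always reaches the user.
    @Override
    public String getMessage() {
        return "ERROR " + errorCode + ": " + super.getMessage();
    }
}
```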




[jira] Updated: (PIG-671) typechecker does not throw an error when multiple arguments are passed to COUNT

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-671:
---

Fix Version/s: 0.9.0

 typechecker does not throw an error when multiple arguments are passed to 
 COUNT
 ---

 Key: PIG-671
 URL: https://issues.apache.org/jira/browse/PIG-671
 Project: Pig
  Issue Type: Bug
 Environment: i686 i386 GNU/Linux
Reporter: Araceli Henley
Priority: Trivial
 Fix For: 0.9.0


 In this example, the aggregate function COUNT is passed multiple arguments 
 and does not throw an error.
 TEST: Aggregate_184
  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
 B =GROUP A ALL; 
 X =FOREACH B GENERATE COUNT ( A.$0, A.$0 ); 
 STORE X INTO 
 '/user/pig/tests/results/araceli.1234381533/AggregateFunc_184.out' USING 
 PigStorage();
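
A check of the kind this report asks for could look roughly like the sketch below. The class `ArityCheckSketch`, the method `checkArity`, and the error wording are hypothetical; the real type checker works on the logical plan, not on lists of strings.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: reject a function call whose argument count is wrong.
class ArityCheckSketch {
    static void checkArity(String function, int expected, List<String> args) {
        if (args.size() != expected) {
            throw new IllegalArgumentException(
                function + " expects " + expected + " argument(s) but received " + args.size());
        }
    }

    public static void main(String[] args) {
        // A single bag argument is accepted.
        checkArity("COUNT", 1, Arrays.asList("A.$0"));
        try {
            // The call from the test case above passes two arguments and is rejected.
            checkArity("COUNT", 1, Arrays.asList("A.$0", "A.$0"));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```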




[jira] Updated: (PIG-674) Improve errors in Pig parser

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-674:
---

Fix Version/s: 0.9.0

 Improve errors in Pig parser
 

 Key: PIG-674
 URL: https://issues.apache.org/jira/browse/PIG-674
 Project: Pig
  Issue Type: Bug
Reporter: Araceli Henley
Priority: Minor
 Fix For: 0.9.0


 These tests are for Aggregate Functions
 
Recommended msg - Should indicate that this is an invalid cast.
ERROR - MAX with int with invalid cast
TEST:  106,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, MAX( (invalid) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1000:.*Invalid alias: MAX,
 
Recommended msg -
ERROR: invalid use of foreach with multiple functions and positional 
 parameters
TEST:  107,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH A GENERATE  SUM( A.$0), AVG( A.$0), COUNT( A.$0), MAX(A.$0), 
 MIN( A.$0); STORE X INTO ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: FIX: improve msg,
 
Recommended msg - ERROR 1052: Cannot cast bag with schema.*: bag
ERROR: invalid use of MIN with int with valid cast
TEST:  108,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, MIN( (double) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1052: Cannot cast.*,
 
Recommended msg -
ERROR - AVG needs bag
TEST:  113,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) ); B = GROUP 
 A ALL; X =FOREACH B GENERATE  AVG( A.Fint); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1052: Cannot cast bag with schema.*bag,
 
Recommended msg - This should indicate there was an invalid cast.
ERROR - AVG with int with invalid cast
TEST:  115,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, AVG( (invalid) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1000:.*Invalid alias: AVG,
 
Recommended msg - This should indicate that COUNT expects a bag for an 
 argument
ERROR - COUNT needs bag
TEST:  118,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) ); B = GROUP 
 A ALL; X 

[jira] Updated: (PIG-679) error message suppressed due to class cast exception

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-679:
---

Fix Version/s: 0.9.0

 error message suppressed due to class cast exception
 

 Key: PIG-679
 URL: https://issues.apache.org/jira/browse/PIG-679
 Project: Pig
  Issue Type: Bug
Reporter: Christopher Olston
 Fix For: 0.9.0


 weekinclude 14:30:44 ~/workspace/Pig $ cat pig_1234564011522.log 
 ERROR 2999: Unexpected internal error. 
 org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to 
 java.lang.Error
 java.lang.ClassCastException: 
 org.apache.pig.impl.logicalLayer.FrontendException cannot be cast to 
 java.lang.Error
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1096)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:802)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:595)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
 at org.apache.pig.PigServer.parseQuery(PigServer.java:303)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:269)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:441)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:249)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:72)
 at org.apache.pig.Main.main(Main.java:296)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-675) Improve backend error messages

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-675:
---

Fix Version/s: 0.9.0

 Improve backend error messages
 --

 Key: PIG-675
 URL: https://issues.apache.org/jira/browse/PIG-675
 Project: Pig
  Issue Type: Bug
 Environment:  i686 i386 GNU/Linux
 java version 1.6.0_02
 Java(TM) SE Runtime Environment (build 1.6.0_02-b05)
 Java HotSpot(TM) Client VM (build 1.6.0_02-b05, mixed mode, sharing)
Reporter: Araceli Henley
Priority: Minor
 Fix For: 0.9.0


 #
 LOAD AND STORE TEST
 #
 DESCRIPTION ERROR: empty parameter for file  in Load Statement
 TEST:  109,
 PIG SCRIPT:  q\A = load '' using PigStorage(); STORE A INTO ':OUTPATH:' 
 USING PigStorage();\,
 CURRENT ERROR MESSAGE: ERROR 2118: Unable to create input slice,
  RECOMMENDED ERROR MESSAGE: -  It should indicate that this is an invalid 
 input file for a load statement
 #
 AGGREGATE FUNC TESTS
 #
 DESCRIPTION ERROR - MAX with missing argument
 TEST:  138,
 PIG SCRIPT:  q\ A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE  MAX(); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
CURRENT ERROR MESSAGE: ERROR 2064: Unsupported root type in LOForEach: 
 LOUserFunc,
  RECOMMENDED ERROR MESSAGE: - invalid use of MAX function or invalid 
 argument for MAX function ...
 #
DESCRIPTION ERROR - MIN with missing argument
TEST:  149,
PIG SCRIPT:  q\ A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE  MIN(); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
CURRENT ERROR MESSAGE:  ERROR 2064: Unsupported root type in LOForEach: 
 LOUserFunc,
 RECOMMENDED ERROR MESSAGE: - MIN arguments cannot be empty
 #
DESCRIPTION  ERROR - SUM with missing argument
TEST:  161,
PIG SCRIPT:  q\ A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE  SUM(); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
CURRENT ERROR MESSAGE: ERROR 2064: Unsupported root type in LOForEach: 
 LOUserFunc,
 #
DESCRIPTION ERROR - COUNT with missing argument
TEST:  183,
PIG SCRIPT:  q\ A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE  COUNT(); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
CURRENT ERROR MESSAGE: ERROR 2064: Unsupported root type in LOForEach: 
 LOUserFunc,
RECOMMENDED ERROR MESSAGE: -  COUNT requires parenthesis.
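
The recommendations above amount to checking the argument list of an aggregate call before planning. A minimal sketch of such a check follows, in Python for brevity. The function name, the set of aggregates and the message wording (other than "arguments cannot be empty", which is taken from the report) are assumptions for illustration and are not Pig's actual front-end API.

```python
# Hypothetical front-end check; names here are illustrative, not Pig's API.
AGGREGATES = {"MAX", "MIN", "SUM", "COUNT", "AVG"}


def check_aggregate_call(func_name, args):
    """Return None if the call looks valid, else a user-facing message."""
    name = func_name.upper()
    if name not in AGGREGATES:
        return None  # not an aggregate this sketch knows about
    if len(args) == 0:
        # Wording follows the message recommended for MIN in the report.
        return f"{name} arguments cannot be empty"
    if len(args) > 1:
        return f"{name} expects exactly one bag argument, got {len(args)}"
    return None


if __name__ == "__main__":
    print(check_aggregate_call("MAX", []))
    print(check_aggregate_call("COUNT", ["A"]))
```

A check like this would report the problem in terms of the function the user typed, rather than in terms of an internal operator such as LOUserFunc.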




[jira] Resolved: (PIG-680) Unknown parameter causes pig to exit without message

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-680.


Fix Version/s: 0.7.0
   Resolution: Fixed

Both cases work fine in Pig 0.7.0

 Unknown parameter causes pig to exit without message
 

 Key: PIG-680
 URL: https://issues.apache.org/jira/browse/PIG-680
 Project: Pig
  Issue Type: Bug
Reporter: Andreas Neumann
Priority: Minor
 Fix For: 0.7.0


 This pig script: 
 -- REGISTER $unknown;
 x = LOAD 'nn' AS x:chararray;
 DUMP x;
 I run like this:
  java -jar ../../../../Pig/pig.jar -x local -file nn.pig
 and pig does nothing, just exits without doing anything.
 But if I remove the $ from the first line of the script, then:
  java -jar ../../../../Pig/pig.jar -x local -file nn.pig
 2009-02-13 16:30:01,062 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-02-13 16:30:01,063 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (abc)
 (def)
 Similarly if I define the unknown parameter on the command line, it works 
 fine:
  java -jar ../../../../Pig/pig.jar -x local -file nn.pig -param unknown=1
 2009-02-13 16:32:23,652 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-02-13 16:32:23,653 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (abc)
 (def)
 It seems that undefined parameters cause pig to exit without doing 
 anything... even if they are within a comment...
 -Andreas.
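
The behaviour the reporter expected can be sketched as parameter substitution that leaves comments alone and reports undefined names instead of exiting silently. This is a Python illustration only: the function name, the regular expression and the simplified comment handling are assumptions, not Pig's real preprocessor.

```python
import re

# $name style parameters; $0, $1, ... are field references and are left alone.
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute_params(script, params):
    """Substitute $name outside comments; raise ValueError on undefined names."""
    out = []
    for lineno, line in enumerate(script.splitlines(), start=1):
        # Simplification: everything after '--' is treated as a comment.
        # This does not handle '--' inside quoted strings or /* */ comments.
        code, sep, comment = line.partition("--")

        def repl(match, lineno=lineno):
            name = match.group(1)
            if name not in params:
                raise ValueError(f"line {lineno}: undefined parameter ${name}")
            return str(params[name])

        out.append(PARAM.sub(repl, code) + sep + comment)
    return "\n".join(out)


if __name__ == "__main__":
    script = "-- REGISTER $unknown;\nx = LOAD 'nn' AS x:chararray;\nDUMP x;"
    print(substitute_params(script, {}))
```

With this approach the script from the report passes through unchanged, while an uncommented `REGISTER $unknown;` fails with a message naming the parameter and the line.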




[jira] Updated: (PIG-1367) [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is supported in 0.7

2010-07-09 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1367:
--

   Status: Resolved  (was: Patch Available)
 Assignee: Yan Zhou
Fix Version/s: 0.7.0
   (was: 0.8.0)
   Resolution: Fixed

Committed to the 0.7 branch.

 [zebra] Map-side Cogroup Test case is needed on 0.7 if the feature is 
 supported in 0.7
 --

 Key: PIG-1367
 URL: https://issues.apache.org/jira/browse/PIG-1367
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Yan Zhou
Assignee: Yan Zhou
 Fix For: 0.7.0

 Attachments: PIG-1367.patch


 PIG-1315 contains the Zebra support for this feature and for map-side group-by. It 
 also contains the test case for map-side COGROUP; the test case for map-side 
 GROUP-BY is in PIG-1357.
 PIG-1315 was committed to trunk as a whole, but it was committed to the 0.7 
 branch without the map-side group-by test case because PIG has yet to decide 
 whether the feature will be in the 0.7 release.
 This JIRA was created for tracking purposes in case PIG decides to support 
 map-side COGROUP in 0.7. If not, it should eventually be marked invalid.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886888#action_12886888
 ] 

Hadoop QA commented on PIG-928:
---

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12449105/RegisterPythonUDFFinale4.patch
  against trunk revision 962628.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/365/console

This message is automatically generated.

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Created: (PIG-1490) Make Pig storers work with remote HDFS in secure mode

2010-07-09 Thread Richard Ding (JIRA)
Make Pig storers work with remote HDFS in secure mode
-

 Key: PIG-1490
 URL: https://issues.apache.org/jira/browse/PIG-1490
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0, 0.7.0


PIG-1403 fixed the problem for Pig loaders. We need to do the same for Pig 
storers. 




[jira] Created: (PIG-1491) Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to POLocalRearrange

2010-07-09 Thread Scott Carey (JIRA)
Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to 
POLocalRearrange


 Key: PIG-1491
 URL: https://issues.apache.org/jira/browse/PIG-1491
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Scott Carey


I have a failure that occurs during planning while using DISTINCT in a nested 
FOREACH. 

Caused by: java.lang.ClassCastException: 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad
 cannot be cast to 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)





[jira] Commented: (PIG-1389) Implement Pig counter to track number of rows for each input files

2010-07-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886904#action_12886904
 ] 

Ashutosh Chauhan commented on PIG-1389:
---

+1 

Discussed 3) with Richard offline. Though in theory it would be better 
to detect the features on the fully compiled and optimized MR plan, that would 
be hard and may not be worth the complexity. So, in this first pass it 
is fine to mark those features while the MR plan's compilation is in progress. As a 
result, in a few corner cases the features marked for an MR Oper may not be correct. We 
will fix those cases as and when they come up.

 Implement Pig counter to track number of rows for each input files 
 ---

 Key: PIG-1389
 URL: https://issues.apache.org/jira/browse/PIG-1389
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1389.patch, PIG-1389.patch, PIG-1389_1.patch, 
 PIG-1389_2.patch


 A MR job generated by Pig not only can have multiple outputs (in the case of 
 multiquery) but also can have multiple inputs (in the case of join or 
 cogroup). In both cases, the existing Hadoop counters (e.g. 
 MAP_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS) can not be used to count the number 
 of records in the given input or output.  PIG-1299 addressed the case of 
 multiple outputs.  We need to add new counters for jobs with multiple inputs.
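
The idea of per-input counters can be sketched as follows. This is a Python illustration, not the patch itself: it assumes each record reaching the map phase is tagged with the name of the input it came from, and the function name is hypothetical.

```python
from collections import Counter


def count_records_per_input(tagged_records):
    """Count records per input from (input_name, record) pairs."""
    per_input = Counter()
    for input_name, _record in tagged_records:
        per_input[input_name] += 1
    return per_input


if __name__ == "__main__":
    # Two inputs feeding one job, as in a join or cogroup.
    records = [("A", (1,)), ("B", (1,)), ("A", (2,))]
    counts = count_records_per_input(records)
    print(dict(counts))          # per-input breakdown
    print(sum(counts.values()))  # the single total a job-wide counter would give
```

A single job-wide counter only yields the total; the per-input breakdown is what the issue asks for.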




[jira] Resolved: (PIG-530) Provide verbose output while query is running

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-530.


Resolution: Fixed

I think the current output provides sufficient info, including progress and 
job IDs. Please re-open if more is required.

 Provide verbose output while query is running
 -

 Key: PIG-530
 URL: https://issues.apache.org/jira/browse/PIG-530
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich

 Many users liked the verbose output currently available on trunk, in 
 particular the fact that MR job progress was available.
 While we don't want to have it enabled by default because it is very verbose 
 and also Hadoop specific, we could add a flag that would provide output similar to 
 the EXPLAIN command.




[jira] Commented: (PIG-1491) Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to POLocalRearrange

2010-07-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886906#action_12886906
 ] 

Ashutosh Chauhan commented on PIG-1491:
---

Scott,

It will be useful if you can also paste the Pig script which produced this 
exception.

 Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to 
 POLocalRearrange
 

 Key: PIG-1491
 URL: https://issues.apache.org/jira/browse/PIG-1491
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Scott Carey

 I have a failure that occurs during planning while using DISTINCT in a nested 
 FOREACH. 
 Caused by: java.lang.ClassCastException: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad
  cannot be cast to 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)




[jira] Commented: (PIG-1491) Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to POLocalRearrange

2010-07-09 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886908#action_12886908
 ] 

Scott Carey commented on PIG-1491:
--

more readable:

{code}
DATA_G = COGROUP A by (a, b, c, d) OUTER, B by (a, b, c, d) OUTER;
DATA = FOREACH DATA_G { 
  a_items = DISTINCT A.x;
  b_items = DISTINCT B.x;
  GENERATE 
  FLATTEN(group) as (a,b,c,d),
  SUM(A.m) as m, SUM(A.n) as n,
  COUNT(a_items) as a_item_count,
  (long)(SUM(B.u) + (double)0.5) as u,
  (long)(SUM(B.v) + (double)0.5) as v,
  COUNT(b_items) as b_item_count;
}
{code}

 Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to 
 POLocalRearrange
 

 Key: PIG-1491
 URL: https://issues.apache.org/jira/browse/PIG-1491
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Scott Carey

 I have a failure that occurs during planning while using DISTINCT in a nested 
 FOREACH. 
 Caused by: java.lang.ClassCastException: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad
  cannot be cast to 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)




[jira] Updated: (PIG-534) Illustrate can't handle Map's or NULLs

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-534:
---

Fix Version/s: 0.9.0

 Illustrate can't handle Map's or NULLs
 --

 Key: PIG-534
 URL: https://issues.apache.org/jira/browse/PIG-534
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Ian Holsman
 Fix For: 0.9.0

 Attachments: Illustrate.patch


 When I 'illustrate' a record that contains a map or has a NULL, it crashes 
 with an NPE.




[jira] Commented: (PIG-1491) Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to POLocalRearrange

2010-07-09 Thread Scott Carey (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886907#action_12886907
 ] 

Scott Carey commented on PIG-1491:
--

Full stack trace at the end of this comment.  

The pig script used is a couple hundred lines long.  But the individual chunk 
that I can change to trigger the issue is the following:

DATA_G = COGROUP A by (a, b, c, d) OUTER, B by (a, b, c, d) OUTER;
DATA = FOREACH DATA_G {
a_items = DISTINCT A.x;
b_items = DISTINCT B.x;
GENERATE
FLATTEN(group) as (a,b,c,d),
SUM(A.m) as m,
SUM(A.n) as n,
COUNT(a_items) as a_item_count,
(long)(SUM(B.u) + (double)0.5) as u,
(long)(SUM(B.v) + (double)0.5) as v,
COUNT(b_items) as b_item_count;
}

Removing both of the DISTINCT temporary aliases and not generating those counts 
works fine.  Adding either one of them causes it to fail.





ERROR 2043: Unexpected error during execution.

org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected 
error during execution.
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:318)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1016)
at org.apache.pig.PigServer.execute(PigServer.java:1009)
at org.apache.pig.PigServer.access$100(PigServer.java:114)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1261)
at org.apache.pig.PigServer.executeBatch(PigServer.java:326)
at 
org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:110)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:167)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:139)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
at org.apache.pig.Main.main(Main.java:335)
Caused by: java.lang.ClassCastException: 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad
 cannot be cast to 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
at 
org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:446)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:108)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:294)
... 10 more


 Failure planning nested FOREACH with DISTINCT, POLoad cannot be cast to 
 POLocalRearrange
 

 Key: PIG-1491
 URL: https://issues.apache.org/jira/browse/PIG-1491
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Scott Carey

 I have a failure that occurs during planning while using DISTINCT in a nested 
 FOREACH. 
 Caused by: java.lang.ClassCastException: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLoad
  cannot be cast to 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SecondaryKeyOptimizer.visitMROp(SecondaryKeyOptimizer.java:352)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:218)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:40)
 at 
 org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)




[jira] Commented: (PIG-1434) Allow casting relations to scalars

2010-07-09 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12886914#action_12886914
 ] 

Aniket Mokashi commented on PIG-1434:
-

Adding this support makes the Pig code complicated and hacky, because we assume that any 
alias that is not otherwise parsed (AliasFieldOrSpec) is a scalar and try to resolve it as 
a scalar at runtime.

To simplify, a square-bracketed syntax is a better idea, for example: 
{code}
Y = foreach Z generate X::$1/(long) [C].count, X::$2-(long) [C].max;
{code}
Otherwise, such queries (if typed by mistake) can result in non-intuitive 
errors for users.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF
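
Comment (1) above, that only a relation holding a single value may be cast, can be sketched like this. The code is a Python illustration; the function name and the list-of-tuples representation of a relation are assumptions, not the UDF mentioned in the description.

```python
def relation_to_scalar(relation):
    """Return the single value of a one-row, one-column relation, else raise."""
    rows = list(relation)
    if len(rows) != 1:
        raise ValueError(f"expected exactly one row, got {len(rows)}")
    if len(rows[0]) != 1:
        raise ValueError(f"expected exactly one column, got {len(rows[0])}")
    return rows[0][0]


if __name__ == "__main__":
    A = [(1, 2, 3), (4, 5, 6)]
    C = [(len(A),)]  # plays the role of: C = foreach B generate COUNT(A);
    print(relation_to_scalar(C))
```

Raising on anything other than exactly one value keeps a mistyped alias from silently producing a result.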




[jira] Updated: (PIG-698) Simple join fails on records not loaded with schema

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-698:
---

Fix Version/s: 0.9.0

 Simple join fails on records not loaded with schema
 ---

 Key: PIG-698
 URL: https://issues.apache.org/jira/browse/PIG-698
 Project: Pig
  Issue Type: Bug
  Components: impl
 Environment: Yahoo! clusters.
Reporter: Peter Arthur Ciccolo
 Fix For: 0.9.0


 Joins can fail with an out-of-bounds access to fields that are not referenced 
 in the script when records without schema (including all variable-length 
 records) are involved.
 Example by Ben Reed:
 i1:
 1   c   D   E
 1   a   B
 i2:
 0
 0   Q
 1   x   z
 1   a   b   c
 i1 = load 'i1';
 i2 = load 'i2';
 j = join i1 by $0, i2 by $0;
 dump j
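
For reference, the result one would expect from this join can be sketched without any schema, touching only field 0 of each record. This is a Python illustration of the intended semantics under that assumption, not Pig's join implementation; the function name is hypothetical.

```python
from collections import defaultdict


def join_on_first_field(left, right):
    """Inner join on field 0; each output row is left fields then right fields."""
    index = defaultdict(list)
    for rec in right:
        if rec:  # an empty record has no key to join on
            index[rec[0]].append(rec)
    joined = []
    for rec in left:
        if not rec:
            continue
        # Only rec[0] is read, so record length never matters here.
        for match in index.get(rec[0], []):
            joined.append(tuple(rec) + tuple(match))
    return joined


if __name__ == "__main__":
    i1 = [("1", "c", "D", "E"), ("1", "a", "B")]
    i2 = [("0",), ("0", "Q"), ("1", "x", "z"), ("1", "a", "b", "c")]
    for row in join_on_first_field(i1, i2):
        print(row)
```

Because only the key field is read, records of different lengths join without any out-of-bounds access.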




[jira] Updated: (PIG-1389) Implement Pig counter to track number of rows for each input files

2010-07-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1389:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 Implement Pig counter to track number of rows for each input files 
 ---

 Key: PIG-1389
 URL: https://issues.apache.org/jira/browse/PIG-1389
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1389.patch, PIG-1389.patch, PIG-1389_1.patch, 
 PIG-1389_2.patch


 A MR job generated by Pig not only can have multiple outputs (in the case of 
 multiquery) but also can have multiple inputs (in the case of join or 
 cogroup). In both cases, the existing Hadoop counters (e.g. 
 MAP_INPUT_RECORDS, REDUCE_OUTPUT_RECORDS) can not be used to count the number 
 of records in the given input or output.  PIG-1299 addressed the case of 
 multiple outputs.  We need to add new counters for jobs with multiple inputs.




[jira] Resolved: (PIG-809) number of input lines it processed, number of output lines it produced for PIG job

2010-07-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-809.
--

Resolution: Fixed

 number of input lines it processed, number of output lines it produced for 
 PIG job
 --

 Key: PIG-809
 URL: https://issues.apache.org/jira/browse/PIG-809
 Project: Pig
  Issue Type: Improvement
  Components: impl
 Environment: Linux
Reporter: Supreeth
Assignee: Richard Ding
 Fix For: 0.8.0


 Excerpt from the mail conversation.
 It will be a great addition to Pig. Hadoop currently provides all these
 counters. All Pig has to do is to add them up for all Hadoop jobs in the
 script, and emit them at the end of the script. File a jira ?
 - Milind
 On 5/13/09 8:16 AM, Supreeth Hosur Nagesh Rao supre...@yahoo-inc.com
 wrote:
   Hi Olga
   
   With every PIG job is there any way for us to trap into the operational
   stats of that job, like number of input lines it processed, number of
   output lines it produced?
   
   I don't want to have a separate PIG script to do the same, as it may mean
   additional parsing, so is there such a stat? If not, can that be
   provided and exposed as a config parameter?
   
   -Supreeth
 This will be a great feature to have for our processing.
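
The suggestion in the excerpt, adding up the Hadoop counters of every job in a script, can be sketched as below. This is a Python illustration: the function name is hypothetical and the per-job counters are assumed to be available as plain dictionaries.

```python
from collections import Counter


def total_counters(per_job_counters):
    """Sum counter values by name across all jobs of one script."""
    total = Counter()
    for counters in per_job_counters:
        total.update(counters)  # Counter.update adds values, it does not replace
    return dict(total)


if __name__ == "__main__":
    jobs = [
        {"MAP_INPUT_RECORDS": 100, "REDUCE_OUTPUT_RECORDS": 10},
        {"MAP_INPUT_RECORDS": 10, "REDUCE_OUTPUT_RECORDS": 3},
    ]
    print(total_counters(jobs))
```

Note that a plain sum counts intermediate data twice when one job reads another job's output, so a real implementation would probably report only the script's initial inputs and final outputs.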




[jira] Updated: (PIG-696) Fatal error produced when malformed scalar types within complex type is converted to given type

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-696?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-696:
---

Fix Version/s: 0.9.0

 Fatal error produced when malformed scalar types within complex type is 
 converted to given type
 ---

 Key: PIG-696
 URL: https://issues.apache.org/jira/browse/PIG-696
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
 Fix For: 0.9.0


 Instead of a fatal error, failed conversions should result in null values.
 Example -
 grunt> cat cbag3.dat
 {(asdf)}
 {(2344)}
 {(2344}
 {(323423423423434)}
 {(323423423423434L)}
 {(asdff)}
 grunt> A = load 'cbag3.dat' as (f1:bag{t:tuple(i:int)});  B = foreach A 
 generate flatten(f1);  C = foreach B generate $0 + 1; dump C;
 2009-03-03 14:25:19,604 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - 0% complete
 2009-03-03 14:25:44,628 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
  - Map reduce job failed
 2009-03-03 14:25:44,642 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2043: Unexpected error during execution.
 Details at logfile: /d1/tejas/pig_1236118410343.log
 tail  /d1/tejas/pig_1236118410343.log
   Caused by: java.lang.ClassCastException
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Add.getNext(Add.java:110)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:260)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:198)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:217)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:208)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
 The 'conversion' of scalar types in complex types happens in the physical 
 operators, not in the loaders. Expressions (such as Add in the example) 
 attempt to cast the input to the given type, and a ClassCastException is 
 thrown when the conversion fails.
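
The requested behavior can be sketched outside Pig. The Python below is
not Pig code: to_int_or_none and add_one are hypothetical helpers, and the
32-bit range check is an assumption about how an int field would be
validated. It only illustrates the proposed semantics, where a scalar that
cannot be converted becomes null and arithmetic on null yields null.

```python
# Hypothetical illustration of "failed conversions become null".
INT_MIN, INT_MAX = -2**31, 2**31 - 1  # assumed 32-bit int range


def to_int_or_none(text):
    """Return text as an int, or None when it is malformed or out of range."""
    try:
        value = int(text)
    except (TypeError, ValueError):
        return None
    return value if INT_MIN <= value <= INT_MAX else None


def add_one(value):
    """Null-propagating stand-in for the expression $0 + 1."""
    return None if value is None else value + 1


# Scalar contents of the well-formed bags in cbag3.dat.
fields = ["asdf", "2344", "323423423423434", "323423423423434L", "asdff"]
results = [add_one(to_int_or_none(f)) for f in fields]
```

With these semantics only the 2344 record produces a number; every other
record produces null instead of aborting the job.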

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-709) Handling of NULL in Pig builtin functions needs to be reviewed

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-709:
---

Fix Version/s: 0.9.0

 Handling of NULL in Pig builtin functions needs to be reviewed
 --

 Key: PIG-709
 URL: https://issues.apache.org/jira/browse/PIG-709
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Santhosh Srinivasan
 Fix For: 0.9.0


 Pig builtin functions do not handle NULL consistently. Some examples are the 
 combiner versus non-combiner for AVG. All the builtins need a review of cases 
 where NULL is handled.




[jira] Updated: (PIG-694) Schema merge should take into account bags with tuples and bags with schemas

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-694:
---

Fix Version/s: 0.9.0

 Schema merge should take into account bags with tuples and bags with schemas
 

 Key: PIG-694
 URL: https://issues.apache.org/jira/browse/PIG-694
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.9.0


 The merge method in Schema does not treat bags with schemas and bags with 
 tuples as equivalent. This will bring closure to PIG-448 and PIG-577.




[jira] Resolved: (PIG-702) Computation of number of reducers in order by has to change for a static cluster (Hadoop 20)

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-702?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-702.


Resolution: Fixed

 Computation of number of reducers in order by has to change for a static 
 cluster (Hadoop 20)
 

 Key: PIG-702
 URL: https://issues.apache.org/jira/browse/PIG-702
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan

 In Hadoop 20, a static cluster is shared amongst multiple queues. The 
 computation of the number of reducers in order by queries when parallel 
 keyword needs to change. Currently, the number of reducers is computed as a 
 function of the number of reduce slots and the number of nodes in the 
 cluster. Conservatively, this should revert to one reducer.




[jira] Updated: (PIG-716) skinconf.xml file - removed CC initials

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-716:
---

   Status: Resolved  (was: Patch Available)
Fix Version/s: 0.8.0
   Resolution: Fixed

 skinconf.xml file - removed CC initials
 ---

 Key: PIG-716
 URL: https://issues.apache.org/jira/browse/PIG-716
 Project: Pig
  Issue Type: Task
Reporter: Corinne Chandel
 Fix For: 0.8.0

 Attachments: skinconf.patch


 Removed my initials (CC) from the skinconf.xml file.




[jira] Updated: (PIG-723) Pig generates incorrect schema for generated bags after FOREACH.

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-723:
---

Fix Version/s: 0.9.0

 Pig generates incorrect schema for generated bags after FOREACH.
 

 Key: PIG-723
 URL: https://issues.apache.org/jira/browse/PIG-723
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.1.0
 Environment: Linux
 $pig --version
 Apache Pig version 0.1.0-dev (r750430)
 compiled Mar 07 2009, 09:20:13
Reporter: Dhruv M
 Fix For: 0.9.0


 grunt> rf_src = LOAD 'rf_test.txt' USING PigStorage(',') AS (lhs:chararray, 
 rhs:chararray, r:float, p:float, c:float);
 grunt> rf_grouped = GROUP rf_src BY rhs;  
 
 grunt> lhs_grouped = FOREACH rf_grouped GENERATE group as rhs, rf_src.(lhs, 
 r) as lhs, MAX(rf_src.p) as p, MAX(rf_src.c) AS c;
 grunt> describe lhs_grouped;
 lhs_grouped: {rhs: chararray,lhs: {lhs: chararray,r: float},p: float,c: float}
 I think it should be:
 lhs_grouped: {rhs: chararray,lhs: {(lhs: chararray,r: float)},p: float,c: 
 float}
 Because of this, we are not able to perform UNION on 2 sets because union on 
 incompatible schemas is causing a complete loss of schema information, making 
 further processing impossible.
 This is what we want to UNION with:
 grunt> asrc = LOAD 'atest.txt' USING PigStorage(',') AS (rhs:chararray, 
 a:int);
 grunt> aa = FOREACH asrc GENERATE rhs, (bag{tuple(chararray,float)}) null as 
 lhs, -10F as p, -10F as c;
 grunt> describe aa;
 aa: {rhs: chararray,lhs: {(chararray,float)},p: float,c: float}
 If there is something wrong with what I am trying to do, please let me know.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDFFinale5.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, 
 RegisterPythonUDF3.patch, RegisterPythonUDF4.patch, 
 RegisterPythonUDF_Final.patch, RegisterPythonUDFFinale.patch, 
 RegisterPythonUDFFinale3.patch, RegisterPythonUDFFinale4.patch, 
 RegisterPythonUDFFinale5.patch, RegisterScriptUDFDefineParse.patch, 
 scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-1490) Make Pig storers work with remote HDFS in secure mode

2010-07-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1490:
--

Status: Patch Available  (was: Open)

 Make Pig storers work with remote HDFS in secure mode
 -

 Key: PIG-1490
 URL: https://issues.apache.org/jira/browse/PIG-1490
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0, 0.7.0

 Attachments: PIG-1490.patch


 PIG-1403 fixed the problem for Pig loaders. We need to do the same for Pig 
 storers. 




[jira] Updated: (PIG-1490) Make Pig storers work with remote HDFS in secure mode

2010-07-09 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1490:
--

Attachment: PIG-1490.patch

 Make Pig storers work with remote HDFS in secure mode
 -

 Key: PIG-1490
 URL: https://issues.apache.org/jira/browse/PIG-1490
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1490.patch


 PIG-1403 fixed the problem for Pig loaders. We need to do the same for Pig 
 storers. 




[jira] Updated: (PIG-729) Use of default parallelism

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-729:
---

Fix Version/s: 0.8.0

 Use of default parallelism
 --

 Key: PIG-729
 URL: https://issues.apache.org/jira/browse/PIG-729
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
 Fix For: 0.8.0


 Currently, if the user does not specify the number of reduce slots using the 
 parallel keyword, Pig lets Hadoop decide on the default number of reducers. 
 This model worked well with dynamically allocated clusters using HOD and for 
 static clusters where the default number of reduce slots was explicitly set. 
 With Hadoop 0.20, a single static cluster will be shared amongst a number of 
 queues. As a result, a common scenario is to end up with default number of 
 reducers set to one (1).
 When users migrate to Hadoop 0.20, they might see a dramatic change in the 
 performance of their queries if they had not used the parallel keyword to 
 specify the number of reducers. In order to mitigate such circumstances, Pig 
 can support one of the following:
 1. Specify a default parallelism for the entire script.
 This option will allow users to use the same parallelism for all operators 
 that do not have the explicit parallel keyword. This will ensure that the 
 scripts utilize more reducers than the default of one reducer. On the down 
 side, due to data transformations, operations performed towards the end of 
 the script usually need a smaller number of reducers than the operators 
 that appear at the beginning of the script.
 2. Display a warning message for each reduce side operator that does not have 
 the explicit parallel keyword. Proceed with the execution.
 3. Display an error message indicating the operator that does not have the 
 explicit parallel keyword. Stop the execution.
 Other suggestions/thoughts/solutions are welcome.
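
The three options can be compared side by side in a small sketch. This is
Python rather than Pig internals: resolve_parallelism, the policy names,
and the operator names are all hypothetical, and the fallback to one
reducer under the "warn" policy is an assumption taken from the scenario
described above.

```python
import warnings


def resolve_parallelism(operators, policy, default_parallel=None):
    """Pick a reducer count for each reduce-side operator.

    operators: list of (name, parallel) pairs; parallel is None when the
    script did not use the parallel keyword for that operator.
    policy: "default" (option 1), "warn" (option 2) or "error" (option 3).
    """
    resolved = {}
    for name, parallel in operators:
        if parallel is not None:
            resolved[name] = parallel
        elif policy == "default":
            if default_parallel is None:
                raise ValueError("policy 'default' needs default_parallel")
            resolved[name] = default_parallel
        elif policy == "warn":
            warnings.warn(f"{name}: no explicit parallel keyword")
            resolved[name] = 1  # assumed cluster default of one reducer
        elif policy == "error":
            raise ValueError(f"{name}: no explicit parallel keyword")
        else:
            raise ValueError(f"unknown policy: {policy}")
    return resolved


ops = [("group_by_url", 40), ("order_by_time", None)]
plan = resolve_parallelism(ops, "default", default_parallel=10)
```

Option 1 is the only one that changes the reducer count silently, which is
why the downside noted above (late operators needing fewer reducers)
applies to it and not to options 2 and 3.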




[jira] Updated: (PIG-730) problem combining schema from a union of several LOAD expressions, with a nested bag inside the schema.

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-730:
---

Fix Version/s: 0.9.0

 problem combining schema from a union of several LOAD expressions, with a 
 nested bag inside the schema.
 ---

 Key: PIG-730
 URL: https://issues.apache.org/jira/browse/PIG-730
 Project: Pig
  Issue Type: Bug
 Environment: pig local mode
Reporter: Christopher Olston
 Fix For: 0.9.0


 grunt> a = load 'foo' using BinStorage as 
 (url:chararray,outlinks:{t:(target:chararray,text:chararray)});
 grunt> b = union (load 'foo' using BinStorage as 
 (url:chararray,outlinks:{t:(target:chararray,text:chararray)})), (load 'bar' 
 using BinStorage as 
 (url:chararray,outlinks:{t:(target:chararray,text:chararray)}));
 grunt> c = foreach a generate flatten(outlinks.target);
 grunt> d = foreach b generate flatten(outlinks.target);
 --- Would expect both C and D to work, but only C works. D gives the error 
 shown below.
 --- Turns out using outlinks.t.target (instead of outlinks.target) works for 
 D but not for C.
 --- I don't care which one, but the same syntax should work for both!
 2009-03-24 13:15:05,376 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: target in {t: (target: 
 chararray,text: chararray)}
 Details at logfile: /echo/olston/data/pig_1237925683748.log
 grunt> quit
 $ cat pig_1237925683748.log 
 ERROR 1000: Error during parsing. Invalid alias: target in {t: (target: 
 chararray,text: chararray)}
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
 parsing. Invalid alias: target in {t: (target: chararray,text: chararray)}
 at org.apache.pig.PigServer.parseQuery(PigServer.java:317)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:276)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
 at org.apache.pig.Main.main(Main.java:321)
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Invalid 
 alias: target in {t: (target: chararray,text: chararray)}
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.AliasFieldOrSpec(QueryParser.java:6042)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.ColOrSpec(QueryParser.java:5898)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BracketedSimpleProj(QueryParser.java:5423)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseEvalSpec(QueryParser.java:4100)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.UnaryExpr(QueryParser.java:3967)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.CastExpr(QueryParser.java:3920)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.MultiplicativeExpr(QueryParser.java:3829)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.AdditiveExpr(QueryParser.java:3755)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.InfixExpr(QueryParser.java:3721)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItem(QueryParser.java:3617)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.FlattenedGenerateItemList(QueryParser.java:3557)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.GenerateStatement(QueryParser.java:3514)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.NestedBlock(QueryParser.java:2985)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.ForEachClause(QueryParser.java:2395)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1028)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:804)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:595)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:60)
 at org.apache.pig.PigServer.parseQuery(PigServer.java:310)
 ... 6 more




[jira] Resolved: (PIG-748) Adding documentation for IsEmpty() in the Pig Latin Reference Manual

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-748.


Fix Version/s: 0.7.0
   Resolution: Fixed

 Adding documentation for IsEmpty() in the Pig Latin Reference Manual
 

 Key: PIG-748
 URL: https://issues.apache.org/jira/browse/PIG-748
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Priority: Minor
 Fix For: 0.7.0


 The built-in functions listed in:
 http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm
 do not include IsEmpty. 
 Kindly add documentation for this useful built-in function.




[jira] Updated: (PIG-731) Passing semicolon as a parameter in UDF causes parser error

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-731:
---

Fix Version/s: 0.9.0

 Passing semicolon as a parameter in UDF causes parser error 
 

 Key: PIG-731
 URL: https://issues.apache.org/jira/browse/PIG-731
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
 Fix For: 0.9.0

 Attachments: CONCATSEP.jar, semicolonerr.pig


 The Pig script below, which uses a UDF, loads three chararray columns and 
 then concatenates columns 2 and 3 using a semicolon.
 {code}
 register CONCATSEP.jar;
 A = LOAD 'someinput/*' USING PigStorage(';') as 
 (col1:chararray,col2:chararray,col3:chararray);
 B = FOREACH A GENERATE col1, string.CONCATSEP(';',col2,col3) as newcol;
 STORE B INTO 'someoutput' USING PigStorage(';');
 {code}
 The above script causes an error during the parsing stage due to the 
 semicolon present in the UDF call.
 =
 2009-03-24 15:50:56,454 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Lexical error at line 3, column 49.  Encountered: 
 EOF after : \';
 Details at logfile: /homes/viraj/pig-svn/trunk/pig_1237935055635.log
 =
 There is no workaround for this, except to hardcode the separator in the UDF.
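
The behavior the report expects from statement splitting can be sketched
as follows. This is an illustrative Python function, not the Grunt parser:
split_statements is a hypothetical name, and the shortened script passed
to it is made up for the example. The point is only that a semicolon
inside a single-quoted string should not end the statement.

```python
def split_statements(script):
    """Split on ';' while ignoring semicolons inside single-quoted strings."""
    statements, current, in_quote = [], [], False
    i = 0
    while i < len(script):
        ch = script[i]
        # Keep a backslash escape inside a quoted string together.
        if in_quote and ch == "\\" and i + 1 < len(script):
            current.append(script[i:i + 2])
            i += 2
            continue
        if ch == "'":
            in_quote = not in_quote
        if ch == ";" and not in_quote:
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


script = ("A = LOAD 'in' USING PigStorage(';'); "
          "B = FOREACH A GENERATE string.CONCATSEP(';',col2,col3);")
parts = split_statements(script)
```

Here the script splits into exactly two statements, and the quoted ';'
arguments stay inside the statements that contain them.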




[jira] Updated: (PIG-767) Schema reported from DESCRIBE and actual schema of inner bags are different.

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-767:
---

Fix Version/s: 0.9.0
  Description: 
The following script:

urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, 
pg:bytearray);
-- describe and dump are in-sync
DESCRIBE urlContents;
DUMP urlContents;

urlContentsG = GROUP urlContents BY url;
DESCRIBE urlContentsG;

urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;

DESCRIBE urlContentsF;
DUMP urlContentsF;


Prints for the DESCRIBE commands:

urlContents: {url: chararray,pg: chararray}
urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
urlContentsF: {group: chararray,pg: {pg: chararray}}

The reported schemas for urlContentsG and urlContentsF are wrong. They are also 
against the section Schemas for Complex Data Types in 
http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.

As expected, actual data observed from DUMP urlContentsG and DUMP urlContentsF 
do contain the tuple inside the inner bags.

The correct schema for urlContentsG is:  {group: chararray,urlContents: 
{t1:(url: chararray,pg: chararray)}}

This may sound like a technicality, but it isn't. For instance, a UDF that 
assumes an inner bag of {chararray} will not work with {(chararray)}. 






 Schema reported from DESCRIBE and actual schema of inner bags are different.
 

 Key: PIG-767
 URL: https://issues.apache.org/jira/browse/PIG-767
 Project: Pig
  Issue Type: Bug
Reporter: George Mavromatis
 Fix For: 0.9.0


 The following script:
 urlContents = LOAD 'inputdir' USING BinStorage() AS (url:bytearray, 
 pg:bytearray);
 -- describe and dump are in-sync
 DESCRIBE urlContents;
 DUMP urlContents;
 urlContentsG = GROUP urlContents BY url;
 DESCRIBE urlContentsG;
 urlContentsF = FOREACH urlContentsG GENERATE group,urlContents.pg;
 DESCRIBE urlContentsF;
 DUMP urlContentsF;
 Prints for the DESCRIBE commands:
 urlContents: {url: chararray,pg: chararray}
 urlContentsG: {group: chararray,urlContents: {url: chararray,pg: chararray}}
 urlContentsF: {group: chararray,pg: {pg: chararray}}
 The reported schemas for urlContentsG and urlContentsF are wrong. They are 
 also against the section Schemas for Complex Data Types in 
 http://wiki.apache.org/pig-data/attachments/FrontPage/attachments/plrm.htm#_Schemas.
 As expected, actual data observed from DUMP urlContentsG and DUMP 
 urlContentsF do contain the tuple inside the inner bags.
 The correct schema for urlContentsG is:  {group: chararray,urlContents: 
 {t1:(url: chararray,pg: chararray)}}
 This may sound like a technicality, but it isn't. For instance, a UDF that 
 assumes an inner bag of {chararray} will not work with {(chararray)}. 
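
The last point can be shown with a small sketch. It is Python, not a Pig
UDF: a bag is modeled as a list, a tuple as a Python tuple, and the
function names and sample values are hypothetical. Code written against
the reported schema {chararray} breaks on the actual data, which has the
shape {(chararray)}.

```python
def join_scalars(bag):
    """Written against the reported schema: a bag of plain strings."""
    return ",".join(bag)


def join_tuples(bag):
    """Written against the actual data: a bag of one-field tuples."""
    return ",".join(t[0] for t in bag)


# What DUMP actually shows: each element of the inner bag is a tuple.
actual_bag = [("page1",), ("page2",)]

try:
    join_scalars(actual_bag)
    scalar_udf_ok = True
except TypeError:
    # str.join rejects tuples, so the scalar-based function fails.
    scalar_udf_ok = False

joined = join_tuples(actual_bag)
```

Only the tuple-aware function works on the real data, so a consumer that
trusts the DESCRIBE output would be written the wrong way.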




[jira] Resolved: (PIG-742) Spaces could be optional in Pig syntax

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-742.


Resolution: Duplicate

This is a duplicate of PIG-19.

 Spaces could be optional in Pig syntax
 --

 Key: PIG-742
 URL: https://issues.apache.org/jira/browse/PIG-742
 Project: Pig
  Issue Type: Wish
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Priority: Minor

 The following Pig statements generate an error if there is no space between A 
  and =
 {code}
 A=load 'quf.txt' using PigStorage() as (q, u, f:long);
 B = group A by (q);
 C = foreach B {
 F = order A by f desc;
 generate F;
 };
 describe C;
 dump C;
 {code}
 2009-03-31 17:14:15,959 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered
  PATH A=load  at line 1, column 1.
 Was expecting one of:
 EOF 
 cat ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 aliases ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 run ...
 exec ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 It would be nice if the parser did not require a space between an alias 
 and =




[jira] Resolved: (PIG-769) COUNT fails on local mode but executes correctly on grid mode for the same data

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-769.


Fix Version/s: 0.7.0
   Resolution: Fixed

Local mode now uses the same code path as MR mode. Please re-open if this is 
still a problem.

 COUNT fails on local mode but executes correctly on grid mode for the same 
 data
 ---

 Key: PIG-769
 URL: https://issues.apache.org/jira/browse/PIG-769
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: George Mavromatis
 Fix For: 0.7.0


 The following script, run on the grid, executes correctly. It prints (4L) for 
 '/user/gmavr/k_sample_preprocessed_withj_sample'.
 In local mode (invoked with -x local) with the same data in the local 
 filesystem, it fails with:
 -2009-04-11 03:23:15,155 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore
  - Received error from storer function: 
 org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while 
 computing count in COUNT
 %declare k_sample_preprocessed_withj 
 '/user/gmavr/k_sample_preprocessed_withj_sample';
 -- %declare k_sample_preprocessed_withj 
 '/homes/gmavr/mlrSite/k_sample_preprocessed_withj_sample';
 webdataFiltered = LOAD '$k_sample_preprocessed_withj' USING BinStorage() AS 
 (url:chararray, pg:bytearray);
 X1 = GROUP webdataFiltered ALL;
 Y1 = FOREACH X1 GENERATE COUNT(*);
 DUMP Y1;




[jira] Updated: (PIG-768) Schema of a relation reported by DESCRIBE and allowed operations on the relation are not compatible

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-768:
---

Fix Version/s: 0.9.0

 Schema of a relation reported by DESCRIBE and allowed operations on the 
 relation are not compatible
 ---

 Key: PIG-768
 URL: https://issues.apache.org/jira/browse/PIG-768
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: George Mavromatis
 Fix For: 0.9.0


 The DESCRIBE command in the following script prints:
 {s: bytearray, pg: bytearray, wm: bytearray}
 However, the script later treats the s field of urlMap as a map instead of a 
 bytearray, as shown in s#'Url'.
 Pig does not complain about this contradiction, and at execution time the s 
 field is treated as a map, although it was reported as bytearray at parse time.
 Pig should either not report s as a bytearray or exit with a parsing error.
 Note that all above operations happen before the query executes at the 
 cluster.
 register WebDataProcessing.jar; 
 register opencrawl.jar; 
 urlMap = LOAD '$input' USING opencrawl.pigudf.WebDataLoader() AS (s, pg, wm);
 DESCRIBE urlMap;
 -- in fact the loader in the WebDataProcessing.jar populates s and pg as 
 s:map[], pg:bag{t1:(contents:bytearray)}
 -- and defines that in determineSchema() but pig describe ignores it!
 urlMap2 = LIMIT urlMap 20;
 urlList2 = FOREACH urlMap2 GENERATE s#'Url', pg;
 DESCRIBE urlList2;
 STORE urlList2 INTO 'output2' USING BinStorage();




[jira] Updated: (PIG-798) Schema errors when using PigStorage and none when using BinStorage in FOREACH??

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-798:
---

Fix Version/s: 0.9.0

 Schema errors when using PigStorage and none when using BinStorage in 
 FOREACH??
 ---

 Key: PIG-798
 URL: https://issues.apache.org/jira/browse/PIG-798
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0, 0.3.0, 0.4.0, 0.5.0, 0.6.0, 0.7.0, 0.8.0
Reporter: Viraj Bhat
 Fix For: 0.9.0

 Attachments: binstoragecreateop, schemaerr.pig, visits.txt


 In the following script I have a tab separated text file, which I load using 
 PigStorage() and store using BinStorage()
 {code}
 A = load '/user/viraj/visits.txt' using PigStorage() as (name:chararray, 
 url:chararray, time:chararray);
 B = group A by name;
 store B into '/user/viraj/binstoragecreateop' using BinStorage();
 dump B;
 {code}
 I later load file 'binstoragecreateop' in the following way.
 {code}
 A = load '/user/viraj/binstoragecreateop' using BinStorage();
 B = foreach A generate $0 as name:chararray;
 dump B;
 {code}
 Result
 ===
 (Amy)
 (Fred)
 ===
 The above code works properly and returns the right results. If I use 
 PigStorage() to achieve the same, I get the following error.
 {code}
 A = load '/user/viraj/visits.txt' using PigStorage();
 B = foreach A generate $0 as name:chararray;
 dump B;
 {code}
 ===
 {code}
 2009-05-02 03:58:50,662 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1022: Type mismatch merging schema prefix. Field Schema: bytearray. Other 
 Field Schema: name: chararray
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1241236728311.log
 {code}
 ===
 So why should the semantics of BinStorage(), where it is OK not to specify a 
 schema, be different from those of PigStorage()? Should the behavior not be 
 consistent across both?




[jira] Resolved: (PIG-813) Semantics of * and count

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-813.


Resolution: Fixed

Looks like this has been committed but not closed

 Semantics of * and count
 

 Key: PIG-813
 URL: https://issues.apache.org/jira/browse/PIG-813
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.2.0
Reporter: George Mavromatis
Assignee: Benjamin Reed

 Continuation of PIG-812. See PIG-812 for more details.
 In order for this to be resolved in the right manner, the following must be 
 added to http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html
 1) The semantics of * as explained by Olga.
 2) An example of GROUP ALL
 Otherwise people will waste their time making the same (documentation-caused) 
 mistakes again.




[jira] Updated: (PIG-838) Parser does not handle ctrl-m ('\u000d') as argument to PigStorage

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-838:
---

Fix Version/s: 0.9.0

 Parser does not handle ctrl-m ('\u000d') as argument to PigStorage
 --

 Key: PIG-838
 URL: https://issues.apache.org/jira/browse/PIG-838
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
 Fix For: 0.9.0


 A script that contains 
 a = load 'input' using PigStorage('\u000d');
  
 produces the following error:
 2009-06-05 14:47:49,241 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Lexical error at line 1, column 47.  Encountered: 
 \r (13), after : \'




[jira] Updated: (PIG-859) Optimizer throw error on self-joins

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-859:
---

Fix Version/s: 0.9.0

 Optimizer throw error on self-joins
 ---

 Key: PIG-859
 URL: https://issues.apache.org/jira/browse/PIG-859
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
 Fix For: 0.9.0


 Doing a self-join results in an exception thrown by the optimizer. Consider 
 the following query:
 {code}
 grunt> A = load 'a';
 grunt> B = Join A by $0, A by $0;
 grunt> explain B;
 2009-06-20 15:51:38,303 [main] ERROR org.apache.pig.tools.grunt.Grunt -
 ERROR 1094: Attempt to insert between two nodes that were not connected.
 Details at logfile: pig_1245538027026.log
 {code}
 Relevant stack-trace from log-file:
 {code}
 Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR
 2047: Internal error. Unable to introduce split operators.
 at
 org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(ImplicitSplitInserter.java:163)
 at
 org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:163)
 at org.apache.pig.PigServer.compileLp(PigServer.java:844)
 at org.apache.pig.PigServer.compileLp(PigServer.java:781)
 at org.apache.pig.PigServer.getStorePlan(PigServer.java:723)
 at org.apache.pig.PigServer.explain(PigServer.java:566)
 ... 8 more
 Caused by: org.apache.pig.impl.plan.PlanException: ERROR 1094: Attempt
 to insert between two nodes that were not connected.
 at
 org.apache.pig.impl.plan.OperatorPlan.doInsertBetween(OperatorPlan.java:500)
 at
 org.apache.pig.impl.plan.OperatorPlan.insertBetween(OperatorPlan.java:480)
 at
 org.apache.pig.impl.logicalLayer.optimizer.ImplicitSplitInserter.transform(ImplicitSplitInserter.java:139)
 ... 13 more
 {code}
 A possible workaround is:
 {code}
 grunt> A = load 'a';
 grunt> B = load 'a';
 grunt> C = join A by $0, B by $0;
 grunt> explain C;
 {code}




[jira] Updated: (PIG-801) Pig needs to handle scalar aliases to improve programmer and code execution efficiency

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-801:
---

 Assignee: Aniket Mokashi
Fix Version/s: 0.8.0

 Pig needs to handle scalar aliases to improve programmer and code execution 
 efficiency
 --

 Key: PIG-801
 URL: https://issues.apache.org/jira/browse/PIG-801
 Project: Pig
  Issue Type: New Feature
Reporter: David Ciemiewicz
Assignee: Aniket Mokashi
 Fix For: 0.8.0


 In Pig, it is often the case that the result of an operation is a scalar 
 value that needs to be applied to the next step of processing.
 For example:
 * FILTER by MAX of group -- See: PIG-772
 * Compute proportions by dividing by total (SUM) of grouped alias
 Today Pig programmers need to go through distasteful and slow contortions of 
 using FLATTEN or CROSS to propagate the scalar computation to EVERY row of 
 data to perform these operations creating needless copies of data.  Or, the 
 user must write the global sum to a file, then read it back in to gain the 
 efficiency.
 If the language were simply extended to have the notion of scalar aliases, 
 then coding would be simplified without contortions for the programmer and, I 
 believe, execution of the code would be faster too.
 For instance, to compute global proportions, I want to do the following:
 {code}
 CountryPopulations = load 'country.dat' using PigStorage() as ( country: 
 chararray, population: long );
 AllCountryPopulations= group CountryPopulations all;
 Total = foreach AllCountryPopulations generate 
 SUM(CountryPopulations.population) as population;
 PopulationProportions = foreach CountryPopulations generate
 country, population, (double)population / (double)Total.population as 
 global_proportion;
 {code}
 One of the very distasteful workarounds for this is to do something like:
 {code}
 CountryPopulations = load 'country.dat' using PigStorage() as ( country: 
 chararray, population: long );
 AllCountryPopulations= group CountryPopulations all;
 Total = foreach AllCountryPopulations generate 
 SUM(CountryPopulations.population) as population;
 CountryPopulationsTotal = cross CountryPopulations, Total;
 PopulationProportions = foreach CountryPopulationsTotal generate
 CountryPopulations::country,
 CountryPopulations::population,
 (double)CountryPopulations::population / (double)Total::population as 
 global_proportion;
 {code}
 This just makes me cringe every time I have to do it.  Constructing new rows 
 of data simply to apply
 the same scalar value row after row after row for potentially billions of 
 rows of data just feels horribly wrong
 and inefficient both from the coding standpoint and from the execution 
 standpoint.
 In SQL, I'd just code this as:
 {code}
 select
  country,
  population,
  population / SUM(population)
 from
  CountryPopulations;
 {code}
 In writing a SQL to Pig translator, it would seem that this construct or 
 idiom would need to be supported, so why not create a higher level of Pig 
 which would support the notion of scalars efficiently?
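
The difference between the two approaches described above can be sketched outside Pig. The following Python is only an illustration of the semantics, not Pig code or Pig's implementation; the function names and the sample rows are hypothetical and the population figures are made up. It shows that treating the total as a scalar gives the same proportions as the CROSS workaround, without building a widened copy of every row.

```python
# Hypothetical in-memory stand-in for 'country.dat'; the figures are made up.
country_populations = [("A", 100), ("B", 300), ("C", 600)]


def proportions_with_scalar(rows):
    """Compute the global total once and reuse it as a scalar."""
    total = sum(population for _, population in rows)
    return [(country, population, population / total)
            for country, population in rows]


def proportions_with_cross(rows):
    """Mimic the CROSS workaround: attach the total to every row first."""
    # A one-row "relation" holding the total, as GROUP ALL + SUM would give.
    total_relation = [(sum(population for _, population in rows),)]
    # The cross product materializes a widened copy of each input row.
    crossed = [(country, population, total)
               for country, population in rows
               for (total,) in total_relation]
    return [(country, population, population / total)
            for country, population, total in crossed]


if __name__ == "__main__":
    for row in proportions_with_scalar(country_populations):
        print(row)
```

Both functions return the same list; the second one simply carries the total along in an intermediate copy of the data, which is the overhead the issue objects to.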




[jira] Updated: (PIG-815) misleading error message when streaming fails

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-815:
---

Fix Version/s: 0.9.0

 misleading error message when streaming fails
 -

 Key: PIG-815
 URL: https://issues.apache.org/jira/browse/PIG-815
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.9.0


 One of the users reported seeing a confusing message: Jobs not found in the 
 JobClient. Please try to use Local, Hadoop Distributed or Hadoop MiniCluster 
 modes instead of Hadoop LocalExecution ERROR 2055: Received Error while 
 processing the map plan: 'process.pl ' failed with exit status: 255 




[jira] Updated: (PIG-870) Pig is broken loading .gz files

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-870:
---

Fix Version/s: 0.7.0

This is no longer relevant with Pig 0.7.0

 Pig is broken loading .gz files
 ---

 Key: PIG-870
 URL: https://issues.apache.org/jira/browse/PIG-870
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Priority: Minor
 Fix For: 0.7.0


 Looks like the code is trying to split a gz file, which is not supported. In 
 general, gz is a poor choice for compression with Pig since the 
 parallelization is limited to the number of files.
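
The reason a gz file cannot be split can be shown with a small Python sketch. This is an illustration of the gzip format only, not Pig's loader code; the helper name and the sample data are hypothetical. A gzip decoder can start at the beginning of the stream, but not at an arbitrary byte offset, so a single .gz file has to be read by one task from start to finish.

```python
import gzip
import zlib


def can_decode_from(blob: bytes, offset: int) -> bool:
    """Return True if a gzip decoder can start reading at `offset`."""
    # wbits=31 tells zlib to expect a gzip header at the start of the input.
    decoder = zlib.decompressobj(31)
    try:
        decoder.decompress(blob[offset:])
        return True
    except zlib.error:
        return False


# Made-up tab-separated records standing in for an input file.
records = b"".join(b"row-%d\tvalue\n" % i for i in range(10_000))
blob = gzip.compress(records)

if __name__ == "__main__":
    print("from start: ", can_decode_from(blob, 0))
    print("from middle:", can_decode_from(blob, len(blob) // 2))
```

Decoding from offset 0 succeeds, while decoding from the middle of the compressed stream fails because no gzip header is found there.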




[jira] Resolved: (PIG-870) Pig is broken loading .gz files

2010-07-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-870.


Resolution: Fixed

 Pig is broken loading .gz files
 ---

 Key: PIG-870
 URL: https://issues.apache.org/jira/browse/PIG-870
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Priority: Minor
 Fix For: 0.7.0


 Looks like the code is trying to split a gz file, which is not supported. In 
 general, gz is a poor choice for compression with Pig since the 
 parallelization is limited to the number of files.



