[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-04 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.test-fix.patch

Here is my understanding of what happens:

1. The main thread in the JVM executing the test initializes MiniDFSCluster, 
MiniMRCluster and the HSQLDB server, each in a different thread.
2. The test's setUp() method is then executed to create the table 'ttt' to which 
data will be written by DBStorage() in the test.
3. Pig statements are then executed that spawn the M/R job as a separate process, 
which tries to get a connection to the database and create a PreparedStatement 
for table 'ttt'. This sometimes fails because the DB thread does NOT get a 
chance to fully persist the table information, and the exception is thrown from 
the map tasks, as noted by Ashutosh.

The fix is to add a 5-second sleep in the setUp() method to give the DB a chance 
to persist the table information. This alleviates the problem and the test 
passes across repeated runs.

Note that the ideal fix would have been to busy-wait for table creation to 
complete, but I don't see a method in HSQLDB to do that.
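
For illustration, a minimal sketch of the setUp() change (the table/column 
names, the dbUrl field and the JUnit style here are assumptions, not the 
actual test code):

{code}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

// Sketch only: create the table the test writes to, then pause so the
// HSQLDB server thread can fully persist the table definition before the
// map tasks try to prepare statements against it.
@Override
protected void setUp() throws Exception {
    Connection con = DriverManager.getConnection(dbUrl, "sa", "");
    try {
        Statement st = con.createStatement();
        st.executeUpdate("CREATE TABLE ttt (name VARCHAR(32), cnt INT)");
        st.close();
    } finally {
        con.close();
    }
    Thread.sleep(5000);   // 5 sec grace period, as described above
}
{code}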

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
 jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Attachment: PIG-1178-5.patch

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Status: Open  (was: Patch Available)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-04 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1178:


Status: Patch Available  (was: Open)

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-04 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895165#action_12895165
 ] 

Daniel Dai commented on PIG-1178:
-

Did some restructuring and bug fixing. Also moved the package from experimental 
to newplan.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-04 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: (was: jira-1229-final.test-fix.patch)

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
 jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-04 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.test-fix.patch

Aaron,
Autocommit() was not the issue. The problem was the use of the jdbc:hsqldb:file: 
URL in the STORE function. Replacing it with jdbc:hsqldb:hsql://localhost/dbname 
solved the issue. Attaching the updated patch with the test case modification.
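
For reference, a minimal sketch of the two connection modes (the user, password 
and database path here are assumptions, not values from the patch):

{code}
import java.sql.Connection;
import java.sql.DriverManager;

public class HsqldbUrlExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.hsqldb.jdbcDriver");

        // In-process file mode: the database files are owned by the JVM that
        // opened them, so a map task running in a separate process cannot
        // share the same database.
        // Connection c = DriverManager.getConnection("jdbc:hsqldb:file:/tmp/dbname", "sa", "");

        // Server mode: the test JVM and the map tasks all talk to the running
        // HSQLDB server over a socket, so concurrent access works.
        Connection c = DriverManager.getConnection(
                "jdbc:hsqldb:hsql://localhost/dbname", "sa", "");
        System.out.println("connected: " + !c.isClosed());
        c.close();
    }
}
{code}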

Really appreciate your help here. Thanks a lot :-)

 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
 jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch


 UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895212#action_12895212
 ] 

Hadoop QA commented on PIG-1461:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12451175/PIG-1461.1.patch
  against trunk revision 981984.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 407 release audit warnings 
(more than the trunk's current 405 warnings).

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/372/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/372/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/372/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/372/console

This message is automatically generated.

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1527) No need to deserialize UDFContext on the client side

2010-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895310#action_12895310
 ] 

Hadoop QA commented on PIG-1527:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12451181/PIG-1527.patch
  against trunk revision 981984.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 406 release audit warnings 
(more than the trunk's current 405 warnings).

+1 core tests.  The patch passed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/373/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/373/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/373/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/373/console

This message is automatically generated.

 No need to deserialize UDFContext on the client side
 

 Key: PIG-1527
 URL: https://issues.apache.org/jira/browse/PIG-1527
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1527.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1404) PigUnit - Pig script testing simplified.

2010-08-04 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895318#action_12895318
 ] 

Ashutosh Chauhan commented on PIG-1404:
---

bq. 3. (This one is for other pig developers) Is Piggybank the right place for 
this or should we put it under test? I think this will be really useful for Pig 
users in setting up automated tests of their Pig Latin scripts. Should we 
support it outright rather than put it in piggybank and risk having it go 
unmaintained?

I think it deserves to be put in under test. Having written a few end-to-end test 
cases for Pig in JUnit, I can see it's really useful for Pig itself. The 
usefulness for Pig users is pretty obvious.

 PigUnit - Pig script testing simplified. 
 -

 Key: PIG-1404
 URL: https://issues.apache.org/jira/browse/PIG-1404
 Project: Pig
  Issue Type: New Feature
Reporter: Romain Rigaux
Assignee: Romain Rigaux
 Fix For: 0.8.0

 Attachments: commons-lang-2.4.jar, PIG-1404-2.patch, 
 PIG-1404-3-doc.patch, PIG-1404-3.patch, PIG-1404-4-doc.patch, 
 PIG-1404-4.patch, PIG-1404.patch


 The goal is to provide a simple xUnit framework that enables our Pig scripts 
 to be easily:
   - unit tested
   - regression tested
   - quickly prototyped
 No cluster setup is required.
 For example:
 TestCase
 {code}
 @Test
 public void testTop3Queries() {
   String[] args = {
       "n=3",
   };
   test = new PigTest("top_queries.pig", args);
   String[] input = {
       "yahoo\t10",
       "twitter\t7",
       "facebook\t10",
       "yahoo\t15",
       "facebook\t5",

   };
   String[] output = {
       "(yahoo,25L)",
       "(facebook,15L)",
       "(twitter,7L)",
   };
   test.assertOutput("data", input, "queries_limit", output);
 }
 {code}
 top_queries.pig
 {code}
 data =
 LOAD '$input'
 AS (query:CHARARRAY, count:INT);
  
 ... 
 
 queries_sum = 
 FOREACH queries_group 
 GENERATE 
 group AS query, 
 SUM(queries.count) AS count;
 
 ...
 
 queries_limit = LIMIT queries_ordered $n;
 STORE queries_limit INTO '$output';
 {code}
 There are 3 modes:
 * LOCAL (if the pigunit.exectype.local property is present)
 * MAPREDUCE (uses the cluster specified in the classpath, same as 
 HADOOP_CONF_DIR)
 ** automatic mini cluster (the default; the HADOOP_CONF_DIR to have in 
 the classpath will be: ~/pigtest/conf)
 ** pointing to an existing cluster (if the pigunit.exectype.cluster property 
 is present)
 For now, it would be nice to see how this idea could be integrated in 
 Piggybank and if PigParser/PigServer could improve their interfaces in order 
 to make PigUnit simple.
 Other components based on PigUnit could be built later:
   - standalone MiniCluster
   - notion of workspaces for each test
   - standalone utility that reads test configuration and generates a test 
 report...
 It is a first prototype, open to suggestions, and it can definitely benefit 
 from feedback.
 How to test, in pig_trunk:
 {code}
 Apply patch
 $pig_trunk> ant compile-test
 $pig_trunk> ant
 $pig_trunk/contrib/piggybank/java> ant test -Dtest.timeout=99
 {code}
 (it takes 15 min in MAPREDUCE minicluster, tests will need to be split in the 
 future between 'unit' and 'integration')
 Many examples are in:
 {code}
 contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/test/pigunit/TestPigTest.java
 {code}
 When used standalone, do not forget to add commons-lang-2.4.jar and the 
 HADOOP_CONF_DIR of your cluster to your CLASSPATH.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1295) Binary comparator for secondary sort

2010-08-04 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-1295:


Attachment: PIG-1295_0.12.patch

Ok, first working integration.
Modified PigTupleRawComparatorNew to use the raw comparators via TupleFactory.
Created a new class PigSecondaryKeyComparatorNew that should replace the old 
one. This one uses the raw comparators.
Modified JobControlCompiler to use the new comparators.

Moved the null/index semantics outside the raw comparators and into the 
wrappers.

Modified BinSedesTupleComparator to correctly handle sort order. The sort order 
is applied to the first call that compares tuples. In case we are doing a 
secondary sort, the sort orders are propagated one level further (because we 
have a nested tuple with the keys, and we need to apply the sort orders to the 
content of the outermost tuple).
The code is not the cleanest possible, but TestPigTupleRawComparator and 
TestSecondarySort pass.

TODO:
Implement the logic for PIG-927.
I plan to create a new interface (TupleRawComparator) and add a method to check 
whether a field of type NULL was encountered during the comparison. This 
interface will be used instead of the plain RawComparator to hold the reference 
to our raw comparators.
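
A rough sketch of that interface (the method name and javadoc are assumptions 
based on the plan above, not a final API):

{code}
import org.apache.hadoop.io.RawComparator;
import org.apache.pig.data.Tuple;

/**
 * Sketch of the proposed TupleRawComparator.
 */
public interface TupleRawComparator extends RawComparator<Tuple> {
    /**
     * Reports whether a field of type NULL was encountered during the last
     * byte-level compare(), so callers can fall back to the PIG-927 handling.
     */
    boolean hasComparedTupleNull();
}
{code}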

Write a speed test.
Is there something already available that can be used to measure the speed 
improvement? The inputs for the unit tests are of course too small.

 Binary comparator for secondary sort
 

 Key: PIG-1295
 URL: https://issues.apache.org/jira/browse/PIG-1295
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Gianmarco De Francisci Morales
 Fix For: 0.8.0

 Attachments: PIG-1295_0.1.patch, PIG-1295_0.10.patch, 
 PIG-1295_0.11.patch, PIG-1295_0.12.patch, PIG-1295_0.2.patch, 
 PIG-1295_0.3.patch, PIG-1295_0.4.patch, PIG-1295_0.5.patch, 
 PIG-1295_0.6.patch, PIG-1295_0.7.patch, PIG-1295_0.8.patch, PIG-1295_0.9.patch


 When the Hadoop framework does the sorting, it will try to use a binary version 
 of the comparator if one is available. The benefit of a binary comparator is 
 that we do not need to instantiate the objects before we compare. We see a ~30% 
 speedup after we switch to a binary comparator. Currently, Pig uses a binary 
 comparator in the following cases:
 1. When the semantics of the order don't matter. For example, in distinct, we 
 need to do a sort in order to filter out duplicate values; however, we do not 
 care how the comparator sorts keys. Group by also shares this characteristic. 
 In this case, we rely on Hadoop's default binary comparator.
 2. The semantics of the order matter, but the key is of a simple type. In this 
 case, we have implementations for simple types, such as integer, long, float, 
 chararray, databytearray, string.
 However, if the key is a tuple and the sort semantics matter, we do not have 
 a binary comparator implementation. This especially matters when we switch to 
 using secondary sort. In secondary sort, we convert the inner sort of a nested 
 foreach into the secondary key and rely on Hadoop to sort on both the main key 
 and the secondary key. The sorting key becomes a two-item tuple. Since the 
 secondary key is the sorting key of the nested foreach, the sorting semantics 
 matter. It turns out we do not have a binary comparator once we use secondary 
 sort, and we see a significant slowdown.
 A binary comparator for tuples should be doable once we understand the binary 
 structure of the serialized tuple. We can focus on the most common use case 
 first, which is a group by followed by a nested sort. In this case, we will 
 use secondary sort. The semantics of the first key do not matter, but the 
 semantics of the secondary key do. We need to identify the boundary between 
 the main key and the secondary key in the binary tuple buffer without 
 instantiating the tuple itself. Then, if the first keys are equal, we use a 
 binary comparator to compare the secondary keys. The secondary key can also be 
 a complex data type, but for the first step, we focus on a simple secondary 
 key, which is the most common use case.
 We mark this issue as a candidate project for the Google Summer of Code 2010 
 program. 
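
To make the byte-level comparison idea concrete, here is a toy raw comparator 
over an invented key layout ([int length][main key bytes][int length][secondary 
key bytes]). Pig's real BinSedes tuple serialization is different; this only 
illustrates the "compare main key bytes, then secondary key bytes, honor the 
nested sort order" technique using Hadoop's WritableComparator helpers:

{code}
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.WritableComparator;

public class CompoundKeyRawComparator implements RawComparator<byte[]> {
    private final boolean secondaryDescending;

    public CompoundKeyRawComparator(boolean secondaryDescending) {
        this.secondaryDescending = secondaryDescending;
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Main key: [4-byte length][bytes], compared without deserializing.
        int mainLen1 = WritableComparator.readInt(b1, s1);
        int mainLen2 = WritableComparator.readInt(b2, s2);
        int cmp = WritableComparator.compareBytes(b1, s1 + 4, mainLen1,
                                                  b2, s2 + 4, mainLen2);
        if (cmp != 0) {
            return cmp;              // main keys differ, no need to look further
        }
        // Secondary key sits right after the main key: [4-byte length][bytes].
        int secOff1 = s1 + 4 + mainLen1 + 4;
        int secOff2 = s2 + 4 + mainLen2 + 4;
        int secLen1 = WritableComparator.readInt(b1, secOff1 - 4);
        int secLen2 = WritableComparator.readInt(b2, secOff2 - 4);
        cmp = WritableComparator.compareBytes(b1, secOff1, secLen1,
                                              b2, secOff2, secLen2);
        return secondaryDescending ? -cmp : cmp; // honor nested sort order
    }

    @Override
    public int compare(byte[] o1, byte[] o2) {
        return compare(o1, 0, o1.length, o2, 0, o2.length);
    }
}
{code}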

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1496) Mandatory rule ImplicitSplitInserter

2010-08-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1496:
--

Attachment: PIG-1496.patch

Added more comments in the code per the reviewer's comment.

 Mandatory rule ImplicitSplitInserter
 

 Key: PIG-1496
 URL: https://issues.apache.org/jira/browse/PIG-1496
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1496.patch


 Need to migrate ImplicitSplitInserter to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1496) Mandatory rule ImplicitSplitInserter

2010-08-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1496:
--

Attachment: (was: PIG-1496.patch)

 Mandatory rule ImplicitSplitInserter
 

 Key: PIG-1496
 URL: https://issues.apache.org/jira/browse/PIG-1496
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1496.patch


 Need to migrate ImplicitSplitInserter to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1496) Mandatory rule ImplicitSplitInserter

2010-08-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1496:
--

Status: Patch Available  (was: Open)

 Mandatory rule ImplicitSplitInserter
 

 Key: PIG-1496
 URL: https://issues.apache.org/jira/browse/PIG-1496
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1496.patch, PIG-1496.patch


 Need to migrate ImplicitSplitInserter to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1496) Mandatory rule ImplicitSplitInserter

2010-08-04 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1496:
--

Attachment: PIG-1496.patch

 Mandatory rule ImplicitSplitInserter
 

 Key: PIG-1496
 URL: https://issues.apache.org/jira/browse/PIG-1496
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0

 Attachments: PIG-1496.patch, PIG-1496.patch


 Need to migrate ImplicitSplitInserter to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case

2010-08-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1534:


  Status: Patch Available  (was: Open)
Assignee: Pradeep Kamath

 Code discovering UDFs in the script has a bug in a order by case
 

 Key: PIG-1534
 URL: https://issues.apache.org/jira/browse/PIG-1534
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1534.patch


 Consider the following command line:
 {noformat}
 java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e 
 "a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b;"
 {noformat}
 Notice there is no "register udf.jar"; instead, udf.jar (which contains 
 udf.MyPigStorage) is in the classpath. Pig handles this case by shipping 
 udf.jar to the backend. However, the above script with order by triggers the 
 bug with the following error message:
  ERROR 2997: Unable to recreate exception from backed error: 
 java.lang.RuntimeException: could not instantiate 
 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments 
 '[udf.MyPigStorage, 100]'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case

2010-08-04 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1534:


Attachment: PIG-1534.patch

The patch fixes SampleOptimizer to add the loadFunc FuncSpecs to the MapReduce 
operators after optimization - this fixes the above order by error.

Here are results from running the test-patch target locally
[exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] +1 tests included.  The patch appears to include 3 new or 
modified tests.
 [exec]
 [exec] -1 javadoc.  The javadoc tool appears to have generated 1 
warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number 
of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs 
warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the 
total number of release audit warnings.
 [exec]

The javadoc warning is present on trunk and not related to this patch:
{noformat}
...
 [javadoc] Standard Doclet version 1.6.0_01
  [javadoc] Building tree for all the packages and classes...
  [javadoc] 
/tmp/svncheckout/src/org/apache/pig/newplan/logical/expression/ProjectExpression.java:192:
 warning - @param argument currentOp is not a parameter name.
  [javadoc] Building index for all the packages and classes...
...
{noformat}
Will run unit tests locally and update with results.

 Code discovering UDFs in the script has a bug in a order by case
 

 Key: PIG-1534
 URL: https://issues.apache.org/jira/browse/PIG-1534
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1534.patch


 Consider the following command line:
 {noformat}
 java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e 
 "a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b;"
 {noformat}
 Notice there is no "register udf.jar"; instead, udf.jar (which contains 
 udf.MyPigStorage) is in the classpath. Pig handles this case by shipping 
 udf.jar to the backend. However, the above script with order by triggers the 
 bug with the following error message:
  ERROR 2997: Unable to recreate exception from backed error: 
 java.lang.RuntimeException: could not instantiate 
 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments 
 '[udf.MyPigStorage, 100]'

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895383#action_12895383
 ] 

Olga Natkovich commented on PIG-1461:
-

The patch looks good. A couple of comments:

(1) Looks like there is a typo in the code that loads data for testing:
w.println("5\tdef\t3\t{(2,a),(2,b)}]"); - contains an extra ] at the end
(2) This is not related to the patch but to the documentation above. Please add 
info that UNION supports 2 or more inputs.
(3) In mergeSchemasByAlias, I think it is safer to make a copy of the schema 
rather than just assigning it for the corner case of 1 schema (see the sketch 
after this list).
(4) Need to add a comment about the inner bag schema to 
mergeFieldSchemaFirstLevelSameAlias.
(5) General comment on schema merging - we have completely different code paths 
for position- vs. alias-based merge. I am worried that we will have subtly 
different semantics either now or later.
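
Regarding (3), a tiny sketch of the defensive copy (method and variable names 
are illustrative, the patch's mergeSchemasByAlias may look different, and this 
assumes Schema's copy constructor):

{code}
import java.util.List;
import org.apache.pig.impl.logicalLayer.schema.Schema;

public class SchemaMergeSketch {
    static Schema mergeSchemasByAlias(List<Schema> schemas) {
        if (schemas.size() == 1) {
            // Copy rather than hand back the caller's Schema object, so later
            // mutations of the merged schema don't leak into the input schema.
            return new Schema(schemas.get(0));
        }
        // ... merging logic for two or more schemas goes here ...
        return null;
    }
}
{code}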

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Attachment: ScalarImplFinaleRebase.patch

Attaching rebased version of the patch...

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF
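
As a rough illustration of step (2), a scalar-reading UDF could look something 
like the sketch below. This is only an assumption about the approach, not the 
UDF Ben Reed contributed, and it assumes the stored relation holds a single 
long value in a plain text file:

{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Reads back the single-row, single-column relation that was stored to a
// file and returns it as a scalar.
public class ReadScalarLong extends EvalFunc<Long> {
    @Override
    public Long exec(Tuple input) throws IOException {
        String file = (String) input.get(0);          // path written by the STORE
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path(file))));
        try {
            String line = in.readLine();              // the single value
            return line == null ? null : Long.valueOf(line.trim());
        } finally {
            in.close();
        }
    }
}
{code}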

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1536) use same logic for merging inner schemas in default union and union onschema

2010-08-04 Thread Thejas M Nair (JIRA)
use same logic for merging inner schemas in default union and union onschema


 Key: PIG-1536
 URL: https://issues.apache.org/jira/browse/PIG-1536
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
 Fix For: 0.9.0


We should consider using the same logic for merging inner schemas in the two 
different types of union. 

In the case of 'default union', it merges the two inner schemas of bags/tuples 
by position if the number of fields is the same and the corresponding types are 
compatible. 

In the case of 'union onschema', it considers tuples/bags with different inner 
schemas to be incompatible types.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1536) use same logic for merging inner schemas in default union and union onschema

2010-08-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895410#action_12895410
 ] 

Thejas M Nair commented on PIG-1536:



The way 'default union' deals with columns of different but compatible types in 
the same position is not right. It creates a merged schema by choosing a merged 
type, but no cast happens to convert the rows to this type.
e.g. -

{code}
grunt> l1 = load '/tmp/f1' as (a : chararray, t (a : int, c : long) );
grunt> l2 = load '/tmp/f1' as (a : chararray, t (a : int, b : int) ); 
grunt> u = union l1, l2;  
grunt> describe u;
u: {a: chararray,t: (a: int,c: long)}

-- in the result of u, only the rows originating from l1 will correspond to the 
-- schema shown in describe.

MapReduce node 1-206
Map Plan
u: Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-203
|
|---u: Union[bag] - 1-202
|
|---l1: New For Each(false,false)[bag] - 1-195
|   |   |
|   |   Cast[chararray] - 1-192
|   |   |
|   |   |---Project[bytearray][0] - 1-191
|   |   |
|   |   Cast[tuple:(int,long)] - 1-194
|   |   |
|   |   |---Project[bytearray][1] - 1-193
|   |
|   |---l1: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-190
|
|---l2: New For Each(false,false)[bag] - 1-201
|   |
|   Cast[chararray] - 1-198
|   |
|   |---Project[bytearray][0] - 1-197
|   |
|   Cast[tuple:(int,int)] - 1-200
|   |
|   |---Project[bytearray][1] - 1-199
|
|---l2: Load(/tmp/f1:org.apache.pig.builtin.PigStorage) - 1-196
Global sort: false


{code}

 use same logic for merging inner schemas in default union and union 
 onschema
 

 Key: PIG-1536
 URL: https://issues.apache.org/jira/browse/PIG-1536
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
 Fix For: 0.9.0


 We should consider using the same logic for merging inner schemas in the two 
 different types of union. 
 In the case of 'default union', it merges the two inner schemas of bags/tuples 
 by position if the number of fields is the same and the corresponding types 
 are compatible. 
 In the case of 'union onschema', it considers tuples/bags with different 
 inner schemas to be incompatible types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895414#action_12895414
 ] 

Thejas M Nair commented on PIG-1461:


Regarding (5), there are some differences in the way the schema merge is done in 
the two cases. I have created PIG-1536 to discuss/address this.
I will make changes to address the other comments.

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-346) Grunt (help) commands

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-346:
---


We also need to make sure to cover commands that are implemented in PigServer. 
This comes from PIG-523, which I will close as a duplicate of this bug.

 Grunt (help) commands 
 --

 Key: PIG-346
 URL: https://issues.apache.org/jira/browse/PIG-346
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
 Fix For: 0.8.0


 I think there are 22 grunt commands and 2 different lists of the 
 commands can be displayed.
 I. Grunt commands displayed with grunt> help
 (1) put the 22 grunt commands in alphabetical order
 (2) fix the double entry for cd ... "cd <path>" and "cd <dir>"; keep "cd <path>"
 (3) fix the notation for "set <key> <value>" ... should be "set <key> 'value'"
 (4) add explain
 (5) add illustrate
 (6) add help
 II. Grunt commands displayed with grunt> asdf
 The asdf is a mistake and generates the msg "Was expecting one of:" and a list 
 of grunt commands.
 (1) put the 22 grunt commands in alphabetical order
 (2) add define
 (3) add du
 
 22 Grunt commands in alphabetical order:
 cat <src>
 cd <path>
 copyFromLocal <localsrc> <dst>
 copyToLocal <src> <localdst>
 cp <src> <dst>
 define <functionAlias> <functionSpec>
 describe <alias>
 dump <alias>
 du <path>
 explain
 help
 illustrate
 kill <job_id>
 ls <path>
 mkdir <path>
 mv <src> <dst>
 pwd
 quit
 register <udfJar>
 rm <src>
 set <key> 'value'
 store <alias> into <filename> [using <functionSpec>]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-523) help in grunt should show all commands

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-523.


Resolution: Duplicate

I moved this into PIG-346 

 help in grunt should show all commands
 --

 Key: PIG-523
 URL: https://issues.apache.org/jira/browse/PIG-523
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 Currently, it only shows commands directly supported by the grunt parser and 
 not commands supported by the Pig parser.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895430#action_12895430
 ] 

Thejas M Nair commented on PIG-1461:


Regarding documentation for UNION ONSCHEMA -
As Olga mentioned, like the default union, 'union onschema' also supports 2 or 
more inputs.

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Patch Available  (was: Open)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-1434:


Status: Open  (was: Patch Available)

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1461:
---

Attachment: PIG-1461.2.patch

Patch with changes as suggested in code review.

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.2.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1461) support union operation that merges based on column names

2010-08-04 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1461:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to trunk.

 support union operation that merges based on column names
 -

 Key: PIG-1461
 URL: https://issues.apache.org/jira/browse/PIG-1461
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1461.1.patch, PIG-1461.2.patch, PIG-1461.patch


 When the data has schema, it often makes sense to union on column names in 
 the schema rather than the position of the columns. 
 The behavior of the existing union operator should remain backward compatible.
 This feature can be supported using either a new operator or by extending union 
 to support a 'using' clause. I am thinking of having a new operator called 
 either unionschema or merge. Does anybody have any other suggestions for the 
 syntax?
 example -
 L1 = load 'x' as (a,b);
 L2 = load 'y' as (b,c);
 U = unionschema L1, L2;
 describe U;
 U: {a:bytearray, b:bytearray, c:bytearray}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1434) Allow casting relations to scalars

2010-08-04 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1434:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed
Tags: documentation

The patch is committed to trunk. Thanks Aniket for contributing this feature.

 Allow casting relations to scalars
 --

 Key: PIG-1434
 URL: https://issues.apache.org/jira/browse/PIG-1434
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: scalarImpl.patch, ScalarImpl1.patch, ScalarImpl5.patch, 
 ScalarImplFinale.patch, ScalarImplFinale1.patch, ScalarImplFinaleRebase.patch


 This jira is to implement a simplified version of the functionality described 
 in https://issues.apache.org/jira/browse/PIG-801.
 The proposal is to allow casting relations to scalar types in foreach.
 Example:
 A = load 'data' as (x, y, z);
 B = group A all;
 C = foreach B generate COUNT(A);
 .
 X = 
 Y = foreach X generate $1/(long) C;
 Couple of additional comments:
 (1) You can only cast relations including a single value or an error will be 
 reported
 (2) Name resolution is needed since relation X might have field named C in 
 which case that field takes precedence.
 (3) Y will look for C closest to it.
 Implementation thoughts:
 The idea is to store C into a file and then convert it into scalar via a UDF. 
 I believe we already have a UDF that Ben Reed contributed for this purpose. 
 Most of the work would be to update the logical plan to
 (1) Store C
 (2) convert the cast to the UDF

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1199) help includes obsolete options

2010-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895460#action_12895460
 ] 

Hadoop QA commented on PIG-1199:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12451182/PIG-1199.patch
  against trunk revision 981984.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 406 release audit warnings 
(more than the trunk's current 405 warnings).

-1 core tests.  The patch failed core unit tests.

-1 contrib tests.  The patch failed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/374/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/374/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/374/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/374/console

This message is automatically generated.

 help includes obsolete options
 --

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1199.patch


 This is confusing to users

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1199) help includes obsolete options

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895462#action_12895462
 ] 

Olga Natkovich commented on PIG-1199:
-

The patch just changes the help message, which I tested manually - hence no new 
tests. The release audit warning is in an html file. The tests are failing for 
unrelated reasons. I ran test-commit, and I think it should be sufficient for 
this patch since it is not touching any real code.



 help includes obsolete options
 --

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1199.patch


 This is confusing to users

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-08-04 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895463#action_12895463
 ] 

Hadoop QA commented on PIG-1178:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12451203/PIG-1178-5.patch
  against trunk revision 982423.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 91 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/375/console

This message is automatically generated.

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, PIG-1178-4.patch, PIG-1178-5.patch, pig_1178.patch, pig_1178.patch, 
 PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, 
 pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause for these issues is that a number of 
 design decisions that were made as part of the 0.2 rewrite of the front end 
 have now proven to be sub-optimal. The heart of this proposal is to revisit a 
 number of those proposals and rebuild the logical plan with a simpler design 
 that will make it much easier to maintain the logical plan as well as extend 
 the logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-347) Pig (help) Commands

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-347.


Resolution: Fixed

 Pig (help) Commands
 ---

 Key: PIG-347
 URL: https://issues.apache.org/jira/browse/PIG-347
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 Pig help can be specified 2 ways: $pig -help and $pig -h
 I. $pig -help (seen by external/internal users)
 (1) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default
 (2) change 
 -x, -exectype local|mapreduce, mapreduce is default 
  change mapdreduce to hadoop (maintain backward compatibility)
 II. $pig -h (seen by internal users only)
 (1) fix typos
 -l, --latest   use latest, untested, unsupported version of pig.jar instaed 
 of relased, tested, supported version.
instead of released 
 (2) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default 
 (same as above)
 (3) change:  -x, -exectype local|mapreduce, mapreduce is default ... 
  change mapdreduce to hadoop (maintain backward compatibility)
 (same as above)
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-347) Pig (help) Commands

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895473#action_12895473
 ] 

Olga Natkovich commented on PIG-347:


(1) has been done for a while.
(2) We don't support a 'hadoop' value. We use the values mapred or mapreduce, and I am 
not sure we should change that now.
(3) Part II is internal to Yahoo.

Closing this bug.

 Pig (help) Commands
 ---

 Key: PIG-347
 URL: https://issues.apache.org/jira/browse/PIG-347
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 Pig help can be specified 2 ways: $pig -help and $pig -h
 I. $pig -help (seen by external/internal users)
 (1) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default
 (2) change 
 -x, -exectype local|mapreduce, mapreduce is default 
  change mapdreduce to hadoop (maintain backward compatibility)
 II. $pig -h (seen by internal users only)
 (1) fix typos
 -l, --latest   use latest, untested, unsupported version of pig.jar instaed 
 of relased, tested, supported version.
instead of released 
 (2) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default 
 (same as above)
 (3) change:  -x, -exectype local|mapreduce, mapreduce is default ... 
  change mapdreduce to hadoop (maintain backward compatibility)
 (same as above)
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1199) help includes obsolete options

2010-08-04 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895476#action_12895476
 ] 

Thejas M Nair commented on PIG-1199:


+1.
We should change the statement about pig.cachedbag.memusage - Note that this 
memory is shared across all large bags used by the application.
InternalDistinctBag and InternalSortedBag are not aware of the actual number 
of bags they need to share the space with. The constructor argument for the 
number of bags is passed as 3 in all cases (distinct udf, PODistinct, POSort).
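
For illustration, a minimal Java sketch of the budgeting scheme described above, assuming a 
bag simply divides the pig.cachedbag.memusage fraction of the heap by a hard-coded bag count; 
the class and method names are hypothetical, not Pig's actual InternalDistinctBag or 
InternalSortedBag code.

{code}
// Hypothetical sketch (not Pig's implementation) of how a spill-to-disk bag
// could derive its in-memory budget when the number of bags sharing the
// pig.cachedbag.memusage fraction is passed as a fixed constructor argument.
public class CachedBagBudgetSketch {
    private final float memUsageFraction; // fraction of the heap all cached bags may use together
    private final int assumedBagCount;    // bags assumed to share it; 3 per the comment above

    public CachedBagBudgetSketch(float memUsageFraction, int assumedBagCount) {
        this.memUsageFraction = memUsageFraction;
        this.assumedBagCount = assumedBagCount;
    }

    /** Bytes this single bag may hold in memory before spilling. */
    public long perBagLimitBytes() {
        long maxHeap = Runtime.getRuntime().maxMemory();
        return (long) (maxHeap * memUsageFraction) / assumedBagCount;
    }

    public static void main(String[] args) {
        // Example values only: a 0.1 memusage fraction shared by 3 bags.
        System.out.println(new CachedBagBudgetSketch(0.1f, 3).perBagLimitBytes());
    }
}
{code}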


 help includes obsolete options
 --

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1199.patch


 This is confusing to users

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1199) help includes obsolete options

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895480#action_12895480
 ] 

Olga Natkovich commented on PIG-1199:
-

Thanks, Thejas. I am going to leave the statement as is until we actually 
figure out what it should be. This is also what our documentation states, so we 
can update both places at once as needed.

I will commit the patch once I get review from Corinne.

 help includes obsolete options
 --

 Key: PIG-1199
 URL: https://issues.apache.org/jira/browse/PIG-1199
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1199.patch


 This is confusing to users

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-346) Grunt (help) commands

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-346:
---

Status: Patch Available  (was: Open)

 Grunt (help) commands 
 --

 Key: PIG-346
 URL: https://issues.apache.org/jira/browse/PIG-346
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-346.patch


 I think there are 22 grunt commands  and 2 different lists of the 
 commands can be displayed.
 I. Grunt commands displayed with grunt help
 (1) put 22 grunt commands in alphabetical order
 (2) fix double entry for cd ... cd path and cd dir  keep cd path
 (3) fix notation for set key value ... set key 'value'
 (4) add explain
 (5) add illustrate
 (6) add help
 II. Grunt commands displayed with grunt asdf 
 The asdf is a mistake and generates the msg "Was expecting one of:" and a list of 
 grunt commands
 (1) put 22 grunt commands in alphabetical order
 (2) add define
 (3) add du
 
 22 Grunt commands in alphabetical order:
 cat src
 cd path
 copyFromLocal localsrc dst
 copyToLocal src localdst
 cp src dst
 define functionAlias functionSpec
 describe alias
 dump alias
 du path
 explain
 help
 illustrate
 kill job_id
 ls path
 mkdir path
 mv src dst
 pwd
 quit
 register udfJar
 rm src
 set key 'value'
 store alias into filename [using functionSpec]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-346) Grunt (help) commands

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-346:
---

Attachment: PIG-346.patch

 Grunt (help) commands 
 --

 Key: PIG-346
 URL: https://issues.apache.org/jira/browse/PIG-346
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-346.patch


 I think there are 22 grunt commands  and 2 different lists of the 
 commands can be displayed.
 I. Grunt commands displayed with grunt help
 (1) put 22 grunt commands in alphabetical order
 (2) fix double entry for cd ... cd path and cd dir  keep cd path
 (3) fix notation for set key value ... set key 'value'
 (4) add explain
 (5) add illustrate
 (6) add help
 II. Grunt commands displayed with grunt asdf 
 The asdf is a mistake and generates the msg "Was expecting one of:" and a list of 
 grunt commands
 (1) put 22 grunt commands in alphabetical order
 (2) add define
 (3) add du
 
 22 Grunt commands in alphabetical order:
 cat src
 cd path
 copyFromLocal localsrc dst
 copyToLocal src localdst
 cp src dst
 define functionAlias functionSpec
 describe alias
 dump alias
 du path
 explain
 help
 illustrate
 kill job_id
 ls path
 mkdir path
 mv src dst
 pwd
 quit
 register udfJar
 rm src
 set key 'value'
 store alias into filename [using functionSpec]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-346) Grunt (help) commands

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895483#action_12895483
 ] 

Olga Natkovich commented on PIG-346:


I have made the following changes:

(1) Removed deprecated file system commands
(2) Organized the information the same way it is organized in the documentation
(3) Added more detailed information (used info from docs)
(4) Made sure that all commands are covered

 Grunt (help) commands 
 --

 Key: PIG-346
 URL: https://issues.apache.org/jira/browse/PIG-346
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-346.patch


 I think there are 22 grunt commands  and 2 different lists of the 
 commands can be displayed.
 I. Grunt commands displayed with grunt help
 (1) put 22 grunt commands in alphabetical order
 (2) fix double entry for cd ... cd path and cd dir  keep cd path
 (3) fix notation for set key value ... set key 'value'
 (4) add explain
 (5) add illustrate
 (6) add help
 II. Grunt commands displayed with grunt asdf 
 The asdf is a mistake and generates the msg "Was expecting one of:" and a list of 
 grunt commands
 (1) put 22 grunt commands in alphabetical order
 (2) add define
 (3) add du
 
 22 Grunt commands in alphabetical order:
 cat src
 cd path
 copyFromLocal localsrc dst
 copyToLocal src localdst
 cp src dst
 define functionAlias functionSpec
 describe alias
 dump alias
 du path
 explain
 help
 illustrate
 kill job_id
 ls path
 mkdir path
 mv src dst
 pwd
 quit
 register udfJar
 rm src
 set key 'value'
 store alias into filename [using functionSpec]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1150) VAR() Variance UDF

2010-08-04 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895487#action_12895487
 ] 

Olga Natkovich commented on PIG-1150:
-

Dmitry, the patch is missing unit tests. Once you add them, I will commit it.

 VAR() Variance UDF
 --

 Key: PIG-1150
 URL: https://issues.apache.org/jira/browse/PIG-1150
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.5.0
 Environment: UDF, written in Pig 0.5 contrib/
Reporter: Russell Jurney
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.8.0

 Attachments: var.patch


 I've implemented a UDF in Pig 0.5 that implements Algebraic and calculates 
 variance in a distributed manner, based on the AVG() builtin.  It works by 
 calculating the count, sum and sum of squares, as described here: 
 http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Parallel_algorithm
 Is this a worthwhile contribution?  Taking the square root of this value 
 using the contrib SQRT() function gives Standard Deviation, which is missing 
 from Pig.
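
 For illustration, a minimal Java sketch of the count/sum/sum-of-squares combination the 
 description refers to (class and method names are made up, not the attached patch's API; 
 note the naive sum-of-squares formula can lose precision for large values):

{code}
// Partial aggregate for an algebraic variance: each partition tracks
// (count, sum, sum of squares); partials merge by addition and the final
// variance is sumSq/n - (sum/n)^2.
public final class VarianceSketch {
    long count;
    double sum;
    double sumSq;

    void add(double x) { count++; sum += x; sumSq += x * x; }

    VarianceSketch merge(VarianceSketch other) {
        VarianceSketch m = new VarianceSketch();
        m.count = count + other.count;
        m.sum = sum + other.sum;
        m.sumSq = sumSq + other.sumSq;
        return m;
    }

    double variance() {                      // population variance
        double mean = sum / count;
        return sumSq / count - mean * mean;
    }

    public static void main(String[] args) {
        VarianceSketch a = new VarianceSketch();
        VarianceSketch b = new VarianceSketch();
        for (double x : new double[] {1, 2, 3}) a.add(x);
        for (double x : new double[] {4, 5}) b.add(x);
        // Variance of {1,2,3,4,5} computed from the two merged partials: prints 2.0
        System.out.println(a.merge(b).variance());
    }
}
{code}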

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1371) Pig should handle deep casting of complex types

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1371:


Fix Version/s: (was: 0.8.0)

It does not look like we will have time to do this in 0.8.0

 Pig should handle deep casting of complex types 
 

 Key: PIG-1371
 URL: https://issues.apache.org/jira/browse/PIG-1371
 Project: Pig
  Issue Type: Bug
Reporter: Pradeep Kamath
Assignee: Richard Ding
 Attachments: PIG-1371-partial.patch


 Consider input data in BinStorage format which has a field of bag type - 
 bg:{t:(i:int)}. In the load statement if the schema specified has the type 
 for this field specified as bg:{t:(c:chararray)}, the current behavior is 
 that Pig thinks of the field to be of type specified in the load statement 
 (bg:{t:(c:chararray)}) but no deep cast from bag of int (the real data) to 
 bag of chararray (the user specified schema) is made.
 There are two issues currently:
 1) The TypeCastInserter only considers the byte 'type' between the 
 loader-presented schema and the user-specified schema to decide whether to introduce a 
 cast or not. In the above case, since both schemas have the type bag, no cast 
 is inserted. This check has to be extended to consider the full FieldSchema 
 (with inner subschema) in order to decide whether a cast is needed (see the 
 sketch after this list).
 2) POCast should be changed to handle casting a complex type to the type 
 specified in the user-supplied FieldSchema. Here there is one issue to be 
 considered - if the user specified the cast type to be bg:{t:(i:int, j:int)} 
 and the real data had only one field, what should the result of the cast be:
  * A bag with two fields - the int field and a null? - In this approach pig 
 is assuming the lone field in the data is the first field which might be 
 incorrect if it in fact is the second field.
  * A null bag to indicate that the bag is of unknown value - this is the one 
 I personally prefer
  * The cast throws an IncompatibleCastException
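
 As a rough illustration of the full-FieldSchema comparison suggested in (1) above, here 
 is a hedged Java sketch; FieldSpec is a made-up stand-in, not Pig's actual FieldSchema 
 class, and the byte codes in main are example values only.

{code}
import java.util.Collections;
import java.util.List;

// Illustrative only: a cast is needed whenever the declared type differs from
// the loader's type at any nesting level, not only at the outermost byte type.
public final class DeepCastCheckSketch {
    static final class FieldSpec {
        final byte type;               // e.g. BAG, TUPLE, INT, CHARARRAY
        final List<FieldSpec> inner;   // sub-schema; empty for scalar types

        FieldSpec(byte type, List<FieldSpec> inner) {
            this.type = type;
            this.inner = inner == null ? Collections.<FieldSpec>emptyList() : inner;
        }
    }

    /** True if loading as 'declared' requires a cast from the loader's 'actual' schema. */
    static boolean needsCast(FieldSpec declared, FieldSpec actual) {
        if (declared.type != actual.type) return true;
        if (declared.inner.size() != actual.inner.size()) return true;
        for (int i = 0; i < declared.inner.size(); i++) {
            if (needsCast(declared.inner.get(i), actual.inner.get(i))) return true;
        }
        return false;
    }

    static FieldSpec bagOf(FieldSpec field) {
        FieldSpec tuple = new FieldSpec((byte) 2, Collections.singletonList(field)); // "tuple"
        return new FieldSpec((byte) 3, Collections.singletonList(tuple));            // "bag"
    }

    public static void main(String[] args) {
        // bg:{t:(int)} loaded as bg:{t:(chararray)} -> a deep cast is needed.
        FieldSpec actual = bagOf(new FieldSpec((byte) 0, null));    // 0 stands in for int
        FieldSpec declared = bagOf(new FieldSpec((byte) 1, null));  // 1 stands in for chararray
        System.out.println(needsCast(declared, actual));            // prints true
    }
}
{code}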

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1526) HiveColumnarLoader Partitioning Support

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1526:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

patch committed to the trunk. Thanks Gerrit!

 HiveColumnarLoader Partitioning Support
 ---

 Key: PIG-1526
 URL: https://issues.apache.org/jira/browse/PIG-1526
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Gerrit Jansen van Vuuren
Assignee: Gerrit Jansen van Vuuren
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1526-2.patch, PIG-1526.patch


 I've made a lot of improvements to the HiveColumnarLoader:
 - Added support for LoadMetadata and data path partitioning 
 - Improved and simplified column loading
 Data Path Partitioning:
 Hive stores partitions as folders, like 
 /mytable/partition1=[value]/partition2=[value]. That is, the table mytable 
 contains 2 partitions [partition1, partition2].
 The HiveColumnarLoader will scan the input path /mytable and add to the 
 PigSchema the columns partition1 and partition2. 
 These columns can then be used in filtering. 
 For example: We've got year,month,day,hour partitions in our data uploads.
 So a table might look like mytable/year=2010/month=02/day=01.
 Loading with the HiveColumnarLoader allows our pig scripts to filter by date 
 using the standard pig Filter operator.
 I've added 2 classes for this:
 - PathPartitioner
 - PathPartitionHelper
 These classes are not Hive-dependent and could be used by any other loader 
 that wants to support partitioning; they also help with implementing the 
 LoadMetadata interface.
 For this reason I thought it best to put them into the package 
 org.apache.pig.piggybank.storage.partition.
 What would be nice in the future is to have PigStorage also use these 2 
 classes to provide automatic path partitioning support. 
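
 For illustration, a minimal Java sketch of the key=value path parsing described above; 
 this is only an assumption of how such parsing could look, not the PathPartitioner class 
 in the attached patch.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

// Extracts Hive-style partition keys/values from a directory path so they can
// be exposed as extra columns (illustrative only).
public final class PathPartitionSketch {
    /** Parses a path such as "/mytable/year=2010/month=02/day=01/part-00000". */
    public static Map<String, String> parse(String path) {
        Map<String, String> partitions = new LinkedHashMap<String, String>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                partitions.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return partitions;
    }

    public static void main(String[] args) {
        // Prints {year=2010, month=02, day=01}
        System.out.println(parse("/mytable/year=2010/month=02/day=01/part-00000"));
    }
}
{code}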

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1533) Compression codec should be a per-store property

2010-08-04 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895496#action_12895496
 ] 

Richard Ding commented on PIG-1533:
---

Locally ran and passed core tests. 

 Compression codec should be a per-store property
 

 Key: PIG-1533
 URL: https://issues.apache.org/jira/browse/PIG-1533
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1533.patch


 The following script with multi-query optimization
 {code}
 a = load 'input';
 store a into 'outout.bz2';
 store a into 'outout2'
 {code}
 generates two .bz files, while only one of them should be compressed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1386) UDF to extend functionalities of MaxTupleBy1stField

2010-08-04 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1386:


Status: Resolved  (was: Patch Available)
Resolution: Fixed

patch committed. Thanks hc busy!

 UDF to extend functionalities of MaxTupleBy1stField
 ---

 Key: PIG-1386
 URL: https://issues.apache.org/jira/browse/PIG-1386
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1386-trunk.patch


 Based on this conversation:
 totally, go for it, it'd be pretty straightforward to add this
 functionality.
 On Tue, Apr 20, 2010 at 6:45 PM, hc busy hc.b...@gmail.com wrote:
  Hey, while we're on the subject, and I have your attention, can we
  re-factor
  the UDF MaxTupleByFirstField to take constructor?
 
  *define customMaxTuple ExtremalTupleByNthField(n, 'min');*
  *G = group T by id;*
  *M = foreach T generate customMaxTuple(T);
  *
 
  Where n is the nth field, and the second parameter allows us to specify
  min, max, median,  etc...
 
  Does this seem like something useful to everyone?
 
 
 
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Sometimes I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --sometimes you need it this way for some reason.
  
  
Ok, I'll place my current code here; maybe later I'll make a patch (if such 
   an implementation is acceptable, of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Converts any sequence of fields into a bag of tuples, each holding the
    * specified number of fields.
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2), (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;
  
       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }
  
       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // The first field holds how many fields to pack into each output tuple.
           final Integer counter = (Integer) input.get(0);
           if (counter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           // Walk the remaining fields, starting a new tuple every 'counter' fields.
           for (int i = 0; i < input.size() - 1; i++) {
               if (i % counter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
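
   A quick usage sketch for the ToBag class above, driving it directly from Java rather 
   than from a Pig script (assumes pig.jar on the classpath; the expected output is 
   shown in a comment):

   import java.util.Arrays;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;

   public class ToBagDemo {
       public static void main(String[] args) throws Exception {
           // count=2 followed by four fields -> two 2-field tuples in the bag.
           Tuple input = TupleFactory.getInstance()
                   .newTuple(Arrays.<Object>asList(2, "a", "b", "c", "d"));
           DataBag bag = new ToBag().exec(input);
           System.out.println(bag);   // {(a,b),(c,d)}
       }
   }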
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
 

[jira] Updated: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-04 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-1537:


Description: 
I have a script which is of this pattern and it uses 2 StoreFunc's:

{code}
register loader.jar
register piggy-bank/java/build/storage.jar;
%DEFAULT OUTPUTDIR /user/viraj/prunecol/

ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);

ss_sc_filtered_0 = FILTER ss_sc_0 BY
a#'id' matches '1.*' OR
a#'id' matches '2.*' OR
a#'id' matches '3.*' OR
a#'id' matches '4.*';

ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c);

ss_sc_filtered_1 = FILTER ss_sc_1 BY
a#'id' matches '65.*' OR
a#'id' matches '466.*' OR
a#'id' matches '043.*' OR
a#'id' matches '044.*' OR
a#'id' matches '0650.*' OR
a#'id' matches '001.*';

ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1;

ss_sc_all_proj = FOREACH ss_sc_all GENERATE
a#'query' as query,
a#'testid' as testid,
a#'timestamp' as timestamp,
a,
b,
c;

ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10;

ss_sc_all_map = FOREACH ss_sc_all_ord  GENERATE a, b, c;

STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage();

ss_sc_all_map_count = group ss_sc_all_map all;

count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as 
record_count,COUNT($1);

STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009');
{code}

I run this script using:

a) java -cp pig0.7.jar script.pig
b) java -cp pig0.7.jar -t PruneColumns script.pig

What I observe is that the alias count produces the same number of records, 
but ss_sc_all_map has different sizes when run with the above 2 options.

Is this due to the fact that there are 2 StoreFunc's used?

Viraj

  was:
I have a script which is of this pattern and it uses 2 StoreFunc's:
{code}

register loader.jar
register piggy-bank/java/build/storage.jar;
%DEFAULT OUTPUTDIR /user/viraj/prunecol/

ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);

ss_sc_filtered_0 = FILTER ss_sc_0 BY
a#'id' matches '1.*' OR
a#'id' matches '2.*' OR
a#'id' matches '3.*' OR
a#'id' matches '4.*';

ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c);

ss_sc_filtered_1 = FILTER ss_sc_1 BY
a#'id' matches '65.*' OR
a#'id' matches '466.*' OR
a#'id' matches '043.*' OR
a#'id' matches '044.*' OR
a#'id' matches '0650.*' OR
a#'id' matches '001.*';

ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1;

ss_sc_all_proj = FOREACH ss_sc_all GENERATE
a#'query' as query,
a#'testid' as testid,
a#'timestamp' as timestamp,
a,
b,
c;

ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10;

ss_sc_all_map = FOREACH ss_sc_all_ord  GENERATE a, b, c;

STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage();

ss_sc_all_map_count = group ss_sc_all_map all;

count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as 
record_count,COUNT($1);

STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009');


I run this script using:

a) java -cp pig0.7.jar script.pig
b) java -cp pig0.7.jar -t PruneColumns script.pig

What I observe is that the alias count produces the same number of records, 
but ss_sc_all_map has different sizes when run with the above 2 options.

Is this due to the fact that there are 2 StoreFunc's used?

Viraj


 Column pruner causes wrong results when using both Custom Store UDF and 
 PigStorage
 --

 Key: PIG-1537
 URL: https://issues.apache.org/jira/browse/PIG-1537
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Viraj Bhat

 I have a script which is of this pattern and it uses 2 StoreFunc's:
 {code}
 register loader.jar
 register piggy-bank/java/build/storage.jar;
 %DEFAULT OUTPUTDIR /user/viraj/prunecol/
 ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);
 ss_sc_filtered_0 = FILTER ss_sc_0 BY
 a#'id' matches '1.*' OR
 a#'id' matches '2.*' OR
 a#'id' matches '3.*' OR
 a#'id' matches '4.*';
 ss_sc_1 = LOAD 

[jira] Created: (PIG-1537) Column pruner causes wrong results when using both Custom Store UDF and PigStorage

2010-08-04 Thread Viraj Bhat (JIRA)
Column pruner causes wrong results when using both Custom Store UDF and 
PigStorage
--

 Key: PIG-1537
 URL: https://issues.apache.org/jira/browse/PIG-1537
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Viraj Bhat


I have a script which is of this pattern and it uses 2 StoreFunc's:
{code}

register loader.jar
register piggy-bank/java/build/storage.jar;
%DEFAULT OUTPUTDIR /user/viraj/prunecol/

ss_sc_0 = LOAD '/data/click/20100707/0' USING Loader() AS (a, b, c);

ss_sc_filtered_0 = FILTER ss_sc_0 BY
a#'id' matches '1.*' OR
a#'id' matches '2.*' OR
a#'id' matches '3.*' OR
a#'id' matches '4.*';

ss_sc_1 = LOAD '/data/click/20100707/1' USING Loader() AS (a, b, c);

ss_sc_filtered_1 = FILTER ss_sc_1 BY
a#'id' matches '65.*' OR
a#'id' matches '466.*' OR
a#'id' matches '043.*' OR
a#'id' matches '044.*' OR
a#'id' matches '0650.*' OR
a#'id' matches '001.*';

ss_sc_all = UNION ss_sc_filtered_0,ss_sc_filtered_1;

ss_sc_all_proj = FOREACH ss_sc_all GENERATE
a#'query' as query,
a#'testid' as testid,
a#'timestamp' as timestamp,
a,
b,
c;

ss_sc_all_ord = ORDER ss_sc_all_proj BY query,testid,timestamp PARALLEL 10;

ss_sc_all_map = FOREACH ss_sc_all_ord  GENERATE a, b, c;

STORE ss_sc_all_map INTO '$OUTPUTDIR/data/20100707' using Storage();

ss_sc_all_map_count = group ss_sc_all_map all;

count = FOREACH ss_sc_all_map_count GENERATE 'record_count' as 
record_count,COUNT($1);

STORE count INTO '$OUTPUTDIR/count/20100707' using PigStorage('\u0009');


I run this script using:

a) java -cp pig0.7.jar script.pig
b) java -cp pig0.7.jar -t PruneColumns script.pig

What I observe is that the alias count produces the same number of records, 
but ss_sc_all_map has different sizes when run with the above 2 options.

Is this due to the fact that there are 2 StoreFunc's used?

Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1534) Code discovering UDFs in the script has a bug in a order by case

2010-08-04 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12895522#action_12895522
 ] 

Pradeep Kamath commented on PIG-1534:
-

Ran all unit tests - TestScriptUDF fails but the failure is unrelated to the 
change in this patch and the failure occurs even with a fresh svn checkout.

Patch is ready for review.

 Code discovering UDFs in the script has a bug in a order by case
 

 Key: PIG-1534
 URL: https://issues.apache.org/jira/browse/PIG-1534
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.8.0

 Attachments: PIG-1534.patch


 Consider the following commandline:
 {noformat}
 java -cp /tmp/svncheckout/pig.jar:udf.jar:clusterdir org.apache.pig.Main -e 
 a = load 'studenttab' using udf.MyPigStorage(); b = order a by $0; dump b;
 {noformat}
 Notice there is no register udf.jar; instead, udf.jar (which contains 
 udf.MyPigStorage) is on the classpath. Pig handles this case by shipping 
 udf.jar to the backend. However, the above script with an order by triggers the 
 bug with the following error message:
  ERROR 2997: Unable to recreate exception from backed error: 
 java.lang.RuntimeException: could not instantiate 
 'org.apache.pig.impl.builtin.RandomSampleLoader' with arguments 
 '[udf.MyPigStorage, 100]'
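
 For context, a small hedged Java sketch of why udf.jar must reach the backend in the 
 order-by case: the sampling stage instantiates the wrapped loader reflectively by class 
 name (as the error message suggests), so a class like udf.MyPigStorage must be loadable 
 in the task JVM. The code below is illustrative only, not Pig's RandomSampleLoader.

{code}
public final class ReflectiveLoaderSketch {
    /** Instantiates a loader by fully qualified class name, as a sampling
     *  wrapper conceptually does; this throws if the class is not on the
     *  classpath of the JVM running it. */
    static Object instantiate(String loaderClassName) throws Exception {
        Class<?> clazz = Class.forName(loaderClassName);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // A JDK class is used here so the sketch runs anywhere; on the Pig
        // backend the name would be something like "udf.MyPigStorage".
        System.out.println(instantiate("java.util.ArrayList"));
    }
}
{code}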

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.