[jira] Commented: (PIG-1438) [Performance] MultiQueryOptimizer should also merge DISTINCT jobs

2010-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12876980#action_12876980
 ] 

Hadoop QA commented on PIG-1438:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446652/PIG-1438_1.patch
  against trunk revision 952098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/334/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/334/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/334/console

This message is automatically generated.

 [Performance] MultiQueryOptimizer should also merge DISTINCT jobs
 -

 Key: PIG-1438
 URL: https://issues.apache.org/jira/browse/PIG-1438
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1438.patch, PIG-1438_1.patch


 The current implementation doesn't merge jobs derived from DISTINCT 
 statements. The reason is that DISTINCT jobs are implemented using a special 
 combiner (DistinctCombiner). But we should be able to merge jobs that have 
 the same type of combiner (e.g., merge multiple DISTINCT jobs into one).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)
DefaultTuple underestimate the memory footprint for string
--

 Key: PIG-1443
 URL: https://issues.apache.org/jira/browse/PIG-1443
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


Currently, DefaultTuple estimates the memory footprint of a string as if it 
were a char array. The formula we use is:  length * 2 + 12. It turns out this 
underestimates the memory usage of a string. Here is a list of actual memory 
footprints for strings, taken from a memory dump:

| length of string | memory in bytes |
| 7 | 56 |
| 3 | 48 |
| 1 | 40 |

I did a search and found that the following formula accurately estimates the 
memory footprint of a string:
{code}
8 * (int) (((length * 2) + 45) / 8) 
{code}
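
As a quick sanity check of the numbers above, here is a minimal sketch that 
compares both formulas against the measured sizes in the table. The class and 
method names (StringFootprint, currentEstimate, proposedEstimate) are made up 
for illustration and are not taken from DefaultTuple or from any patch.

```java
public class StringFootprint {
    // Estimate described above as the current one: the string is treated
    // as a bare char array (2 bytes per char plus 12 bytes of overhead).
    static long currentEstimate(int length) {
        return length * 2L + 12;
    }

    // Formula proposed above: (length * 2 + 45), rounded down to a
    // multiple of 8 by integer division.
    static long proposedEstimate(int length) {
        return 8L * ((length * 2L + 45) / 8);
    }

    public static void main(String[] args) {
        // String lengths and measured sizes copied from the table above.
        int[] lengths = {7, 3, 1};
        long[] measured = {56, 48, 40};
        for (int i = 0; i < lengths.length; i++) {
            System.out.printf("length=%d measured=%d current=%d proposed=%d%n",
                    lengths[i], measured[i],
                    currentEstimate(lengths[i]), proposedEstimate(lengths[i]));
        }
    }
}
```

For the three measured lengths the proposed formula reproduces the dump 
values, while the current one comes out at less than half of them. Three data 
points from one memory dump do not show that the formula holds on every JVM.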




[jira] Commented: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877143#action_12877143
 ] 

Daniel Dai commented on PIG-1443:
-

Reference: http://www.javamex.com/tutorials/memory/string_memory_usage.shtml




[jira] Created: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-09 Thread Gaurav Jain (JIRA)
[Zebra] Zebra build should have a test-smoke target
---

 Key: PIG-1444
 URL: https://issues.apache.org/jira/browse/PIG-1444
 Project: Pig
  Issue Type: Task
  Components: build
Affects Versions: 0.8.0
Reporter: Gaurav Jain
Priority: Minor
 Fix For: 0.8.0


The Zebra build should have a test-smoke target that at least uses the 
minicluster for its test cases.




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Attachment: PIG-1443-1.patch




[jira] Commented: (PIG-1438) [Performance] MultiQueryOptimizer should also merge DISTINCT jobs

2010-06-09 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877150#action_12877150
 ] 

Ashutosh Chauhan commented on PIG-1438:
---

+1 please commit.




[jira] Created: (PIG-1445) Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented

2010-06-09 Thread Daniel Dai (JIRA)
Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented 
--

 Key: PIG-1445
 URL: https://issues.apache.org/jira/browse/PIG-1445
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script fails with ERROR 2013: Moving LOLimit in front of 
LOStream is not implemented.

{code}
A = LOAD 'data';
B = STREAM A THROUGH `stream.pl`;
C = LIMIT B 10;
explain C;
{code}




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterPythonUDF2.patch

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, pig-greek.tgz, 
 pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF2.patch, scripting.tgz, 
 scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.




[jira] Updated: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: RegisterScriptUDFDefineParse.patch




[jira] Updated: (PIG-1445) Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1445:


Attachment: PIG-1445-1.patch

We should not push LOLimit in front of LOStream. Patch attached.




[jira] Updated: (PIG-1445) Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1445:


Status: Patch Available  (was: Open)




[jira] Resolved: (PIG-1441) New test targets: unit and smoke

2010-06-09 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1441.
-

Resolution: Fixed

Patch committed to trunk and to the 0.6 and 0.7 branches. Thanks, Daniel, for the review.

 New test targets: unit and smoke
 

 Key: PIG-1441
 URL: https://issues.apache.org/jira/browse/PIG-1441
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.8.0

 Attachments: PIG-1441.patch, PIG-1441_2.patch


 As we get more and more tests, adding more structure would help us minimize 
 the time spent on testing. Here are 2 new targets I propose we add (Hadoop 
 has the same targets for the same purposes):
 unit - runs all true unit tests (those that truly test APIs and internal 
 functionality, rather than running e2e tests through JUnit). This target 
 should run relatively quickly, in 10-15 minutes, and if we are good at 
 adding unit tests, it will give good coverage.
 smoke - a set of a few e2e tests that provide good overall coverage within 
 about 30 minutes.
 I would say that for simple patches we would still require only commit 
 tests, while for more involved patches the developers should run both unit 
 and smoke before submitting the patch.




[jira] Commented: (PIG-928) UDFs in scripting languages

2010-06-09 Thread Arnab Nandi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877197#action_12877197
 ] 

Arnab Nandi commented on PIG-928:
-

 register 'test.py' lang python;

How does one define an arbitrary lang? For example, I would like to introduce 
Scala as a UDF engine, preferably as a jar itself, i.e. something like:

register scalascript.jar;
register 'test.py' USING scala.Engine();


 




[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-09 Thread Gaurav Jain (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gaurav Jain updated PIG-1444:
-

Attachment: PIG-1444.patch


patch 1




[jira] Updated: (PIG-1444) [Zebra] Zebra build should have a test-smoke target

2010-06-09 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1444:
--

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1302) Include zebra's pigtest ant target as a part of pig's ant test target

2010-06-09 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877229#action_12877229
 ] 

Pradeep Kamath commented on PIG-1302:
-

+1

 Include zebra's pigtest ant target as a part of pig's ant test target
 ---

 Key: PIG-1302
 URL: https://issues.apache.org/jira/browse/PIG-1302
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath
Assignee: Giridharan Kesavan
 Attachments: PIG-1302.patch


 There are changes made in Pig interfaces which break zebra loaders/storers. 
 It would be good to run the pig tests in the zebra unit tests as part of 
 running pig's core-test for each patch submission. So essentially in the 
 test ant target in pig, we would need to invoke zebra's pigtest target.




[jira] Commented: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877256#action_12877256
 ] 

Hadoop QA commented on PIG-1443:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446712/PIG-1443-1.patch
  against trunk revision 952098.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 139 javac compiler warnings (more 
than the trunk's current 138 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/321/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/321/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/321/console

This message is automatically generated.




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Attachment: PIG-1443-2.patch

Deal with javac warning.




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Status: Open  (was: Patch Available)




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1443) DefaultTuple underestimate the memory footprint for string

2010-06-09 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1443:


Fix Version/s: 0.7.0
   (was: 0.8.0)




[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Attachment: NestedDescribeFinale.patch

 Make describe work with nested foreach
 --

 Key: PIG-972
 URL: https://issues.apache.org/jira/browse/PIG-972
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: NestedDescribeFinale.patch, NestedDescribeProp1.patch, 
 NestedDescribeProp2Initial.patch


 Currently the parser can't deal with that. This is because describe is part 
 of the Grunt parser while the rest of nested foreach is handled by the 
 QueryParser.




[jira] Commented: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877293#action_12877293
 ] 

Aniket Mokashi commented on PIG-972:


Submitted patch with above changes. Also added test cases to test different 
scenarios.

{code}
grunt describe c:
c::d: {a0: int,a1: int}
{code}
It does not print any nested aliases. To print a nested alias, use 
describe c::d;





[jira] Updated: (PIG-972) Make describe work with nested foreach

2010-06-09 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-972:
---

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-1445) Pig error: ERROR 2013: Moving LOLimit in front of LOStream is not implemented

2010-06-09 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12877318#action_12877318
 ] 

Hadoop QA commented on PIG-1445:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12446718/PIG-1445-1.patch
  against trunk revision 953109.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 383 release audit warnings 
(more than the trunk's current 382 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/322/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/322/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/322/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/322/console

This message is automatically generated.
