[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor
[ https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804986#action_12804986 ] Jeff Zhang commented on PIG-366: Is anyone still maintaining this issue? If the author could contribute the latest source code, I can help with this JIRA. PigPen - Eclipse plugin for a graphical PigLatin editor --- Key: PIG-366 URL: https://issues.apache.org/jira/browse/PIG-366 Project: Pig Issue Type: New Feature Reporter: Shubham Chopra Assignee: Shubham Chopra Priority: Minor Attachments: org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, pigpen.patch, pigPen.patch, PigPen.tgz This is an Eclipse plugin that provides a GUI that can help users create PigLatin scripts, see the example generator outputs on the fly, and submit the jobs to Hadoop clusters. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor
[ https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805115#action_12805115 ] Olga Natkovich commented on PIG-366: I don't think we have an owner. This code is looking for one :) PigPen - Eclipse plugin for a graphical PigLatin editor --- Key: PIG-366 URL: https://issues.apache.org/jira/browse/PIG-366 Project: Pig Issue Type: New Feature Reporter: Shubham Chopra Assignee: Shubham Chopra Priority: Minor Attachments: org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, pigpen.patch, pigPen.patch, PigPen.tgz This is an Eclipse plugin that provides a GUI that can help users create PigLatin scripts, see the example generator outputs on the fly, and submit the jobs to Hadoop clusters.
[jira] Updated: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1169: -- Description: ??We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script.??
{code}
raw_data = Load 'input_files' AS (f1, f2, ..., fn);
grouped = group raw_data by (f1, f2);
data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
ordered = order data by value DESC parallel 10;
topn = limit ordered 10;
store ordered into 'outputdir/full';
store topn into 'outputdir/topn';
{code}
??With the statement 'store ordered ...', top N results are incorrect, but without the statement, results are correct. Has anyone seen this before? I know a similar bug has been fixed in the multi-query release. We are on pig .4 and hadoop .20.1.??
Fix Version/s: 0.7.0
Top-N queries produce incorrect results when a store statement is added between order by and limit statement Key: PIG-1169 URL: https://issues.apache.org/jira/browse/PIG-1169 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0
??We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script.??
{code}
raw_data = Load 'input_files' AS (f1, f2, ..., fn);
grouped = group raw_data by (f1, f2);
data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
ordered = order data by value DESC parallel 10;
topn = limit ordered 10;
store ordered into 'outputdir/full';
store topn into 'outputdir/topn';
{code}
??With the statement 'store ordered ...', top N results are incorrect, but without the statement, results are correct. Has anyone seen this before? I know a similar bug has been fixed in the multi-query release. We are on pig .4 and hadoop .20.1.??
[jira] Assigned: (PIG-1169) Top-N queries produce incorrect results when a store statement is added between order by and limit statement
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1169: - Assignee: Daniel Dai (was: Richard Ding) Top-N queries produce incorrect results when a store statement is added between order by and limit statement Key: PIG-1169 URL: https://issues.apache.org/jira/browse/PIG-1169 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Richard Ding Assignee: Daniel Dai Fix For: 0.7.0
??We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script.??
{code}
raw_data = Load 'input_files' AS (f1, f2, ..., fn);
grouped = group raw_data by (f1, f2);
data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
ordered = order data by value DESC parallel 10;
topn = limit ordered 10;
store ordered into 'outputdir/full';
store topn into 'outputdir/topn';
{code}
??With the statement 'store ordered ...', top N results are incorrect, but without the statement, results are correct. Has anyone seen this before? I know a similar bug has been fixed in the multi-query release. We are on pig .4 and hadoop .20.1.??
[jira] Created: (PIG-1202) explain plan throws out exception
explain plan throws out exception -- Key: PIG-1202 URL: https://issues.apache.org/jira/browse/PIG-1202 Project: Pig Issue Type: Bug Reporter: Ying He
Run the following script:
a = load 's/part*' as (id:int, f:chararray);
b = load 's/part*' as (id:int, f:chararray);
c = join a by id, b by id;
d = filter c by a::f == 'apple';
explain d;
The following error message is produced: ERROR 1067: Unable to explain alias d
[jira] Created: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
Temporarily disable failed unit test in load-store-redesign branch which have external dependency - Key: PIG-1203 URL: https://issues.apache.org/jira/browse/PIG-1203 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200; TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will enable them once the dependent issues are solved.
[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1203: Issue Type: Sub-task (was: Bug) Parent: PIG-966 Temporarily disable failed unit test in load-store-redesign branch which have external dependency - Key: PIG-1203 URL: https://issues.apache.org/jira/browse/PIG-1203 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200; TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will enable them once the dependent issues are solved.
[jira] Updated: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra
[ https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1201: -- Status: Open (was: Patch Available) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra -- Key: PIG-1201 URL: https://issues.apache.org/jira/browse/PIG-1201 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1201.patch
[jira] Updated: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1203: Attachment: PIG-1203-1.patch Patch for the load-store-redesign branch Temporarily disable failed unit test in load-store-redesign branch which have external dependency - Key: PIG-1203 URL: https://issues.apache.org/jira/browse/PIG-1203 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1203-1.patch In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200; TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will enable them once the dependent issues are solved.
[jira] Commented: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra
[ https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805195#action_12805195 ] Yan Zhou commented on PIG-1201: ---
An HDFS listStatus call from every mapper to the name node is costly, particularly if the target has a huge number of disk entries, i.e., files and directories. Zebra has this problem in two ways:
1) For unsorted tables, the index is not built on disk. The input split, which is a tfile row split, carries a file index that each and every mapper has to map to a file name using the index, which contains the file names in order and their sizes. Building the index issues the listStatus call because it needs information about all files, and if the number of files is huge, this strains name node resources. Instead, the file index in the split can simply be replaced with the file name, so that the mapping, and consequently the index, is not needed at all for routine operations such as queries against the tables. For other informational requests like dumpInfo, where a comprehensive picture is required, the index can be built as needed. The on-disk index is still preferred, as it would save one listStatus call by the front end, but it would require more changes to support backward compatibility, and the meta file that holds the index does not support versioning. Consequently, this work is deferred to a future release, although the on-disk index will be built for future convenience.
2) Each BasicTable.Reader, at construction, checks and marks all deleted CGs in the SchemaFile.setCGDeletedFlags method, which issues the listStatus call. This may not be as bad as the case in 1), but for tables with lots of CGs it could present a problem. Instead, the check can be made only by the front end, with the result passed to the mappers.
The huge JobConf serialization size in the Pig loader implementation will be fixed by serializing only the few configuration variables that Zebra needs.
[zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra -- Key: PIG-1201 URL: https://issues.apache.org/jira/browse/PIG-1201 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1201.patch
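The last point of the comment above, serializing only the configuration variables Zebra needs rather than the whole JobConf, can be illustrated with a minimal sketch. It uses plain java.util collections instead of Hadoop's JobConf, and the class name, the `zebra.` prefix and the `zebra.example.projection` key are hypothetical, chosen only for illustration; the actual patch may select its variables differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfSubsetSketch {
    // Hypothetical prefix marking the settings the loader actually needs.
    static final String NEEDED_PREFIX = "zebra.";

    /** Returns a copy of conf holding only the entries whose key starts with the prefix. */
    public static Map<String, String> selectNeeded(Map<String, String> conf) {
        Map<String, String> subset = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            if (entry.getKey().startsWith(NEEDED_PREFIX)) {
                subset.put(entry.getKey(), entry.getValue());
            }
        }
        return subset;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("mapred.job.name", "example");       // unrelated job setting
        conf.put("io.sort.mb", "100");                // unrelated job setting
        conf.put("zebra.example.projection", "a,b");  // hypothetical Zebra setting
        // Only the small subset would be serialized and shipped to the mappers.
        System.out.println(selectNeeded(conf));
    }
}
```

The design point is that the serialized payload grows with the number of selected variables, not with the size of the full job configuration.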
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805207#action_12805207 ] Pradeep Kamath commented on PIG-1090: - +1 for PIG-1090-15.patch, patch committed. Here are the results of running ant test-patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
[exec]
Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Updated: (PIG-1045) Integration with Hadoop 20 New API
[ https://issues.apache.org/jira/browse/PIG-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1045: -- Fix Version/s: 0.7.0 Integration with Hadoop 20 New API -- Key: PIG-1045 URL: https://issues.apache.org/jira/browse/PIG-1045 Project: Pig Issue Type: New Feature Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1045.patch, PIG-1045.patch Hadoop 21 is not yet released, but we know that the switch to the new MR API is coming with it. This JIRA is for early integration with the portion of this API that has been implemented in Hadoop 20.
[jira] Updated: (PIG-1200) Using TableInputFormat in HBaseStorage
[ https://issues.apache.org/jira/browse/PIG-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1200: Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) +1, patch committed on Jeff's behalf - thanks Jeff! Using TableInputFormat in HBaseStorage -- Key: PIG-1200 URL: https://issues.apache.org/jira/browse/PIG-1200 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Jeff Zhang Assignee: Jeff Zhang Fix For: 0.7.0 Attachments: Pig_1200.patch
[jira] Commented: (PIG-1203) Temporarily disable failed unit test in load-store-redesign branch which have external dependency
[ https://issues.apache.org/jira/browse/PIG-1203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805226#action_12805226 ] Pradeep Kamath commented on PIG-1203: - I committed the patch with a change to disable only TestCounters, since PIG-1200 addresses the TestHBaseStorage failures. Temporarily disable failed unit test in load-store-redesign branch which have external dependency - Key: PIG-1203 URL: https://issues.apache.org/jira/browse/PIG-1203 Project: Pig Issue Type: Sub-task Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1203-1.patch In the load-store-redesign branch, two test suites, TestHBaseStorage and TestCounters, always fail. TestHBaseStorage depends on https://issues.apache.org/jira/browse/PIG-1200; TestCounters depends on a future version of Hadoop. We disable these two test suites temporarily and will enable them once the dependent issues are solved.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1090: Affects Version/s: 0.7.0 Fix Version/s: 0.7.0 Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Affects Versions: 0.7.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-12.patch, PIG-1090-13.patch, PIG-1090-14.patch, PIG-1090-15.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
[jira] Resolved: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath resolved PIG-1094. - Resolution: Fixed Fix Version/s: 0.7.0 Hadoop Flags: [Incompatible change] Marking this issue as fixed since all unit tests except TestCounters now pass - the TestCounters failure will be tracked in PIG-1203 Fix unit tests corresponding to source changes so far - Key: PIG-1094 URL: https://issues.apache.org/jira/browse/PIG-1094 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.7.0 Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch, PIG-1094_7.patch The check-ins so far on the load-store-redesign branch have not addressed unit test failures due to interface changes. This jira is to track the task of making the common case unit tests work with the new interfaces. Some aspects of the new proposal, like using the LoadCaster interface for casting and making local mode work, have not been completed yet. Tests which are failing for those reasons will not be fixed in this jira and will be addressed in the jiras corresponding to those tasks.
[jira] Updated: (PIG-613) Casting elements inside a tuple does not take effect
[ https://issues.apache.org/jira/browse/PIG-613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-613: --- Fix Version/s: 0.7.0 Assignee: Daniel Dai Casting elements inside a tuple does not take effect Key: PIG-613 URL: https://issues.apache.org/jira/browse/PIG-613 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.2.0 Reporter: Viraj Bhat Assignee: Daniel Dai Fix For: 0.7.0 Attachments: myfloatdata.txt, SQUARE.java
Consider the following Pig script, which casts the return values of the SQUARE UDF, which are tuples of doubles, to long. The describe output of B shows the type is long; however, the result is still double.
{code}
register statistics.jar;
A = load 'myfloatdata.txt' using PigStorage() as (doublecol:double);
B = foreach A generate (tuple(long))statistics.SQUARE(doublecol) as squares:(loadtimesq);
describe B;
explain B;
dump B;
{code}
=== Describe output of B:
B: {squares: (loadtimesq: long)}
=== Sample output of B:
((7885.44))
((792098.2200010001))
((1497360.9268889998))
((50023.7956))
((0.972196))
((0.30980356))
((9.9760144E-7))
=== Cause: The cast for Tuples has not been implemented in POCast.java
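The behaviour that the Cause line says is missing can be sketched as an element-wise cast over the fields of a tuple. This is only an illustration: it uses a plain java.util.List in place of Pig's Tuple, the class and method names are hypothetical, and it is not the POCast implementation. It also assumes that a double-to-long cast truncates toward zero, as Java's own narrowing conversion does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TupleCastSketch {
    /** Casts every field of a tuple of doubles to long, keeping null fields as null. */
    public static List<Long> castFieldsToLong(List<Double> tuple) {
        List<Long> result = new ArrayList<>(tuple.size());
        for (Double field : tuple) {
            // longValue() truncates toward zero, e.g. 7885.44 becomes 7885.
            result.add(field == null ? null : Long.valueOf(field.longValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // First sample value from the report above; prints [7885].
        System.out.println(castFieldsToLong(Arrays.asList(7885.44)));
    }
}
```

With such a cast applied, the values dumped for B would agree with the long type that describe reports.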
[jira] Updated: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra
[ https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1201: -- Attachment: PIG-1201.patch [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra -- Key: PIG-1201 URL: https://issues.apache.org/jira/browse/PIG-1201 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1201.patch, PIG-1201.patch
[jira] Updated: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra
[ https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1201: -- Status: Patch Available (was: Open) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra -- Key: PIG-1201 URL: https://issues.apache.org/jira/browse/PIG-1201 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1201.patch, PIG-1201.patch
[jira] Updated: (PIG-1204) Join two streaming relations hang in local mode
[ https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1204: -- Affects Version/s: (was: 0.5.0) 0.6.0 Status: Patch Available (was: Open) Join two streaming relations hang in local mode --- Key: PIG-1204 URL: https://issues.apache.org/jira/browse/PIG-1204 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1204.patch
The following script hangs when running in local mode if the input files contain many lines (e.g. 10K). The same script works when running in MR mode.
{code}
A = load 'input1' as (a0, a1, a2);
B = stream A through `head -1` as (a0, a1, a2);
C = load 'input2' as (a0, a1, a2);
D = stream C through `head -1` as (a0, a1, a2);
E = join B by a0, D by a0;
dump E
{code}
Here is one stack trace:
Thread-13 prio=10 tid=0x09938400 nid=0x1232 in Object.wait() [0x8fffe000..0x8030]
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291)
    - locked 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
[jira] Updated: (PIG-1204) Join two streaming relations hang in local mode
[ https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1204: -- Attachment: PIG-1204.patch
The cause was that a final class variable was modified by another class and, in local mode, all the mappers run in the same JVM, which resulted in the deadlock. This patch provides a fix.
Join two streaming relations hang in local mode --- Key: PIG-1204 URL: https://issues.apache.org/jira/browse/PIG-1204 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1204.patch
The following script hangs when running in local mode if the input files contain many lines (e.g. 10K). The same script works when running in MR mode.
{code}
A = load 'input1' as (a0, a1, a2);
B = stream A through `head -1` as (a0, a1, a2);
C = load 'input2' as (a0, a1, a2);
D = stream C through `head -1` as (a0, a1, a2);
E = join B by a0, D by a0;
dump E
{code}
Here is one stack trace:
Thread-13 prio=10 tid=0x09938400 nid=0x1232 in Object.wait() [0x8fffe000..0x8030]
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291)
    - locked 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
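The cause given in Richard Ding's comment above rests on two general Java facts: a final field fixes only the reference, not the state of the object it points to, and static state is shared by every thread in a JVM, which is how all mappers run in local mode. The sketch below illustrates only that sharing; the class and field names are hypothetical, it is not taken from the patch or from POStream, and it does not reproduce the deadlock itself.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedFinalStateSketch {
    // Hypothetical shared marker. 'final' fixes the reference only;
    // the list contents can still be changed by any class or thread.
    static final List<String> END_MARKER = new ArrayList<>();

    /** Runs two "mappers" as threads in this JVM and returns what the second one observes. */
    public static boolean secondMapperSeesCleanMarker() {
        END_MARKER.clear();
        final boolean[] clean = new boolean[1];
        Thread first = new Thread(() -> END_MARKER.add("changed by first mapper"));
        Thread second = new Thread(() -> clean[0] = END_MARKER.isEmpty());
        try {
            first.start();
            first.join();   // the first mapper finishes before the second starts
            second.start();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return clean[0];
    }

    public static void main(String[] args) {
        // Prints false: the second thread sees the first thread's modification.
        System.out.println(secondMapperSeesCleanMarker());
    }
}
```

When each mapper runs in its own JVM, as in MR mode, every task gets its own copy of such static state, which is consistent with the report that the script only hangs in local mode.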
[jira] Commented: (PIG-1204) Join two streaming relations hang in local mode
[ https://issues.apache.org/jira/browse/PIG-1204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805364#action_12805364 ] Hadoop QA commented on PIG-1204:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12431490/PIG-1204.patch against trunk revision 903030.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/190/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/190/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/190/console
This message is automatically generated.
Join two streaming relations hang in local mode --- Key: PIG-1204 URL: https://issues.apache.org/jira/browse/PIG-1204 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Richard Ding Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1204.patch
The following script hangs when running in local mode if the input files contain many lines (e.g. 10K). The same script works when running in MR mode.
{code}
A = load 'input1' as (a0, a1, a2);
B = stream A through `head -1` as (a0, a1, a2);
C = load 'input2' as (a0, a1, a2);
D = stream C through `head -1` as (a0, a1, a2);
E = join B by a0, D by a0;
dump E
{code}
Here is one stack trace:
Thread-13 prio=10 tid=0x09938400 nid=0x1232 in Object.wait() [0x8fffe000..0x8030]
java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNextHelper(POStream.java:291)
    - locked 0x9b8e0a40 (a org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStream.getNext(POStream.java:214)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:272)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
[jira] Commented: (PIG-1201) [zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra
[ https://issues.apache.org/jira/browse/PIG-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805366#action_12805366 ] Hadoop QA commented on PIG-1201:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12431488/PIG-1201.patch against trunk revision 903030.
+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/178/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/178/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/178/console
This message is automatically generated.
[zebra] HDFS meta queries are issued by all mappers; Pig Loader serialize all JobConf contents including those unused by zebra -- Key: PIG-1201 URL: https://issues.apache.org/jira/browse/PIG-1201 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0, 0.7.0 Attachments: PIG-1201.patch, PIG-1201.patch