[jira] [Commented] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478709#comment-13478709 ]

Cheolsoo Park commented on PIG-2978:

Here is the difference between hadoop-1.0.x and 2.0.x:
{code:title=hadoop-1.0.x}
Storer[3].init()
Storer[3].setStoreFuncUDFContextSignature(A_1-1)
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[3].getOutputFormat()
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
{code}
{code:title=hadoop-2.0.x}
Storer[3].init()
Storer[3].setStoreFuncUDFContextSignature(A_1-1)
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[3].getOutputFormat()
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[4].init()
Storer[4].setStoreFuncUDFContextSignature(A_1-1)
Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[4].getOutputFormat()
Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
{code}
For whatever reason, getStoreFunc is repeated with hadoop-2.0.x. The call stack of the extra 4th instantiation is below:
{code}
Storer[4].init called by
org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:577)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:85)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
{code}

TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x

Key: PIG-2978
URL: https://issues.apache.org/jira/browse/PIG-2978
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.11
Attachments: PIG-2978.patch

To reproduce, please run:
{code}
ant clean test -Dtestcase=TestLoadStoreFuncLifeCycle -Dhadoopversion=23
{code}
This fails with the following error:
{code}
Error during parsing. Job in state DEFINE instead of RUNNING
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. Job in state DEFINE instead of RUNNING
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:529)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.testLoadStoreFunc(TestLoadStoreFuncLifeCycle.java:332)
Caused by: Failed to parse: Job in state DEFINE instead of RUNNING
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:193)
    at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of RUNNING
    at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:292)
    at org.apache.hadoop.mapreduce.Job.toString(Job.java:456)
    at java.lang.String.valueOf(String.java:2826)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.logCaller(TestLoadStoreFuncLifeCycle.java:270)
    at org.apache.pig.TestLoadStoreFuncLifeCycle.access$000(TestLoadStoreFuncLifeCycle.java:41)
    at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.logCaller(TestLoadStoreFuncLifeCycle.java:54)
    at org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.getSchema(TestLoadStoreFuncLifeCycle.java:115)
    at org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
    at org.apache.pig.newplan.logical.relational.LOLoad.init(LOLoad.java:88)
    at org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:839)
    at org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2738) pig.exec.reducers.max has no default value
[ https://issues.apache.org/jira/browse/PIG-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi resolved PIG-2738.

Resolution: Duplicate

pig.exec.reducers.max has no default value

Key: PIG-2738
URL: https://issues.apache.org/jira/browse/PIG-2738
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.10.0
Environment: Kubuntu 12.04 64Bit
Reporter: Johannes Schwenk

setDefaultsIfUnset in org/apache/pig/impl/util/PropertiesUtil.java does not set pig.exec.reducers.max to 999 as documented. As a consequence testDefaultPigProperties in org.apache.pig.test.TestPigServer fails with a NullPointerException accessing the property.
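The behaviour the report expects can be illustrated with a minimal sketch. This is not the code from PropertiesUtil.java: the class ReducerDefaults and the helper maxReducers are hypothetical names, and only the property key and the value 999 come from the report.

```java
import java.util.Properties;

// Hypothetical stand-in for the defaulting logic described in PIG-2738;
// this is NOT the code from org.apache.pig.impl.util.PropertiesUtil.
final class ReducerDefaults {
    static final String MAX_REDUCERS_KEY = "pig.exec.reducers.max";
    // Value the report says is documented as the default.
    static final String MAX_REDUCERS_DEFAULT = "999";

    // Sets the default only when the caller has not supplied a value.
    static void setDefaultsIfUnset(Properties properties) {
        if (properties.getProperty(MAX_REDUCERS_KEY) == null) {
            properties.setProperty(MAX_REDUCERS_KEY, MAX_REDUCERS_DEFAULT);
        }
    }

    // Reads the setting as an int and fails with a clear message,
    // rather than a NullPointerException, when the key is missing.
    static int maxReducers(Properties properties) {
        String value = properties.getProperty(MAX_REDUCERS_KEY);
        if (value == null) {
            throw new IllegalStateException(MAX_REDUCERS_KEY + " is not set");
        }
        return Integer.parseInt(value);
    }
}
```

With a default in place, a test that reads the property right after defaulting has a value to parse, and an explicitly configured value is left untouched.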
[jira] [Updated] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated
[ https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2933:

Attachment: PIG-2933.patch

Smallest patch ever :)

HBaseStorage is using setScannerCaching which is deprecated

Key: PIG-2933
URL: https://issues.apache.org/jira/browse/PIG-2933
Project: Pig
Issue Type: Bug
Reporter: Ted Malaska
Priority: Minor
Labels: hbase
Attachments: PIG-2933.patch

HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
Note: I'm on vacation starting tomorrow. If you want, I can fix this next week.
[jira] [Updated] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated
[ https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi updated PIG-2933:

Patch Info: Patch Available

HBaseStorage is using setScannerCaching which is deprecated

Key: PIG-2933
URL: https://issues.apache.org/jira/browse/PIG-2933
Project: Pig
Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
Labels: hbase
Attachments: PIG-2933.patch

HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
Note: I'm on vacation starting tomorrow. If you want, I can fix this next week.
[jira] [Assigned] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated
[ https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Prashant Kommireddi reassigned PIG-2933:

Assignee: Prashant Kommireddi

HBaseStorage is using setScannerCaching which is deprecated

Key: PIG-2933
URL: https://issues.apache.org/jira/browse/PIG-2933
Project: Pig
Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
Labels: hbase
Attachments: PIG-2933.patch

HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
Note: I'm on vacation starting tomorrow. If you want, I can fix this next week.
[jira] [Assigned] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy reassigned PIG-2985:

Assignee: Rohini Palaniswamy

TestRank1,2,3 fail with hadoop-2.0.x

Key: PIG-2985
URL: https://issues.apache.org/jira/browse/PIG-2985
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
Fix For: 0.11

To reproduce the error, please run:
{code}
ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
{code}
This fails with the following error:
{code}
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
{code}
I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2985:

Attachment: PIG-2985.patch

PigStatsUtil.java:
{code}
jobClient = job.getJobClient();
counters = jobClient.getJob(job.getAssignedJobID()).getCounters();
{code}
should be
{code}
counters = new Counters(job.getJob().getCounters());
{code}
for H23. Each Job and JobClient has its own instance of LocalJobRunner. To access the job information, we need to use the same Job/JobClient that the job was submitted with. In H20, the job is submitted using JobClient, while in H23 it is submitted using Job.

TestRank1,2,3 fail with hadoop-2.0.x

Key: PIG-2985
URL: https://issues.apache.org/jira/browse/PIG-2985
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
Fix For: 0.11
Attachments: PIG-2985.patch

To reproduce the error, please run:
{code}
ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
{code}
This fails with the following error:
{code}
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
{code}
I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2985:

Status: Patch Available (was: Open)

TestRank1,2,3 fail with hadoop-2.0.x

Key: PIG-2985
URL: https://issues.apache.org/jira/browse/PIG-2985
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
Fix For: 0.11
Attachments: PIG-2985.patch

To reproduce the error, please run:
{code}
ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
{code}
This fails with the following error:
{code}
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
{code}
I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2967) Fix Glob_local test failure for Pig E2E Test Framework
[ https://issues.apache.org/jira/browse/PIG-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-2967:

Fix Version/s: (was: 0.10.1) 0.12 0.11
Status: Patch Available (was: Open)

Fix Glob_local test failure for Pig E2E Test Framework

Key: PIG-2967
URL: https://issues.apache.org/jira/browse/PIG-2967
Project: Pig
Issue Type: Sub-task
Components: e2e harness
Affects Versions: 0.10.1
Reporter: Sushant Joshi
Priority: Minor
Fix For: 0.11, 0.12
Attachments: glob_local.patch

The Glob_3_local, Glob_4_local, and Glob_5_local E2E tests fail due to a checksum mismatch with the benchmark data.
[jira] [Updated] (PIG-2989) Illustrate for Rank Operator
[ https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2989:

Issue Type: Bug (was: Improvement)

Illustrate for Rank Operator

Key: PIG-2989
URL: https://issues.apache.org/jira/browse/PIG-2989
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Minor
Attachments: patch_1

A small update for rank operator, specifically the implementation of illustrate command.
[jira] [Updated] (PIG-2989) Illustrate for Rank Operator
[ https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2989:

Priority: Major (was: Minor)

Illustrate for Rank Operator

Key: PIG-2989
URL: https://issues.apache.org/jira/browse/PIG-2989
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Attachments: patch_1

A small update for rank operator, specifically the implementation of illustrate command.
[jira] [Updated] (PIG-2989) Illustrate for Rank Operator
[ https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Allan Avendaño updated PIG-2989:

Description: Especially useful when a quick view of the final results of the Rank operator is required. (was: A small update for rank operator, specifically the implementation of illustrate command.)

Illustrate for Rank Operator

Key: PIG-2989
URL: https://issues.apache.org/jira/browse/PIG-2989
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Attachments: patch_1

Especially useful when a quick view of the final results of the Rank operator is required.
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479102#comment-13479102 ]

Travis Crawford commented on PIG-2582:

Hey [~prkommireddi], this came up while working on PIG-2573. Basically it felt a bit janky to round the size to MB rather than just keep the size in bytes. Feel free to close if the consensus is to keep it as-is.

Store size in bytes (not mbytes) in ResourceStatistics

Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479178#comment-13479178 ]

Jonathan Coveney commented on PIG-2778:

Dmitriy, I see no reason not to commit this given the test report looks good. Agreed?

Add 'matches' operator to predicate pushdown

Key: PIG-2778
URL: https://issues.apache.org/jira/browse/PIG-2778
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
Attachments: PIG-2778.patch, test_e2e.log, test_unit.log

Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
[jira] [Commented] (PIG-2796) Local temporary paths are not always valid HDFS path names.
[ https://issues.apache.org/jira/browse/PIG-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479176#comment-13479176 ]

Jonathan Coveney commented on PIG-2796:

I feel like the real fix is to use FileLocalizer.getTemporaryPath(PigContext), no? This gives you a temporary path in HDFS. We can make sure that works on Windows (and it should).

Local temporary paths are not always valid HDFS path names.

Key: PIG-2796
URL: https://issues.apache.org/jira/browse/PIG-2796
Project: Pig
Issue Type: Sub-task
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
Fix For: 0.11
Attachments: 0006-Local-Remote-file-mapping-for-tests-with-temps.patch

A number of pig scripts follow the pattern:
File tempFile = File.createTempFile(this, .txt);
copyFromLocalToCluster (tempFile.to_string(), tempFile.to_string());
tempFile.delete();
The goal here seems to be to generate a temp filename to avoid issues on the next run if the file doesn't get cleaned up.
The problem is that File.createTempFile on Windows creates files with names like C:\users\myuser\App data\local\temp\file.txt, and : is not a valid DFS character, so the put fails.
The easy fix for this is to remove colons from the path before upload. Then we get something like C\users\myuser\App data\local\temp\file.txt, which is a valid DFS pathname with minimal impact to the tests.
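The "remove colons before upload" idea from the issue description can be sketched as follows. This is an illustration only: DfsPathNames and removeColons are hypothetical names, not taken from the attached patch, and the sketch does not cover the FileLocalizer-based alternative raised in the comment.

```java
// Hypothetical helper illustrating the colon-stripping idea from PIG-2796;
// the class and method names are NOT taken from the attached patch.
final class DfsPathNames {
    // Removes every ':' from a local path, so that a Windows path such as
    // C:\temp\file.txt no longer contains the character that the issue
    // reports as invalid in DFS path names.
    static String removeColons(String localPath) {
        return localPath.replace(":", "");
    }
}
```

Applied to the path from the description, the helper turns C:\users\myuser\App data\local\temp\file.txt into C\users\myuser\App data\local\temp\file.txt, which is the result the reporter describes.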
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479181#comment-13479181 ]

Prashant Kommireddi commented on PIG-2582:

I agree on storing size in bytes, just wanted to make sure I understand all the reasons you had in mind. It wouldn't be a huge change to make it happen, but changing the scope might be tricky if someone is using it outside of the Pig project. What do you think about marking the setter setmBytes(Long) deprecated and creating a new setter for bytes? To start with, we can at least have Pig refer to byte-based methods.

Store size in bytes (not mbytes) in ResourceStatistics

Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
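The deprecation approach proposed in this comment could look roughly like the sketch below. It is an assumption-laden illustration, not Pig code: SizeStats stands in for ResourceStatistics, getSizeInBytes/setSizeInBytes are invented names, and treating one megabyte as 1024 * 1024 bytes is also an assumption.

```java
// Hypothetical sketch of the deprecation approach discussed in PIG-2582.
// SizeStats stands in for ResourceStatistics; getSizeInBytes/setSizeInBytes
// are assumed names, not existing Pig methods.
final class SizeStats {
    // Assumption: 1 MB = 1024 * 1024 bytes.
    private static final long BYTES_PER_MB = 1024L * 1024L;

    private Long sizeInBytes; // null means "unknown"

    Long getSizeInBytes() {
        return sizeInBytes;
    }

    SizeStats setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    /** @deprecated use getSizeInBytes(); this rounds down to whole megabytes. */
    @Deprecated
    Long getmBytes() {
        return sizeInBytes == null ? null : sizeInBytes / BYTES_PER_MB;
    }

    /** @deprecated use setSizeInBytes(Long); kept so existing callers still compile. */
    @Deprecated
    SizeStats setmBytes(Long mBytes) {
        this.sizeInBytes = (mBytes == null) ? null : mBytes * BYTES_PER_MB;
        return this;
    }
}
```

Keeping the megabyte accessors as thin wrappers over a private byte field lets code outside the project keep working while Pig itself moves to the byte-based methods.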
[jira] [Commented] (PIG-2657) Print warning if using wrong jython version
[ https://issues.apache.org/jira/browse/PIG-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479186#comment-13479186 ]

Jonathan Coveney commented on PIG-2657:

Bumping again. Gotta love that JIRA report.

Print warning if using wrong jython version

Key: PIG-2657
URL: https://issues.apache.org/jira/browse/PIG-2657
Project: Pig
Issue Type: Bug
Reporter: Fabian Alenius
Fix For: 0.11, 0.10.1
Attachments: PIG-2657.1.patch, PIG-2657.2.patch

Hi,
It would be good if Pig would print a warning (or refuse to run) if you are using an unsupported version of jython. I spent a couple of hours before figuring out that you had to use 2.5.0. I've seen posts indicating that others have run into this problem as well.
Might write up a patch if others agree this is an issue.
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479191#comment-13479191 ]

Travis Crawford commented on PIG-2582:

Sounds good!

Store size in bytes (not mbytes) in ResourceStatistics

Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479213#comment-13479213 ]

Gianmarco De Francisci Morales commented on PIG-2985:

This is a simple bug fix, should go to both 0.11 and trunk.

TestRank1,2,3 fail with hadoop-2.0.x

Key: PIG-2985
URL: https://issues.apache.org/jira/browse/PIG-2985
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
Fix For: 0.11
Attachments: PIG-2985.patch

To reproduce the error, please run:
{code}
ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
{code}
This fails with the following error:
{code}
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
{code}
I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2985:

Resolution: Fixed
Status: Resolved (was: Patch Available)

+1, committed to both trunk and 0.11. Thanks Rohini!
Interestingly, tests with hadoop-2.0 take 1/3 of the time compared to hadoop-1.0.

TestRank1,2,3 fail with hadoop-2.0.x

Key: PIG-2985
URL: https://issues.apache.org/jira/browse/PIG-2985
Project: Pig
Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
Fix For: 0.11
Attachments: PIG-2985.patch

To reproduce the error, please run:
{code}
ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
{code}
This fails with the following error:
{code}
Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
    at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
Caused by: java.lang.NullPointerException
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
{code}
I see the failures with hadoop-2.0.x only.
[jira] [Created] (PIG-2991) Clarify document of Algebraic contracts
Andy Schlaikjer created PIG-2991:

Summary: Clarify document of Algebraic contracts
Key: PIG-2991
URL: https://issues.apache.org/jira/browse/PIG-2991
Project: Pig
Issue Type: Improvement
Components: documentation
Affects Versions: 0.10.0
Reporter: Andy Schlaikjer

Documentation of Algebraic contracts is somewhat confusing. It took me a while to understand that the Initial impl's exec method is passed a singleton bag of X, and should return the single X value so that the Intermed exec gets a proper bag of X. The builtins like SUM and COUNT are generally clearly written, but this specific point isn't easy to deduce from those impls either.

It would be great if the discussion at the following URL could be improved to make all Algebraic contracts more explicit:
http://pig.apache.org/docs/r0.10.0/udf.html#algebraic-interface

Also, detailed answers to the following questions would be great to include in some form:
Q: Does Pig make use of Initial, Intermed, Final class outputSchema methods? If so, how?
Q: If my Intermed or Final classes additionally implement Accumulator interface, does Pig take advantage of this?
Q: Should the parent UDF's outputSchema method always expect to be passed the same input schema, regardless of the context (algebraic, accumulative, regular exec) in which it is used?
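The contract as the reporter describes it can be mimicked in plain Java for a SUM-like function. This is a simulation under stated assumptions, not Pig's API: it uses no Algebraic, Tuple, or DataBag types, lists stand in for bags, and the chunking in sum() is only a hypothetical picture of how partial results might be grouped.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java simulation of the Initial / Intermed / Final flow for a SUM-like
// function, following the reporter's description in PIG-2991. It does NOT use
// Pig's Algebraic, Tuple, or DataBag types; lists stand in for bags.
final class AlgebraicSumSketch {
    // Initial: receives a singleton bag of X and returns that single X.
    static long initial(List<Long> singletonBag) {
        if (singletonBag.size() != 1) {
            throw new IllegalArgumentException("Initial expects exactly one value");
        }
        return singletonBag.get(0);
    }

    // Intermed: receives a bag of X (outputs of Initial or of other Intermed
    // calls) and returns one partial sum.
    static long intermed(List<Long> partials) {
        long total = 0L;
        for (long p : partials) {
            total += p;
        }
        return total;
    }

    // Final: receives a bag of partial sums and returns the final result.
    static long finalStep(List<Long> partials) {
        return intermed(partials);
    }

    // Hypothetical driver: Initial per value, Intermed per chunk, Final once.
    static long sum(List<Long> values, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<Long> chunkResults = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            List<Long> afterInitial = new ArrayList<>();
            for (Long v : values.subList(start, end)) {
                afterInitial.add(initial(Collections.singletonList(v)));
            }
            chunkResults.add(intermed(afterInitial));
        }
        return finalStep(chunkResults);
    }
}
```

The point the sketch makes is the one from the report: because Initial hands back a single X per input, Intermed always sees a bag whose elements have the same shape, whether they came from Initial or from an earlier Intermed call.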
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479236#comment-13479236 ]

Prashant Kommireddi commented on PIG-2582:

Should methods like getAvgRecordSize() be returning size in bytes? It's not called from within the project, but I'm not sure if such a change would be acceptable to users.

Store size in bytes (not mbytes) in ResourceStatistics

Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479244#comment-13479244 ]

Dmitriy V. Ryaboy commented on PIG-2778:

+1

Add 'matches' operator to predicate pushdown

Key: PIG-2778
URL: https://issues.apache.org/jira/browse/PIG-2778
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
Attachments: PIG-2778.patch, test_e2e.log, test_unit.log

Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
Re: CHANGES.txt in branches
OK, I fixed a bunch of them. There was also some misspelling on Pig issue numbers, which made everything even more confusing :)
Cheers,
-- Gianmarco

On Tue, Oct 16, 2012 at 10:59 PM, Bill Graham billgra...@gmail.com wrote:
Also guilty as of about 15 minutes ago. I just moved my entry for PIG-2976 to the Pig 0.11 section on the trunk. Great catch.

On Tue, Oct 16, 2012 at 9:46 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Guilty.. I guess we should be putting them under 0.11 in trunk.

On Tue, Oct 16, 2012 at 8:18 PM, Jonathan Coveney jcove...@gmail.com wrote:
AFAIK (and I don't really know), I thought that if we put it in both, that it'd go in the pig 11 section in trunk, and if not, we don't. Is this correct? Good job noticing this.

2012/10/16 Gianmarco De Francisci Morales g...@apache.org
Hi devs,
I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk. It seems some people are putting patches on top in both versions of the file, while others are putting changes that get into 0.11 in the 0.11 section of the trunk file. Let me show an example:

This is 0.11:
Pig Change Log
Release 0.11.0 (unreleased)
INCOMPATIBLE CHANGES
PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates)
IMPROVEMENTS
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham)

And this is trunk:
Pig Change Log
Trunk (unreleased changes)
INCOMPATIBLE CHANGES
IMPROVEMENTS
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms)
PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon via gates)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
OPTIMIZATIONS
BUG FIXES
PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy)
Release 0.11.0 (unreleased)
INCOMPATIBLE CHANGES
PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates)
IMPROVEMENTS
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2910: Add function to read schema from outout of Schema.toString() (initialcontext via thejas)
PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham)

Notice how PIG-2943, PIG-2793, PIG-2908 are listed under trunk in the trunk file and under 0.11 in the 0.11 file. PIG-2910 is under 0.11 in the trunk file but missing from the 0.11 file (I guess it is a small mistake).
So, what's the correct behavior? Do we mark a patch in CHANGES.txt at the earliest place it appears in the code (so that CHANGES.txt is consistent across releases)? Or do we treat the branches independently, and thus always put each patch at the top?
Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't have a strong opinion on it (as long as we are consistent).
Cheers,
-- Gianmarco
[jira] [Resolved] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2947. - Resolution: Fixed Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Fix For: 0.11 Attachments: patch_01, patch_02, patch_03 User documentation for recently released Rank operator, with some basic explanation of usage and examples
[jira] [Resolved] (PIG-2922) Documentation and examples for RANK
[ https://issues.apache.org/jira/browse/PIG-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2922. - Resolution: Duplicate Documentation and examples for RANK --- Key: PIG-2922 URL: https://issues.apache.org/jira/browse/PIG-2922 Project: Pig Issue Type: Improvement Components: documentation Reporter: Gianmarco De Francisci Morales Assignee: Allan Avendaño Labels: documentation We need documentation and examples for the newly introduced RANK command.
[jira] [Updated] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Kommireddi updated PIG-2582:
Attachment: PIG-2582.patch
Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582 URL: https://issues.apache.org/jira/browse/PIG-2582 Project: Pig Issue Type: Bug Reporter: Travis Crawford Priority: Minor Attachments: PIG-2582.patch
In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
[jira] [Updated] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Prashant Kommireddi updated PIG-2582:
Patch Info: Patch Available
Assignee: Prashant Kommireddi
Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582 URL: https://issues.apache.org/jira/browse/PIG-2582 Project: Pig Issue Type: Bug Reporter: Travis Crawford Assignee: Prashant Kommireddi Priority: Minor Attachments: PIG-2582.patch
In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.
{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}
Typically sizes are stored as bytes, potentially having convenience functions to return with different units. If mBytes can be marked private without causing woes it might be worth storing size as bytes instead.
[jira] [Commented] (PIG-2796) Local temporary paths are not always valid HDFS path names.
[ https://issues.apache.org/jira/browse/PIG-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479320#comment-13479320 ]
John Gordon commented on PIG-2796:
That sounds very promising, I will investigate and come back with findings or a patch.
Local temporary paths are not always valid HDFS path names.
Key: PIG-2796 URL: https://issues.apache.org/jira/browse/PIG-2796 Project: Pig Issue Type: Sub-task Affects Versions: 0.10.0 Reporter: John Gordon Assignee: John Gordon Fix For: 0.11 Attachments: 0006-Local-Remote-file-mapping-for-tests-with-temps.patch
A number of pig scripts follow the pattern:
    File tempFile = File.createTempFile("this", ".txt");
    copyFromLocalToCluster(tempFile.toString(), tempFile.toString());
    tempFile.delete();
The goal here seems to be to generate a temp filename to avoid issues on the next run if the file doesn't get cleaned up. The problem is that File.createTempFile on Windows creates files with names like C:\users\myuser\App data\local\temp\file.txt. Because ':' is not a valid DFS path character, the put fails. The easy fix is to remove colons from the path before upload. Then we get something like C\users\myuser\App data\local\temp\file.txt, which is a valid DFS pathname, with minimal impact to the tests.
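The "easy fix" described in PIG-2796 can be sketched as below. This is only an illustration under stated assumptions, not the attached patch: DfsPathSanitizer and removeColons are hypothetical names, and the sketch only strips ':' characters as the description suggests, without touching backslashes or anything else.

```java
// Hypothetical helper for illustration; not part of Pig's test utilities.
public class DfsPathSanitizer {

    // Drops every ':' so a Windows-style local temp path can be reused as a DFS file name.
    public static String removeColons(String localPath) {
        return localPath.replace(":", "");
    }

    public static void main(String[] args) {
        String local = "C:\\users\\myuser\\App data\\local\\temp\\file.txt";
        // The local file keeps its real name; only the remote (DFS) name is sanitized.
        System.out.println(removeColons(local));
    }
}
```

With the path from the description, the sanitized remote name is C\users\myuser\App data\local\temp\file.txt, matching the example given in the issue.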
[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK
[ https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479352#comment-13479352 ] Leonardo Rangel Augusto commented on PIG-2405: -- in TestDataModel, the underlying problem is using an order-independent structure in an order-dependent test. What about keeping it simple and removing the HashMap from the Tuple, instead of replacing it with a LinkedHashMap? svn tags/release-0.9.1: some unit test case failed with open JDK Key: PIG-2405 URL: https://issues.apache.org/jira/browse/PIG-2405 Project: Pig Issue Type: Bug Affects Versions: 0.9.1 Environment: ant-1.8.2 open jdk: 1.6 Reporter: fang fang chen Assignee: fang fang chen Attachments: 2405_1.patch, 2405_2.patch [junit] Test org.apache.pig.test.TestDataModel FAILED Testcase: testTupleToString took 0.004 sec FAILED toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14... junit.framework.ComparisonFailure: toString expected:...ad a little lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a little lamb)},[[goodbye#all,hello#world]],42,50,3.14... 
at org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec Testcase: testHeterogeneousScans took 0.018 sec Caused an ERROR java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) java.lang.RuntimeException: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162) at org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035) at org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980) at org.apache.hadoop.conf.Configuration.get(Configuration.java:436) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:271) at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:167) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:130) at org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809) at org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741) Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many open files) at java.io.FileInputStream.init(FileInputStream.java:112) at java.io.FileInputStream.init(FileInputStream.java:72) at sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70) at sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161) at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source) at org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source) at 
org.apache.xerces.parsers.XMLParser.parse(Unknown Source) at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source) at javax.xml.parsers.DocumentBuilder.parse(Unknown Source) at org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079) Caused an ERROR Could not resolve the DNS name of hostname:39611 java.lang.IllegalArgumentException: Could not resolve the DNS name of hostname:39611 at org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) at org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:66) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:145) at
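On the TestDataModel failure discussed in the PIG-2405 comment above: the order-dependence can be illustrated with a small sketch. It is not the test or either attached patch; MapOrderDemo and its render method are hypothetical, loosely modeled on the key#value form in the failure message. A LinkedHashMap iterates in insertion order, whereas a plain HashMap gives no ordering guarantee, which is why the same assertion can pass on one JDK and fail on another.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical demo class; not taken from Pig's TestDataModel.
public class MapOrderDemo {

    // Builds the map in the order the expected string assumes.
    // LinkedHashMap preserves insertion order; HashMap would not guarantee it.
    public static Map<String, String> orderedMap() {
        Map<String, String> m = new LinkedHashMap<String, String>();
        m.put("hello", "world");
        m.put("goodbye", "all");
        return m;
    }

    // Renders entries as key#value pairs in the map's iteration order.
    public static String render(Map<String, String> m) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('#').append(e.getValue());
            first = false;
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        System.out.println(render(orderedMap())); // [hello#world,goodbye#all]
    }
}
```

The alternative raised in the comment, dropping the map from the tuple under test, avoids the question entirely; the sketch only shows why the LinkedHashMap replacement makes the string stable.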
[jira] [Updated] (PIG-2926) TestPoissonSampleLoader failing on rhel environment
[ https://issues.apache.org/jira/browse/PIG-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jonathan Coveney updated PIG-2926:
Resolution: Fixed
Fix Version/s: 0.11
Status: Resolved (was: Patch Available)
TestPoissonSampleLoader failing on rhel environment
Key: PIG-2926 URL: https://issues.apache.org/jira/browse/PIG-2926 Project: Pig Issue Type: Sub-task Reporter: Koji Noguchi Assignee: Jonathan Coveney Priority: Minor Fix For: 0.11 Attachments: PIG-2926-0.patch
Testing on rhel environment, TestPoissonSampleLoader fails with
{noformat}
Testcase: testNumSamples took 22.077 sec
FAILED
expected:47 but was:42
junit.framework.AssertionFailedError: expected:47 but was:42
at org.apache.pig.test.TestPoissonSampleLoader.testNumSamples(TestPoissonSampleLoader.java:125)
{noformat}
From
{noformat}
124 count = testNumSamples(0.0001, 100);
125 assertEquals(count, 42);
{noformat}
This runs fine on my mac environment.
[jira] [Updated] (PIG-2958) Pig tests do not appear to have a logger attached
[ https://issues.apache.org/jira/browse/PIG-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Coveney updated PIG-2958: -- Resolution: Fixed Status: Resolved (was: Patch Available) Pig tests do not appear to have a logger attached - Key: PIG-2958 URL: https://issues.apache.org/jira/browse/PIG-2958 Project: Pig Issue Type: Sub-task Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.11 Attachments: PIG-2958-1.patch This causes false failures in TestPigRunner, but also makes debugging somewhat more difficult than it has to be.
[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479392#comment-13479392 ] Jonathan Coveney commented on PIG-2778: --- Cheolsoo, it doesn't apply cleanly anymore. Any chance you can rebase it off trunk? Add 'matches' operator to predicate pushdown Key: PIG-2778 URL: https://issues.apache.org/jira/browse/PIG-2778 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Cheolsoo Park Attachments: PIG-2778.patch, test_e2e.log, test_unit.log Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479402#comment-13479402 ]
Koji Noguchi commented on PIG-2975:
bq. Result incorrect (when order-by used). [0.11 and trunk]
Reading the code, I was able to come up with an incorrect-result case in 0.10. It's probably rare, since the type has to be unknown. Any Datatype that has less than 2 bytes of header size in BinInterSedes.java can hit this issue.
{noformat}
$ pig -version
USING: /grid/0/gs/pig/current
Apache Pig version 0.10.1.0.1206081058 (r1348169)
$ cat pig-2975-mixed.pig
a = load 'pig-2975-mixed1.txt' as (a0:chararray, a1:chararray);
b = load 'pig-2975-mixed2.txt' as (b0:int);
y = union a,b;
z = order y by $0;
dump z;
$ cat pig-2975-mixed1.txt
a b b c d e
$ cat pig-2975-mixed2.txt
0 1 0 1
{noformat}
TestTypedMap.testOrderBy failing with incorrect result
Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
Looked at
{noformat}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
{noformat}
This looks like a valid test case failing with incorrect result.
{noformat}
% cat test/orderby.txt
[key#1,key9#23]
[key#3,key3#2]
[key#22]
% cat test/orderby.pig
a = load 'test/orderby.txt' as (m:[]);
b = foreach a generate m#'key' as b0;
dump b;
c = order b by b0;
dump c;
% java ... org.apache.pig.Main -x local test/orderby.pig
[dump b]
(1)
(3)
(22)
...
[dump c]
(1)
(1)
(22)
% where did the '(3)' go?
{noformat}
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479404#comment-13479404 ]
Koji Noguchi commented on PIG-2975:
Silly me. Result of above script was
{noformat}
(0)
(0)
(0)
(0)
(a,b)
(b,c)
(d,e)
{noformat}
TestTypedMap.testOrderBy failing with incorrect result
Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
Looked at
{noformat}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
{noformat}
This looks like a valid test case failing with incorrect result.
{noformat}
% cat test/orderby.txt
[key#1,key9#23]
[key#3,key3#2]
[key#22]
% cat test/orderby.pig
a = load 'test/orderby.txt' as (m:[]);
b = foreach a generate m#'key' as b0;
dump b;
c = order b by b0;
dump c;
% java ... org.apache.pig.Main -x local test/orderby.pig
[dump b]
(1)
(3)
(22)
...
[dump c]
(1)
(1)
(22)
% where did the '(3)' go?
{noformat}
[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479410#comment-13479410 ] Cheolsoo Park commented on PIG-2778: Hi Jonathan, sorry about that. It applies to the github mirror, but I haven't tried it against the svn trunk. The svn repository is not responding at the moment... I will rebase it as soon as I can check out. Thanks! Add 'matches' operator to predicate pushdown Key: PIG-2778 URL: https://issues.apache.org/jira/browse/PIG-2778 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Cheolsoo Park Attachments: PIG-2778.patch, test_e2e.log, test_unit.log Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479421#comment-13479421 ]
Jonathan Coveney commented on PIG-2975:
Koji, I don't think we need to sacrifice performance if we use BinInterSedes.BinInterSedesRawComparator. It traverses the bytes; it doesn't deserialize or create any objects (and I think I found an improvement we can make). As for sort order, I think it's meant to be somewhat odd on purpose.
TestTypedMap.testOrderBy failing with incorrect result
Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt
Looked at
{noformat}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
{noformat}
This looks like a valid test case failing with incorrect result.
{noformat}
% cat test/orderby.txt
[key#1,key9#23]
[key#3,key3#2]
[key#22]
% cat test/orderby.pig
a = load 'test/orderby.txt' as (m:[]);
b = foreach a generate m#'key' as b0;
dump b;
c = order b by b0;
dump c;
% java ... org.apache.pig.Main -x local test/orderby.pig
[dump b]
(1)
(3)
(22)
...
[dump c]
(1)
(1)
(22)
% where did the '(3)' go?
{noformat}
Build failed in Jenkins: Pig-trunk #1339
See https://builds.apache.org/job/Pig-trunk/1339/changes Changes: [jcoveney] Fix CHANGES.txt (jcoveney) [jcoveney] Fix CHANGES.txt (jcoveney) [jcoveney] PIG-2958: Pig tests do not appear to have a logger attached (daijyc via jcoveney) [jcoveney] PIG-2972: TestPoissonSampleLoader failing on rhel environment (jcoveney) [gdfm] Fixed problems with CHANGES.txt [gdfm] PIG-2985: TestRank1,2,3 fail with hadoop-2.0.x (rohini via azaroth) -- [...truncated 6629 lines...] [findbugs] org.apache.hadoop.fs.FSDataInputStream [findbugs] org.python.core.PyObject [findbugs] jline.History [findbugs] org.jruby.embed.internal.LocalContextProvider [findbugs] org.apache.hadoop.io.BooleanWritable [findbugs] org.apache.log4j.Logger [findbugs] org.apache.hadoop.hbase.filter.FamilyFilter [findbugs] org.codehaus.jackson.annotate.JsonPropertyOrder [findbugs] groovy.lang.Tuple [findbugs] org.antlr.runtime.IntStream [findbugs] org.apache.hadoop.util.ReflectionUtils [findbugs] org.apache.hadoop.fs.ContentSummary [findbugs] org.jruby.runtime.builtin.IRubyObject [findbugs] org.jruby.RubyInteger [findbugs] org.python.core.PyTuple [findbugs] org.mortbay.log.Log [findbugs] org.apache.hadoop.conf.Configuration [findbugs] com.google.common.base.Joiner [findbugs] org.apache.hadoop.mapreduce.lib.input.FileSplit [findbugs] org.apache.hadoop.mapred.Counters$Counter [findbugs] com.jcraft.jsch.Channel [findbugs] org.apache.hadoop.mapred.JobPriority [findbugs] org.apache.commons.cli.Options [findbugs] org.apache.hadoop.mapred.JobID [findbugs] org.apache.hadoop.util.bloom.BloomFilter [findbugs] org.python.core.PyFrame [findbugs] org.apache.hadoop.hbase.filter.CompareFilter [findbugs] org.apache.hadoop.util.VersionInfo [findbugs] org.python.core.PyString [findbugs] org.apache.hadoop.io.Text$Comparator [findbugs] org.jruby.runtime.Block [findbugs] org.antlr.runtime.MismatchedSetException [findbugs] org.apache.hadoop.io.BytesWritable [findbugs] org.apache.hadoop.fs.FsShell [findbugs] org.joda.time.Months [findbugs] 
org.mozilla.javascript.ImporterTopLevel [findbugs] org.apache.hadoop.hbase.mapreduce.TableOutputFormat [findbugs] org.apache.hadoop.mapred.TaskReport [findbugs] org.apache.hadoop.security.UserGroupInformation [findbugs] org.antlr.runtime.tree.RewriteRuleSubtreeStream [findbugs] org.apache.commons.cli.HelpFormatter [findbugs] com.google.common.collect.Maps [findbugs] org.joda.time.ReadableInstant [findbugs] org.mozilla.javascript.NativeObject [findbugs] org.apache.hadoop.hbase.HConstants [findbugs] org.apache.hadoop.io.serializer.Deserializer [findbugs] org.antlr.runtime.FailedPredicateException [findbugs] org.apache.hadoop.io.compress.CompressionCodec [findbugs] org.jruby.RubyNil [findbugs] org.apache.hadoop.fs.FileStatus [findbugs] org.apache.hadoop.hbase.client.Result [findbugs] org.apache.hadoop.mapreduce.JobContext [findbugs] org.codehaus.jackson.JsonGenerator [findbugs] org.apache.hadoop.mapreduce.TaskAttemptContext [findbugs] org.apache.hadoop.io.BytesWritable$Comparator [findbugs] org.apache.hadoop.io.LongWritable$Comparator [findbugs] org.codehaus.jackson.map.util.LRUMap [findbugs] org.apache.hadoop.hbase.util.Bytes [findbugs] org.antlr.runtime.MismatchedTokenException [findbugs] org.codehaus.jackson.JsonParser [findbugs] com.jcraft.jsch.UserInfo [findbugs] org.python.core.PyException [findbugs] org.apache.commons.cli.ParseException [findbugs] org.apache.hadoop.io.compress.CompressionOutputStream [findbugs] org.apache.hadoop.hbase.filter.WritableByteArrayComparable [findbugs] org.antlr.runtime.tree.CommonTreeNodeStream [findbugs] org.apache.log4j.Level [findbugs] org.apache.hadoop.hbase.client.Scan [findbugs] org.jruby.anno.JRubyMethod [findbugs] org.apache.hadoop.mapreduce.Job [findbugs] com.google.common.util.concurrent.Futures [findbugs] org.apache.commons.logging.LogFactory [findbugs] org.apache.commons.collections.IteratorUtils [findbugs] org.apache.commons.codec.binary.Base64 [findbugs] org.codehaus.jackson.map.ObjectMapper [findbugs] 
org.apache.hadoop.fs.FileSystem [findbugs] org.jruby.embed.LocalContextScope [findbugs] org.apache.hadoop.hbase.filter.FilterList$Operator [findbugs] org.jruby.RubySymbol [findbugs] org.apache.hadoop.hbase.io.ImmutableBytesWritable [findbugs] org.apache.hadoop.io.serializer.SerializationFactory [findbugs] org.antlr.runtime.tree.TreeAdaptor [findbugs] org.apache.hadoop.mapred.RunningJob [findbugs] org.antlr.runtime.CommonTokenStream [findbugs] org.apache.hadoop.io.DataInputBuffer [findbugs] org.apache.hadoop.io.file.tfile.TFile [findbugs] org.apache.commons.cli.GnuParser [findbugs] org.mozilla.javascript.Context [findbugs] org.apache.hadoop.io.FloatWritable [findbugs]
[jira] [Commented] (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor
[ https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479472#comment-13479472 ]
gustavo riveros commented on PIG-366:
[~arov] I'm new to Pig and am having trouble getting started. I've been trying to install PigEditor in Eclipse Helios (SR2), but after adding the following file to Eclipse's plugins folder: alexrovner-PigEditor-398a1af/PigEclipseUpdateSite/plugin / org.apache.pigeditor_1.0.0.4.jar, nothing changes within the platform and no Pig environment starts. I'd appreciate your help, as I see that you are an expert in Pig. (This is part of an implementation for my thesis.)
PigPen - Eclipse plugin for a graphical PigLatin editor
Key: PIG-366 URL: https://issues.apache.org/jira/browse/PIG-366 Project: Pig Issue Type: New Feature Reporter: Shubham Chopra Assignee: Robert Gibbon Priority: Minor Labels: gsoc, mentor Attachments: org.apache.pig.pigpen_0.0.1.jar, org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, org.apache.pig.pigpen-0.7.0.tar.gz, org.apache.pig.pigpen_0.7.2.jar, org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.7.4.jar, org.apache.pig.pigpen-0.7.4.tar.gz, org.apache.pig.pigpen_0.7.5.jar, org.apache.pig.pigpen-0.7.5.tar.gz, pigpen.patch, pigPen.patch, PigPen.tgz
This is an Eclipse plugin that provides a GUI that can help users create PigLatin scripts, see the example generator outputs on the fly, and submit the jobs to hadoop clusters.
[jira] [Commented] (PIG-1283) COUNT on null bag causes failure
[ https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479476#comment-13479476 ]
Jonathan Coveney commented on PIG-1283:
Anand, Thanks for the contribution! Committed.
COUNT on null bag causes failure
Key: PIG-1283 URL: https://issues.apache.org/jira/browse/PIG-1283 Project: Pig Issue Type: Bug Components: impl Reporter: Thejas M Nair Assignee: Anand L Ranganathan Labels: newbie Attachments: PIG-1283-1.patch, PIG-1283-2.patch, pig_1283-3.patch
grunt> l = load '/tmp/e.bag' as (b : bag{t: (i : int)}, a : int); # b is null for the only row
grunt> c = foreach l generate COUNT(b);
grunt> dump c
It results in the following exception:
org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
at org.apache.pig.builtin.COUNT.exec(COUNT.java:59)
at org.apache.pig.builtin.COUNT.exec(COUNT.java:39)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.NullPointerException
at org.apache.pig.builtin.COUNT.exec(COUNT.java:46)
... 12 more
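The NullPointerException in the trace above comes from iterating a bag that is itself null. A null guard of the following shape avoids it. This is a sketch under loud assumptions, not the committed PIG-1283 patch: NullSafeCount is a hypothetical class, a List of Object[] stands in for Pig's bag of tuples, and returning null for a null bag is an assumed behavior rather than something this thread confirms.

```java
import java.util.List;

// Hypothetical illustration; not org.apache.pig.builtin.COUNT.
public class NullSafeCount {

    // Counts tuples whose first field is non-null.
    // Assumption: a null bag yields a null count instead of throwing.
    public static Long count(List<Object[]> bag) {
        if (bag == null) {
            return null;
        }
        long n = 0;
        for (Object[] tuple : bag) {
            // Assumption for this sketch: null tuples and null first fields are skipped.
            if (tuple != null && tuple.length > 0 && tuple[0] != null) {
                n++;
            }
        }
        return n;
    }
}
```

The guard runs before the loop, so the null check costs one comparison per call regardless of bag size.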
[jira] [Updated] (PIG-1283) COUNT on null bag causes failure
[ https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-1283:

Resolution: Fixed
Status: Resolved (was: Patch Available)

COUNT on null bag causes failure
Key: PIG-1283
URL: https://issues.apache.org/jira/browse/PIG-1283
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Thejas M Nair
Assignee: Anand L Ranganathan
Labels: newbie
Attachments: PIG-1283-1.patch, PIG-1283-2.patch, pig_1283-3.patch

grunt> l = load '/tmp/e.bag' as (b : bag{t: (i : int)}, a : int); # b is null for the only row
grunt> c = foreach l generate COUNT(b);
grunt> dump c

It results in the following exception:

org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
    at org.apache.pig.builtin.COUNT.exec(COUNT.java:59)
    at org.apache.pig.builtin.COUNT.exec(COUNT.java:39)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
Caused by: java.lang.NullPointerException
    at org.apache.pig.builtin.COUNT.exec(COUNT.java:46)
    ... 12 more
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479478#comment-13479478 ]

Prasanth J commented on PIG-2582:

I am using ResourceStatistics to store intermediate stats in PIG-2831. I am not using the getmBytes() method, but I use getAvgRecordSize() to store and return the average record size in bytes.

Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
Attachments: PIG-2582.patch

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.

{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}

Typically sizes are stored as bytes, potentially with convenience functions that return the size in different units. If mBytes can be marked private without causing woes, it might be worth storing the size as bytes instead.
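The proposal in the description (store bytes privately, keep unit conversions as convenience methods) could look roughly like the sketch below. It is not the real org.apache.pig.ResourceStatistics: the class name SizeStatistics, the field sizeInBytes, the getSizeInBytes/setSizeInBytes methods, and the 1024 * 1024 conversion factor are all assumptions made for illustration.

```java
// Hypothetical sketch; names and the MB conversion factor are assumptions,
// not taken from Pig's ResourceStatistics.
final class SizeStatistics {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Single private source of truth; null means "size unknown".
    private Long sizeInBytes;

    Long getSizeInBytes() {
        return sizeInBytes;
    }

    SizeStatistics setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    // Convenience getter: whole megabytes, rounded down.
    Long getmBytes() {
        return sizeInBytes == null ? null : sizeInBytes / BYTES_PER_MB;
    }

    // Convenience setter kept so existing megabyte-based callers still work.
    SizeStatistics setmBytes(Long mBytes) {
        this.sizeInBytes = mBytes == null ? null : mBytes * BYTES_PER_MB;
        return this;
    }

    public static void main(String[] args) {
        SizeStatistics stats = new SizeStatistics().setmBytes(2L);
        System.out.println(stats.getSizeInBytes());
        System.out.println(stats.getmBytes());
    }
}
```

Keeping the megabyte accessors as thin wrappers means callers that only know getmBytes() and setmBytes() keep compiling, while new code can read the exact byte count.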
[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479485#comment-13479485 ]

Jonathan Coveney commented on PIG-2778:

This has been applied to trunk. Thanks, Cheolsoo!

Add 'matches' operator to predicate pushdown
Key: PIG-2778
URL: https://issues.apache.org/jira/browse/PIG-2778
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
Attachments: PIG-2778.patch, test_e2e.log, test_unit.log

Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
[jira] [Updated] (PIG-2778) Add 'matches' operator to predicate pushdown
[ https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2778:

Resolution: Fixed
Fix Version/s: 0.12
Status: Resolved (was: Patch Available)

Add 'matches' operator to predicate pushdown
Key: PIG-2778
URL: https://issues.apache.org/jira/browse/PIG-2778
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
Fix For: 0.12
Attachments: PIG-2778.patch, test_e2e.log, test_unit.log

Currently the regex match operation does not get pushed down to LoadMetadata (and Expression does not have an enum value for it); it would be quite useful to enable this for some optimizations.
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479488#comment-13479488 ]

Prashant Kommireddi commented on PIG-2582:

Hi Prasanth, I looked at PIG-2831. The current impl of getAvgRecordSize in trunk returns the size in MB. Is that not what you want? Also, it might be better to access the values through getters instead of accessing them directly. That will allow us to clean up the class further in the future. Those members should really be private.

Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
Attachments: PIG-2582.patch

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.

{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}

Typically sizes are stored as bytes, potentially with convenience functions that return the size in different units. If mBytes can be marked private without causing woes, it might be worth storing the size as bytes instead.
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479506#comment-13479506 ]

Prasanth J commented on PIG-2582:

I am not using setmBytes() or mBytes at all. As per the implementation logic, it returns the size in MB only if mBytes != null and numRecords != null; otherwise it returns whatever it contains, which works fine in my case. I also didn't see any places using this. Please let me know if there are going to be any changes to the APIs, so that I can modify the patch accordingly.

Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
Attachments: PIG-2582.patch

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.

{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}

Typically sizes are stored as bytes, potentially with convenience functions that return the size in different units. If mBytes can be marked private without causing woes, it might be worth storing the size as bytes instead.
[jira] [Updated] (PIG-2931) $ signs in the replacement string make parameter substitution fail
[ https://issues.apache.org/jira/browse/PIG-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Coveney updated PIG-2931:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Thanks Cheolsoo! It's in

$ signs in the replacement string make parameter substitution fail
Key: PIG-2931
URL: https://issues.apache.org/jira/browse/PIG-2931
Project: Pig
Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
Fix For: 0.11
Attachments: PIG-2931.patch

To reproduce the issue, use the following pig script:

{code:title=test.pig}
a = load 'data';
b = filter by $FILTER;
{code}

and run the following command:

{code}
pig -x local -dryrun -f test.pig -p FILTER=(\$0 == 'a')
{code}

This generates the following script:

{code:title=test.pig.substituted}
a = load 'data';
b = filter by ($FILTER == 'a');
{code}

However this should be:

{code}
a = load 'data';
b = filter by ($0 == 'a');
{code}

This is because Pig calls replaceFirst() with a replacement string that includes a $ sign, as follows:

{code}
$FILTER.replaceFirst(\\$FILTER, ($0 == 'a')));
{code}

Here $0 in the replacement is interpreted as a reference to the whole match (the text $FILTER), which is why $FILTER reappears in the output. To treat $ signs as literals in the replacement string, we must escape them. Please see the [Java doc|http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#replaceFirst(java.lang.String)] for the Matcher class for an explanation:

{quote}
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
{quote}
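The escaping described in PIG-2931 can be sketched with the standard java.util.regex.Matcher.quoteReplacement method, which escapes backslashes and dollar signs so the replacement is taken literally. The class ParamSubstitution and its substitute method are hypothetical helpers written for this example, and the sample line uses `filter a by` for readability; whether the committed patch uses quoteReplacement or escapes by hand is not stated in the message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not Pig's actual parameter-substitution code.
final class ParamSubstitution {
    private ParamSubstitution() {}

    // Replaces the first occurrence of $name in line with value,
    // treating value as a literal string.
    static String substitute(String line, String name, String value) {
        String pattern = "\\$" + Pattern.quote(name);
        // quoteReplacement escapes '$' and '\' in the replacement text.
        return line.replaceFirst(pattern, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String line = "b = filter a by $FILTER;";
        String value = "($0 == 'a')";

        // Unescaped: $0 expands to the matched text "$FILTER".
        System.out.println(line.replaceFirst("\\$FILTER", value));
        // prints: b = filter a by ($FILTER == 'a');

        // Escaped: the replacement is inserted literally.
        System.out.println(substitute(line, "FILTER", value));
        // prints: b = filter a by ($0 == 'a');
    }
}
```

Pattern.quote is applied to the parameter name as well, so a name containing regex metacharacters cannot change what is matched.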
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479523#comment-13479523 ]

Jonathan Coveney commented on PIG-2975:

FWIW I think my patch fixes this, and I don't think it has any downsides. It just uses the normal RawComparator.

TestTypedMap.testOrderBy failing with incorrect result
Key: PIG-2975
URL: https://issues.apache.org/jira/browse/PIG-2975
Project: Pig
Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
Fix For: 0.11
Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt

Looked at

{noformat}
junit.framework.AssertionFailedError
at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
{noformat}

This looks like a valid test case failing with an incorrect result.

{noformat}
% cat test/orderby.txt
[key#1,key9#23]
[key#3,key3#2]
[key#22]
% cat test/orderby.pig
a = load 'test/orderby.txt' as (m:[]);
b = foreach a generate m#'key' as b0;
dump b;
c = order b by b0;
dump c;
% java ... org.apache.pig.Main -x local test/orderby.pig
[dump b]
(1)
(3)
(22)
...
[dump c]
(1)
(1)
(22)
% where did the '(3)' go?
{noformat}
[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics
[ https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479528#comment-13479528 ]

Prashant Kommireddi commented on PIG-2582:

Sure.

"So as per the implementation logic, only if mBytes!=null and numRecords!=null it will return the size in MB else it will return whatever it contains which in my case works fine."

This works, but it might be confusing/inconsistent. I will wait for Travis/others to comment on this patch, and let's stay in touch on how this might affect PIG-2831.

Store size in bytes (not mbytes) in ResourceStatistics
Key: PIG-2582
URL: https://issues.apache.org/jira/browse/PIG-2582
Project: Pig
Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
Attachments: PIG-2582.patch

In [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup] we see mBytes is public, and has a public getter/setter.

{code}
public Long mBytes; // size in megabytes

public Long getmBytes() {
    return mBytes;
}
public ResourceStatistics setmBytes(Long mBytes) {
    this.mBytes = mBytes;
    return this;
}
{code}

Typically sizes are stored as bytes, potentially with convenience functions that return the size in different units. If mBytes can be marked private without causing woes, it might be worth storing the size as bytes instead.
[jira] Subscription: PIG patch available
Issue Subscription
Filter: PIG patch available (31 issues)
Subscriber: pigdaily

Key         Summary
PIG-2978    TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
            https://issues.apache.org/jira/browse/PIG-2978
PIG-2968    ColumnMapKeyPrune fails to prune a subtree inside foreach
            https://issues.apache.org/jira/browse/PIG-2968
PIG-2967    Fix Glob_local test failure for Pig E2E Test Framework
            https://issues.apache.org/jira/browse/PIG-2967
PIG-2960    Increase the timeout for unit test
            https://issues.apache.org/jira/browse/PIG-2960
PIG-2959    Add a pig.cmd for Pig to run under Windows
            https://issues.apache.org/jira/browse/PIG-2959
PIG-2957    TetsScriptUDF fail due to volume prefix in jar
            https://issues.apache.org/jira/browse/PIG-2957
PIG-2956    Invalid cache specification for some streaming statement
            https://issues.apache.org/jira/browse/PIG-2956
PIG-2955    Fix bunch of Pig e2e tests on Windows
            https://issues.apache.org/jira/browse/PIG-2955
PIG-2954    TestParamSubPreproc still depends on bash to run
            https://issues.apache.org/jira/browse/PIG-2954
PIG-2953    which utility does not exist on Windows
            https://issues.apache.org/jira/browse/PIG-2953
PIG-2942    DevTests, TestLoad has a false failure on Windows
            https://issues.apache.org/jira/browse/PIG-2942
PIG-2940    HBaseStorage store fails in secure cluster
            https://issues.apache.org/jira/browse/PIG-2940
PIG-2904    Scripting UDFs should allow DEFINE statements to pass parameters to the UDF's constructor
            https://issues.apache.org/jira/browse/PIG-2904
PIG-2898    Parallel execution of e2e tests
            https://issues.apache.org/jira/browse/PIG-2898
PIG-2881    Add SUBTRACT eval function
            https://issues.apache.org/jira/browse/PIG-2881
PIG-2873    Converting bin/pig shell script to python
            https://issues.apache.org/jira/browse/PIG-2873
PIG-2834    MultiStorage requires unused constructor argument
            https://issues.apache.org/jira/browse/PIG-2834
PIG-2824    Pushing checking number of fields into LoadFunc
            https://issues.apache.org/jira/browse/PIG-2824
PIG-2801    grunt sh command should invoke the shell implicitly instead of calling exec directly with the command tokens
            https://issues.apache.org/jira/browse/PIG-2801
PIG-2798    pig streaming tests assume interpreters are auto-resolved
            https://issues.apache.org/jira/browse/PIG-2798
PIG-2796    Local temporary paths are not always valid HDFS path names.
            https://issues.apache.org/jira/browse/PIG-2796
PIG-2795    Fix test cases that generate pig scripts with load + pathStr to encode \ in the path
            https://issues.apache.org/jira/browse/PIG-2795
PIG-2661    Pig uses an extra job for loading data in Pigmix L9
            https://issues.apache.org/jira/browse/PIG-2661
PIG-2657    Print warning if using wrong jython version
            https://issues.apache.org/jira/browse/PIG-2657
PIG-2495    Using merge JOIN from a HBaseStorage produces an error
            https://issues.apache.org/jira/browse/PIG-2495
PIG-2417    Streaming UDFs - allow users to easily write UDFs in scripting languages with no JVM implementation.
            https://issues.apache.org/jira/browse/PIG-2417
PIG-2405    svn tags/release-0.9.1: some unit test case failed with open JDK
            https://issues.apache.org/jira/browse/PIG-2405
PIG-2362    Rework Ant build.xml to use macrodef instead of antcall
            https://issues.apache.org/jira/browse/PIG-2362
PIG-2312    NPE when relation and column share the same name and used in Nested Foreach
            https://issues.apache.org/jira/browse/PIG-2312
PIG-1942    script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
            https://issues.apache.org/jira/browse/PIG-1942
PIG-1237    Piggybank MutliStorage - specify field to write in output
            https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384
[jira] [Commented] (PIG-2927) SHIP and use JRuby gems in JRuby UDFs
[ https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479567#comment-13479567 ]

Cheolsoo Park commented on PIG-2927:

Although I am no Ruby expert, I think that Jonathan's patch works well. Here is my test.

1) Installed a non-trivial rubygem library (rubygem-json) on the client only and confirmed that it is not installed on any datanode in the cluster.

{code}
/usr/lib/ruby/gems/1.8/gems/json-1.4.6/
{code}

2) Wrote a ruby udf that parses a json string:

{code}
require 'rubygems'
require 'pigudf'
require 'json'

class Myudfs < PigUdf
  outputSchema "result:chararray"
  def parseJson input
    result = JSON.parse(input)
  end
end
{code}

3) Wrote a short pig script that loads a json string and calls my ruby udf:

{code}
register 'test.rb' using jruby as myfuncs;
a = load 'json.txt' using PigStorage() as (i:chararray);
b = foreach a generate myfuncs.parseJson(i);
dump b;
{code}

4) Got the expected result as follows:

{code:title=input}
{id:1,nested:{value1:first1,next:{complex_record:{id:2,nested:{value1:second1,next:null,value2:second2}}},value2:first2}}
{code}

{code:title=result}
([id#1,nested#{value1=first1, value2=first2, next={complex_record={id=2, nested={value1=second1, value2=second2, next=null])
{code}

Without Jonathan's patch, I get the following error in the front-end, as expected:

{code}
LoadError: no such file to load -- json
require at org/jruby/RubyKernel.java:1042
require at file:/home/cheolsoo/pig-ruby/build/ivy/lib/Pig/jruby-complete-1.6.7.jar!/META-INF/jruby.home/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36
(root) at test.rb:3
2012-10-18 17:09:24,323 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. (LoadError) no such file to load -- json
{code}

I also ran the Scripting e2e test cases with the patch on a Hadoop-1.0.x cluster, and they all passed. So it seems good to commit.

Btw, I wanted to write an e2e test case using rubygems-json, but I realized that rubygems-json is under GPL and cannot be included in Pig. We should either find another rubygem library that is under the Apache license or make the test configurable so that it runs only if rubygem-json is installed.

Thanks!

SHIP and use JRuby gems in JRuby UDFs
Key: PIG-2927
URL: https://issues.apache.org/jira/browse/PIG-2927
Project: Pig
Issue Type: New Feature
Components: parser
Affects Versions: 0.11
Environment: JRuby UDFs
Reporter: Russell Jurney
Assignee: Jonathan Coveney
Priority: Minor
Fix For: 0.11
Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, PIG-2927-3.patch

It would be great to use JRuby gems in JRuby UDFs without installing them on all machines on the cluster. Some way to SHIP them automatically with the job would be great.