[jira] [Commented] (PIG-2978) TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x

2012-10-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478709#comment-13478709
 ] 

Cheolsoo Park commented on PIG-2978:


Here is the difference between hadoop-1.0.x and 2.0.x:
{code:title=hadoop-1.0.x}
Storer[3].init()
Storer[3].setStoreFuncUDFContextSignature(A_1-1)
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[3].getOutputFormat()
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
{code}
{code:title=hadoop-2.0.x}
Storer[3].init()
Storer[3].setStoreFuncUDFContextSignature(A_1-1)
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[3].getOutputFormat()
Storer[3].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[4].init()
Storer[4].setStoreFuncUDFContextSignature(A_1-1)
Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
Storer[4].getOutputFormat()
Storer[4].setStoreLocation(bar, org.apache.hadoop.mapreduce.Job)
{code}
For reasons not yet clear, getStoreFunc is called one more time with 
hadoop-2.0.x, creating a 4th Storer instance on which the same lifecycle 
calls are repeated. The call stack of the extra 4th instantiation is below:
{code}
Storer[4].init called by 
org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:577)
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getStoreFunc(POStore.java:232)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.getCommitters(PigOutputCommitter.java:85)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.init(PigOutputCommitter.java:67)
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat.getOutputCommitter(PigOutputFormat.java:279)
{code}

 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
 --

 Key: PIG-2978
 URL: https://issues.apache.org/jira/browse/PIG-2978
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2978.patch


 To reproduce, please run:
 {code}
 ant clean test -Dtestcase=TestLoadStoreFuncLifeCycle -Dhadoopversion=23
 {code}
 This fails with the following error:
 {code}
 Error during parsing. Job in state DEFINE instead of RUNNING
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
 parsing. Job in state DEFINE instead of RUNNING
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1607)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1546)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:516)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:529)
 at 
 org.apache.pig.TestLoadStoreFuncLifeCycle.testLoadStoreFunc(TestLoadStoreFuncLifeCycle.java:332)
 Caused by: Failed to parse: Job in state DEFINE instead of RUNNING
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:193)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1599)
 Caused by: java.lang.IllegalStateException: Job in state DEFINE instead of 
 RUNNING
 at org.apache.hadoop.mapreduce.Job.ensureState(Job.java:292)
 at org.apache.hadoop.mapreduce.Job.toString(Job.java:456)
 at java.lang.String.valueOf(String.java:2826)
 at 
 org.apache.pig.TestLoadStoreFuncLifeCycle.logCaller(TestLoadStoreFuncLifeCycle.java:270)
 at 
 org.apache.pig.TestLoadStoreFuncLifeCycle.access$000(TestLoadStoreFuncLifeCycle.java:41)
 at 
 org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.logCaller(TestLoadStoreFuncLifeCycle.java:54)
 at 
 org.apache.pig.TestLoadStoreFuncLifeCycle$InstrumentedStorage.getSchema(TestLoadStoreFuncLifeCycle.java:115)
 at 
 org.apache.pig.newplan.logical.relational.LOLoad.getSchemaFromMetaData(LOLoad.java:174)
 at org.apache.pig.newplan.logical.relational.LOLoad.init(LOLoad.java:88)
 at 
 org.apache.pig.parser.LogicalPlanBuilder.buildLoadOp(LogicalPlanBuilder.java:839)
 at 
 org.apache.pig.parser.LogicalPlanGenerator.load_clause(LogicalPlanGenerator.java:3236)
 at 
 org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1315)
 at 
 org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
 at 
 org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
 at 
 org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
 at 
 org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-2738) pig.exec.reducers.max has no default value

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi resolved PIG-2738.
--

Resolution: Duplicate

 pig.exec.reducers.max has no default value
 --

 Key: PIG-2738
 URL: https://issues.apache.org/jira/browse/PIG-2738
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
 Environment: Kubuntu 12.04 64Bit
Reporter: Johannes Schwenk

 setDefaultsIfUnset in org/apache/pig/impl/util/PropertiesUtil.java does not 
 set pig.exec.reducers.max to 999 as documented. As a consequence 
 testDefaultPigProperties in org.apache.pig.test.TestPigServer fails with a 
 NullPointerException accessing the property.
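 The behaviour the description expects can be sketched roughly as below. This 
 is only an illustration using java.util.Properties, not the actual 
 PropertiesUtil code: the class name and both helper methods are hypothetical, 
 and only the property name and the value 999 come from the issue text.

```java
import java.util.Properties;

final class ReducerDefaultsSketch {
    // Property name and documented default, both taken from the issue text.
    static final String MAX_REDUCERS_KEY = "pig.exec.reducers.max";
    static final String MAX_REDUCERS_DEFAULT = "999";

    // Hypothetical helper: sets a default only when the key is absent,
    // so an explicit user setting is never overwritten.
    static void setDefaultIfUnset(Properties props, String key, String value) {
        if (props.getProperty(key) == null) {
            props.setProperty(key, value);
        }
    }

    // Hypothetical accessor: applies the default first, so the lookup
    // cannot return null for this key.
    static int maxReducers(Properties props) {
        setDefaultIfUnset(props, MAX_REDUCERS_KEY, MAX_REDUCERS_DEFAULT);
        return Integer.parseInt(props.getProperty(MAX_REDUCERS_KEY));
    }
}
```

 With the default applied before the lookup, a caller reading the property no 
 longer sees a null value.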



[jira] [Updated] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2933:
-

Attachment: PIG-2933.patch

Smallest patch ever :)

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow.  If you want I can fix this next 
 week.



[jira] [Updated] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2933:
-

Patch Info: Patch Available

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow.  If you want I can fix this next 
 week.



[jira] [Assigned] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi reassigned PIG-2933:


Assignee: Prashant Kommireddi

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow.  If you want I can fix this next 
 week.



[jira] [Assigned] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy reassigned PIG-2985:
---

Assignee: Rohini Palaniswamy

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2985:


Attachment: PIG-2985.patch

PigStatsUtil.java:
{code}
jobClient = job.getJobClient();
counters = jobClient.getJob(job.getAssignedJobID()).getCounters();
{code}

should be 

{code}
counters = new Counters(job.getJob().getCounters());
{code}

for H23. Each Job and JobClient has its own instance of LocalJobRunner, so to 
access the job information we need to use the same Job/JobClient that the job 
was submitted with. In H20 the job is submitted using JobClient, while in H23 
it is submitted using Job.

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2985:


Status: Patch Available  (was: Open)

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Updated] (PIG-2967) Fix Glob_local test failure for Pig E2E Test Framework

2012-10-18 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2967:


Fix Version/s: (was: 0.10.1)
   0.12
   0.11
   Status: Patch Available  (was: Open)

 Fix Glob_local test failure for Pig E2E Test Framework
 --

 Key: PIG-2967
 URL: https://issues.apache.org/jira/browse/PIG-2967
 Project: Pig
  Issue Type: Sub-task
  Components: e2e harness
Affects Versions: 0.10.1
Reporter: Sushant Joshi
Priority: Minor
 Fix For: 0.11, 0.12

 Attachments: glob_local.patch


 The Glob_3_local, Glob_4_local and Glob_5_local E2E tests fail due to a 
 checksum mismatch with the benchmark data.



[jira] [Updated] (PIG-2989) Illustrate for Rank Operator

2012-10-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2989:


Issue Type: Bug  (was: Improvement)

 Illustrate for Rank Operator
 

 Key: PIG-2989
 URL: https://issues.apache.org/jira/browse/PIG-2989
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Minor
 Attachments: patch_1


 A small update for rank operator, specifically the implementation of 
 illustrate command.



[jira] [Updated] (PIG-2989) Illustrate for Rank Operator

2012-10-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2989:


Priority: Major  (was: Minor)

 Illustrate for Rank Operator
 

 Key: PIG-2989
 URL: https://issues.apache.org/jira/browse/PIG-2989
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
 Attachments: patch_1


 A small update for rank operator, specifically the implementation of 
 illustrate command.



[jira] [Updated] (PIG-2989) Illustrate for Rank Operator

2012-10-18 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allan Avendaño updated PIG-2989:


Description: Specifically useful when a quick view of the final results of 
the Rank operator is required.  (was: A small update for rank operator, 
specifically the implementation of illustrate command.)

 Illustrate for Rank Operator
 

 Key: PIG-2989
 URL: https://issues.apache.org/jira/browse/PIG-2989
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
 Attachments: patch_1


 Specifically useful when a quick view of the final results of the Rank 
 operator is required.



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Travis Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479102#comment-13479102
 ] 

Travis Crawford commented on PIG-2582:
--

Hey [~prkommireddi], this came up while working on PIG-2573. Basically it felt 
a bit janky to round the size to MB rather than just keep the size in bytes. 
Feel free to close if the consensus is to keep it as-is.

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.



[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479178#comment-13479178
 ] 

Jonathan Coveney commented on PIG-2778:
---

Dmitriy,

I see no reason not to commit this given the test report looks good. Agreed?

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.



[jira] [Commented] (PIG-2796) Local temporary paths are not always valid HDFS path names.

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479176#comment-13479176
 ] 

Jonathan Coveney commented on PIG-2796:
---

I feel like the real fix is to use FileLocalizer.getTemporaryPath(PigContext), 
no? This gives you a temporary path in HDFS. We can make sure that works on 
Windows (and should).

 Local temporary paths are not always valid HDFS path names.
 ---

 Key: PIG-2796
 URL: https://issues.apache.org/jira/browse/PIG-2796
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.11

 Attachments: 0006-Local-Remote-file-mapping-for-tests-with-temps.patch


 A number of pig scripts follow the pattern:
 File tempFile = File.createTempFile("this", ".txt");
 copyFromLocalToCluster(tempFile.toString(), tempFile.toString());
 tempFile.delete();
 The goal here seems to be to generate a temp filename to avoid issues on 
 the next run if the file doesn't get cleaned up.  The problem is that 
 File.createTempFile on Windows creates files with names like 
 C:\users\myuser\App data\local\temp\file.txt, and : is not a valid DFS 
 character, so the put fails.
 The easy fix is to remove colons from the path before upload.  Then we get 
 something like C\users\myuser\App data\local\temp\file.txt, which is a 
 valid DFS pathname, with minimal impact to the tests.
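 The colon-removal fix described above amounts to something like the sketch 
 below. The class and method names are hypothetical and this is not taken 
 from the attached patch; it only shows the string transformation on the 
 example path from the description.

```java
final class DfsPathSketch {
    // Hypothetical helper: drops every ':' from a local path so the
    // result can be used as the remote name. The rest of the path
    // (including backslashes) is left untouched, as in the description.
    static String removeColons(String localPath) {
        return localPath.replace(":", "");
    }
}
```

 Applied to C:\users\myuser\App data\local\temp\file.txt this yields 
 C\users\myuser\App data\local\temp\file.txt.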



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479181#comment-13479181
 ] 

Prashant Kommireddi commented on PIG-2582:
--

I agree on storing size in bytes, just wanted to make sure I understand all the 
reasons you had in mind. It wouldn't be a huge change to make it happen, but 
changing the scope might be tricky if someone is using it outside of the Pig 
project. What do you think about marking the setter setmBytes(Long) 
deprecated and creating a new setter for bytes? To start with, we can at least 
have Pig refer to byte-based methods.
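
A rough sketch of what that could look like is below. It is not the real 
ResourceStatistics class: the class name, the sizeInBytes field and its 
getter/setter are made up for illustration, and it assumes 1 MB = 1024 * 1024 
bytes. Only getmBytes/setmBytes mirror the existing method names.

```java
final class ResourceStatisticsSketch {
    // Assumption: one megabyte is 1024 * 1024 bytes.
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Hypothetical private field; null means the size is unknown.
    private Long sizeInBytes;

    // Hypothetical byte-based getter that Pig code would call.
    Long getSizeInBytes() {
        return sizeInBytes;
    }

    // Hypothetical byte-based setter; returns this for chaining,
    // like the existing setter does.
    ResourceStatisticsSketch setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    // Old getter kept for outside callers; rounds down to whole megabytes.
    @Deprecated
    Long getmBytes() {
        return sizeInBytes == null ? null : sizeInBytes / BYTES_PER_MB;
    }

    // Old setter kept for outside callers; stores the value as bytes.
    @Deprecated
    ResourceStatisticsSketch setmBytes(Long mBytes) {
        this.sizeInBytes = (mBytes == null) ? null : mBytes * BYTES_PER_MB;
        return this;
    }
}
```

Existing callers of the megabyte methods keep working, while new code can 
read and write the size in bytes without rounding.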

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.



[jira] [Commented] (PIG-2657) Print warning if using wrong jython version

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479186#comment-13479186
 ] 

Jonathan Coveney commented on PIG-2657:
---

Bumping again. Gotta love that JIRA report.

 Print warning if using wrong jython version
 ---

 Key: PIG-2657
 URL: https://issues.apache.org/jira/browse/PIG-2657
 Project: Pig
  Issue Type: Bug
Reporter: Fabian Alenius
 Fix For: 0.11, 0.10.1

 Attachments: PIG-2657.1.patch, PIG-2657.2.patch


 Hi,
 It would be good if Pig would print a warning (or refuse to run) if you are 
 using an unsupported version of jython. I spent a couple of hours before 
 figuring out that you had to use 2.5.0. I've seen posts indicating that 
 others have run into this problem as well.
 Might write up a patch if others agree this is an issue.



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Travis Crawford (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479191#comment-13479191
 ] 

Travis Crawford commented on PIG-2582:
--

Sounds good!

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.



[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479213#comment-13479213
 ] 

Gianmarco De Francisci Morales commented on PIG-2985:
-

This is a simple bug fix, should go to both 0.11 and trunk.

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2985:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1, committed to both trunk and 0.11.
Thanks Rohini!

Interestingly, tests with hadoop-2.0 take 1/3 of the time compared to hadoop-1.0

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Created] (PIG-2991) Clarify document of Algebraic contracts

2012-10-18 Thread Andy Schlaikjer (JIRA)
Andy Schlaikjer created PIG-2991:


 Summary: Clarify document of Algebraic contracts 
 Key: PIG-2991
 URL: https://issues.apache.org/jira/browse/PIG-2991
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.10.0
Reporter: Andy Schlaikjer


Documentation of Algebraic contracts is somewhat confusing.

It took me a while to understand that Initial impl exec method is passed a 
singleton bag of X, and should return the single X value so that Intermed exec 
gets a proper bag of X.

The builtins like SUM and COUNT are generally clearly written, but this 
specific point isn't easy to deduce from those impls either.
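
To make the point about Initial concrete, here is a rough stand-alone sketch 
of a SUM-like function. It deliberately does not use Pig's classes: bags are 
modelled as plain lists, tuple wrapping is left out, and the class name, 
method names and the chunking in run() are all hypothetical. How Pig actually 
groups the Intermed calls is not something this sketch claims to show.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class AlgebraicSumSketch {
    // Initial: receives a singleton "bag" holding one X and returns that X.
    static long initial(List<Long> singletonBag) {
        if (singletonBag.size() != 1) {
            throw new IllegalArgumentException("Initial expects exactly one value");
        }
        return singletonBag.get(0);
    }

    // Intermed: receives a bag of X (Initial or Intermed outputs)
    // and combines them into one partial result.
    static long intermed(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final: receives a bag of partial results and produces the answer.
    static long finalStep(List<Long> partials) {
        return intermed(partials);
    }

    // Hypothetical driver: calls Initial once per value, Intermed once
    // per chunk, and Final once over all chunk results.
    static long run(List<Long> values, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<Long> intermediates = new ArrayList<>();
        for (int start = 0; start < values.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, values.size());
            List<Long> initialOutputs = new ArrayList<>();
            for (Long v : values.subList(start, end)) {
                initialOutputs.add(initial(Collections.singletonList(v)));
            }
            intermediates.add(intermed(initialOutputs));
        }
        return finalStep(intermediates);
    }
}
```

The result does not depend on the chunk size, which is the property an 
algebraic function needs.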

It would be great if the discussion at the following URL could be improved to 
make all Algebraic contracts more explicit:

http://pig.apache.org/docs/r0.10.0/udf.html#algebraic-interface

Also, detailed answers to the following questions would be great to include in 
some form:

Q: Does Pig make use of Initial, Intermed, Final class outputSchema methods? If 
so, how?

Q: If my Intermed or Final classes additionally implement Accumulator 
interface, does Pig take advantage of this?

Q: Should the parent UDF's outputSchema method always expect to be passed the 
same input schema, regardless of the context (algebraic, accumulative, regular 
exec) in which it is used?




[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479236#comment-13479236
 ] 

Prashant Kommireddi commented on PIG-2582:
--

Should methods like getAvgRecordSize() be returning size in bytes? It's not 
called from within the project, but I'm not sure if such a change would be 
acceptable to users.

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor

 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.
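
The suggestion above could look roughly like the following. This is a minimal sketch, not Pig's ResourceStatistics: the class name `SizeStatsSketch`, the `sizeInBytes` field, and the choice of 1024 * 1024 bytes per megabyte are all assumptions made for illustration.

```java
// Hypothetical stand-in for the idea in this issue; not Pig's ResourceStatistics.
final class SizeStatsSketch {
    // Assumption: one megabyte is 1024 * 1024 bytes.
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // Size is stored in bytes and kept private; null means "unknown".
    private Long sizeInBytes;

    Long getSizeInBytes() {
        return sizeInBytes;
    }

    SizeStatsSketch setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    // Convenience view in megabytes, rounded down.
    Long getmBytes() {
        if (sizeInBytes == null) {
            return null;
        }
        return sizeInBytes / BYTES_PER_MB;
    }

    // Convenience setter that accepts megabytes and stores bytes.
    SizeStatsSketch setmBytes(Long mBytes) {
        this.sizeInBytes = (mBytes == null) ? null : Long.valueOf(mBytes * BYTES_PER_MB);
        return this;
    }
}
```

Keeping the megabyte accessors as thin wrappers would let existing callers of getmBytes() and setmBytes() continue to work while new code reads the byte value directly.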



[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479244#comment-13479244
 ] 

Dmitriy V. Ryaboy commented on PIG-2778:


+1

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.



Re: CHANGES.txt in branches

2012-10-18 Thread Gianmarco De Francisci Morales
OK, I fixed a bunch of them.
There were also some misspelled Pig issue numbers, which made
everything even more confusing :)

Cheers,
--
Gianmarco


On Tue, Oct 16, 2012 at 10:59 PM, Bill Graham billgra...@gmail.com wrote:
 Also guilty as of about 15 minutes ago. I just moved my entry for PIG-2976
 to the Pig 0.11 section on the trunk. Great catch.

 On Tue, Oct 16, 2012 at 9:46 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Guilty.. I guess we should be putting them under 0.11 in trunk.

 On Tue, Oct 16, 2012 at 8:18 PM, Jonathan Coveney jcove...@gmail.com
 wrote:
  AFAIK (and I don't really know), I thought that if we put it in both,
 that
  it'd go in the pig 11 section in trunk, and if not, we don't.
 
  Is this correct?
 
  Good job noticing this.
 
  2012/10/16 Gianmarco De Francisci Morales g...@apache.org
 
  Hi devs,
 
  I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk.
  It seems some people are putting patches on top in both versions of
  the file, while others are putting changes that get into 0.11 in the
  0.11 section of the trunk file.
 
  Let me show an example:
 
  This is 0.11
 
  Pig Change Log
 
  Release 0.11.0 (unreleased)
 
  INCOMPATIBLE CHANGES
  PIG-1891 Enable StoreFunc to make intelligent decision based on job
  success or failure (initialcontext via gates)
 
  IMPROVEMENTS
  PIG-2947: Documentation for Rank operator (xalan via azaroth)
 
  PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
  method for code health (jgordon via dvryaboy)
 
  PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon
 via
  gates)
 
  PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
 
  PIG-2965: RANDOM should allow seed initialization for ease of testing
  (jcoveney)
 
  PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
  visibility of couple methods on same class (prkommireddi via
  billgraham)
 
 
 
 
 
 
  And this is trunk:
 
  Pig Change Log
 
  Trunk (unreleased changes)
 
  INCOMPATIBLE CHANGES
 
  IMPROVEMENTS
 
  PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
  method for code health (jgordon via dvryaboy)
 
  PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not
  set (cheolsoo via sms)
 
  PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon
 via
  gates)
 
  PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
 
  OPTIMIZATIONS
 
  BUG FIXES
 
  PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24
  (cheolsoo via dvryaboy)
 
  Release 0.11.0 (unreleased)
 
  INCOMPATIBLE CHANGES
  PIG-1891 Enable StoreFunc to make intelligent decision based on job
  success or failure (initialcontext via gates)
 
  IMPROVEMENTS
 
  PIG-2947: Documentation for Rank operator (xalan via azaroth)
 
  PIG-2910: Add function to read schema from outout of Schema.toString()
  (initialcontext via thejas)
 
  PIG-2965: RANDOM should allow seed initialization for ease of testing
  (jcoveney)
 
  PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
  visibility of couple methods on same class (prkommireddi via
  billgraham)
 
 
 
  Notice how PIG-2943, PIG-2793, and PIG-2908 are listed under the trunk
  section in the trunk file but under the 0.11 section in the 0.11 file.
  PIG-2910 is in the 0.11 section in trunk but not in the 0.11 file (I guess
  it is a small mistake).
 
 
  So, what's the correct behavior?
  Do we mark a patch in CHANGES.txt at the earliest place it appears in
  the code (so that CHANGES.txt is consistent across releases)?
  Or do we treat the branches independently, and thus we put each patch
  always at the top?
 
  Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't
  have a strong opinion on it (as long as we are consistent).
 
  Cheers,
  --
  Gianmarco
 




 --
 *Note that I'm no longer using my Yahoo! email address. Please email me at
 billgra...@gmail.com going forward.*


[jira] [Resolved] (PIG-2947) Documentation for Rank operator

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2947.
-

Resolution: Fixed

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Fix For: 0.11

 Attachments: patch_01, patch_02, patch_03


 User documentation for recently released Rank operator, with some basic 
 explanation of usage and examples



[jira] [Resolved] (PIG-2922) Documentation and examples for RANK

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2922.
-

Resolution: Duplicate

 Documentation and examples for RANK
 ---

 Key: PIG-2922
 URL: https://issues.apache.org/jira/browse/PIG-2922
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: documentation

 We need documentation and examples for the newly introduced RANK command.



[jira] [Updated] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2582:
-

Attachment: PIG-2582.patch

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Priority: Minor
 Attachments: PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.



[jira] [Updated] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2582:
-

Patch Info: Patch Available
  Assignee: Prashant Kommireddi

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
 Attachments: PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.



[jira] [Commented] (PIG-2796) Local temporary paths are not always valid HDFS path names.

2012-10-18 Thread John Gordon (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479320#comment-13479320
 ] 

John Gordon commented on PIG-2796:
--

That sounds very promising, I will investigate and come back with findings or a 
patch.

 Local temporary paths are not always valid HDFS path names.
 ---

 Key: PIG-2796
 URL: https://issues.apache.org/jira/browse/PIG-2796
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.11

 Attachments: 0006-Local-Remote-file-mapping-for-tests-with-temps.patch


 A number of pig scripts follow the pattern:
 File tempFile = File.createTempFile("this", ".txt");
 copyFromLocalToCluster(tempFile.toString(), tempFile.toString());
 tempFile.delete();
 The goal here seems to be to generate a temp filename to avoid issues on 
 the next run if the file doesn't get cleaned up.  The problem is that 
 File.createTempFile on Windows creates files with names like 
 C:\users\myuser\App data\local\temp\file.txt, and ':' is not a valid DFS 
 character, so the put fails.
 The easy fix is to remove colons from the path before upload.  Then 
 we get something like C\users\myuser\App data\local\temp\file.txt, which is a 
 valid DFS pathname, with minimal impact on the tests.
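
The colon-stripping fix described above can be sketched as a small helper. The class and method names (`DfsPathSketch`, `toDfsSafePath`) are hypothetical and are not taken from the attached patch; the sketch only removes colons and makes no other change to the path.

```java
// Hypothetical helper illustrating the proposed fix; not code from the patch.
final class DfsPathSketch {

    // Removes every ':' from a local path so that, per the issue description,
    // the result can be used as a DFS path name.
    static String toDfsSafePath(String localPath) {
        return localPath.replace(":", "");
    }
}
```

A path without colons, such as a typical Unix temp path, passes through unchanged, so the helper would have no effect on non-Windows runs.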



[jira] [Commented] (PIG-2405) svn tags/release-0.9.1: some unit test case failed with open JDK

2012-10-18 Thread Leonardo Rangel Augusto (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479352#comment-13479352
 ] 

Leonardo Rangel Augusto commented on PIG-2405:
--

In TestDataModel, the underlying problem is using an order-independent 
structure in an order-dependent test. What about keeping it simple and removing 
the HashMap from the Tuple, instead of replacing it with a LinkedHashMap?
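
For reference, the LinkedHashMap alternative mentioned above works because LinkedHashMap iterates in insertion order, while HashMap makes no ordering guarantee. The sketch below is not the TestDataModel code: the class `MapOrderSketch` and its `render` method are hypothetical, and the `key#value` rendering only imitates the output format seen in the failure message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of insertion-order iteration; not Pig test code.
final class MapOrderSketch {

    // Renders a map as [k1#v1,k2#v2] in the map's own iteration order.
    static String render(Map<String, String> m) {
        StringBuilder sb = new StringBuilder("[");
        boolean first = true;
        for (Map.Entry<String, String> e : m.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append(e.getKey()).append('#').append(e.getValue());
            first = false;
        }
        return sb.append(']').toString();
    }

    // LinkedHashMap keeps the order in which keys were inserted.
    static Map<String, String> orderedSample() {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("hello", "world");
        m.put("goodbye", "all");
        return m;
    }
}
```

With a plain HashMap the same two entries may come back in either order depending on the JDK, which is consistent with the test passing on one JDK and failing on another.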

 svn tags/release-0.9.1: some unit test case failed with open JDK
 

 Key: PIG-2405
 URL: https://issues.apache.org/jira/browse/PIG-2405
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.9.1
 Environment: ant-1.8.2
 open jdk: 1.6
Reporter: fang fang chen
Assignee: fang fang chen
 Attachments: 2405_1.patch, 2405_2.patch


 [junit] Test org.apache.pig.test.TestDataModel FAILED
 Testcase: testTupleToString took 0.004 sec
 FAILED
 toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
 junit.framework.ComparisonFailure: toString expected:...ad a little 
 lamb)},[[hello#world,goodbye#all]],42,50,3.14... but was:...ad a 
 little lamb)},[[goodbye#all,hello#world]],42,50,3.14...
  at 
 org.apache.pig.test.TestDataModel.testTupleToString(TestDataModel.java:269
 [junit] Test org.apache.pig.test.TestHBaseStorage FAILED
 Tests run: 18, Failures: 0, Errors: 12, Time elapsed: 188.612 sec
 Testcase: testHeterogeneousScans took 0.018 sec
 Caused an ERROR
 java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml (Too many 
 open files)
 java.lang.RuntimeException: java.io.FileNotFoundException: 
 /root/pigtest/conf/hadoop-site.xml (Too many open files)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1162)
 at 
 org.apache.hadoop.conf.Configuration.loadResources(Configuration.java:1035)
 at 
 org.apache.hadoop.conf.Configuration.getProps(Configuration.java:980)
 at org.apache.hadoop.conf.Configuration.get(Configuration.java:436)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.init(HConnectionManager.java:271)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:155)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:167)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:130)
 at 
 org.apache.pig.test.TestHBaseStorage.prepareTable(TestHBaseStorage.java:809)
 at 
 org.apache.pig.test.TestHBaseStorage.testHeterogeneousScans(TestHBaseStorage.java:741)
 Caused by: java.io.FileNotFoundException: /root/pigtest/conf/hadoop-site.xml 
 (Too many open files)
 at java.io.FileInputStream.init(FileInputStream.java:112)
 at java.io.FileInputStream.init(FileInputStream.java:72)
 at 
 sun.net.www.protocol.file.FileURLConnection.connect(FileURLConnection.java:70)
 at 
 sun.net.www.protocol.file.FileURLConnection.getInputStream(FileURLConnection.java:161)
 at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown 
 Source)
 at 
 org.apache.xerces.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
 at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
 at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
 at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
 at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
 at 
 org.apache.hadoop.conf.Configuration.loadResource(Configuration.java:1079)
 Caused an ERROR
 Could not resolve the DNS name of hostname:39611
 java.lang.IllegalArgumentException: Could not resolve the DNS name of 
 hostname:39611
 at 
 org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
 at 
 org.apache.hadoop.hbase.HServerAddress.init(HServerAddress.java:66)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:755)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:590)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:145)
 at 
 

[jira] [Updated] (PIG-2926) TestPoissonSampleLoader failing on rhel environment

2012-10-18 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2926:
--

   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

 TestPoissonSampleLoader failing on rhel environment
 ---

 Key: PIG-2926
 URL: https://issues.apache.org/jira/browse/PIG-2926
 Project: Pig
  Issue Type: Sub-task
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
Priority: Minor
 Fix For: 0.11

 Attachments: PIG-2926-0.patch


 Testing on rhel environment, TestPoissonSampleLoader fails with 
 {noformat}
 Testcase: testNumSamples took 22.077 sec
 FAILED
 expected:<47> but was:<42>
 junit.framework.AssertionFailedError: expected:<47> but was:<42>
 at 
 org.apache.pig.test.TestPoissonSampleLoader.testNumSamples(TestPoissonSampleLoader.java:125)
 {noformat}
 From 
 {noformat}
 124 count = testNumSamples(0.0001, 100);
 125 assertEquals(count, 42);
 {noformat}
 This runs fine on my mac environment.
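
One detail worth keeping in mind when reading the failure above: JUnit's assertEquals takes the expected value first and the actual value second, so with `assertEquals(count, 42)` the reported "expected" value is `count` and the reported "was" value is the literal 42. The sketch below only mimics the shape of the message; `AssertOrderSketch` and `failureMessage` are hypothetical names, and this is not JUnit code.

```java
// Hypothetical stand-in that imitates the shape of a JUnit comparison message.
final class AssertOrderSketch {

    // First argument is reported as "expected", second as "was",
    // matching the (expected, actual) argument order of assertEquals.
    static String failureMessage(Object expected, Object actual) {
        return "expected:<" + expected + "> but was:<" + actual + ">";
    }
}
```

Under that reading, the rhel run produced a count of 47 where the test author anticipated 42. Writing the call as `assertEquals(42, count)` would make the message match the intent.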



[jira] [Updated] (PIG-2958) Pig tests do not appear to have a logger attached

2012-10-18 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2958:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

 Pig tests do not appear to have a logger attached
 -

 Key: PIG-2958
 URL: https://issues.apache.org/jira/browse/PIG-2958
 Project: Pig
  Issue Type: Sub-task
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.11

 Attachments: PIG-2958-1.patch


 This causes false failures in TestPigRunner, but also makes debugging 
 somewhat more difficult than it has to be.



[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479392#comment-13479392
 ] 

Jonathan Coveney commented on PIG-2778:
---

Cheolsoo, it doesn't apply cleanly anymore. Any chance you can rebase it off 
trunk?

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.



[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-18 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479402#comment-13479402
 ] 

Koji Noguchi commented on PIG-2975:
---

bq. Result incorrect (when order-by used). [0.11 and trunk]

Reading the code, I was able to come up with an incorrect-result case in 0.10. 
It's probably rare, since the type has to be unknown. Any datatype that has a 
header smaller than 2 bytes in BinInterSedes.java can hit this issue.

{noformat}
$ pig -version
USING: /grid/0/gs/pig/current
Apache Pig version 0.10.1.0.1206081058 (r1348169) 

$ cat pig-2975-mixed.pig
a = load 'pig-2975-mixed1.txt' as (a0:chararray, a1:chararray);
b = load 'pig-2975-mixed2.txt' as (b0:int);
y = union a,b;
z = order y by $0;
dump z;
$ cat pig-2975-mixed1.txt
a   b
b   c
d   e
$ cat pig-2975-mixed2.txt
0
1
0
1
{noformat}

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
 pig-2975-trunk_v02-broken.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}



[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-18 Thread Koji Noguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479404#comment-13479404
 ] 

Koji Noguchi commented on PIG-2975:
---

Silly me. Result of above script was 
{noformat}
(0)
(0)
(0)
(0)
(a,b)
(b,c)
(d,e)
{noformat}

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
 pig-2975-trunk_v02-broken.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479410#comment-13479410
 ] 

Cheolsoo Park commented on PIG-2778:


Hi Jonathan, sorry about that.

It applies to the github mirror, but I haven't tried it against the svn trunk. 
The svn repository is not responding at the moment... I will rebase it as soon 
as I can check out.

Thanks!

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.



[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479421#comment-13479421
 ] 

Jonathan Coveney commented on PIG-2975:
---

Koji,

I don't think we need to sacrifice performance if we use 
BinInterSedes.BinInterSedesRawComparator. It traverses the bytes, it doesn't 
deserialize or make any objects (and I think I found an improvement we can 
make).
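
As a generic illustration of the "traverse the bytes, don't deserialize" idea, here is a minimal sketch of a raw comparison over serialized values. It is not how BinInterSedes.BinInterSedesRawComparator is implemented: the class `RawCompareSketch`, the fixed 4-byte big-endian encoding, and the restriction to non-negative ints are all assumptions chosen to keep the example small.

```java
import java.nio.ByteBuffer;

// Hypothetical raw comparator sketch; not Pig's BinInterSedesRawComparator.
final class RawCompareSketch {

    // Lexicographic comparison that treats each byte as unsigned.
    // No value objects are created while comparing.
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(a.length, b.length);
    }

    // Big-endian encoding (ByteBuffer's default order): for non-negative ints,
    // unsigned byte order matches numeric order.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }
}
```

The point of the design is that ordering is decided from the serialized form alone, so the sort path avoids allocating an object per comparison.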

As far as sort order, I think it's meant to be somewhat odd on purpose.

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
 pig-2975-trunk_v02-broken.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}



Build failed in Jenkins: Pig-trunk #1339

2012-10-18 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1339/changes

Changes:

[jcoveney] Fix CHANGES.txt (jcoveney)

[jcoveney] Fix CHANGES.txt (jcoveney)

[jcoveney] PIG-2958: Pig tests do not appear to have a logger attached (daijyc 
via jcoveney)

[jcoveney] PIG-2972: TestPoissonSampleLoader failing on rhel environment 
(jcoveney)

[gdfm] Fixed problems with CHANGES.txt

[gdfm] PIG-2985: TestRank1,2,3 fail with hadoop-2.0.x (rohini via azaroth)

--
[...truncated 6629 lines...]
 [findbugs]   org.apache.hadoop.fs.FSDataInputStream
 [findbugs]   org.python.core.PyObject
 [findbugs]   jline.History
 [findbugs]   org.jruby.embed.internal.LocalContextProvider
 [findbugs]   org.apache.hadoop.io.BooleanWritable
 [findbugs]   org.apache.log4j.Logger
 [findbugs]   org.apache.hadoop.hbase.filter.FamilyFilter
 [findbugs]   org.codehaus.jackson.annotate.JsonPropertyOrder
 [findbugs]   groovy.lang.Tuple
 [findbugs]   org.antlr.runtime.IntStream
 [findbugs]   org.apache.hadoop.util.ReflectionUtils
 [findbugs]   org.apache.hadoop.fs.ContentSummary
 [findbugs]   org.jruby.runtime.builtin.IRubyObject
 [findbugs]   org.jruby.RubyInteger
 [findbugs]   org.python.core.PyTuple
 [findbugs]   org.mortbay.log.Log
 [findbugs]   org.apache.hadoop.conf.Configuration
 [findbugs]   com.google.common.base.Joiner
 [findbugs]   org.apache.hadoop.mapreduce.lib.input.FileSplit
 [findbugs]   org.apache.hadoop.mapred.Counters$Counter
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs]   org.apache.hadoop.mapred.JobPriority
 [findbugs]   org.apache.commons.cli.Options
 [findbugs]   org.apache.hadoop.mapred.JobID
 [findbugs]   org.apache.hadoop.util.bloom.BloomFilter
 [findbugs]   org.python.core.PyFrame
 [findbugs]   org.apache.hadoop.hbase.filter.CompareFilter
 [findbugs]   org.apache.hadoop.util.VersionInfo
 [findbugs]   org.python.core.PyString
 [findbugs]   org.apache.hadoop.io.Text$Comparator
 [findbugs]   org.jruby.runtime.Block
 [findbugs]   org.antlr.runtime.MismatchedSetException
 [findbugs]   org.apache.hadoop.io.BytesWritable
 [findbugs]   org.apache.hadoop.fs.FsShell
 [findbugs]   org.joda.time.Months
 [findbugs]   org.mozilla.javascript.ImporterTopLevel
 [findbugs]   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
 [findbugs]   org.apache.hadoop.mapred.TaskReport
 [findbugs]   org.apache.hadoop.security.UserGroupInformation
 [findbugs]   org.antlr.runtime.tree.RewriteRuleSubtreeStream
 [findbugs]   org.apache.commons.cli.HelpFormatter
 [findbugs]   com.google.common.collect.Maps
 [findbugs]   org.joda.time.ReadableInstant
 [findbugs]   org.mozilla.javascript.NativeObject
 [findbugs]   org.apache.hadoop.hbase.HConstants
 [findbugs]   org.apache.hadoop.io.serializer.Deserializer
 [findbugs]   org.antlr.runtime.FailedPredicateException
 [findbugs]   org.apache.hadoop.io.compress.CompressionCodec
 [findbugs]   org.jruby.RubyNil
 [findbugs]   org.apache.hadoop.fs.FileStatus
 [findbugs]   org.apache.hadoop.hbase.client.Result
 [findbugs]   org.apache.hadoop.mapreduce.JobContext
 [findbugs]   org.codehaus.jackson.JsonGenerator
 [findbugs]   org.apache.hadoop.mapreduce.TaskAttemptContext
 [findbugs]   org.apache.hadoop.io.BytesWritable$Comparator
 [findbugs]   org.apache.hadoop.io.LongWritable$Comparator
 [findbugs]   org.codehaus.jackson.map.util.LRUMap
 [findbugs]   org.apache.hadoop.hbase.util.Bytes
 [findbugs]   org.antlr.runtime.MismatchedTokenException
 [findbugs]   org.codehaus.jackson.JsonParser
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   org.python.core.PyException
 [findbugs]   org.apache.commons.cli.ParseException
 [findbugs]   org.apache.hadoop.io.compress.CompressionOutputStream
 [findbugs]   org.apache.hadoop.hbase.filter.WritableByteArrayComparable
 [findbugs]   org.antlr.runtime.tree.CommonTreeNodeStream
 [findbugs]   org.apache.log4j.Level
 [findbugs]   org.apache.hadoop.hbase.client.Scan
 [findbugs]   org.jruby.anno.JRubyMethod
 [findbugs]   org.apache.hadoop.mapreduce.Job
 [findbugs]   com.google.common.util.concurrent.Futures
 [findbugs]   org.apache.commons.logging.LogFactory
 [findbugs]   org.apache.commons.collections.IteratorUtils
 [findbugs]   org.apache.commons.codec.binary.Base64
 [findbugs]   org.codehaus.jackson.map.ObjectMapper
 [findbugs]   org.apache.hadoop.fs.FileSystem
 [findbugs]   org.jruby.embed.LocalContextScope
 [findbugs]   org.apache.hadoop.hbase.filter.FilterList$Operator
 [findbugs]   org.jruby.RubySymbol
 [findbugs]   org.apache.hadoop.hbase.io.ImmutableBytesWritable
 [findbugs]   org.apache.hadoop.io.serializer.SerializationFactory
 [findbugs]   org.antlr.runtime.tree.TreeAdaptor
 [findbugs]   org.apache.hadoop.mapred.RunningJob
 [findbugs]   org.antlr.runtime.CommonTokenStream
 [findbugs]   org.apache.hadoop.io.DataInputBuffer
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile
 [findbugs]   org.apache.commons.cli.GnuParser
 [findbugs]   org.mozilla.javascript.Context
 [findbugs]   org.apache.hadoop.io.FloatWritable
 [findbugs]   

[jira] [Commented] (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2012-10-18 Thread gustavo riveros (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479472#comment-13479472
 ] 

gustavo riveros commented on PIG-366:
-

[~arov] I'm new to Pig and am having trouble getting started. I've been trying 
to install PigEditor in Eclipse Helios (SR2), but after I add the following 
file to the plugins folder of Eclipse: 
alexrovner-PigEditor-398a1af/PigEclipseUpdateSite/plugin / 
org.apache.pigeditor_1.0.0.4.jar , nothing changes in the platform and no Pig 
environment starts. I would appreciate your help, as I see that you are an 
expert in Pig.
(This is part of an implementation for my thesis.)

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Robert Gibbon
Priority: Minor
  Labels: gsoc, mentor
 Attachments: org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 org.apache.pig.pigpen-0.7.0.tar.gz, org.apache.pig.pigpen_0.7.2.jar, 
 org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.7.4.jar, 
 org.apache.pig.pigpen-0.7.4.tar.gz, org.apache.pig.pigpen_0.7.5.jar, 
 org.apache.pig.pigpen-0.7.5.tar.gz, pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-1283) COUNT on null bag causes failure

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479476#comment-13479476
 ] 

Jonathan Coveney commented on PIG-1283:
---

Anand,

Thanks for the contribution! Committed.

 COUNT on null bag causes failure
 

 Key: PIG-1283
 URL: https://issues.apache.org/jira/browse/PIG-1283
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
Assignee: Anand L Ranganathan
  Labels: newbie
 Attachments: PIG-1283-1.patch, PIG-1283-2.patch, pig_1283-3.patch


 grunt> l = load '/tmp/e.bag' as (b : bag{t: (i : int)}, a : int);
 # b is null for the only row
 grunt> c = foreach l generate COUNT(b);
 grunt> dump c
 It results in the following exception:
 org.apache.pig.backend.executionengine.ExecException: ERROR 2106: Error while computing count in COUNT
 at org.apache.pig.builtin.COUNT.exec(COUNT.java:59)
 at org.apache.pig.builtin.COUNT.exec(COUNT.java:39)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:212)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:293)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:358)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:232)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:227)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:52)
 at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
 at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:583)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:176)
 Caused by: java.lang.NullPointerException
 at org.apache.pig.builtin.COUNT.exec(COUNT.java:46)
 ... 12 more
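
The NullPointerException raised inside COUNT.exec suggests that the bag read from the input tuple is dereferenced without a null check. The following is a minimal sketch of a null-tolerant count, not the committed patch: plain Java collections stand in for Pig's Tuple and DataBag, the class and method names are hypothetical, and returning null for a null bag is an assumption about the desired semantics.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical stand-in for a COUNT-style UDF; plain collections replace Pig's Tuple and DataBag.
final class NullSafeCountSketch {

    // Counts the non-null elements of the bag stored in field 0 of the input row.
    // Returns null (assumed here to mean "unknown") when the row or the bag is null,
    // instead of dereferencing the bag and throwing NullPointerException.
    static Long count(List<?> inputRow) {
        if (inputRow == null || inputRow.isEmpty()) {
            return null;
        }
        Object field = inputRow.get(0);
        if (!(field instanceof Collection)) {
            return null; // covers the null-bag case from the report
        }
        long n = 0;
        for (Object element : (Collection<?>) field) {
            if (element != null) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        List<Object> rowWithNullBag = Arrays.<Object>asList(null, 7);
        List<Object> rowWithBag = Arrays.<Object>asList(Arrays.asList(1, 2, 3), 7);
        System.out.println(count(rowWithNullBag));
        System.out.println(count(rowWithBag));
    }
}
```

Whether a null bag should yield null or 0 is a design decision for the UDF; the sketch only shows that the check has to happen before the bag is iterated.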



[jira] [Updated] (PIG-1283) COUNT on null bag causes failure

2012-10-18 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-1283:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)




[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479478#comment-13479478
 ] 

Prasanth J commented on PIG-2582:
-

I am using ResourceStatistics to store intermediate stats in PIG-2831. I am not 
using the getmBytes() method, but I do use getAvgRecordSize() to store and 
return the average record size in bytes. 

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
 Attachments: PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes

 public Long getmBytes() {
     return mBytes;
 }

 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Typically sizes are stored as bytes, potentially having convenience functions 
 to return with different units.
 If mBytes can be marked private without causing woes it might be worth 
 storing size as bytes instead.
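
The suggestion above (store bytes privately, expose other units through convenience methods) could look roughly like the sketch below. The class and method names are hypothetical and are not taken from Pig or from the attached patch; using 1024 * 1024 bytes per megabyte is also an assumption.

```java
// Hypothetical sketch, not the Pig ResourceStatistics class.
final class SizeStatsSketch {
    private static final long BYTES_PER_MB = 1024L * 1024L; // assumed definition of a megabyte

    private Long sizeInBytes; // null means "unknown"

    Long getSizeInBytes() {
        return sizeInBytes;
    }

    // Fluent setter, in the same style as the setmBytes() shown above.
    SizeStatsSketch setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    // Convenience view in whole megabytes, rounded down; null when the size is unknown.
    Long getSizeInMegabytes() {
        if (sizeInBytes == null) {
            return null;
        }
        return sizeInBytes / BYTES_PER_MB;
    }
}
```

Keeping the field private means the stored unit can only be reached through the accessors, so callers cannot mix units by writing the field directly.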



[jira] [Commented] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479485#comment-13479485
 ] 

Jonathan Coveney commented on PIG-2778:
---

This has been applied to trunk. Thanks, Cheolsoo!

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.
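
To illustrate why pushing a regex match down to the loader can help, here is a rough sketch in which a loader prunes partitions using a predicate it receives. All types and names (OpType, OP_MATCH, Predicate, prunePartitions) are hypothetical and are not Pig's Expression or LoadMetadata API; whole-string matching is assumed for the regex operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch; these types are illustrative and are not Pig's Expression or LoadMetadata API.
final class MatchPushdownSketch {

    enum OpType { OP_EQ, OP_MATCH }

    static final class Predicate {
        final OpType op;
        final String constant;

        Predicate(OpType op, String constant) {
            this.op = op;
            this.constant = constant;
        }
    }

    // A loader that understands the predicate can drop partitions before reading any rows.
    static List<String> prunePartitions(List<String> partitionValues, Predicate p) {
        List<String> kept = new ArrayList<String>();
        for (String value : partitionValues) {
            boolean keep;
            switch (p.op) {
                case OP_EQ:
                    keep = p.constant.equals(value);
                    break;
                case OP_MATCH:
                    keep = Pattern.matches(p.constant, value); // whole-string regex match
                    break;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + p.op);
            }
            if (keep) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> partitions = Arrays.asList("2012-10-17", "2012-10-18", "2011-01-01");
        System.out.println(prunePartitions(partitions, new Predicate(OpType.OP_MATCH, "2012-10-.*")));
    }
}
```

Without an operator value for the match, a loader in this sketch would have no way to receive the regex and would have to read every partition and leave the filtering to a later stage.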



[jira] [Updated] (PIG-2778) Add 'matches' operator to predicate pushdown

2012-10-18 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2778:
--

   Resolution: Fixed
Fix Version/s: 0.12
   Status: Resolved  (was: Patch Available)

 Add 'matches' operator to predicate pushdown
 

 Key: PIG-2778
 URL: https://issues.apache.org/jira/browse/PIG-2778
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Cheolsoo Park
 Fix For: 0.12

 Attachments: PIG-2778.patch, test_e2e.log, test_unit.log


 Currently the regex match operation does not get pushed down to LoadMetadata 
 (and Expression does not have an enum value for it); it would be quite useful 
 to enable this for some optimizations.



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479488#comment-13479488
 ] 

Prashant Kommireddi commented on PIG-2582:
--

Hi Prasanth, I looked at PIG-2831. The current implementation of 
getAvgRecordSize in trunk returns the size in MB. Is that not what you want?
Also, it might be better to read the values through getters instead of 
accessing the fields directly. That will allow us to clean up the class further 
in the future. Those members should really be private.




[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prasanth J (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479506#comment-13479506
 ] 

Prasanth J commented on PIG-2582:
-

I am not using setmBytes() or mBytes at all. As per the implementation logic, 
getAvgRecordSize() returns the size in MB only if mBytes != null and 
numRecords != null; otherwise it returns whatever value it contains, which 
works fine in my case. I also didn't see any places using this. Please let me 
know if there are going to be any changes to the APIs, so that I can modify the 
patch accordingly. 
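
The behaviour described in this comment can be sketched as follows. This is a hypothetical stand-in written from the description alone, not the Pig source: the class name is invented, and the guard against numRecords being zero is an addition that the comment does not mention.

```java
// Hypothetical stand-in that mirrors the behaviour described in the comment; not the Pig source.
final class AvgRecordSizeSketch {
    Long mBytes;        // total size, in megabytes
    Long numRecords;    // number of records
    Long avgRecordSize; // explicitly stored average; the unit is whatever the caller chose

    Long getAvgRecordSize() {
        // The zero check is an assumption added here to avoid division by zero.
        if (mBytes != null && numRecords != null && numRecords != 0L) {
            return mBytes / numRecords; // derived value, in megabytes per record
        }
        return avgRecordSize; // stored value, for example bytes per record
    }
}
```

The sketch makes the concern concrete: the same getter can return megabytes per record or bytes per record, depending on which fields happen to be set.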




[jira] [Updated] (PIG-2931) $ signs in the replacement string make parameter substitution fail

2012-10-18 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2931:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks, Cheolsoo! It's in.

 $ signs in the replacement string make parameter substitution fail
 --

 Key: PIG-2931
 URL: https://issues.apache.org/jira/browse/PIG-2931
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Cheolsoo Park
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: PIG-2931.patch


 To reproduce the issue, use the following pig script:
 {code:title=test.pig}
 a = load 'data';
 b = filter a by $FILTER;
 {code}
 and run the following command:
 {code}
 pig -x local -dryrun -f test.pig -p FILTER="(\$0 == 'a')"
 {code}
 This generates the following script:
 {code:title=test.pig.substituted}
 a = load 'data';
 b = filter a by ($FILTER == 'a');
 {code}
 However this should be:
 {code}
 a = load 'data';
 b = filter a by ($0 == 'a');
 {code}
 This is because Pig calls replaceFirst() with a replacement string that 
 includes a $ sign, as follows:
 {code}
 "$FILTER".replaceFirst("\\$FILTER", "($0 == 'a')");
 {code}
 To treat $ signs as literals in the replacement string, we must escape them. 
 Please see the [Java 
 doc|http://docs.oracle.com/javase/6/docs/api/java/util/regex/Matcher.html#replaceFirst(java.lang.String)]
  for Matcher class for explanation:
 {quote}
 Note that backslashes (\) and dollar signs ($) in the replacement string may 
 cause the results to be different than if it were being treated as a literal 
 replacement string. Dollar signs may be treated as references to captured 
 subsequences as described above, and backslashes are used to escape literal 
 characters in the replacement string.
 {quote}
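
The effect can be reproduced with plain java.lang.String and java.util.regex calls. The sketch below is not the Pig patch; the class and helper names are hypothetical. It shows the unescaped call, where $0 in the replacement is read as a reference to the whole match, next to a call that escapes the replacement with Matcher.quoteReplacement().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating literal replacement; not the code from the Pig patch.
final class ParamSubstitutionSketch {

    // Replaces the first occurrence of $name in line with value, treating value literally.
    static String substitute(String line, String name, String value) {
        String pattern = "\\$" + Pattern.quote(name);
        return line.replaceFirst(pattern, Matcher.quoteReplacement(value));
    }

    public static void main(String[] args) {
        String line = "b = filter a by $FILTER;";
        String value = "($0 == 'a')";

        // Unescaped: "$0" refers to group 0, the matched text "$FILTER".
        System.out.println(line.replaceFirst("\\$FILTER", value));
        // prints: b = filter a by ($FILTER == 'a');

        // Escaped: the replacement is inserted as-is.
        System.out.println(substitute(line, "FILTER", value));
        // prints: b = filter a by ($0 == 'a');
    }
}
```

Matcher.quoteReplacement() escapes both backslashes and dollar signs, which covers the two cases called out in the Javadoc quoted above.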



[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-18 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479523#comment-13479523
 ] 

Jonathan Coveney commented on PIG-2975:
---

FWIW I think my patch fixes this, and I don't think it has any downsides. It 
just uses the normal RawComparator.

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, pig-2975-trunk_v01.txt, 
 pig-2975-trunk_v02-broken.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig 
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-18 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479528#comment-13479528
 ] 

Prashant Kommireddi commented on PIG-2582:
--

Sure. 

"So as per the implementation logic, only if mBytes!=null and numRecords!=null 
it will return the size in MB else it will return whatever it contains which in 
my case works fine." This works, but it might be confusing or inconsistent. I 
will wait for Travis and others to comment on this patch; let's stay in touch 
on how this might affect PIG-2831.




[jira] Subscription: PIG patch available

2012-10-18 Thread jira
Issue Subscription
Filter: PIG patch available (31 issues)

Subscriber: pigdaily

Key Summary
PIG-2978 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
https://issues.apache.org/jira/browse/PIG-2978
PIG-2968 ColumnMapKeyPrune fails to prune a subtree inside foreach
https://issues.apache.org/jira/browse/PIG-2968
PIG-2967 Fix Glob_local test failure for Pig E2E Test Framework
https://issues.apache.org/jira/browse/PIG-2967
PIG-2960 Increase the timeout for unit test
https://issues.apache.org/jira/browse/PIG-2960
PIG-2959 Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2957 TetsScriptUDF fail due to volume prefix in jar
https://issues.apache.org/jira/browse/PIG-2957
PIG-2956 Invalid cache specification for some streaming statement
https://issues.apache.org/jira/browse/PIG-2956
PIG-2955 Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2954 TestParamSubPreproc still depends on bash to run
https://issues.apache.org/jira/browse/PIG-2954
PIG-2953 which utility does not exist on Windows
https://issues.apache.org/jira/browse/PIG-2953
PIG-2942 DevTests, TestLoad has a false failure on Windows
https://issues.apache.org/jira/browse/PIG-2942
PIG-2940 HBaseStorage store fails in secure cluster
https://issues.apache.org/jira/browse/PIG-2940
PIG-2904 Scripting UDFs should allow DEFINE statements to pass parameters to 
the UDF's constructor
https://issues.apache.org/jira/browse/PIG-2904
PIG-2898 Parallel execution of e2e tests
https://issues.apache.org/jira/browse/PIG-2898
PIG-2881 Add SUBTRACT eval function
https://issues.apache.org/jira/browse/PIG-2881
PIG-2873 Converting bin/pig shell script to python
https://issues.apache.org/jira/browse/PIG-2873
PIG-2834 MultiStorage requires unused constructor argument
https://issues.apache.org/jira/browse/PIG-2834
PIG-2824 Pushing checking number of fields into LoadFunc
https://issues.apache.org/jira/browse/PIG-2824
PIG-2801 grunt sh command should invoke the shell implicitly instead of 
calling exec directly with the command tokens
https://issues.apache.org/jira/browse/PIG-2801
PIG-2798 pig streaming tests assume interpreters are auto-resolved
https://issues.apache.org/jira/browse/PIG-2798
PIG-2796 Local temporary paths are not always valid HDFS path names.
https://issues.apache.org/jira/browse/PIG-2796
PIG-2795 Fix test cases that generate pig scripts with load  + pathStr to 
encode \ in the path
https://issues.apache.org/jira/browse/PIG-2795
PIG-2661 Pig uses an extra job for loading data in Pigmix L9
https://issues.apache.org/jira/browse/PIG-2661
PIG-2657 Print warning if using wrong jython version
https://issues.apache.org/jira/browse/PIG-2657
PIG-2495 Using merge JOIN from a HBaseStorage produces an error
https://issues.apache.org/jira/browse/PIG-2495
PIG-2417 Streaming UDFs -  allow users to easily write UDFs in scripting 
languages with no JVM implementation.
https://issues.apache.org/jira/browse/PIG-2417
PIG-2405 svn tags/release-0.9.1: some unit test case failed with open JDK
https://issues.apache.org/jira/browse/PIG-2405
PIG-2362 Rework Ant build.xml to use macrodef instead of antcall
https://issues.apache.org/jira/browse/PIG-2362
PIG-2312 NPE when relation and column share the same name and used in Nested 
Foreach
https://issues.apache.org/jira/browse/PIG-2312
PIG-1942 script UDF (jython) should utilize the intended output schema to 
more directly convert Py objects to Pig objects
https://issues.apache.org/jira/browse/PIG-1942
PIG-1237 Piggybank MutliStorage - specify field to write in output
https://issues.apache.org/jira/browse/PIG-1237

You may edit this subscription at:
https://issues.apache.org/jira/secure/FilterSubscription!default.jspa?subId=13225&filterId=12322384


[jira] [Commented] (PIG-2927) SHIP and use JRuby gems in JRuby UDFs

2012-10-18 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479567#comment-13479567
 ] 

Cheolsoo Park commented on PIG-2927:


Although I am no Ruby expert, I think that Jonathan's patch works well. Here is 
my test.

1) installed a non-trivial rubygem library (rubygem-json) on the client only 
and confirmed that it is not installed on any datanode on the cluster.
{code}
/usr/lib/ruby/gems/1.8/gems/json-1.4.6/
{code}
2) wrote a Ruby UDF that parses a JSON string:
{code}
require 'rubygems'
require 'pigudf'
require 'json'

class Myudfs < PigUdf
  outputSchema "result:chararray"
  def parseJson input
    result = JSON.parse(input)
  end
end
{code}
3) wrote a short pig script that loads a JSON string and calls my Ruby UDF:
{code}
register 'test.rb' using jruby as myfuncs;
a = load 'json.txt' using PigStorage() as (i:chararray);
b = foreach a generate myfuncs.parseJson(i);
dump b;
{code}
4) got the expected result as follows:
{code:title=input}
{id:1,nested:{value1:first1,next:{complex_record:{id:2,nested:{value1:second1,next:null,value2:second2}}},value2:first2}}
{code}
{code:title=result}
([id#1,nested#{value1=first1, value2=first2, next={complex_record={id=2, 
nested={value1=second1, value2=second2, next=null])
{code}

Without Jonathan's patch, I get the following error in the front-end as 
expected:
{code}
LoadError: no such file to load -- json
  require at org/jruby/RubyKernel.java:1042
  require at 
file:/home/cheolsoo/pig-ruby/build/ivy/lib/Pig/jruby-complete-1.6.7.jar!/META-INF/jruby.home/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36
   (root) at test.rb:3
2012-10-18 17:09:24,323 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2999: Unexpected internal error. (LoadError) no such file to load -- json
{code}
I also ran the Scripting e2e test cases with the patch on a Hadoop-1.0.x 
cluster, and they all passed. So it seems good to commit to me.

Btw, I wanted to write an e2e test case using rubygems-json, but I realized 
that rubygems-json is under the GPL and cannot be included in Pig. We should 
either find another rubygem library that is under the Apache license or make 
the test configurable so that it runs only if rubygem-json is installed.

Thanks!

 SHIP and use JRuby gems in JRuby UDFs
 -

 Key: PIG-2927
 URL: https://issues.apache.org/jira/browse/PIG-2927
 Project: Pig
  Issue Type: New Feature
  Components: parser
Affects Versions: 0.11
 Environment: JRuby UDFs
Reporter: Russell Jurney
Assignee: Jonathan Coveney
Priority: Minor
 Fix For: 0.11

 Attachments: PIG-2927-0.patch, PIG-2927-1.patch, PIG-2927-2.patch, 
 PIG-2927-3.patch


 It would be great to use JRuby gems in JRuby UDFs without installing them on 
 all machines on the cluster. Some way to SHIP them automatically with the job 
 would be great.
