[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-06 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: (was: Pig_712_Patch.txt)

 Need utilities to create schemas for bags and tuples
 

 Key: PIG-712
 URL: https://issues.apache.org/jira/browse/PIG-712
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.3.0


 Pig should provide utilities to create bag and tuple schemas. Currently, 
 users return schemas in the outputSchema method and end up with very verbose 
 boilerplate code. It would be very nice if Pig encapsulated that boilerplate 
 code in utility methods.
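As a rough illustration of the kind of helper the issue asks for, the sketch below builds a bag-of-tuples schema from parallel arrays of field names and types in a single call. It is a self-contained model: the `Field` class and the `tupleSchema`/`bagSchema` helpers are hypothetical and are not Pig's `Schema`/`FieldSchema` API or the `SchemaUtil` class from the attached patch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, self-contained model of schema-building helpers.
// These classes are NOT Pig's Schema/FieldSchema API; they only show how
// one helper call can replace hand-written nesting boilerplate.
public class SchemaUtilSketch {

    // Minimal stand-in for a field schema: alias, type name, nested fields.
    static final class Field {
        final String alias;
        final String type;        // e.g. "chararray", "long", "tuple", "bag"
        final List<Field> inner;  // empty for scalar fields

        Field(String alias, String type, List<Field> inner) {
            this.alias = alias;
            this.type = type;
            this.inner = Collections.unmodifiableList(new ArrayList<>(inner));
        }

        // Renders Pig-like notation: t:(a:long) for tuples, b:{...} for bags.
        @Override
        public String toString() {
            StringBuilder body = new StringBuilder();
            for (int i = 0; i < inner.size(); i++) {
                if (i > 0) body.append(',');
                body.append(inner.get(i));
            }
            switch (type) {
                case "tuple": return alias + ":(" + body + ")";
                case "bag":   return alias + ":{" + body + "}";
                default:      return alias + ":" + type;
            }
        }
    }

    // Builds a tuple schema from parallel arrays of field names and types.
    static Field tupleSchema(String alias, String[] names, String[] types) {
        if (names.length != types.length) {
            throw new IllegalArgumentException("names and types must have the same length");
        }
        List<Field> fields = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            fields.add(new Field(names[i], types[i], Collections.<Field>emptyList()));
        }
        return new Field(alias, "tuple", fields);
    }

    // Builds a bag schema whose single element is a tuple schema.
    static Field bagSchema(String bagAlias, String tupleAlias,
                           String[] names, String[] types) {
        return new Field(bagAlias, "bag",
                Collections.singletonList(tupleSchema(tupleAlias, names, types)));
    }

    public static void main(String[] args) {
        Field bag = bagSchema("b", "t",
                new String[] {"date", "count", "gmean"},
                new String[] {"chararray", "long", "double"});
        System.out.println(bag);
    }
}
```

Running `main` prints `b:{t:(date:chararray,count:long,gmean:double)}`; the caller states only names and types, and the helper handles the tuple-inside-bag nesting.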

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-06 Thread Jeff Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Zhang updated PIG-712:
---

Attachment: Pig_712_Patch_Merged.txt

I've merged the two patches into one patch, because the test case depends on 
the implementation.

 Need utilities to create schemas for bags and tuples
 

 Key: PIG-712
 URL: https://issues.apache.org/jira/browse/PIG-712
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.3.0

 Attachments: Pig_712_Patch_Merged.txt





[jira] Issue Comment Edited: (PIG-712) Need utilities to create schemas for bags and tuples

2009-04-06 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12695985#action_12695985
 ] 

Jeff Zhang edited comment on PIG-712 at 4/6/09 3:43 AM:


I've merged the two patches into one patch, because TestSchemaUtil depends 
on SchemaUtil. But it seems I cannot submit this patch. Why?

  was (Author: zjffdu):
I've merged the two patches into one patch, because the test case depends 
on the implementation.
  
 Need utilities to create schemas for bags and tuples
 

 Key: PIG-712
 URL: https://issues.apache.org/jira/browse/PIG-712
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.3.0

 Attachments: Pig_712_Patch_Merged.txt





[jira] Commented: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-06 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696244#action_12696244
 ] 

Pradeep Kamath commented on PIG-733:


Tests are not included in this patch since there are existing tests for order 
by.

All core unit tests passed, and findbugs gave the same number of warnings with 
and without the patch (output below). The additional warnings produced by the 
original patch have been addressed in the new version of the patch 
(PIG-733-v2.patch).

{noformat}
=== CORE UNIT TESTS OUTPUT WITH PATCH
[prade...@afterside:/tmp/PIG-733/trunk]


test-core:
[mkdir] Created dir: /tmp/PIG-733/trunk/build/test/logs
[junit] Running org.apache.pig.test.TestAdd
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.056 sec
...
[junit] Running org.apache.pig.test.TestTypeCheckingValidatorNoSchema
[junit] Tests run: 13, Failures: 0, Errors: 0, Time elapsed: 0.629 sec
[junit] Running org.apache.pig.test.TestUnion
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 49.94 sec

test-contrib:

BUILD SUCCESSFUL
Total time: 77 minutes 47 seconds

=== FINDBUGS OUTPUT WITH PATCH
[prade...@afterside:/tmp/PIG-733/trunk]

[prade...@chargesize:/tmp/PIG-733/trunk]ant -Dfindbugs.home=/homes/pradeepk/findbugs-1.3.8 findbugs
Buildfile: build.xml
...
findbugs:
[mkdir] Created dir: /tmp/PIG-733/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] Warnings generated: 665
 [findbugs] Calculating exit code...
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 1
 [findbugs] Java Result: 1
 [findbugs] Output saved to /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.xml to /tmp/PIG-733/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet /homes/pradeepk/findbugs-1.3.8/src/xsl/default.xsl

=== FINDBUGS OUTPUT WITHOUT PATCH
[prade...@chargesize:/tmp/svncheckout/trunk]ant -Dfindbugs.home=/homes/pradeepk/findbugs-1.3.8 findbugs
Buildfile: build.xml

check-for-findbugs:

...
findbugs:
[mkdir] Created dir: /tmp/svncheckout/trunk/build/test/findbugs
 [findbugs] Executing findbugs from ant task
 [findbugs] Running FindBugs...
 [findbugs] Warnings generated: 665
 [findbugs] Calculating exit code...
 [findbugs] Setting 'bugs found' flag (1)
 [findbugs] Exit code set to: 1
 [findbugs] Java Result: 1
 [findbugs] Output saved to /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.xml
 [xslt] Processing /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.xml to /tmp/svncheckout/trunk/build/test/findbugs/pig-findbugs-report.html
 [xslt] Loading stylesheet /homes/pradeepk/findbugs-1.3.8/src/xsl/default.xsl



{noformat}

 Order by sampling dumps entire sample to hdfs which causes dfs FileSystem 
 closed error on large input
 ---

 Key: PIG-733
 URL: https://issues.apache.org/jira/browse/PIG-733
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-733.patch


 Order by has a sampling job which samples the input and creates a sorted list 
 of sample items. Currently the number of items sampled is 100 per map task. 
 So if the input is large, resulting in many maps (say 50,000), the sample is 
 big (100 x 50,000 = 5,000,000 items). This sorted sample is stored on DFS. 
 The WeightedRangePartitioner computes quantile boundaries and weighted 
 probabilities for repeating values in each map by reading the samples file 
 from DFS. In queries with many maps (on the order of 50,000), the DFS read of 
 the sample file fails with a FileSystem closed error. This seems to point to 
 a DFS issue wherein a big DFS file being read simultaneously by many DFS 
 clients (in this case all maps) causes the clients to be closed. However, on 
 the Pig side, loading the sample in each map of the final map-reduce job and 
 computing the quantile boundaries and weighted probabilities there is 
 inefficient. We should do this computation through a FindQuantiles UDF in the 
 same map-reduce job that produces the sorted samples. This way less data is 
 written to DFS, and in the final map-reduce job the WeightedRangePartitioner 
 just needs to load the computed information.
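The sketch below illustrates, under simplifying assumptions, why computing quantiles once in the sampling job shrinks what the final job has to read: with 100 samples per map and 50,000 maps the sorted sample holds 5,000,000 items, while the boundaries for N reducers are only N - 1 values. `quantileBoundaries` and `partitionFor` are hypothetical names; this is not Pig's FindQuantiles UDF or WeightedRangePartitioner, and it ignores the weighted probabilities for repeating values.

```java
import java.util.Arrays;

// Simplified illustration only: NOT Pig's FindQuantiles UDF or
// WeightedRangePartitioner. Weighted probabilities for repeated keys are
// deliberately omitted.
public class QuantileSketch {

    // Picks numPartitions - 1 boundary keys at evenly spaced positions of a
    // sorted sample. This small array, not the whole sample, is what a
    // partitioner in the final job would need to load.
    static long[] quantileBoundaries(long[] sortedSample, int numPartitions) {
        if (numPartitions < 1) {
            throw new IllegalArgumentException("numPartitions must be >= 1");
        }
        if (sortedSample.length == 0) {
            return new long[0]; // no sample: everything goes to partition 0
        }
        long[] boundaries = new long[numPartitions - 1];
        for (int i = 1; i < numPartitions; i++) {
            int idx = (int) ((long) i * sortedSample.length / numPartitions);
            boundaries[i - 1] = sortedSample[idx];
        }
        return boundaries;
    }

    // Returns the number of boundaries that are <= key, i.e. the partition
    // index for the key (binary search over the sorted boundaries).
    static int partitionFor(long key, long[] boundaries) {
        int lo = 0, hi = boundaries.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (key < boundaries[mid]) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        // Sample size from the issue: 100 items per map * 50,000 maps.
        long sampleItems = 100L * 50_000L;
        System.out.println("sample items: " + sampleItems);

        long[] sample = {12, 3, 7, 1, 9, 4, 11, 2, 8, 5, 10, 6};
        Arrays.sort(sample); // the sampling job produces a sorted sample
        long[] boundaries = quantileBoundaries(sample, 4);
        System.out.println("boundaries: " + Arrays.toString(boundaries));
        System.out.println("key 2 -> " + partitionFor(2, boundaries)
                + ", key 12 -> " + partitionFor(12, boundaries));
    }
}
```

With the 12-item sample above and 4 partitions, the boundaries are [4, 7, 10], so key 2 maps to partition 0 and key 12 to partition 3.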




[jira] Updated: (PIG-733) Order by sampling dumps entire sample to hdfs which causes dfs FileSystem closed error on large input

2009-04-06 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-733:
---

Attachment: PIG-733-v2.patch

 Order by sampling dumps entire sample to hdfs which causes dfs FileSystem 
 closed error on large input
 ---

 Key: PIG-733
 URL: https://issues.apache.org/jira/browse/PIG-733
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Fix For: 0.3.0

 Attachments: PIG-733-v2.patch, PIG-733.patch





[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods

2009-04-06 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696263#action_12696263
 ] 

Viraj Bhat commented on PIG-754:


Ciemo, there is a workaround: if we create a param file named testparamfile 
that defines the parameter infile, it works. 

The -param_file option goes through a different code path, which avoids the 
problems that the -param option has.

{code}
testmachine~/pigscripts cat testparamfile 

infile = 'file.right'

pig -exectype local -param_file testparamfile infile.pig

{code}

I suspect that this issue is related to the following JIRA; perhaps the 
period (.) should be included as a problematic special character.

http://issues.apache.org/jira/browse/PIG-564

 Bugs with load and store and filenames passed with -param containing periods
 

 Key: PIG-754
 URL: https://issues.apache.org/jira/browse/PIG-754
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz

 This one drove me batty.
 I have two files: file and file.right.
 file:
 {code}
 WRONG 
 This is file, not file.right.
 {code}
 file.right:
 {code}
 RIGHT
 This is file.right..
 {code}
 infile.pig:
 {code}
 A = load '$infile' using PigStorage();
 dump A;
 {code}
 When I pass in file.right as the infile parameter value, the wrong file is 
 read:
 {code}
 -bash-3.00$ pig -exectype local -param infile=file.right infile.pig
 USING: /grid/0/gs/pig/current
 2009-04-05 23:18:36,291 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-04-05 23:18:36,292 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (WRONG )
 (This is file, not file.right.)
 {code}
 However, if I pass in infile as ./file.right, the script magically works.
 {code}
 -bash-3.00$ pig -exectype local -param infile=./file.right infile.pig
 USING: /grid/0/gs/pig/current
 2009-04-05 23:20:46,735 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-04-05 23:20:46,736 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (RIGHT)
 (This is file.right.)
 {code}
 I do not have this problem if I use the file name with a period in the script 
 itself:
 infile2.pig
 {code}
 A = load 'file.right' using PigStorage();
 dump A;
 {code}
 {code}
 -bash-3.00$ pig -exectype local infile2.pig
 USING: /grid/0/gs/pig/current
 2009-04-05 23:22:47,022 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-04-05 23:22:47,023 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (RIGHT)
 (This is file.right.)
 {code}
 I also experience similar problems when I pass the parameter outfile to a 
 store statement.
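The thread does not establish the root cause, but the expected behavior is clear: the value given with -param should be inserted verbatim, so `$infile` with `infile=file.right` must become `'file.right'`. The sketch below shows literal substitution in Java under that assumption; `ParamSubstSketch.substitute` is a hypothetical helper, not Pig's parameter preprocessor. It also guards against one general Java pitfall with special characters: `Matcher.appendReplacement` treats `$` and `\` in the replacement string specially unless the value is first passed through `Matcher.quoteReplacement`.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch, NOT Pig's parameter preprocessor: replaces each
// $name with the corresponding value exactly as given.
public class ParamSubstSketch {

    // '$' followed by an identifier, e.g. $infile or $outfile.
    private static final Pattern PARAM =
            Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String value = params.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Undefined parameter: " + name);
            }
            // quoteReplacement stops '$' and '\' in the value from being
            // read as group references or escapes; '.' needs no escaping.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("infile", "file.right");
        System.out.println(substitute("A = load '$infile' using PigStorage();", params));
    }
}
```

Running `main` prints `A = load 'file.right' using PigStorage();`, which is the statement the reporter expected Pig to execute.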




[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods

2009-04-06 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696265#action_12696265
 ] 

Viraj Bhat commented on PIG-754:


Another workaround, as suggested in PIG-564 :)

{code}
pig -exectype local -param infile=\'file.right\' infile.pig

(RIGHT)
(This is file.right..)

{code}




 Bugs with load and store and filenames passed with -param containing periods
 

 Key: PIG-754
 URL: https://issues.apache.org/jira/browse/PIG-754
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz




[jira] Issue Comment Edited: (PIG-754) Bugs with load and store and filenames passed with -param containing periods

2009-04-06 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696265#action_12696265
 ] 

Viraj Bhat edited comment on PIG-754 at 4/6/09 2:43 PM:


Another workaround as suggested in PIG-564 :)

{code}
pig -exectype local -param infile=\'file.right\' infile.pig

(RIGHT)
(This is file.right..)

{code}




  was (Author: viraj):
Another workaround as suggested in PIG:564 :)

{code}
pig -exectype local -param infile=\'file.right\' infile.pig

(RIGHT)
(This is file.right..)

{code}



  
 Bugs with load and store and filenames passed with -param containing periods
 

 Key: PIG-754
 URL: https://issues.apache.org/jira/browse/PIG-754
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz




[jira] Commented: (PIG-564) Parameter Substitution using -param option does not seem to work when parameters contain special characters such as +,=,-,?,'

2009-04-06 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696266#action_12696266
 ] 

David Ciemiewicz commented on PIG-564:
--

Period (.) is also a special character that seems to cause problems.

See related JIRA PIG-754

 Parameter Substitution using -param option does not seem to work when 
 parameters contain special characters such as +,=,-,?,' 
 ---

 Key: PIG-564
 URL: https://issues.apache.org/jira/browse/PIG-564
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: Viraj Bhat

 Consider the following Pig script, which uses parameter substitution:
 {code}
 %default qual '/user/viraj'
 %default mydir 'mydir_myextraqual'
 VISIT_LOGS = load '$qual/$mydir' as (a,b,c);
 dump VISIT_LOGS;
 {code}
 If you run the script as:
 ==
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param mydir=mydir-myextraqual mypigparamsub.pig
 ==
 You get the following error:
 ==
 2008-12-15 19:49:43,964 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - java.io.IOException: /user/viraj/mydir does not exist
 at org.apache.pig.backend.executionengine.PigSlicer.validate(PigSlicer.java:109)
 at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
 at org.apache.pig.impl.io.ValidatingInputFileSpec.init(ValidatingInputFileSpec.java:44)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:200)
 at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:742)
 at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:370)
 at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
 at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
 at java.lang.Thread.run(Thread.java:619)
 java.io.IOException: Unable to open iterator for alias: VISIT_LOGS [Job terminated with anomalous status FAILED]
 at org.apache.pig.PigServer.openIterator(PigServer.java:389)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:269)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:178)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 ... 6 more
 ==
 Also tried using: -param mydir='mydir\-myextraqual'
 This behavior occurs if the parameter value contains characters such as +, =, 
 or ?.
 A workaround for this behavior is to use a param_file that contains 
 param_name=param_value on each line, with the param_value enclosed in 
 quotes. For example, put mydir='mydir-myextraqual' in myparamfile and then 
 run the Pig script as:
 java -cp pig.jar:${HADOOP_HOME}/conf/ -Dhod.server='' org.apache.pig.Main -param_file myparamfile mypigparamsub.pig
 The following issues need to be fixed:
 1) With the -param option, if the parameter value contains special 
 characters, it is truncated.
 2) In a param_file, if a param_value contains special characters, it must be 
 enclosed in quotes.
 3) If 2 is a known issue, then it should be documented in 
 http://wiki.apache.org/pig/ParameterSubstitution
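To illustrate the param_file workaround, here is a minimal, hypothetical parser for one name=value pair per line. It assumes that the value may be wrapped in matching single or double quotes, that spaces around '=' are allowed (as in the testparamfile example on PIG-754), and that only the first '=' separates the name from the value; it is not Pig's actual param file parser. Because everything inside the quotes is kept as-is, values containing +, =, -, ? or a period survive intact.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical param-file parser, NOT Pig's implementation: one
// name=value pair per line, value optionally wrapped in matching quotes.
public class ParamFileSketch {

    static Map<String, String> parse(String contents) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String raw : contents.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            int eq = line.indexOf('='); // split on the FIRST '=' only
            if (eq <= 0) {
                throw new IllegalArgumentException("Bad param line: " + raw);
            }
            String name = line.substring(0, eq).trim();
            String value = line.substring(eq + 1).trim();
            params.put(name, stripQuotes(value));
        }
        return params;
    }

    // Removes one pair of matching surrounding single or double quotes.
    static String stripQuotes(String v) {
        if (v.length() >= 2) {
            char first = v.charAt(0);
            char last = v.charAt(v.length() - 1);
            if ((first == '\'' || first == '"') && first == last) {
                return v.substring(1, v.length() - 1);
            }
        }
        return v;
    }

    public static void main(String[] args) {
        String file = "mydir='mydir-myextraqual'\n"
                + "infile = 'file.right'\n"
                + "expr='a=b+c?'\n";
        System.out.println(parse(file));
    }
}
```

Splitting on the first '=' only is what lets a quoted value such as 'a=b+c?' keep its own '=' characters.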




[jira] Commented: (PIG-754) Bugs with load and store and filenames passed with -param containing periods

2009-04-06 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12696273#action_12696273
 ] 

Viraj Bhat commented on PIG-754:


What I still do not understand is why the following works when you prefix the 
file name with a path, even though it contains a special character:

{code}
-bash-3.00$ pig -exectype local -param infile=./file.right infile.pig

(RIGHT)
(This is file.right..)

{code}

or, for that matter, when you supply the full path:

{code}
-bash-3.00$ pig -exectype local -param infile=/full/path/to/file.right infile.pig

(RIGHT)
(This is file.right..)

{code}






 Bugs with load and store and filenames passed with -param containing periods
 

 Key: PIG-754
 URL: https://issues.apache.org/jira/browse/PIG-754
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz




[jira] Created: (PIG-755) Difficult to debug parameter substitution problems based on the error messages when running in local mode

2009-04-06 Thread Viraj Bhat (JIRA)
Difficult to debug parameter substitution problems based on the error messages 
when running in local mode
-

 Key: PIG-755
 URL: https://issues.apache.org/jira/browse/PIG-755
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
 Fix For: 0.3.0


I have a script that uses parameter substitution for the input file name. In 
the use case below, the error messages make it difficult to debug the problem 
in local mode.

{code}
A = load '$infile' using PigStorage() as
 (
   date: chararray,
   count   : long,
   gmean   : double
);

dump A;
{code}

1) I run it in local mode, with the input file in the current working directory:
{code}
prompt  $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -exectype local -param infile='inputfile.txt' localparamsub.pig
{code}
2009-04-07 00:03:51,967 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
2009-04-07 00:03:51,970 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!!
2009-04-07 00:03:51,971 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed!
2009-04-07 00:03:51,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A

Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062631414.log

ERROR 1066: Unable to open iterator for alias A
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
at org.apache.pig.PigServer.openIterator(PigServer.java:439)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
at org.apache.pig.Main.main(Main.java:352)
Caused by: java.io.IOException: Job terminated with anomalous status FAILED
at org.apache.pig.PigServer.openIterator(PigServer.java:433)
... 5 more


2) I run it in map-reduce mode:
{code}
prompt  $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -param infile='inputfile.txt' localparamsub.pig
{code}

2009-04-07 00:07:31,660 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2009-04-07 00:07:32,074 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
2009-04-07 00:07:34,543 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-04-07 00:07:39,540 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-04-07 00:07:39,540 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
2009-04-07 00:07:39,563 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: inputfile does not exist.

Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062851400.log

ERROR 2100: inputfile does not exist.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
at org.apache.pig.PigServer.openIterator(PigServer.java:439)
at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
at org.apache.pig.Main.main(Main.java:352)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
at org.apache.pig.PigServer.store(PigServer.java:470)
at org.apache.pig.PigServer.openIterator(PigServer.java:427)
... 5 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias A
at 
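One way to make such failures easier to debug is to check the input path before the job starts and report the path that was actually resolved. The sketch below is a hypothetical pre-flight check, not part of Pig; `InputCheckSketch.checkInput` and its message format are assumptions. Naming the resolved path would make it visible when the value that reached the loader differs from the one passed on the command line; the map-reduce log above reports `inputfile`, not `inputfile.txt`.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Hypothetical pre-flight check (not part of Pig): resolve the substituted
// input path and, if it is missing, return a message that names both the
// raw parameter value and the absolute path that was checked.
public class InputCheckSketch {

    static Optional<String> checkInput(String substitutedPath) {
        Path resolved = Paths.get(substitutedPath).toAbsolutePath().normalize();
        if (Files.exists(resolved)) {
            return Optional.empty(); // input is present, nothing to report
        }
        return Optional.of("Input path does not exist: " + resolved
                + " (parameter value received: '" + substitutedPath + "')");
    }

    public static void main(String[] args) {
        String value = args.length > 0 ? args[0] : "inputfile.txt";
        System.out.println(checkInput(value).orElse("OK: input found"));
    }
}
```

A message of this shape would replace a generic "Unable to setup the load function" with the concrete path that could not be found.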

[jira] Updated: (PIG-755) Difficult to debug parameter substitution problems based on the error messages when running in local mode

2009-04-06 Thread Viraj Bhat (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Viraj Bhat updated PIG-755:
---

Attachment: localparamsub.pig
inputfile.txt

Script and test file

 Difficult to debug parameter substitution problems based on the error 
 messages when running in local mode
 -

 Key: PIG-755
 URL: https://issues.apache.org/jira/browse/PIG-755
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
 Fix For: 0.3.0

 Attachments: inputfile.txt, localparamsub.pig


 I have a script in which I do a parameter substitution for the input file. I 
 have a use case where I find it difficult to debug based on the error 
 messages in local mode.
 {code}
 A = load '$infile' using PigStorage() as
  (
date: chararray,
count   : long,
gmean   : double
 );
 dump A;
 {code}
 1) I run it in local mode with the input file in the current working directory
 {code}
 prompt  $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main 
 -exectype local -param infile='inputfile.txt' localparamsub.pig
 {code}
 2009-04-07 00:03:51,967 [main] ERROR org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore - Received error from storer function: org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to setup the load function.
 2009-04-07 00:03:51,970 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - Failed jobs!!
 2009-04-07 00:03:51,971 [main] INFO  org.apache.pig.backend.local.executionengine.LocalPigLauncher - 1 out of 1 failed!
 2009-04-07 00:03:51,974 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias A
 
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062631414.log
 
 ERROR 1066: Unable to open iterator for alias A
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
 at org.apache.pig.PigServer.openIterator(PigServer.java:439)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:352)
 Caused by: java.io.IOException: Job terminated with anomalous status FAILED
 at org.apache.pig.PigServer.openIterator(PigServer.java:433)
 ... 5 more
 
 2) I run it in map-reduce mode:
 {code}
 prompt $ java -cp pig.jar:/path/to/hadoop/conf/ org.apache.pig.Main -param infile='inputfile.txt' localparamsub.pig
 {code}
 2009-04-07 00:07:31,660 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
 2009-04-07 00:07:32,074 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
 2009-04-07 00:07:34,543 [Thread-7] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
 2009-04-07 00:07:39,540 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
 2009-04-07 00:07:39,540 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Map reduce job failed
 2009-04-07 00:07:39,563 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: inputfile does not exist.
 
 Details at logfile: /home/viraj/pig-svn/trunk/pig_1239062851400.log
 
 ERROR 2100: inputfile does not exist.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias A
 at org.apache.pig.PigServer.openIterator(PigServer.java:439)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:359)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:193)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
 at org.apache.pig.Main.main(Main.java:352)
 

[jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization

2009-04-06 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696350#action_12696350
 ] 

Pradeep Kamath commented on PIG-627:


+1, patch committed. Thanks for the contribution, Gunther!

 PERFORMANCE: multi-query optimization
 -

 Key: PIG-627
 URL: https://issues.apache.org/jira/browse/PIG-627
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
 Attachments: file_cmds-0305.patch, fix_store_prob.patch, 
 merge_741727_HEAD__0324.patch, merge_741727_HEAD__0324_2.patch, 
 multi-store-0303.patch, multi-store-0304.patch, multiquery-phase2_0313.patch, 
 multiquery-phase2_0323.patch, multiquery_0223.patch, multiquery_0224.patch, 
 multiquery_0306.patch, multiquery_explain_fix.patch, 
 non_reversible_store_load_dependencies.patch, 
 non_reversible_store_load_dependencies_2.patch, 
 noop_filter_absolute_path_flag.patch, 
 noop_filter_absolute_path_flag_0401.patch


 Currently, if your Pig script contains multiple stores and some shared 
 computation, Pig will execute several independent queries. For instance:
 A = load 'data' as (a, b, c);
 B = filter A by a  5;
 store B into 'output1';
 C = group B by b;
 store C into 'output2';
 This script results in a map-only job that generates output1, followed by a 
 map-reduce job that generates output2. As a result, the data is read, parsed 
 and filtered twice, which is unnecessary and costly.
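 Sharing the computation can be sketched in plain Java. This is not Pig's actual MapReduce plan. Each input line is read, parsed and filtered once. Every row that passes the filter is handed both to output1 and to the grouping that feeds output2. Three details are assumptions made for illustration: the comma-separated input, the Row class, and the predicate a > 5.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the multi-query idea; not Pig's execution engine.
final class MultiQuerySketch {
    // Stand-in for a tuple with schema (a, b, c).
    static final class Row {
        final int a;
        final String b;
        final String c;

        Row(int a, String b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }

        @Override
        public String toString() {
            return "(" + a + "," + b + "," + c + ")";
        }
    }

    // One pass over the input: "load" and "filter" run once per line, and the
    // filtered row feeds both stores instead of being recomputed for each.
    static void run(List<String> input, List<Row> output1, Map<String, List<Row>> output2) {
        for (String line : input) {
            String[] f = line.split(",");                                    // A = load ...
            Row row = new Row(Integer.parseInt(f[0].trim()), f[1].trim(), f[2].trim());
            if (row.a > 5) {                                                 // B = filter A by a > 5 (assumed)
                output1.add(row);                                            // store B into 'output1'
                output2.computeIfAbsent(row.b, k -> new ArrayList<>()).add(row); // C = group B by b
            }
        }
    }

    public static void main(String[] args) {
        List<String> data = List.of("7,x,p", "3,y,q", "9,x,r", "6,z,s");
        List<Row> out1 = new ArrayList<>();
        Map<String, List<Row>> out2 = new TreeMap<>();
        run(data, out1, out2);
        System.out.println("output1: " + out1);
        System.out.println("output2: " + out2);
    }
}
```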


[jira] Commented: (PIG-732) Utility UDFs

2009-04-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12696404#action_12696404
 ] 

Ankur commented on PIG-732:
---

Hi Olga, can you please take a look and suggest what's wrong?

 Utility UDFs 
 -

 Key: PIG-732
 URL: https://issues.apache.org/jira/browse/PIG-732
 Project: Pig
  Issue Type: New Feature
Reporter: Ankur
Priority: Minor
 Attachments: udf.v1.patch, udf.v2.patch, udf.v3.patch


 Two utility UDFs and their respective test cases.
 1. TopN - Accepts the number of tuples (N) to retain in the output, the field 
 number (type long) to use for comparison, and a sorted or unsorted bag of 
 tuples. It outputs a bag containing the top N tuples.
 2. SearchQuery - Accepts an encoded URL from any of four search engines 
 (Yahoo, Google, AOL, Live), then extracts and normalizes the search query it 
 contains.
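 The TopN behaviour described above can be sketched with a bounded min-heap. This works whether or not the input bag is sorted. It is a hypothetical plain-Java illustration, not the code in the attached patches. It makes three assumptions. Tuples are modelled as List<Object>, not Pig's Tuple/DataBag types. The comparison field holds a Long. "Top" means the largest values, listed first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of top-N selection; does not use Pig's EvalFunc/DataBag API.
final class TopNSketch {
    // Helper that builds a "tuple" as a fixed-size list of fields.
    static List<Object> tuple(Object... fields) {
        return Arrays.asList(fields);
    }

    // Returns the n tuples with the largest Long at position `field`, largest first.
    static List<List<Object>> topN(int n, int field, Iterable<List<Object>> bag) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        Comparator<List<Object>> byField = Comparator.comparingLong(t -> (Long) t.get(field));
        // Min-heap holding at most n tuples: the head is the smallest one kept so far.
        PriorityQueue<List<Object>> heap = new PriorityQueue<>(n, byField);
        for (List<Object> t : bag) {
            heap.offer(t);
            if (heap.size() > n) {
                heap.poll(); // evict the smallest, keeping the current top n
            }
        }
        List<List<Object>> result = new ArrayList<>(heap);
        result.sort(byField.reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Object>> bag = List.of(
                tuple("a", 5L), tuple("b", 12L), tuple("c", 7L), tuple("d", 1L));
        System.out.println(topN(2, 1, bag)); // prints [[b, 12], [c, 7]]
    }
}
```

 The heap never holds more than N tuples at once. As a result, a bag of m tuples is scanned once in O(m log N) time, with no need to sort the whole bag first.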
