[jira] Commented: (PIG-738) Regexp passed from pigscript fails in UDF

2009-09-22 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758272#action_12758272
 ] 

Olga Natkovich commented on PIG-738:


+1, please, commit

 Regexp passed from pigscript fails in UDF
 -----------------------------------------

              Key: PIG-738
              URL: https://issues.apache.org/jira/browse/PIG-738
          Project: Pig
       Issue Type: Bug
       Components: grunt
 Affects Versions: 0.3.0
         Reporter: Viraj Bhat
         Assignee: Pradeep Kamath
          Fix For: 0.6.0

      Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt


 Consider a Pig script that counts occurrences of a regular expression in a
 text file. The regular expression supplied in the Pig script needs to escape
 the . (dot) character.
 {code}
 register myregexp.jar;
 -- pattern not picked up
 define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
 A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
 B = foreach A generate minelogs(source) as sportslogs;
 dump B;
 {code}
 Snippet of UDF RegexGroupCount.java
 {code}
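 import java.io.IOException;
 import java.util.regex.Pattern;
 import org.apache.pig.EvalFunc;
 import org.apache.pig.data.Tuple;

 public class RegexGroupCount extends EvalFunc<Integer> {
     private final Pattern pattern_;

     public RegexGroupCount(String patternStr) {
         // Print the constructor argument exactly as Pig delivers it.
         System.out.println("My pattern supplied is " + patternStr);
         // The Java literal below holds a single backslash before each dot.
         System.out.println("Equality test "
                 + patternStr.equals("www\\.yahoo\\.com/sports"));
         pattern_ = Pattern.compile(patternStr,
                 Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
     }

     public Integer exec(Tuple input) throws IOException {
         // body omitted in the original snippet
     }
 }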
 {code}
 Running the above script on the following dataset:
 
 dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
 kas;dka;sd
 jsjsjwww.yahoo.com/sports
 jsdLSJDcom/sports
 wwwJyahooMcom/sports
 
 Results in the following:
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
 Userfunc fs: int
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 My pattern supplied is www\\.yahoo\\.com/sports
 Equality test false
 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
 2009-03-28 02:06:43,923 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
 (0)
 (0)
 (0)
 (0)
 (0)
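
 The output above is consistent with the UDF receiving both backslashes of each `\\` pair verbatim. The following is a minimal standalone sketch, not the attached RegexGroupCount.java: the class name `EscapeDemo` and the helpers `countMatches` and `countAll` are hypothetical, and it assumes the UDF counts non-overlapping matches per line. It compares the pattern as printed in the log with the pattern the script presumably intended, using the five input lines from the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {
    // The five input lines from the report.
    static final String[] DATA = {
        "dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl",
        "kas;dka;sd",
        "jsjsjwww.yahoo.com/sports",
        "jsdLSJDcom/sports",
        "wwwJyahooMcom/sports"
    };

    // Counts non-overlapping matches, using the same flags as the UDF snippet.
    static int countMatches(String regex, String line) {
        Pattern p = Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(line);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Applies countMatches to every line and collects the counts.
    static List<Integer> countAll(String regex, String[] lines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : lines) {
            counts.add(countMatches(regex, line));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Runtime value: www\.yahoo\.com/sports (one backslash before each dot).
        String intended = "www\\.yahoo\\.com/sports";
        // Runtime value: www\\.yahoo\\.com/sports (two backslashes, as in the log).
        String received = "www\\\\.yahoo\\\\.com/sports";

        System.out.println(received.equals(intended)); // false
        System.out.println(countAll(intended, DATA));  // [2, 0, 1, 0, 0]
        System.out.println(countAll(received, DATA));  // [0, 0, 0, 0, 0]
    }
}
```

 With two backslashes, the regex demands a literal backslash followed by any character, and no input line contains a backslash, which would explain the five (0) rows.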
 
 In essence there seems to be no way of passing this type of constructor 
 argument through the Pig script. The only workaround seems to be hard coding 
 the values in the UDF!!
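
 Short of hard-coding the pattern, one possible stopgap is to undo the doubling inside the UDF constructor before compiling. This is only a sketch under assumptions: the class `BackslashWorkaround` and the helper `collapseBackslashes` are hypothetical, it assumes every backslash arrives doubled, and it is not a description of what PIG-738.patch does.

```java
import java.util.regex.Pattern;

class BackslashWorkaround {
    // Replaces each pair of backslashes with a single backslash.
    // String.replace works on literal text, not regex, so "\\\\" here
    // is simply the two-character sequence \\ and "\\" is one backslash.
    static String collapseBackslashes(String s) {
        return s.replace("\\\\", "\\");
    }

    public static void main(String[] args) {
        // Runtime value as printed in the log: www\\.yahoo\\.com/sports
        String received = "www\\\\.yahoo\\\\.com/sports";
        String fixed = collapseBackslashes(received);

        Pattern p = Pattern.compile(fixed, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
        System.out.println(fixed);                                       // www\.yahoo\.com/sports
        System.out.println(p.matcher("jsjsjwww.yahoo.com/sports").find()); // true
        System.out.println(p.matcher("wwwJyahooMcom/sports").find());      // false
    }
}
```

 The trade-off is that a pattern meant to match a literal backslash could no longer be expressed, so this only papers over the problem in the UDF rather than fixing how the script's string literal is parsed.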

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-738) Regexp passed from pigscript fails in UDF

2009-03-30 Thread Viraj Bhat (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12693970#action_12693970
 ] 

Viraj Bhat commented on PIG-738:


This works, as the Pig parser ignores a single backslash (\) followed by u (\u).

 Regexp passed from pigscript fails in UDF
 -----------------------------------------

              Key: PIG-738
              URL: https://issues.apache.org/jira/browse/PIG-738
          Project: Pig
       Issue Type: Bug
       Components: grunt
 Affects Versions: 0.3.0
         Reporter: Viraj Bhat
          Fix For: 0.3.0

      Attachments: myregexp.jar, RegexGroupCount.java, regexp.pig, regexpinput.txt

