[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-05 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841789#action_12841789
 ] 

Dmitriy V. Ryaboy commented on PIG-1248:


Gerrit, as far as Pig the language is concerned, all bags are unordered.

Alan, re capitalization -- I see your point, and agree (even made LENGTH 
all-caps originally).
Can we put off renaming these until we copy them into builtins? Users will 
change the paths to take advantage of them anyway (from piggybank to builtin), 
so changing the capitalization shouldn't be a big deal.
I think it should be a copy, not a move, by the way, so as not to break 
existing scripts. Perhaps we can deprecate the piggybank classes when we copy 
them to builtin.
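
A minimal Java sketch of the copy-then-deprecate idea, assuming the old 
piggybank name is kept as a thin deprecated wrapper around the new builtin. 
The thread does not say how the copy would be implemented, so this is only one 
option. BuiltinUpper and PiggybankUpper are hypothetical stand-ins, not 
classes from the patch; a real UDF would extend org.apache.pig.EvalFunc.

```java
// Hypothetical stand-ins; a real Pig UDF would extend org.apache.pig.EvalFunc.
class BuiltinUpper {
    /** New builtin: upper-cases its input, passing null through unchanged. */
    String exec(String input) {
        return input == null ? null : input.toUpperCase();
    }
}

/**
 * Old piggybank name, kept so that existing scripts keep working.
 *
 * @deprecated use {@link BuiltinUpper} instead.
 */
@Deprecated
class PiggybankUpper extends BuiltinUpper {
    // No body: behavior is inherited, so the two names cannot drift apart.
}
```

Scripts that still reference the old class keep running, and the compiler 
flags Java code that uses the deprecated name.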

 [piggybank] useful String functions
 ---

 Key: PIG-1248
 URL: https://issues.apache.org/jira/browse/PIG-1248
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG_1248.diff, PIG_1248.diff, PIG_1248.diff


 Pig ships with very few evalFuncs for working with strings. This jira is for 
 adding a few more.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-05 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841930#action_12841930
 ] 

Alan Gates commented on PIG-1248:
-

bq. Can we put off renaming these until we copy them into builtins? Users will 
change the paths to take advantage of them anyway (from piggybank to builtin), 
so changing the capitalization shouldn't be a big deal.
I think that's fine.  It may even help reduce confusion for users, given the 
next point.

bq. I think it should be a copy, not a move, by the way, so as not to break 
existing scripts. Perhaps we can deprecate the piggybank classes when we copy 
them to builtin.
+1, we want to make life as easy as possible for users.

So at this point, I'm +1 on committing this patch.




[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-02 Thread Bill Graham (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840279#action_12840279
 ] 

Bill Graham commented on PIG-1248:
--

How exactly would split differ from the TOKENIZE function if split returned a 
bag? TOKENIZE returns an unordered bag of words. Having a function that returns 
an ordered tuple of words is very useful IMO. I had to write my own version of 
a tokenize UDF to do this. 
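
To make the ordered/unordered distinction concrete, here is a small Java 
sketch. It is not Pig code: a List stands in for a tuple, a word-count map 
stands in for a bag, and splitting on whitespace is an assumption made for 
the example. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SplitVersusTokenize {
    /** Tuple-like result: fields keep their position in the input. */
    static List<String> splitToTuple(String line) {
        return Arrays.asList(line.split("\\s+"));
    }

    /** Bag-like result: only which words occur, and how often, is defined. */
    static Map<String, Integer> tokenizeToBag(String line) {
        Map<String, Integer> bag = new HashMap<>();
        for (String word : line.split("\\s+")) {
            bag.merge(word, 1, Integer::sum);
        }
        return bag;
    }

    public static void main(String[] args) {
        String line = "Mary had a little lamb";
        // Positional access is meaningful on the tuple-like result.
        System.out.println(splitToTuple(line).get(3));
        // The bag-like result answers membership questions, not "which word is fourth".
        System.out.println(tokenizeToBag(line).containsKey("little"));
    }
}
```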




[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-03-01 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12839891#action_12839891
 ] 

Alan Gates commented on PIG-1248:
-

I agree camel case is easier on the eyes (and on the fingers).  But we seem to 
have chosen all caps for built-in functions.  We'll eventually want to move 
some of these into Pig's builtins, and I really don't want to bring in 
builtins that aren't all caps.  For functions we think we might want to bring 
into builtin someday, it seems better to start them out in all caps now rather 
than rename them later.

One other thought on tuples vs. bags.  If you have a script like:

{code}
A = load 'bla' using TextLoader();
B = foreach A generate flatten(Split($0));
{code}

Assume a file that contains: Mary had a little lamb.  If Split returns a 
tuple, then B will contain 1 record with five fields (Mary had a little lamb).  
If Split returns a bag, then B will contain 5 records (Mary, had, a, little, 
lamb).  I don't have any guess as to which of those users will want more often.
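
The difference can be modeled in a few lines of plain Java. This is a sketch 
of the flatten semantics described above, not Pig's implementation: records 
and tuples are modeled as lists, and the class and method names are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class FlattenSketch {
    /** Flattening a tuple promotes its fields into the enclosing record: one record out. */
    static List<List<String>> flattenTuple(List<String> tuple) {
        return Collections.singletonList(tuple);
    }

    /** Flattening a bag emits one output record per tuple in the bag. */
    static List<List<String>> flattenBag(List<List<String>> bag) {
        return new ArrayList<>(bag);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Mary had a little lamb".split(" "));

        // Split returns a tuple: a single record carrying all the words as fields.
        System.out.println(flattenTuple(words).size());

        // Split returns a bag of one-field tuples: one record per word.
        List<List<String>> bag = new ArrayList<>();
        for (String w : words) {
            bag.add(Collections.singletonList(w));
        }
        System.out.println(flattenBag(bag).size());
    }
}
```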





[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-02-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836584#action_12836584
 ] 

Hadoop QA commented on PIG-1248:


-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12436549/PIG_1248.diff
  against trunk revision 912064.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 15 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated 1 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 526 release audit warnings 
(more than the trunk's current 517 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/210/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/210/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/210/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/210/console

This message is automatically generated.

 [piggybank] useful String functions
 ---

 Key: PIG-1248
 URL: https://issues.apache.org/jira/browse/PIG-1248
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG_1248.diff


 Pig ships with very few evalFuncs for working with strings. This jira is for 
 adding a few more.




[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-02-22 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12836838#action_12836838
 ] 

Alan Gates commented on PIG-1248:
-

Some of these methods use all caps, some camel case.  Is there some reasoning 
here?  We should pick one convention.  The convention in Pig seems to be all 
caps (COUNT, etc.), so we should use that.

I would like to see SUBSTRING match SQL semantics, where the 3rd argument is 
the length of the substring instead of the terminal index.  Since this is a 
change in interface I'm open to delaying that until we pull SUBSTRING in as a 
Pig builtin.
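
The two contracts are easy to confuse because they take the same argument 
types. A small Java sketch, with hypothetical method names, assuming the 
terminal index is exclusive as in Java's String.substring and keeping the 
start index 0-based in both versions:

```java
class SubstringContracts {
    /** Third argument is the (exclusive) terminal index. */
    static String byEndIndex(String s, int start, int end) {
        return s.substring(start, end);
    }

    /** Third argument is the length of the result, as in SQL. Start stays 0-based here. */
    static String byLength(String s, int start, int length) {
        return s.substring(start, start + length);
    }

    public static void main(String[] args) {
        // Same arguments, different answers: (3, 6) means "up to index 6" or "6 characters".
        System.out.println(byEndIndex("piggybank", 3, 6));
        System.out.println(byLength("piggybank", 3, 6));
    }
}
```

A script written against one contract would silently return different 
strings under the other, which is why the change counts as an interface 
change.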

The javadocs for the Split class appear to have been copied from SUBSTRING.

Should Split return a tuple of chararrays or a bag of tuples, each containing 
one chararray?  The advantage I can see for the latter is that it's possible 
to determine the full schema of Split's output a priori, while in the former 
you cannot.  (You can say that this returns a tuple, but you don't know what's 
in the tuple.  So when that tuple is flattened, the schema will become 
unknown.)  Since lack of schema is contagious, this could cause problems for 
Pig features (such as optimization, metadata, certain storage mechanisms like 
Zebra) that do much better with schemas.  The same question applies to 
RegexExtractAll.
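
A toy Java model of why a missing schema spreads. This is not how Pig 
represents schemas: an Optional list of column types stands in for a schema, 
with empty meaning "unknown", and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

class SchemaSketch {
    /** flatten(bag of one-chararray tuples): always exactly one chararray column. */
    static Optional<List<String>> flattenedBagSchema() {
        return Optional.of(Collections.singletonList("chararray"));
    }

    /** flatten(tuple of N chararrays): N depends on the data, so the schema is unknown. */
    static Optional<List<String>> flattenedTupleSchema() {
        return Optional.empty();
    }

    /** Combining columns: if either side is unknown, the result is unknown too. */
    static Optional<List<String>> combine(Optional<List<String>> left,
                                          Optional<List<String>> right) {
        if (!left.isPresent() || !right.isPresent()) {
            return Optional.empty();
        }
        List<String> columns = new ArrayList<>(left.get());
        columns.addAll(right.get());
        return Optional.of(columns);
    }
}
```

Combining a known column with the flattened bag yields a known two-column 
schema; combining it with the flattened tuple yields no schema at all, and 
every later step inherits that.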








 [piggybank] useful String functions
 ---

 Key: PIG-1248
 URL: https://issues.apache.org/jira/browse/PIG-1248
 Project: Pig
  Issue Type: New Feature
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
 Fix For: 0.7.0

 Attachments: PIG_1248.diff, PIG_1248.diff


 Pig ships with very few evalFuncs for working with strings. This jira is for 
 adding a few more.




[jira] Commented: (PIG-1248) [piggybank] useful String functions

2010-02-22 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837064#action_12837064
 ] 

Dmitriy V. Ryaboy commented on PIG-1248:


I'll fix up the javadocs.
I didn't modify SUBSTRING's behavior except to make it be nice to short 
strings, which I think is what people tend to expect.
The contract should probably stay the same for backwards compatibility. We can 
change it on migration, but that'd be confusing. 
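
The comment does not spell out what being nice to short strings means. One 
possible reading, sketched in Java under that assumption, is to clamp the 
indexes to the string instead of throwing. LenientSubstring is a hypothetical 
name, not the class in the patch.

```java
class LenientSubstring {
    /**
     * Returns s[start, end) with both bounds clamped to the string,
     * instead of throwing StringIndexOutOfBoundsException.
     */
    static String substring(String s, int start, int end) {
        if (s == null) {
            return null;
        }
        int from = Math.max(0, Math.min(start, s.length()));
        int to = Math.max(from, Math.min(end, s.length()));
        return s.substring(from, to);
    }
}
```

With this reading, a range that runs past the end of the input returns 
whatever characters exist, and a range that starts past the end returns an 
empty string.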

I am returning tuples instead of bags because they are ordered, and bags are 
not. You are right about the flattening issue. I guess a user could call the 
hypothetical tupleToBag udf if that was the use case, though.  

-D
