Semantics of TOKENIZE are not clear
-----------------------------------

                 Key: PIG-683
                 URL: https://issues.apache.org/jira/browse/PIG-683
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
             Fix For: types_branch


The semantics of TOKENIZE are unclear. In its current form, TOKENIZE takes a
string as input and returns a bag. The bag contains one tuple per token, and
each tuple contains a single token. A better approach would be to return a
single tuple (instead of a bag) with as many fields as there are tokens.
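
To make the two output shapes concrete, here is a minimal sketch in Python. It
is not Pig's implementation: a bag is modelled as a list, a tuple as a Python
tuple, and tokens are assumed to be split on whitespace only. The function
names tokenize_as_bag and tokenize_as_tuple are hypothetical and exist only for
this illustration.

```python
def tokenize_as_bag(text):
    """Shape described as current: a bag (list) with one 1-field tuple per token."""
    # Assumption: whitespace is the only delimiter.
    return [(token,) for token in text.split()]


def tokenize_as_tuple(text):
    """Shape proposed in this issue: one tuple with one field per token."""
    return tuple(text.split())


line = "hello pig world"
print(tokenize_as_bag(line))    # [('hello',), ('pig',), ('world',)]
print(tokenize_as_tuple(line))  # ('hello', 'pig', 'world')
```

In the first shape every row has the same single-field layout regardless of the
input; in the second, the number of fields varies with the number of tokens in
each input string.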

On a secondary note, the outputSchema method in TOKENIZE is broken. It should
return the schema of a bag whose tuple contains a string, not the schema of a
bare string.
