Semantics of TOKENIZE are not clear

                 Key: PIG-683
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
             Fix For: types_branch

The semantics of TOKENIZE are not clear. In its current form, TOKENIZE takes a
string as input and returns a bag. The bag contains one tuple per token, and
each tuple in turn contains a single token. A better approach would be to return
a single tuple (instead of a bag) that contains as many fields as there are
tokens.
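
To make the two result shapes concrete, here is a minimal sketch in plain Java.
It does not use Pig's own classes: a java.util.List stands in for both the bag
and the tuple, the class and method names (TokenizeShapes, asBag, asTuple) are
hypothetical, and splitting on whitespace via StringTokenizer is an assumption
for illustration only, not a statement about the delimiters TOKENIZE uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeShapes {

    // Shape described as the current behaviour: a "bag" (outer list)
    // holding one single-field "tuple" (inner list) per token.
    public static List<List<String>> asBag(String input) {
        List<List<String>> bag = new ArrayList<>();
        // Assumption: default whitespace delimiters, for illustration only.
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            List<String> tuple = new ArrayList<>();
            tuple.add(st.nextToken());
            bag.add(tuple);
        }
        return bag;
    }

    // Shape proposed in this issue: a single "tuple" with one field per token.
    public static List<String> asTuple(String input) {
        List<String> tuple = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            tuple.add(st.nextToken());
        }
        return tuple;
    }

    public static void main(String[] args) {
        System.out.println(asBag("hello pig world"));   // [[hello], [pig], [world]]
        System.out.println(asTuple("hello pig world")); // [hello, pig, world]
    }
}
```

With the bag shape, the number of tuples varies with the input while every
tuple has exactly one field. With the tuple shape, the number of fields varies
with the input, so a fixed schema cannot list them in advance.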

On a secondary note, the outputSchema method in TOKENIZE is broken. It should
return the schema of a bag whose tuple contains a string, and not just the
schema of a string.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
