Semantics of TOKENIZE are not clear
-----------------------------------

                 Key: PIG-683
                 URL: https://issues.apache.org/jira/browse/PIG-683
             Project: Pig
          Issue Type: Bug
          Components: impl
    Affects Versions: types_branch
            Reporter: Santhosh Srinivasan
             Fix For: types_branch

The semantics of TOKENIZE are not clear. In its current form, TOKENIZE takes a string as input and returns a bag. The bag contains one tuple per token, and each tuple in turn contains a single token. A better approach would be to return a tuple (instead of a bag) that contains as many elements as there are tokens.

On a secondary note, the outputSchema method in TOKENIZE is broken. It should return a bag with a tuple that contains a string, and not just a string.
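
To make the two result shapes concrete, here is a minimal sketch in Python rather than Pig's own Java. It is an illustration only: the function names `tokenize_as_bag` and `tokenize_as_tuple` are hypothetical, a bag is modelled as a list and a tuple as a Python tuple, and splitting on whitespace is an assumption that may not match the delimiters TOKENIZE actually uses.

```python
# Illustrative model only: a Pig bag is represented as a Python list,
# a Pig tuple as a Python tuple. Whitespace splitting is an assumption.

def tokenize_as_bag(text):
    """Shape described as the current behaviour:
    a bag holding one single-field tuple per token."""
    return [(token,) for token in text.split()]


def tokenize_as_tuple(text):
    """Shape proposed in this issue:
    a single tuple with one field per token."""
    return tuple(text.split())


if __name__ == "__main__":
    line = "hello pig world"
    print(tokenize_as_bag(line))    # [('hello',), ('pig',), ('world',)]
    print(tokenize_as_tuple(line))  # ('hello', 'pig', 'world')
```

With the bag shape, the declared output schema would have to describe a bag of single-field tuples holding a string, which is the correction to outputSchema requested above. With the tuple shape, the number of fields varies with the input, so a fixed schema could not list them individually.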