[
https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597138#action_12597138
]
Pi Song commented on PIG-85:
----------------------------
In Load we use regex to split:-
{noformat}
String[] splitString = textLine.split(delimiter, -1);
JavaDoc: String[] split(String regex, int limit) Splits this string around
matches of the given regular expression
{noformat}
But in Store we use normal string concatenation:-
{noformat}
if (it.hasNext()) buf.append(delim);
{noformat}
This can be very confusing for users.
For example, if I do
{noformat}
a = LOAD '/tmp/datatest1.txt' USING PigStorage('\\d') ;
{noformat}
"\\d" will get unescaped (as in Ben's solution) to '\d', which as a regex
separates on digits.
Then I do
{noformat}
STORE a INTO '/tmp/dataout' USING PigStorage('\\d')
{noformat}
and the output will have the literal string "\d" as the separator.
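To make the mismatch concrete, here is a minimal, self-contained sketch. The class name DelimiterAsymmetry and the load/store helpers are hypothetical stand-ins for the two code paths quoted above, not the actual PigStorage code, and the delimiter is assumed to have already been unescaped to the two characters \ and d.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class DelimiterAsymmetry {
    // Load side (hypothetical helper): the delimiter is interpreted as a regex.
    static String[] load(String textLine, String delimiter) {
        return textLine.split(delimiter, -1);
    }

    // Store side (hypothetical helper): the delimiter is appended verbatim.
    static String store(List<String> fields, String delim) {
        StringBuilder buf = new StringBuilder();
        Iterator<String> it = fields.iterator();
        while (it.hasNext()) {
            buf.append(it.next());
            if (it.hasNext()) buf.append(delim);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String delimiter = "\\d"; // the two characters \ and d
        String[] fields = load("a1b2c", delimiter);
        // Split happens on the digits, because \d is read as a regex.
        System.out.println(Arrays.toString(fields));
        // The same delimiter is written out literally between the fields.
        System.out.println(store(Arrays.asList(fields), delimiter));
    }
}
```

With the input line a1b2c, the load side yields the fields a, b and c, while storing them back with the same delimiter produces a\db\dc, so the data does not round-trip.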
Are there many users relying on this regex behavior? If not, +1 for removing
it. We can introduce a new built-in storage function that supports regex.
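One possible way to make Load treat the delimiter literally, so that it round-trips with Store, is java.util.regex.Pattern.quote. This is only a sketch under that assumption: the class LiteralDelimiter and its loadLiteral method are hypothetical and not part of Pig.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class LiteralDelimiter {
    // Hypothetical load path: quote the delimiter so that regex
    // metacharacters in it lose their special meaning.
    static String[] loadLiteral(String textLine, String delimiter) {
        return textLine.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        String delimiter = "\\d"; // the two characters \ and d
        // A line written by the Store side now splits back into its fields.
        System.out.println(Arrays.toString(loadLiteral("a\\db\\dc", delimiter)));
        // Digits are no longer treated as separators.
        System.out.println(Arrays.toString(loadLiteral("a1b2c", delimiter)));
    }
}
```

Quoting keeps the existing split call in place and only changes how the delimiter is interpreted; users who really want regex splitting would then need the separate storage function suggested above.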
> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
> Key: PIG-85
> URL: https://issues.apache.org/jira/browse/PIG-85
> Project: Pig
> Issue Type: Bug
> Reporter: Anand Murugappan
>
> A PIG command like -
> store abc into 'abc' using PigStorage('\x01');
> does not recognize that the user is requesting the data to be ^A separated.
> Instead the data that is stored is literally separated by the string '\x01'.
> Neither does punching in ^A directly through the editor, nor do any other
> strings like \u0001 help.
> Using a ^A directly through the editor complains about it being an invalid
> XML character and bails out.