[ 
https://issues.apache.org/jira/browse/PIG-85?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597138#action_12597138
 ] 

Pi Song commented on PIG-85:
----------------------------

In Load we use regex to split:-
{noformat}
String[] splitString = textLine.split(delimiter, -1);

JavaDoc: String[] split(String regex, int limit)
    Splits this string around matches of the given regular expression
{noformat}
But in Store we use normal string concatenation:-
{noformat}
if (it.hasNext()) buf.append(delim);
{noformat} 
  
This can be very confusing for users.
For example, if I do
{noformat}
a = LOAD '/tmp/datatest1.txt' USING PigStorage('\\d') ;
{noformat}
"\\d" will get unescaped (as in Ben's solution) to '\d', which as a regex means 
separating by digits.

Then I do
{noformat}
STORE a INTO '/tmp/dataout' USING PigStorage('\\d')
{noformat}
the output will have the literal string "\d" as the separator.
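
The mismatch can be reproduced outside Pig with a small standalone sketch. The class and method names here are hypothetical and this is not the PigStorage source; only the `split(delimiter, -1)` call and the `buf.append(delim)` pattern are taken from the snippets above.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical standalone demo, not Pig code.
public class DelimiterRoundTrip {

    // Mirrors the Load side: the delimiter is interpreted as a regex.
    static String[] load(String textLine, String delimiter) {
        return textLine.split(delimiter, -1);
    }

    // Mirrors the Store side: the delimiter is appended as a plain string.
    static String store(List<String> fields, String delim) {
        StringBuilder buf = new StringBuilder();
        Iterator<String> it = fields.iterator();
        while (it.hasNext()) {
            buf.append(it.next());
            if (it.hasNext()) buf.append(delim);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Two characters, backslash and d: what is left after unescaping.
        String delim = "\\d";

        String[] fields = load("a1b2c", delim);
        System.out.println(Arrays.toString(fields));             // [a, b, c]
        System.out.println(store(Arrays.asList(fields), delim)); // a\db\dc
    }
}
```

The stored line no longer contains any digits, so loading it again with the same delimiter yields a single field instead of three.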

Are there many users relying on this regex behaviour? If not, +1 for removing 
it. We can introduce a new built-in storage function that supports regex.
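
If the regex behaviour were removed, one possible way to make the Load side treat the delimiter literally is `java.util.regex.Pattern.quote`. This is only a sketch under that assumption, with a hypothetical class and method name; it is not a patch attached to this issue.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not Pig code.
public class LiteralSplit {

    // Quote the delimiter so that split() matches it character for character.
    static String[] splitLiteral(String textLine, String delimiter) {
        return textLine.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Backslash followed by d is now an ordinary two-character separator.
        String[] fields = splitLiteral("a\\db\\dc", "\\d");
        System.out.println(fields.length); // 3

        // A control character such as CTRL-A (code point 1) works the same way.
        String ctrlA = String.valueOf((char) 1);
        String[] parts = splitLiteral("x" + ctrlA + "y", ctrlA);
        System.out.println(parts.length);  // 2
    }
}
```

With a quoted delimiter, Load and Store would agree on what the separator is, so a file written by Store could be read back with the same argument.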


> Unable to specify CTRL-A as a delimiter for the PigStorage function
> -------------------------------------------------------------------
>
>                 Key: PIG-85
>                 URL: https://issues.apache.org/jira/browse/PIG-85
>             Project: Pig
>          Issue Type: Bug
>            Reporter: Anand Murugappan
>
> A PIG command like - 
> store abc into 'abc' using PigStorage('\x01');
>  does not recognize that the user is requesting the data to be ^A separated. 
> Instead the data that is stored is literally separated by the string '\x01'. 
> Neither does punching in ^A directly through the editor, nor do any other 
> strings like \u0001 help. 
> Using a ^A directly through the editor complains about it being an invalid 
> XML character and bails out. 
