I am using Pig for an ETL job and ran into a strange bug in the
built-in REPLACE function.

When I replace '[' with '' in '[02/Aug/2012:05:01:17', the whole
string comes back blank.

Here is some information that may help with debugging.

My Pig version is: Apache Pig version 0.11.0-SNAPSHOT (r1364475)
compiled Jul 23 2012, 10:30:53

The original text file:
ip.ip.ip.ip - - [02/Aug/2012:05:01:17 -0600] "GET /player.php/sid/XNDM0Njk3MjEy/v.swf HTTP/1.1" 302 26

The whole Pig script is:
read = load '/home/test/apacheLog'
using PigStorage(' ')
as (
          ip:chararray
        , indentity:chararray
        , name:chararray
        , date:chararray
        , timezone:chararray
        , method:chararray
        , path:chararray
        , protocol:chararray
        , status:chararray
        , size:chararray
);
dump read;
--(ip.ip.ip.ip,-,-,[02/Aug/2012:05:01:17,-0600],"GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1",302,26)
data = foreach read generate
          ip
        , REPLACE(date,'[','')
        , REPLACE(timezone,']','')
        , REPLACE(method,'"','')
        , path
        , REPLACE(protocol,'"','')
        , status
        , size;
describe data;
--data: {ip: chararray,date: chararray,timezone: chararray,method: chararray,path: chararray,protocol: chararray,status: chararray,size: chararray}
dump data;
--(ip.ip.ip.ip,,-0600,GET,/player.php/sid/XNDM0Njk3MjEy/v.swf,HTTP/1.1,302,26)
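
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: the Pig documentation describes the second argument of REPLACE as
a regular expression, and a lone '[' is not a valid Java regular expression
because it opens a character class that is never closed. A lone ']' or '"' is
a valid pattern, which would fit the other fields being replaced as expected.
The sketch below repeats the script above with the brackets escaped by a
double backslash. It has not been run against 0.11.0-SNAPSHOT.

```pig
-- Same load as in the script above.
read = load '/home/test/apacheLog'
using PigStorage(' ')
as (
          ip:chararray
        , indentity:chararray
        , name:chararray
        , date:chararray
        , timezone:chararray
        , method:chararray
        , path:chararray
        , protocol:chararray
        , status:chararray
        , size:chararray
);

-- Assumption: REPLACE treats its second argument as a Java regex,
-- so '[' is escaped as '\\[' to match a literal bracket.
-- Escaping ']' is not strictly needed but keeps the two calls symmetric.
data = foreach read generate
          ip
        , REPLACE(date,'\\[','')
        , REPLACE(timezone,'\\]','')
        , REPLACE(method,'"','')
        , path
        , REPLACE(protocol,'"','')
        , status
        , size;
dump data;
```

If the date field is still blank with the escaped pattern, the cause is
something other than the regular expression syntax.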
