I tried to subscribe but a mail client box came up, not what I wanted, so we'll see if this works.
I wrote this script:

    register s3n://uw-cse344-code/myudfs.jar

    -- load the test file into Pig
    --raw = LOAD 's3n://uw-cse344-test/cse344-test-file' USING TextLoader as (line:chararray);
    -- later you will load to other files, example:
    raw = LOAD 's3n://uw-cse344/btc-2010-chunk-000' USING TextLoader as (line:chararray);

    -- parse each line into ntriples
    ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);

    --filter 1
    subjects1 = filter ntriples by subject matches '.*rdfabout\\.com.*' PARALLEL 50;

    --filter 2
    subjects2 = subjects1;

But I got the error:

    2012-03-10 01:19:18,039 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: <line 4, column 21> mismatched input ';' expecting LEFT_PAREN
    Details at logfile: /home/hadoop/pig_1331342327467.log

How do I simply set one variable equal to another? I also tried

    subjects2 = dump subjects1;

Thanks!

--
~Colleen Ross
