fixed = FOREACH my_relation GENERATE (my_field IS NOT NULL ? my_field : 'other') as my_field, *;
Russell Jurney twitter.com/rjurney [email protected] datasyndrome.com On Jun 8, 2012, at 3:01 PM, Dragan Nedeljkovic <[email protected]> wrote: > You can use an UDF like the one bellow to deal with the NULLs. > > register 'mypiggybank.jar'; > define Nvl piggybank.Nvl(); > > input_lines = LOAD 'test_Nvl.in' AS (line:chararray); > describe input_lines; > dump input_lines; > > new_list = FOREACH input_lines GENERATE Nvl(line, 'n/a'); > describe new_list; > dump new_list; > > -- eof > > package piggybank; > > import java.io.IOException; > > import org.apache.pig.EvalFunc; > import org.apache.pig.data.Tuple; > > public class Nvl > extends EvalFunc<String> > { > public String exec(Tuple inputA) > throws IOException > { > try > { > String inputVal = (String)inputA.get(0); > String defaultVal = (String)inputA.get(1); > > if(inputVal == null || inputVal.length() == 0) > { > return defaultVal; > } > > return inputVal; > } > catch(Exception e) > { > // Throwing an exception will cause the task to fail. > throw new IOException("Something bad happened!", e); > } > } > } > // eof > > > > >> ________________________________ >> From: Mario Lassnig <[email protected]> >> To: [email protected] >> Sent: Thursday, June 7, 2012 7:37:36 AM >> Subject: Replace null with string >> >> Hello, >> >> I'm having a lot of null entries in my data. Due to later processing it >> would be very helpful if I could set a default value for null to be the >> string "other". I couldn't find a way to do this (version 0.8.1-cdh3u4) >> >> Also, I have some variables in my GENERATE statements that can potentially >> return null, and I would need something similar to the SQL DECODE function >> to get the "other" string instead of null. >> >> Example: >> >> tmp = FOREACH dump GENERATE site, REGEX_EXTRACT(name, '^(?:([^.]+)\\.?){1}', >> 1) AS project, ((ami MATCHES '.*datatype.*') ? REGEX_EXTRACT(name, >> '^(?:([^.]+)\\.?){5}', 1) : 'other') AS datatype, ami, duid, nbfiles, >> length, rnbfiles, rlength, name; >> >> Here: 'site' and 'datatype' could return an empty string (which is valid) >> and is interpreted as null, but should be "other" instead. >> >> Thanks a lot, >> Mario >> >>
