Cheers Pablo. I was wondering if there was something like this that already existed in the built-ins, but apparently not.
Mozilla's Akela project seems to have a bunch of useful UDFs, including one like this, so I might have a look to see if that suits our purpose.

https://github.com/mozilla-metrics/akela
https://github.com/mozilla-metrics/akela/blob/master/src/main/java/com/mozilla/pig/filter/map/IsMap.java

Thanks again,

Lexual.

On Fri, Nov 23, 2012 at 4:48 AM, pablomar <[email protected]> wrote:

> Did you try with a filter function?
> Something like:
>
> import java.io.IOException;
> import org.apache.pig.FilterFunc;
> import org.apache.pig.data.Tuple;
> import org.apache.pig.impl.util.WrappedIOException;
>
> public class IsMap extends FilterFunc
> {
>     public Boolean exec(Tuple input) throws IOException
>     {
>         // Nothing to test on an empty tuple.
>         if (input == null || input.size() == 0)
>             return null;
>
>         try
>         {
>             // Pig hands map fields to UDFs as java.util.Map instances.
>             return input.get(0) instanceof java.util.Map;
>         }
>         catch (Exception e)
>         {
>             throw WrappedIOException.wrap("ouch!", e);
>         }
>     }
> }
>
> and then:
>
> filtered = FILTER some_data BY IsMap(some_variable);
>
> PS: I didn't try it with your data
>
> On Wed, Nov 21, 2012 at 8:54 PM, Lex H <[email protected]> wrote:
>
> > Attached is a tiny testcase illustrating my problem.
> >
> > What I would like to know is how to filter by Pig datatype,
> > e.g. something like:
> > filtered = FILTER some_data BY some_variable IS_MAP_TYPE;
> >
> > Can anyone advise if this can be accomplished with Pig?
> >
> > We have a field that is sometimes a 'map', sometimes a chararray.
> >
> > Doing something like the following statement fails, presumably because
> > it's trying to do a key-value lookup on something that's not a 'map'.
> >
> > -- json#'data' is sometimes a map, sometimes not.
> > trivias = FOREACH data GENERATE json#'data'#'trivia' AS trivia:chararray;
> >
> > This has come about from us working with JSON data with Pig via Elephant
> > Bird's JsonLoader.
> >
> > Thanks,
> >
> > Lex.
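
For anyone following the thread: the quoted IsMap filter boils down to a single instanceof test, since Pig passes map fields to UDFs as java.util.Map and chararrays as String. Below is a minimal sketch of that check in plain Java, with no Pig dependency, so it can be tried on its own. The class MapTypeCheck and its methods are hypothetical names made up for illustration; they are not part of Pig or Akela.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Pig or Akela.
class MapTypeCheck {

    // instanceof is false for null, so a missing field counts as "not a map".
    static boolean isMap(Object field) {
        return field instanceof Map;
    }

    // Keep only the map-valued fields, roughly what
    // FILTER some_data BY IsMap(some_variable) does per row.
    static List<Object> keepMaps(List<Object> fields) {
        List<Object> kept = new ArrayList<Object>();
        for (Object field : fields) {
            if (isMap(field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<String, Object>();
        data.put("trivia", "some trivia");

        // A map, a plain string (chararray), and a missing value.
        List<Object> fields = Arrays.<Object>asList(data, "just a chararray", null);

        // Only the map survives the filter.
        System.out.println(keepMaps(fields).size());
    }
}
```

Treating null as "not a map" differs slightly from the quoted UDF, which returns null for an empty tuple; that is a simplification made for this sketch.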
