Jeff,

1) It should not. If Pig does push the filter down, then it is a bug in Pig.
2) I think it should be fine.
3) Look at PColFilterExtractor and PartitionFilterOptimizer.

Regards,
Rohini

On Thu, Mar 14, 2013 at 1:31 PM, Jeff Yuan <[email protected]> wrote:

> I am writing a loader for a storage format, which partitions by a
> particular field in the record. So I would like to implement something
> which can push down filters on the partitioned field so that the
> record reader does not need to read files that are outside the
> filtered range. In the interface "LoadMetadata", the
> "getPartitionKeys" and "setPartitionFilter" functions seem to support
> what I need (where Pig should pass the filtering expression on the
> declared partition keys to "setPartitionFilter"), but I have a couple
> of questions. I'm going to reference the following example, where
> timestamp is the partition key.
>
> a = load 'stored_data' using CustomLoader();
> b = filter a by timestamp == CUSTOM_UDF(date, month);
>
> 1. Would partitioning work in this case where the partition key filter
> includes a UDF?
>
> 2. Does the partition statement need to be directly after the load
> statement? What I mean is, if I declare a variable c between a and b
> which does some other operation on a, will Pig pass the filter
> expression of b when loading a?
>
> 3. Can you point out roughly where this "setPartitionFilter" function
> is called in Pig code during the load process? I couldn't seem to find
> it through a search of the Pig source.
>
> Thanks a lot!
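
For illustration only, here is a minimal sketch of the pruning step a loader might perform once it has been handed a filter on the partition key. It does not use Pig's classes: `PartitionPruningSketch`, `RangeFilter`, `prune`, and the `timestamp=<value>` directory layout are all hypothetical names and assumptions, not part of Pig or of the loader discussed in this thread. A `null` filter stands for the case where nothing was pushed down (for example, a filter containing a UDF), in which case the sketch keeps every partition and leaves the filtering to the rest of the pipeline.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, Pig-independent sketch of partition pruning.
public class PartitionPruningSketch {

    /** Hypothetical stand-in for a pushed-down predicate: lo <= timestamp <= hi. */
    public static final class RangeFilter {
        final long lo;
        final long hi;

        public RangeFilter(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        boolean accepts(long timestamp) {
            return timestamp >= lo && timestamp <= hi;
        }
    }

    // Assumed layout: each partition directory ends in "timestamp=<long value>".
    static final String KEY_PREFIX = "timestamp=";

    /** Returns the directories that still need to be read under the given filter. */
    public static List<String> prune(List<String> partitionDirs, RangeFilter filter) {
        List<String> kept = new ArrayList<>();
        for (String dir : partitionDirs) {
            int idx = dir.lastIndexOf(KEY_PREFIX);
            // No filter pushed down, or unrecognized layout: keep the directory
            // so that no data is silently dropped.
            if (filter == null || idx < 0) {
                kept.add(dir);
                continue;
            }
            long value = Long.parseLong(dir.substring(idx + KEY_PREFIX.length()));
            if (filter.accepts(value)) {
                kept.add(dir);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
                "stored_data/timestamp=100",
                "stored_data/timestamp=200",
                "stored_data/timestamp=300");

        // With a range filter, only the matching partition is read.
        System.out.println(prune(dirs, new RangeFilter(150, 250)));
        // prints [stored_data/timestamp=200]

        // With no pushed-down filter, every partition is read.
        System.out.println(prune(dirs, null).size());
        // prints 3
    }
}
```

In a real loader, the predicate would come from whatever expression Pig passes to "setPartitionFilter" rather than from a hand-built range, and translating that expression is left out of this sketch.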
