Thanks Jon for your thoughts.
I have a patch which renames null values in dimensions to unknown
and uses null for rollups. For a sample input tuple
red, null, 12
a = cube inp by ($0, $1);
the above query will emit the following combinations
red, unknown, 12
, unknown, 12
red, , 12
, ,
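To make the expansion concrete, here is a rough Python sketch of what the patch is described as doing for one input tuple. It is only an illustration, not the patch code: the names cube, UNKNOWN and ROLLUP are made up for this example, and the order of the emitted rows is not meant to match Pig's.

```python
from itertools import product

UNKNOWN = "unknown"  # label given to a null found in the input (assumed spelling)
ROLLUP = None        # null marks a rolled-up ("all values") dimension


def cube(dims, measure):
    """Return every cube combination for a single input tuple."""
    # Relabel nulls in the input so they cannot be confused with rollups.
    labeled = [UNKNOWN if d is None else d for d in dims]
    rows = []
    # For each dimension, either keep its value or roll it up.
    for keep in product([True, False], repeat=len(labeled)):
        key = tuple(d if k else ROLLUP for d, k in zip(labeled, keep))
        rows.append(key + (measure,))
    return rows


for row in cube(("red", None), 12):
    print(row)
```

With two dimensions this gives four rows, and the original null shows up only as unknown, so any null left in the output means a rollup.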
Option 1 (throwing an error) is bad. It violates "Pigs eat anything" (see
http://pig.apache.org/philosophy.html).
Do we need to give users an ability to name this unknown column? Why not just
label it unknown and be done?
Alan.
On Jun 6, 2012, at 2:24 PM, Prasanth J wrote:
Hello everyone
Thanks Alan and Dmitriy for your thoughts.
I think we have two different approaches now.
In one approach, if we encounter a null in dimension values, we can just label
it as unknown and use the NULL string to represent rollups. In the other
approach, if we encounter a null in dimension values,
You could always make the value pluggable, going with Unknown for now; then
down the line, if we want, we could add an ONNULL value to the parser
that sets it.
2012/6/8 Prasanth J buckeye.prasa...@gmail.com
Thanks Alan and Dmitriy for your thoughts.
I think we have two different approaches
But how does the user specify a custom value for *? In the current implementation
I am passing the NULL string to the CubeDimensions constructor. If we need to get
that value from the user, then we need some changes in the grammar like
a = CUBE b BY (x,y,z) ALL as AllProducts;
also what should be the default
Hello everyone
I would like to bring up this discussion about ways of handling NULL
values in dimensions specified for cubing. For example, if we have a dimension
color with the following values
red
blue
null
green
how do we tell whether the null value represents a rollup of all colors
Note that the current CubeDimensions UDF does a third thing -- instead
of rebranding nulls as unknown and using null to mean * or all
values, the UDF allows you to specify a custom value to stand for *
or all values. That way null can be an individual valid cell value.
This is (imho) much nicer
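
A small Python sketch of why a separate marker helps. This is not the CubeDimensions code, just an illustration: cube_keys and the "ALL" marker are names invented for this example.

```python
from itertools import product


def cube_keys(dims, all_marker):
    """Every keep-or-roll-up combination of dims; all_marker stands for '*'."""
    return [
        tuple(d if keep else all_marker for d, keep in zip(dims, mask))
        for mask in product([True, False], repeat=len(dims))
    ]


# Null as the rollup marker: a real null collides with the rollup cell.
ambiguous = cube_keys(("red", None), all_marker=None)
# A custom marker (hypothetical value "ALL"): null stays its own cell value.
distinct = cube_keys(("red", None), all_marker="ALL")

print(len(set(ambiguous)), len(set(distinct)))  # prints: 2 4
```

When null doubles as the marker, the tuple (red, null) yields only two distinct keys instead of four, so the individual null cell and the rollup cell cannot be told apart. With a dedicated marker all four keys stay distinct.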