Hi Lionel,
What do you think of my implementation?
Thanks
Zhao
On Wed, Sep 11, 2019 at 8:18 PM 钊 wrote:
> Hi Lionel,
>
> The JIRA ticket is https://issues.apache.org/jira/browse/GRIFFIN-289
>
> We have built an internal version.
>
> We have updated the Griffin measure code. There are three main changes:
> 1. DQ job configuration parameters.
> In the DQ job configuration parameters, we added an "error.confs" key under
> "rules". It looks like this:
> [image: image.png]
> [image: image.png]
> Here "regex" selects regular expression mode, and "enumeration" selects
> enumeration mode.
>
>
> 2. DQConfig Scala file. We updated DQConfig to deserialize the new DQ job
> configuration parameters.
>
> 3. CompletenessExpr2DQSteps Scala file. We updated it to generate SQL that
> counts the incomplete rows.
>
> Could you please consider our implementation?
>
> Thanks
>
> Zhao
>
>
> On Wed, Sep 11, 2019 at 1:36 PM Lionel Liu wrote:
>
>> Hi Zhao,
>>
>> Your requirement makes sense; it would be a common use case for
>> COMPLETENESS.
>> You can submit a JIRA ticket with the description to the Griffin community:
>> https://issues.apache.org/jira/browse/griffin, and then someone can pick up
>> the ticket and do the implementation.
>>
>> Thanks,
>> Lionel
>>
>> On Mon, Sep 9, 2019 at 6:56 PM 钊 wrote:
>>
>> > Hello
>> >
>> > We currently use the Griffin measure module to check batch data quality.
>> > For the COMPLETENESS dq type, Griffin counts the incomplete records in a
>> > table, but it only checks whether a column is null.
>> >
>> > However, a null check alone is not enough to decide whether a column
>> > value is invalid. In our case, analysts may treat other values as invalid
>> > even though they are not null. For example, for a column named "company",
>> > a record is invalid if company is in ("a", "b", "c").
>> >
>> > We need two ways for users to flag incomplete records. One is
>> > "enumeration": users list every value they consider invalid for a column.
>> > The other is "regular expression": users write a regular expression that
>> > matches the invalid values for a column.
>> >
>> > Could Griffin update the COMPLETENESS dq type to support these
>> > "enumeration" and "regular expression" ways of filtering incomplete
>> > records?
>> >
>> > Regards
>> >
>> > Zhao
>> >
>>
>
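
The two filtering modes requested in the original message above could be sketched roughly as follows. This is only an illustration in Python, not the code from GRIFFIN-289 and not Griffin's actual API. The function names (sql_literal, incomplete_predicate, incomplete_count_sql) are hypothetical, and so are two design choices: that a null value still counts as incomplete alongside the user-supplied values, and that regular expressions are matched with Spark SQL's RLIKE operator.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def incomplete_predicate(column: str, mode: str, values: List[str]) -> str:
    """Build a WHERE predicate that selects incomplete rows (hypothetical helper).

    mode is "enumeration" (values lists the invalid values) or
    "regex" (values lists regular expressions matching invalid values).
    Assumption: a null column value is always treated as incomplete.
    """
    if mode not in ("enumeration", "regex"):
        raise ValueError(f"unsupported mode: {mode!r}")
    null_check = f"`{column}` IS NULL"
    if not values:
        # No extra rules configured: fall back to the plain null check.
        return null_check
    if mode == "enumeration":
        in_list = ", ".join(sql_literal(v) for v in values)
        extra = [f"`{column}` IN ({in_list})"]
    else:
        # Assumption: RLIKE as in Spark SQL. Patterns containing backslashes
        # may need additional escaping, which this sketch does not handle.
        extra = [f"`{column}` RLIKE {sql_literal(p)}" for p in values]
    return "(" + " OR ".join([null_check] + extra) + ")"


def incomplete_count_sql(table: str, column: str, mode: str, values: List[str]) -> str:
    """Build a query that counts the incomplete rows of one column."""
    predicate = incomplete_predicate(column, mode, values)
    return f"SELECT COUNT(*) AS incomplete FROM `{table}` WHERE {predicate}"


if __name__ == "__main__":
    # The "company" example from the thread, in enumeration mode.
    print(incomplete_count_sql("src", "company", "enumeration", ["a", "b", "c"]))
```

With the "company" example from the thread, enumeration mode produces a single IN (...) clause next to the null check, while regex mode produces one RLIKE clause per pattern.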