Re: Profiling limitation in the Docker image + additional sources beyond Hive

john@apache Fri, 13 Apr 2018 04:06:01 -0700

> Is there a particular reason why I cannot select “Regular Expression
> Match” in the Advanced statistics? It is a good feature to test.
>
> Am I missing something, or is that feature not available in the Docker image?

Regex match is not supported in the UI yet. In fact, it is not the only
feature the UI does not support at the moment. You can, however, still use
this feature via the service API:

https://github.com/apache/incubator-griffin/blob/master/griffin-doc/service/api-guide.md

https://github.com/apache/incubator-griffin/tree/master/griffin-doc/service/postman
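For reference, a regular-expression-match profile essentially counts how many values in a column match a given pattern. Below is a minimal plain-Python sketch of that computation. It assumes the column has already been loaded as a list of strings. The helper name `regex_match_count` and the sample pattern are hypothetical and are not part of Griffin's API. In Griffin itself, the rule would be submitted through the service API linked above and executed on Spark.

```python
import re
from typing import Iterable, Optional


def regex_match_count(values: Iterable[Optional[str]], pattern: str) -> dict:
    """Hypothetical helper: count values that fully match `pattern`.

    Nulls are reported separately and never counted as matches.
    """
    compiled = re.compile(pattern)
    total = matched = nulls = 0
    for value in values:
        total += 1
        if value is None:
            nulls += 1
        elif compiled.fullmatch(value):  # the whole value must match
            matched += 1
    return {
        "total": total,
        "null": nulls,
        "matched": matched,
        "unmatched": total - nulls - matched,
    }


# Illustrative column of e-mail addresses with one invalid value and one null.
emails = ["a@b.com", "not-an-email", None, "x@y.org"]
print(regex_match_count(emails, r"[^@\s]+@[^@\s]+\.[a-z]+"))
```

Using `fullmatch` rather than `search` means a value counts only when the entire string conforms to the pattern, which is usually what a data-quality check wants.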

We are focusing on adding more useful and stable features to Griffin,
especially through the service/API, and there may not be enough effort
available for the UI to keep up with all of them. On the other hand, I
would like to invite you to join us and help make it better: your
contributions are definitely welcome.

> Also, in the measure tab I see that I can select a source and a
> destination table to check whether they are equal (for instance). Is there
> any plan to support checking between different data sources in the future?
> For example, between Hive tables and RedShift tables?
> As you are using Spark as the computation engine, it should not be too hard
> to implement, since Spark has connectors for both systems.

Griffin already supports comparing data from different data sources via its
data connectors. In your case, all you need is a RedShift data connector.
You can start from here:
https://github.com/apache/incubator-griffin/tree/master/measure/src/main/scala/org/apache/griffin/measure/data/connector/batch
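Once a connector exposes each source as a set of records, a cross-source comparison reduces to a matching step: what fraction of the source records can also be found in the target. The sketch below illustrates that idea in plain Python. It assumes both tables have already been read into lists of tuples, for example one from Hive and one from RedShift through a hypothetical RedShift connector. It also assumes that rows match on the chosen key columns. The function `accuracy` and its parameters are illustrative only. They are not Griffin's implementation, which runs on Spark.

```python
from typing import Sequence


def accuracy(source: Sequence[tuple], target: Sequence[tuple],
             key_idx: Sequence[int]) -> dict:
    """Illustrative: fraction of source rows whose key columns occur in target."""

    def key(row: tuple) -> tuple:
        # Project a row onto the columns used for matching.
        return tuple(row[i] for i in key_idx)

    target_keys = {key(row) for row in target}
    matched = sum(1 for row in source if key(row) in target_keys)
    total = len(source)
    return {
        "total": total,
        "matched": matched,
        "miss": total - matched,
        # Convention chosen here: an empty source counts as fully accurate.
        "accuracy": matched / total if total else 1.0,
    }


# Hypothetical rows already loaded from the two systems: (id, name).
hive_rows = [(1, "alice"), (2, "bob"), (3, "carol")]
redshift_rows = [(1, "alice"), (2, "bobby"), (3, "carol")]
print(accuracy(hive_rows, redshift_rows, key_idx=[0, 1]))  # row 2 differs by name
```

Matching on a set of projected keys keeps the lookup linear in the table sizes. On Spark, the same idea would normally be expressed as a join between the two data frames.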

On Fri, Apr 13, 2018 at 5:29 PM, Enrico D'Urso <a-edu...@hotels.com> wrote:

> Hi,
>
> I am running Griffin on the Docker image, but on the profiling tab on the
> UI I see that the options are limited to:
>
>   *   Simple Statistics
>      *   Null count
>      *   Distinct count
>   *   Summary statistics
>      *   Total Count
>
>
>   *   Advanced statistics
>      *   Enum detection TOP 5 count
>
> Is there a particular reason why I cannot select “Regular Expression
> Match” in the Advanced statistics? It is a good feature to test.
> Am I missing something, or is that feature not available in the Docker
> image?
>
> Also, in the measure tab I see that I can select a source and a
> destination table to check whether they are equal (for instance). Is there
> any plan to support checking between different data sources in the future?
> For example, between Hive tables and RedShift tables?
> As you are using Spark as the computation engine, it should not be too hard
> to implement, since Spark has connectors for both systems.
>
>
> Thanks,
>
> Enrico
>
