Apache Griffin deploy in AWS
Hi, Has anybody deployed Apache Griffin in AWS and integrated it with EMR? Is a single EC2 instance enough to deploy Apache Griffin in AWS? Waiting for your answer. Regards
Apache Griffin on AWS with S3
Hi, Has anybody deployed Apache Griffin in AWS with S3? Is it possible? Kind regards
Data connector for Oracle
Hello, One question: is there a data connector for Oracle currently available in Apache Griffin? Waiting for your answer. Regards
Change Apache Griffin UI port
Hello, We need to change the Apache Griffin UI port. Where is the default port 8080 configured, and how can we change it? Regards
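For anyone landing on this question: the Griffin service is a Spring Boot application, so the port can normally be overridden with Spring Boot's standard property. A minimal sketch, assuming the stock service/src/main/resources/application.properties layout (the file location is an assumption based on the source tree):

```properties
# application.properties (assumed location: service/src/main/resources/)
# Spring Boot's standard property; overrides the default port 8080
server.port=8090
```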
Streaming checkpoint parameters
Hi, We are testing Apache Griffin streaming, and the checkpoint configuration has several parameters (info.path, ready.time.interval, ready.time.delay, time.range, updatable) whose function we do not know:

    "checkpoint": {
      "type": "json",
      "file.path": "hdfs:///griffin/streaming/dump/source",
      "info.path": "source",
      "ready.time.interval": "10s",
      "ready.time.delay": "0",
      "time.range": ["-5m", "0"],
      "updatable": true
    }

Does anybody know where we can find a description of these parameters? Regards
Streaming analysis in MapR
Hi, We are trying to use Griffin to analyze streaming data quality in MapR, but MapR Streams does not support the Kafka wire protocol. Is there any way in Griffin to analyze MapR Streams data quality? Regards
Custom sink for PostgreSQL
Hello, Has anybody developed a custom sink for PostgreSQL? Regards
RE: Measure stream data quality in MapR cluster
Hello, Yes, we want to use Apache Griffin to measure streaming data quality in a MapR cluster, where MapR Streams is used for streaming. In the Griffin documentation only Apache Kafka appears as a streaming source; is there any solution for MapR Streams, or does a custom source have to be developed? Regards -Original Message- From: William Guo Sent: Thursday, September 26, 2019 4:23 PM To: dev@griffin.apache.org Subject: Re: Measure stream data quality in MapR cluster hi, Thanks for your email, to better support, Could you tell us your use case? For kafka streaming, you can check the following docs, http://griffin.apache.org/docs/usecases.html https://github.com/apache/griffin/blob/master/measure/src/main/resources/env-streaming.json https://github.com/apache/griffin/blob/master/measure/src/main/resources/config-streaming.json Thanks, William On Thu, Sep 26, 2019 at 9:21 PM < jose.martin_santacruz@boehringer-ingelheim.com> wrote: > Hello, > > We are trying to use Apache Griffin to measure stream data quality in > a MapR cluster, is there any documentation about how to configure > Griffin for this scenario? > > Waiting for your answer. > > Regards >
Measure stream data quality in MapR cluster
Hello, We are trying to use Apache Griffin to measure stream data quality in a MapR cluster, is there any documentation about how to configure Griffin for this scenario? Waiting for your answer. Regards
RE: Average of the measures
Hi William, The use case is the following: we have a datalake that is structured in datasets; each of these datasets can have a set of quality measures, and the user wants a global measure of the dataset's quality that is the average of all the dataset's quality measures. To do this, what we have done is define a custom measure that calculates this average, but it implies calculating all the quality measures again, and we were trying to find a way of calculating the average without recalculating the quality measures. Regards -Original Message- From: William Guo Sent: Tuesday, September 24, 2019 3:14 AM To: dev@griffin.apache.org Subject: Re: Average of the measures hi, Could you tell us your use case? Normally, you can use avg function from spark sql. Griffin support spark sql directly. Thanks, William On Thu, Sep 19, 2019 at 6:50 PM < jose.martin_santacruz@boehringer-ingelheim.com> wrote: > Hello, > > We need to create an average of the measures for a certain data set, > has anybody done this with Apache Griffin? > > Regards >
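One way to get the dataset-level average without re-running the measures is to post-process the metric values that Griffin has already persisted to its sink (ES or HDFS). A minimal sketch in plain Python, where the document shape (name/tmst/value fields) is a hypothetical illustration and not Griffin's actual schema:

```python
# Sketch: compute a dataset-level quality score from already-persisted
# metric documents, instead of recalculating each measure.
# The document fields below (name, tmst, value) are assumptions.

def dataset_quality(metric_docs):
    """Average the latest metric value of each measure for one dataset."""
    latest = {}  # measure name -> (timestamp, value)
    for doc in metric_docs:
        name, ts, value = doc["name"], doc["tmst"], doc["value"]
        # keep only the most recent value per measure
        if name not in latest or ts > latest[name][0]:
            latest[name] = (ts, value)
    values = [v for _, v in latest.values()]
    return sum(values) / len(values) if values else None

docs = [
    {"name": "accuracy_m",     "tmst": 1, "value": 0.90},
    {"name": "accuracy_m",     "tmst": 2, "value": 0.96},
    {"name": "completeness_m", "tmst": 2, "value": 0.88},
]
print(round(dataset_quality(docs), 2))  # 0.92
```

The same aggregation could of course be expressed as a Spark SQL `avg` over the metric store, as William suggests; the point is that it reads stored results rather than re-executing the measures.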
Apache Griffin with MapR Streams
Hi, Does anybody know if Apache Griffin is compatible with MapR Streams? Regards
Average of the measures
Hello, We need to create an average of the measures for a certain data set, has anybody done this with Apache Griffin? Regards
Distinctness measure
Hello, In the Apache Griffin documentation there is a type of measure named "Distinctness", but we are not able to create it although it is documented. We have reviewed DqType and the allowed types are: ACCURACY, PROFILING, TIMELINESS, UNIQUENESS, COMPLETENESS, CONSISTENCY. Has the distinctness measure been discontinued? We are also trying to create a timeliness measure for a timestamp field, and we are getting an error because the field has to be numeric or interval; is there any way to create a timeliness measure over a timestamp field? Waiting for your answer. Regards
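On the timeliness-over-timestamp question, one workaround is to convert the timestamp column to a numeric epoch value before the measure runs, e.g. in the data connector's pre.proc step. A sketch (the table and column names are hypothetical, and the exact pre.proc syntax should be verified against the config examples shipped with your Griffin version):

```json
{
  "type": "hive",
  "config": { "database": "default", "table.name": "demo_src" },
  "pre.proc": [
    {
      "dsl.type": "spark-sql",
      "rule": "select *, unix_timestamp(event_ts) * 1000 as ts from this"
    }
  ]
}
```

The timeliness rule can then reference the numeric `ts` column instead of the raw timestamp field.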
Custom measures in UI
Hello, From the Griffin UI we can only create Accuracy and Profiling measures; I imagine that the rest of the measure types have to be created as custom measures, is that correct? If so, where can we find the documentation about how to create and configure the rest of the measure types? Waiting for your answer. Regards
RE: Connect Griffin to Hive secured metastore
Hello, No the authentication in the cluster is MapR-SASL. Regards -Original Message- From: Qian Wang Sent: Thursday, August 22, 2019 7:04 PM To: dev@griffin.apache.org Cc: zxBCN De_La_Fuente_Diaz,Alvaro (IT EDS) EXTERNAL Subject: RE: Connect Griffin to Hive secured metastore Hi, Do you have kerberos auth? If you have, you need config the livy.need.kerberos=true #if livy need kerberos is false then don't need set following two properties livy.server.auth.kerberos.principal=livy/kerberos.principal livy.server.auth.kerberos.keytab=/path/to/livy/keytab/file Best, Qian On Aug 22, 2019, 4:04 AM -0700, jose.martin_santacruz@boehringer-ingelheim.com, wrote: > Hi Qian, > > Thank you very much for your help, we changed the connection to Hive Metadata > to Hive JDBC and now we are able to get Hive Metadata. > But now we have a problem with Livy authorization, the problem is that we do > not know how to configure user and password for Livy in Griffin, does anybody > know how to do it. > The error we are getting is the following: > > 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] > o.a.g.c.j.LivyTaskSubmitHelper [230] : Post To Livy URI is: > https://inhas68625.eu.boehringer.com:8998/batches > 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] > o.a.g.c.j.LivyTaskSubmitHelper [232] : Need Kerberos:false > 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] > o.a.g.c.j.LivyTaskSubmitHelper [244] : The livy server doesn't need > Kerberos Authentication > 2019-08-22 10:50:01.462 ERROR 83698 --- [ryBean_Worker-2] > o.a.g.c.j.SparkSubmitJob [116] : Post spark task ERROR. 
> > org.springframework.web.client.HttpClientErrorException: 401 > Authentication required at > org.springframework.web.client.DefaultResponseErrorHandler.handleError > (DefaultResponseErrorHandler.java:91) > ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] > at > org.springframework.web.client.RestTemplate.handleResponse(RestTemplat > e.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] > at > org.springframework.web.client.RestTemplate.doExecute(RestTemplate.jav > a:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] > at > org.springframework.web.client.RestTemplate.execute(RestTemplate.java: > 613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] > at > org.springframework.web.client.RestTemplate.postForObject(RestTemplate > .java:380) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] > at > org.apache.griffin.core.job.LivyTaskSubmitHelper.postToLivy(LivyTaskSu > bmitHelper.java:248) ~[classes!/:0.6.0-SNAPSHOT] at > org.apache.griffin.core.job.SparkSubmitJob.post2Livy(SparkSubmitJob.ja > va:131) ~[classes!/:0.6.0-SNAPSHOT] at > org.apache.griffin.core.job.SparkSubmitJob.post2LivyWithRetry(SparkSub > mitJob.java:224) ~[classes!/:0.6.0-SNAPSHOT] at > org.apache.griffin.core.job.SparkSubmitJob.saveJobInstance(SparkSubmit > Job.java:213) ~[classes!/:0.6.0-SNAPSHOT] at > org.apache.griffin.core.job.SparkSubmitJob.execute(SparkSubmitJob.java > :113) [classes!/:0.6.0-SNAPSHOT] at > org.quartz.core.JobRunShell.run(JobRunShell.java:202) > [quartz-2.2.2.jar!/:?] at > org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.ja > va:573) [quartz-2.2.2.jar!/:?] > > Waiting for your answer. > > Regards > > -Original Message- > From: Qian Wang > Sent: Wednesday, August 21, 2019 7:17 PM > To: dev@griffin.apache.org > Subject: Re: Connect Griffin to Hive secured metastore > > Hi, > > You have an alternative method to get Hive Metadata by using Hive JDBC. 
If > you want to use JDBC, you need change > org.apache.griffin.core.metastore.hive.HiveMetaStoreController: > @Autowired > @Qualifier(value = "jdbcSvc") > private HiveMetaStoreService hiveMetaStoreService; Also, if your Hive is > Authenticated by Kerberos, you need setup application.properties: > #Hive jdbc > hive.jdbc.className=org.apache.hive.jdbc.HiveDriver > hive.jdbc.url=jdbc:hive2://localhost:1/ #your Hive url > hive.need.kerberos=true # if you need Kerberos Auth > hive.keytab.user=x...@xx.com hive.keytab.path=/path/to/keytab/file #here is > absolute path Hopefully can answer your question. > > Best, > Eric > On Aug 21, 2019, 7:52 AM -0700, > jose.martin_santacruz@boehringer-ingelheim.com, wrote: > > Hello, > > > > We are trying to connect Griffin to a secured Hive metastore, does anybody > > know how to configure Griffin for this connection? We are getting > > authorization errors in the metastore. > > > > Waiting for your answer. > > > > Regards
RE: Connect Griffin to Hive secured metastore
Hi Qian, Thank you very much for your help, we changed the connection to Hive Metadata to Hive JDBC and now we are able to get Hive Metadata. But now we have a problem with Livy authorization, the problem is that we do not know how to configure user and password for Livy in Griffin, does anybody know how to do it. The error we are getting is the following: 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] o.a.g.c.j.LivyTaskSubmitHelper [230] : Post To Livy URI is: https://inhas68625.eu.boehringer.com:8998/batches 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] o.a.g.c.j.LivyTaskSubmitHelper [232] : Need Kerberos:false 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] o.a.g.c.j.LivyTaskSubmitHelper [244] : The livy server doesn't need Kerberos Authentication 2019-08-22 10:50:01.462 ERROR 83698 --- [ryBean_Worker-2] o.a.g.c.j.SparkSubmitJob[116] : Post spark task ERROR. org.springframework.web.client.HttpClientErrorException: 401 Authentication required at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:380) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.apache.griffin.core.job.LivyTaskSubmitHelper.postToLivy(LivyTaskSubmitHelper.java:248) ~[classes!/:0.6.0-SNAPSHOT] at org.apache.griffin.core.job.SparkSubmitJob.post2Livy(SparkSubmitJob.java:131) ~[classes!/:0.6.0-SNAPSHOT] at org.apache.griffin.core.job.SparkSubmitJob.post2LivyWithRetry(SparkSubmitJob.java:224) 
~[classes!/:0.6.0-SNAPSHOT] at org.apache.griffin.core.job.SparkSubmitJob.saveJobInstance(SparkSubmitJob.java:213) ~[classes!/:0.6.0-SNAPSHOT] at org.apache.griffin.core.job.SparkSubmitJob.execute(SparkSubmitJob.java:113) [classes!/:0.6.0-SNAPSHOT] at org.quartz.core.JobRunShell.run(JobRunShell.java:202) [quartz-2.2.2.jar!/:?] at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) [quartz-2.2.2.jar!/:?] Waiting for your answer. Regards -Original Message- From: Qian Wang Sent: Wednesday, August 21, 2019 7:17 PM To: dev@griffin.apache.org Subject: Re: Connect Griffin to Hive secured metastore Hi, You have an alternative method to get Hive Metadata by using Hive JDBC. If you want to use JDBC, you need change org.apache.griffin.core.metastore.hive.HiveMetaStoreController: @Autowired @Qualifier(value = "jdbcSvc") private HiveMetaStoreService hiveMetaStoreService; Also, if your Hive is Authenticated by Kerberos, you need setup application.properties: #Hive jdbc hive.jdbc.className=org.apache.hive.jdbc.HiveDriver hive.jdbc.url=jdbc:hive2://localhost:1/ #your Hive url hive.need.kerberos=true # if you need Kerberos Auth hive.keytab.user=x...@xx.com hive.keytab.path=/path/to/keytab/file #here is absolute path Hopefully can answer your question. Best, Eric On Aug 21, 2019, 7:52 AM -0700, jose.martin_santacruz@boehringer-ingelheim.com, wrote: > Hello, > > We are trying to connect Griffin to a secured Hive metastore, does anybody > know how to configure Griffin for this connection? We are getting > authorization errors in the metastore. > > Waiting for your answer. > > Regards
Connect Griffin to Hive secured metastore
Hello, We are trying to connect Griffin to a secured Hive metastore, does anybody know how to configure Griffin for this connection? We are getting authorization errors in the metastore. Waiting for your answer. Regards
Error connecting Griffin to Hive metastore
Hello, We are having this error when connecting Griffin to the Hive metastore; it is not able to get databases from the Hive metastore. Does anybody know how we could solve it?

2019-08-21 11:00:34.209 INFO 14970 --- [nio-8080-exec-2] h.metastore : Trying to connect to metastore with URI thrift://inhas68626.eu.boehringer.com:9083
2019-08-21 11:00:34.211 INFO 14970 --- [nio-8080-exec-2] h.metastore : Opened a connection to metastore, current connections: 1
2019-08-21 11:00:34.211 INFO 14970 --- [nio-8080-exec-2] h.metastore : Connected to metastore.
2019-08-21 11:00:34.211 ERROR 14970 --- [nio-8080-exec-2] o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {}
org.apache.hadoop.hive.metastore.api.MetaException: Got exception: org.apache.thrift.transport.TTransportException null
at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342) ~[hive-metastore-2.2.0.jar!/:2.2.0]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156) ~[hive-metastore-2.2.0.jar!/:2.2.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]

Waiting for your answer. Regards
Apache Griffin compatible with Hive 2.3.3?
Hello, Does anybody know if Apache Griffin is compatible with Hive 2.3.3? We have an error in our environment when Griffin is getting databases from Hive, our Hive version is Hive 2.3.3-mapr-1808 and Griffin is using hive metastore 2.2.0 2019-08-16 05:25:46.910 ERROR 73551 --- [nio-8080-exec-1] h.log : Converting exception to MetaException 2019-08-16 05:25:46.911 INFO 73551 --- [nio-8080-exec-1] h.metastore : Closed a connection to metastore, current connections: 0 2019-08-16 05:25:46.911 INFO 73551 --- [nio-8080-exec-1] h.metastore : Trying to connect to metastore with URI thrift://inhas68626.eu.boehringer.com:9083 2019-08-16 05:25:46.914 INFO 73551 --- [nio-8080-exec-1] h.metastore : Opened a connection to metastore, current connections: 1 2019-08-16 05:25:46.914 INFO 73551 --- [nio-8080-exec-1] h.metastore : Connected to metastore. 2019-08-16 05:25:46.914 ERROR 73551 --- [nio-8080-exec-1] o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {} org.apache.hadoop.hive.metastore.api.MetaException: Got exception: org.apache.thrift.transport.TTransportException null at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342) ~[hive-metastore-2.2.0.jar!/:2.2.0] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156) ~[hive-metastore-2.2.0.jar!/:2.2.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191] at org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2295) ~[hive-metastore-2.2.0.jar!/:2.2.0] at com.sun.proxy.$Proxy142.getAllDatabases(Unknown Source) ~[?:?] 
at org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl.getAllDatabases(HiveMetaStoreServiceImpl.java:69) [classes!/:0.4.0] at org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl.getAllTable(HiveMetaStoreServiceImpl.java:125) [classes!/:0.4.0] at org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl$$FastClassBySpringCGLIB$$d0fbb087.invoke() [classes!/:0.4.0] at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) [spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721) [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157) [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.cache.interceptor.CacheInterceptor$1.invoke(CacheInterceptor.java:52) [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.cache.interceptor.CacheAspectSupport.invokeOperation(CacheAspectSupport.java:345) [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:408) [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327) [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61) [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179) [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656) [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE] at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl$$EnhancerBySpringCGLIB$$a8c6e34f.getAllTable() [classes!/:0.4.0] at org.apache.griffin.core.metastore.hive.HiveMetaStoreController.getAllTables(HiveMetaStoreController.java:57) [classes!/:0.4.0] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_191] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191] at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
Error connecting to encrypted Hive metastore
Hello, We are getting an error when trying to connect to an encrypted Hive metastore:

2019-08-13 14:35:15.890 INFO 84294 --- [nio-8080-exec-4] h.metastore : Trying to connect to metastore with URI thrift://inhas68626.eu.boehringer.com:9083
2019-08-13 14:35:15.893 INFO 84294 --- [nio-8080-exec-4] h.metastore : Opened a connection to metastore, current connections: 1
2019-08-13 14:35:15.893 INFO 84294 --- [nio-8080-exec-4] h.metastore : Connected to metastore.
2019-08-13 14:35:15.893 ERROR 84294 --- [nio-8080-exec-4] o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {}
org.apache.hadoop.hive.metastore.api.MetaException: Got exception: org.apache.thrift.transport.TTransportException null
at org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342) ~[hive-metastore-2.2.0.jar!/:2.2.0]
at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156) ~[hive-metastore-2.2.0.jar!/:2.2.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_191]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_191]

Does Griffin support this kind of connection? Regards
Metrics not stored in ElasticSearch
Hello, We have installed Apache Griffin, but when jobs are executed from the UI we are not able to see the results, because metrics are being stored only in HDFS, not in ElasticSearch. Does anybody know how we can find out why metrics are not being stored in ElasticSearch? The Elasticsearch configuration in application.properties is the default:

# elasticsearch
elasticsearch.host=127.0.0.1
elasticsearch.port=9200
elasticsearch.scheme=http
# elasticsearch.user = user
# elasticsearch.password = password

Regards
Apache Griffin without ElasticSearch
Hello, Does anybody know if it is possible to work with Apache Griffin without ElasticSearch, using only HDFS? Regards
RE: Griffin storage requirements
Hello, One question about the Griffin measures repository: where is it stored by default? Is there any way of configuring it? Regards -Original Message- From: Kevin Yao Sent: Friday, July 19, 2019 5:38 AM To: dev@griffin.apache.org Subject: Re: Griffin storage requirements Hi, Thanks for being interested. You can check the following documents. https://github.com/apache/griffin/blob/master/README.md#references https://github.com/apache/griffin/tree/master/griffin-doc https://cwiki.apache.org/confluence/display/GRIFFIN/3.+Usage+Guidance Thanks, Kevin On Wed, Jul 17, 2019 at 10:51 PM < jose.martin_santacruz@boehringer-ingelheim.com> wrote: > Hello, > > We are starting a new Project with Apache Griffin and we need to know > which are the storage requirements for Griffin and we have found no > documentation about it, can you give us this information? > > Waiting for your answer. > > Regards >
Griffin HDFS Configuration
Hello, One question about Griffin HDFS configuration: if I want to use HDFS as the metric repository, do I only need to create an HDFS directory and configure an HDFS sink for that directory? Then all the measures from the different jobs in Griffin would be stored within this HDFS directory, as well as the missing records from accuracy measurements. Is that correct? Waiting for your answer. Regards
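That matches how sinks are described in the measure docs; a sketch of what such an HDFS sink entry in the env config could look like (the path is a placeholder, and the option names are recalled from the shipped env-batch.json examples, so verify them against your Griffin version):

```json
"sinks": [
  {
    "type": "hdfs",
    "config": {
      "path": "hdfs:///griffin/persist",
      "max.persist.lines": 10000,
      "max.lines.per.file": 10000
    }
  }
]
```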
RE: Apache Griffin storage requirements
Hello Lionel, I understand the computing requirements for the Spark cluster, but in our architecture Griffin is running on an edge node within the cluster; are there also memory requirements for this node? As far as I know, several processes run on this node (scheduler, measure launcher, monitor/alert, reporting, …) and I do not know the memory requirements for these processes. Waiting for your answer. Regards From: bhlx3l...@163.com On behalf of Lionel Liu Sent: Wednesday, July 17, 2019 6:06 PM To: zxBCN Martin_Santacruz,Jose (IT EDS) EXTERNAL CC: dev Subject: Re: Apache Griffin storage requirements Hi, You are right, Griffin can persist metrics to different sinks like ES and HDFS, with the missing records in HDFS in accuracy measurements. The storage requirement depends on your data size; metrics are always small, but the missing records might be large if the accuracy is not good, up to the whole data source if all the data are mismatched. I agree with William that normal metrics will not take too much storage, metrics in HDFS could also be optional, and the memory resource of the Spark cluster just depends on your data size; in our case, we could use 10 workers with 8G memory to calculate the accuracy metric for 800M lines of data in 3 minutes. Storage is not the strict resource for Griffin, so HDFS is not your limit, but a larger Spark cluster can accelerate the performance. Thanks, Lionel On 07/17/2019 23:13, jose.martin_santacruz@boehringer-ingelheim.com wrote: Hello William, OK, but which would be the minimum storage and the recommended storage for the cluster node where Apache Griffin is running? Are the metrics always stored in Elastic? In the documentation I have seen that you can define different sinks for the metrics (HDFS, Elastic, MongoDB, ...).
Waiting for your answer. Regards -Original Message- From: William Guo <gu...@apache.org> Sent: Wednesday, July 17, 2019 5:01 PM To: dev@griffin.apache.org Subject: Re: Apache Griffin storage requirements hi, There are no special storage requirements for griffin, the storage depends on your spark jobs and scale of your dataset. We only temporarily store some intermediate cache in spark and store the metrics in elastic. metrics should be small. Thanks, William On Wed, Jul 17, 2019 at 10:54 PM < jose.martin_santacruz@boehringer-ingelheim.com> wrote: > Hello, > > We are starting a new Project with Apache Griffin and we need to know > which are the storage requirements for Griffin and we have found no > documentation about it, can you give us this information? > > Waiting for your answer. > > Regards > >
Griffin UI User Access Management
Hello, We are evaluating Griffin as a data quality solution and we have not found any user access management for the Griffin UI module in the documentation. Does the Griffin UI have any kind of user access management for the calculated metrics? Waiting for your answer. Regards
RE: Apache Griffin storage requirements
Hello William, OK, but which would be the minimum storage and the recommended storage for the cluster node where Apache Griffin is running? Are the metrics always stored in Elastic? In the documentation I have seen that you can define different sinks for the metrics (HDFS, Elastic, MongoDB, ...). Waiting for your answer. Regards -Original Message- From: William Guo Sent: Wednesday, July 17, 2019 5:01 PM To: dev@griffin.apache.org Subject: Re: Apache Griffin storage requirements hi, There are no special storage requirements for griffin, the storage depends on your spark jobs and scale of your dataset. We only temporarily store some intermediate cache in spark and store the metrics in elastic. metrics should be small. Thanks, William On Wed, Jul 17, 2019 at 10:54 PM < jose.martin_santacruz@boehringer-ingelheim.com> wrote: > Hello, > > We are starting a new Project with Apache Griffin and we need to know > which are the storage requirements for Griffin and we have found no > documentation about it, can you give us this information? > > Waiting for your answer. > > Regards > >
Apache Griffin storage requirements
Hello, We are starting a new project with Apache Griffin and we need to know what the storage requirements for Griffin are; we have found no documentation about this. Can you give us this information? Waiting for your answer. Regards
Griffin storage requirements
Hello, We are starting a new project with Apache Griffin and we need to know what the storage requirements for Griffin are; we have found no documentation about this. Can you give us this information? Waiting for your answer. Regards
Configuration Documentation
Hello, We are starting a new data quality project with Apache Griffin and we have no documentation about the Apache Griffin environment and data quality configuration; where can we get this documentation? Waiting for your answer. Regards