Apache Griffin deploy in AWS

2020-04-20 Thread jose.martin_santacruz.ext
Hi,

Has anybody deployed Apache Griffin in AWS and integrated it with EMR? Is an 
EC2 instance enough to deploy Apache Griffin in AWS?

Waiting for your answer

Regards


Apache Griffin on AWS with S3

2020-04-09 Thread jose.martin_santacruz.ext
Hi,

Has anybody deployed Apache Griffin in AWS with S3? Is it possible?

Kind regards


Data connector for Oracle

2020-03-30 Thread jose.martin_santacruz.ext
Hello,

One question, is there a data connector for Oracle currently available in 
Apache Griffin?

Waiting for your answer.

Regards


Change Apache Griffin UI port

2020-02-11 Thread jose.martin_santacruz.ext
Hello,

We need to change the Apache Griffin UI port. Where is the default port 8080 
configured, and how can we change it?
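For anyone else hitting this: the Griffin service is a Spring Boot application (the stack traces on this list show spring-web), so the listening port should be controlled by the standard server.port property in the service module's application.properties. A sketch, assuming a stock build:

```properties
# service/src/main/resources/application.properties
# Override the embedded server's default port (8080)
server.port=8085
```

If the Angular UI is served separately in development, its port is configured in the UI build tooling instead.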

Regards


Streaming checkpoint parameters

2020-01-23 Thread jose.martin_santacruz.ext
Hi,

We are testing Apache Griffin streaming, and the checkpoint configuration has 
several parameters (info.path, ready.time.interval, ready.time.delay, 
time.range, updatable) whose function we do not know:

  "checkpoint": {
    "type": "json",
    "file.path": "hdfs:///griffin/streaming/dump/source",
    "info.path": "source",
    "ready.time.interval": "10s",
    "ready.time.delay": "0",
    "time.range": ["-5m", "0"],
    "updatable": true
  }
Does anybody know where we can find a description of these parameters?
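Pending an authoritative answer, here is one rough reading of those fields, inferred from the repo's streaming examples; treat every comment as an assumption, not documentation:

```
"checkpoint": {
  "type": "json",                                        // serialization format of checkpoint records
  "file.path": "hdfs:///griffin/streaming/dump/source",  // where cached source data is dumped
  "info.path": "source",                                 // name/sub-path under which checkpoint info is tracked
  "ready.time.interval": "10s",                          // how often cached data becomes "ready" to process
  "ready.time.delay": "0",                               // extra delay before data counts as ready
  "time.range": ["-5m", "0"],                            // data window, relative to trigger time, per calculation
  "updatable": true                                      // whether cached data may be updated and re-read
}
```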

Regards


Streaming analysis in MapR

2019-12-10 Thread jose.martin_santacruz.ext
Hi,

We are trying to use Griffin to analyze streaming data quality in MapR, but 
MapR Streams does not support the Kafka wire protocol. Is there any way in 
Griffin to analyze MapR Streams data quality?

Regards


Custom sink for PostgreSQL

2019-10-23 Thread jose.martin_santacruz.ext
Hello,

Has anybody developed a custom sink for PostgreSQL?

Regards


RE: Measure stream data quality in MapR cluster

2019-09-27 Thread jose.martin_santacruz.ext
Hello,

Yes, we want to use Apache Griffin to measure streaming data quality in a MapR 
cluster, where MapR Streams is used for streaming. In the Griffin documentation 
only Apache Kafka appears as a streaming source. Is there any solution for MapR 
Streams, or does a custom source have to be developed?

Regards 

-Original Message-
From: William Guo  
Sent: Thursday, September 26, 2019 4:23 PM
To: dev@griffin.apache.org
Subject: Re: Measure stream data quality in MapR cluster

hi,

Thanks for your email. To support you better, could you tell us your use case?
For Kafka streaming, you can check the following docs:

http://griffin.apache.org/docs/usecases.html
https://github.com/apache/griffin/blob/master/measure/src/main/resources/env-streaming.json
https://github.com/apache/griffin/blob/master/measure/src/main/resources/config-streaming.json

Thanks,
William

On Thu, Sep 26, 2019 at 9:21 PM <
jose.martin_santacruz@boehringer-ingelheim.com> wrote:

> Hello,
>
> We are trying to use Apache Griffin to measure stream data quality in 
> a MapR cluster, is there any documentation about how to configure 
> Griffin for this scenario?
>
> Waiting for your answer.
>
> Regards
>


Measure stream data quality in MapR cluster

2019-09-26 Thread jose.martin_santacruz.ext
Hello,

We are trying to use Apache Griffin to measure stream data quality in a MapR 
cluster. Is there any documentation about how to configure Griffin for this 
scenario?

Waiting for your answer.

Regards


RE: Average of the measures

2019-09-25 Thread jose.martin_santacruz.ext
Hi William,

The use case is the following: we have a data lake that is structured in 
datasets, and each of these datasets can have a set of quality measures. The 
user wants a global measure of dataset quality that is the average of all of 
the dataset's quality measures. To do this, we have defined a custom measure 
that calculates this average, but it implies recalculating all the quality 
measures, and we were trying to find a way of calculating the average without 
recalculating them.
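For illustration, the Spark SQL route suggested in the reply below can be expressed as an extra rule in a measure's evaluate.rule; the table and column names here are hypothetical, and note this still runs inside a single measure job:

```json
{
  "dsl.type": "spark-sql",
  "out.dataframe.name": "dataset_quality",
  "rule": "SELECT AVG(metric_value) AS avg_quality FROM measure_results"
}
```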

Regards

-Original Message-
From: William Guo  
Sent: Tuesday, September 24, 2019 3:14 AM
To: dev@griffin.apache.org
Subject: Re: Average of the measures

hi,

Could you tell us your use case?
Normally, you can use the avg function from Spark SQL; Griffin supports Spark 
SQL directly.

Thanks,
William

On Thu, Sep 19, 2019 at 6:50 PM <
jose.martin_santacruz@boehringer-ingelheim.com> wrote:

> Hello,
>
> We need to create an average of the measures for a certain data set, 
> has anybody done this with Apache Griffin?
>
> Regards
>


Apache Griffin with MapR Streams

2019-09-25 Thread jose.martin_santacruz.ext
Hi,

Does anybody know if Apache Griffin is compatible with MapR Streams?

Regards


Average of the measures

2019-09-19 Thread jose.martin_santacruz.ext
Hello,

We need to create an average of the measures for a certain data set, has 
anybody done this with Apache Griffin?

Regards


Distinctness measure

2019-08-28 Thread jose.martin_santacruz.ext
Hello,

In the Apache Griffin documentation there is a type of measure named 
"Distinctness", but we are not able to create it although it is documented. We 
have reviewed DqType, and the allowed types are:
ACCURACY,
PROFILING,
TIMELINESS,
UNIQUENESS,
COMPLETENESS,
CONSISTENCY

Has the distinctness measure been discontinued?

We are also trying to create a timeliness measure for a timestamp field, and we 
are getting an error because the field has to be numeric or interval. Is there 
any way to create a timeliness measure over a timestamp field?
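As a possible (untested) workaround, a spark-sql pre-transform rule could first convert the timestamp column to epoch seconds with Spark SQL's unix_timestamp function, so that the timeliness rule sees a numeric field; the column and dataframe names here are hypothetical:

```json
{
  "dsl.type": "spark-sql",
  "out.dataframe.name": "source_numeric",
  "rule": "SELECT *, unix_timestamp(event_ts) AS event_ts_epoch FROM source"
}
```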

Waiting for your answer

Regards


Custom measures in UI

2019-08-24 Thread jose.martin_santacruz.ext
Hello,

From the Griffin UI we can only create Accuracy and Profiling measures. I 
imagine that the rest of the measure types have to be created as custom 
measures, is that correct? If it is, where can we find documentation about how 
to create and configure the rest of the measure types?
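For what it's worth, measure types beyond Accuracy and Profiling can be defined as raw JSON and submitted to the measure engine directly; a minimal completeness-style definition might look like the sketch below, modelled loosely on the repo's config-batch.json examples (verify field names against your Griffin version; the database, table, and columns are placeholders):

```json
{
  "name": "completeness_measure",
  "process.type": "batch",
  "data.sources": [
    {
      "name": "source",
      "connectors": [
        {
          "type": "hive",
          "version": "2.2",
          "config": { "database": "default", "table.name": "demo_src" }
        }
      ]
    }
  ],
  "evaluate.rule": {
    "rules": [
      {
        "dsl.type": "griffin-dsl",
        "dq.type": "completeness",
        "out.dataframe.name": "comp",
        "rule": "id, name"
      }
    ]
  },
  "sinks": ["CONSOLE", "HDFS"]
}
```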

Waiting for your answer.

Regards


RE: Connect Griffin to Hive secured metastore

2019-08-23 Thread jose.martin_santacruz.ext
Hello,

No, the authentication in the cluster is MapR-SASL.

Regards

-Original Message-
From: Qian Wang  
Sent: Thursday, August 22, 2019 7:04 PM
To: dev@griffin.apache.org
Cc: zxBCN De_La_Fuente_Diaz,Alvaro (IT EDS) EXTERNAL 

Subject: RE: Connect Griffin to Hive secured metastore

Hi,

Do you have Kerberos auth? If you do, you need to configure:

livy.need.kerberos=true
# if livy.need.kerberos is false, the following two properties do not need to be set
livy.server.auth.kerberos.principal=livy/kerberos.principal
livy.server.auth.kerberos.keytab=/path/to/livy/keytab/file

Best,
Qian

On Aug 22, 2019, 4:04 AM -0700, 
jose.martin_santacruz@boehringer-ingelheim.com, wrote:
> Hi Qian,
>
> Thank you very much for your help, we changed the connection to Hive Metadata 
> to Hive JDBC and now we are able to get Hive Metadata.
> But now we have a problem with Livy authorization, the problem is that we do 
> not know how to configure user and password for Livy in Griffin, does anybody 
> know how to do it.
> The error we are getting is the following:
>
> 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] 
> o.a.g.c.j.LivyTaskSubmitHelper [230] : Post To Livy URI is: 
> https://inhas68625.eu.boehringer.com:8998/batches
> 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] 
> o.a.g.c.j.LivyTaskSubmitHelper [232] : Need Kerberos:false
> 2019-08-22 10:50:00.830 INFO 83698 --- [ryBean_Worker-2] 
> o.a.g.c.j.LivyTaskSubmitHelper [244] : The livy server doesn't need 
> Kerberos Authentication
> 2019-08-22 10:50:01.462 ERROR 83698 --- [ryBean_Worker-2] 
> o.a.g.c.j.SparkSubmitJob [116] : Post spark task ERROR.
>
> org.springframework.web.client.HttpClientErrorException: 401 
> Authentication required at 
> org.springframework.web.client.DefaultResponseErrorHandler.handleError
> (DefaultResponseErrorHandler.java:91) 
> ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at 
> org.springframework.web.client.RestTemplate.handleResponse(RestTemplat
> e.java:700) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at 
> org.springframework.web.client.RestTemplate.doExecute(RestTemplate.jav
> a:653) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at 
> org.springframework.web.client.RestTemplate.execute(RestTemplate.java:
> 613) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at 
> org.springframework.web.client.RestTemplate.postForObject(RestTemplate
> .java:380) ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
> at 
> org.apache.griffin.core.job.LivyTaskSubmitHelper.postToLivy(LivyTaskSu
> bmitHelper.java:248) ~[classes!/:0.6.0-SNAPSHOT] at 
> org.apache.griffin.core.job.SparkSubmitJob.post2Livy(SparkSubmitJob.ja
> va:131) ~[classes!/:0.6.0-SNAPSHOT] at 
> org.apache.griffin.core.job.SparkSubmitJob.post2LivyWithRetry(SparkSub
> mitJob.java:224) ~[classes!/:0.6.0-SNAPSHOT] at 
> org.apache.griffin.core.job.SparkSubmitJob.saveJobInstance(SparkSubmit
> Job.java:213) ~[classes!/:0.6.0-SNAPSHOT] at 
> org.apache.griffin.core.job.SparkSubmitJob.execute(SparkSubmitJob.java
> :113) [classes!/:0.6.0-SNAPSHOT] at 
> org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
> [quartz-2.2.2.jar!/:?] at 
> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.ja
> va:573) [quartz-2.2.2.jar!/:?]
>
> Waiting for your answer.
>
> Regards
>
> -Original Message-
> From: Qian Wang 
> Sent: Wednesday, August 21, 2019 7:17 PM
> To: dev@griffin.apache.org
> Subject: Re: Connect Griffin to Hive secured metastore
>
> Hi,
>
> You have an alternative method to get Hive metadata by using Hive JDBC. If
> you want to use JDBC, you need to change
> org.apache.griffin.core.metastore.hive.HiveMetaStoreController:
>
> @Autowired
> @Qualifier(value = "jdbcSvc")
> private HiveMetaStoreService hiveMetaStoreService;
>
> Also, if your Hive is authenticated by Kerberos, you need to set up
> application.properties:
>
> #Hive jdbc
> hive.jdbc.className=org.apache.hive.jdbc.HiveDriver
> hive.jdbc.url=jdbc:hive2://localhost:1/ #your Hive url
> hive.need.kerberos=true # if you need Kerberos Auth
> hive.keytab.user=x...@xx.com
> hive.keytab.path=/path/to/keytab/file #here is absolute path
>
> Hopefully this answers your question.
>
> Best,
> Eric
> On Aug 21, 2019, 7:52 AM -0700, 
> jose.martin_santacruz@boehringer-ingelheim.com, wrote:
> > Hello,
> >
> > We are trying to connect Griffin to a secured Hive metastore, does anybody 
> > know how to configure Griffin for this connection? We are getting 
> > authorization errors in the metastore.
> >
> > Waiting for your answer.
> >
> > Regards


RE: Connect Griffin to Hive secured metastore

2019-08-22 Thread jose.martin_santacruz.ext
Hi Qian,

Thank you very much for your help. We changed the connection for Hive metadata 
to Hive JDBC, and now we are able to get the Hive metadata.
But now we have a problem with Livy authorization: we do not know how to 
configure a user and password for Livy in Griffin. Does anybody know how to do 
it?
The error we are getting is the following:

2019-08-22 10:50:00.830  INFO 83698 --- [ryBean_Worker-2] 
o.a.g.c.j.LivyTaskSubmitHelper  [230]  : Post To Livy URI is: 
https://inhas68625.eu.boehringer.com:8998/batches
2019-08-22 10:50:00.830  INFO 83698 --- [ryBean_Worker-2] 
o.a.g.c.j.LivyTaskSubmitHelper  [232]  : Need Kerberos:false
2019-08-22 10:50:00.830  INFO 83698 --- [ryBean_Worker-2] 
o.a.g.c.j.LivyTaskSubmitHelper  [244]  : The livy server doesn't need 
Kerberos Authentication
2019-08-22 10:50:01.462 ERROR 83698 --- [ryBean_Worker-2] 
o.a.g.c.j.SparkSubmitJob[116]  : Post spark task ERROR.

org.springframework.web.client.HttpClientErrorException: 401 Authentication 
required
at 
org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91)
 ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:700)
 ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:653) 
~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.web.client.RestTemplate.execute(RestTemplate.java:613) 
~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.web.client.RestTemplate.postForObject(RestTemplate.java:380)
 ~[spring-web-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.apache.griffin.core.job.LivyTaskSubmitHelper.postToLivy(LivyTaskSubmitHelper.java:248)
 ~[classes!/:0.6.0-SNAPSHOT]
at 
org.apache.griffin.core.job.SparkSubmitJob.post2Livy(SparkSubmitJob.java:131) 
~[classes!/:0.6.0-SNAPSHOT]
at 
org.apache.griffin.core.job.SparkSubmitJob.post2LivyWithRetry(SparkSubmitJob.java:224)
 ~[classes!/:0.6.0-SNAPSHOT]
at 
org.apache.griffin.core.job.SparkSubmitJob.saveJobInstance(SparkSubmitJob.java:213)
 ~[classes!/:0.6.0-SNAPSHOT]
at 
org.apache.griffin.core.job.SparkSubmitJob.execute(SparkSubmitJob.java:113) 
[classes!/:0.6.0-SNAPSHOT]
at org.quartz.core.JobRunShell.run(JobRunShell.java:202) 
[quartz-2.2.2.jar!/:?]
at 
org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:573) 
[quartz-2.2.2.jar!/:?]

Waiting for your answer.

Regards

-Original Message-
From: Qian Wang  
Sent: Wednesday, August 21, 2019 7:17 PM
To: dev@griffin.apache.org
Subject: Re: Connect Griffin to Hive secured metastore

Hi,

You have an alternative method to get Hive metadata by using Hive JDBC. If you 
want to use JDBC, you need to change 
org.apache.griffin.core.metastore.hive.HiveMetaStoreController:

@Autowired
@Qualifier(value = "jdbcSvc")
private HiveMetaStoreService hiveMetaStoreService;

Also, if your Hive is authenticated by Kerberos, you need to set up 
application.properties:

#Hive jdbc
hive.jdbc.className=org.apache.hive.jdbc.HiveDriver
hive.jdbc.url=jdbc:hive2://localhost:1/ #your Hive url
hive.need.kerberos=true # if you need Kerberos Auth
hive.keytab.user=x...@xx.com
hive.keytab.path=/path/to/keytab/file #here is absolute path

Hopefully this answers your question.

Best,
Eric
On Aug 21, 2019, 7:52 AM -0700, 
jose.martin_santacruz@boehringer-ingelheim.com, wrote:
> Hello,
>
> We are trying to connect Griffin to a secured Hive metastore, does anybody 
> know how to configure Griffin for this connection? We are getting 
> authorization errors in the metastore.
>
> Waiting for your answer.
>
> Regards


Connect Griffin to Hive secured metastore

2019-08-21 Thread jose.martin_santacruz.ext
Hello,

We are trying to connect Griffin to a secured Hive metastore. Does anybody know 
how to configure Griffin for this connection? We are getting authorization 
errors in the metastore.

Waiting for your answer.

Regards


Error connecting Griffin to Hive metastore

2019-08-21 Thread jose.martin_santacruz.ext
Hello,

We are getting this error when connecting Griffin to the Hive metastore: it is 
not able to get the databases from the metastore. Does anybody know how we 
could solve it?

2019-08-21 11:00:34.209  INFO 14970 --- [nio-8080-exec-2] h.metastore   
   : Trying to connect to metastore with URI 
thrift://inhas68626.eu.boehringer.com:9083
2019-08-21 11:00:34.211  INFO 14970 --- [nio-8080-exec-2] h.metastore   
   : Opened a connection to metastore, current connections: 1
2019-08-21 11:00:34.211  INFO 14970 --- [nio-8080-exec-2] h.metastore   
   : Connected to metastore.
2019-08-21 11:00:34.211 ERROR 14970 --- [nio-8080-exec-2] 
o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {}

org.apache.hadoop.hive.metastore.api.MetaException: Got exception: 
org.apache.thrift.transport.TTransportException null
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_191]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_191]

Waiting for your answer.

Regards


Apache Griffin compatible with Hive 2.3.3?

2019-08-15 Thread jose.martin_santacruz.ext
Hello,

Does anybody know if Apache Griffin is compatible with Hive 2.3.3?

We have an error in our environment when Griffin is getting databases from 
Hive. Our Hive version is Hive 2.3.3-mapr-1808, and Griffin is using Hive 
metastore 2.2.0:

2019-08-16 05:25:46.910 ERROR 73551 --- [nio-8080-exec-1] h.log 
   : Converting exception to MetaException
2019-08-16 05:25:46.911  INFO 73551 --- [nio-8080-exec-1] h.metastore   
   : Closed a connection to metastore, current connections: 0
2019-08-16 05:25:46.911  INFO 73551 --- [nio-8080-exec-1] h.metastore   
   : Trying to connect to metastore with URI 
thrift://inhas68626.eu.boehringer.com:9083
2019-08-16 05:25:46.914  INFO 73551 --- [nio-8080-exec-1] h.metastore   
   : Opened a connection to metastore, current connections: 1
2019-08-16 05:25:46.914  INFO 73551 --- [nio-8080-exec-1] h.metastore   
   : Connected to metastore.
2019-08-16 05:25:46.914 ERROR 73551 --- [nio-8080-exec-1] 
o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {}

org.apache.hadoop.hive.metastore.api.MetaException: Got exception: 
org.apache.thrift.transport.TTransportException null
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_191]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_191]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient$SynchronizedHandler.invoke(HiveMetaStoreClient.java:2295)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at com.sun.proxy.$Proxy142.getAllDatabases(Unknown Source) ~[?:?]
at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl.getAllDatabases(HiveMetaStoreServiceImpl.java:69)
 [classes!/:0.4.0]
at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl.getAllTable(HiveMetaStoreServiceImpl.java:125)
 [classes!/:0.4.0]
at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl$$FastClassBySpringCGLIB$$d0fbb087.invoke()
 [classes!/:0.4.0]
at 
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204) 
[spring-core-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:721)
 [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:157)
 [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.cache.interceptor.CacheInterceptor$1.invoke(CacheInterceptor.java:52)
 [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.cache.interceptor.CacheAspectSupport.invokeOperation(CacheAspectSupport.java:345)
 [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:408)
 [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.cache.interceptor.CacheAspectSupport.execute(CacheAspectSupport.java:327)
 [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.cache.interceptor.CacheInterceptor.invoke(CacheInterceptor.java:61)
 [spring-context-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:179)
 [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:656)
 [spring-aop-4.3.6.RELEASE.jar!/:4.3.6.RELEASE]
at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreServiceImpl$$EnhancerBySpringCGLIB$$a8c6e34f.getAllTable()
 [classes!/:0.4.0]
at 
org.apache.griffin.core.metastore.hive.HiveMetaStoreController.getAllTables(HiveMetaStoreController.java:57)
 [classes!/:0.4.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_191]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_191]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[?:1.8.0_191]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_191]
at 
org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
 

Error connecting to encrypted hive metastore

2019-08-13 Thread jose.martin_santacruz.ext
Hello,

We are getting an error when trying to connect to an encrypted Hive metastore:

2019-08-13 14:35:15.890  INFO 84294 --- [nio-8080-exec-4] h.metastore   
   : Trying to connect to metastore with URI 
thrift://inhas68626.eu.boehringer.com:9083
2019-08-13 14:35:15.893  INFO 84294 --- [nio-8080-exec-4] h.metastore   
   : Opened a connection to metastore, current connections: 1
2019-08-13 14:35:15.893  INFO 84294 --- [nio-8080-exec-4] h.metastore   
   : Connected to metastore.
2019-08-13 14:35:15.893 ERROR 84294 --- [nio-8080-exec-4] 
o.a.g.c.m.h.HiveMetaStoreService : Can not get databases : {}

org.apache.hadoop.hive.metastore.api.MetaException: Got exception: 
org.apache.thrift.transport.TTransportException null
at 
org.apache.hadoop.hive.metastore.MetaStoreUtils.logAndThrowMetaException(MetaStoreUtils.java:1342)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at 
org.apache.hadoop.hive.metastore.HiveMetaStoreClient.getAllDatabases(HiveMetaStoreClient.java:1156)
 ~[hive-metastore-2.2.0.jar!/:2.2.0]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[?:1.8.0_191]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[?:1.8.0_191]

Does Griffin support this kind of connection?

Regards


Metrics not stored in ElasticSearch

2019-08-01 Thread jose.martin_santacruz.ext
Hello,

We have installed Apache Griffin, but when jobs are executed from the UI we are 
not able to see the results, because metrics are being stored only in HDFS and 
not in ElasticSearch.
Does anybody know how we can find out why metrics are not being stored in 
ElasticSearch?
The ElasticSearch configuration in application.properties is the default:

# elasticsearch
elasticsearch.host=127.0.0.1
elasticsearch.port=9200
elasticsearch.scheme=http
# elasticsearch.user = user
# elasticsearch.password = password
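One thing worth cross-checking: application.properties only covers the service side. The measure jobs write metrics according to the sinks list in env.json, which must contain an ELASTICSEARCH entry pointing at the same cluster, roughly as in the repo's env-batch.json (the host and index path here are placeholders):

```json
{
  "type": "ELASTICSEARCH",
  "config": {
    "method": "post",
    "api": "http://127.0.0.1:9200/griffin/accuracy"
  }
}
```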

Regards


Apache Griffin without ElasticSearch

2019-07-24 Thread jose.martin_santacruz.ext
Hello,

Does anybody know if it is possible to work with Apache Griffin without 
ElasticSearch, using only HDFS?
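It appears the sinks list in env.json is pluggable, so one way to try this is an env.json whose sinks contain only console and HDFS entries, along the lines of the repo's env-batch.json (the paths and limits are example values):

```json
"sinks": [
  { "type": "CONSOLE", "config": { "max.log.lines": 10 } },
  {
    "type": "HDFS",
    "config": {
      "path": "hdfs:///griffin/persist",
      "max.persist.lines": 10000,
      "max.lines.per.file": 10000
    }
  }
]
```

Note that the web UI's dashboards read metrics from ElasticSearch, so without it the metrics would have to be inspected directly in HDFS.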

Regards


RE: Griffin storage requirements

2019-07-19 Thread jose.martin_santacruz.ext
Hello,

One question about the Griffin measures repository: where is it stored by 
default? Is there any way of configuring it?

Regards

-Original Message-
From: Kevin Yao  
Sent: Friday, July 19, 2019 5:38 AM
To: dev@griffin.apache.org
Subject: Re: Griffin storage requirements

Hi,

Thanks for being interested.

You can check the following documents.

https://github.com/apache/griffin/blob/master/README.md#references

https://github.com/apache/griffin/tree/master/griffin-doc

https://cwiki.apache.org/confluence/display/GRIFFIN/3.+Usage+Guidance

Thanks,
Kevin

On Wed, Jul 17, 2019 at 10:51 PM <
jose.martin_santacruz@boehringer-ingelheim.com> wrote:

> Hello,
>
> We are starting a new Project with Apache Griffin and we need to know 
> which are the storage requirements for Griffin and we have found no 
> documentation about it, can you give us this information?
>
> Waiting for your answer.
>
> Regards
>


Griffin HDFS Configuration

2019-07-18 Thread jose.martin_santacruz.ext
Hello,

One question about Griffin HDFS configuration: if I want to use HDFS as the 
metric repository, I only need to create an HDFS directory and configure an 
HDFS sink for this directory; then all the measures from the different jobs in 
Griffin will be stored within this HDFS directory, as well as the missing 
records from accuracy measurements. Is that correct?

Waiting for your answer.

Regards


RE: Apache Griffin storage requirements

2019-07-18 Thread jose.martin_santacruz.ext
Hello Lionel,

I understand the computing requirements for the Spark cluster, but in our 
architecture Griffin is running on an edge node within the cluster. Are there 
also memory requirements for this node?
As far as I know, several processes run on this node (scheduler, measure 
launcher, monitor/alert, reporting, …) and I don't know the memory requirements 
for these processes.

Waiting for your answer.

Regards


From: bhlx3l...@163.com  On behalf of Lionel Liu
Sent: Wednesday, July 17, 2019 6:06 PM
To: zxBCN Martin_Santacruz,Jose (IT EDS) EXTERNAL 

CC: dev 
Subject: Re: Apache Griffin storage requirements

Hi,

You are right: Griffin can persist metrics to different sinks like ES and HDFS, 
with the missing records from accuracy measurements in HDFS. The storage 
requirement depends on your data size. Metrics are always small, but the 
missing records might be large if the accuracy is not good, up to the whole 
data source if all the data are mismatched.
I agree with William that normal metrics will not take much storage, and 
metrics in HDFS are optional. The memory resource of the Spark cluster just 
depends on your data size; in our case, we could use 10 workers with 8G of 
memory each to calculate the accuracy metric for 800M lines of data in 3 
minutes.
Storage is not the scarce resource for Griffin, so HDFS is not your limit, but 
a larger Spark cluster can accelerate performance.

Thanks,
Lionel

On 07/17/2019 23:13, jose.martin_santacruz.ext 
<jose.martin_santacruz@boehringer-ingelheim.com> wrote:
Hello William,

OK, but what would be the minimum and the recommended storage for the cluster 
node where Apache Griffin is running?
Are the metrics always stored in Elastic? In the documentation I have seen that 
you can define different sinks for the metrics (HDFS, Elastic, MongoDB, ...).

Waiting for your answer

Regards

-Original Message-
From: William Guo <gu...@apache.org>
Sent: Wednesday, July 17, 2019 17:01
To: dev@griffin.apache.org
Subject: Re: Apache Griffin storage requirements

hi,

There are no special storage requirements for Griffin; storage depends on your 
Spark jobs and the scale of your dataset.
We only temporarily store some intermediate cache in Spark and store the 
metrics in Elastic. Metrics should be small.


Thanks,
William


On Wed, Jul 17, 2019 at 10:54 PM <
jose.martin_santacruz@boehringer-ingelheim.com> wrote:

> Hello,
>
> We are starting a new Project with Apache Griffin and we need to know
> which are the storage requirements for Griffin and we have found no
> documentation about it, can you give us this information?
>
> Waiting for your answer.
>
> Regards
>
>


Griffin UI User Access Management

2019-07-17 Thread jose.martin_santacruz.ext
Hello,

We are evaluating Griffin as a data quality solution, and we have not found any 
user access management for the Griffin UI module in the documentation. Does the 
Griffin UI have any kind of user access management for the calculated metrics?

Waiting for your answer.

Regards



RE: Apache Griffin storage requirements

2019-07-17 Thread jose.martin_santacruz.ext
Hello William,

OK, but what would be the minimum and the recommended storage for the cluster 
node where Apache Griffin is running?
Are the metrics always stored in Elastic? In the documentation I have seen that 
you can define different sinks for the metrics (HDFS, Elastic, MongoDB, ...).

Waiting for your answer

Regards

-Original Message-
From: William Guo  
Sent: Wednesday, July 17, 2019 17:01
To: dev@griffin.apache.org
Subject: Re: Apache Griffin storage requirements

hi,

There are no special storage requirements for Griffin; storage depends on your 
Spark jobs and the scale of your dataset.
We only temporarily store some intermediate cache in Spark and store the 
metrics in Elastic. Metrics should be small.


Thanks,
William


On Wed, Jul 17, 2019 at 10:54 PM <
jose.martin_santacruz@boehringer-ingelheim.com> wrote:

> Hello,
>
> We are starting a new Project with Apache Griffin and we need to know 
> which are the storage requirements for Griffin and we have found no 
> documentation about it, can you give us this information?
>
> Waiting for your answer.
>
> Regards
>
>


Apache Griffin storage requirements

2019-07-17 Thread jose.martin_santacruz.ext
Hello,

We are starting a new project with Apache Griffin, and we need to know the 
storage requirements for Griffin. We have found no documentation about this; 
can you give us this information?

Waiting for your answer.

Regards



Griffin storage requirements

2019-07-17 Thread jose.martin_santacruz.ext
Hello,

We are starting a new project with Apache Griffin, and we need to know the 
storage requirements for Griffin. We have found no documentation about this; 
can you give us this information?

Waiting for your answer.

Regards


Configuration Documentation

2019-07-05 Thread jose.martin_santacruz.ext
Hello,

We are starting a new data quality project with Apache Griffin, and we have no 
documentation about the Apache Griffin environment and data quality 
configuration. Where can we get this documentation?

Waiting for your answer

Regards